Oracle DBA, IT Manager, or Something Else

14 02 2016

February 14, 2016

In September 2014 I posted an article on this blog where I mentioned that I am the IT Manager and an Oracle Database administrator at a company that specializes in welding and machining very large metal parts – in the article I attempted a bit of humor while trying to explain a few items.  I have a small confession to make, I have been the only IT person at the company for the last 38 months.  As you might imagine, working 11+ hour days for six years (yes, six years is longer than 38 months) was nothing short of a never ending fun experience, and sometimes seemingly without an ending.

In the last month I put together a somewhat interesting document.  As some readers of this blog probably know, I have read a few books about Oracle Database, programming, Microsoft Exchange, server administration, and a few other topics (my personal technical book library six years ago), and have published a few articles about some of those topics on this blog.  My Oracle Database knowledge is reasonably good, I suppose (after probably reading roughly 30 books on the subject and contributing to another), but that is only one piece of a complex jigsaw puzzle of knowledge in the IT field.  So, what is the interesting document that I created as I neared by 16th anniversary with the company?
What are the skills required for the job of IT Manager and Oracle DBA, or what exactly does my job encompass (since it is easy to assume that I am just the person that hands out the printer ink cartridges when the change me light flashes 🙂 ).  I am not yet sure what exactly the interesting document will be used for, but it might be used for ISO documentation purposes.  I split the document into two sections: skills for the person who is able to sit upright in a chair, successfully breathe at regular intervals, and occasionally blink once for no, or blink twice for yes (Essential Skills); skills in addition to the essential skills that are actually needed for the person with the role of IT Manager and Oracle DBA at the company (Additional Skills).

Essential Skills:

  • Database: Oracle Database 11.2 management: emphasis on RMAN backup, RMAN recovery, RMAN cloning, Oracle specific SQL syntax including analytics (experience with Oracle Database 10.2, 11.1, or 12.1 should be sufficient)
  • Email: Microsoft Exchange Server 2013 management (Exchange Server 2007 or 2010 experience is helpful, but not sufficient unless the person is willing to read a couple of books)
  • Programming: Visual Basic 6.0 (MRP for Visual 6.5.2, touchscreen Windows 8.1 tablet projects, and many other custom programs were written in this language), VB.Net, C# (MRP for Visual 8.0.0 was written in C#), and Zebra ZPL (used for creating part labels for receiving, shipping, and moving)
  • ERP: Visual Manufacturing 6.5.2 support
  • Phone system: Mitel 5000 with digital, analog, VOIP circuits, paging, and fax reception (note that the Mitel 3000 series is significantly different, while the old Inter-Tel phone system shares similar programming configuration options). Person should know how to change phone programming to change names, reset passwords, reset voicemail boxes, punch down digital and analog phone circuits, proper phone system shut down procedure.
  • Barcode: Previously worked with or created code 3 of 9 barcodes, code 128 barcodes, and UPC barcodes. Experience with Symbol, Motorola, or Zebra mobile computers (primarily MC3090 and MC3190).
  • Microsoft Office: Experience with Microsoft Office versions 2007 and 2010 (experience with versions 2013 and 2016 recommended for future upgrades)
  • Server operating systems: Windows Server 2012, Windows Server 2008 R2, Windows Server 2003 R2, Red Hat Linux, Fedora Core 22 Linux, BusyBox v1.16 (Synology NAS)
  • Desktop operating systems: Windows 10 Pro (for future rollout), Windows 8.1 Pro (tablets only), Windows 7 Pro (32 bit and 64 bit), Windows Vista Business (32 bit), Windows XP Pro (primarily shop floor computers, but also Burn Shop supervisor and UT inspector)
  • Printer support: Basic maintenance and troubleshooting of various Dell, Zebra, Canon, Ricoh/Lanier/Gestetner (advanced maintenance covered by printer repair company) printers, and Xerox (advanced maintenance covered by printer repair company) large format plotter/scanner.
  • Wiring: Cat 5e and Cat 6 network cable wiring and termination, serial cable wiring, phone wiring into model 66 blocks, fiber optic connections with pre-made cable.
  • Help Desk: Capable of translating complex computer terminology into common English, able to walk end users through procedures over the phone and in person, emphasis on customer service (speed of response and problem resolution).
  • Firewall and VPN: Familiar with SonicWall firewall and VPN configuration.
  • EDI: Experience with TrustedLink EDI and common EDI documents (810, 830, 850, 856, etc.).
  • Network: Experience managing gigabit network switches, wireless access points, and VLANs.

Additional Skills:

  • Database: Person has extensive Oracle performance tuning experience that is relevant to databases without the Tuning or Diagnostic packs. Person has achieved Oracle Certified Master status (Oracle Certified Professional is a very light-weight certification by comparison), person is a member of the OakTable Network, or person is an Oracle Ace Director.
  • Email: Person planned and executed an upgrade from Microsoft Exchange 5.5 to Exchange 2007 or later, or Exchange 2007 to 2013 or 2016). Person has experience with spam filtering products such as SpamStopsHere or MessageLabs. Person has written programs to submit email using an SMTP interface.  Experience working with self-signed security certificates and third party certificates.
  • Programming: Experience writing Microsoft Office macros (Excel, Word, Power Point, Outlook, Access), experience writing VBScript (Visual Manufacturing macro language, AutoCAD macro language), experience writing GCODE (used by CNC machines), experience writing PowerShell scripts (used by Microsoft Exchange 2007 and later), experience writing HTML (used by Web Reports, Report Server, K&M website), experience writing Javascript and PHP (used by Web Reports, K&M website), experience writing Linux and DOS batch scripts (used by K&M website and backup software), experience creating ASP and ASP.Net websites (Web Reports), experience maintaining programs created by other software developers
  • ERP: Experience upgrading Visual Manufacturing, experience utilizing Infor’s COM Object and/or Infor’s API Toolkit (for programmatically creating transactions in Visual), experience with the underlying database object model used by Visual Manufacturing (knowledge of specific tables, columns, indexes, security model). Experience filing support requests with Infor.
  • Phone system: Experience with T1 DCS and T1 PRI circuits (type of T1 circuit currently used at K&M), extensive programming and diagnostic experience with a Mitel 5000 PBX including troubleshooting of offsite voice over IP phones that traverse firewalls. Experience creating and/or maintaining SMDR call logging through TCP/IP and/or serial connection.
  • Barcode: Experience troubleshooting Zebra barcode printers; experience troubleshooting barcode scanning problems; able to describe how code 3 of 9 barcodes are encoded and write a computer program to locate and decode a code 3 of 9 barcode on a scanned page; experience programming and backing up Symbol, Motorola, or Zebra mobile computers (primarily MC3090 and MC3190)
  • Common commercial programs: Able to troubleshoot, install, and configure license managers for a variety of primarily engineering focused programs. Programs include AutoCAD 2014, PTC Creo Parametric, MasterCAM, PC-DMIS, Vericut, Verisurf, Shadow Protect, Trend Micro Enterprise Security Suite, Sage Fixed Assets, TrustedLink EDI, Microsoft Office, Adobe Acrobat Standard Edition, Visual Manufacturing, Oracle Database
  • Servers: Extensive experience administering and configuring Windows servers; experience with group policy in Active Directory; experience with a variety of Linux based systems (primarily Red Hat derived); experience with Synology NAS units (configuring as domain members, network time sources, Windows SMB shares, FTP targets, Nagios network monitoring); experience with WordPress configuration, utilization and security (for the K&M website); experience with SSH (TelNet, TightVNC, Putty, Remote Desktop, GoToMyPC); experience specifying server hardware configurations (including RAM, RAID levels, operating systems, redundancy); experience diagnosing and repairing Dell server hardware. Experience maintaining four hour support contracts for servers. Experience configuring Fail2Ban, and software firewalls (iptables, firewalld, Windows firewall service).
  • Desktops: Extensive experience troubleshooting hardware problems (replace bad power supply, identify faulty motherboard, upgrade memory, replace other components including hard drives and DVD drives). Troubleshoot attached devices including check scanners, postage meters, printers, mice/keyboards, wedge barcode scanners. Clean computers of spyware and viruses with and without the assistance of virus/spyware scanning software (educate users about virus warning signs to avoid infection).
  • Printer support: Program printers for network connectivity, and install as network managed printers controlled by Windows Server 2012. Perform diagnostics and repair of printers, copiers, and all-in-one devices that are not under warranty (such as replacing the print head in Zebra printers or installing maintenance kit parts). Manage printer ink and toner supplies with predictive re-ordering of supplies.
  • Wiring: Understand when fiber optic and/or shielded cable must be used rather than typical Cat 5e or Cat 6 cable, with the ability to terminate fiber optic cables. Experience wiring and troubleshooting paging amplifiers, page delay units, and speakers.
  • Help Desk: Experience converting vague support requests into actionable assistance. Capable of providing precise and fluid written and verbal responses. Able to proactively train fellow employees. Able to write clear documentation for procedures and custom computer programs.
  • Firewall and VPN: Experience configuring and maintaining SonicWall NSA series firewalls: redundant Internet routes with fail-over, multiple static IP addresses per port, firewall log monitoring (to identify attempted hacking attacks and Internet stability issues), opening and restricting ports or IP addresses, etc. Experience configuring and maintaining SonicWall SSL VPN servers: authentication through Active Directory attached RADIUS server, creating client-side connection and disconnection scripts, establishing remote connection policies through Active Directory, troubleshooting remote connectivity of computers and IP phones.
  • EDI: Experience using Trusted Link EDI software to communicate with remote EDI servers using the SFTP protocol and WinSCP scripts. Experience creating and maintaining export and import functionality between Visual Manufacturing and the Trusted Link EDI software using map flat files, raw EDI, and custom developed software that reads and creates transactions in Visual Manufacturing. Experience scripting in Trusted Link EDI. Experience programmatically printing EDI documents to PDF files (using BlackIce print driver software) with filenames that are derived from the contents of the EDI printed documents.
  • Network: Experience diagnosing and resolving connectivity issues using Wireshark (or similar packet capture tool), experience with various command line diagnostic utilities (nslookup, netstat, nbtstat, dig, traceroute). Experience recommending and configuring core and remote network switches, as well as troubleshooting managed and unmanaged network switches and routers (HP, Dell, and Cisco-Linksys).  Experience ordering, configuring, and supporting Internet access delivered through fiber optic, cable TV providers, DSL providers, and wireless providers. Experience working with multimode fiber optic with GBIC and stand-alone fiber to RJ45 converters. Experience working with the Nagios network monitoring utility (compiled to run on Synology NAS units).
  • Inter-departmental support: Able to assist engineering with reconfiguring Fanuc and Tosnic CNC controls, and troubleshoot serial cable connections to support GCODE program transfers with CNC machines. Able to understand basic procedures of Quality, Engineering, Purchasing, and Accounting to facilitate process and efficiency improvements.

No, my job position does not require that I be an expert at everything that I touch, so long as I never make a mistake at anything that I touch…  ORA-00060: Deadlock Detected.





Characterset Changes

9 10 2015

October 9, 2015

In 2006 I did a bit of experimentation with an Oracle characterset that supports Unicode (AL32UTF8 if I remember correctly) using Oracle Database 10.2.0.2.  That test, performed on a desktop computer with a bit less than 1GB of RDRAM (Rambus) memory, was a huge failure, and not just because the computer did not have enough memory to meet the minimum requirements for Oracle Database 10.2.0.2.  Sure, the database access was slower with a characterset that sometimes used one byte to store a single character, two bytes for other characters, and three bytes to store each of the remaining characters.  There was another, more serious problem.  I put together this article mostly as a reminder to myself to recall what I tested 9.5 years ago.

I am in the process of testing an upgrade to the most recent version of the ERP package used by the company where I work – in addition to being an Oracle DBA for the company, I am also the IT Manager and in-house expert for the ERP system.  Reading through the ERP installation documentation, I found the following:

“It is highly recommended that you perform the upgrade procedures in a test environment before you upgrade your production environment. The upgrade procedure potentially involves deleting your existing database instance. Make sure you understand the ramifications of deleting the database instance before you upgrade your production environment…  If you intend to use the same database instance name with your 8.0.0 installation as you did with your previous VISUAL installation, delete your existing database instance. Do not delete your existing instance until after you have verified that the data has been exported properly…”

“Create a new Oracle Instance. In 8.0.0, the AL32UTF8 character set and the Al16UTF16 universal national character set are used. You must create a new Oracle instance to specify these character sets.”

So, the ERP company would like for me to export the production WE8MSWIN1252 characterset (AL16UTF16 national characterset) database using Datapump, DROP DATABASE, create a new database with the AL32UTF8 characterset (AL16UTF16 national characterset), import the database using Datapump, and then start the ERP upgrade conversion utility.  The statement “you must create a new Oracle instance to specify these character sets,” was a bit of a red flag.  Something is not right with that statement.

If I followed those directions, something would certainly take a bite out of some of the data in a few critical tables:

SELECT
  *
FROM
  NLS_DATABASE_PARAMETERS
ORDER BY
  PARAMETER;
 
PARAMETER                VALUE
------------------------ ----------------------------
NLS_CALENDAR             GREGORIAN
NLS_CHARACTERSET         WE8MSWIN1252
NLS_COMP                 BINARY
NLS_CURRENCY             $
NLS_DATE_FORMAT          DD-MON-RR
NLS_DATE_LANGUAGE        AMERICAN
NLS_DUAL_CURRENCY        $
NLS_ISO_CURRENCY         AMERICA
NLS_LANGUAGE             AMERICAN
NLS_LENGTH_SEMANTICS     BYTE
NLS_NCHAR_CHARACTERSET   AL16UTF16
NLS_NCHAR_CONV_EXCP      FALSE
NLS_NUMERIC_CHARACTERS   .,
NLS_RDBMS_VERSION        11.2.0.2.0
NLS_SORT                 BINARY
NLS_TERRITORY            AMERICA
NLS_TIMESTAMP_FORMAT     DD-MON-RR HH.MI.SSXFF AM
NLS_TIMESTAMP_TZ_FORMAT  DD-MON-RR HH.MI.SSXFF AM TZR
NLS_TIME_FORMAT          HH.MI.SSXFF AM
NLS_TIME_TZ_FORMAT       HH.MI.SSXFF AM TZR

NLS_LENGTH_SEMANTICS: BYTE – so why is that a problem?

In this particular ERP system, which is designed for use by manufacturers, one of the critical tables in the database is named PART – there are quite a number of defined foreign keys in the database that point back to this particular table.  What do the columns in this table look like before the conversion?

SQL> DESC PART
 Name                    Null?    Type
 ----------------------- -------- ------------
 ID                      NOT NULL VARCHAR2(30)
 DESCRIPTION                      VARCHAR2(40)
 STOCK_UM                NOT NULL VARCHAR2(15)
 PLANNING_LEADTIME       NOT NULL NUMBER
 ORDER_POLICY            NOT NULL CHAR(1)
 ORDER_POINT                      NUMBER(14,4)
 SAFETY_STOCK_QTY                 NUMBER(14,4)
 FIXED_ORDER_QTY                  NUMBER(14,4)
...

Where is the potential problem with exporting everything, dropping the database, creating a new database with a different characterset, and then importing the data.  When I tested the procedure in 2006 I found that one or two rows would not re-import into the PART table due to the values of some columns being too long.  Too long?  Funny things (or not so funny things) happen when the foreign keys are created after the table data import if one or two rows referenced by other tables are not in the PART table.  It took me a couple of minutes to recreate the SQL statement that I set up in 2006 to see what caused the import problem – that SQL statement is shown below:

SELECT
  ID,
  DESCRIPTION,
  LENGTHB(DESCRIPTION) CUR_BYTE_LEN,
  LENGTHB(CONVERT(DESCRIPTION, 'AL32UTF8', 'WE8MSWIN1252')) NEW_BYTE_LEN
FROM
  PART
WHERE
  LENGTHB(CONVERT(DESCRIPTION, 'AL32UTF8', 'WE8MSWIN1252'))>LENGTHB(DESCRIPTION);

The above SQL statement instructs Oracle Database to take a database column’s values (DESCRIPTION), consider the text stored in that column to be in the WE8MSWIN1252 characterset, translate the column values to the AL32UTF8 characterset, and then compare the byte lengths of the translated column values with the byte lengths of the original column.  If there are differences, the primary key value (ID), DESCRIPTION, and column lengths of the DESCRIPTION column before and after conversion are output.  The output was slightly different using Oracle SQL Developer, SQL*Plus on a Windows client, and SQL*Plus on a Synology NAS connected by a Windows Telnet session, so I thought that I would show screen captures of the output from all three utilities.

Oracle SQL Developer:
CharactersetsSQLDeveloper

SQL*Plus on a Windows client:
CharactersetsWindowsSQLPlus

SQL*Plus on a Synology NAS connected by a Windows Telnet session:
CharactersetsSynologySQLPlus

The DESCRIPTION column output of Oracle SQL Developer matches what appears in the ERP system.  The degree character (°), center line/diameter character (Ø – might also be considered the empty set character), and registered trademark character (®) require a single byte for storage in the WE8MSWIN1252 characterset, but two bytes for storage in the AL32UTF8 characterset.  Once translated to the AL32UTF8 characterset, the description for part ID LINED021274 will be too long to fit into the maximum 40 byte long DESCRIPTION column.  Several utilities exist for scanning a database to identify similar characterset conversion issues.

The Oracle Database 11.2 documentation lists three ways to change the characterset of the database:

  • Migrating Character Data Using a Full Export and Import
  • Migrating a Character Set Using the CSALTER Script
  • Migrating Character Data Using the CSALTER Script and Selective Imports

I started wondering if it really was necessary, in this case, to change the database characterset?  I examined the table creation script that the ERP system installer would use to create a new database – below is a portion of the script to create the PART table:

CREATE TABLE PART(
 ID NVARCHAR2(30) NOT NULL,
 DESCRIPTION NVARCHAR2(40),
 STOCK_UM NVARCHAR2(15) NOT NULL,
 PLANNING_LEADTIME INTEGER DEFAULT 0  NOT NULL,
 ORDER_POLICY NCHAR(1) DEFAULT ' '  NOT NULL,
 ORDER_POINT NUMBER(14,4),
 ORDER_UP_TO_QTY NUMBER (14, 4),
 SAFETY_STOCK_QTY NUMBER(14,4),
 FIXED_ORDER_QTY NUMBER(14,4),
...

Well, that table creation script would generate a bit different table when compared with the DESC PART output that I posted above – the VARCHAR2 columns would be created as NVARCHAR2, and the CHAR columns would be created as NCHAR.  So, the new version of the ERP system will be working with NVARCHAR2 and NCHAR columns, which are already setup to use the AL16UTF16 national characterset.

I setup a test version of the database using the same Oracle Database version, and instructed the ERP conversion software to upgrade from version 6.5.2 to version 8.0 (temporarily passing the ERP database through versions 6.5.3, 6.5.4, 7.0, 7.1.1, and 7.1.2).  I measured the amount of redo generated to make certain that I would not run out of disk space in the fast (flash) recovery area during the production database upgrade (because I still had the test database in noarchivelog mode, I simply checked the redo size statistic in V$SYSSTAT at various points during the conversion).

The redo generated to upgrade from version 6.5.2 through version 7.1.2 was about 1.75 times the byte size of the tables and indexes in the ERP schema in the database.  The redo generated to upgrade from version 7.1.2 to 8.0 (the next version) required approximately 27.06 times the byte size of the tables and indexes in the ERP schema in the database!  To put those numbers in perspective with simple numbers, assume that the byte size of the tables and indexes in the ERP schema in the database is about 100GB.  The upgrade from version 6.5.2 to 7.1.2 will generate approximately 175GB of redo, so there must be at least that much space in the fast (flash) recovery area for the archived redo logs generated, if that is where the archived redo logs are placed.  The first RMAN backup might also backup the archived redo logs, and if those backups are also stored in the fast (flash) recovery area there will be a need for at least an additional 175GB in the fast (flash) recovery area.  For the upgrade from version 7.1.2 to 8.0 approximately 2706GB of redo would be generated, and again the first RMAN backup might also include those archived redo logs.

What is happening during the 7.1.2 to 8.0 upgrade?  The 7.1.2 to 8.0 upgrade finds all of the CHAR and VARCHAR2 columns in all SYSADM (schema owner) tables, and for each table, sorts those columns alphabetically, creates a new column for one of those columns (with a _NEW suffix in the column name) that is defined as NCHAR or NVARCHAR2, executes a SQL statement to set the new column value to be the same as the original column value, drops the original column, and renames the new column so that it matches the old column name.  This process is then repeated for each of the CHAR and VARCHAR2 columns in the table, before moving onto the next table.  This is likely NOT the ideal way to change the column data types as this method generates enormous amounts of redo due to the number of times each data block is modified (and also likely leads to a lot of row chaining), which if you are correctly running the database in archived redo log mode, could very well completely fill your fast (flash) recovery area several times over with the archived redo logs.  The approach implemented also causes the tables’ primary key columns to be scattered though the table definition – with Oracle Database it is faster to access the primary key columns during a full table scan if those columns are the first columns in the table definition (reference and the follow up article).

There are two other methods that I can think of right now that might have been used to reduce the amount of redo generated and preserve the column order in the tables during the 7.1.2 to 8.0 upgrade (I wonder if the ERP vendor would be willing to modify their approach?).  They could have done something like this to simply change all of the column definitions for a single table all at once in a single ALTER TABLE command:

ALTER TABLE MYCUSTOM.AUDIT_SHEET MODIFY(
  WORKORDER_TYPE NCHAR(1),
  WORKORDER_BASE_ID NVARCHAR2(30),
  WORKORDER_LOT_ID NVARCHAR2(3),
  WORKORDER_SPLIT_ID NVARCHAR2(3),
  WORKORDER_SUB_ID NVARCHAR2(3),
  PART_ID NVARCHAR2(30),
  CUSTOMER_ID NVARCHAR2(15)
  ...);

A second option is to create a new table (with a _NEW suffix in the name) with all of the columns from the original table (just with CHAR changed to NCHAR and VARCHAR2 changed to NVARCHAR2), perform a statement such as INSERT INTO PART_NEW SELECT * FROM PART;, DROP PURGE the old table, then rename the new table as the old table.

Due to the method implemented, the converted PART table after the upgrade to version 8.0 appears something like this, where the primary key column (ID) is now the 68th column in the table definition:

SQL> DESC PART
 Name                      Null?    Type
 ------------------------- -------- --------------
 PLANNING_LEADTIME         NOT NULL NUMBER
 ORDER_POINT                        NUMBER(14,4)
 SAFETY_STOCK_QTY                   NUMBER(14,4)
 FIXED_ORDER_QTY                    NUMBER(14,4)
 DAYS_OF_SUPPLY                     NUMBER
...
 DEF_SLS_TAX_GRP_ID                 NVARCHAR2(15)
 DEMAND_HISTORY            NOT NULL NCHAR(1)
 DESCRIPTION                        NVARCHAR2(40)
 DETAIL_ONLY               NOT NULL NCHAR(1)
 DIMENSIONS_UM                      NVARCHAR2(15)
...
 HTS_CODE                           NVARCHAR2(20)
 ID                        NOT NULL NVARCHAR2(30)
 INSPECTION_REQD           NOT NULL NCHAR(1)

Other tables, such as the OPERATION table, have composite primary keys that are composed of character and number columns – after the conversion to 8.0 the numeric column (the last column in the composite primary key – WORKORDER_TYPE, WORKORDER_BASE_ID, WORKORDER_LOT_ID, WORKORDER_SPLIT_ID, WORKORDER_SUB_ID, SEQUENCE_NO) is listed first in the table definition, with the other columns of the primary key scattered by alphabetic order after the numeric and date columns.

Thoughts and opinions about the above?  Am I being overly critical?  I did not mention that it took about five or six tries for a successful ERP upgrade – two tries failed due to a bug in the conversion utility, and the other failed attempts were caused by the conversion utility stumbling over extensions (mostly logging triggers) that I added to the database over the years (but disabled before the ERP upgrade).

This blog is currently at 1,001,992 views, having crossed the 1,000,000 mark a couple of days ago.  I would like to take this time to say thanks to the readers of the blog for visiting, and thanks to the ERP vendor for giving me something a bit different to think about, even if it was something that I thought about almost a decade ago.





Hyper-Extended Oracle Performance Monitor 7.0 Beta

16 08 2015

August 16, 2015

Three and a half years ago I offered the Hyper-Extended Oracle Performance 6.0 Beta for download – that Beta version expired a year later.  This program has been somewhat of a pet project for the last 13 years (almost 10 years at the time of the previous Beta offering), so I was excited to read some of the feedback about the 6.0 Beta.  The minimal feedback had me wondering for a while, then the pet project was pushed to a dark corner for roughly three years.  I taught the pet a few new tricks on Windows 10, so I thought that I make the latest version available.  Unlike the previous Beta, Beta 7.0 does not have an expiration date.

NOTE August 17, 2015: The first two download links are working now – Wordpress was blocking the downloads.  Note that the first two downloads are actually compressed .zip files with a .doc extension – inside those .zip files are the files that are of interest.  Right-click the files, save the files to your computer, then rename the files to end with .zip.

  1. Program download (right-click Save As, then change the .doc extension to .zip): Hyper-ExtendedOraclePerformanceMonitor7.zip
  2. Documentation download (right-click Save As, then change the .doc extension to .zip – still incomplete, see old docs): Hyper-Extended Oracle Performance Monitor7.doc
  3. Old Documentation download: Hyper-Extended Oracle Performance Monitor3.doc
  4. Update August 20, 2015: If you have an old computer, you may need to put the MSCOMCTL.OCX file in your C:\Windows\SysWOW64 folder (on 64 bit Windows, or C:\Windows\System32 on 32 bit Windows).  You will then need to register (process) the file with REGSVR32 – see the command in the instructions below.  A recent version of MSCOMCTL.OCX may be downloaded here: MSCOMCTL.OCX (save the file, then rename the file as MSCOMCTL.OCX).

Requirements:

Windows 2000 through Windows 10, 32 bit Oracle Client, MDAC/ADO 2.8 (or greater – preinstalled starting with Windows XP), connection to the Oracle database by database SID (tnsnames.ora) using Oracle’s Oracle Provider for OLE DB (Oracle’s Oracle ODBC Driver is required for connections that are initiated by SYS – primarily for access to the Advanced Init Parameter functionality), Excel 2000-2013, Microsoft Grid control (provided in the download – put into the program’s folder), TIMED_STATISTICS set to TRUE. Most features require DBA permissions (SYSADM or SYS), or specific grants to views or packages. For example, sessions that use the DBMS Xplan and Trace functionality must have permission to execute ALTER SESSION, have execute permission on the DBMS_XPLAN package, and select permissions on V$SQL, V$SQL_PLAN, and V$SQL_PLAN_STATISTICS.  Configuring session tracing requires execute permission on the DBMS_SYSTEM package and/or the DBMS_MONITOR package.

Description:

The Hyper-Extended Oracle Performance Monitor provides a variety of functions to simplify working with and performance monitoring Oracle databases. Various types of simple data dictionary and database instance related reports may be generated in Excel by simply putting a check in a box on the program’s main window, and clicking the Report button. Generating DBMS Xplan output (using the raw information from a 10046 level 4/12 trace) with optional 10046/10053/other traces, creating data change logging triggers using the data dictionary, peeking at the hidden and non-hidden initialization parameters, and access to a quick Oracle keyword search are also a click away on the program’s main window. Performance monitoring and tracing tools for checking a cross-reference of the time model and wait event statistics (at the system and session levels), checking a cross-reference of the system statistics and wait event statistics (with drill-down to the session level), reviewing potentially high-load SQL statements and their execution plans, cross-referencing the current session waits with enqueues, and enabling various types of Oracle traces are also a click away on the program’s main window. Permanently recording of performance monitoring statistics in a Microsoft Access compatible database (Microsoft Access is not required) is also a click away on the program’s main window.

General Command Line Parameters:

  • -D   The Database instance SID to which the program should connect.
  • -U   The user name to be used for connecting to the database instance.
  • -P   The password to be used for connecting to the database instance.

Logging-Specific Command Line Parameters:

  • -LC 20   Specifies Force a Log Capture when CPU Usage Exceeds value to 20%
  • -LI 30   Specifies Force a Log Capture if No Log Captured in Minutes value to 30 minutes
  • -LB   Specifies the Force a Log Capture when a Blocking Lock is Detected value to checked
  • -LW  Specifies the Force a Log Capture when a Wait Reason is Detected value to checked
  • -LR  Specifies the Capture SQL Execution Statistics for Wait Reasons value to checked
  • -LD   Specifies the Capture Segment Change Statistics value to checked
  • -LO   Specifies the Capture Operating System and Time Model Statistics value to checked
  • -LH   Specifies the Capture High Load SQL Statement Statistics value to checked
  • -LT   Specifies the Capture High Load SQL Statement Text value to checked
  • -LP   Specifies the Capture High Load SQL Statement Plan value to checked
  • -LHC 60   Species the minimum CPU time that is considered high load to 60 seconds accum.
  • -LHE 90   Species the minimum elapsed time that is considered high load to 90 seconds accum.
  • -LS   Specifies that Smart Logging should begin as soon as the login completes
  • -LE 240   Specifies that Smart Logging should end after 240 minutes
  • -LQ   Specifies that the program should quit (end) when logging ends

Using the above command line parameters, you could create a program shortcut for each of the databases that you administer.  For example, create a shortcut, and set its Target to (assuming that the program is on the root of the F: drive, the Oracle SID is OR1122P, the username is myuser, the password for that account is pword, and the smart logging should default to logging when 10% of CPU utilization is observed):

"F:\Hyper-Extended Oracle Performance Monitor.exe" -D OR1122P -U myname -p pword -LC 10

What’s New:

The Hyper-Extended Oracle Performance Monitor runs on Windows 10, even though the touchpad on the Sony laptop sometimes stops working until the computer is put to sleep and then awoken.  The program should also work with Oracle Database 12c (and might still be compatible with Oracle Database 8i, although the Time Model Viewer will not work).

Hyper7HyperExtendedMainScreen

The Time Model Viewer main window is still essentially the same, showing the hierarchy of the time model statistics, with color-coded session-level time model statistics (color ranges from yellow to red to indicate the percentage of the total represented by the session).  The bottom of the window shows the system-wide wait events, and the sessions that contributed to those system level wait events:

Hyper7TimeModelViewer

When the Time Model Viewer is active, an Excel spreadsheet is created that shows the graphic history of the statistics for the 20 most recent time capture periods – if you find this view helpful, you might want to widen the DB Time and CPU chart:

Hyper7TimeModelViewerExcelCharts

The statistics also appear in numerical form on the Statistics tab in the Excel spreadsheet:

Hyper7TimeModelViewerExcelStatistics

The Real-Time Monitor is also essentially the same as in previous Beta versions:

Hyper7RealTimeMonitor

Like the Time Model Viewer, an Excel spreadsheet is created showing the graphical history of the statistics.  Which statistics?  Click a wait event or one of the white statistics boxes to add that statistic to the list for which charts are created:

Hyper7RealTimeMonitorExcelCharts

Previous Beta versions of the program wrote out a tab delimited file containing the statistic deltas after every 30 time capture periods (this version also creates such a file).  The Beta for version 7 writes those statistics to Excel after each time capture period to facilitate the chart creation:

Hyper7RealTimeMonitorExcelStatistics

Oracle’s OLEDB provider is used for connectivity, see this article for a description of how to fix the bug in the Oracle Database 11.2.0.3 Client installer that is related to the OLEDB provider (the problem may have been corrected in later releases).  I have not yet found a method for SYS to login using Oracle’s OLEDB provider when “AS SYSDBA” must be specified for the login to complete.  As such, Oracle’s ODBC driver is used when SYS tries to login and the login attempt using the OLEDB provider fails.  Oracle’s ODBC driver is also used when viewing the normally hidden parameters using the SYS login.  Because there could be multiple Oracle homes, you must create a 32 bit ODBC System DSN using the Oracle ODBC driver – name that DSN HYPEREXTEND:

Hyper7Login

Hyper7ODBC1

Hyper7ODBC2

Hyper7ODBC3

The ODBC connection driver information is used to login as the SYS user (AS SYSDBA) to view the Advanced Initialization Parameters:

Hyper7AdvancedInitParameters

Hyper-Extended Oracle Performance Monitor 7.0 Beta is also more compliant with restrictions placed on the computer by User Account Control (User Access Control), which prohibits programs from create folders in the root of the C:\ drive and writing files into that folder.  Version 7 instead creates the OracleLog folder in the current user’s profile (typically C:\Users\username\OracleLog\), and separates the files created by database.  The picture below shows a couple of Microsoft Access compatible logging database that were created, as well as some of the tab-delimited statistic files, and various other generated files:

Hyper7LoggingFolder

As mentioned above, the documentation is still incomplete, with several pages of the documentation containing nothing but pictures.  It takes a lot of effort and time  to put the documentation together, so I will try to piece it together over the next couple of months.  You might be able to refer to the older documentation if the pictures are not self-explanatory.  If you find the program useful, please leave a note.  If you need help understanding a program window or the generated output, leave a comment attached to this blog article and I will try to help.

Hyper7b3.zip (Save with a .zip extension, not a .doc extension.)





Install the Oracle Client on a Synology DiskStation DS1813+ or DS412+ for Nagios to Monitor Databases

26 07 2013

July 26, 2013

(Back to the Previous Post in the Series) (Forward to the Next Post in the Series)

In the previous article of this series I provided steps to install and run the Nagios network monitoring utility on either a Synology DiskStation DS1813+ or a DS412+ for the purpose of pinging network devices to verify that the devices respond to ping requests.  That article may have seemed a bit out of place on a blog that is primarily intended to include notes about using Oracle Database.  In the previous article I hinted that there is a check_oracle plugin for Nagios, suggesting that a Synology DiskStation DS1813+ or DS412+ could be used to monitor Oracle databases.

This blog article does NOT use the check_oracle plugin.  There are a small handful of official Nagios plugins for Oracle Database, but this article does not use any of those plugins either.  Instead, I took a chance at crafting a custom Perl script to monitor Oracle databases, the first time I have ever used Perl.  I thought back to some of the earlier blog articles where I introduced a couple of Oracle Database monitoring scripts that were written in VBScript, including Working with Oracle’s Time Model Data 3 and Oracle Statistics Chart Viewer – I started seeing the possibilities of using Nagios to alert DBAs about potential issues.  Yet, I have never used Perl, and at the time did not even have a clue how to concatenate strings in the language (it turns out that there are more than two techniques).

Verify that Perl is installed on the DiskStation by accessing the Package Center in the DiskStation’s DSM web interface.  If there is an Install button under the Perl heading, that indicates that Perl is not yet installed – click the Install button and wait until that button changes to Installed.

InstallOracleClient-1

Let’s start by installing the Oracle Client on the DiskStation.  I will use the Oracle Instant Client version 11.2.0.3 (the 12.1.0.1 Instant Client throws a different error message than did the 11.2.0.3 and 11.1.0.7 clients, so I reverted back to the 11.2.0.3 version).  To download the Oracle Instant Client for Linux, visit this link.  You will need an OTN account to download the client files – I downloaded the files using a desktop computer, and then transferred the files to a custom created Config share that I set up on the DiskStation.  This article requires the Basic Instant Client package and the SQL*Plus add-on package for the Instant Client.

Connect to the DiskStation using Telnet as the root user (see the previous article for Telnet connection directions).  For consistency, the .zip files for the Oracle Instant Client will be copied to the downloads directory that was created in the previous article.  We will create an oracle directory in the downloads directory:

mkdir /volume1/downloads/oracle/

Next, the downloaded files for the Oracle Instant Client are copied from the config share that I created on the DiskStation (to allow transporting files from a desktop computer) to the /volume1/downloads/oracle directory that was just created, and then the Oracle Instant Client files are unzipped, which automatically creates the instantclient_11_2 directory:

cp /volume1/config/instantclient-basic-linux-11.2.0.3.0.zip /volume1/downloads/oracle/
cp /volume1/config/instantclient-sqlplus-linux-11.2.0.3.0.zip /volume1/downloads/oracle/
cd /volume1/downloads/oracle/
unzip -o instantclient-basic-linux-11.2.0.3.0.zip
unzip -o instantclient-sqlplus-linux-11.2.0.3.0.zip
cd instantclient_11_2

The Oracle Instant Client 11.2.0.3 (and 11.1.0.7) require a file named libaio.so.1 that is not installed by default on the DiskStation.  After searching the Internet for a while, I found a copy of that file in the /lib directory on a Red Hat Enterprise Linux 3 server.  Using a desktop computer, I copied that file to the DiskStation’s config share, as well as a copy of the tnsnames.ora file that includes the database connection information for the databases that must be monitored.  The libaio.so.1 file will be placed in the /volume1/downloads/oracle/instantclient_11_2 directory, and the tnsnames.ora file will be placed in the /etc directory (one of the directories that is automatically searched).

cp /volume1/config/libaio.so.1 .
cp /volume1/config/tnsnames.ora /etc

Let’s start up SQL*Plus:

./sqlplus /nolog
./sqlplus: error while loading shared libraries: libsqlplus.so: cannot open shared object file: No such file or directory

An error…

ls
BASIC_README    adrci   glogin.sql   libclntsh.so.11.1  libocci.so.11.1  libocijdbc11.so  libsqlplusic.so  ojdbc6.jar  sqltest.sql  xstreams.jar
SQLPLUS_README  genezi  libaio.so.1  libnnz11.so        libociei.so      libsqlplus.so    ojdbc5.jar       sqlplus     uidrvci

The file mentioned in the error message exists in the correct directory.  That error is caused by an undefined environment variable.  Fixing the error and trying again:

LD_LIBRARY_PATH="/volume1/downloads/oracle/instantclient_11_2"
export LD_LIBRARY_PATH
./sqlplus /nolog

SQL*Plus: Release 11.2.0.3.0 Production on Fri Jul 26 13:22:16 2013

Copyright (c) 1982, 2011, Oracle.  All rights reserved.

SQL>

Try to connect to one of the databases that are defined in the tnsnames.ora file, and then try executing a simple SQL statement before exiting SQL*Plus (change testuser, password, and DBName as appropriate for your environment):

CONNECT testuser/password@DBName
Connected.
SELECT SYSDATE FROM DUAL;

SYSDATE
---------
26-JUL-13

EXIT

Now that we have verified that the Instant Client version of SQL*Plus works from the DiskStation, we will create a simple SQL script to verify that the Instant Client version of SQL*Plus is able to accept script names from a command line.  The script will be created in the same directory where the Oracle Instant Client is located (see the basic directions for using vi, as found in the previous article):

vi /volume1/downloads/oracle/instantclient_11_2/sqltest.sql

The first line in the script instructs SQL*Plus to abort the execution of the script when an error is encountered; if the connection attempt fails, there is no point in attempting to execute any SQL statements that follow.  The second line connects to the database (change testuser, password, and DBName as appropriate for your environment).  The third line executes a simple SQL statement, selecting from a table (rather than the virtual table DUAL) – change the SQL statement to a valid statement for your database.  The last statement exits SQL*Plus once the SQL statement finishes executing:

WHENEVER SQLERROR EXIT SQL.SQLCODE
CONNECT testuser/password@DBName
SELECT ID, DESCRIPTION FROM T1 WHERE ID LIKE '8X%';
EXIT

Try using SQL*Plus to execute the script:

./sqlplus /nolog @/volume1/downloads/oracle/instantclient_11_2/sqltest.sql

The results from the SQL statement should display in the Telnet window, and then the normal DiskStation prompt should appear.  If the database instance was not running at the time of the execution, you would see something like this rather than the results of the SQL statement:

ERROR:
ORA-01034: ORACLE not available
ORA-27101: shared memory realm does not exist
Process ID: 0
Session ID: 0 Serial number: 0

Nagios apparently expects its plugins to return one of four exit codes, as explained in this article, to determine whether or not the checked object is behaving as expected:

Exit Code Status
        0 OK
        1 WARNING
        2 CRITICAL
        3 UNKNOWN

The exit function in Perl permits returning a number, so that is one problem solved.  I want the Perl plugin to accept a command line parameter from Nagios so that I am able to check more than one database using the same Perl script – the Nagios command line parameter will specify the script to execute.  This article explains how to receive a command line parameter from Nagios, so that is a second problem that is solved.

Let’s put together a simple (OK, not simple for me) script that executes a SQL*Plus script that is specified by Nagios (the sqltest.sql file that was just created), and then returns 0 if no ORA- type error message is returned when the Oracle Instant Client’s SQL*Plus executes a script.  If an ORA- type error message is returned, then the script’s exit code is set to 2 and the ORA- errors are returned to Nagios.  Nagios apparently only accepts one line of output from the script, reading what is written by Perl’s print command.  As we saw earlier, the script may return multiple nested ORA- type errors if the database instance is down, so we need these multiple error messages to appear on one line of output from the Perl script.

vi /opt/libexec/check_oracle.pl

This is the script that I crafted after spending, quite literally, hours days searching the Internet for Perl command syntax that would execute on the DiskStation:

#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Long qw(:config no_ignore_case);
my $output = "";
my ($script);

my $result = GetOptions(
 "s|script=s"         => \$script,
  );

$ENV{"LD_LIBRARY_PATH"} = "/volume1/downloads/oracle/instantclient_11_2/";

my $ExitValue = 0;
print "Test Script: " . $script . ":  ";
open my $in, "/volume1/downloads/oracle/instantclient_11_2/sqlplus /nolog @/volume1/downloads/oracle/instantclient_11_2/$script |";

while (my $line = <$in>)
  {
  if ($line =~ /^(ORA-\d{5})/)
    {
    #$line =~ s/\v//g;
    chomp($line);
    $output = $output . $line . " ; ";
    $ExitValue = 2;
    }
  }

close($in);
print $output;
exit($ExitValue);

The script defines the LD_LIBRARY_PATH environment variable, so the Nagios user should (hopefully) have no problems executing the Perl script.  There are apparently a half-dozen different ways to remove the end of line characters from the SQL*Plus output, chomp($line); worked for my test, but $line =~ s/\v//g; did not work quite as expected.

Save the script and exit vi.  Next, we need to make the Perl script executable:

chmod +x /opt/libexec/check_oracle.pl

In the earlier article we had to modify a couple of the Nagios configuration files – this time we will need to modify some of the same files, starting with the commands.cfg file.

vi /opt/etc/objects/commands.cfg

Locate the ‘check_ping’ command definition that we modified in the earlier article.  Below the } character for that command definition, add the following, which will tell Nagios that our Perl script exists, and that the script command should be recognized by Nagios as check_oracle_ch:

# check_oracle_ch command definition by Charles Hooper
define command{
        command_name    check_oracle_ch
        command_line    $USER1$/check_oracle.pl -s $ARG1$
        }

Save the commands.cfg file and exit vi.  Let’s create another Nagios configuration file that will be used to list the Oracle Databases (and their scripts) that will be checked:

vi /opt/etc/objects/oracle.cfg

On a new line in the oracle.cfg file, add the following:

define service{
        use                     generic-service ; Inherit values from a template
        host_name               server123       ; This is a server name that is defined in the server.cfg file
        service_description     CHECK_DB_TEST   ; A unique name given to the server, database, and script combination
        check_command           check_oracle_ch!sqltest.sql    ; The command definition that was added to the commands.cfg file followed by ! and the name of the SQL script to execute
        normal_check_interval   5               ; Execute the script every 5 minutes under normal conditions
        retry_check_interval    1               ; Re-execute the script after receiving a return code of 2, every minute until its final/hard state is determined
        }

Additionally, we need to instruct Nagios that there is now an additional configuration file to read when starting.  Save the oracle.cfg file and exit vi.  Execute the following command:

vi /opt/etc/nagios.cfg

Below the cfg_file=/opt/etc/objects/switch.cfg line in the file (added in the previous article), add the following line:

cfg_file=/opt/etc/objects/oracle.cfg

Save the nagios.cfg file and exit vi.  Since we have modified the Nagios configuration, we must restart Nagios.  Find the first Nagios process ID, then kill that process (as was described in the previous article):

ps

kill 2683

Verify that Nagios does not return an error message when reading the configuration files:

/opt/bin/nagios -v /opt/etc/nagios.cfg

If no errors are reported, then start Nagios:

/opt/bin/nagios -d /opt/etc/nagios.cfg

Nagios should be writing log entries to a file name nagios.log (as defined in the nagios.cfg file).  The tail command permits viewing the last few (100 in this case) lines from a specified file:

tail -n 100 /opt/var/nagios.log

If Perl is not installed, Nagios may send an email with the following in the Additional Info section:

(Return code of 127 is out of bounds - plugin may be missing)

If the database instance is down, Nagios should send an email, possibly with the following in the Additional Info section (note that the script name is listed first – a different script will be used to test each database):

Test Script: sqltest.sql:  ORA-01034: ORACLE not available : ORA-27101: shared memory realm does not exist :

When the database instance is running, and the SQL statement(s) in the script executes successfully for the first time, Nagios should send an email with just the test script listed in the Additional Info section:

Test Script: sqltest.sql:

Now that the we have confirmed that Nagios and the Perl script work OK together, create additional SQL scripts in the /volume1/downloads/oracle/instantclient_11_2/ directory to connect to the other databases to be monitored (strongly consider changing the permissions for these files, since the files include usernames and passwords for your databases).  For each of those SQL scripts, create an additional service entry in the oracle.cfg file.  For example, if a script named sqltest42.sql is created to test a database on server42, the service entry might be created as shown below:

define service{
        use                     generic-service ; Inherit values from a template
        host_name               server42        ; This is a server name that is defined in the server.cfg file
        service_description     CHECK_DB_TEST42 ; A unique name given to the server, database, and script combination
        check_command           check_oracle_ch!sqltest42.sql  ; The command definition that was added to the commands.cfg file followed by ! and the name of the SQL script to execute
        normal_check_interval   5               ; Execute the script every 5 minutes under normal conditions
        retry_check_interval    1               ; Re-execute the script after receiving a return code of 2, every minute until its final/hard state is determined
        }

Don’t forget to verify the nagios configuration before restarting Nagios.

Anyone up to the challenge of rewriting my Working with Oracle’s Time Model Data 3 script in Perl so that the script will work on the DiskStation?





Install Nagios on a Synology DiskStation DS1813+ or DS412+

19 07 2013

July 19, 2013 (Modified July 27, 2013, July 28, 2013, November 19, 2013, August 12, 2015)

(Forward to the Next Post in the Series)

—-

Update July 27, 2013:

  • I now have Nagios running on an ARM based Synology DiskStation DS212+.  Most of the steps are the same as outlined below, however there are a few additional errors that must be addressed (see below additional steps).
  • All of the ./configure commands should have included –prefix=/opt (rather than –prefix=/usr/local or completely omitting that parameter).  That change eliminates the need to copy the Nagios plugins to the correct location.  Possibly related, the -i parameter was unnecessary for the snmp and Nagios plugins make and make install commands when the ./configure command included the –prefix=/opt prefix.
  • The wget http://sourceforge.net/projects/dsgpl/files/DSM%204.1%20Tool%20Chains/Intel%20×86%20Linux%203.2.11&#8230; download step for the gcc compiler is apparently unnecessary, at least on the Synology DiskStation DS212+ (see below).

—-

This article describes how to compile and run Nagios on a Synology DiskStation DS1813+ (64 bit) or Synology DiskStation DS412+ (32 bit, the 32 bit steps should also apply to the DS1812+) NAS, both of which utilize Intel Atom processors (cat /proc/cpuinfo indicates that the DS412+ is using a 2.13GHz Atom D2700, while the DS1813+ is using a 2.13GHz Atom D2701), and utilize the DSM 4.2 operating system.  Not all Synology DiskStation NAS devices use Intel based CPUs – some of the less expensive DiskStations use ARM type processors (see this link to determine the type of CPU installed in a specific DiskStation).  It may be possible to produce a working version of Nagios on NAS devices that do not have Intel 32 bit or 64 bit processors, but I have not yet fully tested the procedure.

Warning: A lot of what follows is based on experimentation, with the end goal of having Nagios running on a Synology DiskStation having the ability to ping devices on the network or the Internet, with an email sent to an administrator when a device stops responding to ping requests, and to send a second email when the device resumes responding to ping requests.  This functionality represents a small fraction of Nagios’ capabilities through the use of plugins.  File paths vary from one Linux distribution to the next, so that adds a bit of challenge to make certain that the files are placed in the required directory.  Copying a file to the wrong directory may temporarily disable the DiskStation and require the reinstallation of the Synology DSM operating system.  The directions below are not final, and quite likely do not represent the most efficient approaches to accomplish the end goal – but the directions will hopefully be “close enough to correct” to allow the average reader of this blog to ping and send email alerts from a DiskStation.

I have relied on the free Nagios network monitoring solution since 2002 to provide an early warning of problems associated with network attached equipment including servers, production floor computers, switches, printers, wireless access points, IP cameras, Internet connection stability, etc.  While I rely on Nagios’ alerting system, I am not an expert at configuring the Nagios network monitoring system; the Nagios configuration documentation may be downloaded here.

First, make certain that the Telnet Service (or SSH Service if that is preferred) is enabled on the DiskStation.  In the DiskStation’s Control Panel, click Terminal.

InstallNagiosDiskStation1

Place a checkmark next to Enable Telnet service (if the item is not already checked), and then click the Apply button.

InstallNagiosDiskStation2

Verify that the computer that you intend to use has a Telnet client.  For Windows 7, access the Programs link in the Control Panel, and then click the Turn Windows features on or off link.  Make certain that there is a checkmark next to Telnet Client, then click the OK button.

InstallNagiosDiskStation3

Open a command line (in Windows, Start – Run – type  cmd  and press the Enter key).  On the command line, type telnet followed by either the name of the DiskStation or the IP address of the DiskStation, then press the Enter key.  When prompted for a username, type root and press the Enter key.  Type the admin user’s password (that is used to access the DSM interface in a web browser) and press the Enter key.

InstallNagiosDiskStation4

The command line on the DiskStation is very similar to the command line on a Unix or Linux computer, and is somewhat similar to a Windows command line or MS-DOS command line (use / rather than \, use ls rather than dir, use vi rather than edit):

InstallNagiosDiskStation5

We first need to add ipkg support to the DiskStation, detailed directions may be viewed at this link.  The exact directions may be different for other DiskStation models, but the following directions work for both the DS1813+ and DS412+ (note that all files downloaded from the Internet will be placed on volume1 in the downloads directory – copy and paste the lines to the Telnet session, one line at a time):

cd /volume1
mkdir downloads
cd downloads
wget http://ipkg.nslu2-linux.org/feeds/optware/syno-i686/cross/unstable/syno-i686-bootstrap_1.2-7_i686.xsh
chmod +x syno-i686-bootstrap_1.2-7_i686.xsh
sh syno-i686-bootstrap_1.2-7_i686.xsh

The vi editor is used on the DiskStation to modify files; that vi editor is a bit challenging to use at first sight, so you may need help with a couple of basic commands (see this quick reference for other commands).  The commands in vi are case sensitive (i is not the same as I).  When a file is opened, press the i key on the keyboard to allow making changes to the file (such as typing commands, or deleting commands).  When finished making changes to the file press the Esc key.  Once the Esc key is pressed, type ZZ to save the changed file and quit, or :q! to quit without saving the changes.

Next, we must modify the file that establishes the environment for the root user, when that user connects to the DiskStation.  This change is needed as part of the ipkg installation.  Edit the .profile file used by the root user:

vi /root/.profile

Add a # character in front of the two lines that contain the word PATH, then save the file (see the brief directions above to switch between command and insert mode in vi):

InstallNagiosDiskStation6

Next, reboot the DiskStation by clicking the Restart button in the Synology DSM interface (note: it should be possible to type reboot in the Telnet interface, however the DiskStation locked up the one time I attempted to execute that command).

InstallNagiosDiskStation7

Once the DiskStation reboots, reconnect to the DiskStation using Telnet, connecting as the root user, just as was done earlier.

The ipkg command should now work on the command line.  First, request that an updated list of available packages is downloaded, then display that list of packages:

ipkg update
ipkg list

Next, download a couple of packages that will be used by the Nagios network monitoring tool.  Note that using ipkg to install packages is a lot easier than compiling source code, so have fun with the ipkg utility.  When installing the optware-devel package, an error may appear stating that there is an incompatibility between wget and wget-ssl – just ignore that error for now.

ipkg update wget-ssl
ipkg install optware-devel
ipkg install gcc
ipkg install libtool
ipkg install mysql

Next, we need to compile a file and copy a couple of files:

cd /opt/share/libtool/libltdl/
./configure --prefix=/opt
make all
make install

cp /usr/syno/apache/modules/mod_ext_filter.so /opt/libexec/mod_ext_filter.so
cp /usr/syno/apache/modules/*.* /opt/libexec/

Now, install the Apache package:

ipkg install apache

If an error message is displayed on screen about mod_ext_filter.so, then modify the /opt/etc/apache2/httpd.conf file and add a # in front of the line LoadModule ext_filter_module libexec/mod_ext_filter.so and save the file.  Re-execute the ipkg install apache command (note that the up arrow on the keyboard may be pressed to quickly retype one of the previously executed commands).

InstallNagiosDiskStation8

Using the DiskStation’s Control Panel, create a nagios group and a nagcmd group (the nagcmd group probably will not be used for anything specific).  These groups do not require any special DiskStation permissions.

InstallNagiosDiskStation9

Using the DiskStation’s Control Panel, create a nagios user and add that user to the nagios and nagcmd groups.  The nagios user does not require any specific DiskStation permissions.

Next, switch back to the Telnet session, download the Nagios source code, and compile the source code:

DiskStation DS212+ Notes:

The following ./configure call was used on the DS212+:

./configure --prefix=/opt --with-command-group=nagios --disable-nanosleep --enable-nanosleep=no

The ./configure aborted with the following error message:

checking for pthread_create in -lpthread... no
checking for pthread_mutex_init in -lpthread... no
checking for pthread_create in -lpthreads... no
checking for pthread_create in -llthread... no
checking if we need -pthread for threads... no
checking for library containing nanosleep... no
Error: nanosleep() needed for timing operations.

The test that threw the error is located roughly 63% of the way through the configure file (on roughly line 5635).  If the exit 1 line in the configure file is commented out, then the configure step will complete.  However, the make all command will then fail with the following error messages:

/volume1/downloads/nagios/base/nebmods.c:363: undefined reference to `dlclose'
nebmods.o: In function `neb_load_module':
/volume1/downloads/nagios/base/nebmods.c:218: undefined reference to `dlopen'
/volume1/downloads/nagios/base/nebmods.c:249: undefined reference to `dlsym'
/volume1/downloads/nagios/base/nebmods.c:266: undefined reference to `dlsym'
/volume1/downloads/nagios/base/nebmods.c:299: undefined reference to `dlsym'
/volume1/downloads/nagios/base/nebmods.c:225: undefined reference to `dlerror'
/opt/lib/gcc/arm-none-linux-gnueabi/4.2.3/../../../../arm-none-linux-gnueabi/lib/libpthread.so: undefined reference to `__default_sa_restorer_v2@GLIBC_PRIVATE'
/opt/lib/gcc/arm-none-linux-gnueabi/4.2.3/../../../../arm-none-linux-gnueabi/lib/libpthread.so: undefined reference to `__default_rt_sa_restorer_v2@GLIBC_PRIVAT
E'
/opt/lib/gcc/arm-none-linux-gnueabi/4.2.3/../../../../arm-none-linux-gnueabi/lib/libpthread.so: undefined reference to `__default_rt_sa_restorer_v1@GLIBC_PRIVAT
E'
/opt/lib/gcc/arm-none-linux-gnueabi/4.2.3/../../../../arm-none-linux-gnueabi/lib/libpthread.so: undefined reference to `__default_sa_restorer_v1@GLIBC_PRIVATE'
collect2: ld returned 1 exit status
make[1]: *** [nagios] Error 1
make[1]: Leaving directory `/volume1/downloads/nagios/base'
make: *** [all] Error 2

After a bit of searching on the Internet, I found a page that suggested making the following changes (note that I unsuccessfully tried a couple of other steps that may have also partially corrected the issue):

mkdir /opt/arm-none-linux-gnueabi/lib_disabled
mv /opt/arm-none-linux-gnueabi/lib/libpthread* /opt/arm-none-linux-gnueabi/lib_disabled

cp /lib/libpthread.so.0 /opt/arm-none-linux-gnueabi/lib/
cd /opt/arm-none-linux-gnueabi/lib/
ln -s libpthread.so.0 libpthread.so
ln -s libpthread.so.0 libpthread-2.5.so

After making the above changes, I was able to run the configure and make all commands without receiving an error.

cd /volume1/downloads
wget http://prdownloads.sourceforge.net/sourceforge/nagios/nagios-3.5.0.tar.gz
tar xzf nagios-3.5.0.tar.gz
cd nagios
./configure --prefix=/opt --with-command-group=nagios
make all
make install
make install-init
make install-config
make install-commandmode

We apparently need to copy a couple of files to different locations at this point:

cp /opt/lib/libltdl.so.3 /opt/local/lib/libltdl.so.3
cp /opt/lib/libltdl.so.3 /usr/lib/libltdl.so.3
cp /opt/lib/libltdl.so /usr/lib/

Undo the changes that were earlier made to the /root/.profile file, where # characters were added in front of any line that contained the word PATH.  Remove those # characters and save the file:

vi /root/.profile

(This part still needs some fine tuning to make the web interface work with Nagios.)  Edit the Nagios Makefile and change the line beginning with HTTPD_CONF to show HTTPD_CONF=/opt/etc/apache2/conf.d  Then save the file.

cd /volume1/downloads/nagios
vi Makefile

InstallNagiosDiskStation10

Execute the following command:

make install-webconf

Create a nagiosadmin user for the web administration, specify a password when prompted:

htpasswd -c /usr/local/etc/htpasswd.users nagiosadmin

Update November 19, 2013:

GabrielM reported in a comment below that it may be necessary to specify the full path to the htpasswd program:

/usr/syno/apache/bin/htpasswd -c /usr/local/etc/htpasswd.users nagiosadmin

Install a couple of additional ipkg packages that will be used by Nagios (the last package adds a ping utility that may be used by Nagios – the security permissions on the DiskStation prevent non-root users from using the built-in ping utility):

ipkg install openssl
ipkg install openssl-dev
ipkg install sendmail
ipkg install inetutils

A step that may or may not be required is to download a functioning C++ compiler (some of the commands below point to files provided with the C++ compiler) – it appears that there should already be a compiler on the DiskStation at this point (in /opt/bin), so the successful completion of this task of downloading a usable C++ compiler might not be required.

DiskStation DS212+ Notes:

These wget and tar steps were completely skipped on the DS212+

For the DiskStation DS1813+ 64 bit:

cd /volume1/downloads
wget http://sourceforge.net/projects/dsgpl/files/DSM%204.1%20Tool%20Chains/Intel%20x86%20Linux%203.2.11%20%28Cedarview%29/gcc420_glibc236_x64_cedarview-GPL.tgz
tar zxpf gcc420_glibc236_x64_cedarview-GPL.tgz -C /usr/local/

For the DiskStation DS412+ 32 bit:

cd /volume1/downloads
wget http://sourceforge.net/projects/dsgpl/files/DSM%204.2%20Tool%20Chains/Intel%20x86%20Linux%203.2.11%20%28Bromolow%29/gcc421_glibc236_x86_bromolow-GPL.tgz
tar zxpf gcc421_glibc236_x86_bromolow-GPL.tgz -C /usr/local/

Now the net-snmp source code is downloaded and extracted:

DiskStation DS212+ Notes:

The ./configure call on the DS212 (might also work on the other DiskStation models):

./configure –prefix=/opt

The make call threw several errors, including:

/bin/sh: arm-none-linux-gnueabi-ld: not found
make[2]: *** [../blib/arch/auto/NetSNMP/default_store/default_store.so] Error 127

Before running the make command on the DS212+, execute the following command:

ln -s /opt/bin/ld /opt/bin/arm-none-linux-gnueabi-ld

The -i parameter may be omitted when running the make and make install commands.

cd /volume1/downloads
wget http://sourceforge.net/projects/net-snmp/files/net-snmp/5.7.2/net-snmp-5.7.2.tar.gz
tar xzf net-snmp-5.7.2.tar.gz
cd net-snmp-5.7.2

For the DiskStation DS1813+ 64 bit, execute the following to compile the net-snmp source (note that this command uses the compiler that was downloaded):

env CC=/usr/local/x86_64-linux-gnu/bin/x86_64-linux-gnu-gcc \
LD=/usr/local/x86_64-linux-gnu/bin/x86_64-linux-gnu-ld \
RANLIB=/usr/local/x86_64-linux-gnu/bin/x86_64-linux-gnu-ranlib \
CFLAGS="-I/usr/local/x86_64-linux-gnu/include" \
LDFLAGS="-L/usr/local/x86_64-linux-gnu/lib" \
./configure --host=x86_64-linux-gnu --target=x86_64-linux-gnu --build=x86_64-pc-linux --prefix=/usr/local

For the DiskStation DS412+ 32 bit, execute the following to compile the net-snmp source (note: I could not use any of the different compilers that I tried downloading due to the compilers crashing with one of two error messages, so this command uses the compiler in /opt/bin):

env CC=/opt/bin/i686-linux-gnu-gcc \
LD=/usr/local/i686-linux-gnu/bin/i686-linux-gnu-ld \
RANLIB=/usr/local/i686-linux-gnu/bin/i686-linux-gnu-ranlib \
CFLAGS="-I/usr/local/i686-linux-gnu/include" \
LDFLAGS="-L/usr/local/i686-linux-gnu/lib" \
./configure --host=i686-linux-gnu --target=i686-linux-gnu --build=i686-linux-gnu --prefix=/usr/local

Several prompts will appear on the screen when either of the two commands is executed.  I entered the following for the prompts:

Default version of SNMP to use (3): 3
System Contact Information: (Enter)
System Location (Unknown): (Enter)
Location to write logfile (/var/log/snmpd.log): /opt/var/snmpd.log
Location to write persistent information (/var/net-snmp): (Enter)

Two additional commands to execute:

make -i
make install -i

Now we need to download the source code for the Nagios plugins (check_apt, check_breeze, check_by_ssh, check_clamd, check_cluster, check_dhcp, check_disk, check_disk_smb, check_dns, check_dummy, check_file_age, check_flexlm, check_ftp, check_http, check_icmp, check_ide_smart, check_ifoperstatup, check_ifstatus, check_imap, check_ircd, check_jabber, check_ldap, check_ldaps, check_load, check_log, check_mailq, check_mrtg, check_mrtgtraf, check_mysql, check_mysql_query, check_nagios, check_nntp, check_nntps, check_nt, check_ntp, check_ntp_peer, check_ntp_time, check_nwstat, check_oracle, check_overcr, check_ping, check_pop, check_procs, check_real, check_rpc, check_sensors, check_simap, check_smtp, check_snmp, check_spop, check_ssh, check_ssmtp, check_swap, check_tcp, check_time, check_udp, check_ups, check_users, check_wave) that allow Nagios to perform various monitoring tasks:

cd /volume1/downloads
wget http://prdownloads.sourceforge.net/sourceforge/nagiosplug/nagios-plugins-1.4.16.tar.gz
tar xzf nagios-plugins-1.4.16.tar.gz
cd nagios-plugins-1.4.16/

Update November 19, 2013:

GabrielM reported in a comment below that the occasionally changing “current version” of the Nagios plugins makes it difficult to download the plugins from the source shown above.  If you open the http://prdownloads.sourceforge.net/sourceforge/nagiosplug/ web page in a web browser, the web browser will be redirected to http://sourceforge.net/projects/nagiosplug/files/ which contains the following statement:

“The Nagios Plugins are no longer distributed via SourceForge. For downloads and other information, please visit: https://www.nagios-plugins.org/
Source: README.md, updated 2013-10-01″

If you follow that link and then click the Download heading at the top of the page, there should be a link on the page that allows access to the current version of the Nagios plugins.  That link is currently: https://www.nagios-plugins.org/download/nagios-plugins-1.5.tar.gz

The command that GabrielM provided should work:

wget https://www.nagios-plugins.org/download/nagios-plugins-1.5.tar.gz

DiskStation DS212+ Notes:

The following configure, make, and make install commands were used:

./configure --prefix=/opt --with-openssl=/usr/syno/bin/openssl --with-nagios-user=nagios --with-nagios-group=nagios --with-ping-command="/opt/bin/ping -c %d %s" --psdir=/bin --with-ps-varlist="&procpid,&procppid,&procvsz,&procrss,procprog,&pos" --with-ps-cols=6 --with-ps-format="%d %d %d %d %s %n" --with-ps-command="/bin/ps -w"
make
make install

For the DiskStation DS1813+ 64 bit:

./configure --with-openssl=/usr/syno/bin/openssl --with-nagios-user=nagios --with-nagios-group=nagios --with-ping-command="/opt/bin/ping -c %d %s" --psdir=/bin --with-ps-varlist="&procpid,&procppid,&procvsz,&procrss,procprog,&pos" --with-ps-cols=6 --with-ps-format="%d %d %d %d %s %n" --with-ps-command="/bin/ps -w" --host=x86_64-linux-gnu --target=x86_64-linux-gnu --build=x86_64-pc-linux
make -i 
make install -i

For the DiskStation DS412+ 32 bit:

./configure --with-openssl=/usr/syno/bin/openssl --with-nagios-user=nagios --with-nagios-group=nagios --with-ping-command="/opt/bin/ping -c %d %s" --psdir=/bin --with-ps-varlist="&procpid,&procppid,&procvsz,&procrss,procprog,&pos" --with-ps-cols=6 --with-ps-format="%d %d %d %d %s %n" --with-ps-command="/bin/ps -w" --host=i686-linux-gnu --target=i686-linux-gnu --build=i686-linux-gnu --prefix=/usr/local
make -i 
make install -i

Copy the Nagios plugins to the location expected by Nagios:

DiskStation DS212+ Notes:

The plugins were installed in the correct location on the DS212+

cp /usr/local/nagios/libexec/*.* /opt/libexec
cp /usr/local/nagios/libexec/* /opt/libexec
cp /usr/local/libexec/check_* /opt/libexec

Update November 19, 2013:

GabrielM reported in a comment below that the third command above may fail.  Depending on the compile options used, the first two commands or the third command may fail.  The first two commands are intended to accomplish the same task as the third command; the first two commands or the last command are expected to fail, but all three commands should not fail.  I should have explained this potential area of concern better.

Copy the Nagios startup script to the correct location so that Nagios will automatically start when the DiskStation is rebooted:

cp /usr/local/etc/rc.d/nagios /opt/etc/init.d/S81nagios

Verify that the ownership of the nagios directory is set correctly:

DiskStation DS212+ Notes:

The file is actually in the /opt/bin directory, so use this command instead:

chown nagios:nagios /opt/bin/nagios/nagios -R
chown nagios:nagios /usr/local/nagios -R

In addition to the main /opt/etc/nagios.cfg Nagios file, there are several other configuration files that are potentially used by Nagios (defined in the nagios.cfg file):

/opt/etc/objects/commands.cfg
/opt/etc/objects/contacts.cfg
/opt/etc/objects/timeperiods.cfg
/opt/etc/objects/templates.cfg
/opt/etc/objects/localhost.cfg
/opt/etc/objects/windows.cfg
/opt/etc/objects/server.cfg
/opt/etc/objects/switch.cfg
/opt/etc/objects/printer.cfg

We need to make a couple of adjustments in the  /opt/etc/objects/commands.cfg file.

vi /opt/etc/objects/commands.cfg

Change the ‘notify-host-by-email’ command definition section as follows:

define command{
    command_name notify-host-by-email
    command_line /usr/bin/printf "%b" "Subject: $NOTIFICATIONTYPE$ Host Alert: $HOSTNAME$ is $HOSTSTATE$\n\n***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\nHost: $HOSTNAME$\nState: $HOSTSTATE$\nAddress: $HOSTADDRESS$\nInfo: $HOSTOUTPUT$\n\nDate/Time: $LONGDATETIME$\n" | /opt/sbin/sendmail -vt $CONTACTEMAIL$
    }

Change the ‘notify-service-by-email’ command definition section as follows:

define command{
    command_name notify-service-by-email
    command_line /usr/bin/printf "%b" "Subject: $NOTIFICATIONTYPE$ Service Alert: $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$\n\n***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$\n" | /opt/sbin/sendmail -vt $CONTACTEMAIL$
    }

Change the ‘check_ping’ command definition section as follows (feel free to read the documentation for check_ping and specify different values):

define command{
        command_name    check_ping
        command_line    $USER1$/check_ping -H $HOSTADDRESS$ -w 3000,25% -c 5000,90% -p 3 
        }

Save the file and exit vi.

At this point, the Nagios network monitoring utility will likely experience an error similar to the following when attempting to send an alert email:

output=collect: Cannot write ./dfr6BFFPC7027203 (bfcommit, uid=1026, gid=25): Permission denied

Execute the following commands, which should fix the above problem:

chmod g+w /opt/var/spool/clientmqueue
chmod 444 /opt/etc/mail/*.cf
chmod 7555 /opt/sbin/sendmail

We will need to use su to test the execution of various commands as the nagios user.  Without this fix (described here), you might see the following error message:

su: warning: cannot change directory to /var/services/homes/nagios: No such file or directory su: /sbin/nologin: No such file or directory

Enter the following commands:

mkdir /var/services/homes
mkdir /var/services/homes/nagios
chown nagios:nagios /var/services/homes/nagios -R
vi /etc/passwd

Locate the line in the passwd file for the Nagios user.  Near the end of the line, /sbin/nologin should appear.  Replace that text with /bin/ash then save and exit vi.

Verify that the Nagios user is able to execute the check_ping plugin.  Replace MyDeviceHere with either an IP address or a network device name that is on your network:

su - nagios -c "/opt/libexec/check_ping -H MyDeviceHere -w 5000,80% -c 5000,80% -p 5"

If the ping command (called by check_ping) is not able to resolve a network device name, and the fully qualified dns name was not specified (MyDeviceHere.MyDomainHere.com), edit the /etc/resolv.conf file:

vi /etc/resolv.conf

On a new line in the file, add the following line (replacing MyDomainHere.com with your dns domain name for the network):

search MyDomainHere.com

Verify that sendmail works for the Nagios user.  At the prompt that appears, type a short message, press the Enter key, type a period, then press the Enter key again – replace MyEmailAddressHere@MyDomainHere.com with your email address):

su - nagios -c "/opt/sbin/sendmail -vt MyEmailAddressHere@MyDomainHere.com"

—-

It is important to always verify the Nagios configuration before starting (or restarting after a configuration change) Nagios.  To verify the configuration type the following:

/opt/bin/nagios -v /opt/etc/nagios.cfg

To start up Nagios as a background task (daemon), execute the following:

/opt/bin/nagios -d /opt/etc/nagios.cfg

To stop Nagios that is executing as a background task, type (or better yet ps | grep ‘nagios’   ):

ps

InstallNagiosDiskStation11

Then search though the list of processes for the first line that shows /opt/bin/nagios -d /opt/etc/nagios.cfg.  The number at the left of that line, 31152 in this case, is used to stop Nagios.  To stop Nagios, type the following (replace 31152 with the number shown on your screen):

kill 31152

Side note: I tried installing quite a few different C++ compilers that supposedly work with the Synology DSM (see here).  As such, I had to find a way to remove a directory, that directory’s subdirectories, and files.  The following command will completely remove the /usr/local/i686-linux-gnu directory, should the need arise:

rm -rf /usr/local/i686-linux-gnu

At this point, Nagios will hopefully run as a background task, and it should be able to ping and send email alerts.  However, if you were following the above directions, we have not yet instructed Nagios which devices to monitor, and to whom the alert emails should be sent.  The next step is to define the email contacts by modifying the /opt/etc/objects/contacts.cfg file (see the documentation for assistance):

vi /opt/etc/objects/contacts.cfg

After setting up the contacts, we should probably tell Nagios which devices to monitor.  If there are a lot of devices on your network to be monitored, you might find that using Microsoft Excel rather than vi to create the object definitions makes the task more manageable.  Set up a simple worksheet with four columns.  Column A will be used to specify the short host_name for the object to be monitored.  Column B will be used to specify the alias (long description for the object).  Column C will be used to either specify the IP address for the device or the network name for the device.  Column D will be used to identify the group to which the object belongs and the file name to which the definition is saved (the Excel macro supports the following groups: ap, camera, computer, external, other, printer, server, switch).

InstallNagiosDiskStation13

The Excel macro is set up to read a tab delimited file, rather than reading the object description directly from the Excel worksheet.  Highlight all of the rows in the worksheet except for the top header row, and press Ctrl C (or edit – Copy) to copy the definitions to the Windows clipboard in tab delimited format.  Start Notepad (Start – Run – Notepad), and then press Ctrl V (or edit – Paste) to paste the tab delimited object descriptions into Notepad.  The Excel macro code expects the text file to be saved as nagioshosts.txt.

The Excel macro code follows (I image that not many computers still have a second floppy drive installed, so change the B:\Hardware Documentation\Synology\ path as appropriate for your environment):

Private Sub cmdProcessText_Click()
    Dim intFileNumRead As Integer
    Dim intFileNumAP As Integer
    Dim intFileNumCamera As Integer
    Dim intFileNumComputer As Integer
    Dim intFileNumExternal As Integer
    Dim intFileNumOther As Integer
    Dim intFileNumPrinter As Integer
    Dim intFileNumServer As Integer
    Dim intFileNumSwitch As Integer
    Dim intFileNumWrite As Integer

    Dim strLine As String
    Dim strItem() As String

    intFileNumRead = FreeFile
    Open "B:\Hardware Documentation\Synology\nagioshosts.txt" For Input As #intFileNumRead

    intFileNumAP = FreeFile
    Open "B:\Hardware Documentation\Synology\ap.cfg" For Output As intFileNumAP
    Print #intFileNumAP, "###############################################################################"; Chr(10);
    Print #intFileNumAP, "# ap.cfg - lists the wireless access points to be monitored"; Chr(10);
    Print #intFileNumAP, "#"; Chr(10);
    Print #intFileNumAP, "# Last Modified: "; Now; Chr(10);
    Print #intFileNumAP, "###############################################################################"; Chr(10);
    Print #intFileNumAP, "#"; Chr(10); Chr(10);
    Print #intFileNumAP, "###############################################################################"; Chr(10);
    Print #intFileNumAP, "#"; Chr(10);
    Print #intFileNumAP, "# HOST GROUP DEFINITIONS"; Chr(10);
    Print #intFileNumAP, "#"; Chr(10);
    Print #intFileNumAP, "###############################################################################"; Chr(10);
    Print #intFileNumAP, "#"; Chr(10); Chr(10);
    Print #intFileNumAP, "define hostgroup{"; Chr(10);
    Print #intFileNumAP, "        hostgroup_name  ap                      ; The name of the hostgroup"; Chr(10);
    Print #intFileNumAP, "        alias           Local Access Points       ; Long name of the group"; Chr(10);
    Print #intFileNumAP, "        }"; Chr(10); Chr(10); Chr(10);

    intFileNumCamera = FreeFile
    Open "B:\Hardware Documentation\Synology\camera.cfg" For Output As intFileNumCamera
    Print #intFileNumCamera, "###############################################################################"; Chr(10);
    Print #intFileNumCamera, "# camera.cfg - lists the IP cameras to be monitored"; Chr(10);
    Print #intFileNumCamera, "#"; Chr(10);
    Print #intFileNumCamera, "# Last Modified: "; Now; Chr(10);
    Print #intFileNumCamera, "###############################################################################"; Chr(10);
    Print #intFileNumCamera, "#"; Chr(10); Chr(10);
    Print #intFileNumCamera, "###############################################################################"; Chr(10);
    Print #intFileNumCamera, "#"; Chr(10);
    Print #intFileNumCamera, "# HOST GROUP DEFINITIONS"; Chr(10);
    Print #intFileNumCamera, "#"; Chr(10);
    Print #intFileNumCamera, "###############################################################################"; Chr(10);
    Print #intFileNumCamera, "#"; Chr(10); Chr(10);
    Print #intFileNumCamera, "define hostgroup{"; Chr(10);
    Print #intFileNumCamera, "        hostgroup_name  camera                  ; The name of the hostgroup"; Chr(10);
    Print #intFileNumCamera, "        alias           Local IP Cameras          ; Long name of the group"; Chr(10);
    Print #intFileNumCamera, "        }"; Chr(10); Chr(10); Chr(10);

    intFileNumComputer = FreeFile
    Open "B:\Hardware Documentation\Synology\computer.cfg" For Output As intFileNumComputer
    Print #intFileNumComputer, "###############################################################################"; Chr(10);
    Print #intFileNumComputer, "# computer.cfg - lists the shop floor computers to be monitored"; Chr(10);
    Print #intFileNumComputer, "#"; Chr(10);
    Print #intFileNumComputer, "# Last Modified: "; Now; Chr(10);
    Print #intFileNumComputer, "###############################################################################"; Chr(10);
    Print #intFileNumComputer, "#"; Chr(10); Chr(10);
    Print #intFileNumComputer, "###############################################################################"; Chr(10);
    Print #intFileNumComputer, "#"; Chr(10);
    Print #intFileNumComputer, "# HOST GROUP DEFINITIONS"; Chr(10);
    Print #intFileNumComputer, "#"; Chr(10);
    Print #intFileNumComputer, "###############################################################################"; Chr(10);
    Print #intFileNumComputer, "#"; Chr(10); Chr(10);
    Print #intFileNumComputer, "define hostgroup{"; Chr(10);
    Print #intFileNumComputer, "        hostgroup_name  computer               ; The name of the hostgroup"; Chr(10);
    Print #intFileNumComputer, "        alias           Domain Computers          ; Long name of the group"; Chr(10);
    Print #intFileNumComputer, "        }"; Chr(10); Chr(10); Chr(10);

    intFileNumExternal = FreeFile
    Open "B:\Hardware Documentation\Synology\external.cfg" For Output As intFileNumExternal
    Print #intFileNumExternal, "###############################################################################"; Chr(10);
    Print #intFileNumExternal, "# external.cfg - lists the devices external to the LAN network to be monitored"; Chr(10);
    Print #intFileNumExternal, "#"; Chr(10);
    Print #intFileNumExternal, "# Last Modified: "; Now; Chr(10);
    Print #intFileNumExternal, "###############################################################################"; Chr(10);
    Print #intFileNumExternal, "#"; Chr(10); Chr(10);
    Print #intFileNumExternal, "###############################################################################"; Chr(10);
    Print #intFileNumExternal, "#"; Chr(10);
    Print #intFileNumExternal, "# HOST GROUP DEFINITIONS"; Chr(10);
    Print #intFileNumExternal, "#"; Chr(10);
    Print #intFileNumExternal, "###############################################################################"; Chr(10);
    Print #intFileNumExternal, "#"; Chr(10); Chr(10);
    Print #intFileNumExternal, "define hostgroup{"; Chr(10);
    Print #intFileNumExternal, "        hostgroup_name  external               ; The name of the hostgroup"; Chr(10);
    Print #intFileNumExternal, "        alias           Monitored devices External to the Network ; Long name of the group"; Chr(10);
    Print #intFileNumExternal, "        }"; Chr(10); Chr(10); Chr(10);

    intFileNumOther = FreeFile
    Open "B:\Hardware Documentation\Synology\other.cfg" For Output As intFileNumOther
    Print #intFileNumOther, "###############################################################################"; Chr(10);
    Print #intFileNumOther, "# other.cfg - lists the miscellaneous devices to be monitored"; Chr(10);
    Print #intFileNumOther, "#"; Chr(10);
    Print #intFileNumOther, "# Last Modified: "; Now; Chr(10);
    Print #intFileNumOther, "###############################################################################"; Chr(10);
    Print #intFileNumOther, "#"; Chr(10); Chr(10);
    Print #intFileNumOther, "###############################################################################"; Chr(10);
    Print #intFileNumOther, "#"; Chr(10);
    Print #intFileNumOther, "# HOST GROUP DEFINITIONS"; Chr(10);
    Print #intFileNumOther, "#"; Chr(10);
    Print #intFileNumOther, "###############################################################################"; Chr(10);
    Print #intFileNumOther, "#"; Chr(10); Chr(10);
    Print #intFileNumOther, "define hostgroup{"; Chr(10);
    Print #intFileNumOther, "        hostgroup_name  other                 ; The name of the hostgroup"; Chr(10);
    Print #intFileNumOther, "        alias           Miscellaneous Devices ; Long name of the group"; Chr(10);
    Print #intFileNumOther, "        }"; Chr(10); Chr(10); Chr(10);

    intFileNumPrinter = FreeFile
    Open "B:\Hardware Documentation\Synology\printer.cfg" For Output As intFileNumPrinter
    Print #intFileNumPrinter, "###############################################################################"; Chr(10);
    Print #intFileNumPrinter, "# printer.cfg - lists the printer devices to be monitored"; Chr(10);
    Print #intFileNumPrinter, "#"; Chr(10);
    Print #intFileNumPrinter, "# Last Modified: "; Now; Chr(10);
    Print #intFileNumPrinter, "###############################################################################"; Chr(10);
    Print #intFileNumPrinter, "#"; Chr(10); Chr(10);
    Print #intFileNumPrinter, "###############################################################################"; Chr(10);
    Print #intFileNumPrinter, "#"; Chr(10);
    Print #intFileNumPrinter, "# HOST GROUP DEFINITIONS"; Chr(10);
    Print #intFileNumPrinter, "#"; Chr(10);
    Print #intFileNumPrinter, "###############################################################################"; Chr(10);
    Print #intFileNumPrinter, "#"; Chr(10); Chr(10);
    Print #intFileNumPrinter, "define hostgroup{"; Chr(10);
    Print #intFileNumPrinter, "        hostgroup_name  printer               ; The name of the hostgroup"; Chr(10);
    Print #intFileNumPrinter, "        alias           Printers and Copiers  ; Long name of the group"; Chr(10);
    Print #intFileNumPrinter, "        }"; Chr(10); Chr(10); Chr(10);

    intFileNumServer = FreeFile
    Open "B:\Hardware Documentation\Synology\server.cfg" For Output As intFileNumServer
    Print #intFileNumServer, "###############################################################################"; Chr(10);
    Print #intFileNumServer, "# server.cfg - lists the servers to be monitored"; Chr(10);
    Print #intFileNumServer, "#"; Chr(10);
    Print #intFileNumServer, "# Last Modified: "; Now; Chr(10);
    Print #intFileNumServer, "###############################################################################"; Chr(10);
    Print #intFileNumServer, "#"; Chr(10); Chr(10);
    Print #intFileNumServer, "###############################################################################"; Chr(10);
    Print #intFileNumServer, "#"; Chr(10);
    Print #intFileNumServer, "# HOST GROUP DEFINITIONS"; Chr(10);
    Print #intFileNumServer, "#"; Chr(10);
    Print #intFileNumServer, "###############################################################################"; Chr(10);
    Print #intFileNumServer, "#"; Chr(10); Chr(10);
    Print #intFileNumServer, "define hostgroup{"; Chr(10);
    Print #intFileNumServer, "        hostgroup_name  server               ; The name of the hostgroup"; Chr(10);
    Print #intFileNumServer, "        alias           Server and Similar Devices ; Long name of the group"; Chr(10);
    Print #intFileNumServer, "        }"; Chr(10); Chr(10); Chr(10);

    intFileNumSwitch = FreeFile
    Open "B:\Hardware Documentation\Synology\switch.cfg" For Output As intFileNumSwitch
    Print #intFileNumSwitch, "###############################################################################"; Chr(10);
    Print #intFileNumSwitch, "# switch.cfg - lists the network equipment type devices to be monitored"; Chr(10);
    Print #intFileNumSwitch, "#"; Chr(10);
    Print #intFileNumSwitch, "# Last Modified: "; Now; Chr(10);
    Print #intFileNumSwitch, "###############################################################################"; Chr(10);
    Print #intFileNumSwitch, "#"; Chr(10); Chr(10);
    Print #intFileNumSwitch, "###############################################################################"; Chr(10);
    Print #intFileNumSwitch, "#"; Chr(10);
    Print #intFileNumSwitch, "# HOST GROUP DEFINITIONS"; Chr(10);
    Print #intFileNumSwitch, "#"; Chr(10);
    Print #intFileNumSwitch, "###############################################################################"; Chr(10);
    Print #intFileNumSwitch, "#"; Chr(10); Chr(10);
    Print #intFileNumSwitch, "define hostgroup{"; Chr(10);
    Print #intFileNumSwitch, "        hostgroup_name  switch               ; The name of the hostgroup"; Chr(10);
    Print #intFileNumSwitch, "        alias           Switche and Similar Devices ; Long name of the group"; Chr(10);
    Print #intFileNumSwitch, "        }"; Chr(10); Chr(10); Chr(10);

    Do While Not (EOF(intFileNumRead))
        Line Input #intFileNumRead, strLine
        strItem = Split(strLine, vbTab)
        'strItem(0) = host_name
        'strItem(1) = alias
        'strItem(2) = address
        'strItem(3) = hostgroups
        Select Case strItem(3)
            Case "ap"
                intFileNumWrite = intFileNumAP
            Case "camera"
                intFileNumWrite = intFileNumCamera
            Case "computer"
                intFileNumWrite = intFileNumComputer
            Case "external"
                intFileNumWrite = intFileNumExternal
            Case "other"
                intFileNumWrite = intFileNumOther
            Case "printer"
                intFileNumWrite = intFileNumPrinter
            Case "server"
                intFileNumWrite = intFileNumServer
            Case "switch"
                intFileNumWrite = intFileNumSwitch
        End Select

        Print #intFileNumWrite, "define host{"; Chr(10);
        Select Case strItem(3)
            Case "ap"
                Print #intFileNumWrite, "        use             ap              ; Inherit default values from a template"; Chr(10);
            Case "camera"
                Print #intFileNumWrite, "        use             camera          ; Inherit default values from a template"; Chr(10);
            Case "computer"
                Print #intFileNumWrite, "        use             computer        ; Inherit default values from a template"; Chr(10);
            Case "external"
                Print #intFileNumWrite, "        use             external        ; Inherit default values from a template"; Chr(10);
            Case "other"
                Print #intFileNumWrite, "        use             other           ; Inherit default values from a template"; Chr(10);
            Case "printer"
                Print #intFileNumWrite, "        use             printer         ; Inherit default values from a template"; Chr(10);
            Case "server"
                Print #intFileNumWrite, "        use             server          ; Inherit default values from a template"; Chr(10);
            Case "switch"
                Print #intFileNumWrite, "        use             switch          ; Inherit default values from a template"; Chr(10);
        End Select
        Print #intFileNumWrite, "        host_name       "; strItem(0); "         ; The name we're giving to this device"; Chr(10);
        Print #intFileNumWrite, "        alias           "; strItem(1); "         ; A longer name associated with the device"; Chr(10);
        Print #intFileNumWrite, "        address         "; strItem(2); "         ; IP address of the device"; Chr(10);
        Print #intFileNumWrite, "        hostgroups      "; strItem(3); "         ; Host groups this device is associated with"; Chr(10);
        Print #intFileNumWrite, "        }"; Chr(10); Chr(10);

        Print #intFileNumWrite, "define service{"; Chr(10);
        Print #intFileNumWrite, "        use                     generic-service ; Inherit values from a template"; Chr(10);
        Print #intFileNumWrite, "        host_name               "; strItem(0); "        ; The name of the host the service is associated with"; Chr(10);
        Print #intFileNumWrite, "        service_description     PING            ; The service description"; Chr(10);
        Print #intFileNumWrite, "        check_command           check_ping!3000,25%!5000,90%    ; The command used to monitor the service"; Chr(10);
        Print #intFileNumWrite, "        normal_check_interval   5               ; Check the service every 5 minutes under normal conditions"; Chr(10);
        Print #intFileNumWrite, "        retry_check_interval    1               ; Re-check the service every minute until its final/hard state is determined"; Chr(10);
        Print #intFileNumWrite, "        }"; Chr(10); Chr(10);
    Loop

    Close #intFileNumRead
    Close #intFileNumAP
    Close #intFileNumCamera
    Close #intFileNumComputer
    Close #intFileNumExternal
    Close #intFileNumOther
    Close #intFileNumPrinter
    Close #intFileNumServer
    Close #intFileNumSwitch
End Sub

The files that are created use Unix/Linux standard line feed end of line marker characters, rather than the Windows standard carriage return/line feed combination characters.  As such, opening the generated files using Notepad is not advised.  Copy the generated files back to the /opt/etc/objects/ path on the DiskStation (copy the files to a Shared Folder on the DiskStation, then use the cp command to copy the files from the share location to /opt/etc/objects/ – the Shared Folders are typically created as a subdirectory in the /volume1/ directory).

If you decided to use some of the non-standard Nagios group names (as I did), those non-standard group names must be defined in the /opt/etc/objects/templates.cfg file:

vi /opt/etc/objects/templates.cfg

A portion of the additional entries that I made in this file include the following:

define host{
       name                    ap      ; The name of this host template
       use                     generic-host    ; Inherit default values from the generic-host temp
       check_period            24x7            ; By default, access points are monitored round t
       check_interval          5               ; Actively check the access point every 5 minutes
       retry_interval          1               ; Schedule host check retries at 1 minute intervals
       max_check_attempts      10              ; Check each access point 10 times (max)
       check_command           check_ping      ; Default command to check if access points are "alive"
       notification_period     24x7            ; Send notification out at any time - day or night
       notification_interval   30              ; Resend notifications every 30 minutes
       notification_options    d,r             ; Only send notifications for specific host states
       contact_groups          admins          ; Notifications get sent to the admins by default
       hostgroups              ap ; Host groups that access points should be a member of
       register                0               ; DONT REGISTER THIS - ITS JUST A TEMPLATE
       }

define host{
       name                    camera  ; The name of this host template
       use                     generic-host    ; Inherit default values from the generic-host temp
       check_period            24x7            ; By default, cameras are monitored round t
       check_interval          60              ; Actively check the device every 60 minutes
       retry_interval          1               ; Schedule host check retries at 1 minute intervals
       max_check_attempts      10              ; Check each device 10 times (max)
       check_command           check_ping      ; Default command to check if device are "alive"
       notification_period     24x7            ; Send notification out at any time - day or night
       notification_interval   240             ; Resend notifications every 240 minutes
       notification_options    d,r             ; Only send notifications for specific host states
       contact_groups          admins          ; Notifications get sent to the admins by default
       hostgroups              camera ; Host groups that cameras should be a member of
       register                0               ; DONT REGISTER THIS - ITS JUST A TEMPLATE
       }

Nagios will not know that it should read the additional configuration files until it is told to do so by modifying the /opt/etc/nagios.cfg file.

vi /opt/etc/nagios.cfg

Add the following lines to the nagios.cfg file:

# Charles Hooper's object types
cfg_file=/opt/etc/objects/ap.cfg
cfg_file=/opt/etc/objects/camera.cfg
cfg_file=/opt/etc/objects/computer.cfg
cfg_file=/opt/etc/objects/external.cfg
cfg_file=/opt/etc/objects/other.cfg
cfg_file=/opt/etc/objects/printer.cfg
cfg_file=/opt/etc/objects/server.cfg
cfg_file=/opt/etc/objects/switch.cfg

We have made a large number of changes to the configuration files, so it is important to verify that there are no errors in the configuration:

/opt/bin/nagios -v /opt/etc/nagios.cfg

If no errors are found in the configuration, terminate (kill) nagios and then restart as described above.

—-

Update July 28, 2013:

When attempting to start Nagios in daemon mode (/opt/bin/nagios -d /opt/etc/nagios.cfg) I encountered a couple of problems related to permissions for the Nagios user.  The nagios process was not listed when I used the ps command.  I then tried executing the following commands:

touch /opt/var/nagios.log
chown nagios:nagios /opt/var/nagios.log

Nagios was then able to start in daemon mode, but wrote messages similar to the following in the /opt/var/nagios.log file:

[1375058364] Warning: Could not open object cache file ‘/opt/var/objects.cache’ for writing!
[1375058364] Failed to obtain lock on file /opt/var/nagios.lock: Permission denied
[1375058364] Bailing out due to errors encountered while attempting to daemonize… (PID=11451)
[1375058656] Nagios 3.5.0 starting… (PID=12936)
[1375058656] Local time is Sun Jul 28 20:44:16 EDT 2013
[1375058656] LOG VERSION: 2.0
[1375058656] Warning: Could not open object cache file ‘/opt/var/objects.cache’ for writing!
[1375058656] Failed to obtain lock on file /opt/var/nagios.lock: Permission denied
[1375058656] Bailing out due to errors encountered while attempting to daemonize… (PID=12936)
[1375060107] Error: Unable to create temp file for writing status data: Permission denied
[1375060117] Error: Unable to create temp file for writing status data: Permission denied
[1375060127] Error: Unable to create temp file for writing status data: Permission denied
[1375060137] Error: Unable to create temp file for writing status data: Permission denied
[1375060147] Error: Unable to create temp file for writing status data: Permission denied
[1375060157] Error: Unable to create temp file for writing status data: Permission denied

I tried to set the permissions for a couple of other files, only to find another long list of Permission denied messages:

touch /opt/var/objects.cache
touch /opt/var/nagios.lock
touch /opt/var/nagios.tmp
chown nagios:nagios /opt/var/objects.cache
chown nagios:nagios /opt/var/nagios.lock
chown nagios:nagios /opt/var/nagios.tmp

I then recalled that I had seen similar messages on the DiskStation DS412+.  I then tried a different approach, creating a nagios directory in the /opt/var directory, creating a couple of subdirectories in that directory, and then assigning nagios as the owner of that directory structure:

mkdir /opt/var/nagios
mkdir /opt/var/nagios/archives
mkdir /opt/var/nagios/spool
mkdir /opt/var/nagios/spool/checkresults
chown nagios:nagios /opt/var/nagios -R
vi /opt/etc/nagios.cfg

In the nagios.cfg file, I made the following changes:

log_file=/opt/var/nagios/nagios.log
status_file=/opt/var/nagios/status.dat
lock_file=/opt/var/nagios/nagios.lock
temp_file=/opt/var/nagios/nagios.tmp
log_archive_path=/opt/var/nagios/archives
check_result_path=/opt/var/nagios/spool/checkresults
state_retention_file=/opt/var/nagios/retention.dat
debug_file=/opt/var/nagios/nagios.debug

After saving the file and exiting vi, I restarted Nagios in daemon mode.  Reading the last 100 lines of the Nagios log file is now accomplished with this command:

tail -n 100 /opt/var/nagios/nagios.log

—-

There are a lot of seemingly interesting Nagios plugins, including check_oracle (I believe that this plugin requires the Oracle client to be installed – good luck with that install).  On one of the DiskStations the check_snmp plugin did not compile, while on the other DiskStation the check_http plugin did not compile.

It might be interesting to see what solutions readers are able to develop from the above starting point.  The above information is the result of many hours of experimentation as well as a couple minutes reading through sections of the Nagios documentation (it reads like the Oracle Database documentation, so it should be an easy read once I am in the right mood) and hopelessly scanning the ‘net for information about obscure error messages.  Have fun, and try not to put the DiskStation out of service due to a mistaken file copy.

Update November 19, 2013:

Installing an updated version of the Synology DSM operating system may temporarily disable Nagios.  Make backups of all Nagios confirguration files (copying the files with the cp command to a directory in /volume1 is generally safe) before installing different versions of the Synology DSM operating system.

The DSM 4.3 operating system installation apparently removed the /var/services/homes directory.  That directory removal makes it impossible for the Nagios user to login to run various commands.  I assume that the removal of the homes directory is intentional, so a work around for that problem:

mkdir /var/services/home
mkdir /var/services/home/nagios
chown nagios:nagios /var/services/home/nagios -R
vi /etc/passwd

In the /etc/passwd file, change all /homes/ entries to /home/ then save and exit vi.

The installation of the different DSM version (including versions before 4.3) will likely also replace/remove the libltdl.* files located in /opt/local/lib and /usr/lib, so we need to copy those files back into the correct directories:

cp /opt/lib/libltdl.so.3 /opt/local/lib/libltdl.so.3
cp /opt/lib/libltdl.so.3 /usr/lib/libltdl.so.3
cp /opt/lib/libltdl.so /usr/lib/

Once the above items are copied, try executing the check_ping command as the nagios user (replace MyDeviceHere with either an IP address or the name of a device on your network).

su - nagios -c "/opt/libexec/check_ping -H MyDeviceHere -w 5000,80% -c 5000,80% -p 5"

If the DiskStation reports that the check_ping command was not found, then copy that file back to the /opt/libexec/ directory.  If the above command was successful, try verifying the Nagios configuration:

/opt/bin/nagios -v /opt/etc/nagios.cfg

If the verification was successful, start Nagios as a daemon:

/opt/bin/nagios -d /opt/etc/nagios.cfg

Execute the ps command and verify that the above command is listed in the running processes:

ps

Finally, verify that Nagios is still set to start automatically as a daemon:

ls /opt/etc/init.d/S81nagios

If a file is listed when the above command is executed, then Nagios should now be fully repaired.

Update August 12, 2015:

You may at some point need to verify that a http web server is online.  If you execute the following command (replace http://www.mydomain.com with an actual web server for a domain to be monitored):

/opt/libexec/check_http -H www.mydomain.com

You may see one of the following errors:

/opt/libexec/check_http: error while loading shared libraries: libssl.so.0.9.8: cannot open shared object file: No such file or directory
/opt/libexec/check_http: error while loading shared libraries: libcrypto.so.0.9.8: cannot open shared object file: No such file or directory

The problem is likely caused by two missing symbolic links.  The following commands worked on both the DS1813+ and the DS412+:

ln -s /usr/lib/libssl.so.1.0.0 /usr/lib/libssl.so.0.9.8
ln -s /lib/libcrypto.so.1.0.0 /usr/lib/libcrypto.so.0.9.8

If instead you see this error message, then there is likely a problem with the compiled Nagios plugins (the DS1813+ was missing this plugin, so I copied it from the DS412+):

-ash: /opt/libexec/check_http: not found

To use check_http to monitor a website status I had to create a new host template because Nagios continued to use the check_ping command to check the website status, even though I had tried to override the default template value in the service entry for the website (this might be a bug, or I might have configured something wrong).  Add the following lines to the /opt/etc/objects/templates.cfg file below the external host definition section:

define host{
        name                    external-http   ; The name of this host template
        use                     generic-host    ; Inherit default values from the generic-host template
        check_period            24x7            ; By default, Windows servers are monitored round the clock
        check_interval          5               ; Actively check the server every 5 minutes
        retry_interval          1               ; Schedule host check retries at 1 minute intervals
        max_check_attempts      1               ; Check each server 10 times (max)
        check_command           check_http      ; Default command to check if servers are "alive"
        notification_period     24x7            ; Send notification out at any time - day or night
        notification_interval   30              ; Resend notifications every 30 minutes
        notification_options    d,r             ; Only send notifications for specific host states
        contact_groups          admins          ; Notifications get sent to the admins by default
        hostgroups              external        ; Host groups that web servers should be a member of
        register                0               ; DONT REGISTER THIS - ITS JUST A TEMPLATE
        }

Next, define the websites to be monitored in the /opt/etc/objects/external.cfg file.  For example, to monitor the http://www.oracle.com website, add the following lines:

define host{
        use             external-http        ; Inherit default values from a template
        host_name       Oracle_website       ; The name we're giving to this device
        alias           Oracle Website       ; A longer name associated with the device
        address         www.oracle.com       ; Host address of the device
        hostgroups      external             ; Host groups this device is associated with
        }
 
define service{
        use                     generic-service ; Inherit values from a template
        host_name               Oracle_website  ; The name of the host the service is associated with
        service_description     HTTP            ; The service description
        check_command           check_http      ; The command used to monitor the service
        normal_check_interval   3               ; Check the service every 5 minutes under normal conditions
        retry_check_interval    1               ; Re-check the service every minute until its final/hard state is determined
        }

It appears that Nagios might be expecting an IP address rather than a host name (such as http://www.oracle.com), so you may also need to modify the /opt/etc/objects/commands.cfg file, changing the -I to -H in the check_http command definition, as shown below:

# 'check_http' command definition
define command{
        command_name    check_http
        command_line    $USER1$/check_http -H $HOSTADDRESS$ $ARG1$
        }

When making changes to the Nagios configuration, always use the verify command to make certain that syntax errors are not present in the configuration files.

/opt/bin/nagios -v /opt/etc/nagios.cfg

To restart Nagios, use the ps command to locate the first process entry with “/opt/bin/nagios -d /opt/etc/nagios.cfg” in its COMMAND column:

ps | grep 'nagios'

If that line contains the number 3692 in the first column, execute the following:

kill 3692

Then restart Nagios in the background (daemon mode)

/opt/bin/nagios -d /opt/etc/nagios.cfg

I never bothered to determine how to make the Nagios monitoring status web pages work with the Synology DS412+, DS1813+, and DS212+.  I thought that I would see what steps would be required to make that functionality work on a Synology DS415+ (note that the following steps work nearly exactly the same on the Synology DS412+ and DS1813). The Synology DSM operating system utilizes port 5000 for web traffic.  If you add the WordPress package to the Synology, that package uses the standard port 80 for web traffic.  If you followed the above steps for installing Nagios, you installed a second copy of the Apache web server on the Synology that uses port 8000 for web traffic.  If your Synology has an IP address of 192.168.1.60, then you would use the following website address to access the second web server on the Synology: http://192.168.1.60:8000/nagios/

The Nagios monitoring status web pages, once set up, will appear as a directory of that second web server, as shown below:

nagiosds415_web_status

The value of the Nagios monitoring status web pages becomes apparent fairly quickly by reviewing the above screen capture.  The above Tactical Overview shows that there are 18 monitored devices that are up, one monitored device that is down, and another device that is in the process of being checked.  The links at the left provide additional information about the monitored devices.  Let’s configure the second copy of Apache on the Synology to support the Nagios monitoring status web pages.

The second copy of Apache uses the configuration file /opt/etc/apache2/httpd.conf.  Edit that file:

vi /opt/etc/apache2/httpd.conf

Add the following directly below the # Supplemental configuration heading, which is near the end of the file:

# Added by per Charles Hooper's Nagios installation guide for Synology DS415+
Include etc/apache2/conf.d/nagios.conf
Include etc/apache2/conf.d/php.conf

Save the file and exit vi.  Next, we need to create the htpasswd.users file in the correct location.  In the above steps, we executed the following command:

/opt/sbin/htpasswd -c /usr/local/etc/htpasswd.users nagiosadmin

That htpasswd.users file is expected to be in the /opt/etc directory.  Execute the following command to create the file in the correct directory (you will be prompted for a password for the nagiosadmin user):

/opt/sbin/htpasswd -c /opt/etc/htpasswd.users nagiosadmin

The Nagios monitoring status web pages require PHP support to be added to the second Apache installation.  Execute the following commands to install PHP:

/opt/bin/ipkg update
/opt/bin/ipkg list
/opt/bin/ipkg install php
/opt/bin/ipkg install php-apache

Next, we need to modify the /opt/etc/apache2/conf.d/nagios.conf file that was created during the Nagios installation.

vi /opt/etc/apache2/conf.d/nagios.conf

Below the heading <Directory “/opt/share”> add the following line:

   DirectoryIndex index.php

Save the file and exit vi.  Next, we need to adjust the php.ini file on the Synology:

vi /opt/etc/php.ini

In the file (near line 108), locate the following line:

output_buffering = Off

Change that line to show:

output_buffering = 8192

Locate the following line (near line 248) in the file:

memory_limit = 8M

Change that line to show:

memory_limit = 128M

Save the file and exit vi.  There may be a second php.ini file located in the /etc/php directory, make the same change to that file, then save the file and exit vi:

vi /etc/php/php.ini

Perform a graceful restart of the second copy of Apache:

/opt/sbin/apachectl -k graceful

Note:

The above command executed without a problem on the DS1813+, however, the DS412+ displayed one of the following two error messages:

httpd: Syntax error on line 102 of /opt/etc/apache2/httpd.conf: Cannot load /opt/libexec/mod_dav.so into server: /opt/libexec/mod_dav.so: undefined symbol: syno_recv_usr2
httpd: Syntax error on line 108 of /opt/etc/apache2/httpd.conf: Cannot load /opt/libexec/mod_dav_fs.so into server: /opt/libexec/mod_dav_fs.so: undefined symbol: HomeIsExist

When each error appeared, I modified the /opt/etc/apache2/httpd.conf file to put a # symbol in front of the offending line.  Doing so permitted the apachectl command to execute without error, but the errors may be a sign that the Nagios status pages will not work correctly.

Try to access the Nagios status monitoring pages in a web browser (replace 192.168.1.60 with the IP address of your Synology)  http://192.168.1.60:8000/nagios/

You should be prompted to enter a username and password.  Enter nagiosadmin for the username, and enter the password for that user that you created when executing the /opt/sbin/htpasswd command earlier.  If the web page does not display, take a look at the last 100 lines of the Apache error log to see if any clues are provided:

tail -n 100 /opt/var/apache2/log/error_log

Note:

The DS1813+ showed the following in the error_log – if you see this error, the error may go away after completing the steps that follow:

File does not exist: /opt/share/www/nagios

Some of the interesting error messages that I experienced including the following:

[Sun Jan 04 14:07:55 2015] [error] [client 192.168.1.218] (2)No such file or directory: Could not open password file: /opt/etc/htpasswd.users
 
[Sun Jan 04 14:23:04 2015] [error] [client 192.168.1.218] Directory index forbidden by Options directive: /opt/share/
 
[Sun Jan 04 15:07:09 2015] [error] [client 192.168.1.218] File does not exist: /opt/share/<, referer: http://192.168.1.60:8000/nagios/
 
[Sun Jan 04 17:53:06 2015] [notice] child pid 15126 exit signal Segmentation fault (11)
 
[Sun Jan 11 09:41:57 2015] [error] [client 192.168.1.213] script not found or unable to stat: /opt/sbin/statusmap.cgi, referer: http://192.168.1.60:8000/nagios/side.php

The last of the above errors still remains, the file /opt/sbin/statusmap.cgi does not exist.  For now, I will let someone else find a solution for that problem.

Note:

If you saw a login prompt, but the login was unsuccessful, check the Apache error log:

tail -n 100 /opt/var/apache2/log/error_log

On the DS1813+, I saw the following error:

No such file or directory: Could not open password file: /opt/etc/htpasswd.users

The above error indicates that the htpasswd.users file was expected to be found in a different location.  To work around this issue, I simply created the file in the location specified in the error log:

/opt/sbin/htpasswd -c /opt/etc/htpasswd.users nagiosadmin

If you find that a monitored device is down, using the web interface it is possible to disable future checking of the device, as well as alter a number of other notification options.  Unfortunately, clicking any of the options under the Host Commands heading will result in a permission error.

nagiosds415_change_notifications

To fix the permission error:

Using the Synology Control Panel interface, create a new group named www – this group requires no specific Synology permissions.  Next, using the Synology Control Panel interface, create a user named apache – make this user a member of the nagios, users, and www groups.  This user requires no specific Synology permissions.  Then change the username and group under which Apache executes:

vi /opt/etc/apache2/httpd.conf

In the httpd.conf file, locate the following two lines:

User nobody
Group #-1

Change those lines to appears as follows:

User apache
Group www

Save the file and exit vi.  Gracefully restart the second copy of Apache:

/opt/sbin/apachectl -k graceful

Refresh the Nagios monitoring status web page – the options under the Host Commands heading should no longer return an error.





Value of Improving One’s Knowledge

24 03 2013

March 24, 2013

As I type this blog article I am in the middle of reading a second book on the topic of Windows Server 2008 R2.  I bought the books several months ago, possibly even a year or two ago, and just had not found the time to invest in reading yet another dry IT-related 1,500 page book.  I would much prefer to read an exciting Oracle Database book, but those books seem to be in short supply (and I really do need to finish these two Windows Server 2008 R2 books so that I can move onto the Windows Server 2012 books, the Exchange Server 2013 books, and the Oracle Database 12c books that will be released eventually).

Three years ago I put together an article about finding the motivation to keep reading through various IT books, and offered a suggested reading list for people interested specifically in various aspects of Oracle Database (totally unrelated, but I recall that the word aspect was a pet peeve of one of my college English professors… he would say, “What is an aspect?” – I probably have used that word about five times since then).  I think that my list of suggested books is still a decent list.  However, Kyle Hailey recently put together a related article, “Where to begin with Oracle and SQL” that is excellent, and a bit more up to date than my list – the only book on Kyle’s list that I have not read is written by Dan Tow.

So, for people working in the information technology (IT) disciplines, how much effort do you put into improving your knowledge, or just maintaining that knowledge?  Do you actively try to break things, in the hope of discovering a solution?  Do you participate in discussion forums, such as the OTN forums or Usenet?  Do you read magazines, books, and various online articles?  Or, do you simply continue to perform the same actions that you have for the last five, 10, or 20 years (if the RULE worked just fine back then, why change)?

I try to read at least a couple of IT focused books a year.  For many of the Oracle Database specific books read since mid-2008 I have tried to write formal reviews of those books, posted to Amazon.com.  One of those reviews stretched to 35.5 typewritten pages!  Early yesterday morning a reader left the following comment attached to that review:

Mar 22, 2013 11:43:09 PM PDT SavvyShopper says:
Charles, you seem Very Intelligent(?) and should then Write a Book, fixing all these issues, than writing so much … 🙂 How do you get so much time , if you are really “Working” ??

Oddly, the person deleted their comment a mere two hours after I posted a reply.  My first reaction to the comment was “must spending time expanding one’s knowledge be mutually exclusive to being employed?”  That was not the first time that someone using Amazon left a similar comment attached to one of my reviews:

Mar 31, 2011 12:37:49 PM PDT Wanda A. Cadogan says:
And don’t you ever wonder how guys like Mr. Hooper have so much time to write a 24 page review?

Is it really that hard to believe that guys like Mr. Hooper take notes (sometimes extensive notes) while reading technical publications?

I noticed that SavvyShopper had previously reviewed a couple of Oracle Database related books, as well as a couple of woodworking products.  Quite a while ago I toyed a bit with woodworking (a bit in woodworking…. cuts both ways), as a matter of fact I thought that I was headed to college to become a woodworking instructor.  So, I possibly have an odd connection with SavvyShopper, might this be an opportunity for a teachable moment?  I noticed too that in one of his reviews that SavvyShopper appeared to be struggling with the concept that a softwood can in fact be harder than a hardwood.  I am a little disappointed that SavvyShopper deleted his comment after all the effort that I put into formulating a reply:

SavvyShopper,

Thank you for leaving your comment.

Some of my book reviews, especially when there are many errors in the book being reviewed, tend to be very long.  I read computer related books to improve my knowledge, whether that means that I am simply reminded how something works, or I am learning something entirely new.   Investing effort (and occasionally an excessive amount of time) into improving my skill set is important to me, and I hope that it is important to other people in the IT related professions.  The Oracle Database book reviews that I post to Amazon are mostly a by-product of that self-improvement process.

I am still happily employed at the same place where I have worked for more than a decade as an IT professional.  Making time for expanding one’s computer skills through reading sometimes means sacrificing time spent with hobbies or other outside of work activities.  I believe that it is that sacrifice that is an important ingredient in the formula for long-term gainful employment.

Woodworking is a hobby of mine, and judging by your reviews on Amazon, woodworking may also be one of your hobbies.  I see that you gave a thin kerf Forrest Woodworker II saw blade a 2 out of 5 rating (coincidentally the same rating that I gave to this book), primarily because you found that the blade did not impress you when cutting pine and redwood while building one project.  Pine, much more than redwood, leaves deposits of pitch on the face and sides of saw blade teeth (kerosene (or Simple Green) and an old toothbrush will remove the pitch deposits, but I think that there are commercial spray on products on the market now that serve the same purpose).  This buildup of pitch effectively reduces the saw blade’s ability to cut smoothly.  The thin kerf body of the saw blade that you reviewed is less resistant to flexing in a heavy cut, so combined with the pitch build up on the blade, you probably did experience poor quality cuts.  If you mounted the saw blade on an inexpensive saw with a weak/bad bearing setup, that will also explain the poor quality cut that you received.  If you want a treat, mount a regular kerf Woodworker II on an older, properly tuned 3HP or 5HP Unisaw (or Powermatic Model 66) and witness the glass smooth cuts in (even three inch thick) hardwood such as poplar, oak, walnut, or hard maple.

Getting back on topic of your comment, I currently have very little extra time to put into writing.  While probably not your intention, I do not see a book that summarizes mistakes found in other books and the accompanying corrections as selling more than a couple of dozen copies; based on some Oracle book authors remarkable ability to abuse the DMCA (Digital Millennium Copyright Act), such a book could also become a legal nightmare for the publishing company.  I have been approached a couple of times to see if I had an interest in writing a book on the topic of Oracle Database.  At this time, I simply cannot make that commitment (writer’s block, lack of available free time, required reading to stay current in the computer-related field, wanting to spend more time working with non-computer related routers and bits, etc.).

A couple of years ago I contributed to an Oracle Database specific book with fellow OakTable Network members.  The technical standards of the members of that group are extremely high (do a Google search, if you are interested).  As such, I would estimate that I spent roughly eight hours per page writing, testing, proofreading, rewriting, proofreading, testing, rewriting, and proofreading again.  It was quite an experience, especially when Oracle Corporation announced the general availability of a new major version release a day or two before the first draft of the section of the book had to be submitted to the publisher – more testing, rewriting, and proofreading of the book section followed after that first draft was submitted.

Thank you again for leaving the comment.

So, do you go out of your way to learn something new every day, or are you more likely to “let it roll” for five, 10, or 20 years?  For the record, I have a rather large stack of unread woodworking magazines (most of the unread magazines range from 0 to 7 years old) sitting on the shelf that are waiting for the time when spending time expanding one’s knowledge is mutually exclusive to being employed (in the IT industry).

A new logo in development – resawn from 1 inch x 2 inch quarter sawn oak.

Bookmatch Arrangement #1:

OLYMPUS DIGITAL CAMERA

Bookmatch Arrangement #2:

OLYMPUS DIGITAL CAMERA

Bookmatch Arrangement #3:

OLYMPUS DIGITAL CAMERA

The New Logo:

OLYMPUS DIGITAL CAMERA





What does it Mean when a Select Statement in Oracle is using 100% CPU?

14 02 2013

February 14, 2013

A couple of days ago I noticed that an interesting set of search keywords were used to access this blog.  The search keywords:

What does it mean when a select statement in Oracle is using 100% cpu

I had two thoughts when I first saw that set of search keywords:

  • Well Done!
  • What an inconsiderate Oracle developer!

I feel a bit conflicted about the two first thought answers.  Think about the above bullet points for a moment.

I was interested enough in the keywords that were phrased in the form of a question, that I went though the search keywords used to access this blog for the last two weeks, and found the following related search keywords:

  • ora-01861 Oracle causing cpu 100% usage
  • Oracle one session 100% cpu
  • Oracle cpu utilisation 100%, is it a good or bad sign
  • Oracle database server cpu utilization is 100

Interesting, and possibly thought provoking.  I recalled an article that I wrote on this blog roughly three years ago.

Possible related thoughts of a temporarily rogue DBA:

  1. I fixed your problem without changing the execution plan… take a look at the EXPLAIN PLAN output for yourself.
  2. I fixed your problem without changing the execution plan… the server is more efficient now that we have consolidated three more database instances onto the server.
  3. I fixed your problem without changing the execution plan… that PGA_AGGREGATE_TARGET parameter was clear on the other side of 100 MB, so I set it to 10 MB.
  4. I fixed your problem without changing the execution plan… a group of the guys in the office found a way to load the game Doom on the server.
  5. I fixed your problem without changing the execution plan… I took half of the hard drives out of the array and moved the server to a 10 Mb hub.

Took a close look at the bulletpointed search terms.  If someone were to ask you one of the following questions, how would you respond if you were earnestly attempting to help the person?

  1. What does it mean when a Select statement in Oracle is using 100% CPU?
  2. The server admin reported that the Oracle database server cpu utilization hits 100% too frequently, what should I do about the problem?
  3. Oracle is throwing an ORA-01861 error, and that is causing 100% CPU usage.  What should I do?

Share your thoughts.

Summary of Suggestions/Causes Shared by Readers (Last Updated February 14, 2013):

  • Intended or unintended Cartesian product between row sources – does the query have a DISTINCT clause.  (Rodger)
  • Intended or unintended data type conversions – is the data model storing numbers or date values in VARCHAR2 columns.  (Jeremy Kendrick)




Connecting to an Oracle Database with Visual Basic 6.0 on Windows 8 64 Bit

25 11 2012

November 25, 2012 (Modified December 7, 2012, May 22, 2015, May 11, 2016)

Compatibility problems?  Visual Basic 6.0, released in 1998, is not officially compatible with Windows 8 Pro 64 bit… or Windows 7, or Windows Vista.  But I still like the language a lot for its simplicity, rapid development, and significant pre-existing code base within my company.  Of course, Oracle Database 11.2.0.3 and the Oracle Client 11.2.0.3 are not officially supported on Windows 8, so maybe the quest is an exercise in futility.

Roughly a month ago a thread appeared on the OTN forums asking how to connect Visual Basic 2010 to Oracle Database Personal Edition on Windows 7 Pro 64 bit running on the same computer.  Several people offered very good advice to guide the OP.  The OP eventually asked how to connect Visual Basic 6 to Oracle Database Personal Edition running on the same Windows 7 Pro 64 bit computer.  As I mentioned, Visual Basic 6 is not officially compatible with Windows 7, and the fact that it is a 32 bit application running on a 64 bit Windows computer means that the 32 bit Oracle Client must also be installed on the computer.

Visual Basic 6.0 (note that you should install service pack 6 for Visual Basic 6, even if the installer locks up at the very end) will run just fine on Windows 7 Pro 64 bit (and probably on Windows 8 Pro 64 bit also), although drawing form objects is a little slow.  The automatic Windows updates may also pose an issue.  From time to time it may be necessary to re-register the 32 bit MSCOMCTL.OCX file to avoid error messages when opening Visual Basic projects.  To re-register that file, from a Windows command prompt (Run – cmd.exe) , enter the following:

cd \windows\syswow64
regsvr32 MSCOMCTL.OCX

Using Visual Basic 6.0, or any development environment for that matter, requires that certain components be selected for installation when installing the Oracle Client.  At a minimum, Oracle ODBC Driver 11.2.0.x.0 (for ODBC type connections), and Oracle Provider for OLE DB 11.2.0.x.0 (for OLE DB type connections) must be installed with the Oracle Client.

Let’s set up a simple Visual Basic project to demonstrate connecting to Oracle Database 11.2.0.3 running on the same Windows 8 computer.  First, when installing Visual Basic 6, you may want to install and register a couple of additional uncommonly used controls.  Those controls are located in the \COMMON\TOOLS\VB\CONTROLS folder on the Visual Basic install CD.  On a 64 bit computer, the controls should be copied to the C:\Windows\Syswow64 folder and then registered in that location using regsvr32 as demonstrated above with the MSCOMCTL.OCX file.  In some cases, a license file must also be imported into the Windows registry – those files have a .REG extension in the same location on the CD:

On to building the project.  First, we need to add a reference to allow the project to use ADO and the Oracle Provider for OLE DB that was installed with the Oracle Client.  From the Project menu, select References…:

Next, we need to add the most recent version of the Microsoft ActiveX Data Objects Library, version 6.1 for Windows 7 and Windows 8, version 6.0 for Vista, or version 2.8 for Windows XP.  After selecting the correct version, click the OK button:

Now let’s add a couple of additional controls to the project.  I first started using the Microsoft Grid Control in Visual Basic 2.0, so for old time’s sake let’s add that control to the project.  We may also want to add a status bar to the project’s form, so let’s also add Microsoft Windows Common Controls 6.0 (SP6).  Then click the OK button:

Now, draw the form controls on the form.  We need three command buttons named cmdLateBinding, cmdEarlyBinding, and cmdQueryDatabase, a grid control named grdOutput, and optionally a status bar named stbStatus with its Style property set to Simple:

Let’s add the code to the Late Binding (cmdLateBinding) button (this is essentially the same code that I provided in the OTN thread, and does not require the addition of the Microsoft ActiveX Data Objects Library in the references for the project – so this code is a good simple test to make certain that everything is installed correctly):

Dim intResult
Dim strDatabase
Dim strUserName
Dim strPassword
Dim strSQL
Dim dbDatabase
Dim snpData

'On Error Resume Next

strDatabase = "MyDB" 'From tnsnames.ora
strUserName = "MyUserID"
strPassword = "MyPassword"

Set snpData = CreateObject("ADODB.Recordset")
Set dbDatabase = CreateObject("ADODB.Connection")

dbDatabase.ConnectionString = "Provider=OraOLEDB.Oracle;Data Source=" & strDatabase & ";User ID=" & strUserName & ";Password=" & strPassword & ";"
dbDatabase.Open

If (dbDatabase.State = 1) And (Err = 0) Then
    strSQL = "SELECT" & vbCrLf
    strSQL = strSQL & "  SYSDATE CURRENT_DATE" & vbCrLf
    strSQL = strSQL & "FROM" & vbCrLf
    strSQL = strSQL & "  DUAL"

    snpData.Open strSQL, dbDatabase

    If snpData.State = 1 Then
        If Not (snpData.EOF) Then
            Do While Not (snpData.EOF)
                MsgBox snpData("current_date")

                snpData.MoveNext
            Loop
        End If
        snpData.Close
    End If
Else
    intResult = MsgBox("Could not connect to the database.  Check your user name and password." & vbCrLf & Error(Err), 16, "Oracle Connection Demo")
End If

dbDatabase.Close
Set snpData = Nothing
Set dbDatabase = Nothing

Let’s run the project and click the Late Binding button to see what happens.  We should expect to see some sort of connection failure message, unless there is an entry in the tnsnames.ora for MyDB:

Well, that error message certainly was unexpected (the OP in the OTN thread may have encountered the same error message)!  Let’s click on the Debug button:

ADODB cannot find the OraOLEDB.Oracle provider.  I am sure that I installed that component when installing the 32 bit Oracle Client on the computer.  The Visual Basic code that was added to the Late Binding command button is very generic.  In fact, as written the code can be placed in a plain text file with a .VBS extension and executed as a VBScript file with either the wscript or cscript command.  Copy the code from the Late Binding command button into a plain text file (start the Notepad program and paste the code), then save the file as “LateBinding.vbs” (including the quotes).

Now to test the script that was just created.  Open a Windows command prompt (Windows key and R, type cmd, press the Enter key).  Change to the folder where the LateBinding.vbs file was saved, then type:

wscript LateBinding.vbs

Note that this time the computer indicated an ORA-12154: TNS:could not resolve the connect identifier error, rather than a Provider could not be found error – we were hoping to obtain the same ORA-12154 error as we did when working in Visual Basic 6.  What changed?  Well, this is a 64 bit computer, so the 64 bit wscript program was used, which used the 64 bit Oracle client (actually the 64 bit Oracle Database home files).

Let’s try again, this time with the 32 bit version of the wscript program.  To execute the script with the 32 bit wscript, execute the following:
c:\windows\SysWOW64\wscript LateBinding.vbs

Notice that the error now appears as Provider could not be found, which is the same error message that was observed in Visual Basic 6.  We reproduced the problem!  Now how do we fix it?  I suppose that we should check with the Oracle Client installer to verify that the Oracle Provider for OLE DB 11.2.0.x.0 component was in fact installed.  Let’s re-run the Oracle installer to check the Inventory of the installed components.  Just open the Start menu, click the Oracle – OraClient11g_home1 folder, then expand Oracle Installation Products, then click Universal Installer:

Uh, sure.  Windows 8 and its tablet-like Start menu… good luck finding the Universal Installer for the Oracle Client. (Edit November 26, 2012: I was trying to be humorous here.  If you have a touch screen, you can swipe your finger up from the bottom of the screen to reveal the option to display all items that normally appear on the pre-Windows 8 Start menu, separated by folder/Oracle home.  If you do not have a touch screen, hold down the Windows key and press the W key while in this new Windows 8 start screen to display the same list of items separated by folder/Oracle home.)  I always change the base install folder to C:\Oracle when installing Oracle components, so to start up the Universal Installer for the Oracle Client, I can just run:
C:\oracle\product\11.2.0\client_1\oui\bin\setup.exe

Oracle Provider for OLE DB 11.2.0.3.0 is in the list of installed components (click the Installed Products button in the Oracle Installer to see the above list), so why doesn’t the Oracle Provider for OLE DB from the 32 bit client work as expected?  That component requires that the OraOLEDB11.DLL file is found in the expected location, in my case, here:
C:\Oracle\product\11.2.0\client_1\bin\OraOLEDB11.DLL

Yes, the file is in that location.  Is the Oracle Client 11.2.0.3 just not compatible with Windows 8 Pro?  The Oracle Client 11.2.0.3 seemed to work as expected on a computer that was upgraded from Windows 7 Pro to Windows 8 Pro, including the OLE DB functionality, so why will that functionality NOT work on a new computer with Windows 8 Pro installed?

While comparing the Windows registry entries on a Windows 7 Pro 64 bit computer with the Windows registry entries on the Windows 8 Pro computer where the OLE DB functionality did not work, I noticed a problem – missing entries in the Windows 8 Pro’s Windows registry.  From the Windows 7 Pro 64 bit computer, I exported the three missing Windows registry sections:

3F63C36E-51A3-11D2-BB7D-00C04FA30080.reg:

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3F63C36E-51A3-11D2-BB7D-00C04FA30080}]
@="OraOLEDB.Oracle"
"OLEDB_SERVICES"=dword:ffffffff

[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3F63C36E-51A3-11D2-BB7D-00C04FA30080}\ExtendedErrors]
@="Extended Error Lookup Service"

[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3F63C36E-51A3-11D2-BB7D-00C04FA30080}\ExtendedErrors\{3FC8E6E4-53FF-11D2-BB7D-00C04FA30080}]
@="Oracle OLE DB Error Lookup"

[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3F63C36E-51A3-11D2-BB7D-00C04FA30080}\InprocServer32]
@="C:\\Oracle\\product\\11.2.0\\client_1\\bin\\OraOLEDB11.DLL"
"ThreadingModel"="Both"

[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3F63C36E-51A3-11D2-BB7D-00C04FA30080}\OLE DB Provider]
@="Oracle Provider for OLE DB"

[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3F63C36E-51A3-11D2-BB7D-00C04FA30080}\ProgID]
@="OraOLEDB.Oracle.1"

[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3F63C36E-51A3-11D2-BB7D-00C04FA30080}\TypeLib]
@="{0BB9AFD1-51A1-11D2-BB7D-00C04FA30080}"

[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3F63C36E-51A3-11D2-BB7D-00C04FA30080}\VersionIndependentProgID]
@="OraOLEDB.Oracle"

3FC8E6E4-53FF-11D2-BB7D-00C04FA30080.reg:

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3FC8E6E4-53FF-11D2-BB7D-00C04FA30080}]
@="Oracle OLE DB Error Lookup"

[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3FC8E6E4-53FF-11D2-BB7D-00C04FA30080}\InprocServer32]
@="C:\\Oracle\\product\\11.2.0\\client_1\\bin\\OraOLEDB11.DLL"
"ThreadingModel"="Both"

[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3FC8E6E4-53FF-11D2-BB7D-00C04FA30080}\ProdID]
@="OraOLEDB.ErrorLookup.1"

[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3FC8E6E4-53FF-11D2-BB7D-00C04FA30080}\VersionIndependentProgID]
@="OraOLEDB.ErrorLookup.1"

0BB9AFD1-51A1-11D2-BB7D-00C04FA30080.reg (just the win32 section was missing):

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\TypeLib\{0BB9AFD1-51A1-11D2-BB7D-00C04FA30080}\1.0\win32]
@="C:\\Oracle\\product\\11.2.0\\client_1\\bin\\OraOLEDB11.DLL"

After verifying that the Oracle Client’s home on the Windows 8 Pro computer was in fact located in C:\Oracle\product\11.2.0\client_1 (it may not be on your computer, change the above registry files as necessary for your computer – use two \\ characters for each \ character that normally appears in the path to the Oracle Client’s home), I imported the registry entries into the Windows 8 Pro computer’s registry by double-clicking each of the files that were exported from the Windows 7 Pro computer.

The result after importing the registry entries and re-running the script with the 32 bit wscript:

Well, that is not good, another error message.  But wait, that was the same error message displayed when the 64 bit wscript was used to execute the script file.  Maybe we are making progress.  Let’s fix the LateBinding.vbs script file so that the strDatabase variable is set to a valid database name found in the 32 bit client’s tnsnames.ora file (I will use a database named sample, which has the Oracle sample schema loaded), the strUserName variable is set to a valid database username in the database (I will use the sh user), and the strPassword variable is set to the correct password for the username.

Let’s try executing the script again:

Oh, another error message.  ORA-28000: the account is locked.  I guess that some Oracle rules still apply.  Start up a SQL*Plus session in another Windows command prompt, connect to the database as the SYS AS SYSDBA user, and unlock the SH account (edit November 26, 2012: note that this account should probably be locked again once we finish experimenting):

ALTER USER SH ACCOUNT UNLOCK;

Let’s try executing the script again:

It Worked!  The 32 bit version of wscript on Windows 8 Pro 64 bit was able to connect to the 11.2.0.3 64 bit database running on the same computer and select the current date and time from the database.  But, does the Late Binding example code in Visual Basic 6 work?  Let’s check:

Nice!

[Headache Induced Rant] You may notice a nearly 3 hour time difference between the time output by wscript and the time output by Visual Basic 6.  That is not an error – I spent nearly three hours trying to put this blog article together in WordPress (total time putting together this article was close to 5 hours) on the Windows 8 computer AFTER successfully connecting to the database using the 32 bit version of wscript.  The Sony laptop designer who thought it would be a good idea to place the laptop’s left and right mouse buttons underneath the corners of the touch pad and then thought that it would be a good idea to enable zoom control when one finger is barely touching the location of the left mouse button while the other finger is attempting to scroll up and down the WordPress editor’s page should be taken out behind the building and slapped a couple of times.  I quite literally had to undo an unintended page zoom very close to 50 times while putting together this blog article.  The darn laptop has a touch screen… if I really want to zoom the screen, I will either manually select the zoom feature in the application or put two fingers on the screen and then separate those fingers.  (Edit November 26, 2012: Sony does a good job of burying the option to disable this touch pad feature – it is found in the same application that disables the touch pad’s tap to left-click “feature”.)

And which Microsoft developer thought that it would be a great idea to have a large clock pop up on screen in the lower left (taking away the typing entry focus from the application that was being used) on occasion when someone is trying to type on the computer?  That developer should also be taken out behind the building and slapped. [/Headache Induced Rant]

The above example code used early late (edit: July 24, 2014) binding, which defines variables as variants and then later assigns object types to the variables.  That technique, which now does work and is compatible with wscript, tends to result in poor performance.  Poor performance should be considered an application bug, so let’s build an example that uses early binding (note that the References mentioned earlier in this article must have been selected).  Add the following code to the Early Binding button (note that the code is attempting to connect to the sample database as sh user):

Dim intResult As Integer
Dim strSQL As String
Dim strDatabase As String
Dim strUserName As String
Dim strPassword As String

Dim snpData As ADODB.Recordset
Dim dbDatabase As ADODB.Connection

On Error Resume Next

Set dbDatabase = New ADODB.Connection

strDatabase = "sample" '"MyDB" 'From tnsnames.ora
strUserName = "sh" '"MyUserID"
strPassword = "sh" '"MyPassword"

'Connect to the database, specify that the default array fetch size should be 100 rows
dbDatabase.ConnectionString = "Provider=OraOLEDB.Oracle;Data Source=" & strDatabase & ";User ID=" & strUserName & ";Password=" & strPassword & ";ChunkSize=1000;FetchSize=100;"
dbDatabase.ConnectionTimeout = 40
dbDatabase.CursorLocation = adUseClient
dbDatabase.Open

If (dbDatabase.State <> 1) Or (Err <> 0) Then
    intResult = MsgBox("Could not connect to the database.  Check your user name and password." & vbCrLf & Error(Err), 16, "Oracle Connection Demo")
Else
    Set snpData = New ADODB.Recordset

    strSQL = "SELECT" & vbCrLf
    strSQL = strSQL & "  COUNT(*) NUM_ROWS" & vbCrLf
    strSQL = strSQL & "FROM" & vbCrLf
    strSQL = strSQL & "  CUSTOMERS"
    snpData.Open strSQL, dbDatabase

    If snpData.State = 1 Then
        If Not (snpData.EOF) Then
            MsgBox "Number of rows in the Customers table: " & Format(snpData("num_rows")), vbInformation
        Else
            MsgBox "No rows returned from the query.", vbCritical
        End If
        snpData.Close
    Else
        MsgBox "The query could not be executed.", vbCritical
    End If
End If

'When finished, clean up
If dbDatabase.State = 1 Then
    dbDatabase.Close
End If

Set snpData = Nothing
Set dbDatabase = Nothing

Let’s see the result of executing the above code:

Wow, 55,500 rows in that sample table, that is a lot of rows.

Finally, let’s add the code to the Query Database button.  This code will use early binding, bind variables, and present the result rows in the grid control.

Dim i As Integer
Dim intResult As Integer
Dim strSQL As String
Dim strDatabase As String
Dim strUserName As String
Dim strPassword As String
Dim strOut As String

Dim snpData As ADODB.Recordset
Dim comData As ADODB.Command

On Error Resume Next

Set dbDatabase = New ADODB.Connection

strDatabase = "sample" '"MyDB" 'From tnsnames.ora
strUserName = "sh" '"MyUserID"
strPassword = "sh" '"MyPassword"

'Connect to the database, specify that the default array fetch size should be 100 rows
dbDatabase.ConnectionString = "Provider=OraOLEDB.Oracle;Data Source=" & strDatabase & ";User ID=" & strUserName & ";Password=" & strPassword & ";ChunkSize=1000;FetchSize=100;"
dbDatabase.ConnectionTimeout = 40
dbDatabase.CursorLocation = adUseClient
dbDatabase.Open

If (dbDatabase.State <> 1) Or (Err <> 0) Then
    intResult = MsgBox("Could not connect to the database.  Check your user name and password." & vbCrLf & Error(Err), 16, "Oracle Connection Demo")
Else
    Set comData = New ADODB.Command
    With comData
        strSQL = "SELECT" & vbCrLf
        strSQL = strSQL & "  CUST_ID," & vbCrLf
        strSQL = strSQL & "  CUST_FIRST_NAME," & vbCrLf
        strSQL = strSQL & "  CUST_LAST_NAME," & vbCrLf
        strSQL = strSQL & "  CUST_GENDER," & vbCrLf
        strSQL = strSQL & "  CUST_YEAR_OF_BIRTH," & vbCrLf
        strSQL = strSQL & "  CUST_MARITAL_STATUS," & vbCrLf
        strSQL = strSQL & "  CUST_STREET_ADDRESS," & vbCrLf
        strSQL = strSQL & "  CUST_POSTAL_CODE," & vbCrLf
        strSQL = strSQL & "  CUST_CITY," & vbCrLf
        strSQL = strSQL & "  CUST_STATE_PROVINCE," & vbCrLf
        strSQL = strSQL & "  COUNTRY_ID" & vbCrLf
        strSQL = strSQL & "FROM" & vbCrLf
        strSQL = strSQL & "  CUSTOMERS" & vbCrLf
        strSQL = strSQL & "WHERE" & vbCrLf
        strSQL = strSQL & "  CUST_FIRST_NAME= ?"

        .CommandText = strSQL
        .CommandType = adCmdText
        .CommandTimeout = 30
        .Parameters.Append .CreateParameter("first_name", adVarChar, adParamInput, 20, "")
        .ActiveConnection = dbDatabase
    End With

    comData("first_name") = "Charles"
    Set snpData = comData.Execute

    If Not (snpData Is Nothing) Then
        If snpData.State = 1 Then
            If Not (snpData.EOF) Then
                grdOutput.Cols = snpData.Fields.Count
                grdOutput.FixedRows = 0
                grdOutput.FixedCols = 0
                grdOutput.Rows = 1

                strOut = ""
                For i = 0 To snpData.Fields.Count - 1
                    strOut = strOut & snpData(i).Name & vbTab
                Next i
                grdOutput.AddItem strOut 'Add the new heading row
                grdOutput.RemoveItem 0 'Remove whatever is on the first row

                Do While Not (snpData.EOF)
                    strOut = ""
                    strOut = strOut & Format(snpData("cust_id")) & vbTab
                    If Not (IsNull(snpData("cust_first_name"))) Then
                        strOut = strOut & snpData("cust_first_name") & vbTab
                    Else
                        strOut = strOut & vbTab
                    End If
                    If Not (IsNull(snpData("cust_last_name"))) Then
                        strOut = strOut & snpData("cust_last_name") & vbTab
                    Else
                        strOut = strOut & vbTab
                    End If
                    If Not (IsNull(snpData("cust_gender"))) Then
                        strOut = strOut & snpData("cust_gender") & vbTab
                    Else
                        strOut = strOut & vbTab
                    End If
                    If Not (IsNull(snpData("cust_year_of_birth"))) Then
                        strOut = strOut & Format(snpData("cust_year_of_birth")) & vbTab
                    Else
                        strOut = strOut & vbTab
                    End If
                    If Not (IsNull(snpData("cust_marital_status"))) Then
                        strOut = strOut & snpData("cust_marital_status") & vbTab
                    Else
                        strOut = strOut & vbTab
                    End If
                    If Not (IsNull(snpData("cust_street_address"))) Then
                        strOut = strOut & snpData("cust_street_address") & vbTab
                    Else
                        strOut = strOut & vbTab
                    End If
                    If Not (IsNull(snpData("cust_postal_code"))) Then
                        strOut = strOut & snpData("cust_postal_code") & vbTab
                    Else
                        strOut = strOut & vbTab
                    End If
                    If Not (IsNull(snpData("cust_city"))) Then
                        strOut = strOut & snpData("cust_city") & vbTab
                    Else
                        strOut = strOut & vbTab
                    End If
                    If Not (IsNull(snpData("cust_state_province"))) Then
                        strOut = strOut & snpData("cust_state_province") & vbTab
                    Else
                        strOut = strOut & vbTab
                    End If
                    If Not (IsNull(snpData("country_id"))) Then
                        strOut = strOut & snpData("country_id") & vbTab
                    Else
                        strOut = strOut & vbTab
                    End If
                    grdOutput.AddItem strOut 'Add the new data row to the grid control

                    snpData.MoveNext
                Loop
            Else
                MsgBox "No rows were returned by the query.", vbInformation
            End If
        Else
            MsgBox "The query could not be executed.", vbCritical
        End If
    Else
        MsgBox "The query could not be executed.", vbCritical
    End If
    dbDatabase.Close
End If

If grdOutput.Rows > 1 Then
    grdOutput.FixedRows = 1
End If

Set snpData = Nothing
Set dbDatabase = Nothing

What does the output look like when the above code is executed?

Nice!  Now the problem.  It is a very bad habit to repeatedly connect to and disconnect from the database, and such a habit may result in a variety of problems, including performance problems.  Ideally, the code to connect to the database should be relocated to a different part of the program, possibly the Form_Load subroutine.

Added December 7, 2012:

I was able to reproduce the above mentioned problem on a Windows 7 Pro computer when the 11.2.0.3 client was installed (without first installing the 11.2.0.1 or 11.2.0.2 client).  Even after importing the above mentioned registry entries, the program (and the VBScript) reported that the “Provider cannot be found”.  As mentioned, after verifying that the above indicated Oracle components were installed, I imported the following registry script (save as a plain text file, rename with a .reg extension, then double-click the file), which combines the three above mentioned registry scripts into a single file:

Windows Registry Editor Version 5.00
 
[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3F63C36E-51A3-11D2-BB7D-00C04FA30080}]
@="OraOLEDB.Oracle"
"OLEDB_SERVICES"=dword:ffffffff
 
[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3F63C36E-51A3-11D2-BB7D-00C04FA30080}\ExtendedErrors]
@="Extended Error Lookup Service"
 
[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3F63C36E-51A3-11D2-BB7D-00C04FA30080}\ExtendedErrors\{3FC8E6E4-53FF-11D2-BB7D-00C04FA30080}]
@="Oracle OLE DB Error Lookup"
 
[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3F63C36E-51A3-11D2-BB7D-00C04FA30080}\InprocServer32]
@="C:\\Oracle\\product\\11.2.0\\client_1\\bin\\OraOLEDB11.DLL"
"ThreadingModel"="Both"
 
[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3F63C36E-51A3-11D2-BB7D-00C04FA30080}\OLE DB Provider]
@="Oracle Provider for OLE DB"
 
[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3F63C36E-51A3-11D2-BB7D-00C04FA30080}\ProgID]
@="OraOLEDB.Oracle.1"
 
[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3F63C36E-51A3-11D2-BB7D-00C04FA30080}\TypeLib]
@="{0BB9AFD1-51A1-11D2-BB7D-00C04FA30080}"
 
[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3F63C36E-51A3-11D2-BB7D-00C04FA30080}\VersionIndependentProgID]
@="OraOLEDB.Oracle"
 
[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3FC8E6E4-53FF-11D2-BB7D-00C04FA30080}]
@="Oracle OLE DB Error Lookup"
 
[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3FC8E6E4-53FF-11D2-BB7D-00C04FA30080}\InprocServer32]
@="C:\\Oracle\\product\\11.2.0\\client_1\\bin\\OraOLEDB11.DLL"
"ThreadingModel"="Both"
 
[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3FC8E6E4-53FF-11D2-BB7D-00C04FA30080}\ProdID]
@="OraOLEDB.ErrorLookup.1"
 
[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3FC8E6E4-53FF-11D2-BB7D-00C04FA30080}\VersionIndependentProgID]
@="OraOLEDB.ErrorLookup.1"
 
[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\TypeLib\{0BB9AFD1-51A1-11D2-BB7D-00C04FA30080}\1.0\win32]
@="C:\\Oracle\\product\\11.2.0\\client_1\\bin\\OraOLEDB11.DLL"

The missing element that I neglected to mention in the original version of this article is that the Oracle 11.2.0.3 Client installer apparently forgets to register the OraOLEDB11.DLL file that is in the Oracle home.  The simple work-around for this problem is to manually register the DLL file, with a command similar to the following executed at a Windows command line (replace C:\Oracle\product\11.2.0\client_1 with the correct location of the Oracle client):

regsvr32 C:\Oracle\product\11.2.0\client_1\bin\OraOLEDB11.DLL

Edit May 22, 2015:

I have not yet verified whether or not the same issue exists with the 11.2.0.4 client, however that client version may be downloaded here once you have logged into the My Oracle Support website.  The above method fixed the problem with the 11.2.0.3 client (even on the Windows 10 beta versions) without any identified side effects.

Edit May 11, 2016:

Answering Vani’s comment stating that the ODBC drivers are not available with the 12c client.  It is typically necessary to perform a custom install of the Oracle Client or Database for the ODBC drivers to be installed.  Because Visual Basic 6.0 is a 32 bit program, the 32 bit Oracle Client (or Database) software must be installed, even if the database itself is running on a 64 bit server in a 64 bit Oracle Database install.

ConnectOracleVB6OracleInstall1

If during the install “Oracle Windows Interfaces 12.1.0.2.0” is selected, you are then able to install the Oracle ODBC drivers – I suggest installing the other items also that are checked in the below picture.

ConnectOracleVB6OracleInstall2

On 64 bit Windows 7.0 and 8.0 you must use the 32 bit ODBC Administrator to add ODBC connections that may be used in Visual Basic 6.0.  You will find the 32 bit ODBC Administrator in the C:\Windows\SysWOW64 folder, as shown in the picture below.  It has been a couple of months since I last checked, but I think that with Windows 10 32 bit and 64 bit ODBC configuration is setup using just one tool that is available from the Windows Settings applet.

ConnectOracleVB6OracleInstall3

I have always set up System DSNs for ODBC connections to an Oracle database.  As shown in the below picture, there are two Oracle database 11.2.0.x ODBC connections configured on this computer.

ConnectOracleVB6OracleInstall4

 

 





oci-24550 [3221225477] [Unhandled exception: Code=c0000005 Flags=0

22 10 2012

October 22, 2012

Lately, I have been busy with a couple of items that are not completely Oracle Database specific.  I tried experimenting with high definition 1080p security cameras.  After wasting an unacceptable amount of money to purchase two Y-Cam Wireless High Definition 1080p video cameras (model YCBLHD6), and spending in excess of 38 hours fighting with the cameras to make them work as advertised, I wrote a video review of the product.  After writing a review, Y-Cam’s tech support representative contacted me again, so I pulled the cameras back out of the box and wasted another five hours trying to make the cameras work wirelessly – a feature that was lost after installing the third firmware version on the cameras.  The cameras magically connected wirelessly very early the next morning – but wireless communiation was only one of several problems that I was experiencing with this model of security camera.  What might cause a procedure to work magically when that same procedure failed several times in the past – and might this somehow have an odd relation to Oracle Database?  This is the explanation that I provided to Y-Cam’s tech support representative and posted as a follow up on Y-Cam’s support forum:

The camera’s Ethernet port and wireless adapter both have the same hardcoded MAC address. This MAC address is used by network switches to determine which devices are attached to each of the switch ports so that network traffic addressed to a specific MAC address is only sent down the switch port where that MAC address is known to be connected. By contrast a network hub broadcasts network traffic to all ports, rather than just the port to which a hardware device’s MAC address is attached; a hub is less likely to have an issue with two different network adapters having the same MAC address (Y-CAM support reported that they confirmed that the camera with the latest firmware worked with a non-broadcasted WPA2 protected wireless network). The network switch maintains a MAC address table for each switch port – when the camera’s Ethernet port is disconnected and the camera is power cycled, the switch’s MAC address tables may not be immediately updated to remove the camera’s MAC address from the wired Ethernet port so that the MAC address may be added to the switch port used for wireless connectivity. Power cycling the network switch clears the switch’s MAC address tables, allowing the camera’s MAC address to be added to the switch port used for wireless connectivity.

In short, it is necessary to power cycle the switch (or the Cisco Linksys E2000 router acting as only a wireless access point with integrated 4 port gigabit switch in my case) when moving the camera from a wired Ethernet connection to a wireless connection.

Makes a bit of sense… now if only motion detection worked as advertised and stability improved to the point that the camera does not require two or three reboots per day.  Thankfully, most other consumer products tend to work out of the box; imagine the ramifications if every time the weather conditions included rain, a vehicle would sputter, shake violently, and reduce power to 10% of normal when the driver attempted to accelerate.

Changing gears a bit.  You might have noticed that the title of this article appears to be some sort of error message, probably produced by an Oracle product.  I am in the process of writing a program that interfaces with an Oracle Database, rapidly retrieving a variety of information from various database tables – using bind variables in the SQL statements, of course:

The problem?  My program seems to crash randomly on computers running either the 32 bit or 64 bit version of Windows 7 (the program works without issue on computers running the 32 bit version of Windows XP).  The randomness of the crashes makes it a bit difficult to troubleshoot this particular problem, but the vast majority of the crashes are associated with populating the treeview control.  Is the crash caused by a typo (bug) in my program logic?  Is the crash caused by the Windows 7 version of the treeview control?  Is the crash caused by Oracle Database 11.2.0.2, as a result of receiving several different SQL statements in rapid-fire fashion?  Could there be another explanation?

To start the investigation, I first enabled a 10046 trace at level 12.  I then entered an employee ID and started selecting different date ranges until the program crashed.  The crash message varied (as did the point at which the crash happened), and while Windows offered to contact the developer to report the error, I declined the offer.  Unfortunately, I did not capture any of the exact error messages, although I did open one of the debug files created during a crash – “Unhandled exception at 0x0fc01b3d”:

mov cx, word ptr [edi] – Thanks Windows, I think that I will go back to the safety of the 10046 trace files (my best guess is that this instruction is attempting to copy a value from a variable memory pointer).

Digging into the 10046 trace file, I found that this was the last SQL statement executed by my program before the crash:

CLOSE #1301509800:c=0,e=10,dep=0,type=1,tim=4032143047926
=====================
PARSING IN CURSOR #1301508512 len=1452 dep=0 uid=172 oct=3 lid=172 tim=4032143048002 hv=929599856 ad='3eb8207a0' sqlid='4y6m7dsvqj4bh'
SELECT 
  EMPLOYEE_ID, 
  SHIFT_DATE, 
  CLOCK_IN, 
  CLOCK_OUT, 
  HOURS_WORKED, 
  HOURS_VACATION, 
  HOURS_HOLIDAY, 
  HOURS_BEREAVEMENT, 
  HOURS_JURY_DUTY, 
  DECODE(SHIFT_DATE_CODE,'VAC',1,0) VACATION, 
  DECODE(SHIFT_DATE_CODE,'HOL',1,0) HOLIDAY, 
  DECODE(SHIFT_DATE_CODE,'BER',1,0) BEREAVEMENT, 
  DECODE(SHIFT_DATE_CODE,'JUR',1,0) JURY, 
  DECODE(SHIFT_DATE_CODE,'ABS',1,0) ABSENT, 
  DECODE(SHIFT_DATE_CODE,'EXC',1,0) EXCUSED, 
  SUM(DECODE(SHIFT_DATE_CODE,'VAC',1,0)) OVER (PARTITION BY EMPLOYEE_ID) VACATION_COUNT, 
  SUM(HOURS_VACATION) OVER (PARTITION BY EMPLOYEE_ID) VACATION_HOURS, 
  SUM(DECODE(SHIFT_DATE_CODE,'HOL',1,0)) OVER (PARTITION BY EMPLOYEE_ID) HOLIDAY_COUNT, 
  SUM(HOURS_HOLIDAY) OVER (PARTITION BY EMPLOYEE_ID) HOLIDAY_HOURS, 
  SUM(DECODE(SHIFT_DATE_CODE,'BER',1,0)) OVER (PARTITION BY EMPLOYEE_ID) BEREAVEMENT_COUNT, 
  SUM(HOURS_BEREAVEMENT) OVER (PARTITION BY EMPLOYEE_ID) BEREAVEMENT_HOURS, 
  SUM(DECODE(SHIFT_DATE_CODE,'JUR',1,0)) OVER (PARTITION BY EMPLOYEE_ID) JURY_COUNT, 
  SUM(HOURS_JURY_DUTY) OVER (PARTITION BY EMPLOYEE_ID) JURY_HOURS, 
  SUM(DECODE(SHIFT_DATE_CODE,'ABS',1,0)) OVER (PARTITION BY EMPLOYEE_ID) ABSENT_COUNT, 
  SUM(DECODE(SHIFT_DATE_CODE,'EXC',1,0)) OVER (PARTITION BY EMPLOYEE_ID) EXCUSED_COUNT 
FROM 
  PAY_HOURS
WHERE 
  EMPLOYEE_ID= :1 
  AND SHIFT_DATE BETWEEN :2 AND :3 
  AND COALESCE(SHIFT_DATE_CODE,'ZZZ')<>'NOL' 
ORDER BY 
  COALESCE(SHIFT_DATE_CODE,'ZZZ'), 
  SHIFT_DATE
END OF STMT
PARSE #1301508512:c=0,e=42,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=2997266572,tim=4032143048001
BINDS #1301508512:
 Bind#0
  oacdty=96 mxl=32(06) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=1000010 frm=01 csi=178 siz=48 off=0
  kxsbbbfp=4d936d40  bln=32  avl=06  flg=05
  value="DUDE"
 Bind#1
  oacdty=12 mxl=07(07) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=1000010 frm=00 csi=178 siz=0 off=32
  kxsbbbfp=4d936d60  bln=07  avl=07  flg=01
  value="10/19/2012 0:0:0"
 Bind#2
  oacdty=12 mxl=07(07) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=1000010 frm=00 csi=178 siz=0 off=40
  kxsbbbfp=4d936d68  bln=07  avl=07  flg=01
  value="10/19/2012 0:0:0"
EXEC #1301508512:c=0,e=129,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=2997266572,tim=4032143048323
WAIT #1301508512: nam='SQL*Net message to client' ela= 2 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=4032143048369
FETCH #1301508512:c=0,e=33,p=0,cr=3,cu=0,mis=0,r=0,dep=0,og=1,plh=2997266572,tim=4032143048448
STAT #1301508512 id=1 cnt=0 pid=0 pos=1 obj=0 op='SORT ORDER BY (cr=3 pr=0 pw=0 time=43 us cost=6 size=50 card=1)'
STAT #1301508512 id=2 cnt=0 pid=1 pos=1 obj=0 op='WINDOW BUFFER (cr=3 pr=0 pw=0 time=34 us cost=6 size=50 card=1)'
STAT #1301508512 id=3 cnt=0 pid=2 pos=1 obj=0 op='FILTER  (cr=3 pr=0 pw=0 time=21 us)'
STAT #1301508512 id=4 cnt=0 pid=3 pos=1 obj=70154 op='TABLE ACCESS BY INDEX ROWID PAY_HOURS (cr=3 pr=0 pw=0 time=20 us cost=4 size=50 card=1)'
STAT #1301508512 id=5 cnt=0 pid=4 pos=1 obj=71786 op='INDEX RANGE SCAN SYS_C0022057 (cr=3 pr=0 pw=0 time=20 us cost=3 size=0 card=1)'
WAIT #1301508512: nam='SQL*Net message from client' ela= 1448 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=4032143050020
PARSE #1301509800:c=0,e=20,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=266456858,tim=4032143050081
WAIT #1301509800: nam='SQL*Net message to client' ela= 2 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=4032143050142

*** 2012-10-19 09:43:33.522
WAIT #1301509800: nam='SQL*Net message from client' ela= 5761077 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=4032148811261  

Great, I will take a closer look at the part of my program that executes that SQL statement.  But wait, someone famous once said: “Insanity is repeating the same mistakes and expecting different results.”  Let’s run my program again with a 10046 trace at level 12 enabled.  This time, I found that the following was the last SQL statement executed by my program before the crash:

PARSING IN CURSOR #1078422680 len=1257 dep=0 uid=172 oct=3 lid=172 tim=4031964081030 hv=4064672182 ad='3a68828c8' sqlid='4cj96t7t4bydq'
SELECT 
  LT.TRANSACTION_ID, 
  WO.PART_ID, 
  LT.WORKORDER_BASE_ID, 
  LT.WORKORDER_LOT_ID, 
  LT.WORKORDER_SUB_ID, 
  LT.OPERATION_SEQ_NO, 
  LT.TYPE, 
  LT.INDIRECT_ID, 
  LT.RESOURCE_ID, 
  LT.CLOCK_IN, 
  NVL2(LT.HOURS_WORKED,LT.CLOCK_OUT,NULL) CLOCK_OUT, 
  NVL(LT.HOURS_WORKED,ROUND((SYSDATE-LT.CLOCK_IN)*24,2)) HOURS_WORKED, 
  LT.GOOD_QTY, 
  COUNT(*) OVER (PARTITION BY LT.EMPLOYEE_ID) LABOR_TICKETS, 
  SUM(NVL(LT.HOURS_WORKED,ROUND((SYSDATE-LT.CLOCK_IN)*24,2))) OVER (PARTITION BY LT.EMPLOYEE_ID) TOTAL_HOURS, 
  SUM(DECODE(LT.TYPE,'I',NVL(LT.HOURS_WORKED,ROUND((SYSDATE-LT.CLOCK_IN)*24,2)),0)) OVER (PARTITION BY LT.EMPLOYEE_ID) TOTAL_INDIRECT_HOURS, 
  SUM(DECODE(LT.TYPE,'S',NVL(LT.HOURS_WORKED,ROUND((SYSDATE-LT.CLOCK_IN)*24,2)),0)) OVER (PARTITION BY LT.EMPLOYEE_ID) TOTAL_SETUP_HOURS, 
  SUM(DECODE(LT.TYPE,'R',NVL(LT.HOURS_WORKED,ROUND((SYSDATE-LT.CLOCK_IN)*24,2)),0)) OVER (PARTITION BY LT.EMPLOYEE_ID) TOTAL_RUN_HOURS 
FROM 
  LABOR_TICKET LT, 
  WORK_ORDER WO 
WHERE 
  LT.EMPLOYEE_ID= :1 
  AND LT.SHIFT_DATE BETWEEN :2 AND :3 
  AND WO.TYPE(+)='W' 
  AND WO.SUB_ID(+)='0' 
  AND LT.WORKORDER_TYPE=WO.TYPE(+) 
  AND LT.WORKORDER_BASE_ID=WO.BASE_ID(+) 
  AND LT.WORKORDER_LOT_ID=WO.LOT_ID(+) 
  AND LT.WORKORDER_SPLIT_ID=WO.SPLIT_ID(+)
END OF STMT
PARSE #1078422680:c=0,e=51,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3199062715,tim=4031964081030
BINDS #1078422680:
 Bind#0
  oacdty=96 mxl=32(06) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=1000010 frm=01 csi=178 siz=48 off=0
  kxsbbbfp=40476838  bln=32  avl=06  flg=05
  value="DUDE"
 Bind#1
  oacdty=12 mxl=07(07) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=1000010 frm=00 csi=178 siz=0 off=32
  kxsbbbfp=40476858  bln=07  avl=07  flg=01
  value="10/19/2012 0:0:0"
 Bind#2
  oacdty=12 mxl=07(07) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=1000010 frm=00 csi=178 siz=0 off=40
  kxsbbbfp=40476860  bln=07  avl=07  flg=01
  value="10/19/2012 0:0:0"
EXEC #1078422680:c=0,e=188,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=4162394183,tim=4031964081369
WAIT #1078422680: nam='SQL*Net message to client' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=4031964081423
FETCH #1078422680:c=0,e=79,p=0,cr=4,cu=0,mis=0,r=1,dep=0,og=1,plh=4162394183,tim=4031964081539
STAT #1078422680 id=1 cnt=1 pid=0 pos=1 obj=0 op='WINDOW BUFFER (cr=4 pr=0 pw=0 time=81 us cost=9 size=101 card=1)'
STAT #1078422680 id=2 cnt=1 pid=1 pos=1 obj=0 op='FILTER  (cr=4 pr=0 pw=0 time=35 us)'
STAT #1078422680 id=3 cnt=1 pid=2 pos=1 obj=0 op='NESTED LOOPS OUTER (cr=4 pr=0 pw=0 time=33 us cost=8 size=101 card=1)'
STAT #1078422680 id=4 cnt=1 pid=3 pos=1 obj=69863 op='TABLE ACCESS BY INDEX ROWID LABOR_TICKET (cr=4 pr=0 pw=0 time=28 us cost=7 size=73 card=1)'
STAT #1078422680 id=5 cnt=1 pid=4 pos=1 obj=70966 op='INDEX RANGE SCAN IND_LT_SHIFT_DATE (cr=3 pr=0 pw=0 time=21 us cost=6 size=0 card=1)'
STAT #1078422680 id=6 cnt=0 pid=3 pos=2 obj=70072 op='TABLE ACCESS BY INDEX ROWID WORK_ORDER (cr=0 pr=0 pw=0 time=3 us cost=1 size=28 card=1)'
STAT #1078422680 id=7 cnt=0 pid=6 pos=1 obj=71722 op='INDEX UNIQUE SCAN SYS_C0021989 (cr=0 pr=0 pw=0 time=2 us cost=0 size=0 card=1)'

*** 2012-10-19 09:40:38.424
WAIT #1078422680: nam='SQL*Net message from client' ela= 9641490 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=4031973723245 

A different SQL statement was found in the trace file when my program crashed… do the SQL statements have anything in common?  Both SQL statements are using analytic functions, and both are using date-type bind variables.  Interesting… what was that saying about insanity?  Let’s try another test of my program with an enabled 10046 trace to see the last SQL statement found in the trace before the crash:

PARSING IN CURSOR #881825936 len=210 dep=1 uid=0 oct=3 lid=0 tim=4030801312382 hv=864012087 ad='3eddb9a08' sqlid='96g93hntrzjtr'
select /*+ rule */ bucket_cnt, row_cnt, cache_cnt, null_cnt, timestamp#, sample_size, minimum, maximum, distcnt, lowval, hival, density, col#, spare1, spare2, avgcln from hist_head$ where obj#=:1 and intcol#=:2
END OF STMT
PARSE #881825936:c=0,e=46,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=3,plh=2239883476,tim=4030801312381
BINDS #881825936:
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=0001 frm=00 csi=00 siz=48 off=0
  kxsbbbfp=348f64a0  bln=22  avl=04  flg=05
  value=166995
 Bind#1
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=0001 frm=00 csi=00 siz=0 off=24
  kxsbbbfp=348f64b8  bln=22  avl=02  flg=01
  value=2
EXEC #881825936:c=0,e=95,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=3,plh=2239883476,tim=4030801312551
WAIT #881825936: nam='db file sequential read' ela= 6526 file#=1 block#=98252 blocks=1 obj#=427 tim=4030801319112
WAIT #881825936: nam='db file sequential read' ela= 759 file#=1 block#=100271 blocks=1 obj#=425 tim=4030801319923
FETCH #881825936:c=0,e=7396,p=2,cr=3,cu=0,mis=0,r=1,dep=1,og=3,plh=2239883476,tim=4030801319963
STAT #881825936 id=1 cnt=1 pid=0 pos=1 obj=425 op='TABLE ACCESS BY INDEX ROWID HIST_HEAD$ (cr=3 pr=2 pw=0 time=7394 us)'
STAT #881825936 id=2 cnt=1 pid=1 pos=1 obj=427 op='INDEX RANGE SCAN I_HH_OBJ#_INTCOL# (cr=2 pr=1 pw=0 time=6583 us)'
CLOSE #881825936:c=0,e=46,dep=1,type=3,tim=4030801320046
...
BINDS #881825936:
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=0001 frm=00 csi=00 siz=48 off=0
  kxsbbbfp=348f64a0  bln=22  avl=04  flg=05
  value=166995
 Bind#1
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=0001 frm=00 csi=00 siz=0 off=24
  kxsbbbfp=348f64b8  bln=22  avl=02  flg=01
  value=11
EXEC #881825936:c=0,e=90,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=3,plh=2239883476,tim=4030801321733
FETCH #881825936:c=0,e=19,p=0,cr=3,cu=0,mis=0,r=1,dep=1,og=3,plh=2239883476,tim=4030801321767
CLOSE #881825936:c=0,e=10,dep=1,type=3,tim=4030801321802
BINDS #881825936:
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=0001 frm=00 csi=00 siz=48 off=0
  kxsbbbfp=348f64a0  bln=22  avl=04  flg=05
  value=166995
 Bind#1
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=0001 frm=00 csi=00 siz=0 off=24
  kxsbbbfp=348f64b8  bln=22  avl=02  flg=01
  value=33
EXEC #881825936:c=0,e=152,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=3,plh=2239883476,tim=4030801321992
FETCH #881825936:c=0,e=19,p=0,cr=3,cu=0,mis=0,r=1,dep=1,og=3,plh=2239883476,tim=4030801322027
CLOSE #881825936:c=0,e=10,dep=1,type=3,tim=4030801322062
BINDS #881825936:
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=0001 frm=00 csi=00 siz=48 off=0
  kxsbbbfp=348f64a0  bln=22  avl=04  flg=05
  value=166995
 Bind#1
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=0001 frm=00 csi=00 siz=0 off=24
  kxsbbbfp=348f64b8  bln=22  avl=02  flg=01
  value=35
EXEC #881825936:c=0,e=89,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=3,plh=2239883476,tim=4030801322195
FETCH #881825936:c=0,e=18,p=0,cr=3,cu=0,mis=0,r=1,dep=1,og=3,plh=2239883476,tim=4030801322229
CLOSE #881825936:c=0,e=10,dep=1,type=3,tim=4030801322265
WAIT #881751720: nam='SQL*Net message to client' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=4030801323194

*** 2012-10-19 09:21:14.806
WAIT #881751720: nam='SQL*Net message from client' ela= 8851219 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=4030810174461 

You might have noticed that the trace file showed dep=1 in the PARSING IN CURSOR line, which indicates that the SQL statement was executed automatically outside of my program’s control – likely when attempting to optimize another SQL statement that was submitted by my program.

Let’s test that insanity theory again.  Another execution of my program with an enabled 10046 trace at level 12 showed this as the last SQL statement executed:

PARSING IN CURSOR #767713824 len=503 dep=0 uid=172 oct=3 lid=172 tim=4026166358219 hv=2534735629 ad='3b58041b0' sqlid='7x2t5dqbj9zsd'
SELECT 
  N.ID NCMT_ID, 
  N.NCMT_DATE, 
  N.PART_ID, 
  N.WORKORDER_BASE_ID, 
  N.WORKORDER_LOT_ID, 
  N.WORKORDER_PIECE_NO, 
  N.QTY, 
  N.DISCREPANCY_TYPE, 
  N.RESOURCE_ID, 
  N.CUSTOMER_ID, 
  NVL(N.TOTAL_COST,0) NCMT_COST, 
  COUNT(N.ID) OVER (PARTITION BY N.EMPLOYEE_ID) NCMT_COUNT, 
  SUM(NVL(N.TOTAL_COST,0)) OVER (PARTITION BY N.EMPLOYEE_ID) TOTAL_COST 
FROM 
  NON_CONFORMING N 
WHERE 
  N.EMPLOYEE_ID= :1 
  AND N.NCMT_DATE BETWEEN :2 AND :3 
  AND UPPER(N.ERROR_BY)='OPERATOR' 
ORDER BY 
  ID
END OF STMT
PARSE #767713824:c=0,e=41,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3960464466,tim=4026166358219
BINDS #767713824:
 Bind#0
  oacdty=96 mxl=32(06) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=1000010 frm=01 csi=178 siz=48 off=0
  kxsbbbfp=2dc25d58  bln=32  avl=06  flg=05
  value="DUDE"
 Bind#1
  oacdty=12 mxl=07(07) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=1000010 frm=00 csi=178 siz=0 off=32
  kxsbbbfp=2dc25d78  bln=07  avl=07  flg=01
  value="10/19/2012 0:0:0"
 Bind#2
  oacdty=12 mxl=07(07) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=1000010 frm=00 csi=178 siz=0 off=40
  kxsbbbfp=2dc25d80  bln=07  avl=07  flg=01
  value="10/19/2012 0:0:0"
...
PARSE #767713824:c=0,e=19,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3960464466,tim=4026237045440
BINDS #767713824:
 Bind#0
  oacdty=96 mxl=32(07) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=1000010 frm=01 csi=178 siz=48 off=0
  kxsbbbfp=2dc21f50  bln=32  avl=07  flg=05
  value="DUDE2"
 Bind#1
  oacdty=12 mxl=07(07) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=1000010 frm=00 csi=178 siz=0 off=32
  kxsbbbfp=2dc21f70  bln=07  avl=07  flg=01
  value="6/22/2012 0:0:0"
 Bind#2
  oacdty=12 mxl=07(07) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=1000010 frm=00 csi=178 siz=0 off=40
  kxsbbbfp=2dc21f78  bln=07  avl=07  flg=01
  value="10/19/2012 0:0:0"
EXEC #767713824:c=0,e=125,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3960464466,tim=4026237045593
WAIT #767713824: nam='SQL*Net message to client' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=71274 tim=4026237045634
FETCH #767713824:c=0,e=57,p=0,cr=10,cu=0,mis=0,r=0,dep=0,og=1,plh=3960464466,tim=4026237045727
WAIT #767713824: nam='SQL*Net message from client' ela= 3080 driver id=1413697536 #bytes=1 p3=0 obj#=71274 tim=4026237048857
CLOSE #767713824:c=0,e=7,dep=0,type=3,tim=4026237048899
PARSE #767713824:c=0,e=18,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3960464466,tim=4026237048951
WAIT #767713824: nam='SQL*Net message to client' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=71274 tim=4026237051177

*** 2012-10-19 08:06:40.151
WAIT #767713824: nam='SQL*Net message from client' ela= 98732762 driver id=1413697536 #bytes=1 p3=0 obj#=71274 tim=4026335783988 

Well, that SQL statement also used date-type bind variables and analytic functions… maybe there is a problem with Oracle Database 11.2.0.2?  But, why is this suddenly a problem… what has changed?  Nearly two years ago I wrote a blog article that described an ODBC update problem that was present in the Oracle 11.2.0.1 Client – the first Oracle Client version that could be easily downloaded and installed on Windows 7 (it is also possible to patch the downloadable 10.2.0.3/4 client to 10.2.0.5).  The Oracle 11.2.0.1 patch 7 (10155837 for 32 bit Windows clients) fixes the false ODBC lock violation errors that are reported in Microsoft Access (and various other programs) when UPDATE SQL statements are executed, and those SQL statements contain bind variables.  That same Oracle Client version and patch version are still installed on the computer (and all other computers here that run Windows 7), so that is not the source of the change.  On a side note, have you found it difficult to locate updated 11.2.0.2 or 11.2.0.3 client versions in MOS (Metalink)?

So, the 10046 trace failed to find a consistent source for the crashes, and the Oracle client has not been changed on the Windows 7 computer in nearly two years.  Program bug?  Analytic query bug?  Bind variable related bug?  I went digging for the sqlnet.log file to see if that file provided any clues.  This is what I found:

Fri Oct 19 13:26:52 2012
Directory does not exist for read/write [C:\Oracle\product\11.2.0\client_1\log] [C:\Oracle\product\11.2.0\client_1\log\diag\clients]
Errors in file c:\users\hooper\oracle\oradiag_hooper\diag\clients\user_hooper\host_1131635621_76\trace\ora_824_8864.trc  (incident=401):
oci-24550 [3221225477] [Unhandled exception: Code=c0000005 Flags=0
] [] [] [] [] [] [] [] [] [] []
Incident details in: c:\users\hooper\oracle\oradiag_hooper\diag\clients\user_hooper\host_1131635621_76\incident\incdir_401\ora_824_8864_i401.trc

That file’s contents proved to be a bit helpful, I think.  Let’s take a look at the contents of the file mentioned above:

Dump file c:\users\hooper\oracle\oradiag_hooper\diag\clients\user_hooper\host_1131635621_76\incident\incdir_401\ora_824_8864_i401.trc
Dump continued from file: c:\users\hooper\oracle\oradiag_hooper\diag\clients\user_hooper\host_1131635621_76\trace\ora_824_8864.trc
oci-24550 [3221225477] [Unhandled exception: Code=c0000005 Flags=0
] [] [] [] [] [] [] [] [] [] []
========= Dump for incident 401 (oci 24550 [3221225477]) ========
Tracing is in restricted mode!
----- Short Call Stack Trace -----
_dbgexPhaseII()+850<-_dbgexProcessError()+2061<-_dbgeExecuteForError()+43<-_dbgePostErrorDirect()+2368<-_kpeDbgSignalHandler()+225<-_skgesig_Win_UnhandledExceptionFilter()+140<-75D6003D<-7295FA2E<-0040AD0D<-77309EF0<-77309EC0<-00000000<-773079DD<-77306E72<-772FE0ED<-275C83DD<-275D219F<-275D1D88<-275D0E0E<-2759C677<-004CB035<-004DC4A3<-72991D31<-00410FD8<-7299202F----- START Event Driven Actions Dump ----
---- END Event Driven Actions Dump ----
----- START DDE Actions Dump -----
Executing SYNC actions
Executing ASYNC actions
----- END DDE Actions Dump (total 0 csec) -----
End of Incident Dump

Might the above indicate that my program logic is not the cause of the problem, but instead that it is the Oracle client that is crashing?  I started digging through various websites looking for a similar oci-24500 error message, but I found nothing that was helpful.  I searched through the bug database and technical article database on MOS (Metalink), and found nothing that was helpful.  By chance, I widened the search on MOS a bit and stumbled upon Mark Powell’s message in this Oracle Support Community thread (you need a MOS account to access the message).  That message pointed to two other MOS articles, at least one of which stated that the problem is (possibly – not exactly the same error message) corrected when the Oracle 11.2.0.3 Client is installed.  I struggled finding the 11.2.0.3 Client in MOS, and then found that it is located on disk 3 of 6 of the 11.2.0.3 server download (direct link – just specify operating system version and download disk 3 – p10404530_112030_platform_3of6.zip (requires a MOS account to download)).  The 11.2.0.3 client version magically corrected the seemingly random crashes in my programprogrammer error averted.

Insanity?  No more so than an expensive security camera that fails at motion detection, or a vehicle that decelerates when the driver commands the vehicle to accelerate during a light rain.  😉

P.S.: Sorry about the formatting of the blog article.  A WordPress update apparently disabled my custom CSS that expanded the article to the full width of the browser window – I am still in search of a fix for that problem.





Undo Quiz – Bringing the Right Tools for the Job

17 09 2012

September 17, 2012

Sometimes humor is a good recipe for undo.  A mid-50s man was driving down the road, riding a purple mule, when he saw a house with a new window.  Uninvited, the man decided to take a closer look using an obvious tool of choice, a tire iron (YPDNTI).  Elapsed time, roughly 60 seconds.  The man is obviously familiar with Oracle Database’s concept of transactions and issuing a ROLLBACK, but did not account for video recording in several autonomous transactions.  Even through a COMMIT was not issued, that does not imply that nothing happened in those 60 seconds. 

Riding a purple mule… strange.

On to the quiz.  Assume that there are three and only three users connected to an Oracle database (preferrably a test database of sorts).  The first session is connected as the SYS user, and the other two sessions are connected as a normal database user (the same user ID, but with access permissions for DBA_SOURCE).  Assume that session 1 creates a new undo tablespace that cannot grow beyond 1MB in size:

CREATE UNDO TABLESPACE SMALL_UNDO2 DATAFILE 'C:\Oracle\OraData\SAMPLE\small_undo2.dbf' SIZE 1M AUTOEXTEND OFF; 

And then that tablespace is set as the system default tablespace:

ALTER SYSTEM SET UNDO_TABLESPACE=SMALL_UNDO2;

In session 2, table T10 is created based on DBA_SOURCE:

CREATE TABLE T10 AS
SELECT
  *
FROM
  DBA_SOURCE;

In my case, I used the Oracle sample schema in Oracle Database 11.2.0.3 (tablespace for T10 is ASSM AUTOALLOCATE).  Let’s check the amount of space consumed (not necessarily all used for data strorage yet) for the T10 table:

SELECT
  ROUND(SUM(BYTES)/1024/1024,2) SIZE_MB
FROM
  USER_EXTENTS
WHERE
  SEGMENT_NAME='T10';

   SIZE_MB
----------
        88 

88MB – a nice even number.  With a 1MB maximum size for the undo tablespace, let’s see if the user is able to select all of the rows from the T10 table:

SET ARRAYSIZE 1000
SET AUTOTRACE TRACEONLY STATISTICS

SELECT
  *
FROM
  T10;

633054 rows selected.

Statistics
---------------------------------------------------
         11  recursive calls
          0  db block gets
      11352  consistent gets
      10635  physical reads
          0  redo size
   66576339  bytes sent via SQL*Net to client
       7322  bytes received via SQL*Net from client
        635  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
     633054  rows processed 

That worked, but why?  Why didn’t the session receive an error when performing the 11,352 consistent gets to provide a consistent view of the data?

Maybe on the second execution we will see an error?

SELECT
  *
FROM
  T10;

633054 rows selected.

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      11271  consistent gets
          0  physical reads
          0  redo size
   66576339  bytes sent via SQL*Net to client
       7322  bytes received via SQL*Net from client
        635  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
     633054  rows processed

—–

The first quiz question is simple: How it is possible for Oracle Database to provide a read consistent view of a table that requires approximately 88MB of storage space when the file supporting the undo tablespace cannot exceed 1MB?  Maybe someone will be kind enough to share the answer with the OP in this message thread:
dbaforums.org/oracle/index.php?s=439a689327b6fbd09ba0018b320041d8&showtopic=21972

The second quiz question is also simple: Why did the first select of the table require 81 more consistent gets than the second select from that table?

—–

Continuing with the test case, this time working in session 3.  Let’s delete some rows from the table created in session 2:

DELETE FROM
  T10
WHERE
  ROWNUM<=300000; 

Oracle Database responses with the following message:

SQL> DELETE FROM
  2    T10
  3  WHERE
  4    ROWNUM<=300000;
  T10
  *
ERROR at line 2:
ORA-30036: unable to extend segment by 8 in undo tablespace 'SMALL_UNDO2'

The third question is slightly more challenging: Why is Oracle attempting to extend a segment when the session is deleting rows; also, what does the number 8 signify?

Let’s try again with a smaller number of rows:

DELETE FROM
  T10
WHERE
  ROWNUM<=30;

30 rows deleted. 

That worked.  Interestingly, the first time I tried the test case, I named the tablespace SMALL_UNDO and specified “RETENTION GUARANTEE” when creating the undo tablespace, and the following message appeared when executing the above command:

SQL> DELETE FROM
  2    T10
  3  WHERE
  4    ROWNUM<=30;
  T10
  *
ERROR at line 2:
ORA-30036: unable to extend segment by 8 in undo tablespace 'SMALL_UNDO'

After the above message appeared, session 3 successfully deleted 3 rows, 30 rows, and then 300 rows from table T10 without receiving an error:

SQL> DELETE FROM
  2    T10
  3  WHERE
  4    ROWNUM<=3;

3 rows deleted.

SQL> DELETE FROM
  2    T10
  3  WHERE
  4    ROWNUM<=30;

30 rows deleted.

SQL> DELETE FROM
  2    T10
  3  WHERE
  4    ROWNUM<=300;

300 rows deleted. 

The fourth question is again slightly more challenging: why was the 30 row delete from table T10 successful when the SMALL_UNDO2 undo tablespace was the default undo tablespace, yet that delete failed when the SMALL_UNDO tablespace was set as the default tablespace – until 3 rows were first deleted from the table?

Going back to the forum thread, another interesting question arises.  The fifth question is again challenging: What type of compression is used in undo logs?

I was curious about the last question.  In session 1 I executed the following:

SELECT
  R.NAME,
  T.XIDUSN,
  T.XIDSLOT,
  T.XIDSQN
FROM
  V$TRANSACTION T,
  V$ROLLNAME R
WHERE
  T.XIDUSN=R.USN; 

NAME                               XIDUSN    XIDSLOT     XIDSQN
------------------------------ ---------- ---------- ----------
_SYSSMU15_897375467$                   15          1          8

ALTER SESSION SET TRACEFILE_IDENTIFIER='UNDO_BLOCKS';
ALTER SYSTEM DUMP UNDO BLOCK '_SYSSMU15_897375467$' XID 15 1 8;

A portion of the generated trace file follows:

********************************************************************************
Undo Segment:  _SYSSMU15_897375467$ (15)
xid: 0x000f.001.00000008
Low Blk   :   (0, 0) 
High Blk  :   (2, 7) 
Object Id :   ALL 
Layer     :   ALL 
Opcode    :   ALL 
Level     :   2 

********************************************************************************
UNDO BLK:  Extent: 1   Block: 1   dba (file#, block#): 6,0x00000071
xid: 0x000f.001.00000008  seq: 0x4   cnt: 0x2c  irb: 0xa   icl: 0x0   flg: 0x0000

 Rec Offset      Rec Offset      Rec Offset      Rec Offset      Rec Offset
---------------------------------------------------------------------------
0x01 0x1f24     0x02 0x1e64     0x03 0x1dbc     0x04 0x1d34     0x05 0x1c68     
0x06 0x1b9c     0x07 0x1ae4     0x08 0x1a5c     0x09 0x1998     0x0a 0x18d8     
0x0b 0x1830     0x0c 0x17a8     0x0d 0x16f0     0x0e 0x162c     0x0f 0x15a4     
0x10 0x14ec     0x11 0x1424     0x12 0x1378     0x13 0x12f0     0x14 0x1234     
0x15 0x1168     0x16 0x10b8     0x17 0x1030     0x18 0x0f74     0x19 0x0ecc     
0x1a 0x0e44     0x1b 0x0d88     0x1c 0x0cbc     0x1d 0x0be8     0x1e 0x0b44     
0x1f 0x0a88     0x20 0x09bc     0x21 0x08e8     0x22 0x0840     0x23 0x0784     
0x24 0x06b8     0x25 0x05e4     0x26 0x0540     0x27 0x0484     0x28 0x03b8     
0x29 0x02e4     0x2a 0x023c     0x2b 0x0180     0x2c 0x00b4     

*-----------------------------
* Rec #0xa  slt: 0x01  objn: 79928(0x00013838)  objd: 79928  tblspc: 4(0x00000004)
*       Layer:  11 (Row)   opc: 1   rci 0x09   
Undo type:  Regular undo   Last buffer split:  No 
Temp Object:  No 
Tablespace Undo:  No 
rdba: 0x00000000
*-----------------------------
KDO undo record:
KTB Redo 
op: 0x02  ver: 0x01  
compat bit: 4 (post-11) padding: 0
op: C  uba: 0x01800071.0004.09
KDO Op code: IRP row dependencies Disabled
  xtype: XA flags: 0x00000000  bdba: 0x01091a2e  hdba: 0x01091a2a
itli: 2  ispac: 0  maxfr: 4858
tabn: 0 slot: 16(0x10) size/delt: 89
fb: --H-FL-- lb: 0x0  cc: 5
null: -----
col  0: [ 3]  53 59 53
col  1: [ 8]  53 54 41 4e 44 41 52 44
col  2: [ 7]  50 41 43 4b 41 47 45
col  3: [ 3]  c2 04 1f
col  4: [60]
 20 20 20 20 70 72 61 67 6d 61 20 42 55 49 4c 54 49 4e 28 27 58 4f 52 27 2c
 38 2c 20 33 2c 20 39 29 3b 20 2d 2d 20 50 45 4d 53 5f 49 4e 54 45 47 45 52
 2c 20 49 4e 54 5f 58 4f 52 0a

*-----------------------------
* Rec #0x9  slt: 0x01  objn: 79928(0x00013838)  objd: 79928  tblspc: 4(0x00000004)
*       Layer:  11 (Row)   opc: 1   rci 0x08   
Undo type:  Regular undo   Last buffer split:  No 
Temp Object:  No 
Tablespace Undo:  No 
rdba: 0x00000000
*-----------------------------
KDO undo record:
KTB Redo 
op: 0x02  ver: 0x01  
compat bit: 4 (post-11) padding: 0
op: C  uba: 0x01800071.0004.08
KDO Op code: IRP row dependencies Disabled
  xtype: XA flags: 0x00000000  bdba: 0x01091a2e  hdba: 0x01091a2a
itli: 2  ispac: 0  maxfr: 4858
tabn: 0 slot: 15(0xf) size/delt: 90
fb: --H-FL-- lb: 0x0  cc: 5
null: -----
col  0: [ 3]  53 59 53
col  1: [ 8]  53 54 41 4e 44 41 52 44
col  2: [ 7]  50 41 43 4b 41 47 45
col  3: [ 3]  c2 04 1e
col  4: [61]
 20 20 66 75 6e 63 74 69 6f 6e 20 58 4f 52 20 28 4c 45 46 54 20 42 4f 4f 4c
 45 41 4e 2c 20 52 49 47 48 54 20 42 4f 4f 4c 45 41 4e 29 20 72 65 74 75 72
 6e 20 42 4f 4f 4c 45 41 4e 3b 0a

*-----------------------------
* Rec #0x8  slt: 0x01  objn: 79928(0x00013838)  objd: 79928  tblspc: 4(0x00000004)
*       Layer:  11 (Row)   opc: 1   rci 0x07   
Undo type:  Regular undo   Last buffer split:  No 
Temp Object:  No 
Tablespace Undo:  No 
rdba: 0x00000000
*-----------------------------
KDO undo record:
KTB Redo 
op: 0x02  ver: 0x01  
compat bit: 4 (post-11) padding: 0
op: C  uba: 0x01800071.0004.07
KDO Op code: IRP row dependencies Disabled
  xtype: XA flags: 0x00000000  bdba: 0x01091a2e  hdba: 0x01091a2a
itli: 2  ispac: 0  maxfr: 4858
tabn: 0 slot: 14(0xe) size/delt: 30
fb: --H-FL-- lb: 0x0  cc: 5
null: -----
col  0: [ 3]  53 59 53
col  1: [ 8]  53 54 41 4e 44 41 52 44
col  2: [ 7]  50 41 43 4b 41 47 45
col  3: [ 3]  c2 04 1d
col  4: [ 1]  0a

*-----------------------------
* Rec #0x7  slt: 0x01  objn: 79928(0x00013838)  objd: 79928  tblspc: 4(0x00000004)
*       Layer:  11 (Row)   opc: 1   rci 0x06   
Undo type:  Regular undo   Last buffer split:  No 
Temp Object:  No 
Tablespace Undo:  No 
rdba: 0x00000000
*-----------------------------
KDO undo record:
KTB Redo 
op: 0x02  ver: 0x01  
compat bit: 4 (post-11) padding: 0
op: C  uba: 0x01800071.0004.06
KDO Op code: IRP row dependencies Disabled
  xtype: XA flags: 0x00000000  bdba: 0x01091a2e  hdba: 0x01091a2a
itli: 2  ispac: 0  maxfr: 4858
tabn: 0 slot: 13(0xd) size/delt: 78
fb: --H-FL-- lb: 0x0  cc: 5
null: -----
col  0: [ 3]  53 59 53
col  1: [ 8]  53 54 41 4e 44 41 52 44
col  2: [ 7]  50 41 43 4b 41 47 45
col  3: [ 3]  c2 04 1c
col  4: [49]
 20 20 2d 2d 20 20 77 68 65 6e 20 74 68 65 79 20 6f 63 63 75 72 20 69 6e 20
 63 6f 6e 64 69 74 69 6f 6e 61 6c 20 73 74 61 74 65 6d 65 6e 74 73 2e 0a 

It would not surprise me too much if a guide existed to assist in decompressing the undo logs (whatever that term implies).





On the Topic of Programming 3

3 09 2012

September 3, 2012

(Back to the Previous Post in the Series)

In the previous article in this series, a question was posed regarding how many SQL statements would be required to generate the following report, which displays in Microsoft Excel:

A significant number of items regarding the report require explanation:

  • Employee IDs and employee names appear in column A (the left-most column).  There could be many employees, but only those employees reporting labor transactions for machining operations in production batches (work order lots) that completed during one of the weeks should be listed.
  • Part IDs, part descriptions, and operations (Sub n OP nn in the above example report) are listed in column B (the second column from the left).  The number of operations listed for an employee could range from one to possibly 50 (or more, depending on the date range of the report), but that list will only include machining type operations (not welding, blasting, painting, laser cutting, etc.).
  • Total hours are reported in column C.  The second output row for an employee shows the sum of the numbers directly below for the employee.  The numbers on the 3rd, 6th, 9th, etc. rows for the employee in column C are the sum of the hours the employee worked in the operation for all manufacturing batches that completed in the report time period.
  • The first row for each employee, in columns E though H in the above report (could very well extend beyond column H, depending on the number of weeks selected), are the Monday through Sunday dates for the week.
  • The second row for each employee, in columns E through H in the above report show the weighted average efficiency for the employee in that week.  The weighting is based on the percentage of hours the employee spent in the operations, for those operations in batches that completed during the week.  This number of hours is calculated, but not printed in the report.
  • The numbers on the 4th, 7th, 10th, etc. rows for each employee, in columns E through H, shows the average number of hours the operation required, for those batches that completed during the week.  If an employee did not work in the operation one week, but did another, then a blank should appear in the week that the employee did not work in the operation.  All employees that worked in the operation for the week will have the same average printed.  For example, Cary Grant and James Stewart both worked on WIDGET101 operation Sub 0 OP 30.  The average hours for the first week is 10.50, and that value appears for both employees.  During the second week, only James Stewart worked on that operation, and the average hours for that week decreased to 7.56.
  • The numbers of on the 3rd, 6th, 9th, etc. rows for each employee, shows the ratio of the engineering standard hours divided by the average hours for the operation in that week.  The same values will appear for all employees reporting time to the operation in the week.
  • The percentage in column I is the simple average of the weighted averages to the left.  There may be causes where a blank will appear for one of the weighted averages for an employee, so it is not possible to simply sum the weighted averages and divide by the number of weeks.

This appeared to be a simple report when it was just a sketch on a sheet of paper!

How many SQL statement executions will be necessary to generate the above report?  Before answering that question, let’s take another look at how the data is organized and related between the tables:

OK, so the above diagram might be a little confusing.

The engineering standard hours are found by querying the OPERATION table with the WORKORDER_TYPE=’W’ and the WORKORDER_BASE_ID equal to the part ID that is of interest (the WORK_ORDER table also contains engineering standard information, accessed by searching for TYPE=’M’ and the BASE_ID equal to the part ID tht is of interest).  In theory, there could be multiple engineering standards for the same part ID and operation, which is why we might need to join to the PART table to make certain that the current engineering standard is retrieved, but we will ignore that potential issue for now.  It would be a good idea to also check the WORK_ORDER table to obtain a distinct list of part IDs that had a close date in the time range that is of interest (there is no sense in forcing the database RDBMS to retrieve the engineering standards for parts that have not been produced in the last 17 years), so we will need to specify that the TYPE column in the WORK_ORDER table is equal to W and that the CLOSE_DATE is in the date range of the report.  Many operations are NOT machining operations, and we can only determine the type of operation by determining the type machine that is selected for use, so we must also query the SHOP_RESOURCE table to determine if the operation is processed at a machining center.

The actual production hours are found by querying the LABOR_TICKET table with the the WORKORDER_TYPE=’W’ (to eliminate spurious indirect labor) and joined to the SHOP_RESOURCE table to make certain that the operation is a machining type operation (as explained above).  To determine the employees’ name, the LABOR_TICKET table is joined to the EMPLOYEE table.  To determine the part ID that is produced in the labor ticket transaction, the LABOR_TICKET table must be joined to the WORK_ORDER table, with the TYPE column in the WORK_ORDER table set to ‘W’ and the SUB_ID column in the WORK_ORDER table set to ‘0’ so that the main part ID for the work order is returned rather than a subordinate part ID.  To retrieve the description of the part ID that is produced, the WORK_ORDER table must be joined to the PART table.

That sketch of a report on a sheet of paper sure looked simple, but the degree of difficulty is all relative (or relational).

How many query executions will be required to produce the report that will appear in Microsoft Excel?  Should we retrieve the entire database over the network and process the data client-side?  I don’t program in the Java programming language, but I am guessing that the majority of Java programmers would not attempt to retrieve the entire database.   Would the number of SQL statement executions depend on the number of employees?  Would the number of SQL statement executions depend on the number of different part operations whose batches closed in a particular week?  Would the number of SQL statement executions depend on the number weeks included in the report?

In the words of the TV show “Name that Tune“, I can name that tune in 1 note.  Because there are many to many joins between the various table, it is not possible to construct a single simple SQL statement that retrieves the correct result.  However, it is possible to construct a single complex query using inline views or subquery factoring (WITH blocks), along with analytic functions to retrieve the correct, necessary information to the client for presentation in Microsoft Excel.  The correct answer to the above question must:

  • Minimize the amount of data flowing across the network.  SQL*Net compression may help, but of course that can only possibly help if more than one row is retrieved at a time.  Ideally, the array fetch size should be set to a large value (possibly 100 up to the maximum of 5,000) to make the best use of SQL*Net compression.
  • Minimize as best as possible the performance impact on the database server, the database server is a shared resource without unlimited capacity.  Setting a reasonably large array fetch size may help reduce the number of memory accesses (specifically consistent gets) on the database server.  Make certain that available indexes are usable by the query – be careful about using functions (such as TRUNC) on date type columns in the WHERE clause.
  • When a query accesses many tables, and also when there are multiple inline views (or factored subqueries), the Oracle optimizer may not always find the optimal join order.  The _OPTIMIZER_MAX_PERMUTATIONS hidden parameter defaults to a value of 2,000, which of course restricts the number of possible join permutations attempted per query block in a query, so with many tables in a query block (more than 6, for instance), the limit for the maximum number of permutations may easily be hit.  Statistics must be reasonably accurate for cardinality estimates to be close to accurate, otherwise the query optimizer may select not only an inappropriate join order but also an inappropriate join type.  Histograms may help (or harm) the query optimizer, indexes may help (or harm), establishing default cardinality and cost for custom PL/SQL functions may help, and adding hints to help control the cardinality estimates or join orders may also help.
  • Lastly, and most importantly, the queries must provide correct information.

I struggled with this problem a bit.  The report is intended to measure efficiency, so it would be somewhat laughable if the report performed inefficiently.  How could I minimize execution time for this particular report?  If Oracle Database is licensed by CPU core (or CPU socket in the case of the Standard Edition), does it make sense to perform all of the computations on the database server, or does it make sense to off-load some of that processing to the client’s dual, quad, or eight core CPU?  If the report were rendered by a shared resource, such as a web server, application server, or Citrix server, would I want that same CPU offloading as I would have planned with a dedicated client computer?  The report shows the information in a (modified) pivot table format, should I use the PIVOT function that was introduced with Oracle Database 11.1; or maybe I should use client-side control break logic that was taught in one of my university-level structured programming classes years ago?

That sketch of a report on a sheet of paper sure looked simple, now I need a control break (must be simple one key solution for that problem – check your keyboard 🙂 ).

It is possible to group the manufacturing lots by the Monday of the work week of the completion date (CLOSE_DATE) with the following function call:

NEXT_DAY(TRUNC(CLOSE_DATE),'MONDAY')-7)

Looks fairly simple.  If I had a date range of four weeks in mind, I might be inclined to do something like this in the WHERE clause of the query:

CLOSE_DATE BETWEEN (NEXT_DAY(TO_DATE('20-AUG-2012','DD-MON-YYYY'),'MONDAY')-28) AND (NEXT_DAY(TO_DATE('20-AUG-2012','DD-MON-YYYY'),'MONDAY')-1)

However, I did not do that.  Why not?  One reason is that the Oracle query optimizer would likely have a hard time determining the approximate number of rows that would be returned from the table with the CLOSE_DATE column.  The NEXT_DAY(TO_DATE()) combination hides the actual date range of interest, so the optimizer falls back to using default cardinality percentages for that predicate.  Secondly, what happens when the user of the program is interested in something other than four weeks – the solution may be cumbersome to maintain.  Thirdly, the above did not make use of bind variables, so every time the date range is changed, the query optimizer will perform a hard parse of the query.  A better approach calculates the date range on the client-side and submits the WHERE clause like this:

CLOSE_DATE BETWEEN TO_DATE('30-JUL-2012','DD-MON-YYYY') AND TO_DATE('26-AUG-2012','DD-MON-YYYY')

Or, better yet using bind variables:

CLOSE_DATE BETWEEN :START_DATE AND :END_DATE

So, how many query executions will this report require?  Just one query, executed a single time.  The particular programming environment that I used requires ? to appear in bind variable positions, so when you see a ? just read it as either :START_DATE or :END_DATE.

SELECT
  L.EMPLOYEE_ID,
  E.LAST_NAME||', '||E.FIRST_NAME EMPLOYEE_NAME,
  L.PART_ID,
  L.WORKORDER_SUB_ID,
  L.OPERATION_SEQ_NO,
  L.DESCRIPTION,
  L.CLOSE_WEEK,
  L.PART_EFF_WEEK,
  L.AVG_HRS_PC,
  L.EMP_ACCUM_EFF_WEEK,
  SUM(L.HOURS_WORKED) OVER (PARTITION BY L.EMPLOYEE_ID) EMP_HOURS_WORKED,
  SUM(L.HOURS_WORKED) OVER (PARTITION BY L.EMPLOYEE_ID,L.PART_ID, L.WORKORDER_SUB_ID,L.OPERATION_SEQ_NO) EMP_PART_HOURS_WORKED,
  SUM(L.EMP_ACCUM_EFF_WEEK) OVER (PARTITION BY L.EMPLOYEE_ID,L.CLOSE_WEEK) EMP_EFF_WEEK
FROM
  (SELECT
    O.PART_ID,
    O.DESCRIPTION,
    O.WORKORDER_SUB_ID,
    O.OPERATION_SEQ_NO,
    O.CLOSE_WEEK,
    O.RECEIVED_QTY,
    O.ACT_RUN_HRS,
    L.EMPLOYEE_ID,
    L.HOURS_WORKED,
    ROUND(O.ACT_RUN_HRS/O.RECEIVED_QTY,2) AVG_HRS_PC,
    ENG.ENG_HRS,
    ROUND(DECODE((O.ACT_RUN_HRS/O.RECEIVED_QTY),0,1,ENG.ENG_HRS/(O.ACT_RUN_HRS/O.RECEIVED_QTY)),4) PART_EFF_WEEK,
    SUM(L.HOURS_WORKED) OVER (PARTITION BY L.EMPLOYEE_ID,L.CLOSE_WEEK) EMP_HRS_WEEK,
    ROUND(L.HOURS_WORKED/(SUM(L.HOURS_WORKED) OVER (PARTITION BY L.EMPLOYEE_ID,L.CLOSE_WEEK)),4) EMP_PERCENT_WEEK,
    ROUND((L.HOURS_WORKED/(SUM(L.HOURS_WORKED) OVER (PARTITION BY L.EMPLOYEE_ID,L.CLOSE_WEEK))) * DECODE((O.ACT_RUN_HRS/O.RECEIVED_QTY),0,1,ENG.ENG_HRS/(O.ACT_RUN_HRS/O.RECEIVED_QTY)),4) EMP_ACCUM_EFF_WEEK
  FROM
    (SELECT
      PART_ID,
      DESCRIPTION,
      WORKORDER_SUB_ID,
      OPERATION_SEQ_NO,
      NEXT_DAY(TRUNC(CLOSE_DATE),'MONDAY')-7 CLOSE_WEEK,
      SUM(RECEIVED_QTY) RECEIVED_QTY,
      SUM(ACT_RUN_HRS) ACT_RUN_HRS
    FROM
      (SELECT
        WO.TYPE,
        WO.BASE_ID,
        WO.LOT_ID,
        WO.SPLIT_ID,
        WO.RECEIVED_QTY,
        WO.CLOSE_DATE,
        SUM(O.ACT_SETUP_HRS+O.ACT_RUN_HRS) ACT_RUN_HRS,
        WO.PART_ID,
        O.WORKORDER_SUB_ID,
        O.SEQUENCE_NO OPERATION_SEQ_NO,
        P.DESCRIPTION
      FROM
        WORK_ORDER WO,
        PART P,
        OPERATION O,
        SHOP_RESOURCE SR
      WHERE
        WO.TYPE='W'
        AND WO.SUB_ID='0'
        AND WO.CLOSE_DATE BETWEEN ? AND ?
        AND WO.RECEIVED_QTY>0
        AND WO.TYPE=O.WORKORDER_TYPE
        AND WO.BASE_ID=O.WORKORDER_BASE_ID
        AND WO.LOT_ID=O.WORKORDER_LOT_ID
        AND WO.SPLIT_ID=O.WORKORDER_SPLIT_ID
        AND WO.PART_ID=P.ID
        AND O.RESOURCE_ID=SR.ID
        AND SR.BUILDING_ID='Machine'
        AND SR.LOGICAL_GROUP IN ('10 TURNING CENTERS','20 VERTICAL M.C.','30 630MM PALLET HMC','40 800MM-1000MM PALLET HMC','50 1000MM PALLET HMC','55 LARGE 2-TABLE HMC/HBM','60 SINGLE TABLE HBM','60 SINGLE TABLE HDM','70 LARGE FLOOR MACHINES')
      GROUP BY
        WO.TYPE,
        WO.BASE_ID,
        WO.LOT_ID,
        WO.SPLIT_ID,
        WO.CLOSE_DATE,
        WO.RECEIVED_QTY,
        WO.PART_ID,
        O.WORKORDER_SUB_ID,
        O.SEQUENCE_NO,
        P.DESCRIPTION)
    GROUP BY
      PART_ID,
      DESCRIPTION,
      WORKORDER_SUB_ID,
      OPERATION_SEQ_NO,
      NEXT_DAY(TRUNC(CLOSE_DATE),'MONDAY')-7) O,
    (SELECT /*+ LEADING(WO LT) */
      WO.PART_ID,
      LT.EMPLOYEE_ID,
      LT.WORKORDER_SUB_ID,
      LT.OPERATION_SEQ_NO,
      NEXT_DAY(TRUNC(CLOSE_DATE),'MONDAY')-7 CLOSE_WEEK,
      SUM(LT.HOURS_WORKED) HOURS_WORKED
    FROM
      (SELECT
        WO.TYPE,
        WO.BASE_ID,
        WO.LOT_ID,
        WO.SPLIT_ID,
        WO.CLOSE_DATE,
        WO.PART_ID
      FROM
        WORK_ORDER WO,
        OPERATION O,
        SHOP_RESOURCE SR
      WHERE
        WO.TYPE='W'
        AND WO.SUB_ID='0'
        AND WO.CLOSE_DATE BETWEEN ? AND ?
        AND WO.RECEIVED_QTY>0
        AND WO.TYPE=O.WORKORDER_TYPE
        AND WO.BASE_ID=O.WORKORDER_BASE_ID
        AND WO.LOT_ID=O.WORKORDER_LOT_ID
        AND WO.SPLIT_ID=O.WORKORDER_SPLIT_ID
        AND O.RESOURCE_ID=SR.ID
        AND SR.BUILDING_ID='Machine'
        AND SR.LOGICAL_GROUP IN ('10 TURNING CENTERS','20 VERTICAL M.C.','30 630MM PALLET HMC','40 800MM-1000MM PALLET HMC','50 1000MM PALLET HMC','55 LARGE 2-TABLE HMC/HBM','60 SINGLE TABLE HBM','60 SINGLE TABLE HDM','70 LARGE FLOOR MACHINES')
      GROUP BY
        WO.TYPE,
        WO.BASE_ID,
        WO.LOT_ID,
        WO.SPLIT_ID,
        WO.CLOSE_DATE,
        WO.PART_ID) WO,
      LABOR_TICKET LT,
      SHOP_RESOURCE SR
    WHERE
      WO.TYPE=LT.WORKORDER_TYPE
      AND WO.BASE_ID=LT.WORKORDER_BASE_ID
      AND WO.LOT_ID=LT.WORKORDER_LOT_ID
      AND WO.SPLIT_ID=LT.WORKORDER_SPLIT_ID
      AND LT.TYPE IN ('R','S')
      AND LT.HOURS_WORKED<>0
      AND SR.LOGICAL_GROUP IN ('10 TURNING CENTERS','20 VERTICAL M.C.','30 630MM PALLET HMC','40 800MM-1000MM PALLET HMC','50 1000MM PALLET HMC','55 LARGE 2-TABLE HMC/HBM','60 SINGLE TABLE HBM','60 SINGLE TABLE HDM','70 LARGE FLOOR MACHINES')
      AND LT.RESOURCE_ID=SR.ID
    GROUP BY
      WO.PART_ID,
      LT.WORKORDER_SUB_ID,
      LT.OPERATION_SEQ_NO,
      LT.EMPLOYEE_ID,
      NEXT_DAY(TRUNC(CLOSE_DATE),'MONDAY')-7) L,
    (SELECT /*+ LEADING(WO) */
      WO2.TYPE,
      WO2.BASE_ID,
      O.WORKORDER_SUB_ID,
      O.SEQUENCE_NO OPERATION_SEQ_NO,
      SUM(O.SETUP_HRS+O.RUN_HRS) ENG_HRS,
      WO.PART_ID
    FROM
      (SELECT
        WO.PART_ID
      FROM
        WORK_ORDER WO
      WHERE
        WO.TYPE='W'
        AND WO.SUB_ID='0'
        AND WO.CLOSE_DATE BETWEEN ? AND ?
        AND WO.RECEIVED_QTY>0
      GROUP BY
        WO.PART_ID) WO,
      WORK_ORDER WO2,
      OPERATION O,
      SHOP_RESOURCE SR
    WHERE
      WO2.TYPE='M'
      AND WO.PART_ID=WO2.BASE_ID
      AND WO2.LOT_ID='0'
      AND WO2.SPLIT_ID='0'
      AND WO2.SUB_ID='0'
      AND WO2.TYPE=O.WORKORDER_TYPE
      AND WO2.BASE_ID=O.WORKORDER_BASE_ID
      AND WO2.LOT_ID=O.WORKORDER_LOT_ID
      AND WO2.SPLIT_ID=O.WORKORDER_SPLIT_ID
      AND O.RESOURCE_ID=SR.ID
      AND SR.BUILDING_ID='Machine'
      AND SR.LOGICAL_GROUP IN ('10 TURNING CENTERS','20 VERTICAL M.C.','30 630MM PALLET HMC','40 800MM-1000MM PALLET HMC','50 1000MM PALLET HMC','55 LARGE 2-TABLE HMC/HBM','60 SINGLE TABLE HBM','60 SINGLE TABLE HDM','70 LARGE FLOOR MACHINES')
    GROUP BY
      WO2.TYPE,
      WO2.BASE_ID,
      WO.PART_ID,
      O.WORKORDER_SUB_ID,
      O.SEQUENCE_NO) ENG
  WHERE
    O.PART_ID=L.PART_ID
    AND O.WORKORDER_SUB_ID=L.WORKORDER_SUB_ID
    AND O.OPERATION_SEQ_NO=L.OPERATION_SEQ_NO
    AND O.CLOSE_WEEK=L.CLOSE_WEEK
    AND O.PART_ID=ENG.PART_ID
    AND O.WORKORDER_SUB_ID=ENG.WORKORDER_SUB_ID
    AND O.OPERATION_SEQ_NO=ENG.OPERATION_SEQ_NO) L,
  EMPLOYEE E
WHERE
  L.EMPLOYEE_ID=E.ID
ORDER BY
  L.EMPLOYEE_ID,
  L.PART_ID,
  TO_NUMBER(L.WORKORDER_SUB_ID),
  L.OPERATION_SEQ_NO,
  L.CLOSE_WEEK;

I suppose that I could have performed more of the processing on the database server by sliding the above query into an inline view and used the PIVOT function, and/or additional analytic functions to calculate the employee’s weighted efficiency average per week and the simple averages of the wieghted averages – doing so would have eliminated some potentially complex client-side logic that needed to be programmed a single time, but counter point is that every time the report executed, it would require a more of the database server’s resources than were absolutely required.

That sketch of a report on a sheet of paper sure looked simple, but it turned into a three part blog article series.  Interesting, now the person would like to analyze the data by part ID, listing the employees working on the operations for the part.  Do I smell another three part blog article series?  Sorry, no – I just changed the ORDER BY clause so that the PART_ID column was listed first, and made a small change to the client-side control break logic.  Now the person wants to analyze 104 weeks worth of data rather than just four weeks, and only a single part ID or employee.  Too late, those changes were anticipated, and included in the original specification in the client-side programming, implemented with a simple change to the query and pre-planning for a variable number of weeks in the report.





On the Topic of Programming 2

2 09 2012

September 2, 2012

(Back to the Previous Post in the Series) (Forward to the Next Post in the Series)

Over the course of the last six months I developed several efficiency reports for the machining facility where I work, measuring the average time required to complete a customer’s part compared to the engineering standard “target” run time for producing the part.  Sounds simple, right?  If the engineering target is 30 widgets per hour, and only 20 widgets per hour are produced, then the average efficiency is 20/30 = 66.67%.  It does not take a person with a degree in mathematics to arrive at the conclusion that the widgets are not being produced as quickly as expected.  What is the scope of the measurement time frame for the average: a year, a month, a week, a day, one work shift in a day, a single machining center for a day, a single machining center and employee, or something else?

OK, now flip the engineering standard run times a bit so that the times are a little more consistent with the machining facility where I work.  It might take four hours, six hours, 10 hours, 20 hours, or possibly even 200 hours to complete a single operation at a machining center (one of several operations in the manufacturing process for the widget) to produce a single widget.  With the long run times of the operations, calculating the efficiency of a machining center or an employee for a specified period of time was a daunting task… a task that I was asked to solve roughly 12 years ago (with the help of Oracle Database 8.0.5).

The hours per widget run time (rather than widgets per hour) situation presented various problems for calculating employee efficiency, especially when an operation at a machining center did not complete before the end of an employee’s shift.  Consider a situation where an operation is expected to require eight hours to complete, but only when the tooling used at the machining center is new.  As such, the first shift employee installs new tooling in the machining center every morning before starting a new widget.  The first shift employee spends about 30 minutes finishing up the widget that was started the previous night, changes the tooling, and then starts the machining on the next widget.  So, the first shift employee reports that one widget (started the previous night) completed after 0.5 hours and the second widget completed in 7.5 hours.  The first shift employee’s efficiency, considering that two widgets were completed during his shift, is easily calculated as (8 + 8) / (0.5 + 7.5) * 100 = 200%.  The second shift employee’s efficiency is a consistent 0% because the operation for the widget never completes during his shift because the sharpness of the tooling deteriorates through the day (thus causing the machining operation to take longer).  This obviously leads to odd questions: 1) Why is the second shift employee more efficient when the first shift employee is on vacation (equivalent to the question in Oracle performance tuning: why does my report run faster when it rains Monday mornings?)?  Why is the second shift employee more efficient when working a nine or 10 hour shift, rather than an eight hour shift?  The fun questions that one is able to answer when familiar with the data…

When an employee starts working on a machining operation, a labor ticket transaction is created detailing the fact that the work for the machining operation is in-process.  When the machining operation completes (or when it is time to leave for the day), the employee closes the labor ticket transaction and reports the number of widgets that completed.  These labor ticket transactions are conveniently stored in a table named LABOR_TICKET.  Back in 2006 or 2007, I decided to take another attempt at solving the efficiency problem – after all, I now had access to analytic functions in Oracle Database (analytic functions did not exist in Oracle Database 8.0.5).  I expected that by extending the time range beyond a single employee and/or shift, I might be able to extract useful efficiency information from the database.  I put together a rather long SQL statement that looked something like this:

SELECT DISTINCT
  1 TYPE,
  WO.PART_ID,
  TO_NUMBER(LT.WORKORDER_SUB_ID) WORKORDER_SUB_ID,
  LT.OPERATION_SEQ_NO,
  TRUNC(SYSDATE-30) SHIFT_DATE,
  LT.EMPLOYEE_ID,
  LT.RESOURCE_ID,
  SUM(LT.HOURS_WORKED) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,EMPLOYEE_ID) HOURS_WORKED,
  SUM(LT.GOOD_QTY) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,EMPLOYEE_ID) GOOD_QTY,
  NULL HRS_PC,
  ROUND(DECODE(SUM(LT.GOOD_QTY) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO),0,0,
    SUM(LT.HOURS_WORKED) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO)/
      SUM(LT.GOOD_QTY) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO)),2) AVG_HRS_PC_TIME_ALL,
  ROUND(DECODE(SUM(LT.GOOD_QTY) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,EMPLOYEE_ID),0,0,
    SUM(LT.HOURS_WORKED) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,EMPLOYEE_ID)/
      SUM(LT.GOOD_QTY) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,EMPLOYEE_ID)),2) AVG_HRS_PC_TIME_EMP,
  ROUND(DECODE(SUM(LT.GOOD_QTY) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,RESOURCE_ID),0,0,
    SUM(LT.HOURS_WORKED) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,RESOURCE_ID)/
      SUM(LT.GOOD_QTY) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,RESOURCE_ID)),2) AVG_HRS_PC_TIME_RES,
  ROUND(DECODE(SUM(LT.GOOD_QTY) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,EMPLOYEE_ID,RESOURCE_ID),0,0,
    SUM(LT.HOURS_WORKED) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,EMPLOYEE_ID,RESOURCE_ID)/
      SUM(LT.GOOD_QTY) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,EMPLOYEE_ID,RESOURCE_ID)),2) AVG_HRS_PC_TIME_EMP_RES
FROM
  WORK_ORDER WO,
  LABOR_TICKET LT
WHERE
  WO.TYPE='W'
  AND WO.SUB_ID='0'
  AND WO.PART_ID='WIDGET1'
  AND LT.SHIFT_DATE BETWEEN TRUNC(SYSDATE-30) AND TRUNC(SYSDATE)
  AND WO.TYPE=LT.WORKORDER_TYPE
  AND WO.BASE_ID=LT.WORKORDER_BASE_ID
  AND WO.LOT_ID=LT.WORKORDER_LOT_ID
  AND WO.SPLIT_ID=LT.WORKORDER_SPLIT_ID
UNION ALL
SELECT DISTINCT
  2 TYPE,
  WO.PART_ID,
  TO_NUMBER(LT.WORKORDER_SUB_ID) WORKORDER_SUB_ID,
  LT.OPERATION_SEQ_NO,
  TRUNC(SYSDATE-60) SHIFT_DATE,
  LT.EMPLOYEE_ID,
  LT.RESOURCE_ID,
  SUM(LT.HOURS_WORKED) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,EMPLOYEE_ID) HOURS_WORKED,
  SUM(LT.GOOD_QTY) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,EMPLOYEE_ID) GOOD_QTY,
  NULL HRS_PC,
  ROUND(DECODE(SUM(LT.GOOD_QTY) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO),0,0,
    SUM(LT.HOURS_WORKED) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO)/
      SUM(LT.GOOD_QTY) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO)),2) AVG_HRS_PC_TIME_ALL,
  ROUND(DECODE(SUM(LT.GOOD_QTY) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,EMPLOYEE_ID),0,0,
    SUM(LT.HOURS_WORKED) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,EMPLOYEE_ID)/
      SUM(LT.GOOD_QTY) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,EMPLOYEE_ID)),2) AVG_HRS_PC_TIME_EMP,
  ROUND(DECODE(SUM(LT.GOOD_QTY) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,RESOURCE_ID),0,0,
    SUM(LT.HOURS_WORKED) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,RESOURCE_ID)/
      SUM(LT.GOOD_QTY) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,RESOURCE_ID)),2) AVG_HRS_PC_TIME_RES,
  ROUND(DECODE(SUM(LT.GOOD_QTY) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,EMPLOYEE_ID,RESOURCE_ID),0,0,
    SUM(LT.HOURS_WORKED) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,EMPLOYEE_ID,RESOURCE_ID)/
      SUM(LT.GOOD_QTY) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,EMPLOYEE_ID,RESOURCE_ID)),2) AVG_HRS_PC_TIME_EMP_RES
FROM
  WORK_ORDER WO,
  LABOR_TICKET LT
WHERE
  WO.TYPE='W'
  AND WO.SUB_ID='0'
  AND WO.PART_ID='WIDGET1'
  AND LT.SHIFT_DATE BETWEEN TRUNC(SYSDATE-60) AND TRUNC(SYSDATE-31)
  AND WO.TYPE=LT.WORKORDER_TYPE
  AND WO.BASE_ID=LT.WORKORDER_BASE_ID
  AND WO.LOT_ID=LT.WORKORDER_LOT_ID
  AND WO.SPLIT_ID=LT.WORKORDER_SPLIT_ID
...

The above SQL statement grew in length substantially based on the number of time periods that I selected to compare.  This data was returned to a program for display purposes, so reformatting the output was always a possibility.  Consider a situation where the programmer is unfamiliar with Oracle specific SQL and is asked to generate the same output in a program that he is constructing.  For a program seeking to measure efficiency, the programmer’s solution would likely involve repeated executions of SQL statements to retrieve the bits and pieces of information that need to be presented – this row by row processing will likely be very slow over a high latency WAN (VPN) connection (but certainly faster than never receiving the information, so the programmer will still be a hero), and could very well hinder the performance of the Oracle Database server.

The above solution is good, but problems such as the progressive wearing of the tooling cutter, as mentioned earlier, could still lead to unexpected differences in efficiency of different employees that are working as fast as is possible.  Fast forward a few years.  The right questions are asked from people with a more complete understanding of efficiency measurements – examine the efficiency calculation problem from an entirely different angle.  Not long ago I was handed an example of a simple efficiency report, and asked to reproduce that report with live data, output in a Microsoft Excel spreadsheet.  The example was little more than a simple sketch, so for this blog article I put together a color-coded example of the report format in Microsoft Excel:

A little explanation is required.  The entirely different angle for calculating employee efficiency in the execution of widget operations involves not looking at the date of the labor ticket transaction, or even the number of widgets produced by a particular employee in a particular time frame.  Instead, the date of the manufacturing batch’s (lot in this particular ERP system) completion date, the quantity produced in the batch, and the total hours to execute a single machining operation for the lot become the criteria when comparing against the engineering standards to determine efficiency.  The manufacturing batch’s completion date is used to divide the batches into specific timeframes (in this case weeks: Monday through Sunday).  All employees working on a particular operation, where the batch completion date is in a certain week, will receive the same efficiency rating for that particular operation (EFF_WEEK1, EFF_WEEK2, EFF_WEEK3, etc.) as all other employees working on the same operation with the same batch completion date range.  It is not uncommon for employees to generate labor transactions for multiple operations for production of the same part, as well as operations for different parts that have batch completion dates in the same week, so the employee’s efficiency rating (W_AVG_EFF1, W_AVG_EFF2, W_AVG_EFF3, etc.) weights the individual part efficiencies based on the percentage of machining time the employee spent in a given operation compared to all of the other operations the employee worked.  The employee’s efficiency rating (Avg W_VE_EFF) is the simple average of the employee’s weekly efficiency ratings.

In the above example report, the blue colored text is a static label.  The red colored text is a calculated static label that shows the Monday through Sunday date range for the week.  The black colored text is a simple “rip and read” from the database – no calculation is required.  The purple colored text indicates a calculated value.  The numbers at the right will not appear on the completed report, but are included to indicate which EFF_WEEKn and AVG_WEEKn values will be identical if the employee worked on an operation whose batch closed in the given week (notice that there are blanks in columns of the sample report, indicating that the employee did not work on that operation in the batch completion week).

The programming challenge is to determine the number of SQL statements that would be needed to retrieve the information from the database, and the number of times those SQL statements would need to be executed.  Would the number of SQL statement executions depend on the number of employees?  Would the number of SQL statement executions depend on the number of different part operations whose batches closed in a particular week?  Would the number of SQL statement executions depend on the number weeks included in the report?  Before you think about the problem, we should probably investigate the data organization in the various tables.  The picture below shows the tables (and the specific columns) required for the report, showing the tables for the engineering standards in green boxes and the tables needed for the labor transaction analysis in black boxes.  The lines show the association of the data in the different tables.

(The above image was adjusted 7 hours after this blog article was initially posted.  The linking between the green OPERATION table and the green WORK_ORDER table was corrected, and the red colored text was added to indicate whether the table was to be used for retrieving the engineering master standards (WORKORDER_TYPE=’M’) or information from the production work order batches (WORKORDER_TYPE=’W’ or TYPE=’W’).)

It might also be interesting to think about what processing will be performed on the data returned by the SQL statements – that could be a pivotal design decision for the SQL statements.  Curve balls are to be expected – what is currently an examination of four weeks’ worth of efficiency numbers today might be an examination of 52 weeks tomorrow.  Oh, here is a thought, what if instead on the next day it is important to focus on the various part efficiencies and the employees who worked on the operations, rather than focusing on the employees and the various parts that the employees worked on?

Part 3 of this series will attempt to answer some of the above questions.

Late addition, sample of a completed report in Excel:





On the Topic of Programming 1

26 08 2012

August 26, 2012

(Forward to the Next Post in the Series)

As those of you who have read this blog’s About page probably know, my day to day job responsibilities involve a lot of activities that are not specific to Oracle Database performance tuning, or even remotely Oracle DBA type activities.  Those extra acttivites are part of what keeps the job fresh and interesting, whether I am swapping in a new roll of labels into a Zebra label printer (that was a three minute fun task this morning), troubleshooting phone system problems (from a fork lift truck “disabling” a phone, to programming the PBX with a clever “message delivery system” to waste the time of persistent telemarketers), swapping out the power supply in a computer very early in the morning, or engaged in a marathon of application programming into the evening and weekend hours.

One of the recent programming projects involves the continuing effort of decreasing the number of data islands, allowing the data previously contained in the islands to be accessed and cross-referenced with data generated by other applications.  One of the data island generators that I have been working to phase out is Superbase 3.0, a database platform that started life on a Commodore 64 in 1983.  Superbase 3.0 is a 16 bit client-side database, so it does not play well with the 64 bit Windows 7 that ships with most new desktop computers (16 bit applications will not run natively on 64 bit Windows, instead the 16 bit applications must be run inside a virtual machine such as Windows XP Mode, or run remotely using remote access software such as a VNC client).

The data contained in the old Superbase databases is critical to the company’s processes, so that data must be salvaged – meaning that the data must be transformed and imported into an Oracle database.  Unlike what a developer would likely create in a relational database, often with multiple tables used to store one “record” in the database, the long departed designer of the Superbase databases used a single row in a single database table to store one “record” in the database.  That organization reminds me a lot of the Microsoft Works package’s database from the early 1990s, with its fancy data entry forms which allowed users to escape the dull spreadsheet-like data entry screen.  Microsoft Excel from the early/mid 1990s could magically transform a dull spreadsheet data entry screen into a simple data entry form, in the process transforming the expensive Microsoft Excel into essentially a cheap database program.  It is a bit more of a challenge to locate the automatic data entry form creator in Excel 2010 than I recall it being in the early/mid 1990s version of Excel, but I suppose that helps further reduce the number of potential data islands:

So, what does the above discussion of Microsoft Excel have to do with anything related to Oracle Database?  The data contained in the Superbase databases must be transformed and inserted into an Oracle database.  It is good news that Superbase is able to export data to Microsoft Excel format.  The bad news is that the exported format is designed to work with Microsoft Excel 2.0 – a very old version of Microsoft Excel that seems to date back to 1987!  Time for a lot of manual data entry if that data must end up in an Oracle Database 11.2.0.x database… unless…

Microsoft Excel 2003 (version 12.0 if I remember correctly) is able to open Excel 2.0 files… success, even if the success is minor.  Now, how to go about tranferring the data from Excel into Oracle Database?  I suppose that I could have created a macro in Microsoft Excel to insert the data into Oracle Database, but at the time I was not interested in writing a macro that accomplished the task “the right way” using bind variables.  And just look at that data – some of the date values were imported as very small (roughly -65000) numbers, in some cases nearly 20 different spellings for the same customer name, and alpha-numeric text in columns that should be numeric.

So, how did I import the Superbase data that was now in Excel 2003 into the Oracle Database 11.2.0.x database without writing an Excel macro?  The particular computer with Excel 2003 that I was using also had a copy of Access 2003 installed.  Access 2003 is able to create a table “link” to an Excel 2003 spreadsheet’s worksheet, and handle that worksheet essentially the same as if it were a database table.  Now the data is “in” Microsoft Access 2003, but still not in an Oracle database.  Previous experience with this process pays off – before bringing the data into Microsoft Access, type each of the Oracle Database destination table’s column names into the first row of the Microsoft Excel spreadsheet, above the appropriate column’s data.  Importing the data into the Oracle database then becomes a simple four step process (assuming that no other data transformation is necessary)

  1. Link to the Excel spreadsheet’s worksheet and the destination table in the Oracle database.
  2. Create an Access view (stored Query) that selects all of the columns from the Excel worksheet that must be inserted into the Oracle database.
  3. Convert the view (stored Query) type to an Append type and select the linked Oracle Database table as the destination – Access will automatically find the correct destination column in the Oracle table, if the source column name (from the first row in the Excel worksheet) matches the destination column name.
  4. Execute the append type view.

A simple transformation of the data from 1994 database technology to 1987, 2003, and then on to 2011 in Oracle Database – and without writing a single line of code.  Remember that problem that I mentioned about alpha-numeric text in columns that should be numeric, such as “10&20” in a column named OPERATION_NUMBER (or OPERATION_SEQ_NO) – it turns out that that bit of inconsistent data cannot just be thrown out (thanks Microsoft Access 2003).  To fix that problem, I would need to add another column to the Oracle Database table, and then have Microsoft Access update that table using the Microsoft Excel spreadsheet data (fixing the “10&20”, “10 & 20”, “10  &20”, “10 AND 20”, “10,20” and “10, 20” variants into a standard format.  The SQL dialect in Microsoft Access is a bit odd at times, and I could not remember if the odd syntax applies to UPDATE statements also.  As an example of the odd syntax, the simple CREATE TABLEAS SELECT:

CREATE TABLE
  T2 AS
SELECT
  T1.SID,
  T1.SERIAL,
  T1.USERNAME,
  T1.PROGRAM,
  MIN(T1.CHECK_COUNT) AS CHECK_COUNT_START,
  MAX(T1.CHECK_COUNT) AS CHECK_COUNT_END
FROM
  T1
GROUP BY
  T1.SID,
  T1.SERIAL,
  T1.USERNAME,
  T1.PROGRAM;

Must be written like the following in Microsoft Access:

SELECT
  T1.SID,
  T1.SERIAL,
  T1.USERNAME,
  T1.PROGRAM,
  MIN(T1.CHECK_COUNT) AS CHECK_COUNT_START,
  MAX(T1.CHECK_COUNT) AS CHECK_COUNT_END
INTO
  T2
FROM
  T1
GROUP BY
  T1.SID,
  T1.SERIAL,
  T1.USERNAME,
  T1.PROGRAM;

Since I am not 100% confident in my SQL authoring skills in Microsoft Access, how do I move the data from the Excel spreadsheet into the new column of the Oracle Database table… and without writing a single line of programming code?  I simply created a temporary table (not a true temporary table, because the table data must be visible to more than one session) that contained the primary key column and a second column for the non-numeric numeric data.  Once the data was in the temporary Oracle table (using the simple four step process outlined above), I simply executed an UPDATE statement similar to this:

UPDATE
  T1
SET
  NON_NUMERIC_NUMERIC=(
    SELECT
      NON_NUMERIC_NUMERIC
    FROM
      T1_TEMP TT
    WHERE
      T1.PRIMARY_KEY=TT.PRIMARY_KEY)
WHERE
  T1.PRIMARY_KEY IN (
    SELECT
      PRIMARY_KEY
    FROM
      T1_TEMP);

With the data successfully transferred into an Oracle database table, the programming continues.  That brings me to the next article in this series, the internal conflicts of the “best” way to accomplish the programming task.





Brain Teaser: 10046 Extended SQL Trace Shows a FETCH Call with c=306350000, e=299174653, p=0, cr=22298 – How is that Possible?

26 04 2012

April 26, 2012

Last week I posted a popular article that questioned the effectiveness of showing a job candidate a raw 10046 trace file that was captured at level 12, asking the job candidate to evaluate whether or not the SQL statement in the 10046 trace is efficient.  Many of the commenters suggested that the approach is potentially very effective if the interviewer is knowledgeable about the subject matter, and he recognizes that the question deserves more than a simple yes or no answer.

A year ago I posted a related article, a brain teaser in fact, that showed 0.015600 seconds of CPU consumption in 0.000510 seconds of elapsed time without using parallel execution.  The comments attached to last year’s article correctly identified the source of the seemingly impossible statistics.

A recent OTN thread (don’t click the link until you have thought about the problem for a while) shows an interesting section of a 10046 trace file, a portion of which appears below:

PARSE #2:c=0,e=4764,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,tim=1182070262719
BINDS #2:
EXEC #2:c=0,e=255,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,tim=1182070263378
WAIT #2: nam='SQL*Net message to client' ela= 5 driver id=1650815232 #bytes=1 p3=0 obj#=258 tim=1182070263514
FETCH #2:c=306350000,e=299174653,p=0,cr=22298,cu=0,mis=0,r=1,dep=0,og=1,tim=1182369438322
WAIT #2: nam='SQL*Net message from client' ela= 1454 driver id=1650815232 #bytes=1 p3=0 obj#=258 tim=1182369440799
WAIT #2: nam='SQL*Net message to client' ela= 4 driver id=1650815232 #bytes=1 p3=0 obj#=258 tim=1182369441012
FETCH #2:c=0,e=293,p=0,cr=0,cu=0,mis=0,r=15,dep=0,og=1,tim=1182369441265
WAIT #2: nam='SQL*Net message from client' ela= 7437 driver id=1650815232 #bytes=1 p3=0 obj#=258 tim=1182369448858
WAIT #2: nam='SQL*Net message to client' ela= 3 driver id=1650815232 #bytes=1 p3=0 obj#=258 tim=1182369449007
FETCH #2:c=0,e=283,p=0,cr=0,cu=0,mis=0,r=15,dep=0,og=1,tim=1182369449257
WAIT #2: nam='SQL*Net message from client' ela= 6701 driver id=1650815232 #bytes=1 p3=0 obj#=258 tim=1182369456082
WAIT #2: nam='SQL*Net message to client' ela= 2 driver id=1650815232 #bytes=1 p3=0 obj#=258 tim=1182369456222 

The interesting portion of the above trace file is the first FETCH line where a single row is returned to the client.  The c=306350000 entry indicates that 306.35 seconds of server CPU time was consumed during the fetch call.  The e=299174653 entry indicates that the elapsed time (as if captured by a stop watch) of the single row fetch was 299.174653 seconds.  The p=0 entry indicates that 0 blocks were physically read from disk when preparing to retrieve the one row.  The cr=22298 entry indicates that 22,298 consistent gets were performed when preparing to retrieve the first row.  Typically, performing a large number of consistent gets will result in the consumption of a considerable number of CPU seconds, yet 22,298 consistent gets could not explain the consumption of more than 5 minutes of CPU time, even if the OP was running Oracle Database on an Intel 8088 CPU.

The brain teaser posed (directly or indirectly) by the OP:

  1. Why did the first fetch of a single row require roughly 299 seconds of elapsed time, when the second fetch of 15 rows apparently required 0.000293 seconds?
  2. Why did the first fetch of a single row require roughly 306 seconds of CPU time, when the second fetch of 15 rows apparently required no CPU time?
  3. How is it possible that the CPU consumption exceeds the elapsed time by 7.175347 seconds?
  4. How is it possible that 306.35 seconds of CPU were burned while performing only 22,298 consistent gets?
  5. The trace file indicates that 306.35 seconds of CPU were burned during the first fetch call, does the number 306.35 represent 100% of the CPU consumption during the fetch call, or is the actual CPU consumption higher or lower than that number?
  6. What release version of Oracle Database is the OP running?
  7. What operating system is the OP using for Oracle Database?
  8. What other questions might you ask of a job candidate if the 10046 trace file from the OTN thread were handed to a job candidate during an interview?

Think about the questions for a couple of minutes before visiting the OTN thread.  Several of the above questions are answered in that thread.





Extents of an Oracle Database DBA’s Knowledge Base

18 04 2012

April 18, 2012

I saw an interesting thread on the OTN forums this morning that forced me to stop and think about several items.  The thread contains a question posed during a recent job interview – one of the best questions that I have seen (dare I say, better than one I might have crafted myself).  Consider this: you are given a SQL statement, and a raw 10046 trace captured at level 12.  The question is very simple: Is the SQL statement efficient or not?

What is the point of the interview question?  Barely scratching the surface:

  • It tests how you will approach unfamiliar problems if you encounter the problem once hired – it may not be this exact question, but this may be similar to a real problem that was faced by the company’s DBAs. Do you panic and say that it cannot be solved, do you say that the job belongs to someone else, do you say that there is no point in looking at the raw trace file, or do you dive into the problem and think about what could happen if… (yes, there is something here, but I do not want to spoil the fun for other candidates that may face this exact problem).
  • It determines in part what types of material you have read to further enhance your knowledge of Oracle Database.
  • It determines whether or not you recognize the potential problems that may be associated with specific Oracle Database releases (unpatched and feeling ANSI?)
  • It determines whether or not you recognize differences in predicated cardinality and the actual number of rows returned, and how that may affect the performance outcome. Is it better to filter early or filter late, and does that apply to this situation?
  • Is there a DISTINCT possibility that the ROWNUM function may allow an arbitrary set of rows to be returned, possibly dependent on the value of the OPTIMIZER_FEATURES_ENABLE parameter (see *** item below)?
  • It determines whether or not you follow logic or magic when troubleshooting problems.

Reflecting on the question a bit, did the person in the interview have access to Google, TKPROF, the Oracle Database documentation, books with red covers, books with yellow covers, books with white covers, books with blue covers, etc.

*** A year ago in an OTN thread I made the following comment that applies to at least one of the above bullet point and also the answer to the question posed by the interviewer:

“Other than the “HASH GROUP BY” operation on the second line of the printed execution plan for 11.2.0.2, and the “SORT GROUP BY” operation on the second line of the printed execution plan for 10.1.0.3, the execution plans are identical. Keep in mind that 11.2.0.2, due to adaptive cursor sharing, is capable of changing the execution plan for future executions (there was a single parse call in this case, so adaptive cursor sharing likely did not take place). Also, keep in mind that 11.2.0.2 by default in a 10046 trace will output the row source operation execution plan after the first execution, rather than when the cursor was closed – this explains the difference in the ROWS column in the execution plan. If we look closely at the summary information, 11.2.0.2 performed 113,319 consistent gets in 673 executions, while 10.1.0.3 performed 175,168 consistent gets in 644 executions. Each execution in 11.2.0.2 is in theory more efficient than each execution in 10.1.0.3, yet the average time per execution is much longer.”

In your opinion, what do you think of the interview question?

(Please do not provide an exact answer to the interviewer’s question for at least 14 days, there is no sense in disrupting the interview process.)





Failed Logon Attempts

4 04 2012

April 4, 2012

A fair number of articles on this blog are Oracle performance related in one respect or another.  I started thinking about how to measure how much more efficient something is when compared to something that is never performed.  Consider a situation where you had a database user account that is a member of the DEFAULT Oracle Database profile that is configured as follows:

ALTER PROFILE DEFAULT LIMIT
  PASSWORD_LIFE_TIME UNLIMITED
  FAILED_LOGIN_ATTEMPTS 5
  PASSWORD_LOCK_TIME 1
  PASSWORD_VERIFY_FUNCTION NULL; 

With the above configuration, passwords do not expire after a specified number of days, there are no password complexity requirements, and after five failed logon attempts, the account is automatically locked for one day.  The above configuration is not ideal from a security standpoint, but that is not the point of this blog article.

Assume that the one database user account is shared by multiple people (or multiple utility programs with an embedded username and password).  Suddenly, you find that your efficient utility program becomes inefficient… to the point that the “utility” portion of the program never has an opportunity to execute.  Well, that was unexpected, the database user account is locked out.  How did that happen?

The following SQL statement is probably quite simplistic for many of the readers of this blog, and probably should have been included in my Neat Tricks article that I published a couple of years ago, but I thought that I would include the SQL statement here to save myself a couple of minutes of typing the next time I need to figure out who-done-it:

SELECT
  TO_CHAR(TIMESTAMP,'MM/DD HH24:MI') TIMESTAMP,
  SUBSTR(OS_USERNAME,1,20) OS_USERNAME,
  SUBSTR(USERNAME,1,20) USERNAME,
  SUBSTR(TERMINAL,1,20) TERMINAL,
  ACTION_NAME,
  RETURNCODE
FROM
  SYS.DBA_AUDIT_SESSION
WHERE
  USERNAME LIKE 'MYUSER%'
  AND TIMESTAMP BETWEEN SYSDATE-1 AND SYSDATE
ORDER BY
  TIMESTAMP DESC;

TIMESTAMP   OS_USERNAME          USERNAME TERMINAL        ACTION_NAME          RETURNCODE
----------- -------------------- -------- --------------- -------------------- ----------
04/03 11:33 USER1                MYUSER   CUSER1          LOGON                         0
04/03 11:33 USER1                MYUSER   CUSER1          LOGOFF                        0
04/03 11:33 USER1                MYUSER   CUSER1          LOGOFF                        0
04/03 11:33 USER1                MYUSER   CUSER1          LOGON                         0
04/03 10:54 USER2                MYUSER   CUSER2          LOGOFF                        0
04/03 10:53 USER2                MYUSER   CUSER2          LOGON                         0
04/03 10:53 USER2                MYUSER   CUSER2          LOGON                         0
04/03 10:53 USER2                MYUSER   CUSER2          LOGON                         0
04/03 09:58 USER3                MYUSER   CUSER3          LOGOFF                        0
04/03 09:56 USER4                MYUSER   CUSER4          LOGON                         0
04/03 09:56 USER4                MYUSER   CUSER4          LOGOFF                        0
04/03 09:51 USER5                MYUSER   CUSER5          LOGON                         0
04/03 09:51 USER5                MYUSER   CUSER5          LOGON                         0
04/03 09:51 USER3                MYUSER   CUSER3          LOGON                         0
04/03 09:51 USER5                MYUSER   CUSER5          LOGOFF                        0
04/03 09:51 USER5                MYUSER   CUSER5          LOGOFF                        0
04/03 09:44 USER4                MYUSER   CUSER4          LOGOFF                        0
04/03 09:29 USER3                MYUSER   CUSER3          LOGOFF                        0
04/03 09:10 USER4                MYUSER   CUSER4          LOGON                         0
04/03 09:09 USER3                MYUSER   CUSER3          LOGON                         0
04/03 09:06 USER3                MYUSER   CUSER3          LOGOFF                        0
04/03 09:06 USER3                MYUSER   CUSER3          LOGOFF                        0
04/03 08:47 USER3                MYUSER   CUSER3          LOGON                         0
04/03 07:50 USER4                MYUSER   CUSER4          LOGON                         0
04/03 07:50 USER4                MYUSER   CUSER4          LOGOFF                        0
04/03 07:50 USER4                MYUSER   CUSER4          LOGOFF BY CLEANUP             0
04/03 07:50 USER4                MYUSER   CUSER4          LOGOFF BY CLEANUP             0
04/03 07:50 USER4                MYUSER   CUSER4          LOGOFF BY CLEANUP             0
04/03 07:50 USER4                MYUSER   CUSER4          LOGON                         0
04/03 07:50 USER4                MYUSER   CUSER4          LOGON                         0
04/03 07:50 USER4                MYUSER   CUSER4          LOGON                         0
04/03 07:12 USER3                MYUSER   CUSER3          LOGON                         0
04/03 07:12 NETWORK SERVICE      MYUSER   SERVER          LOGOFF                        0
04/03 07:11 NETWORK SERVICE      MYUSER   SERVER          LOGON                         0
04/03 07:11 IUSR_WEBSERV         MYUSER   SERVER          LOGON                         0
04/03 06:35 NETWORK SERVICE      MYUSER   SERVER          LOGOFF                        0
04/03 06:35 IUSR_WEBSERV         MYUSER   SERVER          LOGOFF                        0
04/03 06:35 NETWORK SERVICE      MYUSER   SERVER          LOGOFF                        0
04/03 06:34 IUSR_WEBSERV         MYUSER   SERVER          LOGON                         0
04/03 06:34 NETWORK SERVICE      MYUSER   SERVER          LOGON                         0
04/03 06:34 NETWORK SERVICE      MYUSER   SERVER          LOGON                         0
04/03 06:29 USER3                MYUSER   CUSER3          LOGOFF                        0
04/03 06:29 USER3                MYUSER   CUSER3          LOGON                         0
04/03 06:28 NETWORK SERVICE      MYUSER   SERVER          LOGOFF                        0
04/03 06:28 NETWORK SERVICE      MYUSER   SERVER          LOGOFF                        0
04/03 06:27 IUSR_WEBSERV         MYUSER   SERVER          LOGOFF                        0
04/03 06:27 NETWORK SERVICE      MYUSER   SERVER          LOGON                         0
04/03 06:26 IUSR_WEBSERV         MYUSER   SERVER          LOGON                         0
04/03 06:26 USER3                MYUSER   CUSER3          LOGOFF                        0
04/03 06:26 USER3                MYUSER   CUSER3          LOGON                         0
04/03 06:26 NETWORK SERVICE      MYUSER   SERVER          LOGON                         0
04/03 06:23 USER3                MYUSER   CUSER3          LOGON                     28000
04/03 06:23 USER3                MYUSER   CUSER3          LOGON                     28000
04/03 06:22 NETWORK SERVICE      MYUSER   SERVER          LOGON                     28000
04/03 06:22 NETWORK SERVICE      MYUSER   SERVER          LOGON                     28000
04/03 02:30 USER5                MYUSER   SERVER2         LOGON                     28000
04/02 19:53 USER1                MYUSER   CUSER1-LT       LOGON                     28000
04/02 19:53 USER1                MYUSER   CUSER1-LT       LOGON                     28000
04/02 19:11 USER1                MYUSER   CUSER1-LT       LOGON                     28000
04/02 19:11 USER1                MYUSER   CUSER1-LT       LOGON                     28000
04/02 18:37 USER1                MYUSER   CUSER1-LT       LOGON                     28000
04/02 18:37 USER1                MYUSER   CUSER1-LT       LOGON                     28000
04/02 18:37 USER1                MYUSER   CUSER1-LT       LOGON                      1017
04/02 18:37 USER1                MYUSER   CUSER1-LT       LOGON                     28000
04/02 18:32 USER1                MYUSER   CUSER1-LT       LOGON                      1017
04/02 18:32 USER1                MYUSER   CUSER1-LT       LOGON                      1017
04/02 18:31 USER1                MYUSER   CUSER1-LT       LOGON                      1017
04/02 18:31 USER1                MYUSER   CUSER1-LT       LOGON                      1017
04/02 17:08 USER3                MYUSER   CUSER3          LOGOFF                        0
04/02 17:03 USER3                MYUSER   CUSER3          LOGOFF                        0
04/02 17:03 USER3                MYUSER   CUSER3          LOGON                         0
04/02 16:55 USER3                MYUSER   CUSER3          LOGOFF                        0
04/02 16:55 USER3                MYUSER   CUSER3          LOGON                         0
04/02 16:54 USER3                MYUSER   CUSER3          LOGON                         0
04/02 16:52 USER3                MYUSER   CUSER3          LOGOFF                        0
04/02 16:45 NETWORK SERVICE      MYUSER   SERVER          LOGOFF                        0
04/02 16:45 NETWORK SERVICE      MYUSER   SERVER          LOGOFF                        0
04/02 16:45 IUSR_WEBSERV         MYUSER   SERVER          LOGOFF                        0
04/02 16:44 NETWORK SERVICE      MYUSER   SERVER          LOGON                         0
04/02 16:44 NETWORK SERVICE      MYUSER   SERVER          LOGON                         0
04/02 16:44 IUSR_WEBSERV         MYUSER   SERVER          LOGON                         0
04/02 16:43 USER3                MYUSER   CUSER3          LOGOFF                        0
04/02 16:42 USER3                MYUSER   CUSER3          LOGON                         0
04/02 16:42 USER3                MYUSER   CUSER3          LOGON                         0
04/02 16:32 USER6                MYUSER   CUSER6          LOGON                         0
04/02 16:32 USER6                MYUSER   CUSER6          LOGON                         0
04/02 16:32 USER6                MYUSER   CUSER6          LOGOFF                        0
04/02 16:32 USER6                MYUSER   CUSER6          LOGOFF                        0
04/02 16:21 USER3                MYUSER   CUSER3          LOGON                         0
04/02 16:20 USER3                MYUSER   CUSER3          LOGON                         0
04/02 16:20 USER3                MYUSER   CUSER3          LOGOFF                        0
04/02 16:19 USER3                MYUSER   CUSER3          LOGOFF                        0
04/02 16:15 USER3                MYUSER   CUSER3          LOGON                         0
04/02 15:40 USER3                MYUSER   CUSER3          LOGON                         0
04/02 15:40 USER3                MYUSER   CUSER3          LOGOFF                        0
04/02 15:31 USER3                MYUSER   CUSER3          LOGOFF                        0
04/02 15:06 USER1                MYUSER   USER1           LOGON                         0
04/02 15:06 USER1                MYUSER   USER1           LOGOFF                        0
04/02 15:06 USER1                MYUSER   USER1           LOGON                         0 

A couple of the rows in the above output are slightly out of sequence, but the order of the output is close enough for my needs.  The RETURNCODE column is the number associated with the ORA- error code that was returned to the client computer:

  • RETURNCODE=0 indicates success
  • RETURNCODE=1017 indicates bad password
  • RETURNCODE=28000 indicates account is locked out

Reviewing the above output, operating system user USER3 logged on successfully at 5:03 PM and logged off at 5:08 PM.  Operating system user USER1 attempted to log on at 6:31 PM from a computer named CUSER1-LT, but failed due to an invalid password.  By 6:37 PM, that same operating system user and computer combination had entered an incorrect password five times, which resulted in an ORA-28000 account lockout error being returned starting with the sixth logon attempt.

At 2:30 AM, USER5 on a computer named SERVER2 attempted to connect using the same database user account, but was also greeted with an ORA-28000.  The same fate awaited operating system user NETWORK SERVICE (this was a web-based logon attempt) and USER3 at 6:22 AM and 6:23 AM, respectively.  The problem was obviously corrected by 6:26 AM, quite likely through the application of the following SQL statement:

ALTER USER MYUSER ACCOUNT UNLOCK;

If you tried executing the above query, you might have noticed that the query execution required a bit of time.  Enterprise Manager also, apparently, executes a somewhat similar SQL statement every 30 minutes.  You can learn more about the potential performance impact of Enterprise Manager’s use of SYS.DBA_AUDIT_SESSION here.

I again started thinking about how to measure how much more efficient something is when compared to something that is never performed… I suppose that it is important to first determine whether or not the task at hand is important before deciding if not doing something is infinitely more efficient or infinitely less efficient than would be the case if the task at hand were completed.





Monitoring Changes to Table Data

22 03 2012

March 22, 2012

Suppose that you receive a request stating that a particular table in one of your databases must be monitored for changes.  One such table might be the one that lists the ERP system’s suppliers and their addresses – you would not want someone to be able to set up a legitimate supplier, and later have the billing remit to name and address changed without being detected.

What approach would you take to solve the above problem?  A question similar to the above arrived recently in an ERP mailing list email – the original poster (OP) is using a SQL Server RDBMS, so that might change the proposed solution just a bit.

Here is the “How simple is too simple?” suggestion that I offered:

Periodically, create a VENDOR_SAVE table (drop it if it already exists, or delete all rows and re-insert from the original source table):

CREATE TABLE VENDOR_SAVE AS
SELECT
  *
FROM
  VENDOR;

Now, just wait a while.

The question then becomes, how do we detect:

  • A new row (record) added to the original source (VENDOR) table.
  • An old row (record) deleted from the original source (VENDOR) table.
  • A change to any column (program field) in the original source (VENDOR) table since the last time the VENDOR_SAVE table was created/refreshed.

Let’s start with the first two bullet points.  I will write the SQL statements so that the statements should work with Oracle Database 9.0.1 and above, and SQL Server with very few changes, even if I feel a bit ANSI about doing so.  I will use the COALESCE function, rather than the NVL function, and CASE syntax rather than the equivalent DECODE syntax.

To identify cases where a row has appeared in, or disappeared from the original source (VENDOR) table, we can simply perform a full outer join between the original source table and the historical mirror image of the original table (VENDOR_SAVE).  We are only interested in cases where the primary key column (ID) is found in exactly one of the two tables:

SELECT
  COALESCE(V.ID,V2.ID) AS ID,
  COALESCE(V.NAME,V2.NAME) AS NAME,
  COALESCE(V.ADDR_1,V2.ADDR_1) AS ADDR_1,
  CASE WHEN V.ID IS NULL THEN 'VENDOR ADDED' ELSE 'VENDOR DELETED' END AS PROBLEM
FROM
  VENDOR V
FULL OUTER JOIN
  VENDOR_SAVE V2
ON (V.ID=V2.ID)
WHERE
  V.ID IS NULL
  OR V2.ID IS NULL;

So, the above SQL statement satisfies the first two bullet points.  The third bullet point is a little more challenging to accomplish… unless of course we employ UNION labor.  If we have two row sources with identical columns, and UNION the row sources together, the resulting row source will be absent of any entirely duplicated rows from the two original row sources (two rows will be reduced to a single row).  If there were no changes to any of the column values (or if the row was added to or deleted from the original source table), there will be a single row for the primary key column value.  If any columns were changed, there will be two rows containing the primary key column value.

Let’s build a SQL statement that UNIONs the rows from the two tables together, and counts the number of rows for each primary key value:

SELECT
  COUNT(*) OVER (PARTITION BY VL.ID) AS CNT,
  VL.ID,
  VL.NAME,
  VL.ADDR_1
FROM
  (SELECT
    *
  FROM
    VENDOR
  UNION
  SELECT
    *
  FROM
    VENDOR_SAVE) VL;

To complete the requirement for bullet point 3 above, we need to eliminate all rows from the result set where there is a single row for the primary key value:

SELECT DISTINCT
  VL2.ID,
  VL2.NAME,
  VL2.ADDR_1,
  'VALUE CHANGE' PROBLEM
FROM
  (SELECT
    COUNT(*) OVER (PARTITION BY VL.ID) AS CNT,
    VL.ID,
    VL.NAME,
    VL.ADDR_1
  FROM
    (SELECT
      *
    FROM
      VENDOR
    UNION
    SELECT
      *
    FROM
      VENDOR_SAVE) VL) VL2
WHERE
  VL2.CNT>1;

As a final step, we should join the two resultsets into a single resultset using UNION ALL:

SELECT
  COALESCE(V.ID,V2.ID) AS ID,
  COALESCE(V.NAME,V2.NAME) AS NAME,
  COALESCE(V.ADDR_1,V2.ADDR_1) AS ADDR_1,
  CASE WHEN V.ID IS NULL THEN 'VENDOR ADDED' ELSE 'VENDOR DELETED' END AS PROBLEM
FROM
  VENDOR V
FULL OUTER JOIN
  VENDOR_SAVE V2
ON (V.ID=V2.ID)
WHERE
  V.ID IS NULL
  OR V2.ID IS NULL
UNION ALL
SELECT DISTINCT
  VL2.ID,
  VL2.NAME,
  VL2.ADDR_1,
  'VALUE CHANGE' PROBLEM
FROM
  (SELECT
    COUNT(*) OVER (PARTITION BY VL.ID) AS CNT,
    VL.ID,
    VL.NAME,
    VL.ADDR_1
  FROM
    (SELECT
      *
    FROM
      VENDOR
    UNION
    SELECT
      *
    FROM
      VENDOR_SAVE) VL) VL2
WHERE
  VL2.CNT>1
ORDER BY
  ID;

While somewhat limited in usefulness, the above approach will indicate which rows in the original source table should be examined because the column values in those rows changed (or were added or deleted).

Another, potentially more useful approach involves setting up a logging trigger and logging table.  I previously shared a VBS script that helps to automate and standardize the process of creating the logging trigger and logging table.  A very similar feature is built into my Hyper-Extended Oracle Performance Monitor program – but, much like the VBS script, this solution is useless for the OP who uses a SQL Server backend.

What are the other options?  Oracle Database’s built-in auditing capabilities.  Auditing built into the ERP system (this particular ERP system writes all audit/change records to a single table that uses a VARCHAR2 column to capture the before/after images of the values stored in columns).  Any other options?  (The mess that someone can create with a bit of idle time on their hands…)





Tables and Their Indexes Should be Located on Different Disks – What is Wrong with this Quote?

17 02 2012

February 17, 2012

As I mentioned in a previous blog article, I placed an order for the book “Oracle Database 11gR2 Performance Tuning Cookbook“, mostly because I was curious how the book would compare with the “Oracle Database 11g Performance Tuning Recipes” book (my review of that book was a total of about 35 typewritten pages in length), and some of the other Oracle performance related books that I have reviewed.

After placing my order for the book, I took a quick look at a couple of the book’s preview pages on Amazon, downloaded the sample chapter from the book publisher – chapter 10 “Tuning I/O”, and spent a bit of time reading the sample chapter (making comments in the margins of the printed pages).  I circled the following paragraph (sorry for the long quote – I want to make certain that the context of the quote is not lost), found on page 422 of the book (page 17 in the PDF download):

“We can also distribute tables and related indexes on different disks, to obtain performance gain in both read and write operations. If we have tables and indexes on the same disk, we need to read and write in two different places on the same disk. By dividing the work between two disks, we can perform an index range scan faster than when the index and the table are on the same disk.”

What, if anything, is wrong with the above quote from the book?  For some reason, the acronym MASE came to mind when I read the above paragraph. 

The point of blog articles like this one is not to insult authors who have spent thousands of hours carefully constructing an accurate and helpful book, but instead to suggest that readers investigate when something stated does not exactly match what one believes to be true.  It could be that the author “took a long walk down a short pier”, or that the author is revealing accurate information which simply cannot be found through other resources (and may in the process be directly contradicting information sources you have used in the past).  If you do not investigate in such cases, you may lose an important opportunity to learn something that could prove to be extremely valuable.





SESSIONS Derived Value Formula – Doc Bug Explains

8 02 2012

February 8, 2012

Default parameter values are great, as long as it is possible to predict the default values, and the defaults are appropriate for the environment in which the defaults are present.  It is sometimes a challenge to remember all of the rules, and exceptions to those rules, that determine the defaults. I had a vague recollection of how the value of the SESSIONS parameter value is derived from the value of the PROCESSES parameter value.  An Oracle Database 11.2.0.2 question recently arrived in my email box from an ERP mailing list.  The original poster (OP) showed the following output in the email:

SQL> show parameter sessions

NAME                                 TYPE        VALUE
------------------------------------ ----------- -----
sessions                             integer     624

SQL> show parameter processes

NAME                                 TYPE        VALUE
------------------------------------ ----------- -----
processes                            integer     400 

At first glance, I thought that the OP made a mistake with the manual specification of the SESSIONS parameter value when taking into consideration the current value of the PROCESSES parameter value.

A Google search found a couple of helpful discussion threads that seemed to provide a formula that showed how the default value for the SESSIONS parameter is derived.  I decided to also perform a Google search of the Oracle documentation (trying to avoid circular references).   The same formula found in the helpful discussion threads is also included in the documentation, so that lends a bit of credibility to the information.  Let’s take a look at the Oracle Database 11.2 documentation for the SESSIONS parameter:

Default Value: Derived: (1.1 * PROCESSES) + 5

“… You should always set this parameter explicitly to a value equivalent to your estimate of the maximum number of concurrent users, plus the number of background processes, plus approximately 10% for recursive sessions.

Oracle uses the default value of this parameter as its minimum. Values between 1 and the default do not trigger errors, but Oracle ignores them and uses the default instead.”

A second page from the documentation library included additional information about the PROCESSES parameter:

“Oracle uses the default value of this parameter as its minimum. Values between 1 and the default do not trigger errors, but Oracle ignores them and uses the default instead…”

If you plan on running 50 user processes, a good estimate would be to set the PROCESSES initialization parameter to 70.”

So, let’s take another look at the PROCESSES value specified by the OP, keeping in mind the information from the documentation.  The OP has a database instance with a PROCESSES value of 400, and with the insight provided by the first documentation quote, that would suggest that the SESSIONS parameter would have been autoset to a value of (1.1 * 400) + 5 = 445.  That formula result coupled with the first paragraph of the second quote might lead someone to believe that the OP manually set the SESSIONS value to 624 – there could be a good reason for doing so.  The second paragraph in the second quote from the documentation suggests that OP’s PROCESSES value would allow about (400 – (70 – 50)) =  380 user processes  (each session will require 1 dedicated process, so that value would permit about 380 user sessions).

In an earlier blog article we saw that the auto-tuned value of the DB_FILE_MULTIBLOCK_READ_COUNT parameter is dependent, in part, on the value of the SESSIONS parameter and the size of the SGA.  In another blog article we found that the maximum number of blocks read by serial direct path read is the greatest power of 2 that is equal to or less than the value of the DB_FILE_MULTIBLOCK_READ_COUNT parameter.  Various other items are also affected by the SESSIONS parameter value, so setting the value to too high of a value has few rewards.

So, did the OP make a mistake when setting the SESSIONS parameter value?  I took a look at a test 11.2.0.2 database instance where I had set the PROCESSES parameter to a value of 150 and left the SESSIONS parameter unset.  The SESSIONS parameter value in the 11.2.0.2 database instance showed a value of 248 when I opened the database and executed:

SHOW PARAMETER SESSIONS

Something a bit stange is happening.  Oracle Database 10.2 seemed to respect the 11.2 documentation’s formula for the auto-calculation of the SESSIONS parameter, but Oracle Database 11.2.0.2 seems to use a different formula.  It was late at night when I posted my reply, so I guessed that the formula was about (1.52 * PROCESSES) + 20 (it was too late at night for thinking in terms of basic algebra, so I just picked a set of numbers that worked with my test PROCESSES and SESSIONS parameter values).

Since responding to the email, I took a sampling of a couple of 11.2.0.2 database instances, and summarized those findings below, including the actual derived SESSIONS value and the calculated value using the formula from the documentation:

PROCESSES SESSIONS
(Actual Derived)
SESSIONS
(Documentation Derived)
330 520 368
150 248 170
400 624 445
60 112 71
80 144 93

The actual numbers don’t quite match what the documentation suggests.

Time for a little basic algebra.  The formula from the documentation could be represented as:

(x * PROCESSES) + y = SESSIONS

If the structure of the above formula is still correct, just with unknown values for x and y, we will need at least two sets of equations to determine the value of the x and y variables.  To find the x and y variable values, we will use the known PROCESSES and actual derived SESSIONS values from first two rows of the above table:

(x * 330) + y = 520
(x * 150) + y = 248

Solving for y in the first equation:

y = 520 - (x * 330)
y = 520 - 330x

Substituting the above value of y into the second equation to solve for x:

(x * 150) + (520 - 330x) = 248
150x + 520 - 330x = 248
-180x = -272
x = 272/180
x = 1.51111111

Plug the value for x into the above equation for y:

y = 520 - 330x
y = 520 - 330*(1.51111111)
y = 21.3333337

So, based on the values from the first two rows from the above table, in Oracle Database 11.2.0.2:

SESSIONS = (1.51111111 * PROCESSES) + 21.3333337

The above formula works for the first, second, and fourth rows in the above table, but the formula is a bit off for the remaining two rows:

PROCESSES SESSIONS
(Actual Derived)
SESSIONS
(Documentation Derived)
   Formula 1
330 520 368   520
150 248 170   248
400 624 445   625.77778
60 112 71   112
80 144 93   142.22222

Let’s try solving the formula again using the values from the two rows that did not agree with the new formula for the derived SESSIONS parameter value:

(x * 400) + y = 624
(x * 80) + y = 144

Solving for y in the first equation:

y = 624 - (x * 400)
y = 624 - 400x

Substituting the above value of y into the second equation to solve for x:

(x * 80) + (624 - 400x) = 144
80x - 400x + 624 = 144
-320x = -480
x = 480/320
x = 1.5

Plug the value for x into the above equation for y:

y = 624 - 400 * 1.5
y = 624 - 600
y = 24

The above result leads to the following formula:

SESSIONS = (1.5 * PROCESSES) + 24

Let’s take a look at the values generated by the two formulas side-by-side with the actual derived SESSIONS values:

PROCESSES SESSIONS
(Actual Derived)
SESSIONS
(Documentation Derived)
  Formula 1 Formula 2
330 520 368   520 519
150 248 170   248 249
400 624 445   625.77778 624
60 112 71   112 114
80 144 93   142.22222 144

Interesting results… Doc Bug?  Why three formulas (plus the guess that I wrote in the email)?  No wonder it is sometimes difficult to remember the default derived values for parameters.

What about this suggestion from the documentation:

“If you plan on running 50 user processes, a good estimate would be to set the PROCESSES initialization parameter to 70.”

That suggestion might also be out of date.  From a generic test 11.2.0.2 Standard Edition database instance running on Windows:

SELECT
  COUNT(*)
FROM
  V$SESSION
WHERE
  USERNAME IS NULL;

COUNT(*)
--------
      27

SELECT
  PROGRAM
FROM
  V$SESSION
WHERE
  USERNAME IS NULL
ORDER BY
  PROGRAM;

PROGRAM
-----------------
ORACLE.EXE (ARC0)
ORACLE.EXE (ARC1)
ORACLE.EXE (ARC2)
ORACLE.EXE (ARC3)
ORACLE.EXE (CJQ0)
ORACLE.EXE (CKPT)
ORACLE.EXE (DBRM)
ORACLE.EXE (DBW0)
ORACLE.EXE (DIA0)
ORACLE.EXE (DIAG)
ORACLE.EXE (GEN0)
ORACLE.EXE (J000)
ORACLE.EXE (J001)
ORACLE.EXE (LGWR)
ORACLE.EXE (MMAN)
ORACLE.EXE (MMNL)
ORACLE.EXE (MMON)
ORACLE.EXE (PMON)
ORACLE.EXE (PSP0)
ORACLE.EXE (Q001)
ORACLE.EXE (Q002)
ORACLE.EXE (QMNC)
ORACLE.EXE (RECO)
ORACLE.EXE (SMCO)
ORACLE.EXE (SMON)
ORACLE.EXE (VKTM)
ORACLE.EXE (W000) 

Interesting results… Doc Bug?  So this generic Standard Edition database has 27 background processes.  Maybe this page in the documentation needs a little work also?  Ah, Doc Bug explains it.





The Transforming Face of the Oracle Support Site

23 12 2011

December 23, 2011

A day or two ago I saw an announcement that the HTML (non-Flash) version of the Oracle Support site was to be phased out in January 2012.  It seems like the last time I tried to use that site, the search functionality did not quite work for Oracle Database products; I gave up on the HTML version of the site and went back to the Flash-based version of the site. 

The performance of the Flash-based version of Metalink (I usually navigate to the site by entering metalink.oracle.com into the web browser’s address bar) My Oracle Support has improved significantly since the initial launch, and it appears that once the website is loaded, it is a bit more stable now.  I do not own any i products (iPhone, iPad, iToy) which do not support Flash, so the Flash-based version of the Oracle support site works OK on most of my devices (Windows 7 64 bit with IE 9, Windows 7 32 bit with IE 8, Windows XP 32 bit, Motorola Xoom, Amazon BlackBerry Playbook, etc.) as long as Flash is supported in the web browser.  Now the shocker, the Flash-based version of the support site is also planned to be phased out, based on Metalink (MOS) Doc ID 1385682.1.

Turning the other cheek, or doing an about-face?  The support document mentions that ADF Faces will be used – the site will be built using Oracle Application Development Framework.  I hate to say that this is a case of Oracle eating its own dog food – maybe there is a better way to state that the support site will be using their own technology rather than the technology of a third party?  I wonder if the Oracle OTN forums are also being redesigned to use Oracle technology, rather than that of a third party?  On a related note, it appears that the OTN forum site has fixed the Internal Server Error problem.

Anyone found any additional information about the change to the support site?  Any experience with ADF Faces?