Hyper-Extended Oracle Performance Monitor 7.0 Beta

16 08 2015

August 16, 2015

Three and a half years ago I offered the Hyper-Extended Oracle Performance 6.0 Beta for download – that Beta version expired a year later.  This program has been somewhat of a pet project for the last 13 years (almost 10 years at the time of the previous Beta offering), so I was excited to read some of the feedback about the 6.0 Beta.  The minimal feedback had me wondering for a while, then the pet project was pushed to a dark corner for roughly three years.  I taught the pet a few new tricks on Windows 10, so I thought that I make the latest version available.  Unlike the previous Beta, Beta 7.0 does not have an expiration date.

NOTE August 17, 2015: The first two download links are working now – Wordpress was blocking the downloads.  Note that the first two downloads are actually compressed .zip files with a .doc extension – inside those .zip files are the files that are of interest.  Right-click the files, save the files to your computer, then rename the files to end with .zip.

  1. Program download (right-click Save As, then change the .doc extension to .zip): Hyper-ExtendedOraclePerformanceMonitor7.zip
  2. Documentation download (right-click Save As, then change the .doc extension to .zip – still incomplete, see old docs): Hyper-Extended Oracle Performance Monitor7.doc
  3. Old Documentation download: Hyper-Extended Oracle Performance Monitor3.doc
  4. Update August 20, 2015: If you have an old computer, you may need to put the MSCOMCTL.OCX file in your C:\Windows\SysWOW64 folder (on 64 bit Windows, or C:\Windows\System32 on 32 bit Windows).  You will then need to register (process) the file with REGSVR32 – see the command in the instructions below.  A recent version of MSCOMCTL.OCX may be downloaded here: MSCOMCTL.OCX (save the file, then rename the file as MSCOMCTL.OCX).

Requirements:

Windows 2000 through Windows 10, 32 bit Oracle Client, MDAC/ADO 2.8 (or greater – preinstalled starting with Windows XP), connection to the Oracle database by database SID (tnsnames.ora) using Oracle’s Oracle Provider for OLE DB (Oracle’s Oracle ODBC Driver is required for connections that are initiated by SYS – primarily for access to the Advanced Init Parameter functionality), Excel 2000-2013, Microsoft Grid control (provided in the download – put into the program’s folder), TIMED_STATISTICS set to TRUE. Most features require DBA permissions (SYSADM or SYS), or specific grants to views or packages. For example, sessions that use the DBMS Xplan and Trace functionality must have permission to execute ALTER SESSION, have execute permission on the DBMS_XPLAN package, and select permissions on V$SQL, V$SQL_PLAN, and V$SQL_PLAN_STATISTICS.  Configuring session tracing requires execute permission on the DBMS_SYSTEM package and/or the DBMS_MONITOR package.

Description:

The Hyper-Extended Oracle Performance Monitor provides a variety of functions to simplify working with and performance monitoring Oracle databases. Various types of simple data dictionary and database instance related reports may be generated in Excel by simply putting a check in a box on the program’s main window, and clicking the Report button. Generating DBMS Xplan output (using the raw information from a 10046 level 4/12 trace) with optional 10046/10053/other traces, creating data change logging triggers using the data dictionary, peeking at the hidden and non-hidden initialization parameters, and access to a quick Oracle keyword search are also a click away on the program’s main window. Performance monitoring and tracing tools for checking a cross-reference of the time model and wait event statistics (at the system and session levels), checking a cross-reference of the system statistics and wait event statistics (with drill-down to the session level), reviewing potentially high-load SQL statements and their execution plans, cross-referencing the current session waits with enqueues, and enabling various types of Oracle traces are also a click away on the program’s main window. Permanently recording of performance monitoring statistics in a Microsoft Access compatible database (Microsoft Access is not required) is also a click away on the program’s main window.

General Command Line Parameters:

  • -D   The Database instance SID to which the program should connect.
  • -U   The user name to be used for connecting to the database instance.
  • -P   The password to be used for connecting to the database instance.

Logging-Specific Command Line Parameters:

  • -LC 20   Specifies Force a Log Capture when CPU Usage Exceeds value to 20%
  • -LI 30   Specifies Force a Log Capture if No Log Captured in Minutes value to 30 minutes
  • -LB   Specifies the Force a Log Capture when a Blocking Lock is Detected value to checked
  • -LW  Specifies the Force a Log Capture when a Wait Reason is Detected value to checked
  • -LR  Specifies the Capture SQL Execution Statistics for Wait Reasons value to checked
  • -LD   Specifies the Capture Segment Change Statistics value to checked
  • -LO   Specifies the Capture Operating System and Time Model Statistics value to checked
  • -LH   Specifies the Capture High Load SQL Statement Statistics value to checked
  • -LT   Specifies the Capture High Load SQL Statement Text value to checked
  • -LP   Specifies the Capture High Load SQL Statement Plan value to checked
  • -LHC 60   Species the minimum CPU time that is considered high load to 60 seconds accum.
  • -LHE 90   Species the minimum elapsed time that is considered high load to 90 seconds accum.
  • -LS   Specifies that Smart Logging should begin as soon as the login completes
  • -LE 240   Specifies that Smart Logging should end after 240 minutes
  • -LQ   Specifies that the program should quit (end) when logging ends

Using the above command line parameters, you could create a program shortcut for each of the databases that you administer.  For example, create a shortcut, and set its Target to (assuming that the program is on the root of the F: drive, the Oracle SID is OR1122P, the username is myuser, the password for that account is pword, and the smart logging should default to logging when 10% of CPU utilization is observed):

"F:\Hyper-Extended Oracle Performance Monitor.exe" -D OR1122P -U myname -p pword -LC 10

What’s New:

The Hyper-Extended Oracle Performance Monitor runs on Windows 10, even though the touchpad on the Sony laptop sometimes stops working until the computer is put to sleep and then awoken.  The program should also work with Oracle Database 12c (and might still be compatible with Oracle Database 8i, although the Time Model Viewer will not work).

Hyper7HyperExtendedMainScreen

The Time Model Viewer main window is still essentially the same, showing the hierarchy of the time model statistics, with color-coded session-level time model statistics (color ranges from yellow to red to indicate the percentage of the total represented by the session).  The bottom of the window shows the system-wide wait events, and the sessions that contributed to those system level wait events:

Hyper7TimeModelViewer

When the Time Model Viewer is active, an Excel spreadsheet is created that shows the graphic history of the statistics for the 20 most recent time capture periods – if you find this view helpful, you might want to widen the DB Time and CPU chart:

Hyper7TimeModelViewerExcelCharts

The statistics also appear in numerical form on the Statistics tab in the Excel spreadsheet:

Hyper7TimeModelViewerExcelStatistics

The Real-Time Monitor is also essentially the same as in previous Beta versions:

Hyper7RealTimeMonitor

Like the Time Model Viewer, an Excel spreadsheet is created showing the graphical history of the statistics.  Which statistics?  Click a wait event or one of the white statistics boxes to add that statistic to the list for which charts are created:

Hyper7RealTimeMonitorExcelCharts

Previous Beta versions of the program wrote out a tab delimited file containing the statistic deltas after every 30 time capture periods (this version also creates such a file).  The Beta for version 7 writes those statistics to Excel after each time capture period to facilitate the chart creation:

Hyper7RealTimeMonitorExcelStatistics

Oracle’s OLEDB provider is used for connectivity, see this article for a description of how to fix the bug in the Oracle Database 11.2.0.3 Client installer that is related to the OLEDB provider (the problem may have been corrected in later releases).  I have not yet found a method for SYS to login using Oracle’s OLEDB provider when “AS SYSDBA” must be specified for the login to complete.  As such, Oracle’s ODBC driver is used when SYS tries to login and the login attempt using the OLEDB provider fails.  Oracle’s ODBC driver is also used when viewing the normally hidden parameters using the SYS login.  Because there could be multiple Oracle homes, you must create a 32 bit ODBC System DSN using the Oracle ODBC driver – name that DSN HYPEREXTEND:

Hyper7Login

Hyper7ODBC1

Hyper7ODBC2

Hyper7ODBC3

The ODBC connection driver information is used to login as the SYS user (AS SYSDBA) to view the Advanced Initialization Parameters:

Hyper7AdvancedInitParameters

Hyper-Extended Oracle Performance Monitor 7.0 Beta is also more compliant with restrictions placed on the computer by User Account Control (User Access Control), which prohibits programs from create folders in the root of the C:\ drive and writing files into that folder.  Version 7 instead creates the OracleLog folder in the current user’s profile (typically C:\Users\username\OracleLog\), and separates the files created by database.  The picture below shows a couple of Microsoft Access compatible logging database that were created, as well as some of the tab-delimited statistic files, and various other generated files:

Hyper7LoggingFolder

As mentioned above, the documentation is still incomplete, with several pages of the documentation containing nothing but pictures.  It takes a lot of effort and time  to put the documentation together, so I will try to piece it together over the next couple of months.  You might be able to refer to the older documentation if the pictures are not self-explanatory.  If you find the program useful, please leave a note.  If you need help understanding a program window or the generated output, leave a comment attached to this blog article and I will try to help.





Install the Oracle Client on a Synology DiskStation DS1813+ or DS412+ for Nagios to Monitor Databases

26 07 2013

July 26, 2013

(Back to the Previous Post in the Series) (Forward to the Next Post in the Series)

In the previous article of this series I provided steps to install and run the Nagios network monitoring utility on either a Synology DiskStation DS1813+ or a DS412+ for the purpose of pinging network devices to verify that the devices respond to ping requests.  That article may have seemed a bit out of place on a blog that is primarily intended to include notes about using Oracle Database.  In the previous article I hinted that there is a check_oracle plugin for Nagios, suggesting that a Synology DiskStation DS1813+ or DS412+ could be used to monitor Oracle databases.

This blog article does NOT use the check_oracle plugin.  There are a small handful of official Nagios plugins for Oracle Database, but this article does not use any of those plugins either.  Instead, I took a chance at crafting a custom Perl script to monitor Oracle databases, the first time I have ever used Perl.  I thought back to some of the earlier blog articles where I introduced a couple of Oracle Database monitoring scripts that were written in VBScript, including Working with Oracle’s Time Model Data 3 and Oracle Statistics Chart Viewer – I started seeing the possibilities of using Nagios to alert DBAs about potential issues.  Yet, I have never used Perl, and at the time did not even have a clue how to concatenate strings in the language (it turns out that there are more than two techniques).

Verify that Perl is installed on the DiskStation by accessing the Package Center in the DiskStation’s DSM web interface.  If there is an Install button under the Perl heading, that indicates that Perl is not yet installed – click the Install button and wait until that button changes to Installed.

InstallOracleClient-1

Let’s start by installing the Oracle Client on the DiskStation.  I will use the Oracle Instant Client version 11.2.0.3 (the 12.1.0.1 Instant Client throws a different error message than did the 11.2.0.3 and 11.1.0.7 clients, so I reverted back to the 11.2.0.3 version).  To download the Oracle Instant Client for Linux, visit this link.  You will need an OTN account to download the client files – I downloaded the files using a desktop computer, and then transferred the files to a custom created Config share that I set up on the DiskStation.  This article requires the Basic Instant Client package and the SQL*Plus add-on package for the Instant Client.

Connect to the DiskStation using Telnet as the root user (see the previous article for Telnet connection directions).  For consistency, the .zip files for the Oracle Instant Client will be copied to the downloads directory that was created in the previous article.  We will create an oracle directory in the downloads directory:

mkdir /volume1/downloads/oracle/

Next, the downloaded files for the Oracle Instant Client are copied from the config share that I created on the DiskStation (to allow transporting files from a desktop computer) to the /volume1/downloads/oracle directory that was just created, and then the Oracle Instant Client files are unzipped, which automatically creates the instantclient_11_2 directory:

cp /volume1/config/instantclient-basic-linux-11.2.0.3.0.zip /volume1/downloads/oracle/
cp /volume1/config/instantclient-sqlplus-linux-11.2.0.3.0.zip /volume1/downloads/oracle/
cd /volume1/downloads/oracle/
unzip -o instantclient-basic-linux-11.2.0.3.0.zip
unzip -o instantclient-sqlplus-linux-11.2.0.3.0.zip
cd instantclient_11_2

The Oracle Instant Client 11.2.0.3 (and 11.1.0.7) require a file named libaio.so.1 that is not installed by default on the DiskStation.  After searching the Internet for a while, I found a copy of that file in the /lib directory on a Red Hat Enterprise Linux 3 server.  Using a desktop computer, I copied that file to the DiskStation’s config share, as well as a copy of the tnsnames.ora file that includes the database connection information for the databases that must be monitored.  The libaio.so.1 file will be placed in the /volume1/downloads/oracle/instantclient_11_2 directory, and the tnsnames.ora file will be placed in the /etc directory (one of the directories that is automatically searched).

cp /volume1/config/libaio.so.1 .
cp /volume1/config/tnsnames.ora /etc

Let’s start up SQL*Plus:

./sqlplus /nolog
./sqlplus: error while loading shared libraries: libsqlplus.so: cannot open shared object file: No such file or directory

An error…

ls
BASIC_README    adrci   glogin.sql   libclntsh.so.11.1  libocci.so.11.1  libocijdbc11.so  libsqlplusic.so  ojdbc6.jar  sqltest.sql  xstreams.jar
SQLPLUS_README  genezi  libaio.so.1  libnnz11.so        libociei.so      libsqlplus.so    ojdbc5.jar       sqlplus     uidrvci

The file mentioned in the error message exists in the correct directory.  That error is caused by an undefined environment variable.  Fixing the error and trying again:

LD_LIBRARY_PATH="/volume1/downloads/oracle/instantclient_11_2"
export LD_LIBRARY_PATH
./sqlplus /nolog

SQL*Plus: Release 11.2.0.3.0 Production on Fri Jul 26 13:22:16 2013

Copyright (c) 1982, 2011, Oracle.  All rights reserved.

SQL>

Try to connect to one of the databases that are defined in the tnsnames.ora file, and then try executing a simple SQL statement before exiting SQL*Plus (change testuser, password, and DBName as appropriate for your environment):

CONNECT testuser/password@DBName
Connected.
SELECT SYSDATE FROM DUAL;

SYSDATE
---------
26-JUL-13

EXIT

Now that we have verified that the Instant Client version of SQL*Plus works from the DiskStation, we will create a simple SQL script to verify that the Instant Client version of SQL*Plus is able to accept script names from a command line.  The script will be created in the same directory where the Oracle Instant Client is located (see the basic directions for using vi, as found in the previous article):

vi /volume1/downloads/oracle/instantclient_11_2/sqltest.sql

The first line in the script instructs SQL*Plus to abort the execution of the script when an error is encountered; if the connection attempt fails, there is no point in attempting to execute any SQL statements that follow.  The second line connects to the database (change testuser, password, and DBName as appropriate for your environment).  The third line executes a simple SQL statement, selecting from a table (rather than the virtual table DUAL) – change the SQL statement to a valid statement for your database.  The last statement exits SQL*Plus once the SQL statement finishes executing:

WHENEVER SQLERROR EXIT SQL.SQLCODE
CONNECT testuser/password@DBName
SELECT ID, DESCRIPTION FROM T1 WHERE ID LIKE '8X%';
EXIT

Try using SQL*Plus to execute the script:

./sqlplus /nolog @/volume1/downloads/oracle/instantclient_11_2/sqltest.sql

The results from the SQL statement should display in the Telnet window, and then the normal DiskStation prompt should appear.  If the database instance was not running at the time of the execution, you would see something like this rather than the results of the SQL statement:

ERROR:
ORA-01034: ORACLE not available
ORA-27101: shared memory realm does not exist
Process ID: 0
Session ID: 0 Serial number: 0

Nagios apparently expects its plugins to return one of four exit codes, as explained in this article, to determine whether or not the checked object is behaving as expected:

Exit Code Status
        0 OK
        1 WARNING
        2 CRITICAL
        3 UNKNOWN

The exit function in Perl permits returning a number, so that is one problem solved.  I want the Perl plugin to accept a command line parameter from Nagios so that I am able to check more than one database using the same Perl script – the Nagios command line parameter will specify the script to execute.  This article explains how to receive a command line parameter from Nagios, so that is a second problem that is solved.

Let’s put together a simple (OK, not simple for me) script that executes a SQL*Plus script that is specified by Nagios (the sqltest.sql file that was just created), and then returns 0 if no ORA- type error message is returned when the Oracle Instant Client’s SQL*Plus executes a script.  If an ORA- type error message is returned, then the script’s exit code is set to 2 and the ORA- errors are returned to Nagios.  Nagios apparently only accepts one line of output from the script, reading what is written by Perl’s print command.  As we saw earlier, the script may return multiple nested ORA- type errors if the database instance is down, so we need these multiple error messages to appear on one line of output from the Perl script.

vi /opt/libexec/check_oracle.pl

This is the script that I crafted after spending, quite literally, hours days searching the Internet for Perl command syntax that would execute on the DiskStation:

#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Long qw(:config no_ignore_case);
my $output = "";
my ($script);

my $result = GetOptions(
 "s|script=s"         => \$script,
  );

$ENV{"LD_LIBRARY_PATH"} = "/volume1/downloads/oracle/instantclient_11_2/";

my $ExitValue = 0;
print "Test Script: " . $script . ":  ";
open my $in, "/volume1/downloads/oracle/instantclient_11_2/sqlplus /nolog @/volume1/downloads/oracle/instantclient_11_2/$script |";

while (my $line = <$in>)
  {
  if ($line =~ /^(ORA-\d{5})/)
    {
    #$line =~ s/\v//g;
    chomp($line);
    $output = $output . $line . " ; ";
    $ExitValue = 2;
    }
  }

close($in);
print $output;
exit($ExitValue);

The script defines the LD_LIBRARY_PATH environment variable, so the Nagios user should (hopefully) have no problems executing the Perl script.  There are apparently a half-dozen different ways to remove the end of line characters from the SQL*Plus output, chomp($line); worked for my test, but $line =~ s/\v//g; did not work quite as expected.

Save the script and exit vi.  Next, we need to make the Perl script executable:

chmod +x /opt/libexec/check_oracle.pl

In the earlier article we had to modify a couple of the Nagios configuration files – this time we will need to modify some of the same files, starting with the commands.cfg file.

vi /opt/etc/objects/commands.cfg

Locate the ‘check_ping’ command definition that we modified in the earlier article.  Below the } character for that command definition, add the following, which will tell Nagios that our Perl script exists, and that the script command should be recognized by Nagios as check_oracle_ch:

# check_oracle_ch command definition by Charles Hooper
define command{
        command_name    check_oracle_ch
        command_line    $USER1$/check_oracle.pl -s $ARG1$
        }

Save the commands.cfg file and exit vi.  Let’s create another Nagios configuration file that will be used to list the Oracle Databases (and their scripts) that will be checked:

vi /opt/etc/objects/oracle.cfg

On a new line in the oracle.cfg file, add the following:

define service{
        use                     generic-service ; Inherit values from a template
        host_name               server123       ; This is a server name that is defined in the server.cfg file
        service_description     CHECK_DB_TEST   ; A unique name given to the server, database, and script combination
        check_command           check_oracle_ch!sqltest.sql    ; The command definition that was added to the commands.cfg file followed by ! and the name of the SQL script to execute
        normal_check_interval   5               ; Execute the script every 5 minutes under normal conditions
        retry_check_interval    1               ; Re-execute the script after receiving a return code of 2, every minute until its final/hard state is determined
        }

Additionally, we need to instruct Nagios that there is now an additional configuration file to read when starting.  Save the oracle.cfg file and exit vi.  Execute the following command:

vi /opt/etc/nagios.cfg

Below the cfg_file=/opt/etc/objects/switch.cfg line in the file (added in the previous article), add the following line:

cfg_file=/opt/etc/objects/oracle.cfg

Save the nagios.cfg file and exit vi.  Since we have modified the Nagios configuration, we must restart Nagios.  Find the first Nagios process ID, then kill that process (as was described in the previous article):

ps

kill 2683

Verify that Nagios does not return an error message when reading the configuration files:

/opt/bin/nagios -v /opt/etc/nagios.cfg

If no errors are reported, then start Nagios:

/opt/bin/nagios -d /opt/etc/nagios.cfg

Nagios should be writing log entries to a file name nagios.log (as defined in the nagios.cfg file).  The tail command permits viewing the last few (100 in this case) lines from a specified file:

tail -n 100 /opt/var/nagios.log

If Perl is not installed, Nagios may send an email with the following in the Additional Info section:

(Return code of 127 is out of bounds - plugin may be missing)

If the database instance is down, Nagios should send an email, possibly with the following in the Additional Info section (note that the script name is listed first – a different script will be used to test each database):

Test Script: sqltest.sql:  ORA-01034: ORACLE not available : ORA-27101: shared memory realm does not exist :

When the database instance is running, and the SQL statement(s) in the script executes successfully for the first time, Nagios should send an email with just the test script listed in the Additional Info section:

Test Script: sqltest.sql:

Now that the we have confirmed that Nagios and the Perl script work OK together, create additional SQL scripts in the /volume1/downloads/oracle/instantclient_11_2/ directory to connect to the other databases to be monitored (strongly consider changing the permissions for these files, since the files include usernames and passwords for your databases).  For each of those SQL scripts, create an additional service entry in the oracle.cfg file.  For example, if a script named sqltest42.sql is created to test a database on server42, the service entry might be created as shown below:

define service{
        use                     generic-service ; Inherit values from a template
        host_name               server42        ; This is a server name that is defined in the server.cfg file
        service_description     CHECK_DB_TEST42 ; A unique name given to the server, database, and script combination
        check_command           check_oracle_ch!sqltest42.sql  ; The command definition that was added to the commands.cfg file followed by ! and the name of the SQL script to execute
        normal_check_interval   5               ; Execute the script every 5 minutes under normal conditions
        retry_check_interval    1               ; Re-execute the script after receiving a return code of 2, every minute until its final/hard state is determined
        }

Don’t forget to verify the nagios configuration before restarting Nagios.

Anyone up to the challenge of rewriting my Working with Oracle’s Time Model Data 3 script in Perl so that the script will work on the DiskStation?





Install Nagios on a Synology DiskStation DS1813+ or DS412+

19 07 2013

July 19, 2013 (Modified July 27, 2013, July 28, 2013, November 19, 2013, August 12, 2015)

(Forward to the Next Post in the Series)

—-

Update July 27, 2013:

  • I now have Nagios running on an ARM based Synology DiskStation DS212+.  Most of the steps are the same as outlined below, however there are a few additional errors that must be addressed (see below additional steps).
  • All of the ./configure commands should have included –prefix=/opt (rather than –prefix=/usr/local or completely omitting that parameter).  That change eliminates the need to copy the Nagios plugins to the correct location.  Possibly related, the -i parameter was unnecessary for the snmp and Nagios plugins make and make install commands when the ./configure command included the –prefix=/opt prefix.
  • The wget http://sourceforge.net/projects/dsgpl/files/DSM%204.1%20Tool%20Chains/Intel%20×86%20Linux%203.2.11&#8230; download step for the gcc compiler is apparently unnecessary, at least on the Synology DiskStation DS212+ (see below).

—-

This article describes how to compile and run Nagios on a Synology DiskStation DS1813+ (64 bit) or Synology DiskStation DS412+ (32 bit, the 32 bit steps should also apply to the DS1812+) NAS, both of which utilize Intel Atom processors (cat /proc/cpuinfo indicates that the DS412+ is using a 2.13GHz Atom D2700, while the DS1813+ is using a 2.13GHz Atom D2701), and utilize the DSM 4.2 operating system.  Not all Synology DiskStation NAS devices use Intel based CPUs – some of the less expensive DiskStations use ARM type processors (see this link to determine the type of CPU installed in a specific DiskStation).  It may be possible to produce a working version of Nagios on NAS devices that do not have Intel 32 bit or 64 bit processors, but I have not yet fully tested the procedure.

Warning: A lot of what follows is based on experimentation, with the end goal of having Nagios running on a Synology DiskStation having the ability to ping devices on the network or the Internet, with an email sent to an administrator when a device stops responding to ping requests, and to send a second email when the device resumes responding to ping requests.  This functionality represents a small fraction of Nagios’ capabilities through the use of plugins.  File paths vary from one Linux distribution to the next, so that adds a bit of challenge to make certain that the files are placed in the required directory.  Copying a file to the wrong directory may temporarily disable the DiskStation and require the reinstallation of the Synology DSM operating system.  The directions below are not final, and quite likely do not represent the most efficient approaches to accomplish the end goal – but the directions will hopefully be “close enough to correct” to allow the average reader of this blog to ping and send email alerts from a DiskStation.

I have relied on the free Nagios network monitoring solution since 2002 to provide an early warning of problems associated with network attached equipment including servers, production floor computers, switches, printers, wireless access points, IP cameras, Internet connection stability, etc.  While I rely on Nagios’ alerting system, I am not an expert at configuring the Nagios network monitoring system; the Nagios configuration documentation may be downloaded here.

First, make certain that the Telnet Service (or SSH Service if that is preferred) is enabled on the DiskStation.  In the DiskStation’s Control Panel, click Terminal.

InstallNagiosDiskStation1

Place a checkmark next to Enable Telnet service (if the item is not already checked), and then click the Apply button.

InstallNagiosDiskStation2

Verify that the computer that you intend to use has a Telnet client.  For Windows 7, access the Programs link in the Control Panel, and then click the Turn Windows features on or off link.  Make certain that there is a checkmark next to Telnet Client, then click the OK button.

InstallNagiosDiskStation3

Open a command line (in Windows, Start – Run – type  cmd  and press the Enter key).  On the command line, type telnet followed by either the name of the DiskStation or the IP address of the DiskStation, then press the Enter key.  When prompted for a username, type root and press the Enter key.  Type the admin user’s password (that is used to access the DSM interface in a web browser) and press the Enter key.

InstallNagiosDiskStation4

The command line on the DiskStation is very similar to the command line on a Unix or Linux computer, and is somewhat similar to a Windows command line or MS-DOS command line (use / rather than \, use ls rather than dir, use vi rather than edit):

InstallNagiosDiskStation5

We first need to add ipkg support to the DiskStation, detailed directions may be viewed at this link.  The exact directions may be different for other DiskStation models, but the following directions work for both the DS1813+ and DS412+ (note that all files downloaded from the Internet will be placed on volume1 in the downloads directory – copy and paste the lines to the Telnet session, one line at a time):

cd /volume1
mkdir downloads
cd downloads
wget http://ipkg.nslu2-linux.org/feeds/optware/syno-i686/cross/unstable/syno-i686-bootstrap_1.2-7_i686.xsh
chmod +x syno-i686-bootstrap_1.2-7_i686.xsh
sh syno-i686-bootstrap_1.2-7_i686.xsh

The vi editor is used on the DiskStation to modify files; that vi editor is a bit challenging to use at first sight, so you may need help with a couple of basic commands (see this quick reference for other commands).  The commands in vi are case sensitive (i is not the same as I).  When a file is opened, press the i key on the keyboard to allow making changes to the file (such as typing commands, or deleting commands).  When finished making changes to the file press the Esc key.  Once the Esc key is pressed, type ZZ to save the changed file and quit, or :q! to quit without saving the changes.

Next, we must modify the file that establishes the environment for the root user, when that user connects to the DiskStation.  This change is needed as part of the ipkg installation.  Edit the .profile file used by the root user:

vi /root/.profile

Add a # character in front of the two lines that contain the word PATH, then save the file (see the brief directions above to switch between command and insert mode in vi):

InstallNagiosDiskStation6

Next, reboot the DiskStation by clicking the Restart button in the Synology DSM interface (note: it should be possible to type reboot in the Telnet interface, however the DiskStation locked up the one time I attempted to execute that command).

InstallNagiosDiskStation7

Once the DiskStation reboots, reconnect to the DiskStation using Telnet, connecting as the root user, just as was done earlier.

The ipkg command should now work on the command line.  First, request that an updated list of available packages is downloaded, then display that list of packages:

ipkg update
ipkg list

Next, download a couple of packages that will be used by the Nagios network monitoring tool.  Note that using ipkg to install packages is a lot easier than compiling source code, so have fun with the ipkg utility.  When installing the optware-devel package, an error may appear stating that there is an incompatibility between wget and wget-ssl – just ignore that error for now.

ipkg update wget-ssl
ipkg install optware-devel
ipkg install gcc
ipkg install libtool
ipkg install mysql

Next, we need to compile a file and copy a couple of files:

cd /opt/share/libtool/libltdl/
./configure --prefix=/opt
make all
make install

cp /usr/syno/apache/modules/mod_ext_filter.so /opt/libexec/mod_ext_filter.so
cp /usr/syno/apache/modules/*.* /opt/libexec/

Now, install the Apache package:

ipkg install apache

If an error message is displayed on screen about mod_ext_filter.so, then modify the /opt/etc/apache2/httpd.conf file and add a # in front of the line LoadModule ext_filter_module libexec/mod_ext_filter.so and save the file.  Re-execute the ipkg install apache command (note that the up arrow on the keyboard may be pressed to quickly retype one of the previously executed commands).

InstallNagiosDiskStation8

Using the DiskStation’s Control Panel, create a nagios group and a nagcmd group (the nagcmd group probably will not be used for anything specific).  These groups do not require any special DiskStation permissions.

InstallNagiosDiskStation9

Using the DiskStation’s Control Panel, create a nagios user and add that user to the nagios and nagcmd groups.  The nagios user does not require any specific DiskStation permissions.

Next, switch back to the Telnet session, download the Nagios source code, and compile the source code:

DiskStation DS212+ Notes:

The following ./configure call was used on the DS212+:

./configure --prefix=/opt --with-command-group=nagios --disable-nanosleep --enable-nanosleep=no

The ./configure aborted with the following error message:

checking for pthread_create in -lpthread... no
checking for pthread_mutex_init in -lpthread... no
checking for pthread_create in -lpthreads... no
checking for pthread_create in -llthread... no
checking if we need -pthread for threads... no
checking for library containing nanosleep... no
Error: nanosleep() needed for timing operations.

The test that threw the error is located roughly 63% of the way through the configure file (on roughly line 5635).  If the exit 1 line in the configure file is commented out, then the configure step will complete.  However, the make all command will then fail with the following error messages:

/volume1/downloads/nagios/base/nebmods.c:363: undefined reference to `dlclose'
nebmods.o: In function `neb_load_module':
/volume1/downloads/nagios/base/nebmods.c:218: undefined reference to `dlopen'
/volume1/downloads/nagios/base/nebmods.c:249: undefined reference to `dlsym'
/volume1/downloads/nagios/base/nebmods.c:266: undefined reference to `dlsym'
/volume1/downloads/nagios/base/nebmods.c:299: undefined reference to `dlsym'
/volume1/downloads/nagios/base/nebmods.c:225: undefined reference to `dlerror'
/opt/lib/gcc/arm-none-linux-gnueabi/4.2.3/../../../../arm-none-linux-gnueabi/lib/libpthread.so: undefined reference to `__default_sa_restorer_v2@GLIBC_PRIVATE'
/opt/lib/gcc/arm-none-linux-gnueabi/4.2.3/../../../../arm-none-linux-gnueabi/lib/libpthread.so: undefined reference to `__default_rt_sa_restorer_v2@GLIBC_PRIVAT
E'
/opt/lib/gcc/arm-none-linux-gnueabi/4.2.3/../../../../arm-none-linux-gnueabi/lib/libpthread.so: undefined reference to `__default_rt_sa_restorer_v1@GLIBC_PRIVAT
E'
/opt/lib/gcc/arm-none-linux-gnueabi/4.2.3/../../../../arm-none-linux-gnueabi/lib/libpthread.so: undefined reference to `__default_sa_restorer_v1@GLIBC_PRIVATE'
collect2: ld returned 1 exit status
make[1]: *** [nagios] Error 1
make[1]: Leaving directory `/volume1/downloads/nagios/base'
make: *** [all] Error 2

After a bit of searching on the Internet, I found a page that suggested making the following changes (note that I unsuccessfully tried a couple of other steps that may have also partially corrected the issue):

mkdir /opt/arm-none-linux-gnueabi/lib_disabled
mv /opt/arm-none-linux-gnueabi/lib/libpthread* /opt/arm-none-linux-gnueabi/lib_disabled

cp /lib/libpthread.so.0 /opt/arm-none-linux-gnueabi/lib/
cd /opt/arm-none-linux-gnueabi/lib/
ln -s libpthread.so.0 libpthread.so
ln -s libpthread.so.0 libpthread-2.5.so

After making the above changes, I was able to run the configure and make all commands without receiving an error.

cd /volume1/downloads
wget http://prdownloads.sourceforge.net/sourceforge/nagios/nagios-3.5.0.tar.gz
tar xzf nagios-3.5.0.tar.gz
cd nagios
./configure --prefix=/opt --with-command-group=nagios
make all
make install
make install-init
make install-config
make install-commandmode

We apparently need to copy a couple of files to different locations at this point:

cp /opt/lib/libltdl.so.3 /opt/local/lib/libltdl.so.3
cp /opt/lib/libltdl.so.3 /usr/lib/libltdl.so.3
cp /opt/lib/libltdl.so /usr/lib/

Undo the changes that were earlier made to the /root/.profile file, where # characters were added in front of any line that contained the word PATH.  Remove those # characters and save the file:

vi /root/.profile

(This part still needs some fine tuning to make the web interface work with Nagios.)  Edit the Nagios Makefile and change the line beginning with HTTPD_CONF to show HTTPD_CONF=/opt/etc/apache2/conf.d  Then save the file.

cd /volume1/downloads/nagios
vi Makefile

InstallNagiosDiskStation10

Execute the following command:

make install-webconf

Create a nagiosadmin user for the web administration, specify a password when prompted:

htpasswd -c /usr/local/etc/htpasswd.users nagiosadmin

Update November 19, 2013:

GabrielM reported in a comment below that it may be necessary to specify the full path to the htpasswd program:

/usr/syno/apache/bin/htpasswd -c /usr/local/etc/htpasswd.users nagiosadmin

Install a couple of additional ipkg packages that will be used by Nagios (the last package adds a ping utility that may be used by Nagios – the security permissions on the DiskStation prevent non-root users from using the built-in ping utility):

ipkg install openssl
ipkg install openssl-dev
ipkg install sendmail
ipkg install inetutils

A step that may or may not be required is to download a functioning C++ compiler (some of the commands below point to files provided with the C++ compiler) – it appears that there should already be a compiler on the DiskStation at this point (in /opt/bin), so the successful completion of this task of downloading a usable C++ compiler might not be required.

DiskStation DS212+ Notes:

These wget and tar steps were completely skipped on the DS212+

For the DiskStation DS1813+ 64 bit:

cd /volume1/downloads
wget http://sourceforge.net/projects/dsgpl/files/DSM%204.1%20Tool%20Chains/Intel%20x86%20Linux%203.2.11%20%28Cedarview%29/gcc420_glibc236_x64_cedarview-GPL.tgz
tar zxpf gcc420_glibc236_x64_cedarview-GPL.tgz -C /usr/local/

For the DiskStation DS412+ 32 bit:

cd /volume1/downloads
wget http://sourceforge.net/projects/dsgpl/files/DSM%204.2%20Tool%20Chains/Intel%20x86%20Linux%203.2.11%20%28Bromolow%29/gcc421_glibc236_x86_bromolow-GPL.tgz
tar zxpf gcc421_glibc236_x86_bromolow-GPL.tgz -C /usr/local/

Now the net-snmp source code is downloaded and extracted:

DiskStation DS212+ Notes:

The ./configure call on the DS212 (might also work on the other DiskStation models):

./configure –prefix=/opt

The make call threw several errors, including:

/bin/sh: arm-none-linux-gnueabi-ld: not found
make[2]: *** [../blib/arch/auto/NetSNMP/default_store/default_store.so] Error 127

Before running the make command on the DS212+, execute the following command:

ln -s /opt/bin/ld /opt/bin/arm-none-linux-gnueabi-ld

The -i parameter may be omitted when running the make and make install commands.

cd /volume1/downloads
wget http://sourceforge.net/projects/net-snmp/files/net-snmp/5.7.2/net-snmp-5.7.2.tar.gz
tar xzf net-snmp-5.7.2.tar.gz
cd net-snmp-5.7.2

For the DiskStation DS1813+ 64 bit, execute the following to compile the net-snmp source (note that this command uses the compiler that was downloaded):

env CC=/usr/local/x86_64-linux-gnu/bin/x86_64-linux-gnu-gcc \
LD=/usr/local/x86_64-linux-gnu/bin/x86_64-linux-gnu-ld \
RANLIB=/usr/local/x86_64-linux-gnu/bin/x86_64-linux-gnu-ranlib \
CFLAGS="-I/usr/local/x86_64-linux-gnu/include" \
LDFLAGS="-L/usr/local/x86_64-linux-gnu/lib" \
./configure --host=x86_64-linux-gnu --target=x86_64-linux-gnu --build=x86_64-pc-linux --prefix=/usr/local

For the DiskStation DS412+ 32 bit, execute the following to compile the net-snmp source (note: I could not use any of the different compilers that I tried downloading due to the compilers crashing with one of two error messages, so this command uses the compiler in /opt/bin):

env CC=/opt/bin/i686-linux-gnu-gcc \
LD=/usr/local/i686-linux-gnu/bin/i686-linux-gnu-ld \
RANLIB=/usr/local/i686-linux-gnu/bin/i686-linux-gnu-ranlib \
CFLAGS="-I/usr/local/i686-linux-gnu/include" \
LDFLAGS="-L/usr/local/i686-linux-gnu/lib" \
./configure --host=i686-linux-gnu --target=i686-linux-gnu --build=i686-linux-gnu --prefix=/usr/local

Several prompts will appear on the screen when either of the two commands is executed.  I entered the following for the prompts:

Default version of SNMP to use (3): 3
System Contact Information: (Enter)
System Location (Unknown): (Enter)
Location to write logfile (/var/log/snmpd.log): /opt/var/snmpd.log
Location to write persistent information (/var/net-snmp): (Enter)

Two additional commands to execute:

make -i
make install -i

Now we need to download the source code for the Nagios plugins (check_apt, check_breeze, check_by_ssh, check_clamd, check_cluster, check_dhcp, check_disk, check_disk_smb, check_dns, check_dummy, check_file_age, check_flexlm, check_ftp, check_http, check_icmp, check_ide_smart, check_ifoperstatup, check_ifstatus, check_imap, check_ircd, check_jabber, check_ldap, check_ldaps, check_load, check_log, check_mailq, check_mrtg, check_mrtgtraf, check_mysql, check_mysql_query, check_nagios, check_nntp, check_nntps, check_nt, check_ntp, check_ntp_peer, check_ntp_time, check_nwstat, check_oracle, check_overcr, check_ping, check_pop, check_procs, check_real, check_rpc, check_sensors, check_simap, check_smtp, check_snmp, check_spop, check_ssh, check_ssmtp, check_swap, check_tcp, check_time, check_udp, check_ups, check_users, check_wave) that allow Nagios to perform various monitoring tasks:

cd /volume1/downloads
wget http://prdownloads.sourceforge.net/sourceforge/nagiosplug/nagios-plugins-1.4.16.tar.gz
tar xzf nagios-plugins-1.4.16.tar.gz
cd nagios-plugins-1.4.16/

Update November 19, 2013:

GabrielM reported in a comment below that the occasionally changing “current version” of the Nagios plugins makes it difficult to download the plugins from the source shown above.  If you open the http://prdownloads.sourceforge.net/sourceforge/nagiosplug/ web page in a web browser, the web browser will be redirected to http://sourceforge.net/projects/nagiosplug/files/ which contains the following statement:

“The Nagios Plugins are no longer distributed via SourceForge. For downloads and other information, please visit: https://www.nagios-plugins.org/
Source: README.md, updated 2013-10-01″

If you follow that link and then click the Download heading at the top of the page, there should be a link on the page that allows access to the current version of the Nagios plugins.  That link is currently: https://www.nagios-plugins.org/download/nagios-plugins-1.5.tar.gz

The command that GabrielM provided should work:

wget https://www.nagios-plugins.org/download/nagios-plugins-1.5.tar.gz

DiskStation DS212+ Notes:

The following configure, make, and make install commands were used:

./configure --prefix=/opt --with-openssl=/usr/syno/bin/openssl --with-nagios-user=nagios --with-nagios-group=nagios --with-ping-command="/opt/bin/ping -c %d %s" --psdir=/bin --with-ps-varlist="&procpid,&procppid,&procvsz,&procrss,procprog,&pos" --with-ps-cols=6 --with-ps-format="%d %d %d %d %s %n" --with-ps-command="/bin/ps -w"
make
make install

For the DiskStation DS1813+ 64 bit:

./configure --with-openssl=/usr/syno/bin/openssl --with-nagios-user=nagios --with-nagios-group=nagios --with-ping-command="/opt/bin/ping -c %d %s" --psdir=/bin --with-ps-varlist="&procpid,&procppid,&procvsz,&procrss,procprog,&pos" --with-ps-cols=6 --with-ps-format="%d %d %d %d %s %n" --with-ps-command="/bin/ps -w" --host=x86_64-linux-gnu --target=x86_64-linux-gnu --build=x86_64-pc-linux
make -i 
make install -i

For the DiskStation DS412+ 32 bit:

./configure --with-openssl=/usr/syno/bin/openssl --with-nagios-user=nagios --with-nagios-group=nagios --with-ping-command="/opt/bin/ping -c %d %s" --psdir=/bin --with-ps-varlist="&procpid,&procppid,&procvsz,&procrss,procprog,&pos" --with-ps-cols=6 --with-ps-format="%d %d %d %d %s %n" --with-ps-command="/bin/ps -w" --host=i686-linux-gnu --target=i686-linux-gnu --build=i686-linux-gnu --prefix=/usr/local
make -i 
make install -i

Copy the Nagios plugins to the location expected by Nagios:

DiskStation DS212+ Notes:

The plugins were installed in the correct location on the DS212+

cp /usr/local/nagios/libexec/*.* /opt/libexec
cp /usr/local/nagios/libexec/* /opt/libexec
cp /usr/local/libexec/check_* /opt/libexec

Update November 19, 2013:

GabrielM reported in a comment below that the third command above may fail.  Depending on the compile options used, the first two commands or the third command may fail.  The first two commands are intended to accomplish the same task as the third command; the first two commands or the last command are expected to fail, but all three commands should not fail.  I should have explained this potential area of concern better.

Copy the Nagios startup script to the correct location so that Nagios will automatically start when the DiskStation is rebooted:

cp /usr/local/etc/rc.d/nagios /opt/etc/init.d/S81nagios

Verify that the ownership of the nagios directory is set correctly:

DiskStation DS212+ Notes:

The file is actually in the /opt/bin directory, so use this command instead:

chown nagios:nagios /opt/bin/nagios/nagios -R
chown nagios:nagios /usr/local/nagios -R

In addition to the main /opt/etc/nagios.cfg Nagios file, there are several other configuration files that are potentially used by Nagios (defined in the nagios.cfg file):

/opt/etc/objects/commands.cfg
/opt/etc/objects/contacts.cfg
/opt/etc/objects/timeperiods.cfg
/opt/etc/objects/templates.cfg
/opt/etc/objects/localhost.cfg
/opt/etc/objects/windows.cfg
/opt/etc/objects/server.cfg
/opt/etc/objects/switch.cfg
/opt/etc/objects/printer.cfg

We need to make a couple of adjustments in the  /opt/etc/objects/commands.cfg file.

vi /opt/etc/objects/commands.cfg

Change the ‘notify-host-by-email’ command definition section as follows:

define command{
    command_name notify-host-by-email
    command_line /usr/bin/printf "%b" "Subject: $NOTIFICATIONTYPE$ Host Alert: $HOSTNAME$ is $HOSTSTATE$\n\n***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\nHost: $HOSTNAME$\nState: $HOSTSTATE$\nAddress: $HOSTADDRESS$\nInfo: $HOSTOUTPUT$\n\nDate/Time: $LONGDATETIME$\n" | /opt/sbin/sendmail -vt $CONTACTEMAIL$
    }

Change the ‘notify-service-by-email’ command definition section as follows:

define command{
    command_name notify-service-by-email
    command_line /usr/bin/printf "%b" "Subject: $NOTIFICATIONTYPE$ Service Alert: $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$\n\n***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$\n" | /opt/sbin/sendmail -vt $CONTACTEMAIL$
    }

Change the ‘check_ping’ command definition section as follows (feel free to read the documentation for check_ping and specify different values):

define command{
        command_name    check_ping
        command_line    $USER1$/check_ping -H $HOSTADDRESS$ -w 3000,25% -c 5000,90% -p 3 
        }

Save the file and exit vi.

At this point, the Nagios network monitoring utility will likely experience an error similar to the following when attempting to send an alert email:

output=collect: Cannot write ./dfr6BFFPC7027203 (bfcommit, uid=1026, gid=25): Permission denied

Execute the following commands, which should fix the above problem:

chmod g+w /opt/var/spool/clientmqueue
chmod 444 /opt/etc/mail/*.cf
chmod 7555 /opt/sbin/sendmail

We will need to use su to test the execution of various commands as the nagios user.  Without this fix (described here), you might see the following error message:

su: warning: cannot change directory to /var/services/homes/nagios: No such file or directory su: /sbin/nologin: No such file or directory

Enter the following commands:

mkdir /var/services/homes
mkdir /var/services/homes/nagios
chown nagios:nagios /var/services/homes/nagios -R
vi /etc/passwd

Locate the line in the passwd file for the Nagios user.  Near the end of the line, /sbin/nologin should appear.  Replace that text with /bin/ash then save and exit vi.

Verify that the Nagios user is able to execute the check_ping plugin.  Replace MyDeviceHere with either an IP address or a network device name that is on your network:

su - nagios -c "/opt/libexec/check_ping -H MyDeviceHere -w 5000,80% -c 5000,80% -p 5"

If the ping command (called by check_ping) is not able to resolve a network device name, and the fully qualified dns name was not specified (MyDeviceHere.MyDomainHere.com), edit the /etc/resolv.conf file:

vi /etc/resolv.conf

On a new line in the file, add the following line (replacing MyDomainHere.com with your dns domain name for the network):

search MyDomainHere.com

Verify that sendmail works for the Nagios user.  At the prompt that appears, type a short message, press the Enter key, type a period, then press the Enter key again – replace MyEmailAddressHere@MyDomainHere.com with your email address):

su - nagios -c "/opt/sbin/sendmail -vt MyEmailAddressHere@MyDomainHere.com"

—-

It is important to always verify the Nagios configuration before starting (or restarting after a configuration change) Nagios.  To verify the configuration type the following:

/opt/bin/nagios -v /opt/etc/nagios.cfg

To start up Nagios as a background task (daemon), execute the following:

/opt/bin/nagios -d /opt/etc/nagios.cfg

To stop Nagios that is executing as a background task, type:

ps

InstallNagiosDiskStation11

Then search though the list of processes for the first line that shows /opt/bin/nagios -d /opt/etc/nagios.cfg.  The number at the left of that line, 31152 in this case, is used to stop Nagios.  To stop Nagios, type the following (replace 31152 with the number shown on your screen):

kill 31152

Side note: I tried installing quite a few different C++ compilers that supposedly work with the Synology DSM (see here).  As such, I had to find a way to remove a directory, that directory’s subdirectories, and files.  The following command will completely remove the /usr/local/i686-linux-gnu directory, should the need arise:

rm -rf /usr/local/i686-linux-gnu

At this point, Nagios will hopefully run as a background task, and it should be able to ping and send email alerts.  However, if you were following the above directions, we have not yet instructed Nagios which devices to monitor, and to whom the alert emails should be sent.  The next step is to define the email contacts by modifying the /opt/etc/objects/contacts.cfg file (see the documentation for assistance):

vi /opt/etc/objects/contacts.cfg

After setting up the contacts, we should probably tell Nagios which devices to monitor.  If there are a lot of devices on your network to be monitored, you might find that using Microsoft Excel rather than vi to create the object definitions makes the task more manageable.  Set up a simple worksheet with four columns.  Column A will be used to specify the short host_name for the object to be monitored.  Column B will be used to specify the alias (long description for the object).  Column C will be used to either specify the IP address for the device or the network name for the device.  Column D will be used to identify the group to which the object belongs and the file name to which the definition is saved (the Excel macro supports the following groups: ap, camera, computer, external, other, printer, server, switch).

InstallNagiosDiskStation13

The Excel macro is set up to read a tab delimited file, rather than reading the object description directly from the Excel worksheet.  Highlight all of the rows in the worksheet except for the top header row, and press Ctrl C (or edit – Copy) to copy the definitions to the Windows clipboard in tab delimited format.  Start Notepad (Start – Run – Notepad), and then press Ctrl V (or edit – Paste) to paste the tab delimited object descriptions into Notepad.  The Excel macro code expects the text file to be saved as nagioshosts.txt.

The Excel macro code follows (I image that not many computers still have a second floppy drive installed, so change the B:\Hardware Documentation\Synology\ path as appropriate for your environment):

Private Sub cmdProcessText_Click()
    Dim intFileNumRead As Integer
    Dim intFileNumAP As Integer
    Dim intFileNumCamera As Integer
    Dim intFileNumComputer As Integer
    Dim intFileNumExternal As Integer
    Dim intFileNumOther As Integer
    Dim intFileNumPrinter As Integer
    Dim intFileNumServer As Integer
    Dim intFileNumSwitch As Integer
    Dim intFileNumWrite As Integer

    Dim strLine As String
    Dim strItem() As String

    intFileNumRead = FreeFile
    Open "B:\Hardware Documentation\Synology\nagioshosts.txt" For Input As #intFileNumRead

    intFileNumAP = FreeFile
    Open "B:\Hardware Documentation\Synology\ap.cfg" For Output As intFileNumAP
    Print #intFileNumAP, "###############################################################################"; Chr(10);
    Print #intFileNumAP, "# ap.cfg - lists the wireless access points to be monitored"; Chr(10);
    Print #intFileNumAP, "#"; Chr(10);
    Print #intFileNumAP, "# Last Modified: "; Now; Chr(10);
    Print #intFileNumAP, "###############################################################################"; Chr(10);
    Print #intFileNumAP, "#"; Chr(10); Chr(10);
    Print #intFileNumAP, "###############################################################################"; Chr(10);
    Print #intFileNumAP, "#"; Chr(10);
    Print #intFileNumAP, "# HOST GROUP DEFINITIONS"; Chr(10);
    Print #intFileNumAP, "#"; Chr(10);
    Print #intFileNumAP, "###############################################################################"; Chr(10);
    Print #intFileNumAP, "#"; Chr(10); Chr(10);
    Print #intFileNumAP, "define hostgroup{"; Chr(10);
    Print #intFileNumAP, "        hostgroup_name  ap                      ; The name of the hostgroup"; Chr(10);
    Print #intFileNumAP, "        alias           Local Access Points       ; Long name of the group"; Chr(10);
    Print #intFileNumAP, "        }"; Chr(10); Chr(10); Chr(10);

    intFileNumCamera = FreeFile
    Open "B:\Hardware Documentation\Synology\camera.cfg" For Output As intFileNumCamera
    Print #intFileNumCamera, "###############################################################################"; Chr(10);
    Print #intFileNumCamera, "# camera.cfg - lists the IP cameras to be monitored"; Chr(10);
    Print #intFileNumCamera, "#"; Chr(10);
    Print #intFileNumCamera, "# Last Modified: "; Now; Chr(10);
    Print #intFileNumCamera, "###############################################################################"; Chr(10);
    Print #intFileNumCamera, "#"; Chr(10); Chr(10);
    Print #intFileNumCamera, "###############################################################################"; Chr(10);
    Print #intFileNumCamera, "#"; Chr(10);
    Print #intFileNumCamera, "# HOST GROUP DEFINITIONS"; Chr(10);
    Print #intFileNumCamera, "#"; Chr(10);
    Print #intFileNumCamera, "###############################################################################"; Chr(10);
    Print #intFileNumCamera, "#"; Chr(10); Chr(10);
    Print #intFileNumCamera, "define hostgroup{"; Chr(10);
    Print #intFileNumCamera, "        hostgroup_name  camera                  ; The name of the hostgroup"; Chr(10);
    Print #intFileNumCamera, "        alias           Local IP Cameras          ; Long name of the group"; Chr(10);
    Print #intFileNumCamera, "        }"; Chr(10); Chr(10); Chr(10);

    intFileNumComputer = FreeFile
    Open "B:\Hardware Documentation\Synology\computer.cfg" For Output As intFileNumComputer
    Print #intFileNumComputer, "###############################################################################"; Chr(10);
    Print #intFileNumComputer, "# computer.cfg - lists the shop floor computers to be monitored"; Chr(10);
    Print #intFileNumComputer, "#"; Chr(10);
    Print #intFileNumComputer, "# Last Modified: "; Now; Chr(10);
    Print #intFileNumComputer, "###############################################################################"; Chr(10);
    Print #intFileNumComputer, "#"; Chr(10); Chr(10);
    Print #intFileNumComputer, "###############################################################################"; Chr(10);
    Print #intFileNumComputer, "#"; Chr(10);
    Print #intFileNumComputer, "# HOST GROUP DEFINITIONS"; Chr(10);
    Print #intFileNumComputer, "#"; Chr(10);
    Print #intFileNumComputer, "###############################################################################"; Chr(10);
    Print #intFileNumComputer, "#"; Chr(10); Chr(10);
    Print #intFileNumComputer, "define hostgroup{"; Chr(10);
    Print #intFileNumComputer, "        hostgroup_name  computer               ; The name of the hostgroup"; Chr(10);
    Print #intFileNumComputer, "        alias           Domain Computers          ; Long name of the group"; Chr(10);
    Print #intFileNumComputer, "        }"; Chr(10); Chr(10); Chr(10);

    intFileNumExternal = FreeFile
    Open "B:\Hardware Documentation\Synology\external.cfg" For Output As intFileNumExternal
    Print #intFileNumExternal, "###############################################################################"; Chr(10);
    Print #intFileNumExternal, "# external.cfg - lists the devices external to the LAN network to be monitored"; Chr(10);
    Print #intFileNumExternal, "#"; Chr(10);
    Print #intFileNumExternal, "# Last Modified: "; Now; Chr(10);
    Print #intFileNumExternal, "###############################################################################"; Chr(10);
    Print #intFileNumExternal, "#"; Chr(10); Chr(10);
    Print #intFileNumExternal, "###############################################################################"; Chr(10);
    Print #intFileNumExternal, "#"; Chr(10);
    Print #intFileNumExternal, "# HOST GROUP DEFINITIONS"; Chr(10);
    Print #intFileNumExternal, "#"; Chr(10);
    Print #intFileNumExternal, "###############################################################################"; Chr(10);
    Print #intFileNumExternal, "#"; Chr(10); Chr(10);
    Print #intFileNumExternal, "define hostgroup{"; Chr(10);
    Print #intFileNumExternal, "        hostgroup_name  external               ; The name of the hostgroup"; Chr(10);
    Print #intFileNumExternal, "        alias           Monitored devices External to the Network ; Long name of the group"; Chr(10);
    Print #intFileNumExternal, "        }"; Chr(10); Chr(10); Chr(10);

    intFileNumOther = FreeFile
    Open "B:\Hardware Documentation\Synology\other.cfg" For Output As intFileNumOther
    Print #intFileNumOther, "###############################################################################"; Chr(10);
    Print #intFileNumOther, "# other.cfg - lists the miscellaneous devices to be monitored"; Chr(10);
    Print #intFileNumOther, "#"; Chr(10);
    Print #intFileNumOther, "# Last Modified: "; Now; Chr(10);
    Print #intFileNumOther, "###############################################################################"; Chr(10);
    Print #intFileNumOther, "#"; Chr(10); Chr(10);
    Print #intFileNumOther, "###############################################################################"; Chr(10);
    Print #intFileNumOther, "#"; Chr(10);
    Print #intFileNumOther, "# HOST GROUP DEFINITIONS"; Chr(10);
    Print #intFileNumOther, "#"; Chr(10);
    Print #intFileNumOther, "###############################################################################"; Chr(10);
    Print #intFileNumOther, "#"; Chr(10); Chr(10);
    Print #intFileNumOther, "define hostgroup{"; Chr(10);
    Print #intFileNumOther, "        hostgroup_name  other                 ; The name of the hostgroup"; Chr(10);
    Print #intFileNumOther, "        alias           Miscellaneous Devices ; Long name of the group"; Chr(10);
    Print #intFileNumOther, "        }"; Chr(10); Chr(10); Chr(10);

    intFileNumPrinter = FreeFile
    Open "B:\Hardware Documentation\Synology\printer.cfg" For Output As intFileNumPrinter
    Print #intFileNumPrinter, "###############################################################################"; Chr(10);
    Print #intFileNumPrinter, "# printer.cfg - lists the printer devices to be monitored"; Chr(10);
    Print #intFileNumPrinter, "#"; Chr(10);
    Print #intFileNumPrinter, "# Last Modified: "; Now; Chr(10);
    Print #intFileNumPrinter, "###############################################################################"; Chr(10);
    Print #intFileNumPrinter, "#"; Chr(10); Chr(10);
    Print #intFileNumPrinter, "###############################################################################"; Chr(10);
    Print #intFileNumPrinter, "#"; Chr(10);
    Print #intFileNumPrinter, "# HOST GROUP DEFINITIONS"; Chr(10);
    Print #intFileNumPrinter, "#"; Chr(10);
    Print #intFileNumPrinter, "###############################################################################"; Chr(10);
    Print #intFileNumPrinter, "#"; Chr(10); Chr(10);
    Print #intFileNumPrinter, "define hostgroup{"; Chr(10);
    Print #intFileNumPrinter, "        hostgroup_name  printer               ; The name of the hostgroup"; Chr(10);
    Print #intFileNumPrinter, "        alias           Printers and Copiers  ; Long name of the group"; Chr(10);
    Print #intFileNumPrinter, "        }"; Chr(10); Chr(10); Chr(10);

    intFileNumServer = FreeFile
    Open "B:\Hardware Documentation\Synology\server.cfg" For Output As intFileNumServer
    Print #intFileNumServer, "###############################################################################"; Chr(10);
    Print #intFileNumServer, "# server.cfg - lists the servers to be monitored"; Chr(10);
    Print #intFileNumServer, "#"; Chr(10);
    Print #intFileNumServer, "# Last Modified: "; Now; Chr(10);
    Print #intFileNumServer, "###############################################################################"; Chr(10);
    Print #intFileNumServer, "#"; Chr(10); Chr(10);
    Print #intFileNumServer, "###############################################################################"; Chr(10);
    Print #intFileNumServer, "#"; Chr(10);
    Print #intFileNumServer, "# HOST GROUP DEFINITIONS"; Chr(10);
    Print #intFileNumServer, "#"; Chr(10);
    Print #intFileNumServer, "###############################################################################"; Chr(10);
    Print #intFileNumServer, "#"; Chr(10); Chr(10);
    Print #intFileNumServer, "define hostgroup{"; Chr(10);
    Print #intFileNumServer, "        hostgroup_name  server               ; The name of the hostgroup"; Chr(10);
    Print #intFileNumServer, "        alias           Server and Similar Devices ; Long name of the group"; Chr(10);
    Print #intFileNumServer, "        }"; Chr(10); Chr(10); Chr(10);

    intFileNumSwitch = FreeFile
    Open "B:\Hardware Documentation\Synology\switch.cfg" For Output As intFileNumSwitch
    Print #intFileNumSwitch, "###############################################################################"; Chr(10);
    Print #intFileNumSwitch, "# switch.cfg - lists the network equipment type devices to be monitored"; Chr(10);
    Print #intFileNumSwitch, "#"; Chr(10);
    Print #intFileNumSwitch, "# Last Modified: "; Now; Chr(10);
    Print #intFileNumSwitch, "###############################################################################"; Chr(10);
    Print #intFileNumSwitch, "#"; Chr(10); Chr(10);
    Print #intFileNumSwitch, "###############################################################################"; Chr(10);
    Print #intFileNumSwitch, "#"; Chr(10);
    Print #intFileNumSwitch, "# HOST GROUP DEFINITIONS"; Chr(10);
    Print #intFileNumSwitch, "#"; Chr(10);
    Print #intFileNumSwitch, "###############################################################################"; Chr(10);
    Print #intFileNumSwitch, "#"; Chr(10); Chr(10);
    Print #intFileNumSwitch, "define hostgroup{"; Chr(10);
    Print #intFileNumSwitch, "        hostgroup_name  switch               ; The name of the hostgroup"; Chr(10);
    Print #intFileNumSwitch, "        alias           Switche and Similar Devices ; Long name of the group"; Chr(10);
    Print #intFileNumSwitch, "        }"; Chr(10); Chr(10); Chr(10);

    Do While Not (EOF(intFileNumRead))
        Line Input #intFileNumRead, strLine
        strItem = Split(strLine, vbTab)
        'strItem(0) = host_name
        'strItem(1) = alias
        'strItem(2) = address
        'strItem(3) = hostgroups
        Select Case strItem(3)
            Case "ap"
                intFileNumWrite = intFileNumAP
            Case "camera"
                intFileNumWrite = intFileNumCamera
            Case "computer"
                intFileNumWrite = intFileNumComputer
            Case "external"
                intFileNumWrite = intFileNumExternal
            Case "other"
                intFileNumWrite = intFileNumOther
            Case "printer"
                intFileNumWrite = intFileNumPrinter
            Case "server"
                intFileNumWrite = intFileNumServer
            Case "switch"
                intFileNumWrite = intFileNumSwitch
        End Select

        Print #intFileNumWrite, "define host{"; Chr(10);
        Select Case strItem(3)
            Case "ap"
                Print #intFileNumWrite, "        use             ap              ; Inherit default values from a template"; Chr(10);
            Case "camera"
                Print #intFileNumWrite, "        use             camera          ; Inherit default values from a template"; Chr(10);
            Case "computer"
                Print #intFileNumWrite, "        use             computer        ; Inherit default values from a template"; Chr(10);
            Case "external"
                Print #intFileNumWrite, "        use             external        ; Inherit default values from a template"; Chr(10);
            Case "other"
                Print #intFileNumWrite, "        use             other           ; Inherit default values from a template"; Chr(10);
            Case "printer"
                Print #intFileNumWrite, "        use             printer         ; Inherit default values from a template"; Chr(10);
            Case "server"
                Print #intFileNumWrite, "        use             server          ; Inherit default values from a template"; Chr(10);
            Case "switch"
                Print #intFileNumWrite, "        use             switch          ; Inherit default values from a template"; Chr(10);
        End Select
        Print #intFileNumWrite, "        host_name       "; strItem(0); "         ; The name we're giving to this device"; Chr(10);
        Print #intFileNumWrite, "        alias           "; strItem(1); "         ; A longer name associated with the device"; Chr(10);
        Print #intFileNumWrite, "        address         "; strItem(2); "         ; IP address of the device"; Chr(10);
        Print #intFileNumWrite, "        hostgroups      "; strItem(3); "         ; Host groups this device is associated with"; Chr(10);
        Print #intFileNumWrite, "        }"; Chr(10); Chr(10);

        Print #intFileNumWrite, "define service{"; Chr(10);
        Print #intFileNumWrite, "        use                     generic-service ; Inherit values from a template"; Chr(10);
        Print #intFileNumWrite, "        host_name               "; strItem(0); "        ; The name of the host the service is associated with"; Chr(10);
        Print #intFileNumWrite, "        service_description     PING            ; The service description"; Chr(10);
        Print #intFileNumWrite, "        check_command           check_ping!3000,25%!5000,90%    ; The command used to monitor the service"; Chr(10);
        Print #intFileNumWrite, "        normal_check_interval   5               ; Check the service every 5 minutes under normal conditions"; Chr(10);
        Print #intFileNumWrite, "        retry_check_interval    1               ; Re-check the service every minute until its final/hard state is determined"; Chr(10);
        Print #intFileNumWrite, "        }"; Chr(10); Chr(10);
    Loop

    Close #intFileNumRead
    Close #intFileNumAP
    Close #intFileNumCamera
    Close #intFileNumComputer
    Close #intFileNumExternal
    Close #intFileNumOther
    Close #intFileNumPrinter
    Close #intFileNumServer
    Close #intFileNumSwitch
End Sub

The files that are created use Unix/Linux standard line feed end of line marker characters, rather than the Windows standard carriage return/line feed combination characters.  As such, opening the generated files using Notepad is not advised.  Copy the generated files back to the /opt/etc/objects/ path on the DiskStation (copy the files to a Shared Folder on the DiskStation, then use the cp command to copy the files from the share location to /opt/etc/objects/ – the Shared Folders are typically created as a subdirectory in the /volume1/ directory).

If you decided to use some of the non-standard Nagios group names (as I did), those non-standard group names must be defined in the /opt/etc/objects/templates.cfg file:

vi /opt/etc/objects/templates.cfg

A portion of the additional entries that I made in this file include the following:

define host{
       name                    ap      ; The name of this host template
       use                     generic-host    ; Inherit default values from the generic-host temp
       check_period            24x7            ; By default, access points are monitored round t
       check_interval          5               ; Actively check the access point every 5 minutes
       retry_interval          1               ; Schedule host check retries at 1 minute intervals
       max_check_attempts      10              ; Check each access point 10 times (max)
       check_command           check_ping      ; Default command to check if access points are "alive"
       notification_period     24x7            ; Send notification out at any time - day or night
       notification_interval   30              ; Resend notifications every 30 minutes
       notification_options    d,r             ; Only send notifications for specific host states
       contact_groups          admins          ; Notifications get sent to the admins by default
       hostgroups              ap ; Host groups that access points should be a member of
       register                0               ; DONT REGISTER THIS - ITS JUST A TEMPLATE
       }

define host{
       name                    camera  ; The name of this host template
       use                     generic-host    ; Inherit default values from the generic-host temp
       check_period            24x7            ; By default, cameras are monitored round t
       check_interval          60              ; Actively check the device every 60 minutes
       retry_interval          1               ; Schedule host check retries at 1 minute intervals
       max_check_attempts      10              ; Check each device 10 times (max)
       check_command           check_ping      ; Default command to check if device are "alive"
       notification_period     24x7            ; Send notification out at any time - day or night
       notification_interval   240             ; Resend notifications every 240 minutes
       notification_options    d,r             ; Only send notifications for specific host states
       contact_groups          admins          ; Notifications get sent to the admins by default
       hostgroups              camera ; Host groups that cameras should be a member of
       register                0               ; DONT REGISTER THIS - ITS JUST A TEMPLATE
       }

Nagios will not know that it should read the additional configuration files until it is told to do so by modifying the /opt/etc/nagios.cfg file.

vi /opt/etc/nagios.cfg

Add the following lines to the nagios.cfg file:

# Charles Hooper's object types
cfg_file=/opt/etc/objects/ap.cfg
cfg_file=/opt/etc/objects/camera.cfg
cfg_file=/opt/etc/objects/computer.cfg
cfg_file=/opt/etc/objects/external.cfg
cfg_file=/opt/etc/objects/other.cfg
cfg_file=/opt/etc/objects/printer.cfg
cfg_file=/opt/etc/objects/server.cfg
cfg_file=/opt/etc/objects/switch.cfg

We have made a large number of changes to the configuration files, so it is important to verify that there are no errors in the configuration:

/opt/bin/nagios -v /opt/etc/nagios.cfg

If no errors are found in the configuration, terminate (kill) nagios and then restart as described above.

—-

Update July 28, 2013:

When attempting to start Nagios in daemon mode (/opt/bin/nagios -d /opt/etc/nagios.cfg) I encountered a couple of problems related to permissions for the Nagios user.  The nagios process was not listed when I used the ps command.  I then tried executing the following commands:

touch /opt/var/nagios.log
chown nagios:nagios /opt/var/nagios.log

Nagios was then able to start in daemon mode, but wrote messages similar to the following in the /opt/var/nagios.log file:

[1375058364] Warning: Could not open object cache file ‘/opt/var/objects.cache’ for writing!
[1375058364] Failed to obtain lock on file /opt/var/nagios.lock: Permission denied
[1375058364] Bailing out due to errors encountered while attempting to daemonize… (PID=11451)
[1375058656] Nagios 3.5.0 starting… (PID=12936)
[1375058656] Local time is Sun Jul 28 20:44:16 EDT 2013
[1375058656] LOG VERSION: 2.0
[1375058656] Warning: Could not open object cache file ‘/opt/var/objects.cache’ for writing!
[1375058656] Failed to obtain lock on file /opt/var/nagios.lock: Permission denied
[1375058656] Bailing out due to errors encountered while attempting to daemonize… (PID=12936)
[1375060107] Error: Unable to create temp file for writing status data: Permission denied
[1375060117] Error: Unable to create temp file for writing status data: Permission denied
[1375060127] Error: Unable to create temp file for writing status data: Permission denied
[1375060137] Error: Unable to create temp file for writing status data: Permission denied
[1375060147] Error: Unable to create temp file for writing status data: Permission denied
[1375060157] Error: Unable to create temp file for writing status data: Permission denied

I tried to set the permissions for a couple of other files, only to find another long list of Permission denied messages:

touch /opt/var/objects.cache
touch /opt/var/nagios.lock
touch /opt/var/nagios.tmp
chown nagios:nagios /opt/var/objects.cache
chown nagios:nagios /opt/var/nagios.lock
chown nagios:nagios /opt/var/nagios.tmp

I then recalled that I had seen similar messages on the DiskStation DS412+.  I then tried a different approach, creating a nagios directory in the /opt/var directory, creating a couple of subdirectories in that directory, and then assigning nagios as the owner of that directory structure:

mkdir /opt/var/nagios
mkdir /opt/var/nagios/archives
mkdir /opt/var/nagios/spool
mkdir /opt/var/nagios/spool/checkresults
chown nagios:nagios /opt/var/nagios -R
vi /opt/etc/nagios.cfg

In the nagios.cfg file, I made the following changes:

log_file=/opt/var/nagios/nagios.log
status_file=/opt/var/nagios/status.dat
lock_file=/opt/var/nagios/nagios.lock
temp_file=/opt/var/nagios/nagios.tmp
log_archive_path=/opt/var/nagios/archives
check_result_path=/opt/var/nagios/spool/checkresults
state_retention_file=/opt/var/nagios/retention.dat
debug_file=/opt/var/nagios/nagios.debug

After saving the file and exiting vi, I restarted Nagios in daemon mode.  Reading the last 100 lines of the Nagios log file is now accomplished with this command:

tail -n 100 /opt/var/nagios/nagios.log

—-

There are a lot of seemingly interesting Nagios plugins, including check_oracle (I believe that this plugin requires the Oracle client to be installed – good luck with that install).  On one of the DiskStations the check_snmp plugin did not compile, while on the other DiskStation the check_http plugin did not compile.

It might be interesting to see what solutions readers are able to develop from the above starting point.  The above information is the result of many hours of experimentation as well as a couple minutes reading through sections of the Nagios documentation (it reads like the Oracle Database documentation, so it should be an easy read once I am in the right mood) and hopelessly scanning the ‘net for information about obscure error messages.  Have fun, and try not to put the DiskStation out of service due to a mistaken file copy.

Update November 19, 2013:

Installing an updated version of the Synology DSM operating system may temporarily disable Nagios.  Make backups of all Nagios confirguration files (copying the files with the cp command to a directory in /volume1 is generally safe) before installing different versions of the Synology DSM operating system.

The DSM 4.3 operating system installation apparently removed the /var/services/homes directory.  That directory removal makes it impossible for the Nagios user to login to run various commands.  I assume that the removal of the homes directory is intentional, so a work around for that problem:

mkdir /var/services/home
mkdir /var/services/home/nagios
chown nagios:nagios /var/services/home/nagios -R
vi /etc/passwd

In the /etc/passwd file, change all /homes/ entries to /home/ then save and exit vi.

The installation of the different DSM version (including versions before 4.3) will likely also replace/remove the libltdl.* files located in /opt/local/lib and /usr/lib, so we need to copy those files back into the correct directories:

cp /opt/lib/libltdl.so.3 /opt/local/lib/libltdl.so.3
cp /opt/lib/libltdl.so.3 /usr/lib/libltdl.so.3
cp /opt/lib/libltdl.so /usr/lib/

Once the above items are copied, try executing the check_ping command as the nagios user (replace MyDeviceHere with either an IP address or the name of a device on your network).

su - nagios -c "/opt/libexec/check_ping -H MyDeviceHere -w 5000,80% -c 5000,80% -p 5"

If the DiskStation reports that the check_ping command was not found, then copy that file back to the /opt/libexec/ directory.  If the above command was successful, try verifying the Nagios configuration:

/opt/bin/nagios -v /opt/etc/nagios.cfg

If the verification was successful, start Nagios as a daemon:

/opt/bin/nagios -d /opt/etc/nagios.cfg

Execute the ps command and verify that the above command is listed in the running processes:

ps

Finally, verify that Nagios is still set to start automatically as a daemon:

ls /opt/etc/init.d/S81nagios

If a file is listed when the above command is executed, then Nagios should now be fully repaired.

Update August 12, 2015:

You may at some point need to verify that a http web server is online.  If you execute the following command (replace http://www.mydomain.com with an actual web server for a domain to be monitored):

/opt/libexec/check_http -H www.mydomain.com

You may see one of the following errors:

/opt/libexec/check_http: error while loading shared libraries: libssl.so.0.9.8: cannot open shared object file: No such file or directory
/opt/libexec/check_http: error while loading shared libraries: libcrypto.so.0.9.8: cannot open shared object file: No such file or directory

The problem is likely caused by two missing symbolic links.  The following commands worked on both the DS1813+ and the DS412+:

ln -s /usr/lib/libssl.so.1.0.0 /usr/lib/libssl.so.0.9.8
ln -s /lib/libcrypto.so.1.0.0 /usr/lib/libcrypto.so.0.9.8

If instead you see this error message, then there is likely a problem with the compiled Nagios plugins (the DS1813+ was missing this plugin, so I copied it from the DS412+):

-ash: /opt/libexec/check_http: not found

To use check_http to monitor a website status I had to create a new host template because Nagios continued to use the check_ping command to check the website status, even though I had tried to override the default template value in the service entry for the website (this might be a bug, or I might have configured something wrong).  Add the following lines to the /opt/etc/objects/templates.cfg file below the external host definition section:

define host{
        name                    external-http   ; The name of this host template
        use                     generic-host    ; Inherit default values from the generic-host template
        check_period            24x7            ; By default, Windows servers are monitored round the clock
        check_interval          5               ; Actively check the server every 5 minutes
        retry_interval          1               ; Schedule host check retries at 1 minute intervals
        max_check_attempts      1               ; Check each server 10 times (max)
        check_command           check_http      ; Default command to check if servers are "alive"
        notification_period     24x7            ; Send notification out at any time - day or night
        notification_interval   30              ; Resend notifications every 30 minutes
        notification_options    d,r             ; Only send notifications for specific host states
        contact_groups          admins          ; Notifications get sent to the admins by default
        hostgroups              external        ; Host groups that web servers should be a member of
        register                0               ; DONT REGISTER THIS - ITS JUST A TEMPLATE
        }

Next, define the websites to be monitored in the /opt/etc/objects/external.cfg file.  For example, to monitor the http://www.oracle.com website, add the following lines:

define host{
        use             external-http        ; Inherit default values from a template
        host_name       Oracle_website       ; The name we're giving to this device
        alias           Oracle Website       ; A longer name associated with the device
        address         www.oracle.com       ; Host address of the device
        hostgroups      external             ; Host groups this device is associated with
        }
 
define service{
        use                     generic-service ; Inherit values from a template
        host_name               Oracle_website  ; The name of the host the service is associated with
        service_description     HTTP            ; The service description
        check_command           check_http      ; The command used to monitor the service
        normal_check_interval   3               ; Check the service every 5 minutes under normal conditions
        retry_check_interval    1               ; Re-check the service every minute until its final/hard state is determined
        }

It appears that Nagios might be expecting an IP address rather than a host name (such as http://www.oracle.com), so you may also need to modify the /opt/etc/objects/commands.cfg file, changing the -I to -H in the check_http command definition, as shown below:

# 'check_http' command definition
define command{
        command_name    check_http
        command_line    $USER1$/check_http -H $HOSTADDRESS$ $ARG1$
        }

When making changes to the Nagios configuration, always use the verify command to make certain that syntax errors are not present in the configuration files.

/opt/bin/nagios -v /opt/etc/nagios.cfg

To restart Nagios, use the ps command to locate the first process entry with “/opt/bin/nagios -d /opt/etc/nagios.cfg” in its COMMAND column:

ps | grep 'nagios'

If that line contains the number 3692 in the first column, execute the following:

kill 3692

Then restart Nagios in the background (daemon mode)

/opt/bin/nagios -d /opt/etc/nagios.cfg

I never bothered to determine how to make the Nagios monitoring status web pages work with the Synology DS412+, DS1813+, and DS212+.  I thought that I would see what steps would be required to make that functionality work on a Synology DS415+ (note that the following steps work nearly exactly the same on the Synology DS412+ and DS1813). The Synology DSM operating system utilizes port 5000 for web traffic.  If you add the WordPress package to the Synology, that package uses the standard port 80 for web traffic.  If you followed the above steps for installing Nagios, you installed a second copy of the Apache web server on the Synology that uses port 8000 for web traffic.  If your Synology has an IP address of 192.168.1.60, then you would use the following website address to access the second web server on the Synology: http://192.168.1.60:8000/

The Nagios monitoring status web pages, once set up, will appear as a directory of that second web server, as shown below:

nagiosds415_web_status

The value of the Nagios monitoring status web pages becomes apparent fairly quickly by reviewing the above screen capture.  The above Tactical Overview shows that there are 18 monitored devices that are up, one monitored device that is down, and another device that is in the process of being checked.  The links at the left provide additional information about the monitored devices.  Let’s configure the second copy of Apache on the Synology to support the Nagios monitoring status web pages.

The second copy of Apache uses the configuration file /opt/etc/apache2/httpd.conf.  Edit that file:

vi /opt/etc/apache2/httpd.conf

Add the following directly below the # Supplemental configuration heading, which is near the end of the file:

# Added by per Charles Hooper's Nagios installation guide for Synology DS415+
Include etc/apache2/conf.d/nagios.conf
Include etc/apache2/conf.d/php.conf

Save the file and exit vi.  Next, we need to create the htpasswd.users file in the correct location.  In the above steps, we executed the following command:

/opt/sbin/htpasswd -c /usr/local/etc/htpasswd.users nagiosadmin

That htpasswd.users file is expected to be in the /opt/etc directory.  Execute the following command to create the file in the correct directory (you will be prompted for a password for the nagiosadmin user):

/opt/sbin/htpasswd -c /opt/etc/htpasswd.users nagiosadmin

The Nagios monitoring status web pages require PHP support to be added to the second Apache installation.  Execute the following commands to install PHP:

/opt/bin/ipkg update
/opt/bin/ipkg list
/opt/bin/ipkg install php
/opt/bin/ipkg install php-apache

Next, we need to modify the /opt/etc/apache2/conf.d/nagios.conf file that was created during the Nagios installation.

vi /opt/etc/apache2/conf.d/nagios.conf

Below the heading <Directory “/opt/share”> add the following line:

   DirectoryIndex index.php

Save the file and exit vi.  Next, we need to adjust the php.ini file on the Synology:

vi /opt/etc/php.ini

In the file (near line 108), locate the following line:

output_buffering = Off

Change that line to show:

output_buffering = 8192

Locate the following line (near line 248) in the file:

memory_limit = 8M

Change that line to show:

memory_limit = 128M

Save the file and exit vi.  There may be a second php.ini file located in the /etc/php directory, make the same change to that file, then save the file and exit vi:

vi /etc/php/php.ini

Perform a graceful restart of the second copy of Apache:

/opt/sbin/apachectl -k graceful

Note:

The above command executed without a problem on the DS1813+, however, the DS412+ displayed one of the following two error messages:

httpd: Syntax error on line 102 of /opt/etc/apache2/httpd.conf: Cannot load /opt/libexec/mod_dav.so into server: /opt/libexec/mod_dav.so: undefined symbol: syno_recv_usr2
httpd: Syntax error on line 108 of /opt/etc/apache2/httpd.conf: Cannot load /opt/libexec/mod_dav_fs.so into server: /opt/libexec/mod_dav_fs.so: undefined symbol: HomeIsExist

When each error appeared, I modified the /opt/etc/apache2/httpd.conf file to put a # symbol in front of the offending line.  Doing so permitted the apachectl command to execute without error, but the errors may be a sign that the Nagios status pages will not work correctly.

Try to access the Nagios status monitoring pages in a web browser (replace 192.168.1.60 with the IP address of your Synology)  http://192.168.1.60:8000/nagios/

You should be prompted to enter a username and password.  Enter nagiosadmin for the username, and enter the password for that user that you created when executing the /opt/sbin/htpasswd command earlier.  If the web page does not display, take a look at the last 100 lines of the Apache error log to see if any clues are provided:

tail -n 100 /opt/var/apache2/log/error_log

Note:

The DS1813+ showed the following in the error_log – if you see this error, the error may go away after completing the steps that follow:

File does not exist: /opt/share/www/nagios

Some of the interesting error messages that I experienced including the following:

[Sun Jan 04 14:07:55 2015] [error] [client 192.168.1.218] (2)No such file or directory: Could not open password file: /opt/etc/htpasswd.users
 
[Sun Jan 04 14:23:04 2015] [error] [client 192.168.1.218] Directory index forbidden by Options directive: /opt/share/
 
[Sun Jan 04 15:07:09 2015] [error] [client 192.168.1.218] File does not exist: /opt/share/<, referer: http://192.168.1.60:8000/nagios/
 
[Sun Jan 04 17:53:06 2015] [notice] child pid 15126 exit signal Segmentation fault (11)
 
[Sun Jan 11 09:41:57 2015] [error] [client 192.168.1.213] script not found or unable to stat: /opt/sbin/statusmap.cgi, referer: http://192.168.1.60:8000/nagios/side.php

The last of the above errors still remains, the file /opt/sbin/statusmap.cgi does not exist.  For now, I will let someone else find a solution for that problem.

Note:

If you saw a login prompt, but the login was unsuccessful, check the Apache error log:

tail -n 100 /opt/var/apache2/log/error_log

On the DS1813+, I saw the following error:

No such file or directory: Could not open password file: /opt/etc/htpasswd.users

The above error indicates that the htpasswd.users file was expected to be found in a different location.  To work around this issue, I simply created the file in the location specified in the error log:

/opt/sbin/htpasswd -c /opt/etc/htpasswd.users nagiosadmin

If you find that a monitored device is down, using the web interface it is possible to disable future checking of the device, as well as alter a number of other notification options.  Unfortunately, clicking any of the options under the Host Commands heading will result in a permission error.

nagiosds415_change_notifications

To fix the permission error:

Using the Synology Control Panel interface, create a new group named www – this group requires no specific Synology permissions.  Next, using the Synology Control Panel interface, create a user named apache – make this user a member of the nagios, users, and www groups.  This user requires no specific Synology permissions.  Then change the username and group under which Apache executes:

vi /opt/etc/apache2/httpd.conf

In the httpd.conf file, locate the following two lines:

User nobody
Group #-1

Change those lines to appears as follows:

User apache
Group www

Save the file and exit vi.  Gracefully restart the second copy of Apache:

/opt/sbin/apachectl -k graceful

Refresh the Nagios monitoring status web page – the options under the Host Commands heading should no longer return an error.





Value of Improving One’s Knowledge

24 03 2013

March 24, 2013

As I type this blog article I am in the middle of reading a second book on the topic of Windows Server 2008 R2.  I bought the books several months ago, possibly even a year or two ago, and just had not found the time to invest in reading yet another dry IT-related 1,500 page book.  I would much prefer to read an exciting Oracle Database book, but those books seem to be in short supply (and I really do need to finish these two Windows Server 2008 R2 books so that I can move onto the Windows Server 2012 books, the Exchange Server 2013 books, and the Oracle Database 12c books that will be released eventually).

Three years ago I put together an article about finding the motivation to keep reading through various IT books, and offered a suggested reading list for people interested specifically in various aspects of Oracle Database (totally unrelated, but I recall that the word aspect was a pet peeve of one of my college English professors… he would say, “What is an aspect?” – I probably have used that word about five times since then).  I think that my list of suggested books is still a decent list.  However, Kyle Hailey recently put together a related article, “Where to begin with Oracle and SQL” that is excellent, and a bit more up to date than my list – the only book on Kyle’s list that I have not read is written by Dan Tow.

So, for people working in the information technology (IT) disciplines, how much effort do you put into improving your knowledge, or just maintaining that knowledge?  Do you actively try to break things, in the hope of discovering a solution?  Do you participate in discussion forums, such as the OTN forums or Usenet?  Do you read magazines, books, and various online articles?  Or, do you simply continue to perform the same actions that you have for the last five, 10, or 20 years (if the RULE worked just fine back then, why change)?

I try to read at least a couple of IT focused books a year.  For many of the Oracle Database specific books read since mid-2008 I have tried to write formal reviews of those books, posted to Amazon.com.  One of those reviews stretched to 35.5 typewritten pages!  Early yesterday morning a reader left the following comment attached to that review:

Mar 22, 2013 11:43:09 PM PDT SavvyShopper says:
Charles, you seem Very Intelligent(?) and should then Write a Book, fixing all these issues, than writing so much … :) How do you get so much time , if you are really “Working” ??

Oddly, the person deleted their comment a mere two hours after I posted a reply.  My first reaction to the comment was “must spending time expanding one’s knowledge be mutually exclusive to being employed?”  That was not the first time that someone using Amazon left a similar comment attached to one of my reviews:

Mar 31, 2011 12:37:49 PM PDT Wanda A. Cadogan says:
And don’t you ever wonder how guys like Mr. Hooper have so much time to write a 24 page review?

Is it really that hard to believe that guys like Mr. Hooper take notes (sometimes extensive notes) while reading technical publications?

I noticed that SavvyShopper had previously reviewed a couple of Oracle Database related books, as well as a couple of woodworking products.  Quite a while ago I toyed a bit with woodworking (a bit in woodworking…. cuts both ways), as a matter of fact I thought that I was headed to college to become a woodworking instructor.  So, I possibly have an odd connection with SavvyShopper, might this be an opportunity for a teachable moment?  I noticed too that in one of his reviews that SavvyShopper appeared to be struggling with the concept that a softwood can in fact be harder than a hardwood.  I am a little disappointed that SavvyShopper deleted his comment after all the effort that I put into formulating a reply:

SavvyShopper,

Thank you for leaving your comment.

Some of my book reviews, especially when there are many errors in the book being reviewed, tend to be very long.  I read computer related books to improve my knowledge, whether that means that I am simply reminded how something works, or I am learning something entirely new.   Investing effort (and occasionally an excessive amount of time) into improving my skill set is important to me, and I hope that it is important to other people in the IT related professions.  The Oracle Database book reviews that I post to Amazon are mostly a by-product of that self-improvement process.

I am still happily employed at the same place where I have worked for more than a decade as an IT professional.  Making time for expanding one’s computer skills through reading sometimes means sacrificing time spent with hobbies or other outside of work activities.  I believe that it is that sacrifice that is an important ingredient in the formula for long-term gainful employment.

Woodworking is a hobby of mine, and judging by your reviews on Amazon, woodworking may also be one of your hobbies.  I see that you gave a thin kerf Forrest Woodworker II saw blade a 2 out of 5 rating (coincidentally the same rating that I gave to this book), primarily because you found that the blade did not impress you when cutting pine and redwood while building one project.  Pine, much more than redwood, leaves deposits of pitch on the face and sides of saw blade teeth (kerosene (or Simple Green) and an old toothbrush will remove the pitch deposits, but I think that there are commercial spray on products on the market now that serve the same purpose).  This buildup of pitch effectively reduces the saw blade’s ability to cut smoothly.  The thin kerf body of the saw blade that you reviewed is less resistant to flexing in a heavy cut, so combined with the pitch build up on the blade, you probably did experience poor quality cuts.  If you mounted the saw blade on an inexpensive saw with a weak/bad bearing setup, that will also explain the poor quality cut that you received.  If you want a treat, mount a regular kerf Woodworker II on an older, properly tuned 3HP or 5HP Unisaw (or Powermatic Model 66) and witness the glass smooth cuts in (even three inch thick) hardwood such as poplar, oak, walnut, or hard maple.

Getting back on topic of your comment, I currently have very little extra time to put into writing.  While probably not your intention, I do not see a book that summarizes mistakes found in other books and the accompanying corrections as selling more than a couple of dozen copies; based on some Oracle book authors remarkable ability to abuse the DMCA (Digital Millennium Copyright Act), such a book could also become a legal nightmare for the publishing company.  I have been approached a couple of times to see if I had an interest in writing a book on the topic of Oracle Database.  At this time, I simply cannot make that commitment (writer’s block, lack of available free time, required reading to stay current in the computer-related field, wanting to spend more time working with non-computer related routers and bits, etc.).

A couple of years ago I contributed to an Oracle Database specific book with fellow OakTable Network members.  The technical standards of the members of that group are extremely high (do a Google search, if you are interested).  As such, I would estimate that I spent roughly eight hours per page writing, testing, proofreading, rewriting, proofreading, testing, rewriting, and proofreading again.  It was quite an experience, especially when Oracle Corporation announced the general availability of a new major version release a day or two before the first draft of the section of the book had to be submitted to the publisher – more testing, rewriting, and proofreading of the book section followed after that first draft was submitted.

Thank you again for leaving the comment.

So, do you go out of your way to learn something new every day, or are you more likely to “let it roll” for five, 10, or 20 years?  For the record, I have a rather large stack of unread woodworking magazines (most of the unread magazines range from 0 to 7 years old) sitting on the shelf that are waiting for the time when spending time expanding one’s knowledge is mutually exclusive to being employed (in the IT industry).

A new logo in development – resawn from 1 inch x 2 inch quarter sawn oak.

Bookmatch Arrangement #1:

OLYMPUS DIGITAL CAMERA

Bookmatch Arrangement #2:

OLYMPUS DIGITAL CAMERA

Bookmatch Arrangement #3:

OLYMPUS DIGITAL CAMERA

The New Logo:

OLYMPUS DIGITAL CAMERA





What does it Mean when a Select Statement in Oracle is using 100% CPU?

14 02 2013

February 14, 2013

A couple of days ago I noticed that an interesting set of search keywords were used to access this blog.  The search keywords:

What does it mean when a select statement in Oracle is using 100% cpu

I had two thoughts when I first saw that set of search keywords:

  • Well Done!
  • What an inconsiderate Oracle developer!

I feel a bit conflicted about the two first thought answers.  Think about the above bullet points for a moment.

I was interested enough in the keywords that were phrased in the form of a question, that I went though the search keywords used to access this blog for the last two weeks, and found the following related search keywords:

  • ora-01861 Oracle causing cpu 100% usage
  • Oracle one session 100% cpu
  • Oracle cpu utilisation 100%, is it a good or bad sign
  • Oracle database server cpu utilization is 100

Interesting, and possibly thought provoking.  I recalled an article that I wrote on this blog roughly three years ago.

Possible related thoughts of a temporarily rogue DBA:

  1. I fixed your problem without changing the execution plan… take a look at the EXPLAIN PLAN output for yourself.
  2. I fixed your problem without changing the execution plan… the server is more efficient now that we have consolidated three more database instances onto the server.
  3. I fixed your problem without changing the execution plan… that PGA_AGGREGATE_TARGET parameter was clear on the other side of 100 MB, so I set it to 10 MB.
  4. I fixed your problem without changing the execution plan… a group of the guys in the office found a way to load the game Doom on the server.
  5. I fixed your problem without changing the execution plan… I took half of the hard drives out of the array and moved the server to a 10 Mb hub.

Took a close look at the bulletpointed search terms.  If someone were to ask you one of the following questions, how would you respond if you were earnestly attempting to help the person?

  1. What does it mean when a Select statement in Oracle is using 100% CPU?
  2. The server admin reported that the Oracle database server cpu utilization hits 100% too frequently, what should I do about the problem?
  3. Oracle is throwing an ORA-01861 error, and that is causing 100% CPU usage.  What should I do?

Share your thoughts.

Summary of Suggestions/Causes Shared by Readers (Last Updated February 14, 2013):

  • Intended or unintended Cartesian product between row sources – does the query have a DISTINCT clause.  (Rodger)
  • Intended or unintended data type conversions – is the data model storing numbers or date values in VARCHAR2 columns.  (Jeremy Kendrick)




Connecting to an Oracle Database with Visual Basic 6.0 on Windows 8 64 Bit

25 11 2012

November 25, 2012 (Modified December 7, 2012, May 22, 2015)

Compatibility problems?  Visual Basic 6.0, released in 1998, is not officially compatible with Windows 8 Pro 64 bit… or Windows 7, or Windows Vista.  But I still like the language a lot for its simplicity, rapid development, and significant pre-existing code base within my company.  Of course, Oracle Database 11.2.0.3 and the Oracle Client 11.2.0.3 are not officially supported on Windows 8, so maybe the quest is an exercise in futility.

Roughly a month ago a thread appeared on the OTN forums asking how to connect Visual Basic 2010 to Oracle Database Personal Edition on Windows 7 Pro 64 bit running on the same computer.  Several people offered very good advice to guide the OP.  The OP eventually asked how to connect Visual Basic 6 to Oracle Database Personal Edition running on the same Windows 7 Pro 64 bit computer.  As I mentioned, Visual Basic 6 is not officially compatible with Windows 7, and the fact that it is a 32 bit application running on a 64 bit Windows computer means that the 32 bit Oracle Client must also be installed on the computer.

Visual Basic 6.0 (note that you should install service pack 6 for Visual Basic 6, even if the installer locks up at the very end) will run just fine on Windows 7 Pro 64 bit (and probably on Windows 8 Pro 64 bit also), although drawing form objects is a little slow.  The automatic Windows updates may also pose an issue.  From time to time it may be necessary to re-register the 32 bit MSCOMCTL.OCX file to avoid error messages when opening Visual Basic projects.  To re-register that file, from a Windows command prompt (Run – cmd.exe) , enter the following:

cd \windows\syswow64
regsvr32 MSCOMCTL.OCX

Using Visual Basic 6.0, or any development environment for that matter, requires that certain components be selected for installation when installing the Oracle Client.  At a minimum, Oracle ODBC Driver 11.2.0.x.0 (for ODBC type connections), and Oracle Provider for OLE DB 11.2.0.x.0 (for OLE DB type connections) must be installed with the Oracle Client.

Let’s set up a simple Visual Basic project to demonstrate connecting to Oracle Database 11.2.0.3 running on the same Windows 8 computer.  First, when installing Visual Basic 6, you may want to install and register a couple of additional uncommonly used controls.  Those controls are located in the \COMMON\TOOLS\VB\CONTROLS folder on the Visual Basic install CD.  On a 64 bit computer, the controls should be copied to the C:\Windows\Syswow64 folder and then registered in that location using regsvr32 as demonstrated above with the MSCOMCTL.OCX file.  In some cases, a license file must also be imported into the Windows registry – those files have a .REG extension in the same location on the CD:

On to building the project.  First, we need to add a reference to allow the project to use ADO and the Oracle Provider for OLE DB that was installed with the Oracle Client.  From the Project menu, select References…:

Next, we need to add the most recent version of the Microsoft ActiveX Data Objects Library, version 6.1 for Windows 7 and Windows 8, version 6.0 for Vista, or version 2.8 for Windows XP.  After selecting the correct version, click the OK button:

Now let’s add a couple of additional controls to the project.  I first started using the Microsoft Grid Control in Visual Basic 2.0, so for old time’s sake let’s add that control to the project.  We may also want to add a status bar to the project’s form, so let’s also add Microsoft Windows Common Controls 6.0 (SP6).  Then click the OK button:

Now, draw the form controls on the form.  We need three command buttons named cmdLateBinding, cmdEarlyBinding, and cmdQueryDatabase, a grid control named grdOutput, and optionally a status bar named stbStatus with its Style property set to Simple:

Let’s add the code to the Late Binding (cmdLateBinding) button (this is essentially the same code that I provided in the OTN thread, and does not require the addition of the Microsoft ActiveX Data Objects Library in the references for the project – so this code is a good simple test to make certain that everything is installed correctly):

Dim intResult
Dim strDatabase
Dim strUserName
Dim strPassword
Dim strSQL
Dim dbDatabase
Dim snpData

'On Error Resume Next

strDatabase = "MyDB" 'From tnsnames.ora
strUserName = "MyUserID"
strPassword = "MyPassword"

Set snpData = CreateObject("ADODB.Recordset")
Set dbDatabase = CreateObject("ADODB.Connection")

dbDatabase.ConnectionString = "Provider=OraOLEDB.Oracle;Data Source=" & strDatabase & ";User ID=" & strUserName & ";Password=" & strPassword & ";"
dbDatabase.Open

If (dbDatabase.State = 1) And (Err = 0) Then
    strSQL = "SELECT" & vbCrLf
    strSQL = strSQL & "  SYSDATE CURRENT_DATE" & vbCrLf
    strSQL = strSQL & "FROM" & vbCrLf
    strSQL = strSQL & "  DUAL"

    snpData.Open strSQL, dbDatabase

    If snpData.State = 1 Then
        If Not (snpData.EOF) Then
            Do While Not (snpData.EOF)
                MsgBox snpData("current_date")

                snpData.MoveNext
            Loop
        End If
        snpData.Close
    End If
Else
    intResult = MsgBox("Could not connect to the database.  Check your user name and password." & vbCrLf & Error(Err), 16, "Oracle Connection Demo")
End If

dbDatabase.Close
Set snpData = Nothing
Set dbDatabase = Nothing

Let’s run the project and click the Late Binding button to see what happens.  We should expect to see some sort of connection failure message, unless there is an entry in the tnsnames.ora for MyDB:

Well, that error message certainly was unexpected (the OP in the OTN thread may have encountered the same error message)!  Let’s click on the Debug button:

ADODB cannot find the OraOLEDB.Oracle provider.  I am sure that I installed that component when installing the 32 bit Oracle Client on the computer.  The Visual Basic code that was added to the Late Binding command button is very generic.  In fact, as written the code can be placed in a plain text file with a .VBS extension and executed as a VBScript file with either the wscript or cscript command.  Copy the code from the Late Binding command button into a plain text file (start the Notepad program and paste the code), then save the file as “LateBinding.vbs” (including the quotes).

Now to test the script that was just created.  Open a Windows command prompt (Windows key and R, type cmd, press the Enter key).  Change to the folder where the LateBinding.vbs file was saved, then type:

wscript LateBinding.vbs

Note that this time the computer indicated an ORA-12154: TNS:could not resolve the connect identifier error, rather than a Provider could not be found error – we were hoping to obtain the same ORA-12154 error as we did when working in Visual Basic 6.  What changed?  Well, this is a 64 bit computer, so the 64 bit wscript program was used, which used the 64 bit Oracle client (actually the 64 bit Oracle Database home files).

Let’s try again, this time with the 32 bit version of the wscript program.  To execute the script with the 32 bit wscript, execute the following:
c:\windows\SysWOW64\wscript LateBinding.vbs

Notice that the error now appears as Provider could not be found, which is the same error message that was observed in Visual Basic 6.  We reproduced the problem!  Now how do we fix it?  I suppose that we should check with the Oracle Client installer to verify that the Oracle Provider for OLE DB 11.2.0.x.0 component was in fact installed.  Let’s re-run the Oracle installer to check the Inventory of the installed components.  Just open the Start menu, click the Oracle – OraClient11g_home1 folder, then expand Oracle Installation Products, then click Universal Installer:

Uh, sure.  Windows 8 and its tablet-like Start menu… good luck finding the Universal Installer for the Oracle Client. (Edit November 26, 2012: I was trying to be humorous here.  If you have a touch screen, you can swipe your finger up from the bottom of the screen to reveal the option to display all items that normally appear on the pre-Windows 8 Start menu, separated by folder/Oracle home.  If you do not have a touch screen, hold down the Windows key and press the W key while in this new Windows 8 start screen to display the same list of items separated by folder/Oracle home.)  I always change the base install folder to C:\Oracle when installing Oracle components, so to start up the Universal Installer for the Oracle Client, I can just run:
C:\oracle\product\11.2.0\client_1\oui\bin\setup.exe

Oracle Provider for OLE DB 11.2.0.3.0 is in the list of installed components (click the Installed Products button in the Oracle Installer to see the above list), so why doesn’t the Oracle Provider for OLE DB from the 32 bit client work as expected?  That component requires that the OraOLEDB11.DLL file is found in the expected location, in my case, here:
C:\Oracle\product\11.2.0\client_1\bin\OraOLEDB11.DLL

Yes, the file is in that location.  Is the Oracle Client 11.2.0.3 just not compatible with Windows 8 Pro?  The Oracle Client 11.2.0.3 seemed to work as expected on a computer that was upgraded from Windows 7 Pro to Windows 8 Pro, including the OLE DB functionality, so why will that functionality NOT work on a new computer with Windows 8 Pro installed?

While comparing the Windows registry entries on a Windows 7 Pro 64 bit computer with the Windows registry entries on the Windows 8 Pro computer where the OLE DB functionality did not work, I noticed a problem – missing entries in the Windows 8 Pro’s Windows registry.  From the Windows 7 Pro 64 bit computer, I exported the three missing Windows registry sections:

3F63C36E-51A3-11D2-BB7D-00C04FA30080.reg:

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3F63C36E-51A3-11D2-BB7D-00C04FA30080}]
@="OraOLEDB.Oracle"
"OLEDB_SERVICES"=dword:ffffffff

[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3F63C36E-51A3-11D2-BB7D-00C04FA30080}\ExtendedErrors]
@="Extended Error Lookup Service"

[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3F63C36E-51A3-11D2-BB7D-00C04FA30080}\ExtendedErrors\{3FC8E6E4-53FF-11D2-BB7D-00C04FA30080}]
@="Oracle OLE DB Error Lookup"

[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3F63C36E-51A3-11D2-BB7D-00C04FA30080}\InprocServer32]
@="C:\\Oracle\\product\\11.2.0\\client_1\\bin\\OraOLEDB11.DLL"
"ThreadingModel"="Both"

[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3F63C36E-51A3-11D2-BB7D-00C04FA30080}\OLE DB Provider]
@="Oracle Provider for OLE DB"

[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3F63C36E-51A3-11D2-BB7D-00C04FA30080}\ProgID]
@="OraOLEDB.Oracle.1"

[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3F63C36E-51A3-11D2-BB7D-00C04FA30080}\TypeLib]
@="{0BB9AFD1-51A1-11D2-BB7D-00C04FA30080}"

[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3F63C36E-51A3-11D2-BB7D-00C04FA30080}\VersionIndependentProgID]
@="OraOLEDB.Oracle"

3FC8E6E4-53FF-11D2-BB7D-00C04FA30080.reg:

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3FC8E6E4-53FF-11D2-BB7D-00C04FA30080}]
@="Oracle OLE DB Error Lookup"

[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3FC8E6E4-53FF-11D2-BB7D-00C04FA30080}\InprocServer32]
@="C:\\Oracle\\product\\11.2.0\\client_1\\bin\\OraOLEDB11.DLL"
"ThreadingModel"="Both"

[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3FC8E6E4-53FF-11D2-BB7D-00C04FA30080}\ProdID]
@="OraOLEDB.ErrorLookup.1"

[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3FC8E6E4-53FF-11D2-BB7D-00C04FA30080}\VersionIndependentProgID]
@="OraOLEDB.ErrorLookup.1"

0BB9AFD1-51A1-11D2-BB7D-00C04FA30080.reg (just the win32 section was missing):

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\TypeLib\{0BB9AFD1-51A1-11D2-BB7D-00C04FA30080}\1.0\win32]
@="C:\\Oracle\\product\\11.2.0\\client_1\\bin\\OraOLEDB11.DLL"

After verifying that the Oracle Client’s home on the Windows 8 Pro computer was in fact located in C:\Oracle\product\11.2.0\client_1 (it may not be on your computer, change the above registry files as necessary for your computer – use two \\ characters for each \ character that normally appears in the path to the Oracle Client’s home), I imported the registry entries into the Windows 8 Pro computer’s registry by double-clicking each of the files that were exported from the Windows 7 Pro computer.

The result after importing the registry entries and re-running the script with the 32 bit wscript:

Well, that is not good, another error message.  But wait, that was the same error message displayed when the 64 bit wscript was used to execute the script file.  Maybe we are making progress.  Let’s fix the LateBinding.vbs script file so that the strDatabase variable is set to a valid database name found in the 32 bit client’s tnsnames.ora file (I will use a database named sample, which has the Oracle sample schema loaded), the strUserName variable is set to a valid database username in the database (I will use the sh user), and the strPassword variable is set to the correct password for the username.

Let’s try executing the script again:

Oh, another error message.  ORA-28000: the account is locked.  I guess that some Oracle rules still apply.  Start up a SQL*Plus session in another Windows command prompt, connect to the database as the SYS AS SYSDBA user, and unlock the SH account (edit November 26, 2012: note that this account should probably be locked again once we finish experimenting):

ALTER USER SH ACCOUNT UNLOCK;

Let’s try executing the script again:

It Worked!  The 32 bit version of wscript on Windows 8 Pro 64 bit was able to connect to the 11.2.0.3 64 bit database running on the same computer and select the current date and time from the database.  But, does the Late Binding example code in Visual Basic 6 work?  Let’s check:

Nice!

[Headache Induced Rant] You may notice a nearly 3 hour time difference between the time output by wscript and the time output by Visual Basic 6.  That is not an error – I spent nearly three hours trying to put this blog article together in WordPress (total time putting together this article was close to 5 hours) on the Windows 8 computer AFTER successfully connecting to the database using the 32 bit version of wscript.  The Sony laptop designer who thought it would be a good idea to place the laptop’s left and right mouse buttons underneath the corners of the touch pad and then thought that it would be a good idea to enable zoom control when one finger is barely touching the location of the left mouse button while the other finger is attempting to scroll up and down the WordPress editor’s page should be taken out behind the building and slapped a couple of times.  I quite literally had to undo an unintended page zoom very close to 50 times while putting together this blog article.  The darn laptop has a touch screen… if I really want to zoom the screen, I will either manually select the zoom feature in the application or put two fingers on the screen and then separate those fingers.  (Edit November 26, 2012: Sony does a good job of burying the option to disable this touch pad feature – it is found in the same application that disables the touch pad’s tap to left-click “feature”.)

And which Microsoft developer thought that it would be a great idea to have a large clock pop up on screen in the lower left (taking away the typing entry focus from the application that was being used) on occasion when someone is trying to type on the computer?  That developer should also be taken out behind the building and slapped. [/Headache Induced Rant]

The above example code used early late (edit: July 24, 2014) binding, which defines variables as variants and then later assigns object types to the variables.  That technique, which now does work and is compatible with wscript, tends to result in poor performance.  Poor performance should be considered an application bug, so let’s build an example that uses early binding (note that the References mentioned earlier in this article must have been selected).  Add the following code to the Early Binding button (note that the code is attempting to connect to the sample database as sh user):

Dim intResult As Integer
Dim strSQL As String
Dim strDatabase As String
Dim strUserName As String
Dim strPassword As String

Dim snpData As ADODB.Recordset
Dim dbDatabase As ADODB.Connection

On Error Resume Next

Set dbDatabase = New ADODB.Connection

strDatabase = "sample" '"MyDB" 'From tnsnames.ora
strUserName = "sh" '"MyUserID"
strPassword = "sh" '"MyPassword"

'Connect to the database, specify that the default array fetch size should be 100 rows
dbDatabase.ConnectionString = "Provider=OraOLEDB.Oracle;Data Source=" & strDatabase & ";User ID=" & strUserName & ";Password=" & strPassword & ";ChunkSize=1000;FetchSize=100;"
dbDatabase.ConnectionTimeout = 40
dbDatabase.CursorLocation = adUseClient
dbDatabase.Open

If (dbDatabase.State <> 1) Or (Err <> 0) Then
    intResult = MsgBox("Could not connect to the database.  Check your user name and password." & vbCrLf & Error(Err), 16, "Oracle Connection Demo")
Else
    Set snpData = New ADODB.Recordset

    strSQL = "SELECT" & vbCrLf
    strSQL = strSQL & "  COUNT(*) NUM_ROWS" & vbCrLf
    strSQL = strSQL & "FROM" & vbCrLf
    strSQL = strSQL & "  CUSTOMERS"
    snpData.Open strSQL, dbDatabase

    If snpData.State = 1 Then
        If Not (snpData.EOF) Then
            MsgBox "Number of rows in the Customers table: " & Format(snpData("num_rows")), vbInformation
        Else
            MsgBox "No rows returned from the query.", vbCritical
        End If
        snpData.Close
    Else
        MsgBox "The query could not be executed.", vbCritical
    End If
End If

'When finished, clean up
If dbDatabase.State = 1 Then
    dbDatabase.Close
End If

Set snpData = Nothing
Set dbDatabase = Nothing

Let’s see the result of executing the above code:

Wow, 55,500 rows in that sample table, that is a lot of rows.

Finally, let’s add the code to the Query Database button.  This code will use early binding, bind variables, and present the result rows in the grid control.

Dim i As Integer
Dim intResult As Integer
Dim strSQL As String
Dim strDatabase As String
Dim strUserName As String
Dim strPassword As String
Dim strOut As String

Dim snpData As ADODB.Recordset
Dim comData As ADODB.Command

On Error Resume Next

Set dbDatabase = New ADODB.Connection

strDatabase = "sample" '"MyDB" 'From tnsnames.ora
strUserName = "sh" '"MyUserID"
strPassword = "sh" '"MyPassword"

'Connect to the database, specify that the default array fetch size should be 100 rows
dbDatabase.ConnectionString = "Provider=OraOLEDB.Oracle;Data Source=" & strDatabase & ";User ID=" & strUserName & ";Password=" & strPassword & ";ChunkSize=1000;FetchSize=100;"
dbDatabase.ConnectionTimeout = 40
dbDatabase.CursorLocation = adUseClient
dbDatabase.Open

If (dbDatabase.State <> 1) Or (Err <> 0) Then
    intResult = MsgBox("Could not connect to the database.  Check your user name and password." & vbCrLf & Error(Err), 16, "Oracle Connection Demo")
Else
    Set comData = New ADODB.Command
    With comData
        strSQL = "SELECT" & vbCrLf
        strSQL = strSQL & "  CUST_ID," & vbCrLf
        strSQL = strSQL & "  CUST_FIRST_NAME," & vbCrLf
        strSQL = strSQL & "  CUST_LAST_NAME," & vbCrLf
        strSQL = strSQL & "  CUST_GENDER," & vbCrLf
        strSQL = strSQL & "  CUST_YEAR_OF_BIRTH," & vbCrLf
        strSQL = strSQL & "  CUST_MARITAL_STATUS," & vbCrLf
        strSQL = strSQL & "  CUST_STREET_ADDRESS," & vbCrLf
        strSQL = strSQL & "  CUST_POSTAL_CODE," & vbCrLf
        strSQL = strSQL & "  CUST_CITY," & vbCrLf
        strSQL = strSQL & "  CUST_STATE_PROVINCE," & vbCrLf
        strSQL = strSQL & "  COUNTRY_ID" & vbCrLf
        strSQL = strSQL & "FROM" & vbCrLf
        strSQL = strSQL & "  CUSTOMERS" & vbCrLf
        strSQL = strSQL & "WHERE" & vbCrLf
        strSQL = strSQL & "  CUST_FIRST_NAME= ?"

        .CommandText = strSQL
        .CommandType = adCmdText
        .CommandTimeout = 30
        .Parameters.Append .CreateParameter("first_name", adVarChar, adParamInput, 20, "")
        .ActiveConnection = dbDatabase
    End With

    comData("first_name") = "Charles"
    Set snpData = comData.Execute

    If Not (snpData Is Nothing) Then
        If snpData.State = 1 Then
            If Not (snpData.EOF) Then
                grdOutput.Cols = snpData.Fields.Count
                grdOutput.FixedRows = 0
                grdOutput.FixedCols = 0
                grdOutput.Rows = 1

                strOut = ""
                For i = 0 To snpData.Fields.Count - 1
                    strOut = strOut & snpData(i).Name & vbTab
                Next i
                grdOutput.AddItem strOut 'Add the new heading row
                grdOutput.RemoveItem 0 'Remove whatever is on the first row

                Do While Not (snpData.EOF)
                    strOut = ""
                    strOut = strOut & Format(snpData("cust_id")) & vbTab
                    If Not (IsNull(snpData("cust_first_name"))) Then
                        strOut = strOut & snpData("cust_first_name") & vbTab
                    Else
                        strOut = strOut & vbTab
                    End If
                    If Not (IsNull(snpData("cust_last_name"))) Then
                        strOut = strOut & snpData("cust_last_name") & vbTab
                    Else
                        strOut = strOut & vbTab
                    End If
                    If Not (IsNull(snpData("cust_gender"))) Then
                        strOut = strOut & snpData("cust_gender") & vbTab
                    Else
                        strOut = strOut & vbTab
                    End If
                    If Not (IsNull(snpData("cust_year_of_birth"))) Then
                        strOut = strOut & Format(snpData("cust_year_of_birth")) & vbTab
                    Else
                        strOut = strOut & vbTab
                    End If
                    If Not (IsNull(snpData("cust_marital_status"))) Then
                        strOut = strOut & snpData("cust_marital_status") & vbTab
                    Else
                        strOut = strOut & vbTab
                    End If
                    If Not (IsNull(snpData("cust_street_address"))) Then
                        strOut = strOut & snpData("cust_street_address") & vbTab
                    Else
                        strOut = strOut & vbTab
                    End If
                    If Not (IsNull(snpData("cust_postal_code"))) Then
                        strOut = strOut & snpData("cust_postal_code") & vbTab
                    Else
                        strOut = strOut & vbTab
                    End If
                    If Not (IsNull(snpData("cust_city"))) Then
                        strOut = strOut & snpData("cust_city") & vbTab
                    Else
                        strOut = strOut & vbTab
                    End If
                    If Not (IsNull(snpData("cust_state_province"))) Then
                        strOut = strOut & snpData("cust_state_province") & vbTab
                    Else
                        strOut = strOut & vbTab
                    End If
                    If Not (IsNull(snpData("country_id"))) Then
                        strOut = strOut & snpData("country_id") & vbTab
                    Else
                        strOut = strOut & vbTab
                    End If
                    grdOutput.AddItem strOut 'Add the new data row to the grid control

                    snpData.MoveNext
                Loop
            Else
                MsgBox "No rows were returned by the query.", vbInformation
            End If
        Else
            MsgBox "The query could not be executed.", vbCritical
        End If
    Else
        MsgBox "The query could not be executed.", vbCritical
    End If
    dbDatabase.Close
End If

If grdOutput.Rows > 1 Then
    grdOutput.FixedRows = 1
End If

Set snpData = Nothing
Set dbDatabase = Nothing

What does the output look like when the above code is executed?

Nice!  Now the problem.  It is a very bad habit to repeatedly connect to and disconnect from the database, and such a habit may result in a variety of problems, including performance problems.  Ideally, the code to connect to the database should be relocated to a different part of the program, possibly the Form_Load subroutine.

Added December 7, 2012:

I was able to reproduce the above mentioned problem on a Windows 7 Pro computer when the 11.2.0.3 client was installed (without first installing the 11.2.0.1 or 11.2.0.2 client).  Even after importing the above mentioned registry entries, the program (and the VBScript) reported that the “Provider cannot be found”.  As mentioned, after verifying that the above indicated Oracle components were installed, I imported the following registry script (save as a plain text file, rename with a .reg extension, then double-click the file), which combines the three above mentioned registry scripts into a single file:

Windows Registry Editor Version 5.00
 
[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3F63C36E-51A3-11D2-BB7D-00C04FA30080}]
@="OraOLEDB.Oracle"
"OLEDB_SERVICES"=dword:ffffffff
 
[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3F63C36E-51A3-11D2-BB7D-00C04FA30080}\ExtendedErrors]
@="Extended Error Lookup Service"
 
[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3F63C36E-51A3-11D2-BB7D-00C04FA30080}\ExtendedErrors\{3FC8E6E4-53FF-11D2-BB7D-00C04FA30080}]
@="Oracle OLE DB Error Lookup"
 
[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3F63C36E-51A3-11D2-BB7D-00C04FA30080}\InprocServer32]
@="C:\\Oracle\\product\\11.2.0\\client_1\\bin\\OraOLEDB11.DLL"
"ThreadingModel"="Both"
 
[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3F63C36E-51A3-11D2-BB7D-00C04FA30080}\OLE DB Provider]
@="Oracle Provider for OLE DB"
 
[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3F63C36E-51A3-11D2-BB7D-00C04FA30080}\ProgID]
@="OraOLEDB.Oracle.1"
 
[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3F63C36E-51A3-11D2-BB7D-00C04FA30080}\TypeLib]
@="{0BB9AFD1-51A1-11D2-BB7D-00C04FA30080}"
 
[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3F63C36E-51A3-11D2-BB7D-00C04FA30080}\VersionIndependentProgID]
@="OraOLEDB.Oracle"
 
[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3FC8E6E4-53FF-11D2-BB7D-00C04FA30080}]
@="Oracle OLE DB Error Lookup"
 
[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3FC8E6E4-53FF-11D2-BB7D-00C04FA30080}\InprocServer32]
@="C:\\Oracle\\product\\11.2.0\\client_1\\bin\\OraOLEDB11.DLL"
"ThreadingModel"="Both"
 
[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3FC8E6E4-53FF-11D2-BB7D-00C04FA30080}\ProdID]
@="OraOLEDB.ErrorLookup.1"
 
[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3FC8E6E4-53FF-11D2-BB7D-00C04FA30080}\VersionIndependentProgID]
@="OraOLEDB.ErrorLookup.1"
 
[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\TypeLib\{0BB9AFD1-51A1-11D2-BB7D-00C04FA30080}\1.0\win32]
@="C:\\Oracle\\product\\11.2.0\\client_1\\bin\\OraOLEDB11.DLL"

The missing element that I neglected to mention in the original version of this article is that the Oracle 11.2.0.3 Client installer apparently forgets to register the OraOLEDB11.DLL file that is in the Oracle home.  The simple work-around for this problem is to manually register the DLL file, with a command similar to the following executed at a Windows command line (replace C:\Oracle\product\11.2.0\client_1 with the correct location of the Oracle client):

regsvr32 C:\Oracle\product\11.2.0\client_1\bin\OraOLEDB11.DLL

Edit May 22, 2015:

I have not yet verified whether or not the same issue exists with the 11.2.0.4 client, however that client version may be downloaded here once you have logged into the My Oracle Support website.  The above method fixed the problem with the 11.2.0.3 client (even on the Windows 10 beta versions) without any identified side effects.





oci-24550 [3221225477] [Unhandled exception: Code=c0000005 Flags=0

22 10 2012

October 22, 2012

Lately, I have been busy with a couple of items that are not completely Oracle Database specific.  I tried experimenting with high definition 1080p security cameras.  After wasting an unacceptable amount of money to purchase two Y-Cam Wireless High Definition 1080p video cameras (model YCBLHD6), and spending in excess of 38 hours fighting with the cameras to make them work as advertised, I wrote a video review of the product.  After writing a review, Y-Cam’s tech support representative contacted me again, so I pulled the cameras back out of the box and wasted another five hours trying to make the cameras work wirelessly – a feature that was lost after installing the third firmware version on the cameras.  The cameras magically connected wirelessly very early the next morning – but wireless communiation was only one of several problems that I was experiencing with this model of security camera.  What might cause a procedure to work magically when that same procedure failed several times in the past – and might this somehow have an odd relation to Oracle Database?  This is the explanation that I provided to Y-Cam’s tech support representative and posted as a follow up on Y-Cam’s support forum:

The camera’s Ethernet port and wireless adapter both have the same hardcoded MAC address. This MAC address is used by network switches to determine which devices are attached to each of the switch ports so that network traffic addressed to a specific MAC address is only sent down the switch port where that MAC address is known to be connected. By contrast a network hub broadcasts network traffic to all ports, rather than just the port to which a hardware device’s MAC address is attached; a hub is less likely to have an issue with two different network adapters having the same MAC address (Y-CAM support reported that they confirmed that the camera with the latest firmware worked with a non-broadcasted WPA2 protected wireless network). The network switch maintains a MAC address table for each switch port – when the camera’s Ethernet port is disconnected and the camera is power cycled, the switch’s MAC address tables may not be immediately updated to remove the camera’s MAC address from the wired Ethernet port so that the MAC address may be added to the switch port used for wireless connectivity. Power cycling the network switch clears the switch’s MAC address tables, allowing the camera’s MAC address to be added to the switch port used for wireless connectivity.

In short, it is necessary to power cycle the switch (or the Cisco Linksys E2000 router acting as only a wireless access point with integrated 4 port gigabit switch in my case) when moving the camera from a wired Ethernet connection to a wireless connection.

Makes a bit of sense… now if only motion detection worked as advertised and stability improved to the point that the camera does not require two or three reboots per day.  Thankfully, most other consumer products tend to work out of the box; imagine the ramifications if every time the weather conditions included rain, a vehicle would sputter, shake violently, and reduce power to 10% of normal when the driver attempted to accelerate.

Changing gears a bit.  You might have noticed that the title of this article appears to be some sort of error message, probably produced by an Oracle product.  I am in the process of writing a program that interfaces with an Oracle Database, rapidly retrieving a variety of information from various database tables – using bind variables in the SQL statements, of course:

The problem?  My program seems to crash randomly on computers running either the 32 bit or 64 bit version of Windows 7 (the program works without issue on computers running the 32 bit version of Windows XP).  The randomness of the crashes makes it a bit difficult to troubleshoot this particular problem, but the vast majority of the crashes are associated with populating the treeview control.  Is the crash caused by a typo (bug) in my program logic?  Is the crash caused by the Windows 7 version of the treeview control?  Is the crash caused by Oracle Database 11.2.0.2, as a result of receiving several different SQL statements in rapid-fire fashion?  Could there be another explanation?

To start the investigation, I first enabled a 10046 trace at level 12.  I then entered an employee ID and started selecting different date ranges until the program crashed.  The crash message varied (as did the point at which the crash happened), and while Windows offered to contact the developer to report the error, I declined the offer.  Unfortunately, I did not capture any of the exact error messages, although I did open one of the debug files created during a crash – “Unhandled exception at 0x0fc01b3d”:

mov cx, word ptr [edi] – Thanks Windows, I think that I will go back to the safety of the 10046 trace files (my best guess is that this instruction is attempting to copy a value from a variable memory pointer).

Digging into the 10046 trace file, I found that this was the last SQL statement executed by my program before the crash:

CLOSE #1301509800:c=0,e=10,dep=0,type=1,tim=4032143047926
=====================
PARSING IN CURSOR #1301508512 len=1452 dep=0 uid=172 oct=3 lid=172 tim=4032143048002 hv=929599856 ad='3eb8207a0' sqlid='4y6m7dsvqj4bh'
SELECT 
  EMPLOYEE_ID, 
  SHIFT_DATE, 
  CLOCK_IN, 
  CLOCK_OUT, 
  HOURS_WORKED, 
  HOURS_VACATION, 
  HOURS_HOLIDAY, 
  HOURS_BEREAVEMENT, 
  HOURS_JURY_DUTY, 
  DECODE(SHIFT_DATE_CODE,'VAC',1,0) VACATION, 
  DECODE(SHIFT_DATE_CODE,'HOL',1,0) HOLIDAY, 
  DECODE(SHIFT_DATE_CODE,'BER',1,0) BEREAVEMENT, 
  DECODE(SHIFT_DATE_CODE,'JUR',1,0) JURY, 
  DECODE(SHIFT_DATE_CODE,'ABS',1,0) ABSENT, 
  DECODE(SHIFT_DATE_CODE,'EXC',1,0) EXCUSED, 
  SUM(DECODE(SHIFT_DATE_CODE,'VAC',1,0)) OVER (PARTITION BY EMPLOYEE_ID) VACATION_COUNT, 
  SUM(HOURS_VACATION) OVER (PARTITION BY EMPLOYEE_ID) VACATION_HOURS, 
  SUM(DECODE(SHIFT_DATE_CODE,'HOL',1,0)) OVER (PARTITION BY EMPLOYEE_ID) HOLIDAY_COUNT, 
  SUM(HOURS_HOLIDAY) OVER (PARTITION BY EMPLOYEE_ID) HOLIDAY_HOURS, 
  SUM(DECODE(SHIFT_DATE_CODE,'BER',1,0)) OVER (PARTITION BY EMPLOYEE_ID) BEREAVEMENT_COUNT, 
  SUM(HOURS_BEREAVEMENT) OVER (PARTITION BY EMPLOYEE_ID) BEREAVEMENT_HOURS, 
  SUM(DECODE(SHIFT_DATE_CODE,'JUR',1,0)) OVER (PARTITION BY EMPLOYEE_ID) JURY_COUNT, 
  SUM(HOURS_JURY_DUTY) OVER (PARTITION BY EMPLOYEE_ID) JURY_HOURS, 
  SUM(DECODE(SHIFT_DATE_CODE,'ABS',1,0)) OVER (PARTITION BY EMPLOYEE_ID) ABSENT_COUNT, 
  SUM(DECODE(SHIFT_DATE_CODE,'EXC',1,0)) OVER (PARTITION BY EMPLOYEE_ID) EXCUSED_COUNT 
FROM 
  PAY_HOURS
WHERE 
  EMPLOYEE_ID= :1 
  AND SHIFT_DATE BETWEEN :2 AND :3 
  AND COALESCE(SHIFT_DATE_CODE,'ZZZ')<>'NOL' 
ORDER BY 
  COALESCE(SHIFT_DATE_CODE,'ZZZ'), 
  SHIFT_DATE
END OF STMT
PARSE #1301508512:c=0,e=42,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=2997266572,tim=4032143048001
BINDS #1301508512:
 Bind#0
  oacdty=96 mxl=32(06) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=1000010 frm=01 csi=178 siz=48 off=0
  kxsbbbfp=4d936d40  bln=32  avl=06  flg=05
  value="DUDE"
 Bind#1
  oacdty=12 mxl=07(07) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=1000010 frm=00 csi=178 siz=0 off=32
  kxsbbbfp=4d936d60  bln=07  avl=07  flg=01
  value="10/19/2012 0:0:0"
 Bind#2
  oacdty=12 mxl=07(07) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=1000010 frm=00 csi=178 siz=0 off=40
  kxsbbbfp=4d936d68  bln=07  avl=07  flg=01
  value="10/19/2012 0:0:0"
EXEC #1301508512:c=0,e=129,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=2997266572,tim=4032143048323
WAIT #1301508512: nam='SQL*Net message to client' ela= 2 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=4032143048369
FETCH #1301508512:c=0,e=33,p=0,cr=3,cu=0,mis=0,r=0,dep=0,og=1,plh=2997266572,tim=4032143048448
STAT #1301508512 id=1 cnt=0 pid=0 pos=1 obj=0 op='SORT ORDER BY (cr=3 pr=0 pw=0 time=43 us cost=6 size=50 card=1)'
STAT #1301508512 id=2 cnt=0 pid=1 pos=1 obj=0 op='WINDOW BUFFER (cr=3 pr=0 pw=0 time=34 us cost=6 size=50 card=1)'
STAT #1301508512 id=3 cnt=0 pid=2 pos=1 obj=0 op='FILTER  (cr=3 pr=0 pw=0 time=21 us)'
STAT #1301508512 id=4 cnt=0 pid=3 pos=1 obj=70154 op='TABLE ACCESS BY INDEX ROWID PAY_HOURS (cr=3 pr=0 pw=0 time=20 us cost=4 size=50 card=1)'
STAT #1301508512 id=5 cnt=0 pid=4 pos=1 obj=71786 op='INDEX RANGE SCAN SYS_C0022057 (cr=3 pr=0 pw=0 time=20 us cost=3 size=0 card=1)'
WAIT #1301508512: nam='SQL*Net message from client' ela= 1448 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=4032143050020
PARSE #1301509800:c=0,e=20,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=266456858,tim=4032143050081
WAIT #1301509800: nam='SQL*Net message to client' ela= 2 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=4032143050142

*** 2012-10-19 09:43:33.522
WAIT #1301509800: nam='SQL*Net message from client' ela= 5761077 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=4032148811261  

Great, I will take a closer look at the part of my program that executes that SQL statement.  But wait, someone famous once said: “Insanity is repeating the same mistakes and expecting different results.”  Let’s run my program again with a 10046 trace at level 12 enabled.  This time, I found that the following was the last SQL statement executed by my program before the crash:

PARSING IN CURSOR #1078422680 len=1257 dep=0 uid=172 oct=3 lid=172 tim=4031964081030 hv=4064672182 ad='3a68828c8' sqlid='4cj96t7t4bydq'
SELECT 
  LT.TRANSACTION_ID, 
  WO.PART_ID, 
  LT.WORKORDER_BASE_ID, 
  LT.WORKORDER_LOT_ID, 
  LT.WORKORDER_SUB_ID, 
  LT.OPERATION_SEQ_NO, 
  LT.TYPE, 
  LT.INDIRECT_ID, 
  LT.RESOURCE_ID, 
  LT.CLOCK_IN, 
  NVL2(LT.HOURS_WORKED,LT.CLOCK_OUT,NULL) CLOCK_OUT, 
  NVL(LT.HOURS_WORKED,ROUND((SYSDATE-LT.CLOCK_IN)*24,2)) HOURS_WORKED, 
  LT.GOOD_QTY, 
  COUNT(*) OVER (PARTITION BY LT.EMPLOYEE_ID) LABOR_TICKETS, 
  SUM(NVL(LT.HOURS_WORKED,ROUND((SYSDATE-LT.CLOCK_IN)*24,2))) OVER (PARTITION BY LT.EMPLOYEE_ID) TOTAL_HOURS, 
  SUM(DECODE(LT.TYPE,'I',NVL(LT.HOURS_WORKED,ROUND((SYSDATE-LT.CLOCK_IN)*24,2)),0)) OVER (PARTITION BY LT.EMPLOYEE_ID) TOTAL_INDIRECT_HOURS, 
  SUM(DECODE(LT.TYPE,'S',NVL(LT.HOURS_WORKED,ROUND((SYSDATE-LT.CLOCK_IN)*24,2)),0)) OVER (PARTITION BY LT.EMPLOYEE_ID) TOTAL_SETUP_HOURS, 
  SUM(DECODE(LT.TYPE,'R',NVL(LT.HOURS_WORKED,ROUND((SYSDATE-LT.CLOCK_IN)*24,2)),0)) OVER (PARTITION BY LT.EMPLOYEE_ID) TOTAL_RUN_HOURS 
FROM 
  LABOR_TICKET LT, 
  WORK_ORDER WO 
WHERE 
  LT.EMPLOYEE_ID= :1 
  AND LT.SHIFT_DATE BETWEEN :2 AND :3 
  AND WO.TYPE(+)='W' 
  AND WO.SUB_ID(+)='0' 
  AND LT.WORKORDER_TYPE=WO.TYPE(+) 
  AND LT.WORKORDER_BASE_ID=WO.BASE_ID(+) 
  AND LT.WORKORDER_LOT_ID=WO.LOT_ID(+) 
  AND LT.WORKORDER_SPLIT_ID=WO.SPLIT_ID(+)
END OF STMT
PARSE #1078422680:c=0,e=51,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3199062715,tim=4031964081030
BINDS #1078422680:
 Bind#0
  oacdty=96 mxl=32(06) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=1000010 frm=01 csi=178 siz=48 off=0
  kxsbbbfp=40476838  bln=32  avl=06  flg=05
  value="DUDE"
 Bind#1
  oacdty=12 mxl=07(07) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=1000010 frm=00 csi=178 siz=0 off=32
  kxsbbbfp=40476858  bln=07  avl=07  flg=01
  value="10/19/2012 0:0:0"
 Bind#2
  oacdty=12 mxl=07(07) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=1000010 frm=00 csi=178 siz=0 off=40
  kxsbbbfp=40476860  bln=07  avl=07  flg=01
  value="10/19/2012 0:0:0"
EXEC #1078422680:c=0,e=188,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=4162394183,tim=4031964081369
WAIT #1078422680: nam='SQL*Net message to client' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=4031964081423
FETCH #1078422680:c=0,e=79,p=0,cr=4,cu=0,mis=0,r=1,dep=0,og=1,plh=4162394183,tim=4031964081539
STAT #1078422680 id=1 cnt=1 pid=0 pos=1 obj=0 op='WINDOW BUFFER (cr=4 pr=0 pw=0 time=81 us cost=9 size=101 card=1)'
STAT #1078422680 id=2 cnt=1 pid=1 pos=1 obj=0 op='FILTER  (cr=4 pr=0 pw=0 time=35 us)'
STAT #1078422680 id=3 cnt=1 pid=2 pos=1 obj=0 op='NESTED LOOPS OUTER (cr=4 pr=0 pw=0 time=33 us cost=8 size=101 card=1)'
STAT #1078422680 id=4 cnt=1 pid=3 pos=1 obj=69863 op='TABLE ACCESS BY INDEX ROWID LABOR_TICKET (cr=4 pr=0 pw=0 time=28 us cost=7 size=73 card=1)'
STAT #1078422680 id=5 cnt=1 pid=4 pos=1 obj=70966 op='INDEX RANGE SCAN IND_LT_SHIFT_DATE (cr=3 pr=0 pw=0 time=21 us cost=6 size=0 card=1)'
STAT #1078422680 id=6 cnt=0 pid=3 pos=2 obj=70072 op='TABLE ACCESS BY INDEX ROWID WORK_ORDER (cr=0 pr=0 pw=0 time=3 us cost=1 size=28 card=1)'
STAT #1078422680 id=7 cnt=0 pid=6 pos=1 obj=71722 op='INDEX UNIQUE SCAN SYS_C0021989 (cr=0 pr=0 pw=0 time=2 us cost=0 size=0 card=1)'

*** 2012-10-19 09:40:38.424
WAIT #1078422680: nam='SQL*Net message from client' ela= 9641490 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=4031973723245 

A different SQL statement was found in the trace file when my program crashed… do the SQL statements have anything in common?  Both SQL statements are using analytic functions, and both are using date-type bind variables.  Interesting… what was that saying about insanity?  Let’s try another test of my program with an enabled 10046 trace to see the last SQL statement found in the trace before the crash:

PARSING IN CURSOR #881825936 len=210 dep=1 uid=0 oct=3 lid=0 tim=4030801312382 hv=864012087 ad='3eddb9a08' sqlid='96g93hntrzjtr'
select /*+ rule */ bucket_cnt, row_cnt, cache_cnt, null_cnt, timestamp#, sample_size, minimum, maximum, distcnt, lowval, hival, density, col#, spare1, spare2, avgcln from hist_head$ where obj#=:1 and intcol#=:2
END OF STMT
PARSE #881825936:c=0,e=46,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=3,plh=2239883476,tim=4030801312381
BINDS #881825936:
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=0001 frm=00 csi=00 siz=48 off=0
  kxsbbbfp=348f64a0  bln=22  avl=04  flg=05
  value=166995
 Bind#1
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=0001 frm=00 csi=00 siz=0 off=24
  kxsbbbfp=348f64b8  bln=22  avl=02  flg=01
  value=2
EXEC #881825936:c=0,e=95,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=3,plh=2239883476,tim=4030801312551
WAIT #881825936: nam='db file sequential read' ela= 6526 file#=1 block#=98252 blocks=1 obj#=427 tim=4030801319112
WAIT #881825936: nam='db file sequential read' ela= 759 file#=1 block#=100271 blocks=1 obj#=425 tim=4030801319923
FETCH #881825936:c=0,e=7396,p=2,cr=3,cu=0,mis=0,r=1,dep=1,og=3,plh=2239883476,tim=4030801319963
STAT #881825936 id=1 cnt=1 pid=0 pos=1 obj=425 op='TABLE ACCESS BY INDEX ROWID HIST_HEAD$ (cr=3 pr=2 pw=0 time=7394 us)'
STAT #881825936 id=2 cnt=1 pid=1 pos=1 obj=427 op='INDEX RANGE SCAN I_HH_OBJ#_INTCOL# (cr=2 pr=1 pw=0 time=6583 us)'
CLOSE #881825936:c=0,e=46,dep=1,type=3,tim=4030801320046
...
BINDS #881825936:
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=0001 frm=00 csi=00 siz=48 off=0
  kxsbbbfp=348f64a0  bln=22  avl=04  flg=05
  value=166995
 Bind#1
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=0001 frm=00 csi=00 siz=0 off=24
  kxsbbbfp=348f64b8  bln=22  avl=02  flg=01
  value=11
EXEC #881825936:c=0,e=90,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=3,plh=2239883476,tim=4030801321733
FETCH #881825936:c=0,e=19,p=0,cr=3,cu=0,mis=0,r=1,dep=1,og=3,plh=2239883476,tim=4030801321767
CLOSE #881825936:c=0,e=10,dep=1,type=3,tim=4030801321802
BINDS #881825936:
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=0001 frm=00 csi=00 siz=48 off=0
  kxsbbbfp=348f64a0  bln=22  avl=04  flg=05
  value=166995
 Bind#1
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=0001 frm=00 csi=00 siz=0 off=24
  kxsbbbfp=348f64b8  bln=22  avl=02  flg=01
  value=33
EXEC #881825936:c=0,e=152,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=3,plh=2239883476,tim=4030801321992
FETCH #881825936:c=0,e=19,p=0,cr=3,cu=0,mis=0,r=1,dep=1,og=3,plh=2239883476,tim=4030801322027
CLOSE #881825936:c=0,e=10,dep=1,type=3,tim=4030801322062
BINDS #881825936:
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=0001 frm=00 csi=00 siz=48 off=0
  kxsbbbfp=348f64a0  bln=22  avl=04  flg=05
  value=166995
 Bind#1
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=0001 frm=00 csi=00 siz=0 off=24
  kxsbbbfp=348f64b8  bln=22  avl=02  flg=01
  value=35
EXEC #881825936:c=0,e=89,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=3,plh=2239883476,tim=4030801322195
FETCH #881825936:c=0,e=18,p=0,cr=3,cu=0,mis=0,r=1,dep=1,og=3,plh=2239883476,tim=4030801322229
CLOSE #881825936:c=0,e=10,dep=1,type=3,tim=4030801322265
WAIT #881751720: nam='SQL*Net message to client' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=4030801323194

*** 2012-10-19 09:21:14.806
WAIT #881751720: nam='SQL*Net message from client' ela= 8851219 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=4030810174461 

You might have noticed that the trace file showed dep=1 in the PARSING IN CURSOR line, which indicates that the SQL statement was executed automatically outside of my program’s control – likely when attempting to optimize another SQL statement that was submitted by my program.

Let’s test that insanity theory again.  Another execution of my program with an enabled 10046 trace at level 12 showed this as the last SQL statement executed:

PARSING IN CURSOR #767713824 len=503 dep=0 uid=172 oct=3 lid=172 tim=4026166358219 hv=2534735629 ad='3b58041b0' sqlid='7x2t5dqbj9zsd'
SELECT 
  N.ID NCMT_ID, 
  N.NCMT_DATE, 
  N.PART_ID, 
  N.WORKORDER_BASE_ID, 
  N.WORKORDER_LOT_ID, 
  N.WORKORDER_PIECE_NO, 
  N.QTY, 
  N.DISCREPANCY_TYPE, 
  N.RESOURCE_ID, 
  N.CUSTOMER_ID, 
  NVL(N.TOTAL_COST,0) NCMT_COST, 
  COUNT(N.ID) OVER (PARTITION BY N.EMPLOYEE_ID) NCMT_COUNT, 
  SUM(NVL(N.TOTAL_COST,0)) OVER (PARTITION BY N.EMPLOYEE_ID) TOTAL_COST 
FROM 
  NON_CONFORMING N 
WHERE 
  N.EMPLOYEE_ID= :1 
  AND N.NCMT_DATE BETWEEN :2 AND :3 
  AND UPPER(N.ERROR_BY)='OPERATOR' 
ORDER BY 
  ID
END OF STMT
PARSE #767713824:c=0,e=41,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3960464466,tim=4026166358219
BINDS #767713824:
 Bind#0
  oacdty=96 mxl=32(06) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=1000010 frm=01 csi=178 siz=48 off=0
  kxsbbbfp=2dc25d58  bln=32  avl=06  flg=05
  value="DUDE"
 Bind#1
  oacdty=12 mxl=07(07) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=1000010 frm=00 csi=178 siz=0 off=32
  kxsbbbfp=2dc25d78  bln=07  avl=07  flg=01
  value="10/19/2012 0:0:0"
 Bind#2
  oacdty=12 mxl=07(07) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=1000010 frm=00 csi=178 siz=0 off=40
  kxsbbbfp=2dc25d80  bln=07  avl=07  flg=01
  value="10/19/2012 0:0:0"
...
PARSE #767713824:c=0,e=19,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3960464466,tim=4026237045440
BINDS #767713824:
 Bind#0
  oacdty=96 mxl=32(07) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=1000010 frm=01 csi=178 siz=48 off=0
  kxsbbbfp=2dc21f50  bln=32  avl=07  flg=05
  value="DUDE2"
 Bind#1
  oacdty=12 mxl=07(07) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=1000010 frm=00 csi=178 siz=0 off=32
  kxsbbbfp=2dc21f70  bln=07  avl=07  flg=01
  value="6/22/2012 0:0:0"
 Bind#2
  oacdty=12 mxl=07(07) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=1000010 frm=00 csi=178 siz=0 off=40
  kxsbbbfp=2dc21f78  bln=07  avl=07  flg=01
  value="10/19/2012 0:0:0"
EXEC #767713824:c=0,e=125,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3960464466,tim=4026237045593
WAIT #767713824: nam='SQL*Net message to client' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=71274 tim=4026237045634
FETCH #767713824:c=0,e=57,p=0,cr=10,cu=0,mis=0,r=0,dep=0,og=1,plh=3960464466,tim=4026237045727
WAIT #767713824: nam='SQL*Net message from client' ela= 3080 driver id=1413697536 #bytes=1 p3=0 obj#=71274 tim=4026237048857
CLOSE #767713824:c=0,e=7,dep=0,type=3,tim=4026237048899
PARSE #767713824:c=0,e=18,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3960464466,tim=4026237048951
WAIT #767713824: nam='SQL*Net message to client' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=71274 tim=4026237051177

*** 2012-10-19 08:06:40.151
WAIT #767713824: nam='SQL*Net message from client' ela= 98732762 driver id=1413697536 #bytes=1 p3=0 obj#=71274 tim=4026335783988 

Well, that SQL statement also used date-type bind variables and analytic functions… maybe there is a problem with Oracle Database 11.2.0.2?  But, why is this suddenly a problem… what has changed?  Nearly two years ago I wrote a blog article that described an ODBC update problem that was present in the Oracle 11.2.0.1 Client – the first Oracle Client version that could be easily downloaded and installed on Windows 7 (it is also possible to patch the downloadable 10.2.0.3/4 client to 10.2.0.5).  The Oracle 11.2.0.1 patch 7 (10155837 for 32 bit Windows clients) fixes the false ODBC lock violation errors that are reported in Microsoft Access (and various other programs) when UPDATE SQL statements are executed, and those SQL statements contain bind variables.  That same Oracle Client version and patch version are still installed on the computer (and all other computers here that run Windows 7), so that is not the source of the change.  On a side note, have you found it difficult to locate updated 11.2.0.2 or 11.2.0.3 client versions in MOS (Metalink)?

So, the 10046 trace failed to find a consistent source for the crashes, and the Oracle client has not been changed on the Windows 7 computer in nearly two years.  Program bug?  Analytic query bug?  Bind variable related bug?  I went digging for the sqlnet.log file to see if that file provided any clues.  This is what I found:

Fri Oct 19 13:26:52 2012
Directory does not exist for read/write [C:\Oracle\product\11.2.0\client_1\log] [C:\Oracle\product\11.2.0\client_1\log\diag\clients]
Errors in file c:\users\hooper\oracle\oradiag_hooper\diag\clients\user_hooper\host_1131635621_76\trace\ora_824_8864.trc  (incident=401):
oci-24550 [3221225477] [Unhandled exception: Code=c0000005 Flags=0
] [] [] [] [] [] [] [] [] [] []
Incident details in: c:\users\hooper\oracle\oradiag_hooper\diag\clients\user_hooper\host_1131635621_76\incident\incdir_401\ora_824_8864_i401.trc

That file’s contents proved to be a bit helpful, I think.  Let’s take a look at the contents of the file mentioned above:

Dump file c:\users\hooper\oracle\oradiag_hooper\diag\clients\user_hooper\host_1131635621_76\incident\incdir_401\ora_824_8864_i401.trc
Dump continued from file: c:\users\hooper\oracle\oradiag_hooper\diag\clients\user_hooper\host_1131635621_76\trace\ora_824_8864.trc
oci-24550 [3221225477] [Unhandled exception: Code=c0000005 Flags=0
] [] [] [] [] [] [] [] [] [] []
========= Dump for incident 401 (oci 24550 [3221225477]) ========
Tracing is in restricted mode!
----- Short Call Stack Trace -----
_dbgexPhaseII()+850<-_dbgexProcessError()+2061<-_dbgeExecuteForError()+43<-_dbgePostErrorDirect()+2368<-_kpeDbgSignalHandler()+225<-_skgesig_Win_UnhandledExceptionFilter()+140<-75D6003D<-7295FA2E<-0040AD0D<-77309EF0<-77309EC0<-00000000<-773079DD<-77306E72<-772FE0ED<-275C83DD<-275D219F<-275D1D88<-275D0E0E<-2759C677<-004CB035<-004DC4A3<-72991D31<-00410FD8<-7299202F----- START Event Driven Actions Dump ----
---- END Event Driven Actions Dump ----
----- START DDE Actions Dump -----
Executing SYNC actions
Executing ASYNC actions
----- END DDE Actions Dump (total 0 csec) -----
End of Incident Dump

Might the above indicate that my program logic is not the cause of the problem, but instead that it is the Oracle client that is crashing?  I started digging through various websites looking for a similar oci-24500 error message, but I found nothing that was helpful.  I searched through the bug database and technical article database on MOS (Metalink), and found nothing that was helpful.  By chance, I widened the search on MOS a bit and stumbled upon Mark Powell’s message in this Oracle Support Community thread (you need a MOS account to access the message).  That message pointed to two other MOS articles, at least one of which stated that the problem is (possibly – not exactly the same error message) corrected when the Oracle 11.2.0.3 Client is installed.  I struggled finding the 11.2.0.3 Client in MOS, and then found that it is located on disk 3 of 6 of the 11.2.0.3 server download (direct link – just specify operating system version and download disk 3 – p10404530_112030_platform_3of6.zip (requires a MOS account to download)).  The 11.2.0.3 client version magically corrected the seemingly random crashes in my programprogrammer error averted.

Insanity?  No more so than an expensive security camera that fails at motion detection, or a vehicle that decelerates when the driver commands the vehicle to accelerate during a light rain.  ;-)

P.S.: Sorry about the formatting of the blog article.  A WordPress update apparently disabled my custom CSS that expanded the article to the full width of the browser window – I am still in search of a fix for that problem.








Follow

Get every new post delivered to your Inbox.

Join 166 other followers