Install the Oracle Client on a Synology DiskStation DS1813+ or DS412+ for Nagios to Monitor Databases

26 07 2013

July 26, 2013

(Back to the Previous Post in the Series)

In the previous article of this series I provided steps to install and run the Nagios network monitoring utility on either a Synology DiskStation DS1813+ or a DS412+ for the purpose of pinging network devices to verify that the devices respond to ping requests.  That article may have seemed a bit out of place on a blog that is primarily intended to include notes about using Oracle Database.  In the previous article I hinted that there is a check_oracle plugin for Nagios, suggesting that a Synology DiskStation DS1813+ or DS412+ could be used to monitor Oracle databases.

This blog article does NOT use the check_oracle plugin.  There are a small handful of official Nagios plugins for Oracle Database, but this article does not use any of those plugins either.  Instead, I took a chance at crafting a custom Perl script to monitor Oracle databases, the first time I have ever used Perl.  I thought back to some of the earlier blog articles where I introduced a couple of Oracle Database monitoring scripts that were written in VBScript, including Working with Oracle’s Time Model Data 3 and Oracle Statistics Chart Viewer – I started seeing the possibilities of using Nagios to alert DBAs about potential issues.  Yet, I have never used Perl, and at the time did not even have a clue how to concatenate strings in the language (it turns out that there are more than two techniques).

Verify that Perl is installed on the DiskStation by accessing the Package Center in the DiskStation’s DSM web interface.  If there is an Install button under the Perl heading, that indicates that Perl is not yet installed – click the Install button and wait until that button changes to Installed.

InstallOracleClient-1

Let’s start by installing the Oracle Client on the DiskStation.  I will use the Oracle Instant Client version 11.2.0.3 (the 12.1.0.1 Instant Client throws a different error message than did the 11.2.0.3 and 11.1.0.7 clients, so I reverted back to the 11.2.0.3 version).  To download the Oracle Instant Client for Linux, visit this link.  You will need an OTN account to download the client files – I downloaded the files using a desktop computer, and then transferred the files to a custom created Config share that I set up on the DiskStation.  This article requires the Basic Instant Client package and the SQL*Plus add-on package for the Instant Client.

Connect to the DiskStation using Telnet as the root user (see the previous article for Telnet connection directions).  For consistency, the .zip files for the Oracle Instant Client will be copied to the downloads directory that was created in the previous article.  We will create an oracle directory in the downloads directory:

mkdir /volume1/downloads/oracle/

Next, the downloaded files for the Oracle Instant Client are copied from the config share that I created on the DiskStation (to allow transporting files from a desktop computer) to the /volume1/downloads/oracle directory that was just created, and then the Oracle Instant Client files are unzipped, which automatically creates the instantclient_11_2 directory:

cp /volume1/config/instantclient-basic-linux-11.2.0.3.0.zip /volume1/downloads/oracle/
cp /volume1/config/instantclient-sqlplus-linux-11.2.0.3.0.zip /volume1/downloads/oracle/
cd /volume1/downloads/oracle/
unzip -o instantclient-basic-linux-11.2.0.3.0.zip
unzip -o instantclient-sqlplus-linux-11.2.0.3.0.zip
cd instantclient_11_2

The Oracle Instant Client 11.2.0.3 (and 11.1.0.7) require a file named libaio.so.1 that is not installed by default on the DiskStation.  After searching the Internet for a while, I found a copy of that file in the /lib directory on a Red Hat Enterprise Linux 3 server.  Using a desktop computer, I copied that file to the DiskStation’s config share, as well as a copy of the tnsnames.ora file that includes the database connection information for the databases that must be monitored.  The libaio.so.1 file will be placed in the /volume1/downloads/oracle/instantclient_11_2 directory, and the tnsnames.ora file will be placed in the /etc directory (one of the directories that is automatically searched).

cp /volume1/config/libaio.so.1 .
cp /volume1/config/tnsnames.ora /etc

Let’s start up SQL*Plus:

./sqlplus /nolog
./sqlplus: error while loading shared libraries: libsqlplus.so: cannot open shared object file: No such file or directory

An error…

ls
BASIC_README    adrci   glogin.sql   libclntsh.so.11.1  libocci.so.11.1  libocijdbc11.so  libsqlplusic.so  ojdbc6.jar  sqltest.sql  xstreams.jar
SQLPLUS_README  genezi  libaio.so.1  libnnz11.so        libociei.so      libsqlplus.so    ojdbc5.jar       sqlplus     uidrvci

The file mentioned in the error message exists in the correct directory.  That error is caused by an undefined environment variable.  Fixing the error and trying again:

LD_LIBRARY_PATH="/volume1/downloads/oracle/instantclient_11_2"
export LD_LIBRARY_PATH
./sqlplus /nolog

SQL*Plus: Release 11.2.0.3.0 Production on Fri Jul 26 13:22:16 2013

Copyright (c) 1982, 2011, Oracle.  All rights reserved.

SQL>

Try to connect to one of the databases that are defined in the tnsnames.ora file, and then try executing a simple SQL statement before exiting SQL*Plus (change testuser, password, and DBName as appropriate for your environment):

CONNECT testuser/password@DBName
Connected.
SELECT SYSDATE FROM DUAL;

SYSDATE
---------
26-JUL-13

EXIT

Now that we have verified that the Instant Client version of SQL*Plus works from the DiskStation, we will create a simple SQL script to verify that the Instant Client version of SQL*Plus is able to accept script names from a command line.  The script will be created in the same directory where the Oracle Instant Client is located (see the basic directions for using vi, as found in the previous article):

vi /volume1/downloads/oracle/instantclient_11_2/sqltest.sql

The first line in the script instructs SQL*Plus to abort the execution of the script when an error is encountered; if the connection attempt fails, there is no point in attempting to execute any SQL statements that follow.  The second line connects to the database (change testuser, password, and DBName as appropriate for your environment).  The third line executes a simple SQL statement, selecting from a table (rather than the virtual table DUAL) – change the SQL statement to a valid statement for your database.  The last statement exits SQL*Plus once the SQL statement finishes executing:

WHENEVER SQLERROR EXIT SQL.SQLCODE
CONNECT testuser/password@DBName
SELECT ID, DESCRIPTION FROM T1 WHERE ID LIKE '8X%';
EXIT

Try using SQL*Plus to execute the script:

./sqlplus /nolog @/volume1/downloads/oracle/instantclient_11_2/sqltest.sql

The results from the SQL statement should display in the Telnet window, and then the normal DiskStation prompt should appear.  If the database instance was not running at the time of the execution, you would see something like this rather than the results of the SQL statement:

ERROR:
ORA-01034: ORACLE not available
ORA-27101: shared memory realm does not exist
Process ID: 0
Session ID: 0 Serial number: 0

Nagios apparently expects its plugins to return one of four exit codes, as explained in this article, to determine whether or not the checked object is behaving as expected:

Exit Code Status
        0 OK
        1 WARNING
        2 CRITICAL
        3 UNKNOWN

The exit function in Perl permits returning a number, so that is one problem solved.  I want the Perl plugin to accept a command line parameter from Nagios so that I am able to check more than one database using the same Perl script – the Nagios command line parameter will specify the script to execute.  This article explains how to receive a command line parameter from Nagios, so that is a second problem that is solved.

Let’s put together a simple (OK, not simple for me) script that executes a SQL*Plus script that is specified by Nagios (the sqltest.sql file that was just created), and then returns 0 if no ORA- type error message is returned when the Oracle Instant Client’s SQL*Plus executes a script.  If an ORA- type error message is returned, then the script’s exit code is set to 2 and the ORA- errors are returned to Nagios.  Nagios apparently only accepts one line of output from the script, reading what is written by Perl’s print command.  As we saw earlier, the script may return multiple nested ORA- type errors if the database instance is down, so we need these multiple error messages to appear on one line of output from the Perl script.

vi /opt/libexec/check_oracle.pl

This is the script that I crafted after spending, quite literally, hours days searching the Internet for Perl command syntax that would execute on the DiskStation:

#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Long qw(:config no_ignore_case);
my $output = "";
my ($script);

my $result = GetOptions(
 "s|script=s"         => \$script,
  );

$ENV{"LD_LIBRARY_PATH"} = "/volume1/downloads/oracle/instantclient_11_2/";

my $ExitValue = 0;
print "Test Script: " . $script . ":  ";
open my $in, "/volume1/downloads/oracle/instantclient_11_2/sqlplus /nolog @/volume1/downloads/oracle/instantclient_11_2/$script |";

while (my $line = <$in>)
  {
  if ($line =~ /^(ORA-\d{5})/)
    {
    #$line =~ s/\v//g;
    chomp($line);
    $output = $output . $line . " ; ";
    $ExitValue = 2;
    }
  }

close($in);
print $output;
exit($ExitValue);

The script defines the LD_LIBRARY_PATH environment variable, so the Nagios user should (hopefully) have no problems executing the Perl script.  There are apparently a half-dozen different ways to remove the end of line characters from the SQL*Plus output, chomp($line); worked for my test, but $line =~ s/\v//g; did not work quite as expected.

Save the script and exit vi.  Next, we need to make the Perl script executable:

chmod +x /opt/libexec/check_oracle.pl

In the earlier article we had to modify a couple of the Nagios configuration files – this time we will need to modify some of the same files, starting with the commands.cfg file.

vi /opt/etc/objects/commands.cfg

Locate the ‘check_ping’ command definition that we modified in the earlier article.  Below the } character for that command definition, add the following, which will tell Nagios that our Perl script exists, and that the script command should be recognized by Nagios as check_oracle_ch:

# check_oracle_ch command definition by Charles Hooper
define command{
        command_name    check_oracle_ch
        command_line    $USER1$/check_oracle.pl -s $ARG1$
        }

Save the commands.cfg file and exit vi.  Let’s create another Nagios configuration file that will be used to list the Oracle Databases (and their scripts) that will be checked:

vi /opt/etc/objects/oracle.cfg

On a new line in the oracle.cfg file, add the following:

define service{
        use                     generic-service ; Inherit values from a template
        host_name               server123       ; This is a server name that is defined in the server.cfg file
        service_description     CHECK_DB_TEST   ; A unique name given to the server, database, and script combination
        check_command           check_oracle_ch!sqltest.sql    ; The command definition that was added to the commands.cfg file followed by ! and the name of the SQL script to execute
        normal_check_interval   5               ; Execute the script every 5 minutes under normal conditions
        retry_check_interval    1               ; Re-execute the script after receiving a return code of 2, every minute until its final/hard state is determined
        }

Additionally, we need to instruct Nagios that there is now an additional configuration file to read when starting.  Save the oracle.cfg file and exit vi.  Execute the following command:

vi /opt/etc/nagios.cfg

Below the cfg_file=/opt/etc/objects/switch.cfg line in the file (added in the previous article), add the following line:

cfg_file=/opt/etc/objects/oracle.cfg

Save the nagios.cfg file and exit vi.  Since we have modified the Nagios configuration, we must restart Nagios.  Find the first Nagios process ID, then kill that process (as was described in the previous article):

ps

kill 2683

Verify that Nagios does not return an error message when reading the configuration files:

/opt/bin/nagios -v /opt/etc/nagios.cfg

If no errors are reported, then start Nagios:

/opt/bin/nagios -d /opt/etc/nagios.cfg

Nagios should be writing log entries to a file name nagios.log (as defined in the nagios.cfg file).  The tail command permits viewing the last few (100 in this case) lines from a specified file:

tail -n 100 /opt/var/nagios.log

If Perl is not installed, Nagios may send an email with the following in the Additional Info section:

(Return code of 127 is out of bounds - plugin may be missing)

If the database instance is down, Nagios should send an email, possibly with the following in the Additional Info section (note that the script name is listed first – a different script will be used to test each database):

Test Script: sqltest.sql:  ORA-01034: ORACLE not available : ORA-27101: shared memory realm does not exist :

When the database instance is running, and the SQL statement(s) in the script executes successfully for the first time, Nagios should send an email with just the test script listed in the Additional Info section:

Test Script: sqltest.sql:

Now that the we have confirmed that Nagios and the Perl script work OK together, create additional SQL scripts in the /volume1/downloads/oracle/instantclient_11_2/ directory to connect to the other databases to be monitored (strongly consider changing the permissions for these files, since the files include usernames and passwords for your databases).  For each of those SQL scripts, create an additional service entry in the oracle.cfg file.  For example, if a script named sqltest42.sql is created to test a database on server42, the service entry might be created as shown below:

define service{
        use                     generic-service ; Inherit values from a template
        host_name               server42        ; This is a server name that is defined in the server.cfg file
        service_description     CHECK_DB_TEST42 ; A unique name given to the server, database, and script combination
        check_command           check_oracle_ch!sqltest42.sql  ; The command definition that was added to the commands.cfg file followed by ! and the name of the SQL script to execute
        normal_check_interval   5               ; Execute the script every 5 minutes under normal conditions
        retry_check_interval    1               ; Re-execute the script after receiving a return code of 2, every minute until its final/hard state is determined
        }

Don’t forget to verify the nagios configuration before restarting Nagios.

Anyone up to the challenge of rewriting my Working with Oracle’s Time Model Data 3 script in Perl so that the script will work on the DiskStation?





Install Nagios on a Synology DiskStation DS1813+ or DS412+

19 07 2013

July 19, 2013 (Modified July 27, 2013, July 28, 2013, November 19, 2013)

(Forward to the Next Post in the Series)

—-

Update July 27, 2013:

  • I now have Nagios running on an ARM based Synology DiskStation DS212+.  Most of the steps are the same as outlined below, however there are a few additional errors that must be addressed (see below additional steps).
  • All of the ./configure commands should have included –prefix=/opt (rather than –prefix=/usr/local or completely omitting that parameter).  That change eliminates the need to copy the Nagios plugins to the correct location.  Possibly related, the -i parameter was unnecessary for the snmp and Nagios plugins make and make install commands when the ./configure command included the –prefix=/opt prefix.
  • The wget http://sourceforge.net/projects/dsgpl/files/DSM%204.1%20Tool%20Chains/Intel%20×86%20Linux%203.2.11&#8230; download step for the gcc compiler is apparently unnecessary, at least on the Synology DiskStation DS212+ (see below).

—-

This article describes how to compile and run Nagios on a Synology DiskStation DS1813+ (64 bit) or Synology DiskStation DS412+ (32 bit, the 32 bit steps should also apply to the DS1812+) NAS, both of which utilize Intel Atom processors (cat /proc/cpuinfo indicates that the DS412+ is using a 2.13GHz Atom D2700, while the DS1813+ is using a 2.13GHz Atom D2701), and utilize the DSM 4.2 operating system.  Not all Synology DiskStation NAS devices use Intel based CPUs – some of the less expensive DiskStations use ARM type processors (see this link to determine the type of CPU installed in a specific DiskStation).  It may be possible to produce a working version of Nagios on NAS devices that do not have Intel 32 bit or 64 bit processors, but I have not yet fully tested the procedure.

Warning: A lot of what follows is based on experimentation, with the end goal of having Nagios running on a Synology DiskStation having the ability to ping devices on the network or the Internet, with an email sent to an administrator when a device stops responding to ping requests, and to send a second email when the device resumes responding to ping requests.  This functionality represents a small fraction of Nagios’ capabilities through the use of plugins.  File paths vary from one Linux distribution to the next, so that adds a bit of challenge to make certain that the files are placed in the required directory.  Copying a file to the wrong directory may temporarily disable the DiskStation and require the reinstallation of the Synology DSM operating system.  The directions below are not final, and quite likely do not represent the most efficient approaches to accomplish the end goal – but the directions will hopefully be “close enough to correct” to allow the average reader of this blog to ping and send email alerts from a DiskStation.

I have relied on the free Nagios network monitoring solution since 2002 to provide an early warning of problems associated with network attached equipment including servers, production floor computers, switches, printers, wireless access points, IP cameras, Internet connection stability, etc.  While I rely on Nagios’ alerting system, I am not an expert at configuring the Nagios network monitoring system; the Nagios configuration documentation may be downloaded here.

First, make certain that the Telnet Service (or SSH Service if that is preferred) is enabled on the DiskStation.  In the DiskStation’s Control Panel, click Terminal.

InstallNagiosDiskStation1

Place a checkmark next to Enable Telnet service (if the item is not already checked), and then click the Apply button.

InstallNagiosDiskStation2

Verify that the computer that you intend to use has a Telnet client.  For Windows 7, access the Programs link in the Control Panel, and then click the Turn Windows features on or off link.  Make certain that there is a checkmark next to Telnet Client, then click the OK button.

InstallNagiosDiskStation3

Open a command line (in Windows, Start – Run – type  cmd  and press the Enter key).  On the command line, type telnet followed by either the name of the DiskStation or the IP address of the DiskStation, then press the Enter key.  When prompted for a username, type root and press the Enter key.  Type the admin user’s password (that is used to access the DSM interface in a web browser) and press the Enter key.

InstallNagiosDiskStation4

The command line on the DiskStation is very similar to the command line on a Unix or Linux computer, and is somewhat similar to a Windows command line or MS-DOS command line (use / rather than \, use ls rather than dir, use vi rather than edit):

InstallNagiosDiskStation5

We first need to add ipkg support to the DiskStation, detailed directions may be viewed at this link.  The exact directions may be different for other DiskStation models, but the following directions work for both the DS1813+ and DS412+ (note that all files downloaded from the Internet will be placed on volume1 in the downloads directory – copy and paste the lines to the Telnet session, one line at a time):

cd /volume1
mkdir downloads
cd downloads
wget http://ipkg.nslu2-linux.org/feeds/optware/syno-i686/cross/unstable/syno-i686-bootstrap_1.2-7_i686.xsh
chmod +x syno-i686-bootstrap_1.2-7_i686.xsh
sh syno-i686-bootstrap_1.2-7_i686.xsh

The vi editor is used on the DiskStation to modify files; that vi editor is a bit challenging to use at first sight, so you may need help with a couple of basic commands (see this quick reference for other commands).  The commands in vi are case sensitive (i is not the same as I).  When a file is opened, press the i key on the keyboard to allow making changes to the file (such as typing commands, or deleting commands).  When finished making changes to the file press the Esc key.  Once the Esc key is pressed, type ZZ to save the changed file and quit, or :q! to quit without saving the changes.

Next, we must modify the file that establishes the environment for the root user, when that user connects to the DiskStation.  This change is needed as part of the ipkg installation.  Edit the .profile file used by the root user:

vi /root/.profile

Add a # character in front of the two lines that contain the word PATH, then save the file (see the brief directions above to switch between command and insert mode in vi):

InstallNagiosDiskStation6

Next, reboot the DiskStation by clicking the Restart button in the Synology DSM interface (note: it should be possible to type reboot in the Telnet interface, however the DiskStation locked up the one time I attempted to execute that command).

InstallNagiosDiskStation7

Once the DiskStation reboots, reconnect to the DiskStation using Telnet, connecting as the root user, just as was done earlier.

The ipkg command should now work on the command line.  First, request that an updated list of available packages is downloaded, then display that list of packages:

ipkg update
ipkg list

Next, download a couple of packages that will be used by the Nagios network monitoring tool.  Note that using ipkg to install packages is a lot easier than compiling source code, so have fun with the ipkg utility.  When installing the optware-devel package, an error may appear stating that there is an incompatibility between wget and wget-ssl – just ignore that error for now.

ipkg update wget-ssl
ipkg install optware-devel
ipkg install gcc
ipkg install libtool
ipkg install mysql

Next, we need to compile a file and copy a couple of files:

cd /opt/share/libtool/libltdl/
./configure --prefix=/opt
make all
make install

cp /usr/syno/apache/modules/mod_ext_filter.so /opt/libexec/mod_ext_filter.so
cp /usr/syno/apache/modules/*.* /opt/libexec/

Now, install the Apache package:

ipkg install apache

If an error message is displayed on screen about mod_ext_filter.so, then modify the /opt/etc/apache2/httpd.conf file and add a # in front of the line LoadModule ext_filter_module libexec/mod_ext_filter.so and save the file.  Re-execute the ipkg install apache command (note that the up arrow on the keyboard may be pressed to quickly retype one of the previously executed commands).

InstallNagiosDiskStation8

Using the DiskStation’s Control Panel, create a nagios group and a nagcmd group (the nagcmd group probably will not be used for anything specific).  These groups do not require any special DiskStation permissions.

InstallNagiosDiskStation9

Using the DiskStation’s Control Panel, create a nagios user and add that user to the nagios and nagcmd groups.  The nagios user does not require any specific DiskStation permissions.

Next, switch back to the Telnet session, download the Nagios source code, and compile the source code:

DiskStation DS212+ Notes:

The following ./configure call was used on the DS212+:

./configure --prefix=/opt --with-command-group=nagios --disable-nanosleep --enable-nanosleep=no

The ./configure aborted with the following error message:

checking for pthread_create in -lpthread... no
checking for pthread_mutex_init in -lpthread... no
checking for pthread_create in -lpthreads... no
checking for pthread_create in -llthread... no
checking if we need -pthread for threads... no
checking for library containing nanosleep... no
Error: nanosleep() needed for timing operations.

The test that threw the error is located roughly 63% of the way through the configure file (on roughly line 5635).  If the exit 1 line in the configure file is commented out, then the configure step will complete.  However, the make all command will then fail with the following error messages:

/volume1/downloads/nagios/base/nebmods.c:363: undefined reference to `dlclose'
nebmods.o: In function `neb_load_module':
/volume1/downloads/nagios/base/nebmods.c:218: undefined reference to `dlopen'
/volume1/downloads/nagios/base/nebmods.c:249: undefined reference to `dlsym'
/volume1/downloads/nagios/base/nebmods.c:266: undefined reference to `dlsym'
/volume1/downloads/nagios/base/nebmods.c:299: undefined reference to `dlsym'
/volume1/downloads/nagios/base/nebmods.c:225: undefined reference to `dlerror'
/opt/lib/gcc/arm-none-linux-gnueabi/4.2.3/../../../../arm-none-linux-gnueabi/lib/libpthread.so: undefined reference to `__default_sa_restorer_v2@GLIBC_PRIVATE'
/opt/lib/gcc/arm-none-linux-gnueabi/4.2.3/../../../../arm-none-linux-gnueabi/lib/libpthread.so: undefined reference to `__default_rt_sa_restorer_v2@GLIBC_PRIVAT
E'
/opt/lib/gcc/arm-none-linux-gnueabi/4.2.3/../../../../arm-none-linux-gnueabi/lib/libpthread.so: undefined reference to `__default_rt_sa_restorer_v1@GLIBC_PRIVAT
E'
/opt/lib/gcc/arm-none-linux-gnueabi/4.2.3/../../../../arm-none-linux-gnueabi/lib/libpthread.so: undefined reference to `__default_sa_restorer_v1@GLIBC_PRIVATE'
collect2: ld returned 1 exit status
make[1]: *** [nagios] Error 1
make[1]: Leaving directory `/volume1/downloads/nagios/base'
make: *** [all] Error 2

After a bit of searching on the Internet, I found a page that suggested making the following changes (note that I unsuccessfully tried a couple of other steps that may have also partially corrected the issue):

mkdir /opt/arm-none-linux-gnueabi/lib_disabled
mv /opt/arm-none-linux-gnueabi/lib/libpthread* /opt/arm-none-linux-gnueabi/lib_disabled

cp /lib/libpthread.so.0 /opt/arm-none-linux-gnueabi/lib/
cd /opt/arm-none-linux-gnueabi/lib/
ln -s libpthread.so.0 libpthread.so
ln -s libpthread.so.0 libpthread-2.5.so

After making the above changes, I was able to run the configure and make all commands without receiving an error.

cd /volume1/downloads
wget http://prdownloads.sourceforge.net/sourceforge/nagios/nagios-3.5.0.tar.gz
tar xzf nagios-3.5.0.tar.gz
cd nagios
./configure --prefix=/opt --with-command-group=nagios
make all
make install
make install-init
make install-config
make install-commandmode

We apparently need to copy a couple of files to different locations at this point:

cp /opt/lib/libltdl.so.3 /opt/local/lib/libltdl.so.3
cp /opt/lib/libltdl.so.3 /usr/lib/libltdl.so.3
cp /opt/lib/libltdl.so /usr/lib/

Undo the changes that were earlier made to the /root/.profile file, where # characters were added in front of any line that contained the word PATH.  Remove those # characters and save the file:

vi /root/.profile

(This part still needs some fine tuning to make the web interface work with Nagios.)  Edit the Nagios Makefile and change the line beginning with HTTPD_CONF to show HTTPD_CONF=/opt/etc/apache2/conf.d  Then save the file.

cd /volume1/downloads/nagios
vi Makefile

InstallNagiosDiskStation10

Execute the following command:

make install-webconf

Create a nagiosadmin user for the web administration, specify a password when prompted:

htpasswd -c /usr/local/etc/htpasswd.users nagiosadmin

Update November 19, 2013:

GabrielM reported in a comment below that it may be necessary to specify the full path to the htpasswd program:

/usr/syno/apache/bin/htpasswd -c /usr/local/etc/htpasswd.users nagiosadmin

Install a couple of additional ipkg packages that will be used by Nagios (the last package adds a ping utility that may be used by Nagios – the security permissions on the DiskStation prevent non-root users from using the built-in ping utility):

ipkg install openssl
ipkg install openssl-dev
ipkg install sendmail
ipkg install inetutils

A step that may or may not be required is to download a functioning C++ compiler (some of the commands below point to files provided with the C++ compiler) – it appears that there should already be a compiler on the DiskStation at this point (in /opt/bin), so the successful completion of this task of downloading a usable C++ compiler might not be required.

DiskStation DS212+ Notes:

These wget and tar steps were completely skipped on the DS212+

For the DiskStation DS1813+ 64 bit:

cd /volume1/downloads
wget http://sourceforge.net/projects/dsgpl/files/DSM%204.1%20Tool%20Chains/Intel%20x86%20Linux%203.2.11%20%28Cedarview%29/gcc420_glibc236_x64_cedarview-GPL.tgz
tar zxpf gcc420_glibc236_x64_cedarview-GPL.tgz -C /usr/local/

For the DiskStation DS412+ 32 bit:

cd /volume1/downloads
wget http://sourceforge.net/projects/dsgpl/files/DSM%204.2%20Tool%20Chains/Intel%20x86%20Linux%203.2.11%20%28Bromolow%29/gcc421_glibc236_x86_bromolow-GPL.tgz
tar zxpf gcc421_glibc236_x86_bromolow-GPL.tgz -C /usr/local/

Now the net-snmp source code is downloaded and extracted:

DiskStation DS212+ Notes:

The ./configure call on the DS212 (might also work on the other DiskStation models):

./configure –prefix=/opt

The make call threw several errors, including:

/bin/sh: arm-none-linux-gnueabi-ld: not found
make[2]: *** [../blib/arch/auto/NetSNMP/default_store/default_store.so] Error 127

Before running the make command on the DS212+, execute the following command:

ln -s /opt/bin/ld /opt/bin/arm-none-linux-gnueabi-ld

The -i parameter may be omitted when running the make and make install commands.

cd /volume1/downloads
wget http://sourceforge.net/projects/net-snmp/files/net-snmp/5.7.2/net-snmp-5.7.2.tar.gz
tar xzf net-snmp-5.7.2.tar.gz
cd net-snmp-5.7.2

For the DiskStation DS1813+ 64 bit, execute the following to compile the net-snmp source (note that this command uses the compiler that was downloaded):

env CC=/usr/local/x86_64-linux-gnu/bin/x86_64-linux-gnu-gcc \
LD=/usr/local/x86_64-linux-gnu/bin/x86_64-linux-gnu-ld \
RANLIB=/usr/local/x86_64-linux-gnu/bin/x86_64-linux-gnu-ranlib \
CFLAGS="-I/usr/local/x86_64-linux-gnu/include" \
LDFLAGS="-L/usr/local/x86_64-linux-gnu/lib" \
./configure --host=x86_64-linux-gnu --target=x86_64-linux-gnu --build=x86_64-pc-linux --prefix=/usr/local

For the DiskStation DS412+ 32 bit, execute the following to compile the net-snmp source (note: I could not use any of the different compilers that I tried downloading due to the compilers crashing with one of two error messages, so this command uses the compiler in /opt/bin):

env CC=/opt/bin/i686-linux-gnu-gcc \
LD=/usr/local/i686-linux-gnu/bin/i686-linux-gnu-ld \
RANLIB=/usr/local/i686-linux-gnu/bin/i686-linux-gnu-ranlib \
CFLAGS="-I/usr/local/i686-linux-gnu/include" \
LDFLAGS="-L/usr/local/i686-linux-gnu/lib" \
./configure --host=i686-linux-gnu --target=i686-linux-gnu --build=i686-linux-gnu --prefix=/usr/local

Several prompts will appear on the screen when either of the two commands is executed.  I entered the following for the prompts:

Default version of SNMP to use (3): 3
System Contact Information: (Enter)
System Location (Unknown): (Enter)
Location to write logfile (/var/log/snmpd.log): /opt/var/snmpd.log
Location to write persistent information (/var/net-snmp): (Enter)

Two additional commands to execute:

make -i
make install -i

Now we need to download the source code for the Nagios plugins (check_apt, check_breeze, check_by_ssh, check_clamd, check_cluster, check_dhcp, check_disk, check_disk_smb, check_dns, check_dummy, check_file_age, check_flexlm, check_ftp, check_http, check_icmp, check_ide_smart, check_ifoperstatup, check_ifstatus, check_imap, check_ircd, check_jabber, check_ldap, check_ldaps, check_load, check_log, check_mailq, check_mrtg, check_mrtgtraf, check_mysql, check_mysql_query, check_nagios, check_nntp, check_nntps, check_nt, check_ntp, check_ntp_peer, check_ntp_time, check_nwstat, check_oracle, check_overcr, check_ping, check_pop, check_procs, check_real, check_rpc, check_sensors, check_simap, check_smtp, check_snmp, check_spop, check_ssh, check_ssmtp, check_swap, check_tcp, check_time, check_udp, check_ups, check_users, check_wave) that allow Nagios to perform various monitoring tasks:

cd /volume1/downloads
wget http://prdownloads.sourceforge.net/sourceforge/nagiosplug/nagios-plugins-1.4.16.tar.gz
tar xzf nagios-plugins-1.4.16.tar.gz
cd nagios-plugins-1.4.16/

Update November 19, 2013:

GabrielM reported in a comment below that the occasionally changing “current version” of the Nagios plugins makes it difficult to download the plugins from the source shown above.  If you open the http://prdownloads.sourceforge.net/sourceforge/nagiosplug/ web page in a web browser, the web browser will be redirected to http://sourceforge.net/projects/nagiosplug/files/ which contains the following statement:

“The Nagios Plugins are no longer distributed via SourceForge. For downloads and other information, please visit: https://www.nagios-plugins.org/
Source: README.md, updated 2013-10-01″

If you follow that link and then click the Download heading at the top of the page, there should be a link on the page that allows access to the current version of the Nagios plugins.  That link is currently: https://www.nagios-plugins.org/download/nagios-plugins-1.5.tar.gz

The command that GabrielM provided should work:

wget https://www.nagios-plugins.org/download/nagios-plugins-1.5.tar.gz

DiskStation DS212+ Notes:

The following configure, make, and make install commands were used:

./configure --prefix=/opt --with-openssl=/usr/syno/bin/openssl --with-nagios-user=nagios --with-nagios-group=nagios --with-ping-command="/opt/bin/ping -c %d %s" --psdir=/bin --with-ps-varlist="&procpid,&procppid,&procvsz,&procrss,procprog,&pos" --with-ps-cols=6 --with-ps-format="%d %d %d %d %s %n" --with-ps-command="/bin/ps -w"
make
make install

For the DiskStation DS1813+ 64 bit:

./configure --with-openssl=/usr/syno/bin/openssl --with-nagios-user=nagios --with-nagios-group=nagios --with-ping-command="/opt/bin/ping -c %d %s" --psdir=/bin --with-ps-varlist="&procpid,&procppid,&procvsz,&procrss,procprog,&pos" --with-ps-cols=6 --with-ps-format="%d %d %d %d %s %n" --with-ps-command="/bin/ps -w" --host=x86_64-linux-gnu --target=x86_64-linux-gnu --build=x86_64-pc-linux
make -i 
make install -i

For the DiskStation DS412+ 32 bit:

./configure --with-openssl=/usr/syno/bin/openssl --with-nagios-user=nagios --with-nagios-group=nagios --with-ping-command="/opt/bin/ping -c %d %s" --psdir=/bin --with-ps-varlist="&procpid,&procppid,&procvsz,&procrss,procprog,&pos" --with-ps-cols=6 --with-ps-format="%d %d %d %d %s %n" --with-ps-command="/bin/ps -w" --host=i686-linux-gnu --target=i686-linux-gnu --build=i686-linux-gnu --prefix=/usr/local
make -i 
make install -i

Copy the Nagios plugins to the location expected by Nagios:

DiskStation DS212+ Notes:

The plugins were installed in the correct location on the DS212+

cp /usr/local/nagios/libexec/*.* /opt/libexec
cp /usr/local/nagios/libexec/* /opt/libexec
cp /usr/local/libexec/check_* /opt/libexec

Update November 19, 2013:

GabrielM reported in a comment below that the third command above may fail.  Depending on the compile options used, the first two commands or the third command may fail.  The first two commands are intended to accomplish the same task as the third command; the first two commands or the last command are expected to fail, but all three commands should not fail.  I should have explained this potential area of concern better.

Copy the Nagios startup script to the correct location so that Nagios will automatically start when the DiskStation is rebooted:

cp /usr/local/etc/rc.d/nagios /opt/etc/init.d/S81nagios

Verify that the ownership of the nagios directory is set correctly:

DiskStation DS212+ Notes:

The file is actually in the /opt/bin directory, so use this command instead:

chown nagios:nagios /opt/bin/nagios/nagios -R
chown nagios:nagios /usr/local/nagios -R

In addition to the main /opt/etc/nagios.cfg Nagios file, there are several other configuration files that are potentially used by Nagios (defined in the nagios.cfg file):

/opt/etc/objects/commands.cfg
/opt/etc/objects/contacts.cfg
/opt/etc/objects/timeperiods.cfg
/opt/etc/objects/templates.cfg
/opt/etc/objects/localhost.cfg
/opt/etc/objects/windows.cfg
/opt/etc/objects/server.cfg
/opt/etc/objects/switch.cfg
/opt/etc/objects/printer.cfg

We need to make a couple of adjustments in the  /opt/etc/objects/commands.cfg file.

vi /opt/etc/objects/commands.cfg

Change the ‘notify-host-by-email’ command definition section as follows:

define command{
    command_name notify-host-by-email
    command_line /usr/bin/printf "%b" "Subject: $NOTIFICATIONTYPE$ Host Alert: $HOSTNAME$ is $HOSTSTATE$\n\n***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\nHost: $HOSTNAME$\nState: $HOSTSTATE$\nAddress: $HOSTADDRESS$\nInfo: $HOSTOUTPUT$\n\nDate/Time: $LONGDATETIME$\n" | /opt/sbin/sendmail -vt $CONTACTEMAIL$
    }

Change the ‘notify-service-by-email’ command definition section as follows:

define command{
    command_name notify-service-by-email
    command_line /usr/bin/printf "%b" "Subject: $NOTIFICATIONTYPE$ Service Alert: $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$\n\n***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$\n" | /opt/sbin/sendmail -vt $CONTACTEMAIL$
    }

Change the ‘check_ping’ command definition section as follows (feel free to read the documentation for check_ping and specify different values):

define command{
        command_name    check_ping
        command_line    $USER1$/check_ping -H $HOSTADDRESS$ -w 3000,25% -c 5000,90% -p 3 
        }

Save the file and exit vi.

At this point, the Nagios network monitoring utility will likely experience an error similar to the following when attempting to send an alert email:

output=collect: Cannot write ./dfr6BFFPC7027203 (bfcommit, uid=1026, gid=25): Permission denied

Execute the following commands, which should fix the above problem:

chmod g+w /opt/var/spool/clientmqueue
chmod 444 /opt/etc/mail/*.cf
chmod 7555 /opt/sbin/sendmail

We will need to use su to test the execution of various commands as the nagios user.  Without this fix (described here), you might see the following error message:

su: warning: cannot change directory to /var/services/homes/nagios: No such file or directory su: /sbin/nologin: No such file or directory

Enter the following commands:

mkdir /var/services/homes
mkdir /var/services/homes/nagios
chown nagios:nagios /var/services/homes/nagios -R
vi /etc/passwd

Locate the line in the passwd file for the Nagios user.  Near the end of the line, /sbin/nologin should appear.  Replace that text with /bin/ash then save and exit vi.

Verify that the Nagios user is able to execute the check_ping plugin.  Replace MyDeviceHere with either an IP address or a network device name that is on your network:

su - nagios -c "/opt/libexec/check_ping -H MyDeviceHere -w 5000,80% -c 5000,80% -p 5"

If the ping command (called by check_ping) is not able to resolve a network device name, and the fully qualified dns name was not specified (MyDeviceHere.MyDomainHere.com), edit the /etc/resolv.conf file:

vi /etc/resolv.conf

On a new line in the file, add the following line (replacing MyDomainHere.com with your dns domain name for the network):

search MyDomainHere.com

Verify that sendmail works for the Nagios user.  At the prompt that appears, type a short message, press the Enter key, type a period, then press the Enter key again – replace MyEmailAddressHere@MyDomainHere.com with your email address):

su - nagios -c "/opt/sbin/sendmail -vt MyEmailAddressHere@MyDomainHere.com"

—-

It is important to always verify the Nagios configuration before starting (or restarting after a configuration change) Nagios.  To verify the configuration type the following:

/opt/bin/nagios -v /opt/etc/nagios.cfg

To start up Nagios as a background task (daemon), execute the following:

/opt/bin/nagios -d /opt/etc/nagios.cfg

To stop Nagios that is executing as a background task, type:

ps

InstallNagiosDiskStation11

Then search though the list of processes for the first line that shows /opt/bin/nagios -d /opt/etc/nagios.cfg.  The number at the left of that line, 31152 in this case, is used to stop Nagios.  To stop Nagios, type the following (replace 31152 with the number shown on your screen):

kill 31152

Side note: I tried installing quite a few different C++ compilers that supposedly work with the Synology DSM (see here).  As such, I had to find a way to remove a directory, that directory’s subdirectories, and files.  The following command will completely remove the /usr/local/i686-linux-gnu directory, should the need arise:

rm -rf /usr/local/i686-linux-gnu

At this point, Nagios will hopefully run as a background task, and it should be able to ping and send email alerts.  However, if you were following the above directions, we have not yet instructed Nagios which devices to monitor, and to whom the alert emails should be sent.  The next step is to define the email contacts by modifying the /opt/etc/objects/contacts.cfg file (see the documentation for assistance):

vi /opt/etc/objects/contacts.cfg

After setting up the contacts, we should probably tell Nagios which devices to monitor.  If there are a lot of devices on your network to be monitored, you might find that using Microsoft Excel rather than vi to create the object definitions makes the task more manageable.  Set up a simple worksheet with four columns.  Column A will be used to specify the short host_name for the object to be monitored.  Column B will be used to specify the alias (long description for the object).  Column C will be used to either specify the IP address for the device or the network name for the device.  Column D will be used to identify the group to which the object belongs and the file name to which the definition is saved (the Excel macro supports the following groups: ap, camera, computer, external, other, printer, server, switch).

InstallNagiosDiskStation13

The Excel macro is set up to read a tab delimited file, rather than reading the object description directly from the Excel worksheet.  Highlight all of the rows in the worksheet except for the top header row, and press Ctrl C (or edit – Copy) to copy the definitions to the Windows clipboard in tab delimited format.  Start Notepad (Start – Run – Notepad), and then press Ctrl V (or edit – Paste) to paste the tab delimited object descriptions into Notepad.  The Excel macro code expects the text file to be saved as nagioshosts.txt.

The Excel macro code follows (I image that not many computers still have a second floppy drive installed, so change the B:\Hardware Documentation\Synology\ path as appropriate for your environment):

Private Sub cmdProcessText_Click()
    Dim intFileNumRead As Integer
    Dim intFileNumAP As Integer
    Dim intFileNumCamera As Integer
    Dim intFileNumComputer As Integer
    Dim intFileNumExternal As Integer
    Dim intFileNumOther As Integer
    Dim intFileNumPrinter As Integer
    Dim intFileNumServer As Integer
    Dim intFileNumSwitch As Integer
    Dim intFileNumWrite As Integer

    Dim strLine As String
    Dim strItem() As String

    intFileNumRead = FreeFile
    Open "B:\Hardware Documentation\Synology\nagioshosts.txt" For Input As #intFileNumRead

    intFileNumAP = FreeFile
    Open "B:\Hardware Documentation\Synology\ap.cfg" For Output As intFileNumAP
    Print #intFileNumAP, "###############################################################################"; Chr(10);
    Print #intFileNumAP, "# ap.cfg - lists the wireless access points to be monitored"; Chr(10);
    Print #intFileNumAP, "#"; Chr(10);
    Print #intFileNumAP, "# Last Modified: "; Now; Chr(10);
    Print #intFileNumAP, "###############################################################################"; Chr(10);
    Print #intFileNumAP, "#"; Chr(10); Chr(10);
    Print #intFileNumAP, "###############################################################################"; Chr(10);
    Print #intFileNumAP, "#"; Chr(10);
    Print #intFileNumAP, "# HOST GROUP DEFINITIONS"; Chr(10);
    Print #intFileNumAP, "#"; Chr(10);
    Print #intFileNumAP, "###############################################################################"; Chr(10);
    Print #intFileNumAP, "#"; Chr(10); Chr(10);
    Print #intFileNumAP, "define hostgroup{"; Chr(10);
    Print #intFileNumAP, "        hostgroup_name  ap                      ; The name of the hostgroup"; Chr(10);
    Print #intFileNumAP, "        alias           Local Access Points       ; Long name of the group"; Chr(10);
    Print #intFileNumAP, "        }"; Chr(10); Chr(10); Chr(10);

    intFileNumCamera = FreeFile
    Open "B:\Hardware Documentation\Synology\camera.cfg" For Output As intFileNumCamera
    Print #intFileNumCamera, "###############################################################################"; Chr(10);
    Print #intFileNumCamera, "# camera.cfg - lists the IP cameras to be monitored"; Chr(10);
    Print #intFileNumCamera, "#"; Chr(10);
    Print #intFileNumCamera, "# Last Modified: "; Now; Chr(10);
    Print #intFileNumCamera, "###############################################################################"; Chr(10);
    Print #intFileNumCamera, "#"; Chr(10); Chr(10);
    Print #intFileNumCamera, "###############################################################################"; Chr(10);
    Print #intFileNumCamera, "#"; Chr(10);
    Print #intFileNumCamera, "# HOST GROUP DEFINITIONS"; Chr(10);
    Print #intFileNumCamera, "#"; Chr(10);
    Print #intFileNumCamera, "###############################################################################"; Chr(10);
    Print #intFileNumCamera, "#"; Chr(10); Chr(10);
    Print #intFileNumCamera, "define hostgroup{"; Chr(10);
    Print #intFileNumCamera, "        hostgroup_name  camera                  ; The name of the hostgroup"; Chr(10);
    Print #intFileNumCamera, "        alias           Local IP Cameras          ; Long name of the group"; Chr(10);
    Print #intFileNumCamera, "        }"; Chr(10); Chr(10); Chr(10);

    intFileNumComputer = FreeFile
    Open "B:\Hardware Documentation\Synology\computer.cfg" For Output As intFileNumComputer
    Print #intFileNumComputer, "###############################################################################"; Chr(10);
    Print #intFileNumComputer, "# computer.cfg - lists the shop floor computers to be monitored"; Chr(10);
    Print #intFileNumComputer, "#"; Chr(10);
    Print #intFileNumComputer, "# Last Modified: "; Now; Chr(10);
    Print #intFileNumComputer, "###############################################################################"; Chr(10);
    Print #intFileNumComputer, "#"; Chr(10); Chr(10);
    Print #intFileNumComputer, "###############################################################################"; Chr(10);
    Print #intFileNumComputer, "#"; Chr(10);
    Print #intFileNumComputer, "# HOST GROUP DEFINITIONS"; Chr(10);
    Print #intFileNumComputer, "#"; Chr(10);
    Print #intFileNumComputer, "###############################################################################"; Chr(10);
    Print #intFileNumComputer, "#"; Chr(10); Chr(10);
    Print #intFileNumComputer, "define hostgroup{"; Chr(10);
    Print #intFileNumComputer, "        hostgroup_name  computer               ; The name of the hostgroup"; Chr(10);
    Print #intFileNumComputer, "        alias           Domain Computers          ; Long name of the group"; Chr(10);
    Print #intFileNumComputer, "        }"; Chr(10); Chr(10); Chr(10);

    intFileNumExternal = FreeFile
    Open "B:\Hardware Documentation\Synology\external.cfg" For Output As intFileNumExternal
    Print #intFileNumExternal, "###############################################################################"; Chr(10);
    Print #intFileNumExternal, "# external.cfg - lists the devices external to the LAN network to be monitored"; Chr(10);
    Print #intFileNumExternal, "#"; Chr(10);
    Print #intFileNumExternal, "# Last Modified: "; Now; Chr(10);
    Print #intFileNumExternal, "###############################################################################"; Chr(10);
    Print #intFileNumExternal, "#"; Chr(10); Chr(10);
    Print #intFileNumExternal, "###############################################################################"; Chr(10);
    Print #intFileNumExternal, "#"; Chr(10);
    Print #intFileNumExternal, "# HOST GROUP DEFINITIONS"; Chr(10);
    Print #intFileNumExternal, "#"; Chr(10);
    Print #intFileNumExternal, "###############################################################################"; Chr(10);
    Print #intFileNumExternal, "#"; Chr(10); Chr(10);
    Print #intFileNumExternal, "define hostgroup{"; Chr(10);
    Print #intFileNumExternal, "        hostgroup_name  external               ; The name of the hostgroup"; Chr(10);
    Print #intFileNumExternal, "        alias           Monitored devices External to the Network ; Long name of the group"; Chr(10);
    Print #intFileNumExternal, "        }"; Chr(10); Chr(10); Chr(10);

    intFileNumOther = FreeFile
    Open "B:\Hardware Documentation\Synology\other.cfg" For Output As intFileNumOther
    Print #intFileNumOther, "###############################################################################"; Chr(10);
    Print #intFileNumOther, "# other.cfg - lists the miscellaneous devices to be monitored"; Chr(10);
    Print #intFileNumOther, "#"; Chr(10);
    Print #intFileNumOther, "# Last Modified: "; Now; Chr(10);
    Print #intFileNumOther, "###############################################################################"; Chr(10);
    Print #intFileNumOther, "#"; Chr(10); Chr(10);
    Print #intFileNumOther, "###############################################################################"; Chr(10);
    Print #intFileNumOther, "#"; Chr(10);
    Print #intFileNumOther, "# HOST GROUP DEFINITIONS"; Chr(10);
    Print #intFileNumOther, "#"; Chr(10);
    Print #intFileNumOther, "###############################################################################"; Chr(10);
    Print #intFileNumOther, "#"; Chr(10); Chr(10);
    Print #intFileNumOther, "define hostgroup{"; Chr(10);
    Print #intFileNumOther, "        hostgroup_name  other                 ; The name of the hostgroup"; Chr(10);
    Print #intFileNumOther, "        alias           Miscellaneous Devices ; Long name of the group"; Chr(10);
    Print #intFileNumOther, "        }"; Chr(10); Chr(10); Chr(10);

    intFileNumPrinter = FreeFile
    Open "B:\Hardware Documentation\Synology\printer.cfg" For Output As intFileNumPrinter
    Print #intFileNumPrinter, "###############################################################################"; Chr(10);
    Print #intFileNumPrinter, "# printer.cfg - lists the printer devices to be monitored"; Chr(10);
    Print #intFileNumPrinter, "#"; Chr(10);
    Print #intFileNumPrinter, "# Last Modified: "; Now; Chr(10);
    Print #intFileNumPrinter, "###############################################################################"; Chr(10);
    Print #intFileNumPrinter, "#"; Chr(10); Chr(10);
    Print #intFileNumPrinter, "###############################################################################"; Chr(10);
    Print #intFileNumPrinter, "#"; Chr(10);
    Print #intFileNumPrinter, "# HOST GROUP DEFINITIONS"; Chr(10);
    Print #intFileNumPrinter, "#"; Chr(10);
    Print #intFileNumPrinter, "###############################################################################"; Chr(10);
    Print #intFileNumPrinter, "#"; Chr(10); Chr(10);
    Print #intFileNumPrinter, "define hostgroup{"; Chr(10);
    Print #intFileNumPrinter, "        hostgroup_name  printer               ; The name of the hostgroup"; Chr(10);
    Print #intFileNumPrinter, "        alias           Printers and Copiers  ; Long name of the group"; Chr(10);
    Print #intFileNumPrinter, "        }"; Chr(10); Chr(10); Chr(10);

    intFileNumServer = FreeFile
    Open "B:\Hardware Documentation\Synology\server.cfg" For Output As intFileNumServer
    Print #intFileNumServer, "###############################################################################"; Chr(10);
    Print #intFileNumServer, "# server.cfg - lists the servers to be monitored"; Chr(10);
    Print #intFileNumServer, "#"; Chr(10);
    Print #intFileNumServer, "# Last Modified: "; Now; Chr(10);
    Print #intFileNumServer, "###############################################################################"; Chr(10);
    Print #intFileNumServer, "#"; Chr(10); Chr(10);
    Print #intFileNumServer, "###############################################################################"; Chr(10);
    Print #intFileNumServer, "#"; Chr(10);
    Print #intFileNumServer, "# HOST GROUP DEFINITIONS"; Chr(10);
    Print #intFileNumServer, "#"; Chr(10);
    Print #intFileNumServer, "###############################################################################"; Chr(10);
    Print #intFileNumServer, "#"; Chr(10); Chr(10);
    Print #intFileNumServer, "define hostgroup{"; Chr(10);
    Print #intFileNumServer, "        hostgroup_name  server               ; The name of the hostgroup"; Chr(10);
    Print #intFileNumServer, "        alias           Server and Similar Devices ; Long name of the group"; Chr(10);
    Print #intFileNumServer, "        }"; Chr(10); Chr(10); Chr(10);

    intFileNumSwitch = FreeFile
    Open "B:\Hardware Documentation\Synology\switch.cfg" For Output As intFileNumSwitch
    Print #intFileNumSwitch, "###############################################################################"; Chr(10);
    Print #intFileNumSwitch, "# switch.cfg - lists the network equipment type devices to be monitored"; Chr(10);
    Print #intFileNumSwitch, "#"; Chr(10);
    Print #intFileNumSwitch, "# Last Modified: "; Now; Chr(10);
    Print #intFileNumSwitch, "###############################################################################"; Chr(10);
    Print #intFileNumSwitch, "#"; Chr(10); Chr(10);
    Print #intFileNumSwitch, "###############################################################################"; Chr(10);
    Print #intFileNumSwitch, "#"; Chr(10);
    Print #intFileNumSwitch, "# HOST GROUP DEFINITIONS"; Chr(10);
    Print #intFileNumSwitch, "#"; Chr(10);
    Print #intFileNumSwitch, "###############################################################################"; Chr(10);
    Print #intFileNumSwitch, "#"; Chr(10); Chr(10);
    Print #intFileNumSwitch, "define hostgroup{"; Chr(10);
    Print #intFileNumSwitch, "        hostgroup_name  switch               ; The name of the hostgroup"; Chr(10);
    Print #intFileNumSwitch, "        alias           Switche and Similar Devices ; Long name of the group"; Chr(10);
    Print #intFileNumSwitch, "        }"; Chr(10); Chr(10); Chr(10);

    Do While Not (EOF(intFileNumRead))
        Line Input #intFileNumRead, strLine
        strItem = Split(strLine, vbTab)
        'strItem(0) = host_name
        'strItem(1) = alias
        'strItem(2) = address
        'strItem(3) = hostgroups
        Select Case strItem(3)
            Case "ap"
                intFileNumWrite = intFileNumAP
            Case "camera"
                intFileNumWrite = intFileNumCamera
            Case "computer"
                intFileNumWrite = intFileNumComputer
            Case "external"
                intFileNumWrite = intFileNumExternal
            Case "other"
                intFileNumWrite = intFileNumOther
            Case "printer"
                intFileNumWrite = intFileNumPrinter
            Case "server"
                intFileNumWrite = intFileNumServer
            Case "switch"
                intFileNumWrite = intFileNumSwitch
        End Select

        Print #intFileNumWrite, "define host{"; Chr(10);
        Select Case strItem(3)
            Case "ap"
                Print #intFileNumWrite, "        use             ap              ; Inherit default values from a template"; Chr(10);
            Case "camera"
                Print #intFileNumWrite, "        use             camera          ; Inherit default values from a template"; Chr(10);
            Case "computer"
                Print #intFileNumWrite, "        use             computer        ; Inherit default values from a template"; Chr(10);
            Case "external"
                Print #intFileNumWrite, "        use             external        ; Inherit default values from a template"; Chr(10);
            Case "other"
                Print #intFileNumWrite, "        use             other           ; Inherit default values from a template"; Chr(10);
            Case "printer"
                Print #intFileNumWrite, "        use             printer         ; Inherit default values from a template"; Chr(10);
            Case "server"
                Print #intFileNumWrite, "        use             server          ; Inherit default values from a template"; Chr(10);
            Case "switch"
                Print #intFileNumWrite, "        use             switch          ; Inherit default values from a template"; Chr(10);
        End Select
        Print #intFileNumWrite, "        host_name       "; strItem(0); "         ; The name we're giving to this device"; Chr(10);
        Print #intFileNumWrite, "        alias           "; strItem(1); "         ; A longer name associated with the device"; Chr(10);
        Print #intFileNumWrite, "        address         "; strItem(2); "         ; IP address of the device"; Chr(10);
        Print #intFileNumWrite, "        hostgroups      "; strItem(3); "         ; Host groups this device is associated with"; Chr(10);
        Print #intFileNumWrite, "        }"; Chr(10); Chr(10);

        Print #intFileNumWrite, "define service{"; Chr(10);
        Print #intFileNumWrite, "        use                     generic-service ; Inherit values from a template"; Chr(10);
        Print #intFileNumWrite, "        host_name               "; strItem(0); "        ; The name of the host the service is associated with"; Chr(10);
        Print #intFileNumWrite, "        service_description     PING            ; The service description"; Chr(10);
        Print #intFileNumWrite, "        check_command           check_ping!3000,25%!5000,90%    ; The command used to monitor the service"; Chr(10);
        Print #intFileNumWrite, "        normal_check_interval   5               ; Check the service every 5 minutes under normal conditions"; Chr(10);
        Print #intFileNumWrite, "        retry_check_interval    1               ; Re-check the service every minute until its final/hard state is determined"; Chr(10);
        Print #intFileNumWrite, "        }"; Chr(10); Chr(10);
    Loop

    Close #intFileNumRead
    Close #intFileNumAP
    Close #intFileNumCamera
    Close #intFileNumComputer
    Close #intFileNumExternal
    Close #intFileNumOther
    Close #intFileNumPrinter
    Close #intFileNumServer
    Close #intFileNumSwitch
End Sub

The files that are created use Unix/Linux standard line feed end of line marker characters, rather than the Windows standard carriage return/line feed combination characters.  As such, opening the generated files using Notepad is not advised.  Copy the generated files back to the /opt/etc/objects/ path on the DiskStation (copy the files to a Shared Folder on the DiskStation, then use the cp command to copy the files from the share location to /opt/etc/objects/ – the Shared Folders are typically created as a subdirectory in the /volume1/ directory).

If you decided to use some of the non-standard Nagios group names (as I did), those non-standard group names must be defined in the /opt/etc/objects/templates.cfg file:

vi /opt/etc/objects/templates.cfg

A portion of the additional entries that I made in this file include the following:

define host{
       name                    ap      ; The name of this host template
       use                     generic-host    ; Inherit default values from the generic-host temp
       check_period            24x7            ; By default, access points are monitored round t
       check_interval          5               ; Actively check the access point every 5 minutes
       retry_interval          1               ; Schedule host check retries at 1 minute intervals
       max_check_attempts      10              ; Check each access point 10 times (max)
       check_command           check_ping      ; Default command to check if access points are "alive"
       notification_period     24x7            ; Send notification out at any time - day or night
       notification_interval   30              ; Resend notifications every 30 minutes
       notification_options    d,r             ; Only send notifications for specific host states
       contact_groups          admins          ; Notifications get sent to the admins by default
       hostgroups              ap ; Host groups that access points should be a member of
       register                0               ; DONT REGISTER THIS - ITS JUST A TEMPLATE
       }

define host{
       name                    camera  ; The name of this host template
       use                     generic-host    ; Inherit default values from the generic-host temp
       check_period            24x7            ; By default, cameras are monitored round t
       check_interval          60              ; Actively check the device every 60 minutes
       retry_interval          1               ; Schedule host check retries at 1 minute intervals
       max_check_attempts      10              ; Check each device 10 times (max)
       check_command           check_ping      ; Default command to check if device are "alive"
       notification_period     24x7            ; Send notification out at any time - day or night
       notification_interval   240             ; Resend notifications every 240 minutes
       notification_options    d,r             ; Only send notifications for specific host states
       contact_groups          admins          ; Notifications get sent to the admins by default
       hostgroups              camera ; Host groups that cameras should be a member of
       register                0               ; DONT REGISTER THIS - ITS JUST A TEMPLATE
       }

Nagios will not know that it should read the additional configuration files until it is told to do so by modifying the /opt/etc/nagios.cfg file.

vi /opt/etc/nagios.cfg

Add the following lines to the nagios.cfg file:

# Charles Hooper's object types
cfg_file=/opt/etc/objects/ap.cfg
cfg_file=/opt/etc/objects/camera.cfg
cfg_file=/opt/etc/objects/computer.cfg
cfg_file=/opt/etc/objects/external.cfg
cfg_file=/opt/etc/objects/other.cfg
cfg_file=/opt/etc/objects/printer.cfg
cfg_file=/opt/etc/objects/server.cfg
cfg_file=/opt/etc/objects/switch.cfg

We have made a large number of changes to the configuration files, so it is important to verify that there are no errors in the configuration:

/opt/bin/nagios -v /opt/etc/nagios.cfg

If no errors are found in the configuration, terminate (kill) nagios and then restart as described above.

—-

Update July 28, 2013:

When attempting to start Nagios in daemon mode (/opt/bin/nagios -d /opt/etc/nagios.cfg) I encountered a couple of problems related to permissions for the Nagios user.  The nagios process was not listed when I used the ps command.  I then tried executing the following commands:

touch /opt/var/nagios.log
chown nagios:nagios /opt/var/nagios.log

Nagios was then able to start in daemon mode, but wrote messages similar to the following in the /opt/var/nagios.log file:

[1375058364] Warning: Could not open object cache file ‘/opt/var/objects.cache’ for writing!
[1375058364] Failed to obtain lock on file /opt/var/nagios.lock: Permission denied
[1375058364] Bailing out due to errors encountered while attempting to daemonize… (PID=11451)
[1375058656] Nagios 3.5.0 starting… (PID=12936)
[1375058656] Local time is Sun Jul 28 20:44:16 EDT 2013
[1375058656] LOG VERSION: 2.0
[1375058656] Warning: Could not open object cache file ‘/opt/var/objects.cache’ for writing!
[1375058656] Failed to obtain lock on file /opt/var/nagios.lock: Permission denied
[1375058656] Bailing out due to errors encountered while attempting to daemonize… (PID=12936)
[1375060107] Error: Unable to create temp file for writing status data: Permission denied
[1375060117] Error: Unable to create temp file for writing status data: Permission denied
[1375060127] Error: Unable to create temp file for writing status data: Permission denied
[1375060137] Error: Unable to create temp file for writing status data: Permission denied
[1375060147] Error: Unable to create temp file for writing status data: Permission denied
[1375060157] Error: Unable to create temp file for writing status data: Permission denied

I tried to set the permissions for a couple of other files, only to find another long list of Permission denied messages:

touch /opt/var/objects.cache
touch /opt/var/nagios.lock
touch /opt/var/nagios.tmp
chown nagios:nagios /opt/var/objects.cache
chown nagios:nagios /opt/var/nagios.lock
chown nagios:nagios /opt/var/nagios.tmp

I then recalled that I had seen similar messages on the DiskStation DS412+.  I then tried a different approach, creating a nagios directory in the /opt/var directory, creating a couple of subdirectories in that directory, and then assigning nagios as the owner of that directory structure:

mkdir /opt/var/nagios
mkdir /opt/var/nagios/archives
mkdir /opt/var/nagios/spool
mkdir /opt/var/nagios/spool/checkresults
chown nagios:nagios /opt/var/nagios -R
vi /opt/etc/nagios.cfg

In the nagios.cfg file, I made the following changes:

log_file=/opt/var/nagios/nagios.log
status_file=/opt/var/nagios/status.dat
lock_file=/opt/var/nagios/nagios.lock
temp_file=/opt/var/nagios/nagios.tmp
log_archive_path=/opt/var/nagios/archives
check_result_path=/opt/var/nagios/spool/checkresults
state_retention_file=/opt/var/nagios/retention.dat
debug_file=/opt/var/nagios/nagios.debug

After saving the file and exiting vi, I restarted Nagios in daemon mode.  Reading the last 100 lines of the Nagios log file is now accomplished with this command:

tail -n 100 /opt/var/nagios/nagios.log

—-

There are a lot of seemingly interesting Nagios plugins, including check_oracle (I believe that this plugin requires the Oracle client to be installed – good luck with that install).  On one of the DiskStations the check_snmp plugin did not compile, while on the other DiskStation the check_http plugin did not compile.

It might be interesting to see what solutions readers are able to develop from the above starting point.  The above information is the result of many hours of experimentation as well as a couple minutes reading through sections of the Nagios documentation (it reads like the Oracle Database documentation, so it should be an easy read once I am in the right mood) and hopelessly scanning the ‘net for information about obscure error messages.  Have fun, and try not to put the DiskStation out of service due to a mistaken file copy.

Update November 19, 2013:

Installing an updated version of the Synology DSM operating system may temporarily disable Nagios.  Make backups of all Nagios confirguration files (copying the files with the cp command to a directory in /volume1 is generally safe) before installing different versions of the Synology DSM operating system.

The DSM 4.3 operating system installation apparently removed the /var/services/homes directory.  That directory removal makes it impossible for the Nagios user to login to run various commands.  I assume that the removal of the homes directory is intentional, so a work around for that problem:

mkdir /var/services/home
mkdir /var/services/home/nagios
chown nagios:nagios /var/services/home/nagios -R
vi /etc/passwd

In the /etc/passwd file, change all /homes/ entries to /home/ then save and exit vi.

The installation of the different DSM version (including versions before 4.3) will likely also replace/remove the libltdl.* files located in /opt/local/lib and /usr/lib, so we need to copy those files back into the correct directories:

cp /opt/lib/libltdl.so.3 /opt/local/lib/libltdl.so.3
cp /opt/lib/libltdl.so.3 /usr/lib/libltdl.so.3
cp /opt/lib/libltdl.so /usr/lib/

Once the above items are copied, try executing the check_ping command as the nagios user (replace MyDeviceHere with either an IP address or the name of a device on your network).

su - nagios -c "/opt/libexec/check_ping -H MyDeviceHere -w 5000,80% -c 5000,80% -p 5"

If the DiskStation reports that the check_ping command was not found, then copy that file back to the /opt/libexec/ directory.  If the above command was successful, try verifying the Nagios configuration:

/opt/bin/nagios -v /opt/etc/nagios.cfg

If the verification was successful, start Nagios as a daemon:

/opt/bin/nagios -d /opt/etc/nagios.cfg

Execute the ps command and verify that the above command is listed in the running processes:

ps

Finally, verify that Nagios is still set to start automatically as a daemon:

ls /opt/etc/init.d/S81nagios

If a file is listed when the above command is executed, then Nagios should now be fully repaired.

-





Value of Improving One’s Knowledge

24 03 2013

March 24, 2013

As I type this blog article I am in the middle of reading a second book on the topic of Windows Server 2008 R2.  I bought the books several months ago, possibly even a year or two ago, and just had not found the time to invest in reading yet another dry IT-related 1,500 page book.  I would much prefer to read an exciting Oracle Database book, but those books seem to be in short supply (and I really do need to finish these two Windows Server 2008 R2 books so that I can move onto the Windows Server 2012 books, the Exchange Server 2013 books, and the Oracle Database 12c books that will be released eventually).

Three years ago I put together an article about finding the motivation to keep reading through various IT books, and offered a suggested reading list for people interested specifically in various aspects of Oracle Database (totally unrelated, but I recall that the word aspect was a pet peeve of one of my college English professors… he would say, “What is an aspect?” – I probably have used that word about five times since then).  I think that my list of suggested books is still a decent list.  However, Kyle Hailey recently put together a related article, “Where to begin with Oracle and SQL” that is excellent, and a bit more up to date than my list – the only book on Kyle’s list that I have not read is written by Dan Tow.

So, for people working in the information technology (IT) disciplines, how much effort do you put into improving your knowledge, or just maintaining that knowledge?  Do you actively try to break things, in the hope of discovering a solution?  Do you participate in discussion forums, such as the OTN forums or Usenet?  Do you read magazines, books, and various online articles?  Or, do you simply continue to perform the same actions that you have for the last five, 10, or 20 years (if the RULE worked just fine back then, why change)?

I try to read at least a couple of IT focused books a year.  For many of the Oracle Database specific books read since mid-2008 I have tried to write formal reviews of those books, posted to Amazon.com.  One of those reviews stretched to 35.5 typewritten pages!  Early yesterday morning a reader left the following comment attached to that review:

Mar 22, 2013 11:43:09 PM PDT SavvyShopper says:
Charles, you seem Very Intelligent(?) and should then Write a Book, fixing all these issues, than writing so much … :) How do you get so much time , if you are really “Working” ??

Oddly, the person deleted their comment a mere two hours after I posted a reply.  My first reaction to the comment was “must spending time expanding one’s knowledge be mutually exclusive to being employed?”  That was not the first time that someone using Amazon left a similar comment attached to one of my reviews:

Mar 31, 2011 12:37:49 PM PDT Wanda A. Cadogan says:
And don’t you ever wonder how guys like Mr. Hooper have so much time to write a 24 page review?

Is it really that hard to believe that guys like Mr. Hooper take notes (sometimes extensive notes) while reading technical publications?

I noticed that SavvyShopper had previously reviewed a couple of Oracle Database related books, as well as a couple of woodworking products.  Quite a while ago I toyed a bit with woodworking (a bit in woodworking…. cuts both ways), as a matter of fact I thought that I was headed to college to become a woodworking instructor.  So, I possibly have an odd connection with SavvyShopper, might this be an opportunity for a teachable moment?  I noticed too that in one of his reviews that SavvyShopper appeared to be struggling with the concept that a softwood can in fact be harder than a hardwood.  I am a little disappointed that SavvyShopper deleted his comment after all the effort that I put into formulating a reply:

SavvyShopper,

Thank you for leaving your comment.

Some of my book reviews, especially when there are many errors in the book being reviewed, tend to be very long.  I read computer related books to improve my knowledge, whether that means that I am simply reminded how something works, or I am learning something entirely new.   Investing effort (and occasionally an excessive amount of time) into improving my skill set is important to me, and I hope that it is important to other people in the IT related professions.  The Oracle Database book reviews that I post to Amazon are mostly a by-product of that self-improvement process.

I am still happily employed at the same place where I have worked for more than a decade as an IT professional.  Making time for expanding one’s computer skills through reading sometimes means sacrificing time spent with hobbies or other outside of work activities.  I believe that it is that sacrifice that is an important ingredient in the formula for long-term gainful employment.

Woodworking is a hobby of mine, and judging by your reviews on Amazon, woodworking may also be one of your hobbies.  I see that you gave a thin kerf Forrest Woodworker II saw blade a 2 out of 5 rating (coincidentally the same rating that I gave to this book), primarily because you found that the blade did not impress you when cutting pine and redwood while building one project.  Pine, much more than redwood, leaves deposits of pitch on the face and sides of saw blade teeth (kerosene (or Simple Green) and an old toothbrush will remove the pitch deposits, but I think that there are commercial spray on products on the market now that serve the same purpose).  This buildup of pitch effectively reduces the saw blade’s ability to cut smoothly.  The thin kerf body of the saw blade that you reviewed is less resistant to flexing in a heavy cut, so combined with the pitch build up on the blade, you probably did experience poor quality cuts.  If you mounted the saw blade on an inexpensive saw with a weak/bad bearing setup, that will also explain the poor quality cut that you received.  If you want a treat, mount a regular kerf Woodworker II on an older, properly tuned 3HP or 5HP Unisaw (or Powermatic Model 66) and witness the glass smooth cuts in (even three inch thick) hardwood such as poplar, oak, walnut, or hard maple.

Getting back on topic of your comment, I currently have very little extra time to put into writing.  While probably not your intention, I do not see a book that summarizes mistakes found in other books and the accompanying corrections as selling more than a couple of dozen copies; based on some Oracle book authors remarkable ability to abuse the DMCA (Digital Millennium Copyright Act), such a book could also become a legal nightmare for the publishing company.  I have been approached a couple of times to see if I had an interest in writing a book on the topic of Oracle Database.  At this time, I simply cannot make that commitment (writer’s block, lack of available free time, required reading to stay current in the computer-related field, wanting to spend more time working with non-computer related routers and bits, etc.).

A couple of years ago I contributed to an Oracle Database specific book with fellow OakTable Network members.  The technical standards of the members of that group are extremely high (do a Google search, if you are interested).  As such, I would estimate that I spent roughly eight hours per page writing, testing, proofreading, rewriting, proofreading, testing, rewriting, and proofreading again.  It was quite an experience, especially when Oracle Corporation announced the general availability of a new major version release a day or two before the first draft of the section of the book had to be submitted to the publisher – more testing, rewriting, and proofreading of the book section followed after that first draft was submitted.

Thank you again for leaving the comment.

So, do you go out of your way to learn something new every day, or are you more likely to “let it roll” for five, 10, or 20 years?  For the record, I have a rather large stack of unread woodworking magazines (most of the unread magazines range from 0 to 7 years old) sitting on the shelf that are waiting for the time when spending time expanding one’s knowledge is mutually exclusive to being employed (in the IT industry).

A new logo in development – resawn from 1 inch x 2 inch quarter sawn oak.

Bookmatch Arrangement #1:

OLYMPUS DIGITAL CAMERA

Bookmatch Arrangement #2:

OLYMPUS DIGITAL CAMERA

Bookmatch Arrangement #3:

OLYMPUS DIGITAL CAMERA

The New Logo:

OLYMPUS DIGITAL CAMERA





What does it Mean when a Select Statement in Oracle is using 100% CPU?

14 02 2013

February 14, 2013

A couple of days ago I noticed that an interesting set of search keywords were used to access this blog.  The search keywords:

What does it mean when a select statement in Oracle is using 100% cpu

I had two thoughts when I first saw that set of search keywords:

  • Well Done!
  • What an inconsiderate Oracle developer!

I feel a bit conflicted about the two first thought answers.  Think about the above bullet points for a moment.

I was interested enough in the keywords that were phrased in the form of a question, that I went though the search keywords used to access this blog for the last two weeks, and found the following related search keywords:

  • ora-01861 Oracle causing cpu 100% usage
  • Oracle one session 100% cpu
  • Oracle cpu utilisation 100%, is it a good or bad sign
  • Oracle database server cpu utilization is 100

Interesting, and possibly thought provoking.  I recalled an article that I wrote on this blog roughly three years ago.

Possible related thoughts of a temporarily rogue DBA:

  1. I fixed your problem without changing the execution plan… take a look at the EXPLAIN PLAN output for yourself.
  2. I fixed your problem without changing the execution plan… the server is more efficient now that we have consolidated three more database instances onto the server.
  3. I fixed your problem without changing the execution plan… that PGA_AGGREGATE_TARGET parameter was clear on the other side of 100 MB, so I set it to 10 MB.
  4. I fixed your problem without changing the execution plan… a group of the guys in the office found a way to load the game Doom on the server.
  5. I fixed your problem without changing the execution plan… I took half of the hard drives out of the array and moved the server to a 10 Mb hub.

Took a close look at the bulletpointed search terms.  If someone were to ask you one of the following questions, how would you respond if you were earnestly attempting to help the person?

  1. What does it mean when a Select statement in Oracle is using 100% CPU?
  2. The server admin reported that the Oracle database server cpu utilization hits 100% too frequently, what should I do about the problem?
  3. Oracle is throwing an ORA-01861 error, and that is causing 100% CPU usage.  What should I do?

Share your thoughts.

Summary of Suggestions/Causes Shared by Readers (Last Updated February 14, 2013):

  • Intended or unintended Cartesian product between row sources – does the query have a DISTINCT clause.  (Rodger)
  • Intended or unintended data type conversions – is the data model storing numbers or date values in VARCHAR2 columns.  (Jeremy Kendrick)




Connecting to an Oracle Database with Visual Basic 6.0 on Windows 8 64 Bit

25 11 2012

November 25, 2012 (Modified December 7, 2012)

Compatibility problems?  Visual Basic 6.0, released in 1998, is not officially compatible with Windows 8 Pro 64 bit… or Windows 7, or Windows Vista.  But I still like the language a lot for its simplicity, rapid development, and significant pre-existing code base within my company.  Of course, Oracle Database 11.2.0.3 and the Oracle Client 11.2.0.3 are not officially supported on Windows 8, so maybe the quest is an exercise in futility.

Roughly a month ago a thread appeared on the OTN forums asking how to connect Visual Basic 2010 to Oracle Database Personal Edition on Windows 7 Pro 64 bit running on the same computer.  Several people offered very good advice to guide the OP.  The OP eventually asked how to connect Visual Basic 6 to Oracle Database Personal Edition running on the same Windows 7 Pro 64 bit computer.  As I mentioned, Visual Basic 6 is not officially compatible with Windows 7, and the fact that it is a 32 bit application running on a 64 bit Windows computer means that the 32 bit Oracle Client must also be installed on the computer.

Visual Basic 6.0 (note that you should install service pack 6 for Visual Basic 6, even if the installer locks up at the very end) will run just fine on Windows 7 Pro 64 bit (and probably on Windows 8 Pro 64 bit also), although drawing form objects is a little slow.  The automatic Windows updates may also pose an issue.  From time to time it may be necessary to re-register the 32 bit MSCOMCTL.OCX file to avoid error messages when opening Visual Basic projects.  To re-register that file, from a Windows command prompt (Run – cmd.exe) , enter the following:

cd \windows\syswow64
regsvr32 MSCOMCTL.OCX

Using Visual Basic 6.0, or any development environment for that matter, requires that certain components be selected for installation when installing the Oracle Client.  At a minimum, Oracle ODBC Driver 11.2.0.x.0 (for ODBC type connections), and Oracle Provider for OLE DB 11.2.0.x.0 (for OLE DB type connections) must be installed with the Oracle Client.

Let’s set up a simple Visual Basic project to demonstrate connecting to Oracle Database 11.2.0.3 running on the same Windows 8 computer.  First, when installing Visual Basic 6, you may want to install and register a couple of additional uncommonly used controls.  Those controls are located in the \COMMON\TOOLS\VB\CONTROLS folder on the Visual Basic install CD.  On a 64 bit computer, the controls should be copied to the C:\Windows\Syswow64 folder and then registered in that location using regsvr32 as demonstrated above with the MSCOMCTL.OCX file.  In some cases, a license file must also be imported into the Windows registry – those files have a .REG extension in the same location on the CD:

On to building the project.  First, we need to add a reference to allow the project to use ADO and the Oracle Provider for OLE DB that was installed with the Oracle Client.  From the Project menu, select References…:

Next, we need to add the most recent version of the Microsoft ActiveX Data Objects Library, version 6.1 for Windows 7 and Windows 8, version 6.0 for Vista, or version 2.8 for Windows XP.  After selecting the correct version, click the OK button:

Now let’s add a couple of additional controls to the project.  I first started using the Microsoft Grid Control in Visual Basic 2.0, so for old time’s sake let’s add that control to the project.  We may also want to add a status bar to the project’s form, so let’s also add Microsoft Windows Common Controls 6.0 (SP6).  Then click the OK button:

Now, draw the form controls on the form.  We need three command buttons named cmdLateBinding, cmdEarlyBinding, and cmdQueryDatabase, a grid control named grdOutput, and optionally a status bar named stbStatus with its Style property set to Simple:

Let’s add the code to the Late Binding (cmdLateBinding) button (this is essentially the same code that I provided in the OTN thread, and does not require the addition of the Microsoft ActiveX Data Objects Library in the references for the project – so this code is a good simple test to make certain that everything is installed correctly):

Dim intResult
Dim strDatabase
Dim strUserName
Dim strPassword
Dim strSQL
Dim dbDatabase
Dim snpData

'On Error Resume Next

strDatabase = "MyDB" 'From tnsnames.ora
strUserName = "MyUserID"
strPassword = "MyPassword"

Set snpData = CreateObject("ADODB.Recordset")
Set dbDatabase = CreateObject("ADODB.Connection")

dbDatabase.ConnectionString = "Provider=OraOLEDB.Oracle;Data Source=" & strDatabase & ";User ID=" & strUserName & ";Password=" & strPassword & ";"
dbDatabase.Open

If (dbDatabase.State = 1) And (Err = 0) Then
    strSQL = "SELECT" & vbCrLf
    strSQL = strSQL & "  SYSDATE CURRENT_DATE" & vbCrLf
    strSQL = strSQL & "FROM" & vbCrLf
    strSQL = strSQL & "  DUAL"

    snpData.Open strSQL, dbDatabase

    If snpData.State = 1 Then
        If Not (snpData.EOF) Then
            Do While Not (snpData.EOF)
                MsgBox snpData("current_date")

                snpData.MoveNext
            Loop
        End If
        snpData.Close
    End If
Else
    intResult = MsgBox("Could not connect to the database.  Check your user name and password." & vbCrLf & Error(Err), 16, "Oracle Connection Demo")
End If

dbDatabase.Close
Set snpData = Nothing
Set dbDatabase = Nothing

Let’s run the project and click the Late Binding button to see what happens.  We should expect to see some sort of connection failure message, unless there is an entry in the tnsnames.ora for MyDB:

Well, that error message certainly was unexpected (the OP in the OTN thread may have encountered the same error message)!  Let’s click on the Debug button:

ADODB cannot find the OraOLEDB.Oracle provider.  I am sure that I installed that component when installing the 32 bit Oracle Client on the computer.  The Visual Basic code that was added to the Late Binding command button is very generic.  In fact, as written the code can be placed in a plain text file with a .VBS extension and executed as a VBScript file with either the wscript or cscript command.  Copy the code from the Late Binding command button into a plain text file (start the Notepad program and paste the code), then save the file as “LateBinding.vbs” (including the quotes).

Now to test the script that was just created.  Open a Windows command prompt (Windows key and R, type cmd, press the Enter key).  Change to the folder where the LateBinding.vbs file was saved, then type:

wscript LateBinding.vbs

Note that this time the computer indicated an ORA-12154: TNS:could not resolve the connect identifier error, rather than a Provider could not be found error – we were hoping to obtain the same ORA-12154 error as we did when working in Visual Basic 6.  What changed?  Well, this is a 64 bit computer, so the 64 bit wscript program was used, which used the 64 bit Oracle client (actually the 64 bit Oracle Database home files).

Let’s try again, this time with the 32 bit version of the wscript program.  To execute the script with the 32 bit wscript, execute the following:
c:\windows\SysWOW64\wscript LateBinding.vbs

Notice that the error now appears as Provider could not be found, which is the same error message that was observed in Visual Basic 6.  We reproduced the problem!  Now how do we fix it?  I suppose that we should check with the Oracle Client installer to verify that the Oracle Provider for OLE DB 11.2.0.x.0 component was in fact installed.  Let’s re-run the Oracle installer to check the Inventory of the installed components.  Just open the Start menu, click the Oracle – OraClient11g_home1 folder, then expand Oracle Installation Products, then click Universal Installer:

Uh, sure.  Windows 8 and its tablet-like Start menu… good luck finding the Universal Installer for the Oracle Client. (Edit November 26, 2012: I was trying to be humorous here.  If you have a touch screen, you can swipe your finger up from the bottom of the screen to reveal the option to display all items that normally appear on the pre-Windows 8 Start menu, separated by folder/Oracle home.  If you do not have a touch screen, hold down the Windows key and press the W key while in this new Windows 8 start screen to display the same list of items separated by folder/Oracle home.)  I always change the base install folder to C:\Oracle when installing Oracle components, so to start up the Universal Installer for the Oracle Client, I can just run:
C:\oracle\product\11.2.0\client_1\oui\bin\setup.exe

Oracle Provider for OLE DB 11.2.0.3.0 is in the list of installed components (click the Installed Products button in the Oracle Installer to see the above list), so why doesn’t the Oracle Provider for OLE DB from the 32 bit client work as expected?  That component requires that the OraOLEDB11.DLL file is found in the expected location, in my case, here:
C:\Oracle\product\11.2.0\client_1\bin\OraOLEDB11.DLL

Yes, the file is in that location.  Is the Oracle Client 11.2.0.3 just not compatible with Windows 8 Pro?  The Oracle Client 11.2.0.3 seemed to work as expected on a computer that was upgraded from Windows 7 Pro to Windows 8 Pro, including the OLE DB functionality, so why will that functionality NOT work on a new computer with Windows 8 Pro installed?

While comparing the Windows registry entries on a Windows 7 Pro 64 bit computer with the Windows registry entries on the Windows 8 Pro computer where the OLE DB functionality did not work, I noticed a problem – missing entries in the Windows 8 Pro’s Windows registry.  From the Windows 7 Pro 64 bit computer, I exported the three missing Windows registry sections:

3F63C36E-51A3-11D2-BB7D-00C04FA30080.reg:

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3F63C36E-51A3-11D2-BB7D-00C04FA30080}]
@="OraOLEDB.Oracle"
"OLEDB_SERVICES"=dword:ffffffff

[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3F63C36E-51A3-11D2-BB7D-00C04FA30080}\ExtendedErrors]
@="Extended Error Lookup Service"

[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3F63C36E-51A3-11D2-BB7D-00C04FA30080}\ExtendedErrors\{3FC8E6E4-53FF-11D2-BB7D-00C04FA30080}]
@="Oracle OLE DB Error Lookup"

[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3F63C36E-51A3-11D2-BB7D-00C04FA30080}\InprocServer32]
@="C:\\Oracle\\product\\11.2.0\\client_1\\bin\\OraOLEDB11.DLL"
"ThreadingModel"="Both"

[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3F63C36E-51A3-11D2-BB7D-00C04FA30080}\OLE DB Provider]
@="Oracle Provider for OLE DB"

[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3F63C36E-51A3-11D2-BB7D-00C04FA30080}\ProgID]
@="OraOLEDB.Oracle.1"

[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3F63C36E-51A3-11D2-BB7D-00C04FA30080}\TypeLib]
@="{0BB9AFD1-51A1-11D2-BB7D-00C04FA30080}"

[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3F63C36E-51A3-11D2-BB7D-00C04FA30080}\VersionIndependentProgID]
@="OraOLEDB.Oracle"

3FC8E6E4-53FF-11D2-BB7D-00C04FA30080.reg:

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3FC8E6E4-53FF-11D2-BB7D-00C04FA30080}]
@="Oracle OLE DB Error Lookup"

[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3FC8E6E4-53FF-11D2-BB7D-00C04FA30080}\InprocServer32]
@="C:\\Oracle\\product\\11.2.0\\client_1\\bin\\OraOLEDB11.DLL"
"ThreadingModel"="Both"

[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3FC8E6E4-53FF-11D2-BB7D-00C04FA30080}\ProdID]
@="OraOLEDB.ErrorLookup.1"

[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3FC8E6E4-53FF-11D2-BB7D-00C04FA30080}\VersionIndependentProgID]
@="OraOLEDB.ErrorLookup.1"

0BB9AFD1-51A1-11D2-BB7D-00C04FA30080.reg (just the win32 section was missing):

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\TypeLib\{0BB9AFD1-51A1-11D2-BB7D-00C04FA30080}\1.0\win32]
@="C:\\Oracle\\product\\11.2.0\\client_1\\bin\\OraOLEDB11.DLL"

After verifying that the Oracle Client’s home on the Windows 8 Pro computer was in fact located in C:\Oracle\product\11.2.0\client_1 (it may not be on your computer, change the above registry files as necessary for your computer – use two \\ characters for each \ character that normally appears in the path to the Oracle Client’s home), I imported the registry entries into the Windows 8 Pro computer’s registry by double-clicking each of the files that were exported from the Windows 7 Pro computer.

The result after importing the registry entries and re-running the script with the 32 bit wscript:

Well, that is not good, another error message.  But wait, that was the same error message displayed when the 64 bit wscript was used to execute the script file.  Maybe we are making progress.  Let’s fix the LateBinding.vbs script file so that the strDatabase variable is set to a valid database name found in the 32 bit client’s tnsnames.ora file (I will use a database named sample, which has the Oracle sample schema loaded), the strUserName variable is set to a valid database username in the database (I will use the sh user), and the strPassword variable is set to the correct password for the username.

Let’s try executing the script again:

Oh, another error message.  ORA-28000: the account is locked.  I guess that some Oracle rules still apply.  Start up a SQL*Plus session in another Windows command prompt, connect to the database as the SYS AS SYSDBA user, and unlock the SH account (edit November 26, 2012: note that this account should probably be locked again once we finish experimenting):

ALTER USER SH ACCOUNT UNLOCK;

Let’s try executing the script again:

It Worked!  The 32 bit version of wscript on Windows 8 Pro 64 bit was able to connect to the 11.2.0.3 64 bit database running on the same computer and select the current date and time from the database.  But, does the Late Binding example code in Visual Basic 6 work?  Let’s check:

Nice!

[Headache Induced Rant] You may notice a nearly 3 hour time difference between the time output by wscript and the time output by Visual Basic 6.  That is not an error – I spent nearly three hours trying to put this blog article together in WordPress (total time putting together this article was close to 5 hours) on the Windows 8 computer AFTER successfully connecting to the database using the 32 bit version of wscript.  The Sony laptop designer who thought it would be a good idea to place the laptop’s left and right mouse buttons underneath the corners of the touch pad and then thought that it would be a good idea to enable zoom control when one finger is barely touching the location of the left mouse button while the other finger is attempting to scroll up and down the WordPress editor’s page should be taken out behind the building and slapped a couple of times.  I quite literally had to undo an unintended page zoom very close to 50 times while putting together this blog article.  The darn laptop has a touch screen… if I really want to zoom the screen, I will either manually select the zoom feature in the application or put two fingers on the screen and then separate those fingers.  (Edit November 26, 2012: Sony does a good job of burying the option to disable this touch pad feature – it is found in the same application that disables the touch pad’s tap to left-click “feature”.)

And which Microsoft developer thought that it would be a great idea to have a large clock pop up on screen in the lower left (taking away the typing entry focus from the application that was being used) on occasion when someone is trying to type on the computer?  That developer should also be taken out behind the building and slapped. [/Headache Induced Rant]

The above example code used early late (edit: July 24, 2014) binding, which defines variables as variants and then later assigns object types to the variables.  That technique, which now does work and is compatible with wscript, tends to result in poor performance.  Poor performance should be considered an application bug, so let’s build an example that uses early binding (note that the References mentioned earlier in this article must have been selected).  Add the following code to the Early Binding button (note that the code is attempting to connect to the sample database as sh user):

Dim intResult As Integer
Dim strSQL As String
Dim strDatabase As String
Dim strUserName As String
Dim strPassword As String

Dim snpData As ADODB.Recordset
Dim dbDatabase As ADODB.Connection

On Error Resume Next

Set dbDatabase = New ADODB.Connection

strDatabase = "sample" '"MyDB" 'From tnsnames.ora
strUserName = "sh" '"MyUserID"
strPassword = "sh" '"MyPassword"

'Connect to the database, specify that the default array fetch size should be 100 rows
dbDatabase.ConnectionString = "Provider=OraOLEDB.Oracle;Data Source=" & strDatabase & ";User ID=" & strUserName & ";Password=" & strPassword & ";ChunkSize=1000;FetchSize=100;"
dbDatabase.ConnectionTimeout = 40
dbDatabase.CursorLocation = adUseClient
dbDatabase.Open

If (dbDatabase.State <> 1) Or (Err <> 0) Then
    intResult = MsgBox("Could not connect to the database.  Check your user name and password." & vbCrLf & Error(Err), 16, "Oracle Connection Demo")
Else
    Set snpData = New ADODB.Recordset

    strSQL = "SELECT" & vbCrLf
    strSQL = strSQL & "  COUNT(*) NUM_ROWS" & vbCrLf
    strSQL = strSQL & "FROM" & vbCrLf
    strSQL = strSQL & "  CUSTOMERS"
    snpData.Open strSQL, dbDatabase

    If snpData.State = 1 Then
        If Not (snpData.EOF) Then
            MsgBox "Number of rows in the Customers table: " & Format(snpData("num_rows")), vbInformation
        Else
            MsgBox "No rows returned from the query.", vbCritical
        End If
        snpData.Close
    Else
        MsgBox "The query could not be executed.", vbCritical
    End If
End If

'When finished, clean up
If dbDatabase.State = 1 Then
    dbDatabase.Close
End If

Set snpData = Nothing
Set dbDatabase = Nothing

Let’s see the result of executing the above code:

Wow, 55,500 rows in that sample table, that is a lot of rows.

Finally, let’s add the code to the Query Database button.  This code will use early binding, bind variables, and present the result rows in the grid control.

Dim i As Integer
Dim intResult As Integer
Dim strSQL As String
Dim strDatabase As String
Dim strUserName As String
Dim strPassword As String
Dim strOut As String

Dim snpData As ADODB.Recordset
Dim comData As ADODB.Command

On Error Resume Next

Set dbDatabase = New ADODB.Connection

strDatabase = "sample" '"MyDB" 'From tnsnames.ora
strUserName = "sh" '"MyUserID"
strPassword = "sh" '"MyPassword"

'Connect to the database, specify that the default array fetch size should be 100 rows
dbDatabase.ConnectionString = "Provider=OraOLEDB.Oracle;Data Source=" & strDatabase & ";User ID=" & strUserName & ";Password=" & strPassword & ";ChunkSize=1000;FetchSize=100;"
dbDatabase.ConnectionTimeout = 40
dbDatabase.CursorLocation = adUseClient
dbDatabase.Open

If (dbDatabase.State <> 1) Or (Err <> 0) Then
    intResult = MsgBox("Could not connect to the database.  Check your user name and password." & vbCrLf & Error(Err), 16, "Oracle Connection Demo")
Else
    Set comData = New ADODB.Command
    With comData
        strSQL = "SELECT" & vbCrLf
        strSQL = strSQL & "  CUST_ID," & vbCrLf
        strSQL = strSQL & "  CUST_FIRST_NAME," & vbCrLf
        strSQL = strSQL & "  CUST_LAST_NAME," & vbCrLf
        strSQL = strSQL & "  CUST_GENDER," & vbCrLf
        strSQL = strSQL & "  CUST_YEAR_OF_BIRTH," & vbCrLf
        strSQL = strSQL & "  CUST_MARITAL_STATUS," & vbCrLf
        strSQL = strSQL & "  CUST_STREET_ADDRESS," & vbCrLf
        strSQL = strSQL & "  CUST_POSTAL_CODE," & vbCrLf
        strSQL = strSQL & "  CUST_CITY," & vbCrLf
        strSQL = strSQL & "  CUST_STATE_PROVINCE," & vbCrLf
        strSQL = strSQL & "  COUNTRY_ID" & vbCrLf
        strSQL = strSQL & "FROM" & vbCrLf
        strSQL = strSQL & "  CUSTOMERS" & vbCrLf
        strSQL = strSQL & "WHERE" & vbCrLf
        strSQL = strSQL & "  CUST_FIRST_NAME= ?"

        .CommandText = strSQL
        .CommandType = adCmdText
        .CommandTimeout = 30
        .Parameters.Append .CreateParameter("first_name", adVarChar, adParamInput, 20, "")
        .ActiveConnection = dbDatabase
    End With

    comData("first_name") = "Charles"
    Set snpData = comData.Execute

    If Not (snpData Is Nothing) Then
        If snpData.State = 1 Then
            If Not (snpData.EOF) Then
                grdOutput.Cols = snpData.Fields.Count
                grdOutput.FixedRows = 0
                grdOutput.FixedCols = 0
                grdOutput.Rows = 1

                strOut = ""
                For i = 0 To snpData.Fields.Count - 1
                    strOut = strOut & snpData(i).Name & vbTab
                Next i
                grdOutput.AddItem strOut 'Add the new heading row
                grdOutput.RemoveItem 0 'Remove whatever is on the first row

                Do While Not (snpData.EOF)
                    strOut = ""
                    strOut = strOut & Format(snpData("cust_id")) & vbTab
                    If Not (IsNull(snpData("cust_first_name"))) Then
                        strOut = strOut & snpData("cust_first_name") & vbTab
                    Else
                        strOut = strOut & vbTab
                    End If
                    If Not (IsNull(snpData("cust_last_name"))) Then
                        strOut = strOut & snpData("cust_last_name") & vbTab
                    Else
                        strOut = strOut & vbTab
                    End If
                    If Not (IsNull(snpData("cust_gender"))) Then
                        strOut = strOut & snpData("cust_gender") & vbTab
                    Else
                        strOut = strOut & vbTab
                    End If
                    If Not (IsNull(snpData("cust_year_of_birth"))) Then
                        strOut = strOut & Format(snpData("cust_year_of_birth")) & vbTab
                    Else
                        strOut = strOut & vbTab
                    End If
                    If Not (IsNull(snpData("cust_marital_status"))) Then
                        strOut = strOut & snpData("cust_marital_status") & vbTab
                    Else
                        strOut = strOut & vbTab
                    End If
                    If Not (IsNull(snpData("cust_street_address"))) Then
                        strOut = strOut & snpData("cust_street_address") & vbTab
                    Else
                        strOut = strOut & vbTab
                    End If
                    If Not (IsNull(snpData("cust_postal_code"))) Then
                        strOut = strOut & snpData("cust_postal_code") & vbTab
                    Else
                        strOut = strOut & vbTab
                    End If
                    If Not (IsNull(snpData("cust_city"))) Then
                        strOut = strOut & snpData("cust_city") & vbTab
                    Else
                        strOut = strOut & vbTab
                    End If
                    If Not (IsNull(snpData("cust_state_province"))) Then
                        strOut = strOut & snpData("cust_state_province") & vbTab
                    Else
                        strOut = strOut & vbTab
                    End If
                    If Not (IsNull(snpData("country_id"))) Then
                        strOut = strOut & snpData("country_id") & vbTab
                    Else
                        strOut = strOut & vbTab
                    End If
                    grdOutput.AddItem strOut 'Add the new data row to the grid control

                    snpData.MoveNext
                Loop
            Else
                MsgBox "No rows were returned by the query.", vbInformation
            End If
        Else
            MsgBox "The query could not be executed.", vbCritical
        End If
    Else
        MsgBox "The query could not be executed.", vbCritical
    End If
    dbDatabase.Close
End If

If grdOutput.Rows > 1 Then
    grdOutput.FixedRows = 1
End If

Set snpData = Nothing
Set dbDatabase = Nothing

What does the output look like when the above code is executed?

Nice!  Now the problem.  It is a very bad habit to repeatedly connect to and disconnect from the database, and such a habit may result in a variety of problems, including performance problems.  Ideally, the code to connect to the database should be relocated to a different part of the program, possibly the Form_Load subroutine.

Added December 7, 2012:

I was able to reproduce the above mentioned problem on a Windows 7 Pro computer when the 11.2.0.3 client was installed (without first installing the 11.2.0.1 or 11.2.0.2 client).  Even after importing the above mentioned registry entries, the program (and the VBScript) reported that the “Provider cannot be found”.  As mentioned, after verifying that the above indicated Oracle components were installed, I imported the following registry script (save as a plain text file, rename with a .reg extension, then double-click the file), which combines the three above mentioned registry scripts into a single file:

Windows Registry Editor Version 5.00
 
[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3F63C36E-51A3-11D2-BB7D-00C04FA30080}]
@="OraOLEDB.Oracle"
"OLEDB_SERVICES"=dword:ffffffff
 
[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3F63C36E-51A3-11D2-BB7D-00C04FA30080}\ExtendedErrors]
@="Extended Error Lookup Service"
 
[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3F63C36E-51A3-11D2-BB7D-00C04FA30080}\ExtendedErrors\{3FC8E6E4-53FF-11D2-BB7D-00C04FA30080}]
@="Oracle OLE DB Error Lookup"
 
[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3F63C36E-51A3-11D2-BB7D-00C04FA30080}\InprocServer32]
@="C:\\Oracle\\product\\11.2.0\\client_1\\bin\\OraOLEDB11.DLL"
"ThreadingModel"="Both"
 
[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3F63C36E-51A3-11D2-BB7D-00C04FA30080}\OLE DB Provider]
@="Oracle Provider for OLE DB"
 
[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3F63C36E-51A3-11D2-BB7D-00C04FA30080}\ProgID]
@="OraOLEDB.Oracle.1"
 
[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3F63C36E-51A3-11D2-BB7D-00C04FA30080}\TypeLib]
@="{0BB9AFD1-51A1-11D2-BB7D-00C04FA30080}"
 
[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3F63C36E-51A3-11D2-BB7D-00C04FA30080}\VersionIndependentProgID]
@="OraOLEDB.Oracle"
 
[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3FC8E6E4-53FF-11D2-BB7D-00C04FA30080}]
@="Oracle OLE DB Error Lookup"
 
[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3FC8E6E4-53FF-11D2-BB7D-00C04FA30080}\InprocServer32]
@="C:\\Oracle\\product\\11.2.0\\client_1\\bin\\OraOLEDB11.DLL"
"ThreadingModel"="Both"
 
[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3FC8E6E4-53FF-11D2-BB7D-00C04FA30080}\ProdID]
@="OraOLEDB.ErrorLookup.1"
 
[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3FC8E6E4-53FF-11D2-BB7D-00C04FA30080}\VersionIndependentProgID]
@="OraOLEDB.ErrorLookup.1"
 
[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\TypeLib\{0BB9AFD1-51A1-11D2-BB7D-00C04FA30080}\1.0\win32]
@="C:\\Oracle\\product\\11.2.0\\client_1\\bin\\OraOLEDB11.DLL"

The missing element that I neglected to mention in the original version of this article is that the Oracle 11.2.0.3 Client installer apparently forgets to register the OraOLEDB11.DLL file that is in the Oracle home.  The simple work-around for this problem is to manually register the DLL file, with a command similar to the following executed at a Windows command line (replace C:\Oracle\product\11.2.0\client_1 with the correct location of the Oracle client):

regsvr32 C:\Oracle\product\11.2.0\client_1\bin\OraOLEDB11.DLL

Please note that I have not fully verified that importing the three missing registry sections shown above fixes all problems with the 32 bit Oracle Client 11.2.0.3 on Windows 8 Pro 64 bit.





oci-24550 [3221225477] [Unhandled exception: Code=c0000005 Flags=0

22 10 2012

October 22, 2012

Lately, I have been busy with a couple of items that are not completely Oracle Database specific.  I tried experimenting with high definition 1080p security cameras.  After wasting an unacceptable amount of money to purchase two Y-Cam Wireless High Definition 1080p video cameras (model YCBLHD6), and spending in excess of 38 hours fighting with the cameras to make them work as advertised, I wrote a video review of the product.  After writing a review, Y-Cam’s tech support representative contacted me again, so I pulled the cameras back out of the box and wasted another five hours trying to make the cameras work wirelessly – a feature that was lost after installing the third firmware version on the cameras.  The cameras magically connected wirelessly very early the next morning – but wireless communiation was only one of several problems that I was experiencing with this model of security camera.  What might cause a procedure to work magically when that same procedure failed several times in the past – and might this somehow have an odd relation to Oracle Database?  This is the explanation that I provided to Y-Cam’s tech support representative and posted as a follow up on Y-Cam’s support forum:

The camera’s Ethernet port and wireless adapter both have the same hardcoded MAC address. This MAC address is used by network switches to determine which devices are attached to each of the switch ports so that network traffic addressed to a specific MAC address is only sent down the switch port where that MAC address is known to be connected. By contrast a network hub broadcasts network traffic to all ports, rather than just the port to which a hardware device’s MAC address is attached; a hub is less likely to have an issue with two different network adapters having the same MAC address (Y-CAM support reported that they confirmed that the camera with the latest firmware worked with a non-broadcasted WPA2 protected wireless network). The network switch maintains a MAC address table for each switch port – when the camera’s Ethernet port is disconnected and the camera is power cycled, the switch’s MAC address tables may not be immediately updated to remove the camera’s MAC address from the wired Ethernet port so that the MAC address may be added to the switch port used for wireless connectivity. Power cycling the network switch clears the switch’s MAC address tables, allowing the camera’s MAC address to be added to the switch port used for wireless connectivity.

In short, it is necessary to power cycle the switch (or the Cisco Linksys E2000 router acting as only a wireless access point with integrated 4 port gigabit switch in my case) when moving the camera from a wired Ethernet connection to a wireless connection.

Makes a bit of sense… now if only motion detection worked as advertised and stability improved to the point that the camera does not require two or three reboots per day.  Thankfully, most other consumer products tend to work out of the box; imagine the ramifications if every time the weather conditions included rain, a vehicle would sputter, shake violently, and reduce power to 10% of normal when the driver attempted to accelerate.

Changing gears a bit.  You might have noticed that the title of this article appears to be some sort of error message, probably produced by an Oracle product.  I am in the process of writing a program that interfaces with an Oracle Database, rapidly retrieving a variety of information from various database tables – using bind variables in the SQL statements, of course:

The problem?  My program seems to crash randomly on computers running either the 32 bit or 64 bit version of Windows 7 (the program works without issue on computers running the 32 bit version of Windows XP).  The randomness of the crashes makes it a bit difficult to troubleshoot this particular problem, but the vast majority of the crashes are associated with populating the treeview control.  Is the crash caused by a typo (bug) in my program logic?  Is the crash caused by the Windows 7 version of the treeview control?  Is the crash caused by Oracle Database 11.2.0.2, as a result of receiving several different SQL statements in rapid-fire fashion?  Could there be another explanation?

To start the investigation, I first enabled a 10046 trace at level 12.  I then entered an employee ID and started selecting different date ranges until the program crashed.  The crash message varied (as did the point at which the crash happened), and while Windows offered to contact the developer to report the error, I declined the offer.  Unfortunately, I did not capture any of the exact error messages, although I did open one of the debug files created during a crash – “Unhandled exception at 0x0fc01b3d”:

mov cx, word ptr [edi] – Thanks Windows, I think that I will go back to the safety of the 10046 trace files (my best guess is that this instruction is attempting to copy a value from a variable memory pointer).

-

Digging into the 10046 trace file, I found that this was the last SQL statement executed by my program before the crash:

CLOSE #1301509800:c=0,e=10,dep=0,type=1,tim=4032143047926
=====================
PARSING IN CURSOR #1301508512 len=1452 dep=0 uid=172 oct=3 lid=172 tim=4032143048002 hv=929599856 ad='3eb8207a0' sqlid='4y6m7dsvqj4bh'
SELECT 
  EMPLOYEE_ID, 
  SHIFT_DATE, 
  CLOCK_IN, 
  CLOCK_OUT, 
  HOURS_WORKED, 
  HOURS_VACATION, 
  HOURS_HOLIDAY, 
  HOURS_BEREAVEMENT, 
  HOURS_JURY_DUTY, 
  DECODE(SHIFT_DATE_CODE,'VAC',1,0) VACATION, 
  DECODE(SHIFT_DATE_CODE,'HOL',1,0) HOLIDAY, 
  DECODE(SHIFT_DATE_CODE,'BER',1,0) BEREAVEMENT, 
  DECODE(SHIFT_DATE_CODE,'JUR',1,0) JURY, 
  DECODE(SHIFT_DATE_CODE,'ABS',1,0) ABSENT, 
  DECODE(SHIFT_DATE_CODE,'EXC',1,0) EXCUSED, 
  SUM(DECODE(SHIFT_DATE_CODE,'VAC',1,0)) OVER (PARTITION BY EMPLOYEE_ID) VACATION_COUNT, 
  SUM(HOURS_VACATION) OVER (PARTITION BY EMPLOYEE_ID) VACATION_HOURS, 
  SUM(DECODE(SHIFT_DATE_CODE,'HOL',1,0)) OVER (PARTITION BY EMPLOYEE_ID) HOLIDAY_COUNT, 
  SUM(HOURS_HOLIDAY) OVER (PARTITION BY EMPLOYEE_ID) HOLIDAY_HOURS, 
  SUM(DECODE(SHIFT_DATE_CODE,'BER',1,0)) OVER (PARTITION BY EMPLOYEE_ID) BEREAVEMENT_COUNT, 
  SUM(HOURS_BEREAVEMENT) OVER (PARTITION BY EMPLOYEE_ID) BEREAVEMENT_HOURS, 
  SUM(DECODE(SHIFT_DATE_CODE,'JUR',1,0)) OVER (PARTITION BY EMPLOYEE_ID) JURY_COUNT, 
  SUM(HOURS_JURY_DUTY) OVER (PARTITION BY EMPLOYEE_ID) JURY_HOURS, 
  SUM(DECODE(SHIFT_DATE_CODE,'ABS',1,0)) OVER (PARTITION BY EMPLOYEE_ID) ABSENT_COUNT, 
  SUM(DECODE(SHIFT_DATE_CODE,'EXC',1,0)) OVER (PARTITION BY EMPLOYEE_ID) EXCUSED_COUNT 
FROM 
  PAY_HOURS
WHERE 
  EMPLOYEE_ID= :1 
  AND SHIFT_DATE BETWEEN :2 AND :3 
  AND COALESCE(SHIFT_DATE_CODE,'ZZZ')<>'NOL' 
ORDER BY 
  COALESCE(SHIFT_DATE_CODE,'ZZZ'), 
  SHIFT_DATE
END OF STMT
PARSE #1301508512:c=0,e=42,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=2997266572,tim=4032143048001
BINDS #1301508512:
 Bind#0
  oacdty=96 mxl=32(06) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=1000010 frm=01 csi=178 siz=48 off=0
  kxsbbbfp=4d936d40  bln=32  avl=06  flg=05
  value="DUDE"
 Bind#1
  oacdty=12 mxl=07(07) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=1000010 frm=00 csi=178 siz=0 off=32
  kxsbbbfp=4d936d60  bln=07  avl=07  flg=01
  value="10/19/2012 0:0:0"
 Bind#2
  oacdty=12 mxl=07(07) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=1000010 frm=00 csi=178 siz=0 off=40
  kxsbbbfp=4d936d68  bln=07  avl=07  flg=01
  value="10/19/2012 0:0:0"
EXEC #1301508512:c=0,e=129,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=2997266572,tim=4032143048323
WAIT #1301508512: nam='SQL*Net message to client' ela= 2 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=4032143048369
FETCH #1301508512:c=0,e=33,p=0,cr=3,cu=0,mis=0,r=0,dep=0,og=1,plh=2997266572,tim=4032143048448
STAT #1301508512 id=1 cnt=0 pid=0 pos=1 obj=0 op='SORT ORDER BY (cr=3 pr=0 pw=0 time=43 us cost=6 size=50 card=1)'
STAT #1301508512 id=2 cnt=0 pid=1 pos=1 obj=0 op='WINDOW BUFFER (cr=3 pr=0 pw=0 time=34 us cost=6 size=50 card=1)'
STAT #1301508512 id=3 cnt=0 pid=2 pos=1 obj=0 op='FILTER  (cr=3 pr=0 pw=0 time=21 us)'
STAT #1301508512 id=4 cnt=0 pid=3 pos=1 obj=70154 op='TABLE ACCESS BY INDEX ROWID PAY_HOURS (cr=3 pr=0 pw=0 time=20 us cost=4 size=50 card=1)'
STAT #1301508512 id=5 cnt=0 pid=4 pos=1 obj=71786 op='INDEX RANGE SCAN SYS_C0022057 (cr=3 pr=0 pw=0 time=20 us cost=3 size=0 card=1)'
WAIT #1301508512: nam='SQL*Net message from client' ela= 1448 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=4032143050020
PARSE #1301509800:c=0,e=20,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=266456858,tim=4032143050081
WAIT #1301509800: nam='SQL*Net message to client' ela= 2 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=4032143050142

*** 2012-10-19 09:43:33.522
WAIT #1301509800: nam='SQL*Net message from client' ela= 5761077 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=4032148811261  

Great, I will take a closer look at the part of my program that executes that SQL statement.  But wait, someone famous once said: “Insanity is repeating the same mistakes and expecting different results.”  Let’s run my program again with a 10046 trace at level 12 enabled.  This time, I found that the following was the last SQL statement executed by my program before the crash:

PARSING IN CURSOR #1078422680 len=1257 dep=0 uid=172 oct=3 lid=172 tim=4031964081030 hv=4064672182 ad='3a68828c8' sqlid='4cj96t7t4bydq'
SELECT 
  LT.TRANSACTION_ID, 
  WO.PART_ID, 
  LT.WORKORDER_BASE_ID, 
  LT.WORKORDER_LOT_ID, 
  LT.WORKORDER_SUB_ID, 
  LT.OPERATION_SEQ_NO, 
  LT.TYPE, 
  LT.INDIRECT_ID, 
  LT.RESOURCE_ID, 
  LT.CLOCK_IN, 
  NVL2(LT.HOURS_WORKED,LT.CLOCK_OUT,NULL) CLOCK_OUT, 
  NVL(LT.HOURS_WORKED,ROUND((SYSDATE-LT.CLOCK_IN)*24,2)) HOURS_WORKED, 
  LT.GOOD_QTY, 
  COUNT(*) OVER (PARTITION BY LT.EMPLOYEE_ID) LABOR_TICKETS, 
  SUM(NVL(LT.HOURS_WORKED,ROUND((SYSDATE-LT.CLOCK_IN)*24,2))) OVER (PARTITION BY LT.EMPLOYEE_ID) TOTAL_HOURS, 
  SUM(DECODE(LT.TYPE,'I',NVL(LT.HOURS_WORKED,ROUND((SYSDATE-LT.CLOCK_IN)*24,2)),0)) OVER (PARTITION BY LT.EMPLOYEE_ID) TOTAL_INDIRECT_HOURS, 
  SUM(DECODE(LT.TYPE,'S',NVL(LT.HOURS_WORKED,ROUND((SYSDATE-LT.CLOCK_IN)*24,2)),0)) OVER (PARTITION BY LT.EMPLOYEE_ID) TOTAL_SETUP_HOURS, 
  SUM(DECODE(LT.TYPE,'R',NVL(LT.HOURS_WORKED,ROUND((SYSDATE-LT.CLOCK_IN)*24,2)),0)) OVER (PARTITION BY LT.EMPLOYEE_ID) TOTAL_RUN_HOURS 
FROM 
  LABOR_TICKET LT, 
  WORK_ORDER WO 
WHERE 
  LT.EMPLOYEE_ID= :1 
  AND LT.SHIFT_DATE BETWEEN :2 AND :3 
  AND WO.TYPE(+)='W' 
  AND WO.SUB_ID(+)='0' 
  AND LT.WORKORDER_TYPE=WO.TYPE(+) 
  AND LT.WORKORDER_BASE_ID=WO.BASE_ID(+) 
  AND LT.WORKORDER_LOT_ID=WO.LOT_ID(+) 
  AND LT.WORKORDER_SPLIT_ID=WO.SPLIT_ID(+)
END OF STMT
PARSE #1078422680:c=0,e=51,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3199062715,tim=4031964081030
BINDS #1078422680:
 Bind#0
  oacdty=96 mxl=32(06) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=1000010 frm=01 csi=178 siz=48 off=0
  kxsbbbfp=40476838  bln=32  avl=06  flg=05
  value="DUDE"
 Bind#1
  oacdty=12 mxl=07(07) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=1000010 frm=00 csi=178 siz=0 off=32
  kxsbbbfp=40476858  bln=07  avl=07  flg=01
  value="10/19/2012 0:0:0"
 Bind#2
  oacdty=12 mxl=07(07) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=1000010 frm=00 csi=178 siz=0 off=40
  kxsbbbfp=40476860  bln=07  avl=07  flg=01
  value="10/19/2012 0:0:0"
EXEC #1078422680:c=0,e=188,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=4162394183,tim=4031964081369
WAIT #1078422680: nam='SQL*Net message to client' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=4031964081423
FETCH #1078422680:c=0,e=79,p=0,cr=4,cu=0,mis=0,r=1,dep=0,og=1,plh=4162394183,tim=4031964081539
STAT #1078422680 id=1 cnt=1 pid=0 pos=1 obj=0 op='WINDOW BUFFER (cr=4 pr=0 pw=0 time=81 us cost=9 size=101 card=1)'
STAT #1078422680 id=2 cnt=1 pid=1 pos=1 obj=0 op='FILTER  (cr=4 pr=0 pw=0 time=35 us)'
STAT #1078422680 id=3 cnt=1 pid=2 pos=1 obj=0 op='NESTED LOOPS OUTER (cr=4 pr=0 pw=0 time=33 us cost=8 size=101 card=1)'
STAT #1078422680 id=4 cnt=1 pid=3 pos=1 obj=69863 op='TABLE ACCESS BY INDEX ROWID LABOR_TICKET (cr=4 pr=0 pw=0 time=28 us cost=7 size=73 card=1)'
STAT #1078422680 id=5 cnt=1 pid=4 pos=1 obj=70966 op='INDEX RANGE SCAN IND_LT_SHIFT_DATE (cr=3 pr=0 pw=0 time=21 us cost=6 size=0 card=1)'
STAT #1078422680 id=6 cnt=0 pid=3 pos=2 obj=70072 op='TABLE ACCESS BY INDEX ROWID WORK_ORDER (cr=0 pr=0 pw=0 time=3 us cost=1 size=28 card=1)'
STAT #1078422680 id=7 cnt=0 pid=6 pos=1 obj=71722 op='INDEX UNIQUE SCAN SYS_C0021989 (cr=0 pr=0 pw=0 time=2 us cost=0 size=0 card=1)'

*** 2012-10-19 09:40:38.424
WAIT #1078422680: nam='SQL*Net message from client' ela= 9641490 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=4031973723245 

A different SQL statement was found in the trace file when my program crashed… do the SQL statements have anything in common?  Both SQL statements are using analytic functions, and both are using date-type bind variables.  Interesting… what was that saying about insanity?  Let’s try another test of my program with an enabled 10046 trace to see the last SQL statement found in the trace before the crash:

PARSING IN CURSOR #881825936 len=210 dep=1 uid=0 oct=3 lid=0 tim=4030801312382 hv=864012087 ad='3eddb9a08' sqlid='96g93hntrzjtr'
select /*+ rule */ bucket_cnt, row_cnt, cache_cnt, null_cnt, timestamp#, sample_size, minimum, maximum, distcnt, lowval, hival, density, col#, spare1, spare2, avgcln from hist_head$ where obj#=:1 and intcol#=:2
END OF STMT
PARSE #881825936:c=0,e=46,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=3,plh=2239883476,tim=4030801312381
BINDS #881825936:
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=0001 frm=00 csi=00 siz=48 off=0
  kxsbbbfp=348f64a0  bln=22  avl=04  flg=05
  value=166995
 Bind#1
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=0001 frm=00 csi=00 siz=0 off=24
  kxsbbbfp=348f64b8  bln=22  avl=02  flg=01
  value=2
EXEC #881825936:c=0,e=95,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=3,plh=2239883476,tim=4030801312551
WAIT #881825936: nam='db file sequential read' ela= 6526 file#=1 block#=98252 blocks=1 obj#=427 tim=4030801319112
WAIT #881825936: nam='db file sequential read' ela= 759 file#=1 block#=100271 blocks=1 obj#=425 tim=4030801319923
FETCH #881825936:c=0,e=7396,p=2,cr=3,cu=0,mis=0,r=1,dep=1,og=3,plh=2239883476,tim=4030801319963
STAT #881825936 id=1 cnt=1 pid=0 pos=1 obj=425 op='TABLE ACCESS BY INDEX ROWID HIST_HEAD$ (cr=3 pr=2 pw=0 time=7394 us)'
STAT #881825936 id=2 cnt=1 pid=1 pos=1 obj=427 op='INDEX RANGE SCAN I_HH_OBJ#_INTCOL# (cr=2 pr=1 pw=0 time=6583 us)'
CLOSE #881825936:c=0,e=46,dep=1,type=3,tim=4030801320046
...
BINDS #881825936:
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=0001 frm=00 csi=00 siz=48 off=0
  kxsbbbfp=348f64a0  bln=22  avl=04  flg=05
  value=166995
 Bind#1
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=0001 frm=00 csi=00 siz=0 off=24
  kxsbbbfp=348f64b8  bln=22  avl=02  flg=01
  value=11
EXEC #881825936:c=0,e=90,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=3,plh=2239883476,tim=4030801321733
FETCH #881825936:c=0,e=19,p=0,cr=3,cu=0,mis=0,r=1,dep=1,og=3,plh=2239883476,tim=4030801321767
CLOSE #881825936:c=0,e=10,dep=1,type=3,tim=4030801321802
BINDS #881825936:
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=0001 frm=00 csi=00 siz=48 off=0
  kxsbbbfp=348f64a0  bln=22  avl=04  flg=05
  value=166995
 Bind#1
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=0001 frm=00 csi=00 siz=0 off=24
  kxsbbbfp=348f64b8  bln=22  avl=02  flg=01
  value=33
EXEC #881825936:c=0,e=152,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=3,plh=2239883476,tim=4030801321992
FETCH #881825936:c=0,e=19,p=0,cr=3,cu=0,mis=0,r=1,dep=1,og=3,plh=2239883476,tim=4030801322027
CLOSE #881825936:c=0,e=10,dep=1,type=3,tim=4030801322062
BINDS #881825936:
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=0001 frm=00 csi=00 siz=48 off=0
  kxsbbbfp=348f64a0  bln=22  avl=04  flg=05
  value=166995
 Bind#1
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=0001 frm=00 csi=00 siz=0 off=24
  kxsbbbfp=348f64b8  bln=22  avl=02  flg=01
  value=35
EXEC #881825936:c=0,e=89,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=3,plh=2239883476,tim=4030801322195
FETCH #881825936:c=0,e=18,p=0,cr=3,cu=0,mis=0,r=1,dep=1,og=3,plh=2239883476,tim=4030801322229
CLOSE #881825936:c=0,e=10,dep=1,type=3,tim=4030801322265
WAIT #881751720: nam='SQL*Net message to client' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=4030801323194

*** 2012-10-19 09:21:14.806
WAIT #881751720: nam='SQL*Net message from client' ela= 8851219 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=4030810174461 

You might have noticed that the trace file showed dep=1 in the PARSING IN CURSOR line, which indicates that the SQL statement was executed automatically outside of my program’s control – likely when attempting to optimize another SQL statement that was submitted by my program.

Let’s test that insanity theory again.  Another execution of my program with an enabled 10046 trace at level 12 showed this as the last SQL statement executed:

PARSING IN CURSOR #767713824 len=503 dep=0 uid=172 oct=3 lid=172 tim=4026166358219 hv=2534735629 ad='3b58041b0' sqlid='7x2t5dqbj9zsd'
SELECT 
  N.ID NCMT_ID, 
  N.NCMT_DATE, 
  N.PART_ID, 
  N.WORKORDER_BASE_ID, 
  N.WORKORDER_LOT_ID, 
  N.WORKORDER_PIECE_NO, 
  N.QTY, 
  N.DISCREPANCY_TYPE, 
  N.RESOURCE_ID, 
  N.CUSTOMER_ID, 
  NVL(N.TOTAL_COST,0) NCMT_COST, 
  COUNT(N.ID) OVER (PARTITION BY N.EMPLOYEE_ID) NCMT_COUNT, 
  SUM(NVL(N.TOTAL_COST,0)) OVER (PARTITION BY N.EMPLOYEE_ID) TOTAL_COST 
FROM 
  NON_CONFORMING N 
WHERE 
  N.EMPLOYEE_ID= :1 
  AND N.NCMT_DATE BETWEEN :2 AND :3 
  AND UPPER(N.ERROR_BY)='OPERATOR' 
ORDER BY 
  ID
END OF STMT
PARSE #767713824:c=0,e=41,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3960464466,tim=4026166358219
BINDS #767713824:
 Bind#0
  oacdty=96 mxl=32(06) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=1000010 frm=01 csi=178 siz=48 off=0
  kxsbbbfp=2dc25d58  bln=32  avl=06  flg=05
  value="DUDE"
 Bind#1
  oacdty=12 mxl=07(07) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=1000010 frm=00 csi=178 siz=0 off=32
  kxsbbbfp=2dc25d78  bln=07  avl=07  flg=01
  value="10/19/2012 0:0:0"
 Bind#2
  oacdty=12 mxl=07(07) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=1000010 frm=00 csi=178 siz=0 off=40
  kxsbbbfp=2dc25d80  bln=07  avl=07  flg=01
  value="10/19/2012 0:0:0"
...
PARSE #767713824:c=0,e=19,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3960464466,tim=4026237045440
BINDS #767713824:
 Bind#0
  oacdty=96 mxl=32(07) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=1000010 frm=01 csi=178 siz=48 off=0
  kxsbbbfp=2dc21f50  bln=32  avl=07  flg=05
  value="DUDE2"
 Bind#1
  oacdty=12 mxl=07(07) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=1000010 frm=00 csi=178 siz=0 off=32
  kxsbbbfp=2dc21f70  bln=07  avl=07  flg=01
  value="6/22/2012 0:0:0"
 Bind#2
  oacdty=12 mxl=07(07) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=1000010 frm=00 csi=178 siz=0 off=40
  kxsbbbfp=2dc21f78  bln=07  avl=07  flg=01
  value="10/19/2012 0:0:0"
EXEC #767713824:c=0,e=125,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3960464466,tim=4026237045593
WAIT #767713824: nam='SQL*Net message to client' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=71274 tim=4026237045634
FETCH #767713824:c=0,e=57,p=0,cr=10,cu=0,mis=0,r=0,dep=0,og=1,plh=3960464466,tim=4026237045727
WAIT #767713824: nam='SQL*Net message from client' ela= 3080 driver id=1413697536 #bytes=1 p3=0 obj#=71274 tim=4026237048857
CLOSE #767713824:c=0,e=7,dep=0,type=3,tim=4026237048899
PARSE #767713824:c=0,e=18,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3960464466,tim=4026237048951
WAIT #767713824: nam='SQL*Net message to client' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=71274 tim=4026237051177

*** 2012-10-19 08:06:40.151
WAIT #767713824: nam='SQL*Net message from client' ela= 98732762 driver id=1413697536 #bytes=1 p3=0 obj#=71274 tim=4026335783988 

Well, that SQL statement also used date-type bind variables and analytic functions… maybe there is a problem with Oracle Database 11.2.0.2?  But, why is this suddenly a problem… what has changed?  Nearly two years ago I wrote a blog article that described an ODBC update problem that was present in the Oracle 11.2.0.1 Client – the first Oracle Client version that could be easily downloaded and installed on Windows 7 (it is also possible to patch the downloadable 10.2.0.3/4 client to 10.2.0.5).  The Oracle 11.2.0.1 patch 7 (10155837 for 32 bit Windows clients) fixes the false ODBC lock violation errors that are reported in Microsoft Access (and various other programs) when UPDATE SQL statements are executed, and those SQL statements contain bind variables.  That same Oracle Client version and patch version are still installed on the computer (and all other computers here that run Windows 7), so that is not the source of the change.  On a side note, have you found it difficult to locate updated 11.2.0.2 or 11.2.0.3 client versions in MOS (Metalink)?

So, the 10046 trace failed to find a consistent source for the crashes, and the Oracle client has not been changed on the Windows 7 computer in nearly two years.  Program bug?  Analytic query bug?  Bind variable related bug?  I went digging for the sqlnet.log file to see if that file provided any clues.  This is what I found:

Fri Oct 19 13:26:52 2012
Directory does not exist for read/write [C:\Oracle\product\11.2.0\client_1\log] [C:\Oracle\product\11.2.0\client_1\log\diag\clients]
Errors in file c:\users\hooper\oracle\oradiag_hooper\diag\clients\user_hooper\host_1131635621_76\trace\ora_824_8864.trc  (incident=401):
oci-24550 [3221225477] [Unhandled exception: Code=c0000005 Flags=0
] [] [] [] [] [] [] [] [] [] []
Incident details in: c:\users\hooper\oracle\oradiag_hooper\diag\clients\user_hooper\host_1131635621_76\incident\incdir_401\ora_824_8864_i401.trc

That file’s contents proved to be a bit helpful, I think.  Let’s take a look at the contents of the file mentioned above:

Dump file c:\users\hooper\oracle\oradiag_hooper\diag\clients\user_hooper\host_1131635621_76\incident\incdir_401\ora_824_8864_i401.trc
Dump continued from file: c:\users\hooper\oracle\oradiag_hooper\diag\clients\user_hooper\host_1131635621_76\trace\ora_824_8864.trc
oci-24550 [3221225477] [Unhandled exception: Code=c0000005 Flags=0
] [] [] [] [] [] [] [] [] [] []
========= Dump for incident 401 (oci 24550 [3221225477]) ========
Tracing is in restricted mode!
----- Short Call Stack Trace -----
_dbgexPhaseII()+850<-_dbgexProcessError()+2061<-_dbgeExecuteForError()+43<-_dbgePostErrorDirect()+2368<-_kpeDbgSignalHandler()+225<-_skgesig_Win_UnhandledExceptionFilter()+140<-75D6003D<-7295FA2E<-0040AD0D<-77309EF0<-77309EC0<-00000000<-773079DD<-77306E72<-772FE0ED<-275C83DD<-275D219F<-275D1D88<-275D0E0E<-2759C677<-004CB035<-004DC4A3<-72991D31<-00410FD8<-7299202F----- START Event Driven Actions Dump ----
---- END Event Driven Actions Dump ----
----- START DDE Actions Dump -----
Executing SYNC actions
Executing ASYNC actions
----- END DDE Actions Dump (total 0 csec) -----
End of Incident Dump

Might the above indicate that my program logic is not the cause of the problem, but instead that it is the Oracle client that is crashing?  I started digging through various websites looking for a similar oci-24500 error message, but I found nothing that was helpful.  I searched through the bug database and technical article database on MOS (Metalink), and found nothing that was helpful.  By chance, I widened the search on MOS a bit and stumbled upon Mark Powell’s message in this Oracle Support Community thread (you need a MOS account to access the message).  That message pointed to two other MOS articles, at least one of which stated that the problem is (possibly – not exactly the same error message) corrected when the Oracle 11.2.0.3 Client is installed.  I struggled finding the 11.2.0.3 Client in MOS, and then found that it is located on disk 3 of 6 of the 11.2.0.3 server download (direct link – just specify operating system version and download disk 3 – p10404530_112030_platform_3of6.zip (requires a MOS account to download)).  The 11.2.0.3 client version magically corrected the seemingly random crashes in my programprogrammer error averted.

Insanity?  No more so than an expensive security camera that fails at motion detection, or a vehicle that decelerates when the driver commands the vehicle to accelerate during a light rain.  ;-)

P.S.: Sorry about the formatting of the blog article.  A WordPress update apparently disabled my custom CSS that expanded the article to the full width of the browser window – I am still in search of a fix for that problem.





Undo Quiz – Bringing the Right Tools for the Job

17 09 2012

September 17, 2012

Sometimes humor is a good recipe for undo.  A mid-50s man was driving down the road, riding a purple mule, when he saw a house with a new window.  Uninvited, the man decided to take a closer look using an obvious tool of choice, a tire iron (YPDNTI).  Elapsed time, roughly 60 seconds.  The man is obviously familiar with Oracle Database’s concept of transactions and issuing a ROLLBACK, but did not account for video recording in several autonomous transactions.  Even through a COMMIT was not issued, that does not imply that nothing happened in those 60 seconds. 

Riding a purple mule… strange.

On to the quiz.  Assume that there are three and only three users connected to an Oracle database (preferrably a test database of sorts).  The first session is connected as the SYS user, and the other two sessions are connected as a normal database user (the same user ID, but with access permissions for DBA_SOURCE).  Assume that session 1 creates a new undo tablespace that cannot grow beyond 1MB in size:

CREATE UNDO TABLESPACE SMALL_UNDO2 DATAFILE 'C:\Oracle\OraData\SAMPLE\small_undo2.dbf' SIZE 1M AUTOEXTEND OFF; 

And then that tablespace is set as the system default tablespace:

ALTER SYSTEM SET UNDO_TABLESPACE=SMALL_UNDO2;

In session 2, table T10 is created based on DBA_SOURCE:

CREATE TABLE T10 AS
SELECT
  *
FROM
  DBA_SOURCE;

In my case, I used the Oracle sample schema in Oracle Database 11.2.0.3 (tablespace for T10 is ASSM AUTOALLOCATE).  Let’s check the amount of space consumed (not necessarily all used for data strorage yet) for the T10 table:

SELECT
  ROUND(SUM(BYTES)/1024/1024,2) SIZE_MB
FROM
  USER_EXTENTS
WHERE
  SEGMENT_NAME='T10';

   SIZE_MB
----------
        88 

88MB – a nice even number.  With a 1MB maximum size for the undo tablespace, let’s see if the user is able to select all of the rows from the T10 table:

SET ARRAYSIZE 1000
SET AUTOTRACE TRACEONLY STATISTICS

SELECT
  *
FROM
  T10;

633054 rows selected.

Statistics
---------------------------------------------------
         11  recursive calls
          0  db block gets
      11352  consistent gets
      10635  physical reads
          0  redo size
   66576339  bytes sent via SQL*Net to client
       7322  bytes received via SQL*Net from client
        635  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
     633054  rows processed 

That worked, but why?  Why didn’t the session receive an error when performing the 11,352 consistent gets to provide a consistent view of the data?

Maybe on the second execution we will see an error?

SELECT
  *
FROM
  T10;

633054 rows selected.

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      11271  consistent gets
          0  physical reads
          0  redo size
   66576339  bytes sent via SQL*Net to client
       7322  bytes received via SQL*Net from client
        635  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
     633054  rows processed

—–

The first quiz question is simple: How it is possible for Oracle Database to provide a read consistent view of a table that requires approximately 88MB of storage space when the file supporting the undo tablespace cannot exceed 1MB?  Maybe someone will be kind enough to share the answer with the OP in this message thread:
dbaforums.org/oracle/index.php?s=439a689327b6fbd09ba0018b320041d8&showtopic=21972

The second quiz question is also simple: Why did the first select of the table require 81 more consistent gets than the second select from that table?

—–

Continuing with the test case, this time working in session 3.  Let’s delete some rows from the table created in session 2:

DELETE FROM
  T10
WHERE
  ROWNUM<=300000; 

Oracle Database responses with the following message:

SQL> DELETE FROM
  2    T10
  3  WHERE
  4    ROWNUM<=300000;
  T10
  *
ERROR at line 2:
ORA-30036: unable to extend segment by 8 in undo tablespace 'SMALL_UNDO2'

The third question is slightly more challenging: Why is Oracle attempting to extend a segment when the session is deleting rows; also, what does the number 8 signify?

Let’s try again with a smaller number of rows:

DELETE FROM
  T10
WHERE
  ROWNUM<=30;

30 rows deleted. 

That worked.  Interestingly, the first time I tried the test case, I named the tablespace SMALL_UNDO and specified “RETENTION GUARANTEE” when creating the undo tablespace, and the following message appeared when executing the above command:

SQL> DELETE FROM
  2    T10
  3  WHERE
  4    ROWNUM<=30;
  T10
  *
ERROR at line 2:
ORA-30036: unable to extend segment by 8 in undo tablespace 'SMALL_UNDO'

After the above message appeared, session 3 successfully deleted 3 rows, 30 rows, and then 300 rows from table T10 without receiving an error:

SQL> DELETE FROM
  2    T10
  3  WHERE
  4    ROWNUM<=3;

3 rows deleted.

SQL> DELETE FROM
  2    T10
  3  WHERE
  4    ROWNUM<=30;

30 rows deleted.

SQL> DELETE FROM
  2    T10
  3  WHERE
  4    ROWNUM<=300;

300 rows deleted. 

The fourth question is again slightly more challenging: why was the 30 row delete from table T10 successful when the SMALL_UNDO2 undo tablespace was the default undo tablespace, yet that delete failed when the SMALL_UNDO tablespace was set as the default tablespace – until 3 rows were first deleted from the table?

Going back to the forum thread, another interesting question arises.  The fifth question is again challenging: What type of compression is used in undo logs?

I was curious about the last question.  In session 1 I executed the following:

SELECT
  R.NAME,
  T.XIDUSN,
  T.XIDSLOT,
  T.XIDSQN
FROM
  V$TRANSACTION T,
  V$ROLLNAME R
WHERE
  T.XIDUSN=R.USN; 

NAME                               XIDUSN    XIDSLOT     XIDSQN
------------------------------ ---------- ---------- ----------
_SYSSMU15_897375467$                   15          1          8

ALTER SESSION SET TRACEFILE_IDENTIFIER='UNDO_BLOCKS';
ALTER SYSTEM DUMP UNDO BLOCK '_SYSSMU15_897375467$' XID 15 1 8;

A portion of the generated trace file follows:

********************************************************************************
Undo Segment:  _SYSSMU15_897375467$ (15)
xid: 0x000f.001.00000008
Low Blk   :   (0, 0) 
High Blk  :   (2, 7) 
Object Id :   ALL 
Layer     :   ALL 
Opcode    :   ALL 
Level     :   2 

********************************************************************************
UNDO BLK:  Extent: 1   Block: 1   dba (file#, block#): 6,0x00000071
xid: 0x000f.001.00000008  seq: 0x4   cnt: 0x2c  irb: 0xa   icl: 0x0   flg: 0x0000

 Rec Offset      Rec Offset      Rec Offset      Rec Offset      Rec Offset
---------------------------------------------------------------------------
0x01 0x1f24     0x02 0x1e64     0x03 0x1dbc     0x04 0x1d34     0x05 0x1c68     
0x06 0x1b9c     0x07 0x1ae4     0x08 0x1a5c     0x09 0x1998     0x0a 0x18d8     
0x0b 0x1830     0x0c 0x17a8     0x0d 0x16f0     0x0e 0x162c     0x0f 0x15a4     
0x10 0x14ec     0x11 0x1424     0x12 0x1378     0x13 0x12f0     0x14 0x1234     
0x15 0x1168     0x16 0x10b8     0x17 0x1030     0x18 0x0f74     0x19 0x0ecc     
0x1a 0x0e44     0x1b 0x0d88     0x1c 0x0cbc     0x1d 0x0be8     0x1e 0x0b44     
0x1f 0x0a88     0x20 0x09bc     0x21 0x08e8     0x22 0x0840     0x23 0x0784     
0x24 0x06b8     0x25 0x05e4     0x26 0x0540     0x27 0x0484     0x28 0x03b8     
0x29 0x02e4     0x2a 0x023c     0x2b 0x0180     0x2c 0x00b4     

*-----------------------------
* Rec #0xa  slt: 0x01  objn: 79928(0x00013838)  objd: 79928  tblspc: 4(0x00000004)
*       Layer:  11 (Row)   opc: 1   rci 0x09   
Undo type:  Regular undo   Last buffer split:  No 
Temp Object:  No 
Tablespace Undo:  No 
rdba: 0x00000000
*-----------------------------
KDO undo record:
KTB Redo 
op: 0x02  ver: 0x01  
compat bit: 4 (post-11) padding: 0
op: C  uba: 0x01800071.0004.09
KDO Op code: IRP row dependencies Disabled
  xtype: XA flags: 0x00000000  bdba: 0x01091a2e  hdba: 0x01091a2a
itli: 2  ispac: 0  maxfr: 4858
tabn: 0 slot: 16(0x10) size/delt: 89
fb: --H-FL-- lb: 0x0  cc: 5
null: -----
col  0: [ 3]  53 59 53
col  1: [ 8]  53 54 41 4e 44 41 52 44
col  2: [ 7]  50 41 43 4b 41 47 45
col  3: [ 3]  c2 04 1f
col  4: [60]
 20 20 20 20 70 72 61 67 6d 61 20 42 55 49 4c 54 49 4e 28 27 58 4f 52 27 2c
 38 2c 20 33 2c 20 39 29 3b 20 2d 2d 20 50 45 4d 53 5f 49 4e 54 45 47 45 52
 2c 20 49 4e 54 5f 58 4f 52 0a

*-----------------------------
* Rec #0x9  slt: 0x01  objn: 79928(0x00013838)  objd: 79928  tblspc: 4(0x00000004)
*       Layer:  11 (Row)   opc: 1   rci 0x08   
Undo type:  Regular undo   Last buffer split:  No 
Temp Object:  No 
Tablespace Undo:  No 
rdba: 0x00000000
*-----------------------------
KDO undo record:
KTB Redo 
op: 0x02  ver: 0x01  
compat bit: 4 (post-11) padding: 0
op: C  uba: 0x01800071.0004.08
KDO Op code: IRP row dependencies Disabled
  xtype: XA flags: 0x00000000  bdba: 0x01091a2e  hdba: 0x01091a2a
itli: 2  ispac: 0  maxfr: 4858
tabn: 0 slot: 15(0xf) size/delt: 90
fb: --H-FL-- lb: 0x0  cc: 5
null: -----
col  0: [ 3]  53 59 53
col  1: [ 8]  53 54 41 4e 44 41 52 44
col  2: [ 7]  50 41 43 4b 41 47 45
col  3: [ 3]  c2 04 1e
col  4: [61]
 20 20 66 75 6e 63 74 69 6f 6e 20 58 4f 52 20 28 4c 45 46 54 20 42 4f 4f 4c
 45 41 4e 2c 20 52 49 47 48 54 20 42 4f 4f 4c 45 41 4e 29 20 72 65 74 75 72
 6e 20 42 4f 4f 4c 45 41 4e 3b 0a

*-----------------------------
* Rec #0x8  slt: 0x01  objn: 79928(0x00013838)  objd: 79928  tblspc: 4(0x00000004)
*       Layer:  11 (Row)   opc: 1   rci 0x07   
Undo type:  Regular undo   Last buffer split:  No 
Temp Object:  No 
Tablespace Undo:  No 
rdba: 0x00000000
*-----------------------------
KDO undo record:
KTB Redo 
op: 0x02  ver: 0x01  
compat bit: 4 (post-11) padding: 0
op: C  uba: 0x01800071.0004.07
KDO Op code: IRP row dependencies Disabled
  xtype: XA flags: 0x00000000  bdba: 0x01091a2e  hdba: 0x01091a2a
itli: 2  ispac: 0  maxfr: 4858
tabn: 0 slot: 14(0xe) size/delt: 30
fb: --H-FL-- lb: 0x0  cc: 5
null: -----
col  0: [ 3]  53 59 53
col  1: [ 8]  53 54 41 4e 44 41 52 44
col  2: [ 7]  50 41 43 4b 41 47 45
col  3: [ 3]  c2 04 1d
col  4: [ 1]  0a

*-----------------------------
* Rec #0x7  slt: 0x01  objn: 79928(0x00013838)  objd: 79928  tblspc: 4(0x00000004)
*       Layer:  11 (Row)   opc: 1   rci 0x06   
Undo type:  Regular undo   Last buffer split:  No 
Temp Object:  No 
Tablespace Undo:  No 
rdba: 0x00000000
*-----------------------------
KDO undo record:
KTB Redo 
op: 0x02  ver: 0x01  
compat bit: 4 (post-11) padding: 0
op: C  uba: 0x01800071.0004.06
KDO Op code: IRP row dependencies Disabled
  xtype: XA flags: 0x00000000  bdba: 0x01091a2e  hdba: 0x01091a2a
itli: 2  ispac: 0  maxfr: 4858
tabn: 0 slot: 13(0xd) size/delt: 78
fb: --H-FL-- lb: 0x0  cc: 5
null: -----
col  0: [ 3]  53 59 53
col  1: [ 8]  53 54 41 4e 44 41 52 44
col  2: [ 7]  50 41 43 4b 41 47 45
col  3: [ 3]  c2 04 1c
col  4: [49]
 20 20 2d 2d 20 20 77 68 65 6e 20 74 68 65 79 20 6f 63 63 75 72 20 69 6e 20
 63 6f 6e 64 69 74 69 6f 6e 61 6c 20 73 74 61 74 65 6d 65 6e 74 73 2e 0a 

It would not surprise me too much if a guide existed to assist in decompressing the undo logs (whatever that term implies).








Follow

Get every new post delivered to your Inbox.

Join 143 other followers