Install the Oracle Client on a Synology DiskStation DS1813+ or DS412+ for Nagios to Monitor Databases

July 26, 2013

(Back to the Previous Post in the Series)

In the previous article of this series I provided steps to install and run the Nagios network monitoring utility on either a Synology DiskStation DS1813+ or a DS412+ for the purpose of pinging network devices to verify that the devices respond to ping requests.  That article may have seemed a bit out of place on a blog that is primarily intended to include notes about using Oracle Database.  In the previous article I hinted that there is a check_oracle plugin for Nagios, suggesting that a Synology DiskStation DS1813+ or DS412+ could be used to monitor Oracle databases.

This blog article does NOT use the check_oracle plugin.  There are a small handful of official Nagios plugins for Oracle Database, but this article does not use any of those plugins either.  Instead, I took a chance at crafting a custom Perl script to monitor Oracle databases, the first time I have ever used Perl.  I thought back to some of the earlier blog articles where I introduced a couple of Oracle Database monitoring scripts that were written in VBScript, including Working with Oracle’s Time Model Data 3 and Oracle Statistics Chart Viewer – I started seeing the possibilities of using Nagios to alert DBAs about potential issues.  Yet, I have never used Perl, and at the time did not even have a clue how to concatenate strings in the language (it turns out that there are more than two techniques).
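For the curious, here is a quick sketch (not taken from the monitoring script that appears later in this article) of a few of those string concatenation techniques in Perl:

my $prefix = "check_";
my $name = $prefix . "oracle";                   # the . concatenation operator
$name .= ".pl";                                  # the .= append operator
my $name2 = "${prefix}oracle.pl";                # interpolation in a double-quoted string
my $name3 = join("", $prefix, "oracle", ".pl");  # join with an empty separator
print "$name $name2 $name3\n";                   # prints: check_oracle.pl check_oracle.pl check_oracle.pl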

Verify that Perl is installed on the DiskStation by accessing the Package Center in the DiskStation’s DSM web interface.  If there is an Install button under the Perl heading, that indicates that Perl is not yet installed – click the Install button and wait until that button changes to Installed.

[Screenshot: InstallOracleClient-1]

Let’s start by installing the Oracle Client on the DiskStation.  I will use the Oracle Instant Client version 11.2.0.3 (the 12.1.0.1 Instant Client throws a different error message than the 11.2.0.3 and 11.1.0.7 clients did, so I reverted to the 11.2.0.3 version).  To download the Oracle Instant Client for Linux, visit this link.  You will need an OTN account to download the client files – I downloaded the files using a desktop computer, and then transferred the files to a custom Config share that I set up on the DiskStation.  This article requires the Basic Instant Client package and the SQL*Plus add-on package for the Instant Client.

Connect to the DiskStation using Telnet as the root user (see the previous article for Telnet connection directions).  For consistency, the .zip files for the Oracle Instant Client will be copied to the downloads directory that was created in the previous article.  We will create an oracle directory in the downloads directory:

mkdir /volume1/downloads/oracle/

Next, the downloaded files for the Oracle Instant Client are copied from the config share that I created on the DiskStation (to allow transporting files from a desktop computer) to the /volume1/downloads/oracle directory that was just created, and then the Oracle Instant Client files are unzipped, which automatically creates the instantclient_11_2 directory:

cp /volume1/config/instantclient-basic-linux-11.2.0.3.0.zip /volume1/downloads/oracle/
cp /volume1/config/instantclient-sqlplus-linux-11.2.0.3.0.zip /volume1/downloads/oracle/
cd /volume1/downloads/oracle/
unzip -o instantclient-basic-linux-11.2.0.3.0.zip
unzip -o instantclient-sqlplus-linux-11.2.0.3.0.zip
cd instantclient_11_2

The Oracle Instant Client 11.2.0.3 (and 11.1.0.7) requires a file named libaio.so.1 that is not installed by default on the DiskStation.  After searching the Internet for a while, I found a copy of that file in the /lib directory on a Red Hat Enterprise Linux 3 server.  Using a desktop computer, I copied that file to the DiskStation’s config share, along with a copy of the tnsnames.ora file that includes the database connection information for the databases that must be monitored.  The libaio.so.1 file will be placed in the /volume1/downloads/oracle/instantclient_11_2 directory, and the tnsnames.ora file will be placed in the /etc directory (one of the directories that is automatically searched).

cp /volume1/config/libaio.so.1 .
cp /volume1/config/tnsnames.ora /etc
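If you have not built a tnsnames.ora file before, a minimal entry looks something like the following (a sketch only – the host, port, and service name are placeholders that must match your environment; DBName is the alias used in the connection tests later in this article):

DBName =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = dbserver.MyDomainHere.com)(PORT = 1521))
    (CONNECT_DATA =
      (SERVICE_NAME = DBName)
    )
  )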

Let’s start up SQL*Plus:

./sqlplus /nolog
./sqlplus: error while loading shared libraries: libsqlplus.so: cannot open shared object file: No such file or directory

An error…

ls
BASIC_README    adrci   glogin.sql   libclntsh.so.11.1  libocci.so.11.1  libocijdbc11.so  libsqlplusic.so  ojdbc6.jar  sqltest.sql  xstreams.jar
SQLPLUS_README  genezi  libaio.so.1  libnnz11.so        libociei.so      libsqlplus.so    ojdbc5.jar       sqlplus     uidrvci

The file mentioned in the error message exists in the correct directory.  The error is actually caused by the LD_LIBRARY_PATH environment variable not yet being defined, so the library search path does not include the Instant Client directory.  Fixing the error and trying again:

LD_LIBRARY_PATH="/volume1/downloads/oracle/instantclient_11_2"
export LD_LIBRARY_PATH
./sqlplus /nolog

SQL*Plus: Release 11.2.0.3.0 Production on Fri Jul 26 13:22:16 2013

Copyright (c) 1982, 2011, Oracle.  All rights reserved.

SQL>

Try to connect to one of the databases that are defined in the tnsnames.ora file, and then try executing a simple SQL statement before exiting SQL*Plus (change testuser, password, and DBName as appropriate for your environment):

CONNECT testuser/password@DBName
Connected.
SELECT SYSDATE FROM DUAL;

SYSDATE
---------
26-JUL-13

EXIT

Now that we have verified that the Instant Client version of SQL*Plus works from the DiskStation, we will create a simple SQL script to verify that the Instant Client version of SQL*Plus is able to accept script names from a command line.  The script will be created in the same directory where the Oracle Instant Client is located (see the basic directions for using vi, as found in the previous article):

vi /volume1/downloads/oracle/instantclient_11_2/sqltest.sql

The first line in the script instructs SQL*Plus to abort the execution of the script when an error is encountered; if the connection attempt fails, there is no point in attempting to execute any SQL statements that follow.  The second line connects to the database (change testuser, password, and DBName as appropriate for your environment).  The third line executes a simple SQL statement, selecting from a table (rather than the virtual table DUAL) – change the SQL statement to a valid statement for your database.  The last statement exits SQL*Plus once the SQL statement finishes executing:

WHENEVER SQLERROR EXIT SQL.SQLCODE
CONNECT testuser/password@DBName
SELECT ID, DESCRIPTION FROM T1 WHERE ID LIKE '8X%';
EXIT

Try using SQL*Plus to execute the script:

./sqlplus /nolog @/volume1/downloads/oracle/instantclient_11_2/sqltest.sql

The results from the SQL statement should display in the Telnet window, and then the normal DiskStation prompt should appear.  If the database instance was not running at the time of the execution, you would see something like this rather than the results of the SQL statement:

ERROR:
ORA-01034: ORACLE not available
ORA-27101: shared memory realm does not exist
Process ID: 0
Session ID: 0 Serial number: 0

Nagios apparently expects its plugins to return one of four exit codes, as explained in this article, to determine whether or not the checked object is behaving as expected:

Exit Code Status
        0 OK
        1 WARNING
        2 CRITICAL
        3 UNKNOWN

The exit function in Perl permits returning a number, so that is one problem solved.  I want the Perl plugin to accept a command line parameter from Nagios so that I am able to check more than one database using the same Perl script – the Nagios command line parameter will specify the script to execute.  This article explains how to receive a command line parameter from Nagios, so that is a second problem that is solved.

Let’s put together a simple (OK, not simple for me) script that executes a SQL*Plus script that is specified by Nagios (the sqltest.sql file that was just created), and then returns 0 if no ORA- type error message is returned when the Oracle Instant Client’s SQL*Plus executes a script.  If an ORA- type error message is returned, then the script’s exit code is set to 2 and the ORA- errors are returned to Nagios.  Nagios apparently only accepts one line of output from the script, reading what is written by Perl’s print command.  As we saw earlier, the script may return multiple nested ORA- type errors if the database instance is down, so we need these multiple error messages to appear on one line of output from the Perl script.

vi /opt/libexec/check_oracle.pl

This is the script that I crafted after spending, quite literally, hours – make that days – searching the Internet for Perl command syntax that would execute on the DiskStation:

#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Long qw(:config no_ignore_case);
my $output = "";
my ($script);

# Read the script name that Nagios passes on the command line (-s or --script)
my $result = GetOptions(
 "s|script=s"         => \$script,
  );

# Return the Nagios UNKNOWN exit code if no script name was supplied
if (!defined($script))
  {
  print "UNKNOWN: no SQL script was specified with -s";
  exit(3);
  }

# SQL*Plus requires this environment variable to locate its shared libraries
$ENV{"LD_LIBRARY_PATH"} = "/volume1/downloads/oracle/instantclient_11_2/";

my $ExitValue = 0;
print "Test Script: " . $script . ":  ";

# Execute SQL*Plus with the specified script, capturing its output through a pipe
open my $in, "/volume1/downloads/oracle/instantclient_11_2/sqlplus /nolog @/volume1/downloads/oracle/instantclient_11_2/$script |";

while (my $line = <$in>)
  {
  # Collect any ORA-nnnnn error lines onto a single line of output
  if ($line =~ /^(ORA-\d{5})/)
    {
    #$line =~ s/\v//g;
    chomp($line);
    $output = $output . $line . " ; ";
    $ExitValue = 2;    # CRITICAL
    }
  }

close($in);
print $output;
exit($ExitValue);

The script defines the LD_LIBRARY_PATH environment variable itself, so the Nagios user should (hopefully) have no problems executing the Perl script.  There are apparently a half-dozen different ways to remove the end of line characters from the SQL*Plus output; chomp($line); worked for my test, but $line =~ s/\v//g; did not work quite as expected.
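For reference, a few of those line-ending removal approaches, with the differences noted (the first is the one used in the script above):

chomp($line);            # removes the trailing input record separator (normally \n)
$line =~ s/\r?\n\z//;    # removes a trailing \n, plus a preceding \r if one is present
$line =~ s/[\r\n]//g;    # removes every carriage return and line feed in the string
$line =~ s/\v//g;        # removes all vertical whitespace (\v requires Perl 5.10 or later)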

Save the script and exit vi.  Next, we need to make the Perl script executable:

chmod +x /opt/libexec/check_oracle.pl
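Before involving Nagios, it is worth running the plugin by hand as the nagios user and examining the exit code (this assumes the su fix described in the previous article is already in place; echo $? displays the exit code of the previous command):

su - nagios -c "/opt/libexec/check_oracle.pl -s sqltest.sql"
echo $?

With the database instance up, the Test Script line should be followed by an exit code of 0; with the instance down, the ORA- errors should appear, followed by an exit code of 2.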

In the earlier article we had to modify a couple of the Nagios configuration files – this time we will need to modify some of the same files, starting with the commands.cfg file.

vi /opt/etc/objects/commands.cfg

Locate the ‘check_ping’ command definition that we modified in the earlier article.  Below the } character for that command definition, add the following, which will tell Nagios that our Perl script exists, and that the script command should be recognized by Nagios as check_oracle_ch:

# check_oracle_ch command definition by Charles Hooper
define command{
        command_name    check_oracle_ch
        command_line    $USER1$/check_oracle.pl -s $ARG1$
        }

Save the commands.cfg file and exit vi.  Let’s create another Nagios configuration file that will be used to list the Oracle Databases (and their scripts) that will be checked:

vi /opt/etc/objects/oracle.cfg

On a new line in the oracle.cfg file, add the following:

define service{
        use                     generic-service ; Inherit values from a template
        host_name               server123       ; This is a server name that is defined in the server.cfg file
        service_description     CHECK_DB_TEST   ; A unique name given to the server, database, and script combination
        check_command           check_oracle_ch!sqltest.sql    ; The command definition that was added to the commands.cfg file followed by ! and the name of the SQL script to execute
        normal_check_interval   5               ; Execute the script every 5 minutes under normal conditions
        retry_check_interval    1               ; Re-execute the script after receiving a return code of 2, every minute until its final/hard state is determined
        }

Additionally, we need to instruct Nagios that there is now an additional configuration file to read when starting.  Save the oracle.cfg file and exit vi.  Execute the following command:

vi /opt/etc/nagios.cfg

Below the cfg_file=/opt/etc/objects/switch.cfg line in the file (added in the previous article), add the following line:

cfg_file=/opt/etc/objects/oracle.cfg

Save the nagios.cfg file and exit vi.  Since we have modified the Nagios configuration, we must restart Nagios.  Find the first Nagios process ID, then kill that process (as was described in the previous article):

ps

kill 2683

Verify that Nagios does not return an error message when reading the configuration files:

/opt/bin/nagios -v /opt/etc/nagios.cfg

If no errors are reported, then start Nagios:

/opt/bin/nagios -d /opt/etc/nagios.cfg

Nagios should be writing log entries to a file named nagios.log (as defined in the nagios.cfg file).  The tail command permits viewing the last few (100 in this case) lines of a specified file:

tail -n 100 /opt/var/nagios.log

If Perl is not installed, Nagios may send an email with the following in the Additional Info section:

(Return code of 127 is out of bounds - plugin may be missing)

If the database instance is down, Nagios should send an email, possibly with the following in the Additional Info section (note that the script name is listed first – a different script will be used to test each database):

Test Script: sqltest.sql:  ORA-01034: ORACLE not available ; ORA-27101: shared memory realm does not exist ;

When the database instance is running, and the SQL statement(s) in the script executes successfully for the first time, Nagios should send an email with just the test script listed in the Additional Info section:

Test Script: sqltest.sql:

Now that we have confirmed that Nagios and the Perl script work OK together, create additional SQL scripts in the /volume1/downloads/oracle/instantclient_11_2/ directory to connect to the other databases to be monitored (strongly consider changing the permissions for these files, since the files include usernames and passwords for your databases – see the example after the service definition below).  For each of those SQL scripts, create an additional service entry in the oracle.cfg file.  For example, if a script named sqltest42.sql is created to test a database on server42, the service entry might be created as shown below:

define service{
        use                     generic-service ; Inherit values from a template
        host_name               server42        ; This is a server name that is defined in the server.cfg file
        service_description     CHECK_DB_TEST42 ; A unique name given to the server, database, and script combination
        check_command           check_oracle_ch!sqltest42.sql  ; The command definition that was added to the commands.cfg file followed by ! and the name of the SQL script to execute
        normal_check_interval   5               ; Execute the script every 5 minutes under normal conditions
        retry_check_interval    1               ; Re-execute the script after receiving a return code of 2, every minute until its final/hard state is determined
        }
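As mentioned above, the SQL scripts contain database usernames and passwords, so consider restricting them so that only their owner can read them.  One approach (assuming the checks execute as the nagios user, which must still be able to read the scripts):

chown nagios:nagios /volume1/downloads/oracle/instantclient_11_2/*.sql
chmod 600 /volume1/downloads/oracle/instantclient_11_2/*.sql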

Don’t forget to verify the Nagios configuration before restarting Nagios.

Anyone up to the challenge of rewriting my Working with Oracle’s Time Model Data 3 script in Perl so that the script will work on the DiskStation?





Install Nagios on a Synology DiskStation DS1813+ or DS412+

July 19, 2013 (Modified July 27, 2013, July 28, 2013, November 19, 2013)

(Forward to the Next Post in the Series)

—-

Update July 27, 2013:

  • I now have Nagios running on an ARM based Synology DiskStation DS212+.  Most of the steps are the same as outlined below, however there are a few additional errors that must be addressed (see the additional steps below).
  • All of the ./configure commands should have included --prefix=/opt (rather than --prefix=/usr/local or completely omitting that parameter).  That change eliminates the need to copy the Nagios plugins to the correct location.  Possibly related, the -i parameter was unnecessary for the snmp and Nagios plugins make and make install commands when the ./configure command included the --prefix=/opt option.
  • The wget http://sourceforge.net/projects/dsgpl/files/DSM%204.1%20Tool%20Chains/Intel%20x86%20Linux%203.2.11… download step for the gcc compiler is apparently unnecessary, at least on the Synology DiskStation DS212+ (see below).

—-

This article describes how to compile and run Nagios on a Synology DiskStation DS1813+ (64 bit) or Synology DiskStation DS412+ (32 bit, the 32 bit steps should also apply to the DS1812+) NAS, both of which utilize Intel Atom processors (cat /proc/cpuinfo indicates that the DS412+ is using a 2.13GHz Atom D2700, while the DS1813+ is using a 2.13GHz Atom D2701), and utilize the DSM 4.2 operating system.  Not all Synology DiskStation NAS devices use Intel based CPUs – some of the less expensive DiskStations use ARM type processors (see this link to determine the type of CPU installed in a specific DiskStation).  It may be possible to produce a working version of Nagios on NAS devices that do not have Intel 32 bit or 64 bit processors, but I have not yet fully tested the procedure.
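To check the CPU from a Telnet session on your own DiskStation:

cat /proc/cpuinfo | grep "model name"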

Warning: A lot of what follows is based on experimentation, with the end goal of having Nagios running on a Synology DiskStation having the ability to ping devices on the network or the Internet, with an email sent to an administrator when a device stops responding to ping requests, and to send a second email when the device resumes responding to ping requests.  This functionality represents a small fraction of Nagios’ capabilities through the use of plugins.  File paths vary from one Linux distribution to the next, which adds a bit of a challenge in making certain that files are placed in the required directories.  Copying a file to the wrong directory may temporarily disable the DiskStation and require the reinstallation of the Synology DSM operating system.  The directions below are not final, and quite likely do not represent the most efficient approaches to accomplish the end goal – but the directions will hopefully be “close enough to correct” to allow the average reader of this blog to ping and send email alerts from a DiskStation.

I have relied on the free Nagios network monitoring solution since 2002 to provide an early warning of problems associated with network attached equipment including servers, production floor computers, switches, printers, wireless access points, IP cameras, Internet connection stability, etc.  While I rely on Nagios’ alerting system, I am not an expert at configuring the Nagios network monitoring system; the Nagios configuration documentation may be downloaded here.

First, make certain that the Telnet Service (or SSH Service if that is preferred) is enabled on the DiskStation.  In the DiskStation’s Control Panel, click Terminal.

[Screenshot: InstallNagiosDiskStation1]

Place a checkmark next to Enable Telnet service (if the item is not already checked), and then click the Apply button.

[Screenshot: InstallNagiosDiskStation2]

Verify that the computer that you intend to use has a Telnet client.  For Windows 7, access the Programs link in the Control Panel, and then click the Turn Windows features on or off link.  Make certain that there is a checkmark next to Telnet Client, then click the OK button.

[Screenshot: InstallNagiosDiskStation3]

Open a command line (in Windows, Start – Run – type  cmd  and press the Enter key).  On the command line, type telnet followed by either the name of the DiskStation or the IP address of the DiskStation, then press the Enter key.  When prompted for a username, type root and press the Enter key.  Type the admin user’s password (that is used to access the DSM interface in a web browser) and press the Enter key.

[Screenshot: InstallNagiosDiskStation4]

The command line on the DiskStation is very similar to the command line on a Unix or Linux computer, and is somewhat similar to a Windows command line or MS-DOS command line (use / rather than \, use ls rather than dir, use vi rather than edit):

[Screenshot: InstallNagiosDiskStation5]

We first need to add ipkg support to the DiskStation; detailed directions may be viewed at this link.  The exact directions may be different for other DiskStation models, but the following directions work for both the DS1813+ and DS412+ (note that all files downloaded from the Internet will be placed on volume1 in the downloads directory – copy and paste the lines into the Telnet session, one line at a time):

cd /volume1
mkdir downloads
cd downloads
wget http://ipkg.nslu2-linux.org/feeds/optware/syno-i686/cross/unstable/syno-i686-bootstrap_1.2-7_i686.xsh
chmod +x syno-i686-bootstrap_1.2-7_i686.xsh
sh syno-i686-bootstrap_1.2-7_i686.xsh

The vi editor is used on the DiskStation to modify files; that vi editor is a bit challenging to use at first sight, so you may need help with a couple of basic commands (see this quick reference for other commands).  The commands in vi are case sensitive (i is not the same as I).  When a file is opened, press the i key on the keyboard to allow making changes to the file (such as typing commands, or deleting commands).  When finished making changes to the file press the Esc key.  Once the Esc key is pressed, type ZZ to save the changed file and quit, or :q! to quit without saving the changes.

Next, we must modify the file that establishes the environment for the root user, when that user connects to the DiskStation.  This change is needed as part of the ipkg installation.  Edit the .profile file used by the root user:

vi /root/.profile

Add a # character in front of the two lines that contain the word PATH, then save the file (see the brief directions above to switch between command and insert mode in vi):

[Screenshot: InstallNagiosDiskStation6]

Next, reboot the DiskStation by clicking the Restart button in the Synology DSM interface (note: it should be possible to type reboot in the Telnet interface, however the DiskStation locked up the one time I attempted to execute that command).

[Screenshot: InstallNagiosDiskStation7]

Once the DiskStation reboots, reconnect to the DiskStation using Telnet, connecting as the root user, just as was done earlier.

The ipkg command should now work on the command line.  First, request that an updated list of available packages is downloaded, then display that list of packages:

ipkg update
ipkg list
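The package list is quite long, so piping it through grep makes it easier to confirm that a specific package is available, for example:

ipkg list | grep -i apache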

Next, download a couple of packages that will be used by the Nagios network monitoring tool.  Note that using ipkg to install packages is a lot easier than compiling source code, so have fun with the ipkg utility.  When installing the optware-devel package, an error may appear stating that there is an incompatibility between wget and wget-ssl – just ignore that error for now.

ipkg install wget-ssl
ipkg install optware-devel
ipkg install gcc
ipkg install libtool
ipkg install mysql

Next, we need to compile a file and copy a couple of files:

cd /opt/share/libtool/libltdl/
./configure --prefix=/opt
make all
make install

cp /usr/syno/apache/modules/mod_ext_filter.so /opt/libexec/mod_ext_filter.so
cp /usr/syno/apache/modules/*.* /opt/libexec/

Now, install the Apache package:

ipkg install apache

If an error message is displayed on screen about mod_ext_filter.so, then modify the /opt/etc/apache2/httpd.conf file and add a # in front of the line LoadModule ext_filter_module libexec/mod_ext_filter.so and save the file.  Re-execute the ipkg install apache command (note that the up arrow on the keyboard may be pressed to quickly retype one of the previously executed commands).
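After the edit, that line in the /opt/etc/apache2/httpd.conf file should look like this:

#LoadModule ext_filter_module libexec/mod_ext_filter.so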

[Screenshot: InstallNagiosDiskStation8]

Using the DiskStation’s Control Panel, create a nagios group and a nagcmd group (the nagcmd group probably will not be used for anything specific).  These groups do not require any special DiskStation permissions.

[Screenshot: InstallNagiosDiskStation9]

Using the DiskStation’s Control Panel, create a nagios user and add that user to the nagios and nagcmd groups.  The nagios user does not require any specific DiskStation permissions.

Next, switch back to the Telnet session, download the Nagios source code, and compile the source code:

DiskStation DS212+ Notes:

The following ./configure call was used on the DS212+:

./configure --prefix=/opt --with-command-group=nagios --disable-nanosleep --enable-nanosleep=no

The ./configure aborted with the following error message:

checking for pthread_create in -lpthread... no
checking for pthread_mutex_init in -lpthread... no
checking for pthread_create in -lpthreads... no
checking for pthread_create in -llthread... no
checking if we need -pthread for threads... no
checking for library containing nanosleep... no
Error: nanosleep() needed for timing operations.

The test that threw the error is located roughly 63% of the way through the configure file (on roughly line 5635).  If the exit 1 line in the configure file is commented out, then the configure step will complete.  However, the make all command will then fail with the following error messages:

/volume1/downloads/nagios/base/nebmods.c:363: undefined reference to `dlclose'
nebmods.o: In function `neb_load_module':
/volume1/downloads/nagios/base/nebmods.c:218: undefined reference to `dlopen'
/volume1/downloads/nagios/base/nebmods.c:249: undefined reference to `dlsym'
/volume1/downloads/nagios/base/nebmods.c:266: undefined reference to `dlsym'
/volume1/downloads/nagios/base/nebmods.c:299: undefined reference to `dlsym'
/volume1/downloads/nagios/base/nebmods.c:225: undefined reference to `dlerror'
/opt/lib/gcc/arm-none-linux-gnueabi/4.2.3/../../../../arm-none-linux-gnueabi/lib/libpthread.so: undefined reference to `__default_sa_restorer_v2@GLIBC_PRIVATE'
/opt/lib/gcc/arm-none-linux-gnueabi/4.2.3/../../../../arm-none-linux-gnueabi/lib/libpthread.so: undefined reference to `__default_rt_sa_restorer_v2@GLIBC_PRIVATE'
/opt/lib/gcc/arm-none-linux-gnueabi/4.2.3/../../../../arm-none-linux-gnueabi/lib/libpthread.so: undefined reference to `__default_rt_sa_restorer_v1@GLIBC_PRIVATE'
/opt/lib/gcc/arm-none-linux-gnueabi/4.2.3/../../../../arm-none-linux-gnueabi/lib/libpthread.so: undefined reference to `__default_sa_restorer_v1@GLIBC_PRIVATE'
collect2: ld returned 1 exit status
make[1]: *** [nagios] Error 1
make[1]: Leaving directory `/volume1/downloads/nagios/base'
make: *** [all] Error 2

After a bit of searching on the Internet, I found a page that suggested making the following changes (note: before finding that page I tried a couple of other steps, without apparent success, that may have also partially contributed to correcting the issue):

mkdir /opt/arm-none-linux-gnueabi/lib_disabled
mv /opt/arm-none-linux-gnueabi/lib/libpthread* /opt/arm-none-linux-gnueabi/lib_disabled

cp /lib/libpthread.so.0 /opt/arm-none-linux-gnueabi/lib/
cd /opt/arm-none-linux-gnueabi/lib/
ln -s libpthread.so.0 libpthread.so
ln -s libpthread.so.0 libpthread-2.5.so

After making the above changes, I was able to run the configure and make all commands without receiving an error.

cd /volume1/downloads
wget http://prdownloads.sourceforge.net/sourceforge/nagios/nagios-3.5.0.tar.gz
tar xzf nagios-3.5.0.tar.gz
cd nagios
./configure --prefix=/opt --with-command-group=nagios
make all
make install
make install-init
make install-config
make install-commandmode

We apparently need to copy a couple of files to different locations at this point:

cp /opt/lib/libltdl.so.3 /opt/local/lib/libltdl.so.3
cp /opt/lib/libltdl.so.3 /usr/lib/libltdl.so.3
cp /opt/lib/libltdl.so /usr/lib/

Undo the changes that were earlier made to the /root/.profile file, where # characters were added in front of any line that contained the word PATH.  Remove those # characters and save the file:

vi /root/.profile

(This part still needs some fine tuning to make the web interface work with Nagios.)  Edit the Nagios Makefile and change the line beginning with HTTPD_CONF to read HTTPD_CONF=/opt/etc/apache2/conf.d, then save the file:

cd /volume1/downloads/nagios
vi Makefile

[Screenshot: InstallNagiosDiskStation10]

Execute the following command:

make install-webconf

Create a nagiosadmin user for the web administration, specify a password when prompted:

htpasswd -c /usr/local/etc/htpasswd.users nagiosadmin

Update November 19, 2013:

GabrielM reported in a comment below that it may be necessary to specify the full path to the htpasswd program:

/usr/syno/apache/bin/htpasswd -c /usr/local/etc/htpasswd.users nagiosadmin

Install a couple of additional ipkg packages that will be used by Nagios (the last package adds a ping utility that may be used by Nagios – the security permissions on the DiskStation prevent non-root users from using the built-in ping utility):

ipkg install openssl
ipkg install openssl-dev
ipkg install sendmail
ipkg install inetutils

A step that may or may not be required is downloading a functioning C++ compiler (some of the commands below point to files provided with that compiler).  It appears that there should already be a compiler on the DiskStation at this point (in /opt/bin), so this download might not be necessary.
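A quick way to see which compilers ipkg already placed in /opt/bin (the exact names vary by DiskStation architecture):

ls /opt/bin | grep gcc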

DiskStation DS212+ Notes:

These wget and tar steps were completely skipped on the DS212+.

For the DiskStation DS1813+ 64 bit:

cd /volume1/downloads
wget http://sourceforge.net/projects/dsgpl/files/DSM%204.1%20Tool%20Chains/Intel%20x86%20Linux%203.2.11%20%28Cedarview%29/gcc420_glibc236_x64_cedarview-GPL.tgz
tar zxpf gcc420_glibc236_x64_cedarview-GPL.tgz -C /usr/local/

For the DiskStation DS412+ 32 bit:

cd /volume1/downloads
wget http://sourceforge.net/projects/dsgpl/files/DSM%204.2%20Tool%20Chains/Intel%20x86%20Linux%203.2.11%20%28Bromolow%29/gcc421_glibc236_x86_bromolow-GPL.tgz
tar zxpf gcc421_glibc236_x86_bromolow-GPL.tgz -C /usr/local/

Now the net-snmp source code is downloaded and extracted:

DiskStation DS212+ Notes:

The ./configure call on the DS212+ (might also work on the other DiskStation models):

./configure --prefix=/opt

The make call threw several errors, including:

/bin/sh: arm-none-linux-gnueabi-ld: not found
make[2]: *** [../blib/arch/auto/NetSNMP/default_store/default_store.so] Error 127

Before running the make command on the DS212+, execute the following command:

ln -s /opt/bin/ld /opt/bin/arm-none-linux-gnueabi-ld

The -i parameter may be omitted when running the make and make install commands.

cd /volume1/downloads
wget http://sourceforge.net/projects/net-snmp/files/net-snmp/5.7.2/net-snmp-5.7.2.tar.gz
tar xzf net-snmp-5.7.2.tar.gz
cd net-snmp-5.7.2

For the DiskStation DS1813+ 64 bit, execute the following to compile the net-snmp source (note that this command uses the compiler that was downloaded):

env CC=/usr/local/x86_64-linux-gnu/bin/x86_64-linux-gnu-gcc \
LD=/usr/local/x86_64-linux-gnu/bin/x86_64-linux-gnu-ld \
RANLIB=/usr/local/x86_64-linux-gnu/bin/x86_64-linux-gnu-ranlib \
CFLAGS="-I/usr/local/x86_64-linux-gnu/include" \
LDFLAGS="-L/usr/local/x86_64-linux-gnu/lib" \
./configure --host=x86_64-linux-gnu --target=x86_64-linux-gnu --build=x86_64-pc-linux --prefix=/usr/local

For the DiskStation DS412+ 32 bit, execute the following to compile the net-snmp source (note: I could not use any of the different compilers that I tried downloading due to the compilers crashing with one of two error messages, so this command uses the compiler in /opt/bin):

env CC=/opt/bin/i686-linux-gnu-gcc \
LD=/usr/local/i686-linux-gnu/bin/i686-linux-gnu-ld \
RANLIB=/usr/local/i686-linux-gnu/bin/i686-linux-gnu-ranlib \
CFLAGS="-I/usr/local/i686-linux-gnu/include" \
LDFLAGS="-L/usr/local/i686-linux-gnu/lib" \
./configure --host=i686-linux-gnu --target=i686-linux-gnu --build=i686-linux-gnu --prefix=/usr/local

Several prompts will appear on the screen when either of the two commands is executed.  I entered the following for the prompts:

Default version of SNMP to use (3): 3
System Contact Information: (Enter)
System Location (Unknown): (Enter)
Location to write logfile (/var/log/snmpd.log): /opt/var/snmpd.log
Location to write persistent information (/var/net-snmp): (Enter)

Two additional commands to execute:

make -i
make install -i

Now we need to download the source code for the Nagios plugins (check_apt, check_breeze, check_by_ssh, check_clamd, check_cluster, check_dhcp, check_disk, check_disk_smb, check_dns, check_dummy, check_file_age, check_flexlm, check_ftp, check_http, check_icmp, check_ide_smart, check_ifoperstatup, check_ifstatus, check_imap, check_ircd, check_jabber, check_ldap, check_ldaps, check_load, check_log, check_mailq, check_mrtg, check_mrtgtraf, check_mysql, check_mysql_query, check_nagios, check_nntp, check_nntps, check_nt, check_ntp, check_ntp_peer, check_ntp_time, check_nwstat, check_oracle, check_overcr, check_ping, check_pop, check_procs, check_real, check_rpc, check_sensors, check_simap, check_smtp, check_snmp, check_spop, check_ssh, check_ssmtp, check_swap, check_tcp, check_time, check_udp, check_ups, check_users, check_wave) that allow Nagios to perform various monitoring tasks:

cd /volume1/downloads
wget http://prdownloads.sourceforge.net/sourceforge/nagiosplug/nagios-plugins-1.4.16.tar.gz
tar xzf nagios-plugins-1.4.16.tar.gz
cd nagios-plugins-1.4.16/

Update November 19, 2013:

GabrielM reported in a comment below that the occasionally changing “current version” of the Nagios plugins makes it difficult to download the plugins from the source shown above.  If you open the http://prdownloads.sourceforge.net/sourceforge/nagiosplug/ web page in a web browser, the web browser will be redirected to http://sourceforge.net/projects/nagiosplug/files/ which contains the following statement:

“The Nagios Plugins are no longer distributed via SourceForge. For downloads and other information, please visit: https://www.nagios-plugins.org/
Source: README.md, updated 2013-10-01”

If you follow that link and then click the Download heading at the top of the page, there should be a link on the page that allows access to the current version of the Nagios plugins.  That link is currently: https://www.nagios-plugins.org/download/nagios-plugins-1.5.tar.gz

The command that GabrielM provided should work:

wget https://www.nagios-plugins.org/download/nagios-plugins-1.5.tar.gz

DiskStation DS212+ Notes:

The following configure, make, and make install commands were used:

./configure --prefix=/opt --with-openssl=/usr/syno/bin/openssl --with-nagios-user=nagios --with-nagios-group=nagios --with-ping-command="/opt/bin/ping -c %d %s" --psdir=/bin --with-ps-varlist="&procpid,&procppid,&procvsz,&procrss,procprog,&pos" --with-ps-cols=6 --with-ps-format="%d %d %d %d %s %n" --with-ps-command="/bin/ps -w"
make
make install

For the DiskStation DS1813+ 64 bit:

./configure --with-openssl=/usr/syno/bin/openssl --with-nagios-user=nagios --with-nagios-group=nagios --with-ping-command="/opt/bin/ping -c %d %s" --psdir=/bin --with-ps-varlist="&procpid,&procppid,&procvsz,&procrss,procprog,&pos" --with-ps-cols=6 --with-ps-format="%d %d %d %d %s %n" --with-ps-command="/bin/ps -w" --host=x86_64-linux-gnu --target=x86_64-linux-gnu --build=x86_64-pc-linux
make -i 
make install -i

For the DiskStation DS412+ 32 bit:

./configure --with-openssl=/usr/syno/bin/openssl --with-nagios-user=nagios --with-nagios-group=nagios --with-ping-command="/opt/bin/ping -c %d %s" --psdir=/bin --with-ps-varlist="&procpid,&procppid,&procvsz,&procrss,procprog,&pos" --with-ps-cols=6 --with-ps-format="%d %d %d %d %s %n" --with-ps-command="/bin/ps -w" --host=i686-linux-gnu --target=i686-linux-gnu --build=i686-linux-gnu --prefix=/usr/local
make -i 
make install -i

Copy the Nagios plugins to the location expected by Nagios:

DiskStation DS212+ Notes:

The plugins were installed in the correct location on the DS212+, so the cp commands below are not needed on that model.

cp /usr/local/nagios/libexec/*.* /opt/libexec
cp /usr/local/nagios/libexec/* /opt/libexec
cp /usr/local/libexec/check_* /opt/libexec

Update November 19, 2013:

GabrielM reported in a comment below that the third command above may fail.  Depending on the compile options used, either the first two commands or the third command will fail – the first two commands are intended to accomplish the same task as the third command, so as long as one of them succeeds, the plugins end up in /opt/libexec.  I should have explained this potential area of concern better.

Copy the Nagios startup script to the correct location so that Nagios will automatically start when the DiskStation is rebooted:

cp /usr/local/etc/rc.d/nagios /opt/etc/init.d/S81nagios

Verify that the ownership of the nagios directory is set correctly:

DiskStation DS212+ Notes:

On the DS212+ the files are actually below the /opt/bin directory, so use this command instead of the one that follows:

chown nagios:nagios /opt/bin/nagios/nagios -R

On the DS1813+ and DS412+:

chown nagios:nagios /usr/local/nagios -R

In addition to the main /opt/etc/nagios.cfg Nagios file, there are several other configuration files that are potentially used by Nagios (defined in the nagios.cfg file):

/opt/etc/objects/commands.cfg
/opt/etc/objects/contacts.cfg
/opt/etc/objects/timeperiods.cfg
/opt/etc/objects/templates.cfg
/opt/etc/objects/localhost.cfg
/opt/etc/objects/windows.cfg
/opt/etc/objects/server.cfg
/opt/etc/objects/switch.cfg
/opt/etc/objects/printer.cfg

We need to make a couple of adjustments in the  /opt/etc/objects/commands.cfg file.

vi /opt/etc/objects/commands.cfg

Change the ‘notify-host-by-email’ command definition section as follows:

define command{
    command_name notify-host-by-email
    command_line /usr/bin/printf "%b" "Subject: $NOTIFICATIONTYPE$ Host Alert: $HOSTNAME$ is $HOSTSTATE$\n\n***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\nHost: $HOSTNAME$\nState: $HOSTSTATE$\nAddress: $HOSTADDRESS$\nInfo: $HOSTOUTPUT$\n\nDate/Time: $LONGDATETIME$\n" | /opt/sbin/sendmail -vt $CONTACTEMAIL$
    }

Change the ‘notify-service-by-email’ command definition section as follows:

define command{
    command_name notify-service-by-email
    command_line /usr/bin/printf "%b" "Subject: $NOTIFICATIONTYPE$ Service Alert: $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$\n\n***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$\n" | /opt/sbin/sendmail -vt $CONTACTEMAIL$
    }

Change the ‘check_ping’ command definition section as follows (feel free to read the documentation for check_ping and specify different values):

define command{
        command_name    check_ping
        command_line    $USER1$/check_ping -H $HOSTADDRESS$ -w 3000,25% -c 5000,90% -p 3 
        }

Save the file and exit vi.

At this point, the Nagios network monitoring utility will likely experience an error similar to the following when attempting to send an alert email:

output=collect: Cannot write ./dfr6BFFPC7027203 (bfcommit, uid=1026, gid=25): Permission denied

Execute the following commands, which should fix the above problem:

chmod g+w /opt/var/spool/clientmqueue
chmod 444 /opt/etc/mail/*.cf
chmod 7555 /opt/sbin/sendmail

We will need to use su to test the execution of various commands as the nagios user.  Without this fix (described here), you might see the following error message:

su: warning: cannot change directory to /var/services/homes/nagios: No such file or directory
su: /sbin/nologin: No such file or directory

Enter the following commands:

mkdir /var/services/homes
mkdir /var/services/homes/nagios
chown nagios:nagios /var/services/homes/nagios -R
vi /etc/passwd

Locate the line in the passwd file for the Nagios user.  Near the end of the line, /sbin/nologin should appear.  Replace that text with /bin/ash then save and exit vi.
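After the change, the nagios line in /etc/passwd should end with /bin/ash – for example (this line is illustrative only; the uid, gid, and description fields will differ on your DiskStation):

nagios:x:1026:100:Nagios user:/var/services/homes/nagios:/bin/ash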

Verify that the Nagios user is able to execute the check_ping plugin.  Replace MyDeviceHere with either an IP address or a network device name that is on your network:

su - nagios -c "/opt/libexec/check_ping -H MyDeviceHere -w 5000,80% -c 5000,80% -p 5"

If the ping command (called by check_ping) is not able to resolve a network device name, and the fully qualified dns name was not specified (MyDeviceHere.MyDomainHere.com), edit the /etc/resolv.conf file:

vi /etc/resolv.conf

On a new line in the file, add the following line (replacing MyDomainHere.com with your dns domain name for the network):

search MyDomainHere.com

Verify that sendmail works for the Nagios user.  At the prompt that appears, type a short message, press the Enter key, type a period, then press the Enter key again – replace MyEmailAddressHere@MyDomainHere.com with your email address):

su - nagios -c "/opt/sbin/sendmail -vt MyEmailAddressHere@MyDomainHere.com"

—-

It is important to always verify the Nagios configuration before starting (or restarting after a configuration change) Nagios.  To verify the configuration type the following:

/opt/bin/nagios -v /opt/etc/nagios.cfg

To start up Nagios as a background task (daemon), execute the following:

/opt/bin/nagios -d /opt/etc/nagios.cfg

To stop Nagios that is executing as a background task, type:

ps

[Screenshot: InstallNagiosDiskStation11]

Then search through the list of processes for the first line that shows /opt/bin/nagios -d /opt/etc/nagios.cfg.  The number at the left of that line, 31152 in this case, is used to stop Nagios.  To stop Nagios, type the following (replace 31152 with the number shown on your screen):

kill 31152

Side note: I tried installing quite a few different C++ compilers that supposedly work with the Synology DSM (see here).  As such, I had to find a way to remove a directory, that directory’s subdirectories, and files.  The following command will completely remove the /usr/local/i686-linux-gnu directory, should the need arise:

rm -rf /usr/local/i686-linux-gnu

At this point, Nagios will hopefully run as a background task, and it should be able to ping and send email alerts.  However, if you have been following the above directions, we have not yet instructed Nagios which devices to monitor, nor to whom the alert emails should be sent.  The next step is to define the email contacts by modifying the /opt/etc/objects/contacts.cfg file (see the documentation for assistance):

vi /opt/etc/objects/contacts.cfg

After setting up the contacts, we should probably tell Nagios which devices to monitor.  If there are a lot of devices on your network to be monitored, you might find that using Microsoft Excel rather than vi to create the object definitions makes the task more manageable.  Set up a simple worksheet with four columns.  Column A will be used to specify the short host_name for the object to be monitored.  Column B will be used to specify the alias (long description for the object).  Column C will be used to either specify the IP address for the device or the network name for the device.  Column D will be used to identify the group to which the object belongs and the file name to which the definition is saved (the Excel macro supports the following groups: ap, camera, computer, external, other, printer, server, switch).

[Screenshot: InstallNagiosDiskStation13]

The Excel macro is set up to read a tab delimited file, rather than reading the object descriptions directly from the Excel worksheet.  Highlight all of the rows in the worksheet except for the top header row, and press Ctrl C (or Edit – Copy) to copy the definitions to the Windows clipboard in tab delimited format.  Start Notepad (Start – Run – Notepad), and then press Ctrl V (or Edit – Paste) to paste the tab delimited object descriptions into Notepad.  The Excel macro code expects the text file to be saved as nagioshosts.txt.
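As an illustration, a nagioshosts.txt file describing three devices might contain tab-separated lines like the following (the names and addresses are made up):

server42	Production Oracle Server	192.168.1.42	server
ap01	Warehouse Access Point	192.168.1.201	ap
copier3	Front Office Copier	192.168.1.103	printer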

The Excel macro code follows (I imagine that not many computers still have a second floppy drive installed, so change the B:\Hardware Documentation\Synology\ path as appropriate for your environment):

Private Sub cmdProcessText_Click()
    Dim intFileNumRead As Integer
    Dim intFileNumAP As Integer
    Dim intFileNumCamera As Integer
    Dim intFileNumComputer As Integer
    Dim intFileNumExternal As Integer
    Dim intFileNumOther As Integer
    Dim intFileNumPrinter As Integer
    Dim intFileNumServer As Integer
    Dim intFileNumSwitch As Integer
    Dim intFileNumWrite As Integer

    Dim strLine As String
    Dim strItem() As String

    intFileNumRead = FreeFile
    Open "B:\Hardware Documentation\Synology\nagioshosts.txt" For Input As #intFileNumRead

    intFileNumAP = FreeFile
    Open "B:\Hardware Documentation\Synology\ap.cfg" For Output As intFileNumAP
    Print #intFileNumAP, "###############################################################################"; Chr(10);
    Print #intFileNumAP, "# ap.cfg - lists the wireless access points to be monitored"; Chr(10);
    Print #intFileNumAP, "#"; Chr(10);
    Print #intFileNumAP, "# Last Modified: "; Now; Chr(10);
    Print #intFileNumAP, "###############################################################################"; Chr(10);
    Print #intFileNumAP, "#"; Chr(10); Chr(10);
    Print #intFileNumAP, "###############################################################################"; Chr(10);
    Print #intFileNumAP, "#"; Chr(10);
    Print #intFileNumAP, "# HOST GROUP DEFINITIONS"; Chr(10);
    Print #intFileNumAP, "#"; Chr(10);
    Print #intFileNumAP, "###############################################################################"; Chr(10);
    Print #intFileNumAP, "#"; Chr(10); Chr(10);
    Print #intFileNumAP, "define hostgroup{"; Chr(10);
    Print #intFileNumAP, "        hostgroup_name  ap                      ; The name of the hostgroup"; Chr(10);
    Print #intFileNumAP, "        alias           Local Access Points       ; Long name of the group"; Chr(10);
    Print #intFileNumAP, "        }"; Chr(10); Chr(10); Chr(10);

    intFileNumCamera = FreeFile
    Open "B:\Hardware Documentation\Synology\camera.cfg" For Output As intFileNumCamera
    Print #intFileNumCamera, "###############################################################################"; Chr(10);
    Print #intFileNumCamera, "# camera.cfg - lists the IP cameras to be monitored"; Chr(10);
    Print #intFileNumCamera, "#"; Chr(10);
    Print #intFileNumCamera, "# Last Modified: "; Now; Chr(10);
    Print #intFileNumCamera, "###############################################################################"; Chr(10);
    Print #intFileNumCamera, "#"; Chr(10); Chr(10);
    Print #intFileNumCamera, "###############################################################################"; Chr(10);
    Print #intFileNumCamera, "#"; Chr(10);
    Print #intFileNumCamera, "# HOST GROUP DEFINITIONS"; Chr(10);
    Print #intFileNumCamera, "#"; Chr(10);
    Print #intFileNumCamera, "###############################################################################"; Chr(10);
    Print #intFileNumCamera, "#"; Chr(10); Chr(10);
    Print #intFileNumCamera, "define hostgroup{"; Chr(10);
    Print #intFileNumCamera, "        hostgroup_name  camera                  ; The name of the hostgroup"; Chr(10);
    Print #intFileNumCamera, "        alias           Local IP Cameras          ; Long name of the group"; Chr(10);
    Print #intFileNumCamera, "        }"; Chr(10); Chr(10); Chr(10);

    intFileNumComputer = FreeFile
    Open "B:\Hardware Documentation\Synology\computer.cfg" For Output As intFileNumComputer
    Print #intFileNumComputer, "###############################################################################"; Chr(10);
    Print #intFileNumComputer, "# computer.cfg - lists the shop floor computers to be monitored"; Chr(10);
    Print #intFileNumComputer, "#"; Chr(10);
    Print #intFileNumComputer, "# Last Modified: "; Now; Chr(10);
    Print #intFileNumComputer, "###############################################################################"; Chr(10);
    Print #intFileNumComputer, "#"; Chr(10); Chr(10);
    Print #intFileNumComputer, "###############################################################################"; Chr(10);
    Print #intFileNumComputer, "#"; Chr(10);
    Print #intFileNumComputer, "# HOST GROUP DEFINITIONS"; Chr(10);
    Print #intFileNumComputer, "#"; Chr(10);
    Print #intFileNumComputer, "###############################################################################"; Chr(10);
    Print #intFileNumComputer, "#"; Chr(10); Chr(10);
    Print #intFileNumComputer, "define hostgroup{"; Chr(10);
    Print #intFileNumComputer, "        hostgroup_name  computer               ; The name of the hostgroup"; Chr(10);
    Print #intFileNumComputer, "        alias           Domain Computers          ; Long name of the group"; Chr(10);
    Print #intFileNumComputer, "        }"; Chr(10); Chr(10); Chr(10);

    intFileNumExternal = FreeFile
    Open "B:\Hardware Documentation\Synology\external.cfg" For Output As intFileNumExternal
    Print #intFileNumExternal, "###############################################################################"; Chr(10);
    Print #intFileNumExternal, "# external.cfg - lists the devices external to the LAN network to be monitored"; Chr(10);
    Print #intFileNumExternal, "#"; Chr(10);
    Print #intFileNumExternal, "# Last Modified: "; Now; Chr(10);
    Print #intFileNumExternal, "###############################################################################"; Chr(10);
    Print #intFileNumExternal, "#"; Chr(10); Chr(10);
    Print #intFileNumExternal, "###############################################################################"; Chr(10);
    Print #intFileNumExternal, "#"; Chr(10);
    Print #intFileNumExternal, "# HOST GROUP DEFINITIONS"; Chr(10);
    Print #intFileNumExternal, "#"; Chr(10);
    Print #intFileNumExternal, "###############################################################################"; Chr(10);
    Print #intFileNumExternal, "#"; Chr(10); Chr(10);
    Print #intFileNumExternal, "define hostgroup{"; Chr(10);
    Print #intFileNumExternal, "        hostgroup_name  external               ; The name of the hostgroup"; Chr(10);
    Print #intFileNumExternal, "        alias           Monitored devices External to the Network ; Long name of the group"; Chr(10);
    Print #intFileNumExternal, "        }"; Chr(10); Chr(10); Chr(10);

    intFileNumOther = FreeFile
    Open "B:\Hardware Documentation\Synology\other.cfg" For Output As intFileNumOther
    Print #intFileNumOther, "###############################################################################"; Chr(10);
    Print #intFileNumOther, "# other.cfg - lists the miscellaneous devices to be monitored"; Chr(10);
    Print #intFileNumOther, "#"; Chr(10);
    Print #intFileNumOther, "# Last Modified: "; Now; Chr(10);
    Print #intFileNumOther, "###############################################################################"; Chr(10);
    Print #intFileNumOther, "#"; Chr(10); Chr(10);
    Print #intFileNumOther, "###############################################################################"; Chr(10);
    Print #intFileNumOther, "#"; Chr(10);
    Print #intFileNumOther, "# HOST GROUP DEFINITIONS"; Chr(10);
    Print #intFileNumOther, "#"; Chr(10);
    Print #intFileNumOther, "###############################################################################"; Chr(10);
    Print #intFileNumOther, "#"; Chr(10); Chr(10);
    Print #intFileNumOther, "define hostgroup{"; Chr(10);
    Print #intFileNumOther, "        hostgroup_name  other                 ; The name of the hostgroup"; Chr(10);
    Print #intFileNumOther, "        alias           Miscellaneous Devices ; Long name of the group"; Chr(10);
    Print #intFileNumOther, "        }"; Chr(10); Chr(10); Chr(10);

    intFileNumPrinter = FreeFile
    Open "B:\Hardware Documentation\Synology\printer.cfg" For Output As intFileNumPrinter
    Print #intFileNumPrinter, "###############################################################################"; Chr(10);
    Print #intFileNumPrinter, "# printer.cfg - lists the printer devices to be monitored"; Chr(10);
    Print #intFileNumPrinter, "#"; Chr(10);
    Print #intFileNumPrinter, "# Last Modified: "; Now; Chr(10);
    Print #intFileNumPrinter, "###############################################################################"; Chr(10);
    Print #intFileNumPrinter, "#"; Chr(10); Chr(10);
    Print #intFileNumPrinter, "###############################################################################"; Chr(10);
    Print #intFileNumPrinter, "#"; Chr(10);
    Print #intFileNumPrinter, "# HOST GROUP DEFINITIONS"; Chr(10);
    Print #intFileNumPrinter, "#"; Chr(10);
    Print #intFileNumPrinter, "###############################################################################"; Chr(10);
    Print #intFileNumPrinter, "#"; Chr(10); Chr(10);
    Print #intFileNumPrinter, "define hostgroup{"; Chr(10);
    Print #intFileNumPrinter, "        hostgroup_name  printer               ; The name of the hostgroup"; Chr(10);
    Print #intFileNumPrinter, "        alias           Printers and Copiers  ; Long name of the group"; Chr(10);
    Print #intFileNumPrinter, "        }"; Chr(10); Chr(10); Chr(10);

    intFileNumServer = FreeFile
    Open "B:\Hardware Documentation\Synology\server.cfg" For Output As intFileNumServer
    Print #intFileNumServer, "###############################################################################"; Chr(10);
    Print #intFileNumServer, "# server.cfg - lists the servers to be monitored"; Chr(10);
    Print #intFileNumServer, "#"; Chr(10);
    Print #intFileNumServer, "# Last Modified: "; Now; Chr(10);
    Print #intFileNumServer, "###############################################################################"; Chr(10);
    Print #intFileNumServer, "#"; Chr(10); Chr(10);
    Print #intFileNumServer, "###############################################################################"; Chr(10);
    Print #intFileNumServer, "#"; Chr(10);
    Print #intFileNumServer, "# HOST GROUP DEFINITIONS"; Chr(10);
    Print #intFileNumServer, "#"; Chr(10);
    Print #intFileNumServer, "###############################################################################"; Chr(10);
    Print #intFileNumServer, "#"; Chr(10); Chr(10);
    Print #intFileNumServer, "define hostgroup{"; Chr(10);
    Print #intFileNumServer, "        hostgroup_name  server               ; The name of the hostgroup"; Chr(10);
    Print #intFileNumServer, "        alias           Server and Similar Devices ; Long name of the group"; Chr(10);
    Print #intFileNumServer, "        }"; Chr(10); Chr(10); Chr(10);

    intFileNumSwitch = FreeFile
    Open "B:\Hardware Documentation\Synology\switch.cfg" For Output As intFileNumSwitch
    Print #intFileNumSwitch, "###############################################################################"; Chr(10);
    Print #intFileNumSwitch, "# switch.cfg - lists the network equipment type devices to be monitored"; Chr(10);
    Print #intFileNumSwitch, "#"; Chr(10);
    Print #intFileNumSwitch, "# Last Modified: "; Now; Chr(10);
    Print #intFileNumSwitch, "###############################################################################"; Chr(10);
    Print #intFileNumSwitch, "#"; Chr(10); Chr(10);
    Print #intFileNumSwitch, "###############################################################################"; Chr(10);
    Print #intFileNumSwitch, "#"; Chr(10);
    Print #intFileNumSwitch, "# HOST GROUP DEFINITIONS"; Chr(10);
    Print #intFileNumSwitch, "#"; Chr(10);
    Print #intFileNumSwitch, "###############################################################################"; Chr(10);
    Print #intFileNumSwitch, "#"; Chr(10); Chr(10);
    Print #intFileNumSwitch, "define hostgroup{"; Chr(10);
    Print #intFileNumSwitch, "        hostgroup_name  switch               ; The name of the hostgroup"; Chr(10);
    Print #intFileNumSwitch, "        alias           Switche and Similar Devices ; Long name of the group"; Chr(10);
    Print #intFileNumSwitch, "        }"; Chr(10); Chr(10); Chr(10);

    Do While Not (EOF(intFileNumRead))
        Line Input #intFileNumRead, strLine
        strItem = Split(strLine, vbTab)
        'strItem(0) = host_name
        'strItem(1) = alias
        'strItem(2) = address
        'strItem(3) = hostgroups
        Select Case strItem(3)
            Case "ap"
                intFileNumWrite = intFileNumAP
            Case "camera"
                intFileNumWrite = intFileNumCamera
            Case "computer"
                intFileNumWrite = intFileNumComputer
            Case "external"
                intFileNumWrite = intFileNumExternal
            Case "other"
                intFileNumWrite = intFileNumOther
            Case "printer"
                intFileNumWrite = intFileNumPrinter
            Case "server"
                intFileNumWrite = intFileNumServer
            Case "switch"
                intFileNumWrite = intFileNumSwitch
        End Select

        Print #intFileNumWrite, "define host{"; Chr(10);
        Select Case strItem(3)
            Case "ap"
                Print #intFileNumWrite, "        use             ap              ; Inherit default values from a template"; Chr(10);
            Case "camera"
                Print #intFileNumWrite, "        use             camera          ; Inherit default values from a template"; Chr(10);
            Case "computer"
                Print #intFileNumWrite, "        use             computer        ; Inherit default values from a template"; Chr(10);
            Case "external"
                Print #intFileNumWrite, "        use             external        ; Inherit default values from a template"; Chr(10);
            Case "other"
                Print #intFileNumWrite, "        use             other           ; Inherit default values from a template"; Chr(10);
            Case "printer"
                Print #intFileNumWrite, "        use             printer         ; Inherit default values from a template"; Chr(10);
            Case "server"
                Print #intFileNumWrite, "        use             server          ; Inherit default values from a template"; Chr(10);
            Case "switch"
                Print #intFileNumWrite, "        use             switch          ; Inherit default values from a template"; Chr(10);
        End Select
        Print #intFileNumWrite, "        host_name       "; strItem(0); "         ; The name we're giving to this device"; Chr(10);
        Print #intFileNumWrite, "        alias           "; strItem(1); "         ; A longer name associated with the device"; Chr(10);
        Print #intFileNumWrite, "        address         "; strItem(2); "         ; IP address of the device"; Chr(10);
        Print #intFileNumWrite, "        hostgroups      "; strItem(3); "         ; Host groups this device is associated with"; Chr(10);
        Print #intFileNumWrite, "        }"; Chr(10); Chr(10);

        Print #intFileNumWrite, "define service{"; Chr(10);
        Print #intFileNumWrite, "        use                     generic-service ; Inherit values from a template"; Chr(10);
        Print #intFileNumWrite, "        host_name               "; strItem(0); "        ; The name of the host the service is associated with"; Chr(10);
        Print #intFileNumWrite, "        service_description     PING            ; The service description"; Chr(10);
        Print #intFileNumWrite, "        check_command           check_ping!3000,25%!5000,90%    ; The command used to monitor the service"; Chr(10);
        Print #intFileNumWrite, "        normal_check_interval   5               ; Check the service every 5 minutes under normal conditions"; Chr(10);
        Print #intFileNumWrite, "        retry_check_interval    1               ; Re-check the service every minute until its final/hard state is determined"; Chr(10);
        Print #intFileNumWrite, "        }"; Chr(10); Chr(10);
    Loop

    Close #intFileNumRead
    Close #intFileNumAP
    Close #intFileNumCamera
    Close #intFileNumComputer
    Close #intFileNumExternal
    Close #intFileNumOther
    Close #intFileNumPrinter
    Close #intFileNumServer
    Close #intFileNumSwitch
End Sub

The files that are created use the Unix/Linux standard line feed (LF) end-of-line character, rather than the Windows standard carriage return/line feed (CR/LF) pair.  As such, opening the generated files using Notepad is not advised.  Copy the generated files back to the /opt/etc/objects/ path on the DiskStation (copy the files to a Shared Folder on the DiskStation, then use the cp command to copy the files from the share location to /opt/etc/objects/ – the Shared Folders are typically created as subdirectories in the /volume1/ directory).
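For example, assuming that the eight generated .cfg files were copied from the desktop computer to a share named config on the DiskStation, commands similar to the following would put the files into place:

cp /volume1/config/ap.cfg /opt/etc/objects/
cp /volume1/config/camera.cfg /opt/etc/objects/
cp /volume1/config/computer.cfg /opt/etc/objects/
cp /volume1/config/external.cfg /opt/etc/objects/
cp /volume1/config/other.cfg /opt/etc/objects/
cp /volume1/config/printer.cfg /opt/etc/objects/
cp /volume1/config/server.cfg /opt/etc/objects/
cp /volume1/config/switch.cfg /opt/etc/objects/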

If you decided to use some of the non-standard Nagios group names (as I did), those non-standard group names must be defined in the /opt/etc/objects/templates.cfg file:

vi /opt/etc/objects/templates.cfg

A portion of the additional entries that I made in this file include the following:

define host{
       name                    ap      ; The name of this host template
       use                     generic-host    ; Inherit default values from the generic-host template
       check_period            24x7            ; By default, access points are monitored round the clock
       check_interval          5               ; Actively check the access point every 5 minutes
       retry_interval          1               ; Schedule host check retries at 1 minute intervals
       max_check_attempts      10              ; Check each access point 10 times (max)
       check_command           check_ping      ; Default command to check if access points are "alive"
       notification_period     24x7            ; Send notification out at any time - day or night
       notification_interval   30              ; Resend notifications every 30 minutes
       notification_options    d,r             ; Only send notifications for specific host states
       contact_groups          admins          ; Notifications get sent to the admins by default
       hostgroups              ap ; Host groups that access points should be a member of
       register                0               ; DONT REGISTER THIS - ITS JUST A TEMPLATE
       }

define host{
       name                    camera  ; The name of this host template
       use                     generic-host    ; Inherit default values from the generic-host template
       check_period            24x7            ; By default, cameras are monitored round the clock
       check_interval          60              ; Actively check the device every 60 minutes
       retry_interval          1               ; Schedule host check retries at 1 minute intervals
       max_check_attempts      10              ; Check each device 10 times (max)
       check_command           check_ping      ; Default command to check if devices are "alive"
       notification_period     24x7            ; Send notification out at any time - day or night
       notification_interval   240             ; Resend notifications every 240 minutes
       notification_options    d,r             ; Only send notifications for specific host states
       contact_groups          admins          ; Notifications get sent to the admins by default
       hostgroups              camera ; Host groups that cameras should be a member of
       register                0               ; DONT REGISTER THIS - ITS JUST A TEMPLATE
       }
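The remaining non-standard group names (computer, external, other, printer, server, and switch) follow the same pattern.  As a minimal sketch, a switch template might look like the following, assuming check and notification intervals similar to the ap template (adjust the intervals to suit your environment):

define host{
       name                    switch  ; The name of this host template
       use                     generic-host    ; Inherit default values from the generic-host template
       check_period            24x7            ; By default, switches are monitored round the clock
       check_interval          5               ; Actively check the switch every 5 minutes
       retry_interval          1               ; Schedule host check retries at 1 minute intervals
       max_check_attempts      10              ; Check each switch 10 times (max)
       check_command           check_ping      ; Default command to check if switches are "alive"
       notification_period     24x7            ; Send notification out at any time - day or night
       notification_interval   30              ; Resend notifications every 30 minutes
       notification_options    d,r             ; Only send notifications for specific host states
       contact_groups          admins          ; Notifications get sent to the admins by default
       hostgroups              switch ; Host groups that switches should be a member of
       register                0               ; DONT REGISTER THIS - ITS JUST A TEMPLATE
       }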

Nagios will not know that it should read the additional configuration files until it is told to do so by modifying the /opt/etc/nagios.cfg file.

vi /opt/etc/nagios.cfg

Add the following lines to the nagios.cfg file:

# Charles Hooper's object types
cfg_file=/opt/etc/objects/ap.cfg
cfg_file=/opt/etc/objects/camera.cfg
cfg_file=/opt/etc/objects/computer.cfg
cfg_file=/opt/etc/objects/external.cfg
cfg_file=/opt/etc/objects/other.cfg
cfg_file=/opt/etc/objects/printer.cfg
cfg_file=/opt/etc/objects/server.cfg
cfg_file=/opt/etc/objects/switch.cfg

We have made a large number of changes to the configuration files, so it is important to verify that there are no errors in the configuration:

/opt/bin/nagios -v /opt/etc/nagios.cfg
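If the configuration is clean, the end of the verification output should include lines similar to the following (the exact wording may vary by Nagios version):

Total Warnings: 0
Total Errors:   0

Things look okay - No serious problems were detected during the pre-flight check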

If no errors are found in the configuration, terminate (kill) nagios and then restart as described above.
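For example, a sketch of that process (the process ID shown by ps will differ on your DiskStation):

ps | grep nagios
kill <nagios process ID>
/opt/bin/nagios -d /opt/etc/nagios.cfg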

—-

Update July 28, 2013:

When attempting to start Nagios in daemon mode (/opt/bin/nagios -d /opt/etc/nagios.cfg) I encountered a couple of problems related to permissions for the Nagios user.  The nagios process was not listed when I used the ps command.  I then tried executing the following commands:

touch /opt/var/nagios.log
chown nagios:nagios /opt/var/nagios.log

Nagios was then able to start in daemon mode, but wrote messages similar to the following in the /opt/var/nagios.log file:

[1375058364] Warning: Could not open object cache file '/opt/var/objects.cache' for writing!
[1375058364] Failed to obtain lock on file /opt/var/nagios.lock: Permission denied
[1375058364] Bailing out due to errors encountered while attempting to daemonize... (PID=11451)
[1375058656] Nagios 3.5.0 starting... (PID=12936)
[1375058656] Local time is Sun Jul 28 20:44:16 EDT 2013
[1375058656] LOG VERSION: 2.0
[1375058656] Warning: Could not open object cache file '/opt/var/objects.cache' for writing!
[1375058656] Failed to obtain lock on file /opt/var/nagios.lock: Permission denied
[1375058656] Bailing out due to errors encountered while attempting to daemonize... (PID=12936)
[1375060107] Error: Unable to create temp file for writing status data: Permission denied
[1375060117] Error: Unable to create temp file for writing status data: Permission denied
[1375060127] Error: Unable to create temp file for writing status data: Permission denied
[1375060137] Error: Unable to create temp file for writing status data: Permission denied
[1375060147] Error: Unable to create temp file for writing status data: Permission denied
[1375060157] Error: Unable to create temp file for writing status data: Permission denied

I tried to set the permissions for a couple of other files, only to find another long list of Permission denied messages:

touch /opt/var/objects.cache
touch /opt/var/nagios.lock
touch /opt/var/nagios.tmp
chown nagios:nagios /opt/var/objects.cache
chown nagios:nagios /opt/var/nagios.lock
chown nagios:nagios /opt/var/nagios.tmp

I then recalled that I had seen similar messages on the DiskStation DS412+, so I tried a different approach: creating a nagios directory in the /opt/var directory, creating a couple of subdirectories in that directory, and then assigning nagios as the owner of that directory structure:

mkdir /opt/var/nagios
mkdir /opt/var/nagios/archives
mkdir /opt/var/nagios/spool
mkdir /opt/var/nagios/spool/checkresults
chown nagios:nagios /opt/var/nagios -R
vi /opt/etc/nagios.cfg

In the nagios.cfg file, I made the following changes:

log_file=/opt/var/nagios/nagios.log
status_file=/opt/var/nagios/status.dat
lock_file=/opt/var/nagios/nagios.lock
temp_file=/opt/var/nagios/nagios.tmp
log_archive_path=/opt/var/nagios/archives
check_result_path=/opt/var/nagios/spool/checkresults
state_retention_file=/opt/var/nagios/retention.dat
debug_file=/opt/var/nagios/nagios.debug

After saving the file and exiting vi, I restarted Nagios in daemon mode.  Reading the last 100 lines of the Nagios log file is now accomplished with this command:

tail -n 100 /opt/var/nagios/nagios.log

—-

There are a lot of seemingly interesting Nagios plugins, including check_oracle (I believe that this plugin requires the Oracle client to be installed – good luck with that install).  On one of the DiskStations the check_snmp plugin did not compile, while on the other DiskStation the check_http plugin did not compile.

It might be interesting to see what solutions readers are able to develop from the above starting point.  The above information is the result of many hours of experimentation, as well as a couple of minutes reading through sections of the Nagios documentation (it reads like the Oracle Database documentation, so it should be an easy read once I am in the right mood) and hopelessly scanning the 'net for information about obscure error messages.  Have fun, and try not to put the DiskStation out of service due to a mistaken file copy.

Update November 19, 2013:

Installing an updated version of the Synology DSM operating system may temporarily disable Nagios.  Make backups of all Nagios configuration files (copying the files with the cp command to a directory in /volume1 is generally safe) before installing different versions of the Synology DSM operating system.

The DSM 4.3 operating system installation apparently removed the /var/services/homes directory.  That directory removal makes it impossible for the Nagios user to log in to run various commands.  I assume that the removal of the homes directory is intentional, so here is a workaround for that problem:

mkdir /var/services/home
mkdir /var/services/home/nagios
chown nagios:nagios /var/services/home/nagios -R
vi /etc/passwd

In the /etc/passwd file, change all /homes/ entries to /home/, then save and exit vi.
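Alternatively, if the sed build on the DiskStation supports in-place editing (an assumption worth testing before relying on it), the same change can be made with a single command – make a backup copy of the file first:

cp /etc/passwd /etc/passwd.bak
sed -i 's|/homes/|/home/|g' /etc/passwd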

The installation of the different DSM version (including versions before 4.3) will likely also replace/remove the libltdl.* files located in /opt/local/lib and /usr/lib, so we need to copy those files back into the correct directories:

cp /opt/lib/libltdl.so.3 /opt/local/lib/libltdl.so.3
cp /opt/lib/libltdl.so.3 /usr/lib/libltdl.so.3
cp /opt/lib/libltdl.so /usr/lib/

Once the above items are copied, try executing the check_ping command as the nagios user (replace MyDeviceHere with either an IP address or the name of a device on your network).

su - nagios -c "/opt/libexec/check_ping -H MyDeviceHere -w 5000,80% -c 5000,80% -p 5"

If the DiskStation reports that the check_ping command was not found, then copy that file back to the /opt/libexec/ directory.  If the above command was successful, try verifying the Nagios configuration:

/opt/bin/nagios -v /opt/etc/nagios.cfg

If the verification was successful, start Nagios as a daemon:

/opt/bin/nagios -d /opt/etc/nagios.cfg

Execute the ps command and verify that the above command is listed in the running processes:

ps

Finally, verify that Nagios is still set to start automatically as a daemon:

ls /opt/etc/init.d/S81nagios

If a file is listed when the above command is executed, then Nagios should now be fully repaired.

-





Value of Improving One’s Knowledge

24 03 2013

March 24, 2013

As I type this blog article I am in the middle of reading a second book on the topic of Windows Server 2008 R2.  I bought the books several months ago, possibly even a year or two ago, and just had not found the time to invest in reading yet another dry IT-related 1,500 page book.  I would much prefer to read an exciting Oracle Database book, but those books seem to be in short supply (and I really do need to finish these two Windows Server 2008 R2 books so that I can move onto the Windows Server 2012 books, the Exchange Server 2013 books, and the Oracle Database 12c books that will be released eventually).

Three years ago I put together an article about finding the motivation to keep reading through various IT books, and offered a suggested reading list for people interested specifically in various aspects of Oracle Database (totally unrelated, but I recall that the word aspect was a pet peeve of one of my college English professors… he would say, “What is an aspect?” – I probably have used that word about five times since then).  I think that my list of suggested books is still a decent list.  However, Kyle Hailey recently put together a related article, “Where to begin with Oracle and SQL” that is excellent, and a bit more up to date than my list – the only book on Kyle’s list that I have not read is written by Dan Tow.

So, for people working in the information technology (IT) disciplines, how much effort do you put into improving your knowledge, or just maintaining that knowledge?  Do you actively try to break things, in the hope of discovering a solution?  Do you participate in discussion forums, such as the OTN forums or Usenet?  Do you read magazines, books, and various online articles?  Or, do you simply continue to perform the same actions that you have for the last five, 10, or 20 years (if the RULE worked just fine back then, why change)?

I try to read at least a couple of IT focused books a year.  For many of the Oracle Database specific books read since mid-2008 I have tried to write formal reviews of those books, posted to Amazon.com.  One of those reviews stretched to 35.5 typewritten pages!  Early yesterday morning a reader left the following comment attached to that review:

Mar 22, 2013 11:43:09 PM PDT SavvyShopper says:
Charles, you seem Very Intelligent(?) and should then Write a Book, fixing all these issues, than writing so much … :) How do you get so much time , if you are really “Working” ??

Oddly, the person deleted their comment a mere two hours after I posted a reply.  My first reaction to the comment was “must spending time expanding one’s knowledge be mutually exclusive to being employed?”  That was not the first time that someone using Amazon left a similar comment attached to one of my reviews:

Mar 31, 2011 12:37:49 PM PDT Wanda A. Cadogan says:
And don’t you ever wonder how guys like Mr. Hooper have so much time to write a 24 page review?

Is it really that hard to believe that guys like Mr. Hooper take notes (sometimes extensive notes) while reading technical publications?

I noticed that SavvyShopper had previously reviewed a couple of Oracle Database related books, as well as a couple of woodworking products.  Quite a while ago I toyed a bit with woodworking (a bit in woodworking… cuts both ways); as a matter of fact, I thought that I was headed to college to become a woodworking instructor.  So, I possibly have an odd connection with SavvyShopper – might this be an opportunity for a teachable moment?  I noticed too that in one of his reviews SavvyShopper appeared to be struggling with the concept that a softwood can in fact be harder than a hardwood.  I am a little disappointed that SavvyShopper deleted his comment after all the effort that I put into formulating a reply:

SavvyShopper,

Thank you for leaving your comment.

Some of my book reviews, especially when there are many errors in the book being reviewed, tend to be very long.  I read computer related books to improve my knowledge, whether that means that I am simply reminded how something works, or I am learning something entirely new.   Investing effort (and occasionally an excessive amount of time) into improving my skill set is important to me, and I hope that it is important to other people in the IT related professions.  The Oracle Database book reviews that I post to Amazon are mostly a by-product of that self-improvement process.

I am still happily employed at the same place where I have worked for more than a decade as an IT professional.  Making time for expanding one’s computer skills through reading sometimes means sacrificing time spent with hobbies or other outside of work activities.  I believe that it is that sacrifice that is an important ingredient in the formula for long-term gainful employment.

Woodworking is a hobby of mine, and judging by your reviews on Amazon, woodworking may also be one of your hobbies.  I see that you gave a thin kerf Forrest Woodworker II saw blade a 2 out of 5 rating (coincidentally the same rating that I gave to this book), primarily because you found that the blade did not impress you when cutting pine and redwood while building one project.  Pine, much more than redwood, leaves deposits of pitch on the face and sides of saw blade teeth (kerosene (or Simple Green) and an old toothbrush will remove the pitch deposits, but I think that there are commercial spray on products on the market now that serve the same purpose).  This buildup of pitch effectively reduces the saw blade’s ability to cut smoothly.  The thin kerf body of the saw blade that you reviewed is less resistant to flexing in a heavy cut, so combined with the pitch build up on the blade, you probably did experience poor quality cuts.  If you mounted the saw blade on an inexpensive saw with a weak/bad bearing setup, that will also explain the poor quality cut that you received.  If you want a treat, mount a regular kerf Woodworker II on an older, properly tuned 3HP or 5HP Unisaw (or Powermatic Model 66) and witness the glass smooth cuts in (even three inch thick) hardwood such as poplar, oak, walnut, or hard maple.

Getting back to the topic of your comment, I currently have very little extra time to put into writing.  While probably not your intention, I do not see a book that summarizes mistakes found in other books and the accompanying corrections as selling more than a couple of dozen copies; based on some Oracle book authors' remarkable ability to abuse the DMCA (Digital Millennium Copyright Act), such a book could also become a legal nightmare for the publishing company.  I have been approached a couple of times to see if I had an interest in writing a book on the topic of Oracle Database.  At this time, I simply cannot make that commitment (writer's block, lack of available free time, required reading to stay current in the computer-related field, wanting to spend more time working with non-computer related routers and bits, etc.).

A couple of years ago I contributed to an Oracle Database specific book with fellow OakTable Network members.  The technical standards of the members of that group are extremely high (do a Google search, if you are interested).  As such, I would estimate that I spent roughly eight hours per page writing, testing, proofreading, rewriting, proofreading, testing, rewriting, and proofreading again.  It was quite an experience, especially when Oracle Corporation announced the general availability of a new major version release a day or two before the first draft of the section of the book had to be submitted to the publisher – more testing, rewriting, and proofreading of the book section followed after that first draft was submitted.

Thank you again for leaving the comment.

So, do you go out of your way to learn something new every day, or are you more likely to “let it roll” for five, 10, or 20 years?  For the record, I have a rather large stack of unread woodworking magazines (most of the unread magazines range from 0 to 7 years old) sitting on the shelf that are waiting for the time when spending time expanding one’s knowledge is mutually exclusive to being employed (in the IT industry).

A new logo in development – resawn from 1 inch x 2 inch quarter sawn oak.

Bookmatch Arrangement #1: [photo]

Bookmatch Arrangement #2: [photo]

Bookmatch Arrangement #3: [photo]

The New Logo: [photo]





What does it Mean when a Select Statement in Oracle is using 100% CPU?

14 02 2013

February 14, 2013

A couple of days ago I noticed that an interesting set of search keywords were used to access this blog.  The search keywords:

What does it mean when a select statement in Oracle is using 100% cpu

I had two thoughts when I first saw that set of search keywords:

  • Well Done!
  • What an inconsiderate Oracle developer!

I feel a bit conflicted about those two first thoughts.  Think about the above bullet points for a moment.

I was interested enough in the keywords, phrased as they were in the form of a question, that I went through the search keywords used to access this blog during the last two weeks and found the following related search keywords:

  • ora-01861 Oracle causing cpu 100% usage
  • Oracle one session 100% cpu
  • Oracle cpu utilisation 100%, is it a good or bad sign
  • Oracle database server cpu utilization is 100

Interesting, and possibly thought provoking.  I recalled an article that I wrote on this blog roughly three years ago.

Possible related thoughts of a temporarily rogue DBA:

  1. I fixed your problem without changing the execution plan… take a look at the EXPLAIN PLAN output for yourself.
  2. I fixed your problem without changing the execution plan… the server is more efficient now that we have consolidated three more database instances onto the server.
  3. I fixed your problem without changing the execution plan… that PGA_AGGREGATE_TARGET parameter was clear on the other side of 100 MB, so I set it to 10 MB.
  4. I fixed your problem without changing the execution plan… a group of the guys in the office found a way to load the game Doom on the server.
  5. I fixed your problem without changing the execution plan… I took half of the hard drives out of the array and moved the server to a 10 Mb hub.

Take a close look at the bullet-pointed search terms.  If someone were to ask you one of the following questions, how would you respond if you were earnestly attempting to help the person?

  1. What does it mean when a Select statement in Oracle is using 100% CPU?
  2. The server admin reported that the Oracle database server cpu utilization hits 100% too frequently, what should I do about the problem?
  3. Oracle is throwing an ORA-01861 error, and that is causing 100% CPU usage.  What should I do?

Share your thoughts.

Summary of Suggestions/Causes Shared by Readers (Last Updated February 14, 2013):

  • Intended or unintended Cartesian product between row sources – does the query have a DISTINCT clause?  (Rodger)
  • Intended or unintended data type conversions – is the data model storing numbers or date values in VARCHAR2 columns (see the sketch below)?  (Jeremy Kendrick)
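As a minimal sketch of the second suggested cause, assume a hypothetical table T in which numeric order numbers are stored in a VARCHAR2 column named ORDER_ID:

SELECT * FROM T WHERE ORDER_ID = 1234;

Oracle must implicitly apply TO_NUMBER to the ORDER_ID column for every row examined, which typically prevents the use of a normal index on that column and may drive a CPU-intensive full table scan (an ORA-01722 error is also possible if any row contains a non-numeric value).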




Connecting to an Oracle Database with Visual Basic 6.0 on Windows 8 64 Bit

25 11 2012

November 25, 2012 (Modified December 7, 2012)

Compatibility problems?  Visual Basic 6.0, released in 1998, is not officially compatible with Windows 8 Pro 64 bit… or Windows 7, or Windows Vista.  But I still like the language a lot for its simplicity, rapid development, and significant pre-existing code base within my company.  Of course, Oracle Database 11.2.0.3 and the Oracle Client 11.2.0.3 are not officially supported on Windows 8, so maybe the quest is an exercise in futility.

Roughly a month ago a thread appeared on the OTN forums asking how to connect Visual Basic 2010 to Oracle Database Personal Edition on Windows 7 Pro 64 bit running on the same computer.  Several people offered very good advice to guide the OP.  The OP eventually asked how to connect Visual Basic 6 to Oracle Database Personal Edition running on the same Windows 7 Pro 64 bit computer.  As I mentioned, Visual Basic 6 is not officially compatible with Windows 7, and the fact that it is a 32 bit application running on a 64 bit Windows computer means that the 32 bit Oracle Client must also be installed on the computer.

Visual Basic 6.0 (note that you should install service pack 6 for Visual Basic 6, even if the installer locks up at the very end) will run just fine on Windows 7 Pro 64 bit (and probably on Windows 8 Pro 64 bit also), although drawing form objects is a little slow.  The automatic Windows updates may also pose an issue.  From time to time it may be necessary to re-register the 32 bit MSCOMCTL.OCX file to avoid error messages when opening Visual Basic projects.  To re-register that file, from a Windows command prompt (Run – cmd.exe), enter the following:

cd \windows\syswow64
regsvr32 MSCOMCTL.OCX

Using Visual Basic 6.0, or any development environment for that matter, requires that certain components be selected for installation when installing the Oracle Client.  At a minimum, Oracle ODBC Driver 11.2.0.x.0 (for ODBC type connections), and Oracle Provider for OLE DB 11.2.0.x.0 (for OLE DB type connections) must be installed with the Oracle Client.

Let’s set up a simple Visual Basic project to demonstrate connecting to Oracle Database 11.2.0.3 running on the same Windows 8 computer.  First, when installing Visual Basic 6, you may want to install and register a couple of additional uncommonly used controls.  Those controls are located in the \COMMON\TOOLS\VB\CONTROLS folder on the Visual Basic install CD.  On a 64 bit computer, the controls should be copied to the C:\Windows\Syswow64 folder and then registered in that location using regsvr32 as demonstrated above with the MSCOMCTL.OCX file.  In some cases, a license file must also be imported into the Windows registry – those files have a .REG extension in the same location on the CD:
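A sketch of that process, assuming the Visual Basic install CD is mounted as drive D: and using the Microsoft Grid Control as an example (the file names shown are assumptions – verify them against the CD before running these commands):

cd \windows\syswow64
copy D:\COMMON\TOOLS\VB\CONTROLS\GRID32.OCX
regsvr32 GRID32.OCX
regedit /s D:\COMMON\TOOLS\VB\CONTROLS\GRID32.REG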

On to building the project.  First, we need to add a reference to allow the project to use ADO and the Oracle Provider for OLE DB that was installed with the Oracle Client.  From the Project menu, select References…:

Next, we need to add the most recent version of the Microsoft ActiveX Data Objects Library, version 6.1 for Windows 7 and Windows 8, version 6.0 for Vista, or version 2.8 for Windows XP.  After selecting the correct version, click the OK button:

Now let’s add a couple of additional controls to the project.  I first started using the Microsoft Grid Control in Visual Basic 2.0, so for old time’s sake let’s add that control to the project.  We may also want to add a status bar to the project’s form, so let’s also add Microsoft Windows Common Controls 6.0 (SP6).  Then click the OK button:

Now, draw the form controls on the form.  We need three command buttons named cmdLateBinding, cmdEarlyBinding, and cmdQueryDatabase, a grid control named grdOutput, and optionally a status bar named stbStatus with its Style property set to Simple:

Let’s add the code to the Late Binding (cmdLateBinding) button (this is essentially the same code that I provided in the OTN thread, and does not require the addition of the Microsoft ActiveX Data Objects Library in the references for the project – so this code is a good simple test to make certain that everything is installed correctly):

Dim intResult
Dim strDatabase
Dim strUserName
Dim strPassword
Dim strSQL
Dim dbDatabase
Dim snpData

'On Error Resume Next

strDatabase = "MyDB" 'From tnsnames.ora
strUserName = "MyUserID"
strPassword = "MyPassword"

Set snpData = CreateObject("ADODB.Recordset")
Set dbDatabase = CreateObject("ADODB.Connection")

dbDatabase.ConnectionString = "Provider=OraOLEDB.Oracle;Data Source=" & strDatabase & ";User ID=" & strUserName & ";Password=" & strPassword & ";"
dbDatabase.Open

If (dbDatabase.State = 1) And (Err = 0) Then
    strSQL = "SELECT" & vbCrLf
    strSQL = strSQL & "  SYSDATE CURRENT_DATE" & vbCrLf
    strSQL = strSQL & "FROM" & vbCrLf
    strSQL = strSQL & "  DUAL"

    snpData.Open strSQL, dbDatabase

    If snpData.State = 1 Then
        If Not (snpData.EOF) Then
            Do While Not (snpData.EOF)
                MsgBox snpData("current_date")

                snpData.MoveNext
            Loop
        End If
        snpData.Close
    End If
Else
    intResult = MsgBox("Could not connect to the database.  Check your user name and password." & vbCrLf & Error(Err), 16, "Oracle Connection Demo")
End If

dbDatabase.Close
Set snpData = Nothing
Set dbDatabase = Nothing

Let’s run the project and click the Late Binding button to see what happens.  We should expect to see some sort of connection failure message, unless there is an entry in the tnsnames.ora for MyDB:

Well, that error message certainly was unexpected (the OP in the OTN thread may have encountered the same error message)!  Let’s click on the Debug button:

ADODB cannot find the OraOLEDB.Oracle provider.  I am sure that I installed that component when installing the 32 bit Oracle Client on the computer.  The Visual Basic code that was added to the Late Binding command button is very generic.  In fact, as written the code can be placed in a plain text file with a .VBS extension and executed as a VBScript file with either the wscript or cscript command.  Copy the code from the Late Binding command button into a plain text file (start the Notepad program and paste the code), then save the file as “LateBinding.vbs” (including the quotes).

Now to test the script that was just created.  Open a Windows command prompt (Windows key and R, type cmd, press the Enter key).  Change to the folder where the LateBinding.vbs file was saved, then type:

wscript LateBinding.vbs

Note that this time the computer indicated an ORA-12154: TNS:could not resolve the connect identifier error, rather than a Provider could not be found error – we were hoping to obtain the same Provider could not be found error that we received when working in Visual Basic 6.  What changed?  Well, this is a 64 bit computer, so the 64 bit wscript program was used, which used the 64 bit Oracle client (actually the 64 bit Oracle Database home files).

Let’s try again, this time with the 32 bit version of the wscript program.  To execute the script with the 32 bit wscript, execute the following:
c:\windows\SysWOW64\wscript LateBinding.vbs

Notice that the error now appears as Provider could not be found, which is the same error message that was observed in Visual Basic 6.  We reproduced the problem!  Now how do we fix it?  I suppose that we should check with the Oracle Client installer to verify that the Oracle Provider for OLE DB 11.2.0.x.0 component was in fact installed.  Let’s re-run the Oracle installer to check the Inventory of the installed components.  Just open the Start menu, click the Oracle – OraClient11g_home1 folder, then expand Oracle Installation Products, then click Universal Installer:

Uh, sure.  Windows 8 and its tablet-like Start menu… good luck finding the Universal Installer for the Oracle Client. (Edit November 26, 2012: I was trying to be humorous here.  If you have a touch screen, you can swipe your finger up from the bottom of the screen to reveal the option to display all items that normally appear on the pre-Windows 8 Start menu, separated by folder/Oracle home.  If you do not have a touch screen, hold down the Windows key and press the W key while in this new Windows 8 start screen to display the same list of items separated by folder/Oracle home.)  I always change the base install folder to C:\Oracle when installing Oracle components, so to start up the Universal Installer for the Oracle Client, I can just run:
C:\oracle\product\11.2.0\client_1\oui\bin\setup.exe

Oracle Provider for OLE DB 11.2.0.3.0 is in the list of installed components (click the Installed Products button in the Oracle Installer to see the above list), so why doesn’t the Oracle Provider for OLE DB from the 32 bit client work as expected?  That component requires that the OraOLEDB11.DLL file is found in the expected location, in my case, here:
C:\Oracle\product\11.2.0\client_1\bin\OraOLEDB11.DLL

Yes, the file is in that location.  Is the Oracle Client 11.2.0.3 just not compatible with Windows 8 Pro?  The Oracle Client 11.2.0.3 seemed to work as expected on a computer that was upgraded from Windows 7 Pro to Windows 8 Pro, including the OLE DB functionality, so why will that functionality NOT work on a new computer with Windows 8 Pro installed?

While comparing the Windows registry entries on a Windows 7 Pro 64 bit computer with the Windows registry entries on the Windows 8 Pro computer where the OLE DB functionality did not work, I noticed a problem – missing entries in the Windows 8 Pro’s Windows registry.  From the Windows 7 Pro 64 bit computer, I exported the three missing Windows registry sections:

3F63C36E-51A3-11D2-BB7D-00C04FA30080.reg:

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3F63C36E-51A3-11D2-BB7D-00C04FA30080}]
@="OraOLEDB.Oracle"
"OLEDB_SERVICES"=dword:ffffffff

[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3F63C36E-51A3-11D2-BB7D-00C04FA30080}\ExtendedErrors]
@="Extended Error Lookup Service"

[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3F63C36E-51A3-11D2-BB7D-00C04FA30080}\ExtendedErrors\{3FC8E6E4-53FF-11D2-BB7D-00C04FA30080}]
@="Oracle OLE DB Error Lookup"

[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3F63C36E-51A3-11D2-BB7D-00C04FA30080}\InprocServer32]
@="C:\\Oracle\\product\\11.2.0\\client_1\\bin\\OraOLEDB11.DLL"
"ThreadingModel"="Both"

[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3F63C36E-51A3-11D2-BB7D-00C04FA30080}\OLE DB Provider]
@="Oracle Provider for OLE DB"

[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3F63C36E-51A3-11D2-BB7D-00C04FA30080}\ProgID]
@="OraOLEDB.Oracle.1"

[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3F63C36E-51A3-11D2-BB7D-00C04FA30080}\TypeLib]
@="{0BB9AFD1-51A1-11D2-BB7D-00C04FA30080}"

[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3F63C36E-51A3-11D2-BB7D-00C04FA30080}\VersionIndependentProgID]
@="OraOLEDB.Oracle"

3FC8E6E4-53FF-11D2-BB7D-00C04FA30080.reg:

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3FC8E6E4-53FF-11D2-BB7D-00C04FA30080}]
@="Oracle OLE DB Error Lookup"

[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3FC8E6E4-53FF-11D2-BB7D-00C04FA30080}\InprocServer32]
@="C:\\Oracle\\product\\11.2.0\\client_1\\bin\\OraOLEDB11.DLL"
"ThreadingModel"="Both"

[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3FC8E6E4-53FF-11D2-BB7D-00C04FA30080}\ProgID]
@="OraOLEDB.ErrorLookup.1"

[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3FC8E6E4-53FF-11D2-BB7D-00C04FA30080}\VersionIndependentProgID]
@="OraOLEDB.ErrorLookup.1"

0BB9AFD1-51A1-11D2-BB7D-00C04FA30080.reg (just the win32 section was missing):

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\TypeLib\{0BB9AFD1-51A1-11D2-BB7D-00C04FA30080}\1.0\win32]
@="C:\\Oracle\\product\\11.2.0\\client_1\\bin\\OraOLEDB11.DLL"

After verifying that the Oracle Client’s home on the Windows 8 Pro computer was in fact located in C:\Oracle\product\11.2.0\client_1 (it may not be on your computer, change the above registry files as necessary for your computer – use two \\ characters for each \ character that normally appears in the path to the Oracle Client’s home), I imported the registry entries into the Windows 8 Pro computer’s registry by double-clicking each of the files that were exported from the Windows 7 Pro computer.

The result after importing the registry entries and re-running the script with the 32 bit wscript:

Well, that is not good, another error message.  But wait, that was the same error message displayed when the 64 bit wscript was used to execute the script file.  Maybe we are making progress.  Let’s fix the LateBinding.vbs script file so that the strDatabase variable is set to a valid database name found in the 32 bit client’s tnsnames.ora file (I will use a database named sample, which has the Oracle sample schema loaded), the strUserName variable is set to a valid database username in the database (I will use the sh user), and the strPassword variable is set to the correct password for the username.
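The modified lines in the LateBinding.vbs file would then look similar to the following (the password shown is just a placeholder – substitute the real password for the sh user):

strDatabase = "sample" 'From tnsnames.ora
strUserName = "sh"
strPassword = "MyShPassword"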

Let’s try executing the script again:

Oh, another error message.  ORA-28000: the account is locked.  I guess that some Oracle rules still apply.  Start up a SQL*Plus session in another Windows command prompt, connect to the database as the SYS AS SYSDBA user, and unlock the SH account (edit November 26, 2012: note that this account should probably be locked again once we finish experimenting):

ALTER USER SH ACCOUNT UNLOCK;
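(When the experimenting is finished, the account may be locked again with ALTER USER SH ACCOUNT LOCK;)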

Let’s try executing the script again:

It Worked!  The 32 bit version of wscript on Windows 8 Pro 64 bit was able to connect to the 11.2.0.3 64 bit database running on the same computer and select the current date and time from the database.  But, does the Late Binding example code in Visual Basic 6 work?  Let’s check:

Nice!

[Headache Induced Rant] You may notice a nearly 3 hour time difference between the time output by wscript and the time output by Visual Basic 6.  That is not an error – I spent nearly three hours trying to put this blog article together in WordPress (total time putting together this article was close to 5 hours) on the Windows 8 computer AFTER successfully connecting to the database using the 32 bit version of wscript.  The Sony laptop designer who thought it would be a good idea to place the laptop’s left and right mouse buttons underneath the corners of the touch pad and then thought that it would be a good idea to enable zoom control when one finger is barely touching the location of the left mouse button while the other finger is attempting to scroll up and down the WordPress editor’s page should be taken out behind the building and slapped a couple of times.  I quite literally had to undo an unintended page zoom very close to 50 times while putting together this blog article.  The darn laptop has a touch screen… if I really want to zoom the screen, I will either manually select the zoom feature in the application or put two fingers on the screen and then separate those fingers.  (Edit November 26, 2012: Sony does a good job of burying the option to disable this touch pad feature – it is found in the same application that disables the touch pad’s tap to left-click “feature”.)

And which Microsoft developer thought that it would be a great idea to have a large clock pop up on screen in the lower left (taking away the typing entry focus from the application that was being used) on occasion when someone is trying to type on the computer?  That developer should also be taken out behind the building and slapped. [/Headache Induced Rant]

The above example code used late binding (corrected from "early binding", edit: July 24, 2014), which defines variables as variants and then later assigns object types to the variables.  That technique, which now does work and is compatible with wscript, tends to result in poor performance.  Poor performance should be considered an application bug, so let's build an example that uses early binding (note that the References mentioned earlier in this article must have been selected).  Add the following code to the Early Binding button (note that the code is attempting to connect to the sample database as the sh user):

Dim intResult As Integer
Dim strSQL As String
Dim strDatabase As String
Dim strUserName As String
Dim strPassword As String

Dim snpData As ADODB.Recordset
Dim dbDatabase As ADODB.Connection

On Error Resume Next

Set dbDatabase = New ADODB.Connection

strDatabase = "sample" '"MyDB" 'From tnsnames.ora
strUserName = "sh" '"MyUserID"
strPassword = "sh" '"MyPassword"

'Connect to the database, specify that the default array fetch size should be 100 rows
dbDatabase.ConnectionString = "Provider=OraOLEDB.Oracle;Data Source=" & strDatabase & ";User ID=" & strUserName & ";Password=" & strPassword & ";ChunkSize=1000;FetchSize=100;"
dbDatabase.ConnectionTimeout = 40
dbDatabase.CursorLocation = adUseClient
dbDatabase.Open

If (dbDatabase.State <> 1) Or (Err <> 0) Then
    intResult = MsgBox("Could not connect to the database.  Check your user name and password." & vbCrLf & Error(Err), 16, "Oracle Connection Demo")
Else
    Set snpData = New ADODB.Recordset

    strSQL = "SELECT" & vbCrLf
    strSQL = strSQL & "  COUNT(*) NUM_ROWS" & vbCrLf
    strSQL = strSQL & "FROM" & vbCrLf
    strSQL = strSQL & "  CUSTOMERS"
    snpData.Open strSQL, dbDatabase

    If snpData.State = 1 Then
        If Not (snpData.EOF) Then
            MsgBox "Number of rows in the Customers table: " & Format(snpData("num_rows")), vbInformation
        Else
            MsgBox "No rows returned from the query.", vbCritical
        End If
        snpData.Close
    Else
        MsgBox "The query could not be executed.", vbCritical
    End If
End If

'When finished, clean up
If dbDatabase.State = 1 Then
    dbDatabase.Close
End If

Set snpData = Nothing
Set dbDatabase = Nothing

Let’s see the result of executing the above code:

Wow, 55,500 rows in that sample table, that is a lot of rows.

Finally, let’s add the code to the Query Database button.  This code will use early binding, bind variables, and present the result rows in the grid control.

Dim i As Integer
Dim intResult As Integer
Dim strSQL As String
Dim strDatabase As String
Dim strUserName As String
Dim strPassword As String
Dim strOut As String

Dim snpData As ADODB.Recordset
Dim comData As ADODB.Command
Dim dbDatabase As ADODB.Connection

On Error Resume Next

Set dbDatabase = New ADODB.Connection

strDatabase = "sample" '"MyDB" 'From tnsnames.ora
strUserName = "sh" '"MyUserID"
strPassword = "sh" '"MyPassword"

'Connect to the database, specify that the default array fetch size should be 100 rows
dbDatabase.ConnectionString = "Provider=OraOLEDB.Oracle;Data Source=" & strDatabase & ";User ID=" & strUserName & ";Password=" & strPassword & ";ChunkSize=1000;FetchSize=100;"
dbDatabase.ConnectionTimeout = 40
dbDatabase.CursorLocation = adUseClient
dbDatabase.Open

If (dbDatabase.State <> 1) Or (Err <> 0) Then
    intResult = MsgBox("Could not connect to the database.  Check your user name and password." & vbCrLf & Error(Err), 16, "Oracle Connection Demo")
Else
    Set comData = New ADODB.Command
    With comData
        strSQL = "SELECT" & vbCrLf
        strSQL = strSQL & "  CUST_ID," & vbCrLf
        strSQL = strSQL & "  CUST_FIRST_NAME," & vbCrLf
        strSQL = strSQL & "  CUST_LAST_NAME," & vbCrLf
        strSQL = strSQL & "  CUST_GENDER," & vbCrLf
        strSQL = strSQL & "  CUST_YEAR_OF_BIRTH," & vbCrLf
        strSQL = strSQL & "  CUST_MARITAL_STATUS," & vbCrLf
        strSQL = strSQL & "  CUST_STREET_ADDRESS," & vbCrLf
        strSQL = strSQL & "  CUST_POSTAL_CODE," & vbCrLf
        strSQL = strSQL & "  CUST_CITY," & vbCrLf
        strSQL = strSQL & "  CUST_STATE_PROVINCE," & vbCrLf
        strSQL = strSQL & "  COUNTRY_ID" & vbCrLf
        strSQL = strSQL & "FROM" & vbCrLf
        strSQL = strSQL & "  CUSTOMERS" & vbCrLf
        strSQL = strSQL & "WHERE" & vbCrLf
        strSQL = strSQL & "  CUST_FIRST_NAME= ?"

        .CommandText = strSQL
        .CommandType = adCmdText
        .CommandTimeout = 30
        .Parameters.Append .CreateParameter("first_name", adVarChar, adParamInput, 20, "")
        .ActiveConnection = dbDatabase
    End With

    comData("first_name") = "Charles"
    Set snpData = comData.Execute

    If Not (snpData Is Nothing) Then
        If snpData.State = 1 Then
            If Not (snpData.EOF) Then
                grdOutput.Cols = snpData.Fields.Count
                grdOutput.FixedRows = 0
                grdOutput.FixedCols = 0
                grdOutput.Rows = 1

                strOut = ""
                For i = 0 To snpData.Fields.Count - 1
                    strOut = strOut & snpData(i).Name & vbTab
                Next i
                grdOutput.AddItem strOut 'Add the new heading row
                grdOutput.RemoveItem 0 'Remove whatever is on the first row

                Do While Not (snpData.EOF)
                    strOut = ""
                    strOut = strOut & Format(snpData("cust_id")) & vbTab
                    If Not (IsNull(snpData("cust_first_name"))) Then
                        strOut = strOut & snpData("cust_first_name") & vbTab
                    Else
                        strOut = strOut & vbTab
                    End If
                    If Not (IsNull(snpData("cust_last_name"))) Then
                        strOut = strOut & snpData("cust_last_name") & vbTab
                    Else
                        strOut = strOut & vbTab
                    End If
                    If Not (IsNull(snpData("cust_gender"))) Then
                        strOut = strOut & snpData("cust_gender") & vbTab
                    Else
                        strOut = strOut & vbTab
                    End If
                    If Not (IsNull(snpData("cust_year_of_birth"))) Then
                        strOut = strOut & Format(snpData("cust_year_of_birth")) & vbTab
                    Else
                        strOut = strOut & vbTab
                    End If
                    If Not (IsNull(snpData("cust_marital_status"))) Then
                        strOut = strOut & snpData("cust_marital_status") & vbTab
                    Else
                        strOut = strOut & vbTab
                    End If
                    If Not (IsNull(snpData("cust_street_address"))) Then
                        strOut = strOut & snpData("cust_street_address") & vbTab
                    Else
                        strOut = strOut & vbTab
                    End If
                    If Not (IsNull(snpData("cust_postal_code"))) Then
                        strOut = strOut & snpData("cust_postal_code") & vbTab
                    Else
                        strOut = strOut & vbTab
                    End If
                    If Not (IsNull(snpData("cust_city"))) Then
                        strOut = strOut & snpData("cust_city") & vbTab
                    Else
                        strOut = strOut & vbTab
                    End If
                    If Not (IsNull(snpData("cust_state_province"))) Then
                        strOut = strOut & snpData("cust_state_province") & vbTab
                    Else
                        strOut = strOut & vbTab
                    End If
                    If Not (IsNull(snpData("country_id"))) Then
                        strOut = strOut & snpData("country_id") & vbTab
                    Else
                        strOut = strOut & vbTab
                    End If
                    grdOutput.AddItem strOut 'Add the new data row to the grid control

                    snpData.MoveNext
                Loop
            Else
                MsgBox "No rows were returned by the query.", vbInformation
            End If
        Else
            MsgBox "The query could not be executed.", vbCritical
        End If
    Else
        MsgBox "The query could not be executed.", vbCritical
    End If
    dbDatabase.Close
End If

If grdOutput.Rows > 1 Then
    grdOutput.FixedRows = 1
End If

Set snpData = Nothing
Set dbDatabase = Nothing

What does the output look like when the above code is executed?

Nice!  Now the problem.  It is a very bad habit to repeatedly connect to and disconnect from the database, and such a habit may result in a variety of problems, including performance problems.  Ideally, the code to connect to the database should be relocated to a different part of the program, possibly the Form_Load subroutine.
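A minimal sketch of that approach, assuming that dbDatabase is declared as an ADODB.Connection in the form's General Declarations section so that all of the command buttons share a single connection:

Private Sub Form_Load()
    'Connect to the database once, when the form first loads
    Set dbDatabase = New ADODB.Connection
    dbDatabase.ConnectionString = "Provider=OraOLEDB.Oracle;Data Source=sample;User ID=sh;Password=sh;ChunkSize=1000;FetchSize=100;"
    dbDatabase.ConnectionTimeout = 40
    dbDatabase.CursorLocation = adUseClient
    dbDatabase.Open
End Sub

Private Sub Form_Unload(Cancel As Integer)
    'Disconnect once, when the form closes
    If dbDatabase.State = 1 Then
        dbDatabase.Close
    End If
    Set dbDatabase = Nothing
End Sub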

Added December 7, 2012:

I was able to reproduce the above mentioned problem on a Windows 7 Pro computer when the 11.2.0.3 client was installed (without first installing the 11.2.0.1 or 11.2.0.2 client).  Even after importing the above mentioned registry entries, the program (and the VBScript) reported the "Provider cannot be found" error.  As mentioned, after verifying that the above indicated Oracle components were installed, I imported the following registry script (save as a plain text file, rename with a .reg extension, then double-click the file), which combines the three above mentioned registry scripts into a single file:

Windows Registry Editor Version 5.00
 
[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3F63C36E-51A3-11D2-BB7D-00C04FA30080}]
@="OraOLEDB.Oracle"
"OLEDB_SERVICES"=dword:ffffffff
 
[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3F63C36E-51A3-11D2-BB7D-00C04FA30080}\ExtendedErrors]
@="Extended Error Lookup Service"
 
[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3F63C36E-51A3-11D2-BB7D-00C04FA30080}\ExtendedErrors\{3FC8E6E4-53FF-11D2-BB7D-00C04FA30080}]
@="Oracle OLE DB Error Lookup"
 
[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3F63C36E-51A3-11D2-BB7D-00C04FA30080}\InprocServer32]
@="C:\\Oracle\\product\\11.2.0\\client_1\\bin\\OraOLEDB11.DLL"
"ThreadingModel"="Both"
 
[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3F63C36E-51A3-11D2-BB7D-00C04FA30080}\OLE DB Provider]
@="Oracle Provider for OLE DB"
 
[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3F63C36E-51A3-11D2-BB7D-00C04FA30080}\ProgID]
@="OraOLEDB.Oracle.1"
 
[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3F63C36E-51A3-11D2-BB7D-00C04FA30080}\TypeLib]
@="{0BB9AFD1-51A1-11D2-BB7D-00C04FA30080}"
 
[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3F63C36E-51A3-11D2-BB7D-00C04FA30080}\VersionIndependentProgID]
@="OraOLEDB.Oracle"
 
[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3FC8E6E4-53FF-11D2-BB7D-00C04FA30080}]
@="Oracle OLE DB Error Lookup"
 
[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3FC8E6E4-53FF-11D2-BB7D-00C04FA30080}\InprocServer32]
@="C:\\Oracle\\product\\11.2.0\\client_1\\bin\\OraOLEDB11.DLL"
"ThreadingModel"="Both"
 
[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3FC8E6E4-53FF-11D2-BB7D-00C04FA30080}\ProgID]
@="OraOLEDB.ErrorLookup.1"
 
[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\CLSID\{3FC8E6E4-53FF-11D2-BB7D-00C04FA30080}\VersionIndependentProgID]
@="OraOLEDB.ErrorLookup.1"
 
[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Classes\TypeLib\{0BB9AFD1-51A1-11D2-BB7D-00C04FA30080}\1.0\win32]
@="C:\\Oracle\\product\\11.2.0\\client_1\\bin\\OraOLEDB11.DLL"

The missing element that I neglected to mention in the original version of this article is that the Oracle 11.2.0.3 Client installer apparently forgets to register the OraOLEDB11.DLL file that is in the Oracle home.  The simple work-around for this problem is to manually register the DLL file, with a command similar to the following executed at a Windows command line (replace C:\Oracle\product\11.2.0\client_1 with the correct location of the Oracle client):

regsvr32 C:\Oracle\product\11.2.0\client_1\bin\OraOLEDB11.DLL

Please note that I have not fully verified that importing the three missing registry sections shown above fixes all problems with the 32 bit Oracle Client 11.2.0.3 on Windows 8 Pro 64 bit.





oci-24550 [3221225477] [Unhandled exception: Code=c0000005 Flags=0

22 10 2012

October 22, 2012

Lately, I have been busy with a couple of items that are not completely Oracle Database specific.  I tried experimenting with high definition 1080p security cameras.  After wasting an unacceptable amount of money to purchase two Y-Cam Wireless High Definition 1080p video cameras (model YCBLHD6), and spending in excess of 38 hours fighting with the cameras to make them work as advertised, I wrote a video review of the product.  After writing the review, Y-Cam's tech support representative contacted me again, so I pulled the cameras back out of the box and wasted another five hours trying to make the cameras work wirelessly – a feature that was lost after installing the third firmware version on the cameras.  The cameras magically connected wirelessly very early the next morning – but wireless communication was only one of several problems that I was experiencing with this model of security camera.  What might cause a procedure to work magically when that same procedure failed several times in the past – and might this somehow have an odd relation to Oracle Database?  This is the explanation that I provided to Y-Cam's tech support representative and posted as a follow-up on Y-Cam's support forum:

The camera’s Ethernet port and wireless adapter both have the same hardcoded MAC address. This MAC address is used by network switches to determine which devices are attached to each of the switch ports so that network traffic addressed to a specific MAC address is only sent down the switch port where that MAC address is known to be connected. By contrast a network hub broadcasts network traffic to all ports, rather than just the port to which a hardware device’s MAC address is attached; a hub is less likely to have an issue with two different network adapters having the same MAC address (Y-CAM support reported that they confirmed that the camera with the latest firmware worked with a non-broadcasted WPA2 protected wireless network). The network switch maintains a MAC address table for each switch port – when the camera’s Ethernet port is disconnected and the camera is power cycled, the switch’s MAC address tables may not be immediately updated to remove the camera’s MAC address from the wired Ethernet port so that the MAC address may be added to the switch port used for wireless connectivity. Power cycling the network switch clears the switch’s MAC address tables, allowing the camera’s MAC address to be added to the switch port used for wireless connectivity.

In short, it is necessary to power cycle the switch (or the Cisco Linksys E2000 router acting as only a wireless access point with integrated 4 port gigabit switch in my case) when moving the camera from a wired Ethernet connection to a wireless connection.

Makes a bit of sense… now if only motion detection worked as advertised and stability improved to the point that the camera does not require two or three reboots per day.  Thankfully, most other consumer products tend to work out of the box; imagine the ramifications if every time the weather conditions included rain, a vehicle would sputter, shake violently, and reduce power to 10% of normal when the driver attempted to accelerate.

Changing gears a bit.  You might have noticed that the title of this article appears to be some sort of error message, probably produced by an Oracle product.  I am in the process of writing a program that interfaces with an Oracle Database, rapidly retrieving a variety of information from various database tables – using bind variables in the SQL statements, of course:

The problem?  My program seems to crash randomly on computers running either the 32 bit or 64 bit version of Windows 7 (the program works without issue on computers running the 32 bit version of Windows XP).  The randomness of the crashes makes it a bit difficult to troubleshoot this particular problem, but the vast majority of the crashes are associated with populating the treeview control.  Is the crash caused by a typo (bug) in my program logic?  Is the crash caused by the Windows 7 version of the treeview control?  Is the crash caused by Oracle Database 11.2.0.2, as a result of receiving several different SQL statements in rapid-fire fashion?  Could there be another explanation?

To start the investigation, I first enabled a 10046 trace at level 12.  I then entered an employee ID and started selecting different date ranges until the program crashed.  The crash message varied (as did the point at which the crash happened), and while Windows offered to contact the developer to report the error, I declined the offer.  Unfortunately, I did not capture any of the exact error messages, although I did open one of the debug files created during a crash – “Unhandled exception at 0x0fc01b3d”:

mov cx, word ptr [edi] – Thanks Windows, I think that I will go back to the safety of the 10046 trace files (my best guess is that this instruction is attempting to copy a value from a variable memory pointer).

-
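
For reference, a level 12 10046 trace (bind variable values plus wait events) can be enabled with commands along these lines – a sketch, with a hypothetical TRACEFILE_IDENTIFIER value; DBMS_MONITOR.SESSION_TRACE_ENABLE (with WAITS=>TRUE, BINDS=>TRUE) is an alternative when enabling the trace from a second session:

ALTER SESSION SET TRACEFILE_IDENTIFIER='CRASH_TEST';
ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT FOREVER, LEVEL 12';

-- Reproduce the problem, then stop the trace:
ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT OFF';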

Digging into the 10046 trace file, I found that this was the last SQL statement executed by my program before the crash:

CLOSE #1301509800:c=0,e=10,dep=0,type=1,tim=4032143047926
=====================
PARSING IN CURSOR #1301508512 len=1452 dep=0 uid=172 oct=3 lid=172 tim=4032143048002 hv=929599856 ad='3eb8207a0' sqlid='4y6m7dsvqj4bh'
SELECT 
  EMPLOYEE_ID, 
  SHIFT_DATE, 
  CLOCK_IN, 
  CLOCK_OUT, 
  HOURS_WORKED, 
  HOURS_VACATION, 
  HOURS_HOLIDAY, 
  HOURS_BEREAVEMENT, 
  HOURS_JURY_DUTY, 
  DECODE(SHIFT_DATE_CODE,'VAC',1,0) VACATION, 
  DECODE(SHIFT_DATE_CODE,'HOL',1,0) HOLIDAY, 
  DECODE(SHIFT_DATE_CODE,'BER',1,0) BEREAVEMENT, 
  DECODE(SHIFT_DATE_CODE,'JUR',1,0) JURY, 
  DECODE(SHIFT_DATE_CODE,'ABS',1,0) ABSENT, 
  DECODE(SHIFT_DATE_CODE,'EXC',1,0) EXCUSED, 
  SUM(DECODE(SHIFT_DATE_CODE,'VAC',1,0)) OVER (PARTITION BY EMPLOYEE_ID) VACATION_COUNT, 
  SUM(HOURS_VACATION) OVER (PARTITION BY EMPLOYEE_ID) VACATION_HOURS, 
  SUM(DECODE(SHIFT_DATE_CODE,'HOL',1,0)) OVER (PARTITION BY EMPLOYEE_ID) HOLIDAY_COUNT, 
  SUM(HOURS_HOLIDAY) OVER (PARTITION BY EMPLOYEE_ID) HOLIDAY_HOURS, 
  SUM(DECODE(SHIFT_DATE_CODE,'BER',1,0)) OVER (PARTITION BY EMPLOYEE_ID) BEREAVEMENT_COUNT, 
  SUM(HOURS_BEREAVEMENT) OVER (PARTITION BY EMPLOYEE_ID) BEREAVEMENT_HOURS, 
  SUM(DECODE(SHIFT_DATE_CODE,'JUR',1,0)) OVER (PARTITION BY EMPLOYEE_ID) JURY_COUNT, 
  SUM(HOURS_JURY_DUTY) OVER (PARTITION BY EMPLOYEE_ID) JURY_HOURS, 
  SUM(DECODE(SHIFT_DATE_CODE,'ABS',1,0)) OVER (PARTITION BY EMPLOYEE_ID) ABSENT_COUNT, 
  SUM(DECODE(SHIFT_DATE_CODE,'EXC',1,0)) OVER (PARTITION BY EMPLOYEE_ID) EXCUSED_COUNT 
FROM 
  PAY_HOURS
WHERE 
  EMPLOYEE_ID= :1 
  AND SHIFT_DATE BETWEEN :2 AND :3 
  AND COALESCE(SHIFT_DATE_CODE,'ZZZ')<>'NOL' 
ORDER BY 
  COALESCE(SHIFT_DATE_CODE,'ZZZ'), 
  SHIFT_DATE
END OF STMT
PARSE #1301508512:c=0,e=42,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=2997266572,tim=4032143048001
BINDS #1301508512:
 Bind#0
  oacdty=96 mxl=32(06) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=1000010 frm=01 csi=178 siz=48 off=0
  kxsbbbfp=4d936d40  bln=32  avl=06  flg=05
  value="DUDE"
 Bind#1
  oacdty=12 mxl=07(07) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=1000010 frm=00 csi=178 siz=0 off=32
  kxsbbbfp=4d936d60  bln=07  avl=07  flg=01
  value="10/19/2012 0:0:0"
 Bind#2
  oacdty=12 mxl=07(07) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=1000010 frm=00 csi=178 siz=0 off=40
  kxsbbbfp=4d936d68  bln=07  avl=07  flg=01
  value="10/19/2012 0:0:0"
EXEC #1301508512:c=0,e=129,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=2997266572,tim=4032143048323
WAIT #1301508512: nam='SQL*Net message to client' ela= 2 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=4032143048369
FETCH #1301508512:c=0,e=33,p=0,cr=3,cu=0,mis=0,r=0,dep=0,og=1,plh=2997266572,tim=4032143048448
STAT #1301508512 id=1 cnt=0 pid=0 pos=1 obj=0 op='SORT ORDER BY (cr=3 pr=0 pw=0 time=43 us cost=6 size=50 card=1)'
STAT #1301508512 id=2 cnt=0 pid=1 pos=1 obj=0 op='WINDOW BUFFER (cr=3 pr=0 pw=0 time=34 us cost=6 size=50 card=1)'
STAT #1301508512 id=3 cnt=0 pid=2 pos=1 obj=0 op='FILTER  (cr=3 pr=0 pw=0 time=21 us)'
STAT #1301508512 id=4 cnt=0 pid=3 pos=1 obj=70154 op='TABLE ACCESS BY INDEX ROWID PAY_HOURS (cr=3 pr=0 pw=0 time=20 us cost=4 size=50 card=1)'
STAT #1301508512 id=5 cnt=0 pid=4 pos=1 obj=71786 op='INDEX RANGE SCAN SYS_C0022057 (cr=3 pr=0 pw=0 time=20 us cost=3 size=0 card=1)'
WAIT #1301508512: nam='SQL*Net message from client' ela= 1448 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=4032143050020
PARSE #1301509800:c=0,e=20,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=266456858,tim=4032143050081
WAIT #1301509800: nam='SQL*Net message to client' ela= 2 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=4032143050142

*** 2012-10-19 09:43:33.522
WAIT #1301509800: nam='SQL*Net message from client' ela= 5761077 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=4032148811261  

Great, I will take a closer look at the part of my program that executes that SQL statement.  But wait, someone famous once said: “Insanity is repeating the same mistakes and expecting different results.”  Let’s run my program again with a 10046 trace at level 12 enabled.  This time, I found that the following was the last SQL statement executed by my program before the crash:

PARSING IN CURSOR #1078422680 len=1257 dep=0 uid=172 oct=3 lid=172 tim=4031964081030 hv=4064672182 ad='3a68828c8' sqlid='4cj96t7t4bydq'
SELECT 
  LT.TRANSACTION_ID, 
  WO.PART_ID, 
  LT.WORKORDER_BASE_ID, 
  LT.WORKORDER_LOT_ID, 
  LT.WORKORDER_SUB_ID, 
  LT.OPERATION_SEQ_NO, 
  LT.TYPE, 
  LT.INDIRECT_ID, 
  LT.RESOURCE_ID, 
  LT.CLOCK_IN, 
  NVL2(LT.HOURS_WORKED,LT.CLOCK_OUT,NULL) CLOCK_OUT, 
  NVL(LT.HOURS_WORKED,ROUND((SYSDATE-LT.CLOCK_IN)*24,2)) HOURS_WORKED, 
  LT.GOOD_QTY, 
  COUNT(*) OVER (PARTITION BY LT.EMPLOYEE_ID) LABOR_TICKETS, 
  SUM(NVL(LT.HOURS_WORKED,ROUND((SYSDATE-LT.CLOCK_IN)*24,2))) OVER (PARTITION BY LT.EMPLOYEE_ID) TOTAL_HOURS, 
  SUM(DECODE(LT.TYPE,'I',NVL(LT.HOURS_WORKED,ROUND((SYSDATE-LT.CLOCK_IN)*24,2)),0)) OVER (PARTITION BY LT.EMPLOYEE_ID) TOTAL_INDIRECT_HOURS, 
  SUM(DECODE(LT.TYPE,'S',NVL(LT.HOURS_WORKED,ROUND((SYSDATE-LT.CLOCK_IN)*24,2)),0)) OVER (PARTITION BY LT.EMPLOYEE_ID) TOTAL_SETUP_HOURS, 
  SUM(DECODE(LT.TYPE,'R',NVL(LT.HOURS_WORKED,ROUND((SYSDATE-LT.CLOCK_IN)*24,2)),0)) OVER (PARTITION BY LT.EMPLOYEE_ID) TOTAL_RUN_HOURS 
FROM 
  LABOR_TICKET LT, 
  WORK_ORDER WO 
WHERE 
  LT.EMPLOYEE_ID= :1 
  AND LT.SHIFT_DATE BETWEEN :2 AND :3 
  AND WO.TYPE(+)='W' 
  AND WO.SUB_ID(+)='0' 
  AND LT.WORKORDER_TYPE=WO.TYPE(+) 
  AND LT.WORKORDER_BASE_ID=WO.BASE_ID(+) 
  AND LT.WORKORDER_LOT_ID=WO.LOT_ID(+) 
  AND LT.WORKORDER_SPLIT_ID=WO.SPLIT_ID(+)
END OF STMT
PARSE #1078422680:c=0,e=51,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3199062715,tim=4031964081030
BINDS #1078422680:
 Bind#0
  oacdty=96 mxl=32(06) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=1000010 frm=01 csi=178 siz=48 off=0
  kxsbbbfp=40476838  bln=32  avl=06  flg=05
  value="DUDE"
 Bind#1
  oacdty=12 mxl=07(07) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=1000010 frm=00 csi=178 siz=0 off=32
  kxsbbbfp=40476858  bln=07  avl=07  flg=01
  value="10/19/2012 0:0:0"
 Bind#2
  oacdty=12 mxl=07(07) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=1000010 frm=00 csi=178 siz=0 off=40
  kxsbbbfp=40476860  bln=07  avl=07  flg=01
  value="10/19/2012 0:0:0"
EXEC #1078422680:c=0,e=188,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=4162394183,tim=4031964081369
WAIT #1078422680: nam='SQL*Net message to client' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=4031964081423
FETCH #1078422680:c=0,e=79,p=0,cr=4,cu=0,mis=0,r=1,dep=0,og=1,plh=4162394183,tim=4031964081539
STAT #1078422680 id=1 cnt=1 pid=0 pos=1 obj=0 op='WINDOW BUFFER (cr=4 pr=0 pw=0 time=81 us cost=9 size=101 card=1)'
STAT #1078422680 id=2 cnt=1 pid=1 pos=1 obj=0 op='FILTER  (cr=4 pr=0 pw=0 time=35 us)'
STAT #1078422680 id=3 cnt=1 pid=2 pos=1 obj=0 op='NESTED LOOPS OUTER (cr=4 pr=0 pw=0 time=33 us cost=8 size=101 card=1)'
STAT #1078422680 id=4 cnt=1 pid=3 pos=1 obj=69863 op='TABLE ACCESS BY INDEX ROWID LABOR_TICKET (cr=4 pr=0 pw=0 time=28 us cost=7 size=73 card=1)'
STAT #1078422680 id=5 cnt=1 pid=4 pos=1 obj=70966 op='INDEX RANGE SCAN IND_LT_SHIFT_DATE (cr=3 pr=0 pw=0 time=21 us cost=6 size=0 card=1)'
STAT #1078422680 id=6 cnt=0 pid=3 pos=2 obj=70072 op='TABLE ACCESS BY INDEX ROWID WORK_ORDER (cr=0 pr=0 pw=0 time=3 us cost=1 size=28 card=1)'
STAT #1078422680 id=7 cnt=0 pid=6 pos=1 obj=71722 op='INDEX UNIQUE SCAN SYS_C0021989 (cr=0 pr=0 pw=0 time=2 us cost=0 size=0 card=1)'

*** 2012-10-19 09:40:38.424
WAIT #1078422680: nam='SQL*Net message from client' ela= 9641490 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=4031973723245 

A different SQL statement was found in the trace file when my program crashed… do the SQL statements have anything in common?  Both SQL statements are using analytic functions, and both are using date-type bind variables.  Interesting… what was that saying about insanity?  Let’s try another test of my program with an enabled 10046 trace to see the last SQL statement found in the trace before the crash:

PARSING IN CURSOR #881825936 len=210 dep=1 uid=0 oct=3 lid=0 tim=4030801312382 hv=864012087 ad='3eddb9a08' sqlid='96g93hntrzjtr'
select /*+ rule */ bucket_cnt, row_cnt, cache_cnt, null_cnt, timestamp#, sample_size, minimum, maximum, distcnt, lowval, hival, density, col#, spare1, spare2, avgcln from hist_head$ where obj#=:1 and intcol#=:2
END OF STMT
PARSE #881825936:c=0,e=46,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=3,plh=2239883476,tim=4030801312381
BINDS #881825936:
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=0001 frm=00 csi=00 siz=48 off=0
  kxsbbbfp=348f64a0  bln=22  avl=04  flg=05
  value=166995
 Bind#1
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=0001 frm=00 csi=00 siz=0 off=24
  kxsbbbfp=348f64b8  bln=22  avl=02  flg=01
  value=2
EXEC #881825936:c=0,e=95,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=3,plh=2239883476,tim=4030801312551
WAIT #881825936: nam='db file sequential read' ela= 6526 file#=1 block#=98252 blocks=1 obj#=427 tim=4030801319112
WAIT #881825936: nam='db file sequential read' ela= 759 file#=1 block#=100271 blocks=1 obj#=425 tim=4030801319923
FETCH #881825936:c=0,e=7396,p=2,cr=3,cu=0,mis=0,r=1,dep=1,og=3,plh=2239883476,tim=4030801319963
STAT #881825936 id=1 cnt=1 pid=0 pos=1 obj=425 op='TABLE ACCESS BY INDEX ROWID HIST_HEAD$ (cr=3 pr=2 pw=0 time=7394 us)'
STAT #881825936 id=2 cnt=1 pid=1 pos=1 obj=427 op='INDEX RANGE SCAN I_HH_OBJ#_INTCOL# (cr=2 pr=1 pw=0 time=6583 us)'
CLOSE #881825936:c=0,e=46,dep=1,type=3,tim=4030801320046
...
BINDS #881825936:
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=0001 frm=00 csi=00 siz=48 off=0
  kxsbbbfp=348f64a0  bln=22  avl=04  flg=05
  value=166995
 Bind#1
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=0001 frm=00 csi=00 siz=0 off=24
  kxsbbbfp=348f64b8  bln=22  avl=02  flg=01
  value=11
EXEC #881825936:c=0,e=90,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=3,plh=2239883476,tim=4030801321733
FETCH #881825936:c=0,e=19,p=0,cr=3,cu=0,mis=0,r=1,dep=1,og=3,plh=2239883476,tim=4030801321767
CLOSE #881825936:c=0,e=10,dep=1,type=3,tim=4030801321802
BINDS #881825936:
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=0001 frm=00 csi=00 siz=48 off=0
  kxsbbbfp=348f64a0  bln=22  avl=04  flg=05
  value=166995
 Bind#1
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=0001 frm=00 csi=00 siz=0 off=24
  kxsbbbfp=348f64b8  bln=22  avl=02  flg=01
  value=33
EXEC #881825936:c=0,e=152,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=3,plh=2239883476,tim=4030801321992
FETCH #881825936:c=0,e=19,p=0,cr=3,cu=0,mis=0,r=1,dep=1,og=3,plh=2239883476,tim=4030801322027
CLOSE #881825936:c=0,e=10,dep=1,type=3,tim=4030801322062
BINDS #881825936:
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=0001 frm=00 csi=00 siz=48 off=0
  kxsbbbfp=348f64a0  bln=22  avl=04  flg=05
  value=166995
 Bind#1
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=0001 frm=00 csi=00 siz=0 off=24
  kxsbbbfp=348f64b8  bln=22  avl=02  flg=01
  value=35
EXEC #881825936:c=0,e=89,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=3,plh=2239883476,tim=4030801322195
FETCH #881825936:c=0,e=18,p=0,cr=3,cu=0,mis=0,r=1,dep=1,og=3,plh=2239883476,tim=4030801322229
CLOSE #881825936:c=0,e=10,dep=1,type=3,tim=4030801322265
WAIT #881751720: nam='SQL*Net message to client' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=4030801323194

*** 2012-10-19 09:21:14.806
WAIT #881751720: nam='SQL*Net message from client' ela= 8851219 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=4030810174461 

You might have noticed that the trace file showed dep=1 in the PARSING IN CURSOR line, which indicates that the SQL statement was executed recursively by Oracle itself, outside of my program’s control – likely while attempting to optimize another SQL statement that was submitted by my program.

Let’s test that insanity theory again.  Another execution of my program with an enabled 10046 trace at level 12 showed this as the last SQL statement executed:

PARSING IN CURSOR #767713824 len=503 dep=0 uid=172 oct=3 lid=172 tim=4026166358219 hv=2534735629 ad='3b58041b0' sqlid='7x2t5dqbj9zsd'
SELECT 
  N.ID NCMT_ID, 
  N.NCMT_DATE, 
  N.PART_ID, 
  N.WORKORDER_BASE_ID, 
  N.WORKORDER_LOT_ID, 
  N.WORKORDER_PIECE_NO, 
  N.QTY, 
  N.DISCREPANCY_TYPE, 
  N.RESOURCE_ID, 
  N.CUSTOMER_ID, 
  NVL(N.TOTAL_COST,0) NCMT_COST, 
  COUNT(N.ID) OVER (PARTITION BY N.EMPLOYEE_ID) NCMT_COUNT, 
  SUM(NVL(N.TOTAL_COST,0)) OVER (PARTITION BY N.EMPLOYEE_ID) TOTAL_COST 
FROM 
  NON_CONFORMING N 
WHERE 
  N.EMPLOYEE_ID= :1 
  AND N.NCMT_DATE BETWEEN :2 AND :3 
  AND UPPER(N.ERROR_BY)='OPERATOR' 
ORDER BY 
  ID
END OF STMT
PARSE #767713824:c=0,e=41,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3960464466,tim=4026166358219
BINDS #767713824:
 Bind#0
  oacdty=96 mxl=32(06) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=1000010 frm=01 csi=178 siz=48 off=0
  kxsbbbfp=2dc25d58  bln=32  avl=06  flg=05
  value="DUDE"
 Bind#1
  oacdty=12 mxl=07(07) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=1000010 frm=00 csi=178 siz=0 off=32
  kxsbbbfp=2dc25d78  bln=07  avl=07  flg=01
  value="10/19/2012 0:0:0"
 Bind#2
  oacdty=12 mxl=07(07) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=1000010 frm=00 csi=178 siz=0 off=40
  kxsbbbfp=2dc25d80  bln=07  avl=07  flg=01
  value="10/19/2012 0:0:0"
...
PARSE #767713824:c=0,e=19,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3960464466,tim=4026237045440
BINDS #767713824:
 Bind#0
  oacdty=96 mxl=32(07) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=1000010 frm=01 csi=178 siz=48 off=0
  kxsbbbfp=2dc21f50  bln=32  avl=07  flg=05
  value="DUDE2"
 Bind#1
  oacdty=12 mxl=07(07) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=1000010 frm=00 csi=178 siz=0 off=32
  kxsbbbfp=2dc21f70  bln=07  avl=07  flg=01
  value="6/22/2012 0:0:0"
 Bind#2
  oacdty=12 mxl=07(07) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=1000010 frm=00 csi=178 siz=0 off=40
  kxsbbbfp=2dc21f78  bln=07  avl=07  flg=01
  value="10/19/2012 0:0:0"
EXEC #767713824:c=0,e=125,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3960464466,tim=4026237045593
WAIT #767713824: nam='SQL*Net message to client' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=71274 tim=4026237045634
FETCH #767713824:c=0,e=57,p=0,cr=10,cu=0,mis=0,r=0,dep=0,og=1,plh=3960464466,tim=4026237045727
WAIT #767713824: nam='SQL*Net message from client' ela= 3080 driver id=1413697536 #bytes=1 p3=0 obj#=71274 tim=4026237048857
CLOSE #767713824:c=0,e=7,dep=0,type=3,tim=4026237048899
PARSE #767713824:c=0,e=18,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3960464466,tim=4026237048951
WAIT #767713824: nam='SQL*Net message to client' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=71274 tim=4026237051177

*** 2012-10-19 08:06:40.151
WAIT #767713824: nam='SQL*Net message from client' ela= 98732762 driver id=1413697536 #bytes=1 p3=0 obj#=71274 tim=4026335783988 

Well, that SQL statement also used date-type bind variables and analytic functions… maybe there is a problem with Oracle Database 11.2.0.2?  But, why is this suddenly a problem… what has changed?  Nearly two years ago I wrote a blog article that described an ODBC update problem that was present in the Oracle 11.2.0.1 Client – the first Oracle Client version that could be easily downloaded and installed on Windows 7 (it is also possible to patch the downloadable 10.2.0.3/4 client to 10.2.0.5).  The Oracle 11.2.0.1 patch 7 (10155837 for 32 bit Windows clients) fixes the false ODBC lock violation errors that are reported in Microsoft Access (and various other programs) when UPDATE SQL statements are executed, and those SQL statements contain bind variables.  That same Oracle Client version and patch version are still installed on the computer (and all other computers here that run Windows 7), so that is not the source of the change.  On a side note, have you found it difficult to locate updated 11.2.0.2 or 11.2.0.3 client versions in MOS (Metalink)?

So, the 10046 trace failed to find a consistent source for the crashes, and the Oracle client has not been changed on the Windows 7 computer in nearly two years.  Program bug?  Analytic query bug?  Bind variable related bug?  I went digging for the sqlnet.log file to see if that file provided any clues.  This is what I found:

Fri Oct 19 13:26:52 2012
Directory does not exist for read/write [C:\Oracle\product\11.2.0\client_1\log] [C:\Oracle\product\11.2.0\client_1\log\diag\clients]
Errors in file c:\users\hooper\oracle\oradiag_hooper\diag\clients\user_hooper\host_1131635621_76\trace\ora_824_8864.trc  (incident=401):
oci-24550 [3221225477] [Unhandled exception: Code=c0000005 Flags=0
] [] [] [] [] [] [] [] [] [] []
Incident details in: c:\users\hooper\oracle\oradiag_hooper\diag\clients\user_hooper\host_1131635621_76\incident\incdir_401\ora_824_8864_i401.trc
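
As an aside, the “Directory does not exist for read/write” line above suggests that the client-side diagnostic destination is adjustable.  With an 11g client, the ADR location that receives these trace and incident files can be set in sqlnet.ora – a minimal sketch, with a hypothetical base path:

# sqlnet.ora - client-side diagnostic (ADR) settings; the path is hypothetical
DIAG_ADR_ENABLED = ON
ADR_BASE = C:\Oracle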

That file’s contents proved to be a bit helpful, I think.  Let’s take a look at the contents of the file mentioned above:

Dump file c:\users\hooper\oracle\oradiag_hooper\diag\clients\user_hooper\host_1131635621_76\incident\incdir_401\ora_824_8864_i401.trc
Dump continued from file: c:\users\hooper\oracle\oradiag_hooper\diag\clients\user_hooper\host_1131635621_76\trace\ora_824_8864.trc
oci-24550 [3221225477] [Unhandled exception: Code=c0000005 Flags=0
] [] [] [] [] [] [] [] [] [] []
========= Dump for incident 401 (oci 24550 [3221225477]) ========
Tracing is in restricted mode!
----- Short Call Stack Trace -----
_dbgexPhaseII()+850<-_dbgexProcessError()+2061<-_dbgeExecuteForError()+43<-_dbgePostErrorDirect()+2368<-_kpeDbgSignalHandler()+225<-_skgesig_Win_UnhandledExceptionFilter()+140<-75D6003D<-7295FA2E<-0040AD0D<-77309EF0<-77309EC0<-00000000<-773079DD<-77306E72<-772FE0ED<-275C83DD<-275D219F<-275D1D88<-275D0E0E<-2759C677<-004CB035<-004DC4A3<-72991D31<-00410FD8<-7299202F----- START Event Driven Actions Dump ----
---- END Event Driven Actions Dump ----
----- START DDE Actions Dump -----
Executing SYNC actions
Executing ASYNC actions
----- END DDE Actions Dump (total 0 csec) -----
End of Incident Dump

Might the above indicate that my program logic is not the cause of the problem, but instead that it is the Oracle client that is crashing?  I started digging through various websites looking for a similar oci-24550 error message, but I found nothing that was helpful.  I searched through the bug database and technical article database on MOS (Metalink), and found nothing that was helpful.  By chance, I widened the search on MOS a bit and stumbled upon Mark Powell’s message in this Oracle Support Community thread (you need a MOS account to access the message).  That message pointed to two other MOS articles, at least one of which stated that the problem is (possibly – not exactly the same error message) corrected when the Oracle 11.2.0.3 Client is installed.  I struggled finding the 11.2.0.3 Client in MOS, and then found that it is located on disk 3 of 6 of the 11.2.0.3 server download (direct link – just specify the operating system version and download disk 3 – p10404530_112030_platform_3of6.zip, which requires a MOS account to download).  The 11.2.0.3 client version magically corrected the seemingly random crashes in my program – programmer error averted.

Insanity?  No more so than an expensive security camera that fails at motion detection, or a vehicle that decelerates when the driver commands the vehicle to accelerate during a light rain.  ;-)

P.S.: Sorry about the formatting of the blog article.  A WordPress update apparently disabled my custom CSS that expanded the article to the full width of the browser window – I am still in search of a fix for that problem.





Undo Quiz – Bringing the Right Tools for the Job

17 09 2012

September 17, 2012

Sometimes humor is a good recipe for undo.  A mid-50s man was driving down the road, riding a purple mule, when he saw a house with a new window.  Uninvited, the man decided to take a closer look using an obvious tool of choice, a tire iron (YPDNTI).  Elapsed time, roughly 60 seconds.  The man is obviously familiar with Oracle Database’s concept of transactions and issuing a ROLLBACK, but did not account for video recording in several autonomous transactions.  Even though a COMMIT was not issued, that does not imply that nothing happened in those 60 seconds.

Riding a purple mule… strange.

On to the quiz.  Assume that there are three and only three users connected to an Oracle database (preferably a test database of sorts).  The first session is connected as the SYS user, and the other two sessions are connected as a normal database user (the same user ID, but with access permissions for DBA_SOURCE).  Assume that session 1 creates a new undo tablespace that cannot grow beyond 1MB in size:

CREATE UNDO TABLESPACE SMALL_UNDO2 DATAFILE 'C:\Oracle\OraData\SAMPLE\small_undo2.dbf' SIZE 1M AUTOEXTEND OFF; 

And then that tablespace is set as the system default tablespace:

ALTER SYSTEM SET UNDO_TABLESPACE=SMALL_UNDO2;
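
The change may be verified from session 1 with a quick SQL*Plus check:

SHOW PARAMETER UNDO_TABLESPACE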

In session 2, table T10 is created based on DBA_SOURCE:

CREATE TABLE T10 AS
SELECT
  *
FROM
  DBA_SOURCE;

In my case, I used the Oracle sample schema in Oracle Database 11.2.0.3 (tablespace for T10 is ASSM AUTOALLOCATE).  Let’s check the amount of space consumed (not necessarily all used for data storage yet) for the T10 table:

SELECT
  ROUND(SUM(BYTES)/1024/1024,2) SIZE_MB
FROM
  USER_EXTENTS
WHERE
  SEGMENT_NAME='T10';

   SIZE_MB
----------
        88 

88MB – a nice even number.  With a 1MB maximum size for the undo tablespace, let’s see if the user is able to select all of the rows from the T10 table:

SET ARRAYSIZE 1000
SET AUTOTRACE TRACEONLY STATISTICS

SELECT
  *
FROM
  T10;

633054 rows selected.

Statistics
---------------------------------------------------
         11  recursive calls
          0  db block gets
      11352  consistent gets
      10635  physical reads
          0  redo size
   66576339  bytes sent via SQL*Net to client
       7322  bytes received via SQL*Net from client
        635  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
     633054  rows processed 

That worked, but why?  Why didn’t the session receive an error when performing the 11,352 consistent gets to provide a consistent view of the data?

Maybe on the second execution we will see an error?

SELECT
  *
FROM
  T10;

633054 rows selected.

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      11271  consistent gets
          0  physical reads
          0  redo size
   66576339  bytes sent via SQL*Net to client
       7322  bytes received via SQL*Net from client
        635  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
     633054  rows processed

—–

The first quiz question is simple: How is it possible for Oracle Database to provide a read consistent view of a table that requires approximately 88MB of storage space when the file supporting the undo tablespace cannot exceed 1MB?  Maybe someone will be kind enough to share the answer with the OP in this message thread:
dbaforums.org/oracle/index.php?s=439a689327b6fbd09ba0018b320041d8&showtopic=21972

The second quiz question is also simple: Why did the first select of the table require 81 more consistent gets than the second select from that table?

—–

Continuing with the test case, this time working in session 3.  Let’s delete some rows from the table created in session 2:

DELETE FROM
  T10
WHERE
  ROWNUM<=300000; 

Oracle Database responds with the following message:

SQL> DELETE FROM
  2    T10
  3  WHERE
  4    ROWNUM<=300000;
  T10
  *
ERROR at line 2:
ORA-30036: unable to extend segment by 8 in undo tablespace 'SMALL_UNDO2'

The third question is slightly more challenging: Why is Oracle attempting to extend a segment when the session is deleting rows; also, what does the number 8 signify?

Let’s try again with a smaller number of rows:

DELETE FROM
  T10
WHERE
  ROWNUM<=30;

30 rows deleted. 

That worked.  Interestingly, the first time I tried the test case, I named the tablespace SMALL_UNDO and specified “RETENTION GUARANTEE” when creating the undo tablespace, and the following message appeared when executing the above command:

SQL> DELETE FROM
  2    T10
  3  WHERE
  4    ROWNUM<=30;
  T10
  *
ERROR at line 2:
ORA-30036: unable to extend segment by 8 in undo tablespace 'SMALL_UNDO'

After the above message appeared, session 3 successfully deleted 3 rows, 30 rows, and then 300 rows from table T10 without receiving an error:

SQL> DELETE FROM
  2    T10
  3  WHERE
  4    ROWNUM<=3;

3 rows deleted.

SQL> DELETE FROM
  2    T10
  3  WHERE
  4    ROWNUM<=30;

30 rows deleted.

SQL> DELETE FROM
  2    T10
  3  WHERE
  4    ROWNUM<=300;

300 rows deleted. 

The fourth question is again slightly more challenging: why was the 30 row delete from table T10 successful when the SMALL_UNDO2 undo tablespace was the default undo tablespace, yet that delete failed when the SMALL_UNDO tablespace was set as the default undo tablespace – until 3 rows were first deleted from the table?

Going back to the forum thread, another interesting question arises.  The fifth question is again challenging: What type of compression is used in undo logs?

I was curious about the last question.  In session 1 I executed the following:

SELECT
  R.NAME,
  T.XIDUSN,
  T.XIDSLOT,
  T.XIDSQN
FROM
  V$TRANSACTION T,
  V$ROLLNAME R
WHERE
  T.XIDUSN=R.USN; 

NAME                               XIDUSN    XIDSLOT     XIDSQN
------------------------------ ---------- ---------- ----------
_SYSSMU15_897375467$                   15          1          8

ALTER SESSION SET TRACEFILE_IDENTIFIER='UNDO_BLOCKS';
ALTER SYSTEM DUMP UNDO BLOCK '_SYSSMU15_897375467$' XID 15 1 8;
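
(On Oracle Database 11g, the name of the trace file that receives the dump can be retrieved in the same session with a query similar to the following:)

SELECT
  VALUE
FROM
  V$DIAG_INFO
WHERE
  NAME='Default Trace File';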

A portion of the generated trace file follows:

********************************************************************************
Undo Segment:  _SYSSMU15_897375467$ (15)
xid: 0x000f.001.00000008
Low Blk   :   (0, 0) 
High Blk  :   (2, 7) 
Object Id :   ALL 
Layer     :   ALL 
Opcode    :   ALL 
Level     :   2 

********************************************************************************
UNDO BLK:  Extent: 1   Block: 1   dba (file#, block#): 6,0x00000071
xid: 0x000f.001.00000008  seq: 0x4   cnt: 0x2c  irb: 0xa   icl: 0x0   flg: 0x0000

 Rec Offset      Rec Offset      Rec Offset      Rec Offset      Rec Offset
---------------------------------------------------------------------------
0x01 0x1f24     0x02 0x1e64     0x03 0x1dbc     0x04 0x1d34     0x05 0x1c68     
0x06 0x1b9c     0x07 0x1ae4     0x08 0x1a5c     0x09 0x1998     0x0a 0x18d8     
0x0b 0x1830     0x0c 0x17a8     0x0d 0x16f0     0x0e 0x162c     0x0f 0x15a4     
0x10 0x14ec     0x11 0x1424     0x12 0x1378     0x13 0x12f0     0x14 0x1234     
0x15 0x1168     0x16 0x10b8     0x17 0x1030     0x18 0x0f74     0x19 0x0ecc     
0x1a 0x0e44     0x1b 0x0d88     0x1c 0x0cbc     0x1d 0x0be8     0x1e 0x0b44     
0x1f 0x0a88     0x20 0x09bc     0x21 0x08e8     0x22 0x0840     0x23 0x0784     
0x24 0x06b8     0x25 0x05e4     0x26 0x0540     0x27 0x0484     0x28 0x03b8     
0x29 0x02e4     0x2a 0x023c     0x2b 0x0180     0x2c 0x00b4     

*-----------------------------
* Rec #0xa  slt: 0x01  objn: 79928(0x00013838)  objd: 79928  tblspc: 4(0x00000004)
*       Layer:  11 (Row)   opc: 1   rci 0x09   
Undo type:  Regular undo   Last buffer split:  No 
Temp Object:  No 
Tablespace Undo:  No 
rdba: 0x00000000
*-----------------------------
KDO undo record:
KTB Redo 
op: 0x02  ver: 0x01  
compat bit: 4 (post-11) padding: 0
op: C  uba: 0x01800071.0004.09
KDO Op code: IRP row dependencies Disabled
  xtype: XA flags: 0x00000000  bdba: 0x01091a2e  hdba: 0x01091a2a
itli: 2  ispac: 0  maxfr: 4858
tabn: 0 slot: 16(0x10) size/delt: 89
fb: --H-FL-- lb: 0x0  cc: 5
null: -----
col  0: [ 3]  53 59 53
col  1: [ 8]  53 54 41 4e 44 41 52 44
col  2: [ 7]  50 41 43 4b 41 47 45
col  3: [ 3]  c2 04 1f
col  4: [60]
 20 20 20 20 70 72 61 67 6d 61 20 42 55 49 4c 54 49 4e 28 27 58 4f 52 27 2c
 38 2c 20 33 2c 20 39 29 3b 20 2d 2d 20 50 45 4d 53 5f 49 4e 54 45 47 45 52
 2c 20 49 4e 54 5f 58 4f 52 0a

*-----------------------------
* Rec #0x9  slt: 0x01  objn: 79928(0x00013838)  objd: 79928  tblspc: 4(0x00000004)
*       Layer:  11 (Row)   opc: 1   rci 0x08   
Undo type:  Regular undo   Last buffer split:  No 
Temp Object:  No 
Tablespace Undo:  No 
rdba: 0x00000000
*-----------------------------
KDO undo record:
KTB Redo 
op: 0x02  ver: 0x01  
compat bit: 4 (post-11) padding: 0
op: C  uba: 0x01800071.0004.08
KDO Op code: IRP row dependencies Disabled
  xtype: XA flags: 0x00000000  bdba: 0x01091a2e  hdba: 0x01091a2a
itli: 2  ispac: 0  maxfr: 4858
tabn: 0 slot: 15(0xf) size/delt: 90
fb: --H-FL-- lb: 0x0  cc: 5
null: -----
col  0: [ 3]  53 59 53
col  1: [ 8]  53 54 41 4e 44 41 52 44
col  2: [ 7]  50 41 43 4b 41 47 45
col  3: [ 3]  c2 04 1e
col  4: [61]
 20 20 66 75 6e 63 74 69 6f 6e 20 58 4f 52 20 28 4c 45 46 54 20 42 4f 4f 4c
 45 41 4e 2c 20 52 49 47 48 54 20 42 4f 4f 4c 45 41 4e 29 20 72 65 74 75 72
 6e 20 42 4f 4f 4c 45 41 4e 3b 0a

*-----------------------------
* Rec #0x8  slt: 0x01  objn: 79928(0x00013838)  objd: 79928  tblspc: 4(0x00000004)
*       Layer:  11 (Row)   opc: 1   rci 0x07   
Undo type:  Regular undo   Last buffer split:  No 
Temp Object:  No 
Tablespace Undo:  No 
rdba: 0x00000000
*-----------------------------
KDO undo record:
KTB Redo 
op: 0x02  ver: 0x01  
compat bit: 4 (post-11) padding: 0
op: C  uba: 0x01800071.0004.07
KDO Op code: IRP row dependencies Disabled
  xtype: XA flags: 0x00000000  bdba: 0x01091a2e  hdba: 0x01091a2a
itli: 2  ispac: 0  maxfr: 4858
tabn: 0 slot: 14(0xe) size/delt: 30
fb: --H-FL-- lb: 0x0  cc: 5
null: -----
col  0: [ 3]  53 59 53
col  1: [ 8]  53 54 41 4e 44 41 52 44
col  2: [ 7]  50 41 43 4b 41 47 45
col  3: [ 3]  c2 04 1d
col  4: [ 1]  0a

*-----------------------------
* Rec #0x7  slt: 0x01  objn: 79928(0x00013838)  objd: 79928  tblspc: 4(0x00000004)
*       Layer:  11 (Row)   opc: 1   rci 0x06   
Undo type:  Regular undo   Last buffer split:  No 
Temp Object:  No 
Tablespace Undo:  No 
rdba: 0x00000000
*-----------------------------
KDO undo record:
KTB Redo 
op: 0x02  ver: 0x01  
compat bit: 4 (post-11) padding: 0
op: C  uba: 0x01800071.0004.06
KDO Op code: IRP row dependencies Disabled
  xtype: XA flags: 0x00000000  bdba: 0x01091a2e  hdba: 0x01091a2a
itli: 2  ispac: 0  maxfr: 4858
tabn: 0 slot: 13(0xd) size/delt: 78
fb: --H-FL-- lb: 0x0  cc: 5
null: -----
col  0: [ 3]  53 59 53
col  1: [ 8]  53 54 41 4e 44 41 52 44
col  2: [ 7]  50 41 43 4b 41 47 45
col  3: [ 3]  c2 04 1c
col  4: [49]
 20 20 2d 2d 20 20 77 68 65 6e 20 74 68 65 79 20 6f 63 63 75 72 20 69 6e 20
 63 6f 6e 64 69 74 69 6f 6e 61 6c 20 73 74 61 74 65 6d 65 6e 74 73 2e 0a 

It would not surprise me too much if a guide existed to assist in decompressing the undo logs (whatever that term implies).





On the Topic of Programming 3

3 09 2012

September 3, 2012

(Back to the Previous Post in the Series)

In the previous article in this series, a question was posed regarding how many SQL statements would be required to generate the following report, which displays in Microsoft Excel:

A significant number of items regarding the report require explanation:

  • Employee IDs and employee names appear in column A (the left-most column).  There could be many employees, but only those employees reporting labor transactions for machining operations in production batches (work order lots) that completed during one of the weeks should be listed.
  • Part IDs, part descriptions, and operations (Sub n OP nn in the above example report) are listed in column B (the second column from the left).  The number of operations listed for an employee could range from one to possibly 50 (or more, depending on the date range of the report), but that list will only include machining type operations (not welding, blasting, painting, laser cutting, etc.).
  • Total hours are reported in column C.  The second output row for an employee shows the sum of the numbers directly below for the employee.  The numbers on the 3rd, 6th, 9th, etc. rows for the employee in column C are the sum of the hours the employee worked in the operation for all manufacturing batches that completed in the report time period.
  • The first row for each employee, in columns E through H in the above report (which could very well extend beyond column H, depending on the number of weeks selected), lists the Monday through Sunday dates for each week.
  • The second row for each employee, in columns E through H in the above report, shows the weighted average efficiency for the employee in that week.  The weighting is based on the percentage of hours the employee spent in the operations, for those operations in batches that completed during the week.  This number of hours is calculated, but not printed in the report (a small worked example follows this list).
  • The numbers on the 4th, 7th, 10th, etc. rows for each employee, in columns E through H, show the average number of hours the operation required, for those batches that completed during the week.  If an employee did not work in the operation during one week, but did during another, then a blank should appear in the week that the employee did not work in the operation.  All employees that worked in the operation for the week will have the same average printed.  For example, Cary Grant and James Stewart both worked on WIDGET101 operation Sub 0 OP 30.  The average hours for the first week is 10.50, and that value appears for both employees.  During the second week, only James Stewart worked on that operation, and the average hours for that week decreased to 7.56.
  • The numbers on the 3rd, 6th, 9th, etc. rows for each employee show the ratio of the engineering standard hours divided by the average hours for the operation in that week.  The same values will appear for all employees reporting time to the operation in the week.
  • The percentage in column I is the simple average of the weighted averages to the left.  There may be cases where a blank will appear for one of the weighted averages for an employee, so it is not possible to simply sum the weighted averages and divide by the number of weeks.
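
For example (hypothetical numbers, purely to illustrate the weighting): if an employee’s hours in completed-batch operations for a week were 6 hours in operation A (efficiency 1.20) and 2 hours in operation B (efficiency 0.80), the weighted average efficiency for that week would be (6/8 * 1.20) + (2/8 * 0.80) = 0.90 + 0.20 = 1.10, displayed as 110%.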

This appeared to be a simple report when it was just a sketch on a sheet of paper!

How many SQL statement executions will be necessary to generate the above report?  Before answering that question, let’s take another look at how the data is organized and related between the tables:

OK, so the above diagram might be a little confusing.

The engineering standard hours are found by querying the OPERATION table with the WORKORDER_TYPE=’W’ and the WORKORDER_BASE_ID equal to the part ID that is of interest (the WORK_ORDER table also contains engineering standard information, accessed by searching for TYPE=’M’ and the BASE_ID equal to the part ID that is of interest).  In theory, there could be multiple engineering standards for the same part ID and operation, which is why we might need to join to the PART table to make certain that the current engineering standard is retrieved, but we will ignore that potential issue for now.  It would be a good idea to also check the WORK_ORDER table to obtain a distinct list of part IDs that had a close date in the time range that is of interest (there is no sense in forcing the database server to retrieve the engineering standards for parts that have not been produced in the last 17 years), so we will need to specify that the TYPE column in the WORK_ORDER table is equal to W and that the CLOSE_DATE is in the date range of the report.  Many operations are NOT machining operations, and we can only determine the type of operation by determining the type of machine that is selected for use, so we must also query the SHOP_RESOURCE table to determine if the operation is processed at a machining center.

The actual production hours are found by querying the LABOR_TICKET table with the WORKORDER_TYPE=’W’ (to eliminate spurious indirect labor) and joined to the SHOP_RESOURCE table to make certain that the operation is a machining type operation (as explained above).  To determine the employees’ names, the LABOR_TICKET table is joined to the EMPLOYEE table.  To determine the part ID that is produced in the labor ticket transaction, the LABOR_TICKET table must be joined to the WORK_ORDER table, with the TYPE column in the WORK_ORDER table set to ‘W’ and the SUB_ID column in the WORK_ORDER table set to ‘0’ so that the main part ID for the work order is returned rather than a subordinate part ID.  To retrieve the description of the part ID that is produced, the WORK_ORDER table must be joined to the PART table.

That sketch of a report on a sheet of paper sure looked simple, but the degree of difficulty is all relative (or relational).

How many query executions will be required to produce the report that will appear in Microsoft Excel?  Should we retrieve the entire database over the network and process the data client-side?  I don’t program in the Java programming language, but I am guessing that the majority of Java programmers would not attempt to retrieve the entire database.  Would the number of SQL statement executions depend on the number of employees?  Would the number of SQL statement executions depend on the number of different part operations whose batches closed in a particular week?  Would the number of SQL statement executions depend on the number of weeks included in the report?

In the words of the TV show “Name that Tune“, I can name that tune in 1 note.  Because there are many to many joins between the various tables, it is not possible to construct a single simple SQL statement that retrieves the correct result.  However, it is possible to construct a single complex query using inline views or subquery factoring (WITH blocks), along with analytic functions, to retrieve the correct, necessary information to the client for presentation in Microsoft Excel.  The correct answer to the above question must:

  • Minimize the amount of data flowing across the network.  SQL*Net compression may help, but of course that can only possibly help if more than one row is retrieved at a time.  Ideally, the array fetch size should be set to a large value (possibly 100 up to the maximum of 5,000) to make the best use of SQL*Net compression.
  • Minimize as best as possible the performance impact on the database server – the database server is a shared resource without unlimited capacity.  Setting a reasonably large array fetch size may help reduce the number of memory accesses (specifically consistent gets) on the database server.  Make certain that available indexes are usable by the query – be careful about using functions (such as TRUNC) on date type columns in the WHERE clause (see the sketch after this list).
  • When a query accesses many tables, and also when there are multiple inline views (or factored subqueries), the Oracle optimizer may not always find the optimal join order.  The _OPTIMIZER_MAX_PERMUTATIONS hidden parameter defaults to a value of 2,000, which of course restricts the number of possible join permutations attempted per query block in a query, so with many tables in a query block (more than 6, for instance), the limit for the maximum number of permutations may easily be hit.  Statistics must be reasonably accurate for cardinality estimates to be close to accurate, otherwise the query optimizer may select not only an inappropriate join order but also an inappropriate join type.  Histograms may help (or harm) the query optimizer, indexes may help (or harm), establishing default cardinality and cost for custom PL/SQL functions may help, and adding hints to help control the cardinality estimates or join orders may also help.
  • Lastly, and most importantly, the queries must provide correct information.
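
To illustrate the TRUNC warning in the list above, consider a hypothetical single-day report range against the WORK_ORDER table.  The first form applies a function to the column, which prevents the use of a plain index on CLOSE_DATE; the second, equivalent, form leaves the column bare so that an index remains usable:

-- Function applied to the column - a plain index on CLOSE_DATE cannot be used:
SELECT
  COUNT(*)
FROM
  WORK_ORDER
WHERE
  TRUNC(CLOSE_DATE)=TO_DATE('20-AUG-2012','DD-MON-YYYY');

-- Equivalent range predicate - index friendly:
SELECT
  COUNT(*)
FROM
  WORK_ORDER
WHERE
  CLOSE_DATE>=TO_DATE('20-AUG-2012','DD-MON-YYYY')
  AND CLOSE_DATE<TO_DATE('21-AUG-2012','DD-MON-YYYY');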

I struggled with this problem a bit.  The report is intended to measure efficiency, so it would be somewhat laughable if the report performed inefficiently.  How could I minimize execution time for this particular report?  If Oracle Database is licensed by CPU core (or CPU socket in the case of the Standard Edition), does it make sense to perform all of the computations on the database server, or does it make sense to off-load some of that processing to the client’s dual, quad, or eight core CPU?  If the report were rendered by a shared resource, such as a web server, application server, or Citrix server, would I want that same CPU offloading as I would have planned with a dedicated client computer?  The report shows the information in a (modified) pivot table format, should I use the PIVOT function that was introduced with Oracle Database 11.1; or maybe I should use client-side control break logic that was taught in one of my university-level structured programming classes years ago?

That sketch of a report on a sheet of paper sure looked simple, now I need a control break (there must be a simple one-key solution for that problem – check your keyboard :-) ).

It is possible to group the manufacturing lots by the Monday of the work week of the completion date (CLOSE_DATE) with the following function call:

NEXT_DAY(TRUNC(CLOSE_DATE),'MONDAY')-7
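
A quick sanity check of that expression (keep in mind that the ‘MONDAY’ day name passed to NEXT_DAY is NLS_DATE_LANGUAGE dependent; the output below assumes the default DD-MON-RR date format):

SELECT
  NEXT_DAY(TRUNC(TO_DATE('22-AUG-2012','DD-MON-YYYY')),'MONDAY')-7 WEEK_OF
FROM
  DUAL;

WEEK_OF
---------
20-AUG-12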

Looks fairly simple.  If I had a date range of four weeks in mind, I might be inclined to do something like this in the WHERE clause of the query:

CLOSE_DATE BETWEEN (NEXT_DAY(TO_DATE('20-AUG-2012','DD-MON-YYYY'),'MONDAY')-28) AND (NEXT_DAY(TO_DATE('20-AUG-2012','DD-MON-YYYY'),'MONDAY')-1)

However, I did not do that.  Why not?  One reason is that the Oracle query optimizer would likely have a hard time determining the approximate number of rows that would be returned from the table with the CLOSE_DATE column.  The NEXT_DAY(TO_DATE()) combination hides the actual date range of interest, so the optimizer falls back to using default cardinality percentages for that predicate.  Secondly, what happens when the user of the program is interested in something other than four weeks – the solution may be cumbersome to maintain.  Thirdly, the above did not make use of bind variables, so every time the date range is changed, the query optimizer will perform a hard parse of the query.  A better approach calculates the date range on the client-side and submits the WHERE clause like this:

CLOSE_DATE BETWEEN TO_DATE('30-JUL-2012','DD-MON-YYYY') AND TO_DATE('26-AUG-2012','DD-MON-YYYY')

Or, better yet using bind variables:

CLOSE_DATE BETWEEN :START_DATE AND :END_DATE
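
For a quick test of the bind variable version in SQL*Plus (which offers no DATE bind type), VARCHAR2 binds wrapped in TO_DATE are a reasonable stand-in – a sketch:

VARIABLE START_DATE VARCHAR2(11)
VARIABLE END_DATE VARCHAR2(11)

EXEC :START_DATE := '30-JUL-2012'
EXEC :END_DATE := '26-AUG-2012'

-- Sanity check the converted values (the report predicate would then read
-- CLOSE_DATE BETWEEN TO_DATE(:START_DATE,'DD-MON-YYYY') AND TO_DATE(:END_DATE,'DD-MON-YYYY')):
SELECT
  TO_DATE(:START_DATE,'DD-MON-YYYY') START_DATE,
  TO_DATE(:END_DATE,'DD-MON-YYYY') END_DATE
FROM
  DUAL;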

So, how many query executions will this report require?  Just one query, executed a single time.  The particular programming environment that I used requires ? to appear in bind variable positions, so when you see a ? just read it as either :START_DATE or :END_DATE.

SELECT
  L.EMPLOYEE_ID,
  E.LAST_NAME||', '||E.FIRST_NAME EMPLOYEE_NAME,
  L.PART_ID,
  L.WORKORDER_SUB_ID,
  L.OPERATION_SEQ_NO,
  L.DESCRIPTION,
  L.CLOSE_WEEK,
  L.PART_EFF_WEEK,
  L.AVG_HRS_PC,
  L.EMP_ACCUM_EFF_WEEK,
  SUM(L.HOURS_WORKED) OVER (PARTITION BY L.EMPLOYEE_ID) EMP_HOURS_WORKED,
  SUM(L.HOURS_WORKED) OVER (PARTITION BY L.EMPLOYEE_ID,L.PART_ID, L.WORKORDER_SUB_ID,L.OPERATION_SEQ_NO) EMP_PART_HOURS_WORKED,
  SUM(L.EMP_ACCUM_EFF_WEEK) OVER (PARTITION BY L.EMPLOYEE_ID,L.CLOSE_WEEK) EMP_EFF_WEEK
FROM
  (SELECT
    O.PART_ID,
    O.DESCRIPTION,
    O.WORKORDER_SUB_ID,
    O.OPERATION_SEQ_NO,
    O.CLOSE_WEEK,
    O.RECEIVED_QTY,
    O.ACT_RUN_HRS,
    L.EMPLOYEE_ID,
    L.HOURS_WORKED,
    ROUND(O.ACT_RUN_HRS/O.RECEIVED_QTY,2) AVG_HRS_PC,
    ENG.ENG_HRS,
    ROUND(DECODE((O.ACT_RUN_HRS/O.RECEIVED_QTY),0,1,ENG.ENG_HRS/(O.ACT_RUN_HRS/O.RECEIVED_QTY)),4) PART_EFF_WEEK,
    SUM(L.HOURS_WORKED) OVER (PARTITION BY L.EMPLOYEE_ID,L.CLOSE_WEEK) EMP_HRS_WEEK,
    ROUND(L.HOURS_WORKED/(SUM(L.HOURS_WORKED) OVER (PARTITION BY L.EMPLOYEE_ID,L.CLOSE_WEEK)),4) EMP_PERCENT_WEEK,
    ROUND((L.HOURS_WORKED/(SUM(L.HOURS_WORKED) OVER (PARTITION BY L.EMPLOYEE_ID,L.CLOSE_WEEK))) * DECODE((O.ACT_RUN_HRS/O.RECEIVED_QTY),0,1,ENG.ENG_HRS/(O.ACT_RUN_HRS/O.RECEIVED_QTY)),4) EMP_ACCUM_EFF_WEEK
  FROM
    (SELECT
      PART_ID,
      DESCRIPTION,
      WORKORDER_SUB_ID,
      OPERATION_SEQ_NO,
      NEXT_DAY(TRUNC(CLOSE_DATE),'MONDAY')-7 CLOSE_WEEK,
      SUM(RECEIVED_QTY) RECEIVED_QTY,
      SUM(ACT_RUN_HRS) ACT_RUN_HRS
    FROM
      (SELECT
        WO.TYPE,
        WO.BASE_ID,
        WO.LOT_ID,
        WO.SPLIT_ID,
        WO.RECEIVED_QTY,
        WO.CLOSE_DATE,
        SUM(O.ACT_SETUP_HRS+O.ACT_RUN_HRS) ACT_RUN_HRS,
        WO.PART_ID,
        O.WORKORDER_SUB_ID,
        O.SEQUENCE_NO OPERATION_SEQ_NO,
        P.DESCRIPTION
      FROM
        WORK_ORDER WO,
        PART P,
        OPERATION O,
        SHOP_RESOURCE SR
      WHERE
        WO.TYPE='W'
        AND WO.SUB_ID='0'
        AND WO.CLOSE_DATE BETWEEN ? AND ?
        AND WO.RECEIVED_QTY>0
        AND WO.TYPE=O.WORKORDER_TYPE
        AND WO.BASE_ID=O.WORKORDER_BASE_ID
        AND WO.LOT_ID=O.WORKORDER_LOT_ID
        AND WO.SPLIT_ID=O.WORKORDER_SPLIT_ID
        AND WO.PART_ID=P.ID
        AND O.RESOURCE_ID=SR.ID
        AND SR.BUILDING_ID='Machine'
        AND SR.LOGICAL_GROUP IN ('10 TURNING CENTERS','20 VERTICAL M.C.','30 630MM PALLET HMC','40 800MM-1000MM PALLET HMC','50 1000MM PALLET HMC','55 LARGE 2-TABLE HMC/HBM','60 SINGLE TABLE HBM','60 SINGLE TABLE HDM','70 LARGE FLOOR MACHINES')
      GROUP BY
        WO.TYPE,
        WO.BASE_ID,
        WO.LOT_ID,
        WO.SPLIT_ID,
        WO.CLOSE_DATE,
        WO.RECEIVED_QTY,
        WO.PART_ID,
        O.WORKORDER_SUB_ID,
        O.SEQUENCE_NO,
        P.DESCRIPTION)
    GROUP BY
      PART_ID,
      DESCRIPTION,
      WORKORDER_SUB_ID,
      OPERATION_SEQ_NO,
      NEXT_DAY(TRUNC(CLOSE_DATE),'MONDAY')-7) O,
    (SELECT /*+ LEADING(WO LT) */
      WO.PART_ID,
      LT.EMPLOYEE_ID,
      LT.WORKORDER_SUB_ID,
      LT.OPERATION_SEQ_NO,
      NEXT_DAY(TRUNC(CLOSE_DATE),'MONDAY')-7 CLOSE_WEEK,
      SUM(LT.HOURS_WORKED) HOURS_WORKED
    FROM
      (SELECT
        WO.TYPE,
        WO.BASE_ID,
        WO.LOT_ID,
        WO.SPLIT_ID,
        WO.CLOSE_DATE,
        WO.PART_ID
      FROM
        WORK_ORDER WO,
        OPERATION O,
        SHOP_RESOURCE SR
      WHERE
        WO.TYPE='W'
        AND WO.SUB_ID='0'
        AND WO.CLOSE_DATE BETWEEN ? AND ?
        AND WO.RECEIVED_QTY>0
        AND WO.TYPE=O.WORKORDER_TYPE
        AND WO.BASE_ID=O.WORKORDER_BASE_ID
        AND WO.LOT_ID=O.WORKORDER_LOT_ID
        AND WO.SPLIT_ID=O.WORKORDER_SPLIT_ID
        AND O.RESOURCE_ID=SR.ID
        AND SR.BUILDING_ID='Machine'
        AND SR.LOGICAL_GROUP IN ('10 TURNING CENTERS','20 VERTICAL M.C.','30 630MM PALLET HMC','40 800MM-1000MM PALLET HMC','50 1000MM PALLET HMC','55 LARGE 2-TABLE HMC/HBM','60 SINGLE TABLE HBM','60 SINGLE TABLE HDM','70 LARGE FLOOR MACHINES')
      GROUP BY
        WO.TYPE,
        WO.BASE_ID,
        WO.LOT_ID,
        WO.SPLIT_ID,
        WO.CLOSE_DATE,
        WO.PART_ID) WO,
      LABOR_TICKET LT,
      SHOP_RESOURCE SR
    WHERE
      WO.TYPE=LT.WORKORDER_TYPE
      AND WO.BASE_ID=LT.WORKORDER_BASE_ID
      AND WO.LOT_ID=LT.WORKORDER_LOT_ID
      AND WO.SPLIT_ID=LT.WORKORDER_SPLIT_ID
      AND LT.TYPE IN ('R','S')
      AND LT.HOURS_WORKED<>0
      AND SR.LOGICAL_GROUP IN ('10 TURNING CENTERS','20 VERTICAL M.C.','30 630MM PALLET HMC','40 800MM-1000MM PALLET HMC','50 1000MM PALLET HMC','55 LARGE 2-TABLE HMC/HBM','60 SINGLE TABLE HBM','60 SINGLE TABLE HDM','70 LARGE FLOOR MACHINES')
      AND LT.RESOURCE_ID=SR.ID
    GROUP BY
      WO.PART_ID,
      LT.WORKORDER_SUB_ID,
      LT.OPERATION_SEQ_NO,
      LT.EMPLOYEE_ID,
      NEXT_DAY(TRUNC(CLOSE_DATE),'MONDAY')-7) L,
    (SELECT /*+ LEADING(WO) */
      WO2.TYPE,
      WO2.BASE_ID,
      O.WORKORDER_SUB_ID,
      O.SEQUENCE_NO OPERATION_SEQ_NO,
      SUM(O.SETUP_HRS+O.RUN_HRS) ENG_HRS,
      WO.PART_ID
    FROM
      (SELECT
        WO.PART_ID
      FROM
        WORK_ORDER WO
      WHERE
        WO.TYPE='W'
        AND WO.SUB_ID='0'
        AND WO.CLOSE_DATE BETWEEN ? AND ?
        AND WO.RECEIVED_QTY>0
      GROUP BY
        WO.PART_ID) WO,
      WORK_ORDER WO2,
      OPERATION O,
      SHOP_RESOURCE SR
    WHERE
      WO2.TYPE='M'
      AND WO.PART_ID=WO2.BASE_ID
      AND WO2.LOT_ID='0'
      AND WO2.SPLIT_ID='0'
      AND WO2.SUB_ID='0'
      AND WO2.TYPE=O.WORKORDER_TYPE
      AND WO2.BASE_ID=O.WORKORDER_BASE_ID
      AND WO2.LOT_ID=O.WORKORDER_LOT_ID
      AND WO2.SPLIT_ID=O.WORKORDER_SPLIT_ID
      AND O.RESOURCE_ID=SR.ID
      AND SR.BUILDING_ID='Machine'
      AND SR.LOGICAL_GROUP IN ('10 TURNING CENTERS','20 VERTICAL M.C.','30 630MM PALLET HMC','40 800MM-1000MM PALLET HMC','50 1000MM PALLET HMC','55 LARGE 2-TABLE HMC/HBM','60 SINGLE TABLE HBM','60 SINGLE TABLE HDM','70 LARGE FLOOR MACHINES')
    GROUP BY
      WO2.TYPE,
      WO2.BASE_ID,
      WO.PART_ID,
      O.WORKORDER_SUB_ID,
      O.SEQUENCE_NO) ENG
  WHERE
    O.PART_ID=L.PART_ID
    AND O.WORKORDER_SUB_ID=L.WORKORDER_SUB_ID
    AND O.OPERATION_SEQ_NO=L.OPERATION_SEQ_NO
    AND O.CLOSE_WEEK=L.CLOSE_WEEK
    AND O.PART_ID=ENG.PART_ID
    AND O.WORKORDER_SUB_ID=ENG.WORKORDER_SUB_ID
    AND O.OPERATION_SEQ_NO=ENG.OPERATION_SEQ_NO) L,
  EMPLOYEE E
WHERE
  L.EMPLOYEE_ID=E.ID
ORDER BY
  L.EMPLOYEE_ID,
  L.PART_ID,
  TO_NUMBER(L.WORKORDER_SUB_ID),
  L.OPERATION_SEQ_NO,
  L.CLOSE_WEEK;

I suppose that I could have performed more of the processing on the database server by sliding the above query into an inline view and using the PIVOT function, and/or additional analytic functions to calculate the employee’s weighted efficiency average per week and the simple averages of the weighted averages – doing so would have eliminated some potentially complex client-side logic that needed to be programmed a single time, but the counterpoint is that every time the report executed, it would require more of the database server’s resources than were absolutely necessary (a minimal sketch of the PIVOT approach follows).
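
A minimal sketch of that PIVOT alternative, assuming a hypothetical inline view V that returns EMPLOYEE_ID, CLOSE_WEEK, and EMP_EFF_WEEK columns; the week is converted to a character string for the pivot, and the week columns are hard coded for an assumed four week range:

SELECT
  *
FROM
  (SELECT
    EMPLOYEE_ID,
    TO_CHAR(CLOSE_WEEK,'YYYY-MM-DD') CLOSE_WEEK,
    EMP_EFF_WEEK
  FROM
    V)
PIVOT
  (MAX(EMP_EFF_WEEK) FOR CLOSE_WEEK IN
    ('2012-07-30' WEEK_1,
     '2012-08-06' WEEK_2,
     '2012-08-13' WEEK_3,
     '2012-08-20' WEEK_4))
ORDER BY
  EMPLOYEE_ID;

One drawback of this approach: the IN list must be hard coded, so changing the number of weeks changes the query text (forcing a hard parse) as well as the set of output columns – part of the reason the client-side control break logic remained attractive.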

That sketch of a report on a sheet of paper sure looked simple, but it turned into a three part blog article series.  Interesting, now the person would like to analyze the data by part ID, listing the employees working on the operations for the part.  Do I smell another three part blog article series?  Sorry, no – I just changed the ORDER BY clause so that the PART_ID column was listed first, and made a small change to the client-side control break logic.  Now the person wants to analyze 104 weeks worth of data rather than just four weeks, and only a single part ID or employee.  Too late – those changes were anticipated and included in the original specification of the client-side programming, implemented with a simple change to the query and pre-planning for a variable number of weeks in the report.





On the Topic of Programming 2

2 09 2012

September 2, 2012

(Back to the Previous Post in the Series) (Forward to the Next Post in the Series)

Over the course of the last six months I developed several efficiency reports for the machining facility where I work, measuring the average time required to complete a customer’s part compared to the engineering standard “target” run time for producing the part.  Sounds simple, right?  If the engineering target is 30 widgets per hour, and only 20 widgets per hour are produced, then the average efficiency is 20/30 = 66.67%.  It does not take a person with a degree in mathematics to arrive at the conclusion that the widgets are not being produced as quickly as expected.  What is the scope of the measurement time frame for the average: a year, a month, a week, a day, one work shift in a day, a single machining center for a day, a single machining center and employee, or something else?

OK, now flip the engineering standard run times a bit so that the times are a little more consistent with the machining facility where I work.  It might take four hours, six hours, 10 hours, 20 hours, or possibly even 200 hours to complete a single operation at a machining center (one of several operations in the manufacturing process for the widget) to produce a single widget.  With the long run times of the operations, calculating the efficiency of a machining center or an employee for a specified period of time was a daunting task… a task that I was asked to solve roughly 12 years ago (with the help of Oracle Database 8.0.5).

The hours per widget run time (rather than widgets per hour) situation presented various problems for calculating employee efficiency, especially when an operation at a machining center did not complete before the end of an employee’s shift.  Consider a situation where an operation is expected to require eight hours to complete, but only when the tooling used at the machining center is new.  As such, the first shift employee installs new tooling in the machining center every morning before starting a new widget.  The first shift employee spends about 30 minutes finishing up the widget that was started the previous night, changes the tooling, and then starts the machining on the next widget.  So, the first shift employee reports that one widget (started the previous night) completed after 0.5 hours and the second widget completed in 7.5 hours.  The first shift employee’s efficiency, considering that two widgets were completed during his shift, is easily calculated as (8 + 8) / (0.5 + 7.5) * 100 = 200%.  The second shift employee’s efficiency is a consistent 0% because the operation for the widget never completes during his shift, because the sharpness of the tooling deteriorates through the day (thus causing the machining operation to take longer).  This obviously leads to odd questions: 1) Why is the second shift employee more efficient when the first shift employee is on vacation (equivalent to the question in Oracle performance tuning: why does my report run faster when it rains Monday mornings?)?  2) Why is the second shift employee more efficient when working a nine or 10 hour shift, rather than an eight hour shift?  The fun questions that one is able to answer when familiar with the data…

When an employee starts working on a machining operation, a labor ticket transaction is created detailing the fact that the work for the machining operation is in-process.  When the machining operation completes (or when it is time to leave for the day), the employee closes the labor ticket transaction and reports the number of widgets that completed.  These labor ticket transactions are conveniently stored in a table named LABOR_TICKET.  Back in 2006 or 2007, I decided to take another attempt at solving the efficiency problem – after all, I now had access to analytic functions in Oracle Database (analytic functions did not exist in Oracle Database 8.0.5).  I expected that by extending the time range beyond a single employee and/or shift, I might be able to extract useful efficiency information from the database.  I put together a rather long SQL statement that looked something like this:

SELECT DISTINCT
  1 TYPE,
  WO.PART_ID,
  TO_NUMBER(LT.WORKORDER_SUB_ID) WORKORDER_SUB_ID,
  LT.OPERATION_SEQ_NO,
  TRUNC(SYSDATE-30) SHIFT_DATE,
  LT.EMPLOYEE_ID,
  LT.RESOURCE_ID,
  SUM(LT.HOURS_WORKED) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,EMPLOYEE_ID) HOURS_WORKED,
  SUM(LT.GOOD_QTY) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,EMPLOYEE_ID) GOOD_QTY,
  NULL HRS_PC,
  ROUND(DECODE(SUM(LT.GOOD_QTY) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO),0,0,
    SUM(LT.HOURS_WORKED) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO)/
      SUM(LT.GOOD_QTY) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO)),2) AVG_HRS_PC_TIME_ALL,
  ROUND(DECODE(SUM(LT.GOOD_QTY) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,EMPLOYEE_ID),0,0,
    SUM(LT.HOURS_WORKED) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,EMPLOYEE_ID)/
      SUM(LT.GOOD_QTY) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,EMPLOYEE_ID)),2) AVG_HRS_PC_TIME_EMP,
  ROUND(DECODE(SUM(LT.GOOD_QTY) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,RESOURCE_ID),0,0,
    SUM(LT.HOURS_WORKED) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,RESOURCE_ID)/
      SUM(LT.GOOD_QTY) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,RESOURCE_ID)),2) AVG_HRS_PC_TIME_RES,
  ROUND(DECODE(SUM(LT.GOOD_QTY) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,EMPLOYEE_ID,RESOURCE_ID),0,0,
    SUM(LT.HOURS_WORKED) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,EMPLOYEE_ID,RESOURCE_ID)/
      SUM(LT.GOOD_QTY) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,EMPLOYEE_ID,RESOURCE_ID)),2) AVG_HRS_PC_TIME_EMP_RES
FROM
  WORK_ORDER WO,
  LABOR_TICKET LT
WHERE
  WO.TYPE='W'
  AND WO.SUB_ID='0'
  AND WO.PART_ID='WIDGET1'
  AND LT.SHIFT_DATE BETWEEN TRUNC(SYSDATE-30) AND TRUNC(SYSDATE)
  AND WO.TYPE=LT.WORKORDER_TYPE
  AND WO.BASE_ID=LT.WORKORDER_BASE_ID
  AND WO.LOT_ID=LT.WORKORDER_LOT_ID
  AND WO.SPLIT_ID=LT.WORKORDER_SPLIT_ID
UNION ALL
SELECT DISTINCT
  2 TYPE,
  WO.PART_ID,
  TO_NUMBER(LT.WORKORDER_SUB_ID) WORKORDER_SUB_ID,
  LT.OPERATION_SEQ_NO,
  TRUNC(SYSDATE-60) SHIFT_DATE,
  LT.EMPLOYEE_ID,
  LT.RESOURCE_ID,
  SUM(LT.HOURS_WORKED) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,EMPLOYEE_ID) HOURS_WORKED,
  SUM(LT.GOOD_QTY) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,EMPLOYEE_ID) GOOD_QTY,
  NULL HRS_PC,
  ROUND(DECODE(SUM(LT.GOOD_QTY) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO),0,0,
    SUM(LT.HOURS_WORKED) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO)/
      SUM(LT.GOOD_QTY) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO)),2) AVG_HRS_PC_TIME_ALL,
  ROUND(DECODE(SUM(LT.GOOD_QTY) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,EMPLOYEE_ID),0,0,
    SUM(LT.HOURS_WORKED) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,EMPLOYEE_ID)/
      SUM(LT.GOOD_QTY) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,EMPLOYEE_ID)),2) AVG_HRS_PC_TIME_EMP,
  ROUND(DECODE(SUM(LT.GOOD_QTY) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,RESOURCE_ID),0,0,
    SUM(LT.HOURS_WORKED) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,RESOURCE_ID)/
      SUM(LT.GOOD_QTY) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,RESOURCE_ID)),2) AVG_HRS_PC_TIME_RES,
  ROUND(DECODE(SUM(LT.GOOD_QTY) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,EMPLOYEE_ID,RESOURCE_ID),0,0,
    SUM(LT.HOURS_WORKED) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,EMPLOYEE_ID,RESOURCE_ID)/
      SUM(LT.GOOD_QTY) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,EMPLOYEE_ID,RESOURCE_ID)),2) AVG_HRS_PC_TIME_EMP_RES
FROM
  WORK_ORDER WO,
  LABOR_TICKET LT
WHERE
  WO.TYPE='W'
  AND WO.SUB_ID='0'
  AND WO.PART_ID='WIDGET1'
  AND LT.SHIFT_DATE BETWEEN TRUNC(SYSDATE-60) AND TRUNC(SYSDATE-31)
  AND WO.TYPE=LT.WORKORDER_TYPE
  AND WO.BASE_ID=LT.WORKORDER_BASE_ID
  AND WO.LOT_ID=LT.WORKORDER_LOT_ID
  AND WO.SPLIT_ID=LT.WORKORDER_SPLIT_ID
...

The above SQL statement grew in length substantially based on the number of time periods that I selected to compare.  This data was returned to a program for display purposes, so reformatting the output was always a possibility.  Consider a situation where a programmer who is unfamiliar with Oracle-specific SQL is asked to generate the same output in a program that he is constructing.  For a program seeking to measure efficiency, the programmer’s solution would likely involve repeated executions of SQL statements to retrieve the bits and pieces of information that need to be presented – this row-by-row processing will likely be very slow over a high latency WAN (VPN) connection (but certainly faster than never receiving the information, so the programmer will still be a hero), and could very well hinder the performance of the Oracle Database server.
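If the goal is simply to avoid adding one UNION ALL branch per time period, the period number may instead be derived in a single pass.  The following is a minimal sketch (not the program’s actual SQL; it assumes the same WORK_ORDER and LABOR_TICKET columns as above, with slightly different handling of the period boundary dates) that buckets the labor tickets into 30 day periods with an inline view, and then adds the derived PERIOD column to each PARTITION BY clause:

SELECT DISTINCT
  V.PERIOD TYPE,
  V.PART_ID,
  V.WORKORDER_SUB_ID,
  V.OPERATION_SEQ_NO,
  V.EMPLOYEE_ID,
  V.RESOURCE_ID,
  SUM(V.HOURS_WORKED) OVER (PARTITION BY V.PERIOD,V.PART_ID,V.WORKORDER_SUB_ID,V.OPERATION_SEQ_NO,V.EMPLOYEE_ID) HOURS_WORKED,
  SUM(V.GOOD_QTY) OVER (PARTITION BY V.PERIOD,V.PART_ID,V.WORKORDER_SUB_ID,V.OPERATION_SEQ_NO,V.EMPLOYEE_ID) GOOD_QTY
FROM
  (SELECT
    TRUNC((TRUNC(SYSDATE)-LT.SHIFT_DATE)/30)+1 PERIOD,
    WO.PART_ID,
    TO_NUMBER(LT.WORKORDER_SUB_ID) WORKORDER_SUB_ID,
    LT.OPERATION_SEQ_NO,
    LT.EMPLOYEE_ID,
    LT.RESOURCE_ID,
    LT.HOURS_WORKED,
    LT.GOOD_QTY
  FROM
    WORK_ORDER WO,
    LABOR_TICKET LT
  WHERE
    WO.TYPE='W'
    AND WO.SUB_ID='0'
    AND WO.PART_ID='WIDGET1'
    AND LT.SHIFT_DATE BETWEEN TRUNC(SYSDATE-60) AND TRUNC(SYSDATE)
    AND WO.TYPE=LT.WORKORDER_TYPE
    AND WO.BASE_ID=LT.WORKORDER_BASE_ID
    AND WO.LOT_ID=LT.WORKORDER_LOT_ID
    AND WO.SPLIT_ID=LT.WORKORDER_SPLIT_ID) V;

Either form is the same basic solution – the single-pass version simply grows by one predicate date range, rather than by roughly forty lines, for each additional time period.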

The above solution is good, but problems such as the progressive wearing of the tooling cutter, as mentioned earlier, could still lead to unexpected differences in the efficiency of employees who are working as fast as possible.  Fast forward a few years.  The right questions are asked of people with a more complete understanding of efficiency measurements, and the efficiency calculation problem is examined from an entirely different angle.  Not long ago I was handed an example of a simple efficiency report, and asked to reproduce that report with live data, output in a Microsoft Excel spreadsheet.  The example was little more than a simple sketch, so for this blog article I put together a color-coded example of the report format in Microsoft Excel:

A little explanation is required.  The entirely different angle for calculating employee efficiency in the execution of widget operations involves looking at neither the date of the labor ticket transaction, nor the number of widgets produced by a particular employee in a particular time frame.  Instead, the manufacturing batch’s (lot, in this particular ERP system) completion date, the quantity produced in the batch, and the total hours to execute a single machining operation for the lot become the criteria when comparing against the engineering standards to determine efficiency.  The manufacturing batch’s completion date is used to divide the batches into specific timeframes (in this case weeks: Monday through Sunday).  All employees working on a particular operation, where the batch completion date is in a certain week, will receive the same efficiency rating for that particular operation (EFF_WEEK1, EFF_WEEK2, EFF_WEEK3, etc.) as all other employees working on the same operation with the same batch completion date range.  It is not uncommon for employees to generate labor transactions for multiple operations for production of the same part, as well as operations for different parts that have batch completion dates in the same week, so the employee’s weekly efficiency rating (W_AVG_EFF1, W_AVG_EFF2, W_AVG_EFF3, etc.) weights the individual part efficiencies based on the percentage of machining time the employee spent in a given operation compared to all of the other operations the employee worked.  The employee’s overall efficiency rating (Avg W_VE_EFF) is the simple average of the employee’s weekly efficiency ratings.

In the above example report, the blue colored text is a static label.  The red colored text is a calculated static label that shows the Monday through Sunday date range for the week.  The black colored text is a simple “rip and read” from the database – no calculation is required.  The purple colored text indicates a calculated value.  The numbers at the right will not appear on the completed report, but are included to indicate which EFF_WEEKn and AVG_WEEKn values will be identical if the employee worked on an operation whose batch closed in the given week (notice that there are blanks in columns of the sample report, indicating that the employee did not work on that operation in the batch completion week).

The programming challenge is to determine the number of SQL statements that would be needed to retrieve the information from the database, and the number of times those SQL statements would need to be executed.  Would the number of SQL statement executions depend on the number of employees?  Would the number of SQL statement executions depend on the number of different part operations whose batches closed in a particular week?  Would the number of SQL statement executions depend on the number of weeks included in the report?  Before you think about the problem, we should probably investigate the data organization in the various tables.  The picture below shows the tables (and the specific columns) required for the report, with the tables for the engineering standards in green boxes and the tables needed for the labor transaction analysis in black boxes.  The lines show the association of the data in the different tables.

(The above image was adjusted 7 hours after this blog article was initially posted.  The linking between the green OPERATION table and the green WORK_ORDER table was corrected, and the red colored text was added to indicate whether the table was to be used for retrieving the engineering master standards (WORKORDER_TYPE=’M’) or information from the production work order batches (WORKORDER_TYPE=’W’ or TYPE=’W’).)

It might also be interesting to think about what processing will be performed on the data returned by the SQL statements – that could be a pivotal design decision for the SQL statements.  Curve balls are to be expected – what is an examination of four weeks’ worth of efficiency numbers today might be an examination of 52 weeks tomorrow.  Oh, here is a thought: what if, on the following day, it instead becomes important to focus on the various part efficiencies and the employees who worked on the operations, rather than on the employees and the various parts that the employees worked on?

Part 3 of this series will attempt to answer some of the above questions.

Late addition, sample of a completed report in Excel:





On the Topic of Programming 1

26 08 2012

August 26, 2012

(Forward to the Next Post in the Series)

As those of you who have read this blog’s About page probably know, my day to day job responsibilities involve a lot of activities that are not specific to Oracle Database performance tuning, or even remotely Oracle DBA type activities.  Those extra activities are part of what keeps the job fresh and interesting, whether I am swapping a new roll of labels into a Zebra label printer (that was a three minute fun task this morning), troubleshooting phone system problems (from a fork lift truck “disabling” a phone, to programming the PBX with a clever “message delivery system” to waste the time of persistent telemarketers), swapping out the power supply in a computer very early in the morning, or engaged in a marathon of application programming into the evening and weekend hours.

One of the recent programming projects involves the continuing effort of decreasing the number of data islands, allowing the data previously contained in the islands to be accessed and cross-referenced with data generated by other applications.  One of the data island generators that I have been working to phase out is Superbase 3.0, a database platform that started life on a Commodore 64 in 1983.  Superbase 3.0 is a 16 bit client-side database, so it does not play well with the 64 bit Windows 7 that ships with most new desktop computers (16 bit applications will not run natively on 64 bit Windows, instead the 16 bit applications must be run inside a virtual machine such as Windows XP Mode, or run remotely using remote access software such as a VNC client).

The data contained in the old Superbase databases is critical to the company’s processes, so that data must be salvaged – meaning that the data must be transformed and imported into an Oracle database.  Unlike what a developer would likely create in a relational database, often with multiple tables used to store one “record” in the database, the long departed designer of the Superbase databases used a single row in a single database table to store one “record” in the database.  That organization reminds me a lot of the Microsoft Works package’s database from the early 1990s, with its fancy data entry forms which allowed users to escape the dull spreadsheet-like data entry screen.  Microsoft Excel from the early/mid 1990s could magically transform a dull spreadsheet data entry screen into a simple data entry form, in the process transforming the expensive Microsoft Excel into essentially a cheap database program.  It is a bit more of a challenge to locate the automatic data entry form creator in Excel 2010 than I recall it being in the early/mid 1990s version of Excel, but I suppose that helps further reduce the number of potential data islands:

So, what does the above discussion of Microsoft Excel have to do with anything related to Oracle Database?  The data contained in the Superbase databases must be transformed and inserted into an Oracle database.  It is good news that Superbase is able to export data to Microsoft Excel format.  The bad news is that the exported format is designed to work with Microsoft Excel 2.0 – a very old version of Microsoft Excel that seems to date back to 1987!  Time for a lot of manual data entry if that data must end up in an Oracle Database 11.2.0.x database… unless…

Microsoft Excel 2003 (version 11.0) is able to open Excel 2.0 files… success, even if the success is minor.  Now, how to go about transferring the data from Excel into Oracle Database?  I suppose that I could have created a macro in Microsoft Excel to insert the data into Oracle Database, but at the time I was not interested in writing a macro that accomplished the task “the right way” using bind variables.  And just look at that data – some of the date values were imported as very small (roughly -65000) numbers, in some cases nearly 20 different spellings for the same customer name, and alpha-numeric text in columns that should be numeric.

So, how did I import the Superbase data that was now in Excel 2003 into the Oracle Database 11.2.0.x database without writing an Excel macro?  The particular computer with Excel 2003 that I was using also had a copy of Access 2003 installed.  Access 2003 is able to create a table “link” to an Excel 2003 spreadsheet’s worksheet, and handle that worksheet essentially the same as if it were a database table.  Now the data is “in” Microsoft Access 2003, but still not in an Oracle database.  Previous experience with this process pays off – before bringing the data into Microsoft Access, type each of the Oracle Database destination table’s column names into the first row of the Microsoft Excel spreadsheet, above the appropriate column’s data.  Importing the data into the Oracle database then becomes a simple four step process (assuming that no other data transformation is necessary):

  1. Link to the Excel spreadsheet’s worksheet and the destination table in the Oracle database.
  2. Create an Access view (stored Query) that selects all of the columns from the Excel worksheet that must be inserted into the Oracle database.
  3. Convert the view (stored Query) type to an Append type and select the linked Oracle Database table as the destination – Access will automatically find the correct destination column in the Oracle table, if the source column name (from the first row in the Excel worksheet) matches the destination column name.
  4. Execute the append type view (a sketch of the SQL behind such an append query appears below).
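Behind the scenes, an Access append query is just an INSERT INTO … SELECT statement.  For illustration only (the table and column names below are hypothetical), the SQL generated for steps 2 and 3 would resemble:

INSERT INTO
  PART_OPERATION (PART_ID, OPERATION_SEQ_NO, DESCRIPTION)
SELECT
  PART_ID,
  OPERATION_SEQ_NO,
  DESCRIPTION
FROM
  PART_OPERATION_XLS;

In the above, PART_OPERATION represents the linked Oracle table and PART_OPERATION_XLS represents the linked Excel worksheet – the matching column names are what allow Access to map the source columns to the destination columns automatically.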

A simple transformation of the data from 1994 database technology to 1987, 2003, and then on to 2011 in Oracle Database – and without writing a single line of code.  Remember that problem that I mentioned about alpha-numeric text in columns that should be numeric, such as “10&20” in a column named OPERATION_NUMBER (or OPERATION_SEQ_NO)?  It turns out that that bit of inconsistent data cannot just be thrown out (thanks Microsoft Access 2003).  To fix that problem, I would need to add another column to the Oracle Database table, and then have Microsoft Access update that table using the Microsoft Excel spreadsheet data (fixing the “10&20”, “10 & 20”, “10  &20”, “10 AND 20”, “10,20” and “10, 20” variants into a standard format).  The SQL dialect in Microsoft Access is a bit odd at times, and I could not remember whether the odd syntax also applies to UPDATE statements.  As an example of the odd syntax, the simple CREATE TABLE AS SELECT:

CREATE TABLE
  T2 AS
SELECT
  T1.SID,
  T1.SERIAL,
  T1.USERNAME,
  T1.PROGRAM,
  MIN(T1.CHECK_COUNT) AS CHECK_COUNT_START,
  MAX(T1.CHECK_COUNT) AS CHECK_COUNT_END
FROM
  T1
GROUP BY
  T1.SID,
  T1.SERIAL,
  T1.USERNAME,
  T1.PROGRAM;

Must be written like the following in Microsoft Access:

SELECT
  T1.SID,
  T1.SERIAL,
  T1.USERNAME,
  T1.PROGRAM,
  MIN(T1.CHECK_COUNT) AS CHECK_COUNT_START,
  MAX(T1.CHECK_COUNT) AS CHECK_COUNT_END
INTO
  T2
FROM
  T1
GROUP BY
  T1.SID,
  T1.SERIAL,
  T1.USERNAME,
  T1.PROGRAM;

Since I am not 100% confident in my SQL authoring skills in Microsoft Access, how do I move the data from the Excel spreadsheet into the new column of the Oracle Database table… and without writing a single line of programming code?  I simply created a temporary table (not a true temporary table, because the table data must be visible to more than one session) that contained the primary key column and a second column for the non-numeric numeric data.  Once the data was in the temporary Oracle table (using the simple four step process outlined above), I simply executed an UPDATE statement similar to this:

UPDATE
  T1
SET
  NON_NUMERIC_NUMERIC=(
    SELECT
      NON_NUMERIC_NUMERIC
    FROM
      T1_TEMP TT
    WHERE
      T1.PRIMARY_KEY=TT.PRIMARY_KEY)
WHERE
  T1.PRIMARY_KEY IN (
    SELECT
      PRIMARY_KEY
    FROM
      T1_TEMP);
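
(Side note: on Oracle Database 10.1 and above, a single MERGE statement is one alternative to the correlated UPDATE above – in 9i both the WHEN MATCHED and WHEN NOT MATCHED clauses were mandatory.  A sketch using the same table and column names:)

MERGE INTO
  T1
USING
  T1_TEMP TT
ON (T1.PRIMARY_KEY=TT.PRIMARY_KEY)
WHEN MATCHED THEN
  UPDATE SET T1.NON_NUMERIC_NUMERIC=TT.NON_NUMERIC_NUMERIC;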

With the data successfully transferred into an Oracle database table, the programming continues.  That brings me to the next article in this series, the internal conflicts of the “best” way to accomplish the programming task.





Brain Teaser: 10046 Extended SQL Trace Shows a FETCH Call with c=306350000, e=299174653, p=0, cr=22298 – How is that Possible?

26 04 2012

April 26, 2012

Last week I posted a popular article that questioned the effectiveness of showing a job candidate a raw 10046 trace file that was captured at level 12, asking the job candidate to evaluate whether or not the SQL statement in the 10046 trace is efficient.  Many of the commenters suggested that the approach is potentially very effective if the interviewer is knowledgeable about the subject matter, and he recognizes that the question deserves more than a simple yes or no answer.

A year ago I posted a related article, a brain teaser in fact, that showed 0.015600 seconds of CPU consumption in 0.000510 seconds of elapsed time without using parallel execution.  The comments attached to last year’s article correctly identified the source of the seemingly impossible statistics.

A recent OTN thread (don’t click the link until you have thought about the problem for a while) shows an interesting section of a 10046 trace file, a portion of which appears below:

PARSE #2:c=0,e=4764,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,tim=1182070262719
BINDS #2:
EXEC #2:c=0,e=255,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,tim=1182070263378
WAIT #2: nam='SQL*Net message to client' ela= 5 driver id=1650815232 #bytes=1 p3=0 obj#=258 tim=1182070263514
FETCH #2:c=306350000,e=299174653,p=0,cr=22298,cu=0,mis=0,r=1,dep=0,og=1,tim=1182369438322
WAIT #2: nam='SQL*Net message from client' ela= 1454 driver id=1650815232 #bytes=1 p3=0 obj#=258 tim=1182369440799
WAIT #2: nam='SQL*Net message to client' ela= 4 driver id=1650815232 #bytes=1 p3=0 obj#=258 tim=1182369441012
FETCH #2:c=0,e=293,p=0,cr=0,cu=0,mis=0,r=15,dep=0,og=1,tim=1182369441265
WAIT #2: nam='SQL*Net message from client' ela= 7437 driver id=1650815232 #bytes=1 p3=0 obj#=258 tim=1182369448858
WAIT #2: nam='SQL*Net message to client' ela= 3 driver id=1650815232 #bytes=1 p3=0 obj#=258 tim=1182369449007
FETCH #2:c=0,e=283,p=0,cr=0,cu=0,mis=0,r=15,dep=0,og=1,tim=1182369449257
WAIT #2: nam='SQL*Net message from client' ela= 6701 driver id=1650815232 #bytes=1 p3=0 obj#=258 tim=1182369456082
WAIT #2: nam='SQL*Net message to client' ela= 2 driver id=1650815232 #bytes=1 p3=0 obj#=258 tim=1182369456222 

The interesting portion of the above trace file is the first FETCH line where a single row is returned to the client.  The c=306350000 entry indicates that 306.35 seconds of server CPU time was consumed during the fetch call.  The e=299174653 entry indicates that the elapsed time (as if captured by a stop watch) of the single row fetch was 299.174653 seconds.  The p=0 entry indicates that 0 blocks were physically read from disk when preparing to retrieve the one row.  The cr=22298 entry indicates that 22,298 consistent gets were performed when preparing to retrieve the first row.  Typically, performing a large number of consistent gets will result in the consumption of a considerable number of CPU seconds, yet 22,298 consistent gets could not explain the consumption of more than 5 minutes of CPU time, even if the OP was running Oracle Database on an Intel 8088 CPU.
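
(As a quick reference for readers who would like to generate a similar trace file for their own session: a level 12 10046 trace captures bind variable values as well as wait events.  For example:)

ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT FOREVER, LEVEL 12';

-- execute the SQL statement of interest, then:

ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT OFF';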

The brain teaser posed (directly or indirectly) by the OP:

  1. Why did the first fetch of a single row require roughly 299 seconds of elapsed time, when the second fetch of 15 rows apparently required 0.000293 seconds?
  2. Why did the first fetch of a single row require roughly 306 seconds of CPU time, when the second fetch of 15 rows apparently required no CPU time?
  3. How is it possible that the CPU consumption exceeds the elapsed time by 7.175347 seconds?
  4. How is it possible that 306.35 seconds of CPU were burned while performing only 22,298 consistent gets?
  5. The trace file indicates that 306.35 seconds of CPU were burned during the first fetch call, does the number 306.35 represent 100% of the CPU consumption during the fetch call, or is the actual CPU consumption higher or lower than that number?
  6. What release version of Oracle Database is the OP running?
  7. What operating system is the OP using for Oracle Database?
  8. What other questions might you ask of a job candidate if the 10046 trace file from the OTN thread were handed to a job candidate during an interview?

Think about the questions for a couple of minutes before visiting the OTN thread.  Several of the above questions are answered in that thread.





Extents of an Oracle Database DBA’s Knowledge Base

18 04 2012

April 18, 2012

I saw an interesting thread on the OTN forums this morning that forced me to stop and think about several items.  The thread contains a question posed during a recent job interview – one of the best questions that I have seen (dare I say, better than one I might have crafted myself).  Consider this: you are given a SQL statement, and a raw 10046 trace captured at level 12.  The question is very simple: Is the SQL statement efficient or not?

What is the point of the interview question?  Barely scratching the surface:

  • It tests how you will approach unfamiliar problems if you encounter the problem once hired – it may not be this exact question, but this may be similar to a real problem that was faced by the company’s DBAs. Do you panic and say that it cannot be solved, do you say that the job belongs to someone else, do you say that there is no point in looking at the raw trace file, or do you dive into the problem and think about what could happen if… (yes, there is something here, but I do not want to spoil the fun for other candidates that may face this exact problem).
  • It determines in part what types of material you have read to further enhance your knowledge of Oracle Database.
  • It determines whether or not you recognize the potential problems that may be associated with specific Oracle Database releases (unpatched and feeling ANSI?)
  • It determines whether or not you recognize differences in predicated cardinality and the actual number of rows returned, and how that may affect the performance outcome. Is it better to filter early or filter late, and does that apply to this situation?
  • Is there a DISTINCT possibility that the ROWNUM pseudocolumn may allow an arbitrary set of rows to be returned, possibly dependent on the value of the OPTIMIZER_FEATURES_ENABLE parameter (see *** item below)?
  • It determines whether or not you follow logic or magic when troubleshooting problems.

Reflecting on the question a bit: did the person in the interview have access to Google, TKPROF, the Oracle Database documentation, books with red covers, books with yellow covers, books with white covers, books with blue covers, etc.?

*** A year ago in an OTN thread I made the following comment that applies to at least one of the above bullet points and also to the answer to the question posed by the interviewer:

“Other than the “HASH GROUP BY” operation on the second line of the printed execution plan for 11.2.0.2, and the “SORT GROUP BY” operation on the second line of the printed execution plan for 10.1.0.3, the execution plans are identical. Keep in mind that 11.2.0.2, due to adaptive cursor sharing, is capable of changing the execution plan for future executions (there was a single parse call in this case, so adaptive cursor sharing likely did not take place). Also, keep in mind that 11.2.0.2 by default in a 10046 trace will output the row source operation execution plan after the first execution, rather than when the cursor was closed – this explains the difference in the ROWS column in the execution plan. If we look closely at the summary information, 11.2.0.2 performed 113,319 consistent gets in 673 executions, while 10.1.0.3 performed 175,168 consistent gets in 644 executions. Each execution in 11.2.0.2 is in theory more efficient than each execution in 10.1.0.3, yet the average time per execution is much longer.”

In your opinion, what do you think of the interview question?

(Please do not provide an exact answer to the interviewer’s question for at least 14 days, there is no sense in disrupting the interview process.)





Failed Logon Attempts

4 04 2012

April 4, 2012

A fair number of articles on this blog are Oracle performance related in one respect or another.  I started thinking about how to measure how much more efficient something is when compared to something that is never performed.  Consider a situation where you have a database user account that is assigned to the DEFAULT Oracle Database profile, which is configured as follows:

ALTER PROFILE DEFAULT LIMIT
  PASSWORD_LIFE_TIME UNLIMITED
  FAILED_LOGIN_ATTEMPTS 5
  PASSWORD_LOCK_TIME 1
  PASSWORD_VERIFY_FUNCTION NULL; 

With the above configuration, passwords do not expire after a specified number of days, there are no password complexity requirements, and after five failed logon attempts, the account is automatically locked for one day.  The above configuration is not ideal from a security standpoint, but that is not the point of this blog article.
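
(To verify how a profile is actually configured, the password-related limits may be queried from the DBA_PROFILES data dictionary view:)

SELECT
  RESOURCE_NAME,
  LIMIT
FROM
  DBA_PROFILES
WHERE
  PROFILE='DEFAULT'
  AND RESOURCE_TYPE='PASSWORD';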

Assume that the one database user account is shared by multiple people (or multiple utility programs with an embedded username and password).  Suddenly, you find that your efficient utility program becomes inefficient… to the point that the “utility” portion of the program never has an opportunity to execute.  Well, that was unexpected, the database user account is locked out.  How did that happen?

The following SQL statement is probably quite simplistic for many of the readers of this blog, and probably should have been included in my Neat Tricks article that I published a couple of years ago, but I thought that I would include the SQL statement here to save myself a couple of minutes of typing the next time I need to figure out who-done-it:

SELECT
  TO_CHAR(TIMESTAMP,'MM/DD HH24:MI') TIMESTAMP,
  SUBSTR(OS_USERNAME,1,20) OS_USERNAME,
  SUBSTR(USERNAME,1,20) USERNAME,
  SUBSTR(TERMINAL,1,20) TERMINAL,
  ACTION_NAME,
  RETURNCODE
FROM
  SYS.DBA_AUDIT_SESSION
WHERE
  USERNAME LIKE 'MYUSER%'
  AND TIMESTAMP BETWEEN SYSDATE-1 AND SYSDATE
ORDER BY
  TIMESTAMP DESC;

TIMESTAMP   OS_USERNAME          USERNAME TERMINAL        ACTION_NAME          RETURNCODE
----------- -------------------- -------- --------------- -------------------- ----------
04/03 11:33 USER1                MYUSER   CUSER1          LOGON                         0
04/03 11:33 USER1                MYUSER   CUSER1          LOGOFF                        0
04/03 11:33 USER1                MYUSER   CUSER1          LOGOFF                        0
04/03 11:33 USER1                MYUSER   CUSER1          LOGON                         0
04/03 10:54 USER2                MYUSER   CUSER2          LOGOFF                        0
04/03 10:53 USER2                MYUSER   CUSER2          LOGON                         0
04/03 10:53 USER2                MYUSER   CUSER2          LOGON                         0
04/03 10:53 USER2                MYUSER   CUSER2          LOGON                         0
04/03 09:58 USER3                MYUSER   CUSER3          LOGOFF                        0
04/03 09:56 USER4                MYUSER   CUSER4          LOGON                         0
04/03 09:56 USER4                MYUSER   CUSER4          LOGOFF                        0
04/03 09:51 USER5                MYUSER   CUSER5          LOGON                         0
04/03 09:51 USER5                MYUSER   CUSER5          LOGON                         0
04/03 09:51 USER3                MYUSER   CUSER3          LOGON                         0
04/03 09:51 USER5                MYUSER   CUSER5          LOGOFF                        0
04/03 09:51 USER5                MYUSER   CUSER5          LOGOFF                        0
04/03 09:44 USER4                MYUSER   CUSER4          LOGOFF                        0
04/03 09:29 USER3                MYUSER   CUSER3          LOGOFF                        0
04/03 09:10 USER4                MYUSER   CUSER4          LOGON                         0
04/03 09:09 USER3                MYUSER   CUSER3          LOGON                         0
04/03 09:06 USER3                MYUSER   CUSER3          LOGOFF                        0
04/03 09:06 USER3                MYUSER   CUSER3          LOGOFF                        0
04/03 08:47 USER3                MYUSER   CUSER3          LOGON                         0
04/03 07:50 USER4                MYUSER   CUSER4          LOGON                         0
04/03 07:50 USER4                MYUSER   CUSER4          LOGOFF                        0
04/03 07:50 USER4                MYUSER   CUSER4          LOGOFF BY CLEANUP             0
04/03 07:50 USER4                MYUSER   CUSER4          LOGOFF BY CLEANUP             0
04/03 07:50 USER4                MYUSER   CUSER4          LOGOFF BY CLEANUP             0
04/03 07:50 USER4                MYUSER   CUSER4          LOGON                         0
04/03 07:50 USER4                MYUSER   CUSER4          LOGON                         0
04/03 07:50 USER4                MYUSER   CUSER4          LOGON                         0
04/03 07:12 USER3                MYUSER   CUSER3          LOGON                         0
04/03 07:12 NETWORK SERVICE      MYUSER   SERVER          LOGOFF                        0
04/03 07:11 NETWORK SERVICE      MYUSER   SERVER          LOGON                         0
04/03 07:11 IUSR_WEBSERV         MYUSER   SERVER          LOGON                         0
04/03 06:35 NETWORK SERVICE      MYUSER   SERVER          LOGOFF                        0
04/03 06:35 IUSR_WEBSERV         MYUSER   SERVER          LOGOFF                        0
04/03 06:35 NETWORK SERVICE      MYUSER   SERVER          LOGOFF                        0
04/03 06:34 IUSR_WEBSERV         MYUSER   SERVER          LOGON                         0
04/03 06:34 NETWORK SERVICE      MYUSER   SERVER          LOGON                         0
04/03 06:34 NETWORK SERVICE      MYUSER   SERVER          LOGON                         0
04/03 06:29 USER3                MYUSER   CUSER3          LOGOFF                        0
04/03 06:29 USER3                MYUSER   CUSER3          LOGON                         0
04/03 06:28 NETWORK SERVICE      MYUSER   SERVER          LOGOFF                        0
04/03 06:28 NETWORK SERVICE      MYUSER   SERVER          LOGOFF                        0
04/03 06:27 IUSR_WEBSERV         MYUSER   SERVER          LOGOFF                        0
04/03 06:27 NETWORK SERVICE      MYUSER   SERVER          LOGON                         0
04/03 06:26 IUSR_WEBSERV         MYUSER   SERVER          LOGON                         0
04/03 06:26 USER3                MYUSER   CUSER3          LOGOFF                        0
04/03 06:26 USER3                MYUSER   CUSER3          LOGON                         0
04/03 06:26 NETWORK SERVICE      MYUSER   SERVER          LOGON                         0
04/03 06:23 USER3                MYUSER   CUSER3          LOGON                     28000
04/03 06:23 USER3                MYUSER   CUSER3          LOGON                     28000
04/03 06:22 NETWORK SERVICE      MYUSER   SERVER          LOGON                     28000
04/03 06:22 NETWORK SERVICE      MYUSER   SERVER          LOGON                     28000
04/03 02:30 USER5                MYUSER   SERVER2         LOGON                     28000
04/02 19:53 USER1                MYUSER   CUSER1-LT       LOGON                     28000
04/02 19:53 USER1                MYUSER   CUSER1-LT       LOGON                     28000
04/02 19:11 USER1                MYUSER   CUSER1-LT       LOGON                     28000
04/02 19:11 USER1                MYUSER   CUSER1-LT       LOGON                     28000
04/02 18:37 USER1                MYUSER   CUSER1-LT       LOGON                     28000
04/02 18:37 USER1                MYUSER   CUSER1-LT       LOGON                     28000
04/02 18:37 USER1                MYUSER   CUSER1-LT       LOGON                      1017
04/02 18:37 USER1                MYUSER   CUSER1-LT       LOGON                     28000
04/02 18:32 USER1                MYUSER   CUSER1-LT       LOGON                      1017
04/02 18:32 USER1                MYUSER   CUSER1-LT       LOGON                      1017
04/02 18:31 USER1                MYUSER   CUSER1-LT       LOGON                      1017
04/02 18:31 USER1                MYUSER   CUSER1-LT       LOGON                      1017
04/02 17:08 USER3                MYUSER   CUSER3          LOGOFF                        0
04/02 17:03 USER3                MYUSER   CUSER3          LOGOFF                        0
04/02 17:03 USER3                MYUSER   CUSER3          LOGON                         0
04/02 16:55 USER3                MYUSER   CUSER3          LOGOFF                        0
04/02 16:55 USER3                MYUSER   CUSER3          LOGON                         0
04/02 16:54 USER3                MYUSER   CUSER3          LOGON                         0
04/02 16:52 USER3                MYUSER   CUSER3          LOGOFF                        0
04/02 16:45 NETWORK SERVICE      MYUSER   SERVER          LOGOFF                        0
04/02 16:45 NETWORK SERVICE      MYUSER   SERVER          LOGOFF                        0
04/02 16:45 IUSR_WEBSERV         MYUSER   SERVER          LOGOFF                        0
04/02 16:44 NETWORK SERVICE      MYUSER   SERVER          LOGON                         0
04/02 16:44 NETWORK SERVICE      MYUSER   SERVER          LOGON                         0
04/02 16:44 IUSR_WEBSERV         MYUSER   SERVER          LOGON                         0
04/02 16:43 USER3                MYUSER   CUSER3          LOGOFF                        0
04/02 16:42 USER3                MYUSER   CUSER3          LOGON                         0
04/02 16:42 USER3                MYUSER   CUSER3          LOGON                         0
04/02 16:32 USER6                MYUSER   CUSER6          LOGON                         0
04/02 16:32 USER6                MYUSER   CUSER6          LOGON                         0
04/02 16:32 USER6                MYUSER   CUSER6          LOGOFF                        0
04/02 16:32 USER6                MYUSER   CUSER6          LOGOFF                        0
04/02 16:21 USER3                MYUSER   CUSER3          LOGON                         0
04/02 16:20 USER3                MYUSER   CUSER3          LOGON                         0
04/02 16:20 USER3                MYUSER   CUSER3          LOGOFF                        0
04/02 16:19 USER3                MYUSER   CUSER3          LOGOFF                        0
04/02 16:15 USER3                MYUSER   CUSER3          LOGON                         0
04/02 15:40 USER3                MYUSER   CUSER3          LOGON                         0
04/02 15:40 USER3                MYUSER   CUSER3          LOGOFF                        0
04/02 15:31 USER3                MYUSER   CUSER3          LOGOFF                        0
04/02 15:06 USER1                MYUSER   USER1           LOGON                         0
04/02 15:06 USER1                MYUSER   USER1           LOGOFF                        0
04/02 15:06 USER1                MYUSER   USER1           LOGON                         0 

A couple of the rows in the above output are slightly out of sequence, but the order of the output is close enough for my needs.  The RETURNCODE column is the number associated with the ORA- error code that was returned to the client computer:

  • RETURNCODE=0 indicates success
  • RETURNCODE=1017 indicates bad password
  • RETURNCODE=28000 indicates account is locked out
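
On a Linux/UNIX server with an Oracle Database home, the oerr utility translates these numbers into the full error text (first output line shown, from memory – verify on your own system):

oerr ora 1017
01017, 00000, "invalid username/password; logon denied"

oerr ora 28000
28000, 00000, "the account is locked"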

Reviewing the above output, operating system user USER3 logged on successfully at 5:03 PM and logged off at 5:08 PM.  Operating system user USER1 attempted to log on at 6:31 PM from a computer named CUSER1-LT, but failed due to an invalid password.  By 6:37 PM, that same operating system user and computer combination had entered an incorrect password five times, which resulted in an ORA-28000 account lockout error being returned starting with the sixth logon attempt.

At 2:30 AM, USER5 on a computer named SERVER2 attempted to connect using the same database user account, but was also greeted with an ORA-28000.  The same fate awaited operating system user NETWORK SERVICE (this was a web-based logon attempt) and USER3 at 6:22 AM and 6:23 AM, respectively.  The problem was obviously corrected by 6:26 AM, quite likely through the application of the following SQL statement:

ALTER USER MYUSER ACCOUNT UNLOCK;
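
The lock status (and the time of the lock) may be confirmed before and after issuing the unlock with a quick query of the DBA_USERS view:

SELECT
  USERNAME,
  ACCOUNT_STATUS,
  LOCK_DATE
FROM
  DBA_USERS
WHERE
  USERNAME='MYUSER';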

If you tried executing the earlier query against SYS.DBA_AUDIT_SESSION, you might have noticed that the query execution required a bit of time.  Enterprise Manager also, apparently, executes a somewhat similar SQL statement every 30 minutes.  You can learn more about the potential performance impact of Enterprise Manager’s use of SYS.DBA_AUDIT_SESSION here.

I again started thinking about how to measure how much more efficient something is when compared to something that is never performed… I suppose that it is important to first determine whether or not the task at hand is important before deciding if not doing something is infinitely more efficient or infinitely less efficient than would be the case if the task at hand were completed.





Monitoring Changes to Table Data

22 03 2012

March 22, 2012

Suppose that you receive a request stating that a particular table in one of your databases must be monitored for changes.  One such table might be the one that lists the ERP system’s suppliers and their addresses – you would not want someone to be able to set up a legitimate supplier, and later have the billing remit to name and address changed without being detected.

What approach would you take to solve the above problem?  A question similar to the above arrived recently in an ERP mailing list email – the original poster (OP) is using a SQL Server RDBMS, so that might change the proposed solution just a bit.

Here is the “How simple is too simple?” suggestion that I offered:

Periodically, create a VENDOR_SAVE table (drop it if it already exists, or delete all rows and re-insert from the original source table):

CREATE TABLE VENDOR_SAVE AS
SELECT
  *
FROM
  VENDOR;
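
On subsequent refreshes (per the note above about deleting all rows and re-inserting), the same table might simply be emptied and reloaded:

DELETE FROM VENDOR_SAVE;

INSERT INTO VENDOR_SAVE
SELECT
  *
FROM
  VENDOR;

COMMIT;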

Now, just wait a while.

The question then becomes, how do we detect:

  • A new row (record) added to the original source (VENDOR) table.
  • An old row (record) deleted from the original source (VENDOR) table.
  • A change to any column (program field) in the original source (VENDOR) table since the last time the VENDOR_SAVE table was created/refreshed.

Let’s start with the first two bullet points.  I will write the SQL statements so that the statements should work with Oracle Database 9.0.1 and above, and SQL Server with very few changes, even if I feel a bit ANSI about doing so.  I will use the COALESCE function, rather than the NVL function, and CASE syntax rather than the equivalent DECODE syntax.

To identify cases where a row has appeared in, or disappeared from the original source (VENDOR) table, we can simply perform a full outer join between the original source table and the historical mirror image of the original table (VENDOR_SAVE).  We are only interested in cases where the primary key column (ID) is found in exactly one of the two tables:

SELECT
  COALESCE(V.ID,V2.ID) AS ID,
  COALESCE(V.NAME,V2.NAME) AS NAME,
  COALESCE(V.ADDR_1,V2.ADDR_1) AS ADDR_1,
  CASE WHEN V.ID IS NULL THEN 'VENDOR ADDED' ELSE 'VENDOR DELETED' END AS PROBLEM
FROM
  VENDOR V
FULL OUTER JOIN
  VENDOR_SAVE V2
ON (V.ID=V2.ID)
WHERE
  V.ID IS NULL
  OR V2.ID IS NULL;

So, the above SQL statement satisfies the first two bullet points.  The third bullet point is a little more challenging to accomplish… unless of course we employ UNION labor.  If we have two row sources with identical columns, and UNION the row sources together, the resulting row source will be free of any entirely duplicated rows from the two original row sources (two identical rows will be reduced to a single row).  If there are no changes to any of the column values (or if the row was added to or deleted from the original source table), there will be a single row for the primary key column value.  If any columns were changed, there will be two rows containing the primary key column value.

Let’s build a SQL statement that UNIONs the rows from the two tables together, and counts the number of rows for each primary key value:

SELECT
  COUNT(*) OVER (PARTITION BY VL.ID) AS CNT,
  VL.ID,
  VL.NAME,
  VL.ADDR_1
FROM
  (SELECT
    *
  FROM
    VENDOR
  UNION
  SELECT
    *
  FROM
    VENDOR_SAVE) VL;

To complete the requirement for bullet point 3 above, we need to eliminate all rows from the result set where there is a single row for the primary key value:

SELECT DISTINCT
  VL2.ID,
  VL2.NAME,
  VL2.ADDR_1,
  'VALUE CHANGE' PROBLEM
FROM
  (SELECT
    COUNT(*) OVER (PARTITION BY VL.ID) AS CNT,
    VL.ID,
    VL.NAME,
    VL.ADDR_1
  FROM
    (SELECT
      *
    FROM
      VENDOR
    UNION
    SELECT
      *
    FROM
      VENDOR_SAVE) VL) VL2
WHERE
  VL2.CNT>1;

As a final step, we should join the two resultsets into a single resultset using UNION ALL:

SELECT
  COALESCE(V.ID,V2.ID) AS ID,
  COALESCE(V.NAME,V2.NAME) AS NAME,
  COALESCE(V.ADDR_1,V2.ADDR_1) AS ADDR_1,
  CASE WHEN V.ID IS NULL THEN 'VENDOR ADDED' ELSE 'VENDOR DELETED' END AS PROBLEM
FROM
  VENDOR V
FULL OUTER JOIN
  VENDOR_SAVE V2
ON (V.ID=V2.ID)
WHERE
  V.ID IS NULL
  OR V2.ID IS NULL
UNION ALL
SELECT DISTINCT
  VL2.ID,
  VL2.NAME,
  VL2.ADDR_1,
  'VALUE CHANGE' PROBLEM
FROM
  (SELECT
    COUNT(*) OVER (PARTITION BY VL.ID) AS CNT,
    VL.ID,
    VL.NAME,
    VL.ADDR_1
  FROM
    (SELECT
      *
    FROM
      VENDOR
    UNION
    SELECT
      *
    FROM
      VENDOR_SAVE) VL) VL2
WHERE
  VL2.CNT>1
ORDER BY
  ID;

While somewhat limited in usefulness, the above approach will indicate which rows in the original source table should be examined, either because the column values in those rows changed or because the rows were added or deleted.

Another, potentially more useful approach involves setting up a logging trigger and logging table.  I previously shared a VBS script that helps to automate and standardize the process of creating the logging trigger and logging table.  A very similar feature is built into my Hyper-Extended Oracle Performance Monitor program – but, much like the VBS script, this solution is useless for the OP who uses a SQL Server backend.

What are the other options?  Oracle Database’s built-in auditing capabilities.  Auditing built into the ERP system (this particular ERP system writes all audit/change records to a single table that uses a VARCHAR2 column to capture the before/after images of the values stored in columns).  Any other options?  (The mess that someone can create with a bit of idle time on their hands…)
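
As a rough sketch of the built-in auditing option (this assumes that the AUDIT_TRAIL initialization parameter is enabled), standard object auditing of the VENDOR table requires a single statement:

AUDIT INSERT, UPDATE, DELETE ON VENDOR BY ACCESS;

Note that standard auditing records who changed the table and when, but not the before/after column values – capturing the actual values requires something like the logging trigger approach or the comparison technique shown above.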





Tables and Their Indexes Should be Located on Different Disks – What is Wrong with this Quote?

17 02 2012

February 17, 2012

As I mentioned in a previous blog article, I placed an order for the book “Oracle Database 11gR2 Performance Tuning Cookbook“, mostly because I was curious how the book would compare with the “Oracle Database 11g Performance Tuning Recipes” book (my review of that book was a total of about 35 typewritten pages in length), and some of the other Oracle performance related books that I have reviewed.

After placing my order for the book, I took a quick look at a couple of the book’s preview pages on Amazon, downloaded the sample chapter from the book publisher – chapter 10 “Tuning I/O”, and spent a bit of time reading the sample chapter (making comments in the margins of the printed pages).  I circled the following paragraph (sorry for the long quote – I want to make certain that the context of the quote is not lost), found on page 422 of the book (page 17 in the PDF download):

“We can also distribute tables and related indexes on different disks, to obtain performance gain in both read and write operations. If we have tables and indexes on the same disk, we need to read and write in two different places on the same disk. By dividing the work between two disks, we can perform an index range scan faster than when the index and the table are on the same disk.”

What, if anything, is wrong with the above quote from the book?  For some reason, the acronym MASE came to mind when I read the above paragraph. 

The point of blog articles like this one is not to insult authors who have spent thousands of hours carefully constructing an accurate and helpful book, but instead to suggest that readers investigate when something stated does not exactly match what one believes to be true.  It could be that the author “took a long walk off a short pier”, or that the author is revealing accurate information which simply cannot be found through other resources (and may in the process be directly contradicting information sources you have used in the past).  If you do not investigate in such cases, you may lose an important opportunity to learn something that could prove to be extremely valuable.





SESSIONS Derived Value Formula – Doc Bug Explains

8 02 2012

February 8, 2012

Default parameter values are great, as long as it is possible to predict the default values, and the defaults are appropriate for the environment in which the defaults are present.  It is sometimes a challenge to remember all of the rules, and exceptions to those rules, that determine the defaults. I had a vague recollection of how the value of the SESSIONS parameter value is derived from the value of the PROCESSES parameter value.  An Oracle Database 11.2.0.2 question recently arrived in my email box from an ERP mailing list.  The original poster (OP) showed the following output in the email:

SQL> show parameter sessions

NAME                                 TYPE        VALUE
------------------------------------ ----------- -----
sessions                             integer     624

SQL> show parameter processes

NAME                                 TYPE        VALUE
------------------------------------ ----------- -----
processes                            integer     400 
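
One way to determine whether the OP explicitly set the SESSIONS parameter, or allowed the value to be derived, is to check the ISDEFAULT column in V$PARAMETER:

SELECT
  NAME,
  VALUE,
  ISDEFAULT
FROM
  V$PARAMETER
WHERE
  NAME IN ('processes','sessions');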

At first glance, I thought that the OP made a mistake with the manual specification of the SESSIONS parameter value when taking into consideration the current value of the PROCESSES parameter value.

A Google search found a couple of helpful discussion threads that seemed to provide a formula that showed how the default value for the SESSIONS parameter is derived.  I decided to also perform a Google search of the Oracle documentation (trying to avoid circular references).   The same formula found in the helpful discussion threads is also included in the documentation, so that lends a bit of credibility to the information.  Let’s take a look at the Oracle Database 11.2 documentation for the SESSIONS parameter:

Default Value: Derived: (1.1 * PROCESSES) + 5

“… You should always set this parameter explicitly to a value equivalent to your estimate of the maximum number of concurrent users, plus the number of background processes, plus approximately 10% for recursive sessions.

Oracle uses the default value of this parameter as its minimum. Values between 1 and the default do not trigger errors, but Oracle ignores them and uses the default instead.”

A second page from the documentation library included additional information about the PROCESSES parameter:

“Oracle uses the default value of this parameter as its minimum. Values between 1 and the default do not trigger errors, but Oracle ignores them and uses the default instead…”

If you plan on running 50 user processes, a good estimate would be to set the PROCESSES initialization parameter to 70.”

So, let’s take another look at the PROCESSES value specified by the OP, keeping in mind the information from the documentation.  The OP has a database instance with a PROCESSES value of 400, and with the insight provided by the first documentation quote, that would suggest that the SESSIONS parameter would have been autoset to a value of (1.1 * 400) + 5 = 445.  That formula result coupled with the first paragraph of the second quote might lead someone to believe that the OP manually set the SESSIONS value to 624 – there could be a good reason for doing so.  The second paragraph in the second quote from the documentation suggests that OP’s PROCESSES value would allow about (400 – (70 – 50)) =  380 user processes  (each session will require 1 dedicated process, so that value would permit about 380 user sessions).

In an earlier blog article we saw that the auto-tuned value of the DB_FILE_MULTIBLOCK_READ_COUNT parameter is dependent, in part, on the value of the SESSIONS parameter and the size of the SGA.  In another blog article we found that the maximum number of blocks read by serial direct path read is the greatest power of 2 that is equal to or less than the value of the DB_FILE_MULTIBLOCK_READ_COUNT parameter.  Various other items are also affected by the SESSIONS parameter value, so setting the value to too high of a value has few rewards.

So, did the OP make a mistake when setting the SESSIONS parameter value?  I took a look at a test 11.2.0.2 database instance where I had set the PROCESSES parameter to a value of 150 and left the SESSIONS parameter unset.  The SESSIONS parameter value in the 11.2.0.2 database instance showed a value of 248 when I opened the database and executed:

SHOW PARAMETER SESSIONS

Something a bit strange is happening.  Oracle Database 10.2 seemed to respect the 11.2 documentation’s formula for the auto-calculation of the SESSIONS parameter, but Oracle Database 11.2.0.2 seems to use a different formula.  It was late at night when I posted my reply, so I guessed that the formula was about (1.52 * PROCESSES) + 20 (it was too late at night for thinking in terms of basic algebra, so I just picked a set of numbers that worked with my test PROCESSES and SESSIONS parameter values).

Since responding to the email, I took a sampling of a couple of 11.2.0.2 database instances, and summarized those findings below, including the actual derived SESSIONS value and the calculated value using the formula from the documentation:

PROCESSES  SESSIONS (Actual Derived)  SESSIONS (Documentation Derived)
      330                        520                               368
      150                        248                               170
      400                        624                               445
       60                        112                                71
       80                        144                                93

The actual numbers don’t quite match what the documentation suggests.

Time for a little basic algebra.  The formula from the documentation could be represented as:

(x * PROCESSES) + y = SESSIONS

If the structure of the above formula is still correct, just with unknown values for x and y, we will need at least two sets of equations to determine the value of the x and y variables.  To find the x and y variable values, we will use the known PROCESSES and actual derived SESSIONS values from first two rows of the above table:

(x * 330) + y = 520
(x * 150) + y = 248

Solving for y in the first equation:

y = 520 - (x * 330)
y = 520 - 330x

Substituting the above value of y into the second equation to solve for x:

(x * 150) + (520 - 330x) = 248
150x + 520 - 330x = 248
-180x = -272
x = 272/180
x = 1.51111111

Plug the value for x into the above equation for y:

y = 520 - 330x
y = 520 - 330*(1.51111111)
y = 21.3333337

So, based on the values from the first two rows from the above table, in Oracle Database 11.2.0.2:

SESSIONS = (1.51111111 * PROCESSES) + 21.3333337

The above formula works for the first, second, and fourth rows in the above table, but the formula is a bit off for the remaining two rows:

PROCESSES  SESSIONS (Actual Derived)  SESSIONS (Documentation Derived)  Formula 1
      330                        520                               368        520
      150                        248                               170        248
      400                        624                               445  625.77778
       60                        112                                71        112
       80                        144                                93  142.22222

Let’s try solving the formula again using the values from the two rows that did not agree with the new formula for the derived SESSIONS parameter value:

(x * 400) + y = 624
(x * 80) + y = 144

Solving for y in the first equation:

y = 624 - (x * 400)
y = 624 - 400x

Substituting the above value of y into the second equation to solve for x:

(x * 80) + (624 - 400x) = 144
80x - 400x + 624 = 144
-320x = -480
x = 480/320
x = 1.5

Plug the value for x into the above equation for y:

y = 624 - 400 * 1.5
y = 624 - 600
y = 24

The above result leads to the following formula:

SESSIONS = (1.5 * PROCESSES) + 24

Let’s take a look at the values generated by the two formulas side-by-side with the actual derived SESSIONS values:

PROCESSES  SESSIONS (Actual Derived)  SESSIONS (Documentation Derived)  Formula 1  Formula 2
      330                        520                               368        520        519
      150                        248                               170        248        249
      400                        624                               445  625.77778        624
       60                        112                                71        112        114
       80                        144                                93  142.22222        144

Interesting results… Doc Bug?  Why three formulas (plus the guess that I wrote in the email)?  No wonder it is sometimes difficult to remember the default derived values for parameters.

What about this suggestion from the documentation:

“If you plan on running 50 user processes, a good estimate would be to set the PROCESSES initialization parameter to 70.”

That suggestion might also be out of date.  From a generic test 11.2.0.2 Standard Edition database instance running on Windows:

SELECT
  COUNT(*)
FROM
  V$SESSION
WHERE
  USERNAME IS NULL;

COUNT(*)
--------
      27

SELECT
  PROGRAM
FROM
  V$SESSION
WHERE
  USERNAME IS NULL
ORDER BY
  PROGRAM;

PROGRAM
-----------------
ORACLE.EXE (ARC0)
ORACLE.EXE (ARC1)
ORACLE.EXE (ARC2)
ORACLE.EXE (ARC3)
ORACLE.EXE (CJQ0)
ORACLE.EXE (CKPT)
ORACLE.EXE (DBRM)
ORACLE.EXE (DBW0)
ORACLE.EXE (DIA0)
ORACLE.EXE (DIAG)
ORACLE.EXE (GEN0)
ORACLE.EXE (J000)
ORACLE.EXE (J001)
ORACLE.EXE (LGWR)
ORACLE.EXE (MMAN)
ORACLE.EXE (MMNL)
ORACLE.EXE (MMON)
ORACLE.EXE (PMON)
ORACLE.EXE (PSP0)
ORACLE.EXE (Q001)
ORACLE.EXE (Q002)
ORACLE.EXE (QMNC)
ORACLE.EXE (RECO)
ORACLE.EXE (SMCO)
ORACLE.EXE (SMON)
ORACLE.EXE (VKTM)
ORACLE.EXE (W000) 

Interesting results… Doc Bug?  So this generic Standard Edition database has 27 background processes.  Maybe this page in the documentation needs a little work also?  Ah, Doc Bug explains it.





The Transforming Face of the Oracle Support Site

23 12 2011

December 23, 2011

A day or two ago I saw an announcement that the HTML (non-Flash) version of the Oracle Support site was to be phased out in January 2012.  It seems like the last time I tried to use that site, the search functionality did not quite work for Oracle Database products; I gave up on the HTML version of the site and went back to the Flash-based version of the site. 

The performance of the Flash-based version of Metalink – er, My Oracle Support (I usually navigate to the site by entering metalink.oracle.com into the web browser’s address bar) – has improved significantly since the initial launch, and it appears that once the website is loaded, it is a bit more stable now.  I do not own any i products (iPhone, iPad, iToy) which do not support Flash, so the Flash-based version of the Oracle support site works OK on most of my devices (Windows 7 64 bit with IE 9, Windows 7 32 bit with IE 8, Windows XP 32 bit, Motorola Xoom, BlackBerry PlayBook, etc.) as long as Flash is supported in the web browser.  Now the shocker: the Flash-based version of the support site is also planned to be phased out, based on Metalink (MOS) Doc ID 1385682.1.

Turning the other cheek, or doing an about-face?  The support document mentions that ADF Faces will be used – the site will be built using Oracle Application Development Framework.  I hate to say that this is a case of Oracle eating its own dog food – maybe there is a better way to state that the support site will be using their own technology rather than the technology of a third party?  I wonder if the Oracle OTN forums are also being redesigned to use Oracle technology, rather than that of a third party?  On a related note, it appears that the OTN forum site has fixed the Internal Server Error problem.

Anyone found any additional information about the change to the support site?  Any experience with ADF Faces?





What Could Go Wrong? Testing the DB_FILE_MULTIBLOCK_READ_COUNT Parameter

14 10 2011

October 14, 2011 (Modified October 15, 2011)

When reading various Oracle Database books, I often perform mental exercises related to the book’s contents, asking “what could go wrong?”  That is probably why some of my book reviews have extended to record-setting lengths… who knew that Amazon imposed a limit on book review lengths…

I recently started re-reading the “Troubleshooting Oracle Performance” book after buying a PDF copy of the book.  As previously mentioned, the ultimate reason for re-reading the book is so that I may compose a detailed book review, just as I have done for most of the Oracle Database related books that I have read in the past couple of years.  So far, I have had few opportunities to ask “what could go wrong” – until of course I made it to page 178 during the second read-through of the book.  That page includes a PL/SQL block that tests the performance effects of modifying the DB_FILE_MULTIBLOCK_READ_COUNT parameter from a value of 1 to 32, increasing the value by 1 for each pass through the loop in the PL/SQL block.  What could go wrong?

Let’s build a sample table with 11,000,000 rows:

DROP TABLE T1 PURGE;

CREATE TABLE T1(
  C1 NUMBER,
  C2 NUMBER,
  C3 NUMBER,
  C4 DATE,
  C5 DATE,
  C6 DATE,
  C7 VARCHAR2(20),
  C8 VARCHAR2(20),
  C9 VARCHAR2(20));

INSERT INTO
  T1
SELECT
  ROWNUM,
  DECODE(MOD(ROWNUM,5),0,NULL,ROWNUM),
  DECODE(MOD(ROWNUM,5),0,NULL,ROWNUM),
  TRUNC(SYSDATE)+TRUNC(ROWNUM/100),
  DECODE(MOD(ROWNUM,5),0,NULL,TRUNC(SYSDATE)+TRUNC(ROWNUM/100)),
  DECODE(MOD(ROWNUM,5),0,NULL,TRUNC(SYSDATE)+TRUNC(ROWNUM/100)),
  RPAD(CHR(MOD(ROWNUM-1,26)+65)||
    CHR(MOD(ROWNUM,26)+65)||
    CHR(MOD(ROWNUM+1,26)+65)||
    CHR(MOD(ROWNUM+2,26)+65),20,'A'),
  DECODE(MOD(ROWNUM,5),0,NULL,
   RPAD(CHR(MOD(ROWNUM-1,26)+65)||
    CHR(MOD(ROWNUM,26)+65)||
    CHR(MOD(ROWNUM+1,26)+65)||
    CHR(MOD(ROWNUM+2,26)+65),20,'A')),
  DECODE(MOD(ROWNUM,5),0,NULL,
   RPAD(CHR(MOD(ROWNUM-1,26)+65)||
    CHR(MOD(ROWNUM,26)+65)||
    CHR(MOD(ROWNUM+1,26)+65)||
    CHR(MOD(ROWNUM+2,26)+65),20,'A'))
FROM
  DUAL
CONNECT BY
  LEVEL<=1000000;

INSERT INTO
  T1
SELECT
  T1.*
FROM
  T1,
  (SELECT
    ROWNUM RN
  FROM
    DUAL
  CONNECT BY
    LEVEL<=10);

COMMIT;

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1',ESTIMATE_PERCENT=>NULL)

The above script might take a minute or two to complete.  Let’s take a look at the extent sizes for table T1:

SELECT
  SEGMENT_NAME SEGMENT,
  COUNT(*) EXTENTS,
  BYTES/1024 EXT_SIZE_KB,
  (COUNT(*) * BYTES)/1048576 TOTAL_MB
FROM
  DBA_EXTENTS
WHERE
  OWNER=USER
  AND SEGMENT_NAME IN ('T1')
GROUP BY
  SEGMENT_NAME,
  BYTES
ORDER BY
  SEGMENT_NAME,
  BYTES;

SEGMENT    EXTENTS EXT_SIZE_KB   TOTAL_MB
------- ---------- ----------- ----------
T1              16          64          1
T1              63        1024         63
T1             120        8192        960
T1               2       65536        128
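
The extent listing above implies a total segment size of 1,152MB (1 + 63 + 960 + 128); as a cross-check, a similar figure may be read directly from DBA_SEGMENTS (a minimal sketch):

SELECT
  SEGMENT_NAME,
  BLOCKS,
  BYTES/1048576 SIZE_MB
FROM
  DBA_SEGMENTS
WHERE
  OWNER=USER
  AND SEGMENT_NAME='T1';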

Now let’s try a slightly modified version of the script that appears in the book:

SET SERVEROUTPUT ON
SET TIMING ON

DECLARE
  L_COUNT PLS_INTEGER;
  L_TIME PLS_INTEGER;
  L_STARTING_TIME PLS_INTEGER;
  L_ENDING_TIME PLS_INTEGER;
BEGIN
  DBMS_OUTPUT.PUT_LINE('DBFMBRC SECONDS');
  FOR L_DBFMBRC IN 1..128
  LOOP
    EXECUTE IMMEDIATE 'ALTER SESSION SET DB_FILE_MULTIBLOCK_READ_COUNT='||L_DBFMBRC;
    L_STARTING_TIME := DBMS_UTILITY.GET_TIME();
    SELECT /*+ FULL(T) */ COUNT(*) INTO L_COUNT FROM T1 T;
    L_ENDING_TIME := DBMS_UTILITY.GET_TIME();
    L_TIME := ROUND((L_ENDING_TIME-L_STARTING_TIME)/100);
    DBMS_OUTPUT.PUT_LINE(L_DBFMBRC||' '||L_TIME);
  END LOOP;
END;
/

The output that I received when tested on Oracle Database 11.2.0.2 follows:

DBFMBRC SECONDS
1 0
2 0
3 0
4 0
5 0
6 0
7 0
8 0
9 0
10 0
11 0
12 0
13 0
14 0
15 0
16 0
17 0
18 0
19 0
20 0
21 0
22 0
23 0
24 0
25 0
26 0
27 0
28 0
29 0
30 0
31 0
32 0
33 0
34 0
35 0
36 0
37 0
38 0
39 0
40 0
41 0
42 0
43 0
44 0
45 0
46 0
47 0
48 0
49 0
50 0
51 0
52 0
53 0
54 0
55 0
56 0
57 0
58 0
59 0
60 0
61 0
62 0
63 0
64 0
65 0
66 0
67 0
68 0
69 0
70 0
71 0
72 0
73 0
74 0
75 0
76 0
77 0
78 0
79 0
80 0
81 0
82 0
83 0
84 0
85 0
86 0
87 0
88 0
89 0
90 0
91 0
92 0
93 0
94 0
95 0
96 0
97 0
98 0
99 0
100 0
101 0
102 0
103 0
104 0
105 0
106 0
107 0
108 0
109 0
110 0
111 0
112 0
113 0
114 0
115 0
116 0
117 0
118 0
119 0
120 0
121 0
122 0
123 0
124 0
125 0
126 0
127 0
128 0

PL/SQL procedure successfully completed.

What Could Go Wrong?  (I had two or three answers for this question as I was reading the script, but I have a couple more after a bit of experimentation.  Think about what might go wrong without looking in the book.)

Added October 15, 2011:

The “Troubleshooting Oracle Performance” book, to my knowledge, was the first performance tuning book to address Oracle Database 11.1, and this book was released shortly after the release of Oracle Database 11.1.0.6.  While it was not widely known at the time (I do not think that it was documented), Oracle Database 11.1.0.6 might employ serial direct path reads (directly into the PGA) rather than db file scattered reads (into the buffer cache) when performing a full table scan.  Based on an experiment, the DB_FILE_MULTIBLOCK_READ_COUNT parameter influences the maximum number of blocks read in a single direct path read request; the maximum number of blocks read by direct path read appears to be the greatest power of two that is equal to or less than the value of the DB_FILE_MULTIBLOCK_READ_COUNT parameter.  The paragraphs that follow this test case in the book suggest that a parallel full table scan should not be utilized – that implies that the book author is aware that direct path reads might distort the outcome of the test case.  This was the first thought that I had when viewing this test case while re-reading the book.

The effect of table blocks that are already buffered in the buffer cache is another area where this test case might produce distorted results (the paragraphs that follow the test case in the book address this issue).  The test table is roughly 1,152MB in size (minus the blocks that are above the high water mark, of course), so how likely is it that some of the table blocks might be cached in the buffer cache either before the first execution of the SELECT from table T1, or between executions of the SELECT from T1?  Does the answer depend on what was performed prior to the test case – how full is the buffer cache with unrelated blocks?  What is possibly interesting is that in my test case, all of the blocks belonging to the entire table were buffered in the buffer cache prior to the start of the PL/SQL block in the test case.

Without the timing information displayed by SET TIMING ON, it is difficult to determine whether the DBMS_UTILITY.GET_TIME function simply is not supported on the operating system platform used for the test, or whether, as Timur stated in the comments section, the full table scans were completing so quickly that the timing of the individual runs was being rounded to 0 seconds.  I was careful not to include the output of the SQL*Plus timing during the initial posting of the blog article.  I obtained the following times from repeated PL/SQL executions:

Elapsed: 00:00:39.37

Elapsed: 00:00:37.05

Elapsed: 00:00:37.14

With a 10046 trace enabled, the execution time climbed sharply:

...
125 1
126 1
127 1
128 1

PL/SQL procedure successfully completed.

Elapsed: 00:01:40.16
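
If DBMS_UTILITY.GET_TIME is functioning on the test platform, a quick way to confirm the rounding explanation (a minimal sketch, not taken from the book) is to report the raw timer delta for a single scan, rather than rounding to whole seconds as the original PL/SQL block does:

SET SERVEROUTPUT ON

DECLARE
  L_COUNT PLS_INTEGER;
  L_STARTING_TIME PLS_INTEGER;
  L_ENDING_TIME PLS_INTEGER;
BEGIN
  L_STARTING_TIME := DBMS_UTILITY.GET_TIME();
  SELECT /*+ FULL(T) */ COUNT(*) INTO L_COUNT FROM T1 T;
  L_ENDING_TIME := DBMS_UTILITY.GET_TIME();
  -- GET_TIME returns hundredths of a second, so a sub-second scan
  -- yields a small positive delta here rather than rounding to 0
  DBMS_OUTPUT.PUT_LINE('CENTISECONDS: '||(L_ENDING_TIME - L_STARTING_TIME));
END;
/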

The potential buffering of some or all of the table blocks was the second thought that I had while reading the test case during the re-read of the book.  The paragraph in the book that followed the test case addressed the potential problems of the table blocks being cached in the buffer cache, the operating system cache, a SAN cache, or a similar cache.  As previously mentioned, Oracle Database 11.1.0.6 and later are able to perform serial direct path reads where previous Oracle Database versions would have used db file scattered reads, so the Oracle Database version is potentially important.  On page 175 the book also mentioned three items that might cause fewer blocks to be read than what is specified by the DB_FILE_MULTIBLOCK_READ_COUNT parameter – those items might impact the results of the test if the test table is not representative of the other objects in the database that might be read by a db file scattered read.

While reading the test case, another thought raced through my mind: why is the author experimenting with the DB_FILE_MULTIBLOCK_READ_COUNT parameter at all – just leave it unset and make certain that system statistics are collected so that the MBRC system statistic is used (the point that Mohamed made in the comments section, and I believe related to his second point as well).  Since the release of Oracle Database 10.2.0.1 it has been possible to leave that parameter unset (and a bug fixed in 10.2.0.4 allowed the parameter to be set to 0, which has the same effect as leaving it unset).  The book author covered the possibility of leaving the DB_FILE_MULTIBLOCK_READ_COUNT parameter unset in the paragraphs that followed the test case, and since this book attempts to help with performance problems in Oracle 9.0.1 and later, it probably is a good idea that the test case was included to benefit those who had not yet transitioned all of their databases to Oracle Database 10.2 or later.

Result caching, mentioned by Steve Catmull in the comments section, is not something that I had considered while reading the test case.  If I remember correctly, when the RESULT_CACHE_MODE parameter is set to FORCE, the result cache could distort the results of the test case.

I thought that I would include a couple more test cases on 11.2.0.2, starting with an unset DB_FILE_MULTIBLOCK_READ_COUNT and the following script:

ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;

EXEC DBMS_MONITOR.SESSION_TRACE_ENABLE

SELECT /*+ FULL(T1) */ COUNT(*) FROM T1;

EXEC DBMS_MONITOR.SESSION_TRACE_DISABLE

A variety of “direct path read” wait events appear in the resulting trace file.  Two typical wait events found in the trace file follow:

WAIT #362523544: nam='direct path read' ela= 2280 file number=4 first dba=3951616 block cnt=128 obj#=71407 tim=366050651386
WAIT #362523544: nam='direct path read' ela= 2323 file number=4 first dba=3951744 block cnt=128 obj#=71407 tim=366050654201

The above shows the types of wait events that I expected to have when I read the PL/SQL block.
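
For anyone repeating the test, the trace file written by DBMS_MONITOR.SESSION_TRACE_ENABLE can be located without hunting through the diag directory structure (a minimal sketch; V$DIAG_INFO is available starting with Oracle Database 11.1):

SELECT
  VALUE
FROM
  V$DIAG_INFO
WHERE
  NAME='Default Trace File';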

Let’s try again with a specified value for DB_FILE_MULTIBLOCK_READ_COUNT:

ALTER SESSION SET DB_FILE_MULTIBLOCK_READ_COUNT=20;

ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;

EXEC DBMS_MONITOR.SESSION_TRACE_ENABLE

SELECT /*+ FULL(T1) */ COUNT(*) FROM T1;

EXEC DBMS_MONITOR.SESSION_TRACE_DISABLE

A variety of “direct path read” wait events appear in the resulting trace file.  Two typical wait events found in the trace file follow:

WAIT #362523544: nam='direct path read' ela= 272 file number=4 first dba=4061184 block cnt=16 obj#=71407 tim=366357357469
WAIT #362523544: nam='direct path read' ela= 372 file number=4 first dba=4061216 block cnt=16 obj#=71407 tim=366357357889

In the above, note that the direct path read wait did not entirely adhere to the value for the modified DB_FILE_MULTIBLOCK_READ_COUNT, the actual number of blocks was rounded down to the next lower power of 2.

Let’s try again with another value for the DB_FILE_MULTIBLOCK_READ_COUNT parameter:

ALTER SESSION SET DB_FILE_MULTIBLOCK_READ_COUNT=100;

ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;

EXEC DBMS_MONITOR.SESSION_TRACE_ENABLE

SELECT /*+ FULL(T1) */ COUNT(*) FROM T1;

EXEC DBMS_MONITOR.SESSION_TRACE_DISABLE

A variety of “direct path read” wait events appear in the resulting trace file.  Two typical wait events found in the trace file follow:

WAIT #362523544: nam='direct path read' ela= 1206 file number=4 first dba=4050624 block cnt=64 obj#=71407 tim=366563051144
WAIT #362523544: nam='direct path read' ela= 1144 file number=4 first dba=4050688 block cnt=64 obj#=71407 tim=366563052537

In the above, note that the direct path read wait did not entirely adhere to the value for the modified DB_FILE_MULTIBLOCK_READ_COUNT, the actual number of blocks was rounded down to the next lower power of 2.

Let’s try again with another value for the DB_FILE_MULTIBLOCK_READ_COUNT parameter:

ALTER SESSION SET DB_FILE_MULTIBLOCK_READ_COUNT=120;

ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;

EXEC DBMS_MONITOR.SESSION_TRACE_ENABLE

SELECT /*+ FULL(T1) */ COUNT(*) FROM T1;

EXEC DBMS_MONITOR.SESSION_TRACE_DISABLE

A variety of “direct path read” wait events appear in the resulting trace file.  Two typical wait events found in the trace file follow:

WAIT #362523544: nam='direct path read' ela= 1183 file number=4 first dba=4047232 block cnt=64 obj#=71407 tim=366628708554
WAIT #362523544: nam='direct path read' ela= 1280 file number=4 first dba=4047296 block cnt=64 obj#=71407 tim=366628710113

Just for verification:

ALTER SESSION SET DB_FILE_MULTIBLOCK_READ_COUNT=128;

ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;

EXEC DBMS_MONITOR.SESSION_TRACE_ENABLE

SELECT /*+ FULL(T1) */ COUNT(*) FROM T1;

EXEC DBMS_MONITOR.SESSION_TRACE_DISABLE

Wait events:

WAIT #362523544: nam='direct path read' ela= 2505 file number=4 first dba=4050560 block cnt=128 obj#=71407 tim=366681412987
WAIT #362523544: nam='direct path read' ela= 2259 file number=4 first dba=4050688 block cnt=128 obj#=71407 tim=366681415722

The above again shows that the number of blocks read by a direct path read is limited to the greatest power of two that does not exceed the parameter value – and because 128 is itself a power of two, the full 128 blocks are read per request.
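
The rounding behavior observed in the trace files is easy to reproduce arithmetically – the greatest power of two that does not exceed a candidate parameter value may be computed as follows (a minimal sketch; LOG(2,n) is the base 2 logarithm):

SELECT
  COLUMN_VALUE DBFMBRC,
  POWER(2, TRUNC(LOG(2, COLUMN_VALUE))) MAX_BLOCK_CNT
FROM
  TABLE(SYS.ODCINUMBERLIST(20, 100, 120, 128));

The computed MAX_BLOCK_CNT values – 16, 64, 64, and 128 – match the block cnt values captured in the trace files above.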

OK, let’s try the initial test again:

DISCONNECT ALL;
CONNECT ...

Instead of using a PL/SQL block, let’s use regular SQL statements:

ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1',ESTIMATE_PERCENT=>100)

EXEC DBMS_MONITOR.SESSION_TRACE_ENABLE

SELECT /*+ FULL(T1) */ COUNT(*) FROM T1;

EXEC DBMS_MONITOR.SESSION_TRACE_DISABLE

ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;

EXEC DBMS_MONITOR.SESSION_TRACE_ENABLE

SELECT /*+ FULL(T1) */ COUNT(*) FROM T1;

EXEC DBMS_MONITOR.SESSION_TRACE_DISABLE

The results, “direct path read” wait events for both of the selects from table T1:

...
WAIT #357271008: nam='direct path read' ela= 2774 file number=4 first dba=3930496 block cnt=128 obj#=71407 tim=371288846837
WAIT #357271008: nam='direct path read' ela= 2353 file number=4 first dba=3930624 block cnt=128 obj#=71407 tim=371288849736
...
WAIT #357239640: nam='direct path read' ela= 2749 file number=4 first dba=3935360 block cnt=128 obj#=71407 tim=371292356287
WAIT #357239640: nam='direct path read' ela= 2580 file number=4 first dba=3935488 block cnt=128 obj#=71407 tim=371292359340
...

Other testing showed that if the buffer cache was flushed after creating and populating the table, but before collecting statistics, Oracle Database 11.2.0.2 performed serial direct path reads to read the table blocks, rather than reading those blocks into the buffer cache using db file scattered reads.
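
As a closing side note, it is possible to distinguish the two read mechanisms without generating a trace file by differencing a couple of session-level statistics before and after the scan (a minimal sketch; the statistic names are those found in V$STATNAME on 11.2):

SELECT
  SN.NAME,
  M.VALUE
FROM
  V$MYSTAT M,
  V$STATNAME SN
WHERE
  SN.STATISTIC#=M.STATISTIC#
  AND SN.NAME IN ('physical reads direct','table scans (direct read)');

If the ‘table scans (direct read)’ statistic increments across the SELECT, the scan bypassed the buffer cache.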





Having Fun with ORA-00600 (qkebCreateConstantOpn1) – What is the Cause of this Error?

28 09 2011

September 28, 2011

A friend brought an interesting Oracle Database 11.2.0.2 problem to my attention. He is in the process of upgrading to a new Oracle Database server (and Oracle Database version) and a new version of an ERP package. He has experienced an ORA-00600 error with both patch 6 and patch 9 for 11.2.0.2 on 64 bit Windows after rolling forward the database table definitions for the new version of the ERP package and then testing the ERP functionality (all in a test environment, of course).

I have not seen the most recent table modification scripts first hand, but upgrade scripts for the earlier ERP versions exercised good habits of declaring column constraints and default column values (as well as triggers to make certain that SQL script writers do not get too carried away).

So, what to do to troubleshoot this problem – call Oracle support because an ORA-00600 error is returned, or poke around a little to see if we can determine what is happening? Since I am not able to touch the database to investigate the problem freely, I asked for a 10046 trace at level 12 to be generated. Below is a portion of that trace file (the UPDATE of the PART table, at dep=0, was actually executed by the client application, while the rest is at dep=1 – in this case internal SQL used for query optimization):

PARSING IN CURSOR #386947304 len=42 dep=1 uid=0 oct=3 lid=0 tim=63747639340 hv=844002283 ad='314cbd908' sqlid='ftj9uawt4wwzb'
select condition from cdef$ where rowid=:1
END OF STMT
PARSE #386947304:c=0,e=16,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,plh=1884964958,tim=63747639340
BINDS #386947304:
 Bind#0
  oacdty=11 mxl=16(16) mxlc=00 mal=00 scl=00 pre=00
  oacflg=18 fl2=0001 frm=00 csi=00 siz=16 off=0
  kxsbbbfp=1c28c668  bln=16  avl=16  flg=05
  value=0000A680.003D.0001
EXEC #386947304:c=0,e=32,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,plh=1884964958,tim=63747639412
FETCH #386947304:c=0,e=8,p=0,cr=2,cu=0,mis=0,r=1,dep=1,og=4,plh=1884964958,tim=63747639432
STAT #386947304 id=1 cnt=1 pid=0 pos=1 obj=31 op='TABLE ACCESS BY USER ROWID CDEF$ (cr=1 pr=0 pw=0 time=4 us cost=1 size=15 card=1)'
CLOSE #386947304:c=0,e=21,dep=1,type=0,tim=63747639466
...
...
...
PARSING IN CURSOR #386943464 len=58 dep=0 uid=36 oct=6 lid=36 tim=63750497441 hv=1532843717 ad='3144ac3d8' sqlid='9m0sbg1dpupq5' 
update PART set mrp_required = :1     where id = :2       
END OF STMT 
PARSE #386943464:c=0,e=121,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,plh=0,tim=63750497440 
PARSE #386947304:c=0,e=10,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,plh=1884964958,tim=63750497602
BINDS #386947304:
 Bind#0
  oacdty=11 mxl=16(16) mxlc=00 mal=00 scl=00 pre=00
  oacflg=18 fl2=0001 frm=00 csi=00 siz=16 off=0
  kxsbbbfp=1edffbd8  bln=16  avl=16  flg=05
  value=0000A7B3.0019.0001
EXEC #386947304:c=0,e=33,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,plh=1884964958,tim=63750497664
FETCH #386947304:c=0,e=9,p=0,cr=2,cu=0,mis=0,r=1,dep=1,og=4,plh=1884964958,tim=63750497684
CLOSE #386947304:c=0,e=8,dep=1,type=3,tim=63750497704
=====================
PARSING IN CURSOR #517995552 len=98 dep=1 uid=0 oct=3 lid=0 tim=63750497798 hv=3044436590 ad='2f0e99070' sqlid='dha5nv6urcumf'
select binaryDefVal,           lengthb(binaryDefVal) from ecol$ where tabobj# = :1 and colnum = :2
END OF STMT
PARSE #517995552:c=0,e=19,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,plh=3081038021,tim=63750497797
BINDS #517995552:
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=0001 frm=00 csi=00 siz=48 off=0
  kxsbbbfp=1edfe758  bln=22  avl=04  flg=05
  value=18450
 Bind#1
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=0001 frm=00 csi=00 siz=0 off=24
  kxsbbbfp=1edfe770  bln=22  avl=03  flg=01
  value=141
EXEC #517995552:c=0,e=56,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,plh=3081038021,tim=63750497891
FETCH #517995552:c=0,e=14,p=0,cr=2,cu=0,mis=0,r=1,dep=1,og=4,plh=3081038021,tim=63750497913
STAT #517995552 id=1 cnt=1 pid=0 pos=1 obj=123 op='TABLE ACCESS BY INDEX ROWID ECOL$ (cr=2 pr=0 pw=0 time=11 us cost=2 size=95 card=1)'
STAT #517995552 id=2 cnt=1 pid=1 pos=1 obj=126 op='INDEX RANGE SCAN ECOL_IX1 (cr=1 pr=0 pw=0 time=8 us cost=1 size=0 card=1)'
CLOSE #517995552:c=0,e=1,dep=1,type=3,tim=63750497955
PARSE #386947304:c=0,e=9,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,plh=1884964958,tim=63750498060
BINDS #386947304:
 Bind#0
  oacdty=11 mxl=16(16) mxlc=00 mal=00 scl=00 pre=00
  oacflg=18 fl2=0001 frm=00 csi=00 siz=16 off=0
  kxsbbbfp=1edfe360  bln=16  avl=16  flg=05
  value=0000A7B3.0019.0001
EXEC #386947304:c=0,e=34,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,plh=1884964958,tim=63750498120
FETCH #386947304:c=0,e=9,p=0,cr=2,cu=0,mis=0,r=1,dep=1,og=4,plh=1884964958,tim=63750498140
CLOSE #386947304:c=0,e=7,dep=1,type=3,tim=63750498159
Symbol file c:\Oracle\product\11.2.0\dbhome_1\BIN\orannzsbb11.SYM does not match binary.
 Symbol TimeStamp=4cea6d1b, Module TimeStamp=4d6f3707 are different
Incident 4193 created, dump file: C:\ORACLE\diag\rdbms\test\test\incident\incdir_4193\test_ora_3796_i4193.trc
ORA-00600: internal error code, arguments: [qkebCreateConstantOpn1], [], [], [], [], [], [], [], [], [], [], []

* BLOCK CHECKING IS NOT DONE-Call Heap not initialized.
* Skip checking block dba=0x2402d02.
* BLOCK CHECKING IS NOT DONE-Call Heap not initialized.
* Skip checking block dba=0x340660f.
WAIT #386943464: nam='SQL*Net break/reset to client' ela= 2 driver id=1413697536 break?=1 p3=0 obj#=17988 tim=63752354884
WAIT #386943464: nam='SQL*Net break/reset to client' ela= 469 driver id=1413697536 break?=0 p3=0 obj#=17988 tim=63752355367
WAIT #386943464: nam='SQL*Net message to client' ela= 2 driver id=1413697536 #bytes=1 p3=0 obj#=17988 tim=63752355385
WAIT #386943464: nam='SQL*Net message from client' ela= 7198 driver id=1413697536 #bytes=1 p3=0 obj#=17988 tim=63752362607
WAIT #0: nam='ADR file lock' ela= 3  =0  =0  =0 obj#=17988 tim=63752362675
WAIT #0: nam='ADR file lock' ela= 67  =0  =0  =0 obj#=17988 tim=63752362763
WAIT #0: nam='ADR block file read' ela= 146  =0  =0  =0 obj#=17988 tim=63752363029
WAIT #0: nam='ADR block file write' ela= 392  =0  =0  =0 obj#=17988 tim=63752363449
XCTEND rlbk=1, rd_only=0, tim=63752364013
WAIT #0: nam='log file sync' ela= 115 buffer#=1552 sync scn=251585418 p3=0 obj#=17988 tim=63752364516
WAIT #0: nam='SQL*Net message to client' ela= 0 driver id=1413697536 #bytes=1 p3=0 obj#=17988 tim=63752364548

Interesting, maybe?  I ran a test in an older version of the ERP package, also using Oracle Database 11.2.0.2.  My test did not generate an ORA-00600 error, but I did notice something a bit interesting: my trace file did not contain a single SQL statement similar to the following, while that SQL statement was found in the trace file that was sent to me:

select binaryDefVal, lengthb(binaryDefVal) from ecol$ where tabobj# = :1 and colnum = :2

ECOL$? Anything found by Google?  OK, two articles, I will briefly quote one of the articles for dramatic effect:

“11G introduced a fast=true feature for adding columns with default values. In 11G when you want to add a not null column with a default value, the operation completes in the blink of an eye (if there are no locks on the table of course). Now it does not have to update all the rows, it just keeps the information as meta-data in the dictionary. So adding a new column becomes just a few dictionary operations.”

I suspect that the ERP upgrade script might have added NOT NULL columns with default values… so let’s take a tour through Metalink (MOS) to see if anything interesting is found:

  • Doc ID 1301504.1: “ORA-00600: internal error code, arguments: [qkebCreateConstantOpn1] ON UPDATES ON TABLES WITH CONSTRAINTS”
  • Bug 12316092: “ORA-600 [QKEBCREATECONSTANTOPN1] AFTER ADDING NOT NULL COLUMN WITH DEFAULT”
  • Bug 12760255: “ORA-00600 INTERNAL ERROR CODE, ARGUMENTS [QKEBCREATECONSTANTOPN1] ON SELECT STAT”
  • Doc ID 8501439.8, Bug 8501439:” OERI[kghfrh:ds] / [kghfrempty:ds] / [17147] ORA-7445 [memcpy] with ANALYZE in a table with added column”
  • Doc ID 1106553.1: “Wrong Result For Added Column After Table Creation in 11g”
  • Bug 11865621: “MERGE ON TABLE WITH COL ADDED WHEN _ADD_COL_OPTIM_ENABLED=TRUE SPINS ON CPU”
  • Bug 10261680: “QUERY RETURNING WRONG RESULT WITH PARALLEL ENABLED”
  • Doc ID 10080167.8, Bug 10080167: “ORA-600 [15599] / dump [kghalf] from TRIGGER with WHEN clause”

OK, so I located through Metalink a couple of potential problems that might be related to the problem that was brought to my attention, some of which might be fixed in Oracle Database 11.2.0.3 or a later release (the first two seem to directly pertain to this issue).  It appears that quite a few of the bugs are triggered by the optimization (the fast=true reference in the above quote) that was introduced in Oracle Database 11.1.
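
Before choosing a work-around, it may help to identify which tables actually carry these metadata-only default values. The following is a rough sketch only – ECOL$ is an undocumented SYS-owned dictionary table, so the join below is an assumption based on the bind variables visible in the trace file above, and the query should be treated accordingly:

SELECT
  O.NAME TABLE_NAME,
  C.NAME COLUMN_NAME
FROM
  SYS.ECOL$ E,
  SYS.OBJ$ O,
  SYS.COL$ C
WHERE
  E.TABOBJ#=O.OBJ#
  AND E.TABOBJ#=C.OBJ#
  AND E.COLNUM=C.COL#
  AND E.BINARYDEFVAL IS NOT NULL
ORDER BY
  O.NAME,
  C.NAME;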

What might be the work-around for this particular problem (mostly rough guesses at this point – pick one):

  • Wait for the release of Oracle Database 11.2.0.3 on the Windows platform… or 12.1.
  • Execute ALTER SYSTEM SET "_ADD_COL_OPTIM_ENABLED"=FALSE; before running the scripts that add the additional columns to the tables.
  • After running the ERP upgrade scripts, export the database, drop the database, create a new blank database, and then import the exported file.

Any other options?

If you look at the trace file output above, you might have noticed the following line:

Symbol file c:\Oracle\product\11.2.0\dbhome_1\BIN\orannzsbb11.SYM does not match binary

I wonder if that message has any relevance to the problem? It is slightly interesting to find that Oracle Database 11.2.0.1 with patch 7 installed has a file by that name with a file date of April 8, 2011 (399KB in size) while 11.2.0.2 with patch 6 apparently has an older version with a file date of November 22, 2010 (395KB in size). Probably unrelated to the problem at hand, but still interesting that a more recent release offers an older version of a file. I wonder if that file date problem could be related (requires a My Oracle Support account):

Doc ID 1272490.1: “‘Symbol file oraclsra11.SYM does not match binary’ Reported In Trace File + ORA-600 / ORA-7445″





SESSION_CACHED_CURSORS – Possibly Interesting Details

21 07 2011

July 21, 2011

Have you ever wondered about the V$OPEN_CURSOR view, the SESSION_CACHED_CURSORS parameter, and the two session-level statistics “session cursor cache count” and “session cursor cache hits”?  I did after reading from two different sources that stated essentially that a larger shared pool would be required when the value for the SESSION_CACHED_CURSORS parameter is increased.  The shared pool? – but the session’s cursors are cached in the UGA, which is typically in the PGA (unless shared server is used), and as far as I am aware the PGA is not in the shared pool.

So, what do we (think that we) know about the session cached cursors?

  • The per session limit for the number of cached cursors is specified by the SESSION_CACHED_CURSORS parameter.
  • Prior to the release of the Oracle Database 9.2.0.5 patchset the OPEN_CURSORS parameter controlled the number of cached cursors for PL/SQL execution, and as of 9.2.0.5 the SESSION_CACHED_CURSORS parameter controls the number of PL/SQL cursors that are cached per session (source).
  • In recent Oracle Database release versions the SESSION_CACHED_CURSORS parameter defaults to a value of 50.
  • Assuming that the SESSION_CACHED_CURSORS parameter is set to a value greater than 0, a SQL statement that is executed three times in the same session will be added to that session’s cursor cache and will remain open, even when explicitly closed by the session.
  • Starting in Oracle Database 11.1, if a 10046 trace is enabled for a session and a cursor is closed, a CLOSE # line will be written to the 10046 trace file, with the type= parameter indicating whether or not the cursor was added to the session cursor cache (and was thus actually not closed).
  • In V$MYSTAT, the session-level “session cursor cache count” statistic indicates the number of cursors that are cached for the current session, while the “session cursor cache hits” statistic indicates the number of times the current session has accessed those cached cursors.
  • V$OPEN_CURSOR shows the cursors that are currently open for all connected sessions.  Those cursors that are in the session cursor cache will also appear in V$OPEN_CURSOR.

The above bullet points are probably common knowledge, but is it all true?

Let’s check the Oracle Database 11.2 documentation for V$OPEN_CURSOR:

“V$OPEN_CURSOR lists cursors that each user session currently has opened and parsed, or cached.”

Nothing out of the ordinary with the above quote – it seems to be well aligned with the last of the above bullet points.

Let’s check the Oracle Database 11.2 documentation for SESSION_CACHED_CURSORS:

Default value: 50

“SESSION_CACHED_CURSORS specifies the number of session cursors to cache. Repeated parse calls of the same SQL (including recursive SQL) or PL/SQL statement cause the session cursor for that statement to be moved into the session cursor cache. Subsequent parse calls will find the cursor in the cache and do not need to reopen the cursor. Oracle uses a least recently used algorithm to remove entries in the session cursor cache to make room for new entries when needed.”

The above confirms bullet points one and three that were listed above, and almost bullet point two.

Let’s check the behavior changes listed for Oracle Database 11.2 in the documentation library:

“Starting with Oracle Database 10g Release 1 (10.1), the number of cached cursors is determined by the SESSION_CACHED_CURSORS initialization parameter. In previous Oracle Database releases, the number of SQL cursors cached by PL/SQL was determined by the OPEN_CURSORS initialization parameter.”

Although the above quote states Oracle Database 10.1 rather than 9.2.0.5, the quote confirms the change mentioned in the second of the above bullet points.

Let’s take a look at part of the Performance Tuning Guide from the 11.2 documentation set:

“Oracle checks the library cache to determine whether more than three parse requests have been issued on a given statement. If so, then Oracle assumes that the session cursor associated with the statement should be cached and moves the cursor into the session cursor cache. Subsequent requests to parse that SQL statement by the same session then find the cursor in the session cursor cache.”

Although the above states that a SQL statement must be parsed more than three times before it is added to the session cursor cache rather than executed three times, the quote basically confirms the fourth of the above bullet points.

Checking the statistic descriptions for Oracle Database 11.2:

“session cursor cache count: Total number of cursors cached. This statistic is incremented only if SESSION_CACHED_CURSORS > 0. This statistic is the most useful in V$SESSTAT. If the value for this statistic in V$SESSTAT is close to the setting of the SESSION_CACHED_CURSORS parameter, the value of the parameter should be increased.

session cursor cache hits: Number of hits in the session cursor cache. A hit means that the SQL statement did not have to be reparsed. Subtract this statistic from “parse count (total)” to determine the real number of parses that occurred.”

The above quote seems to confirm the second to the last of the above bullet points.

Everyone happy, or should we test?

We need a table for a bit of experimentation, and we might as well flush the shared pool too:

CREATE TABLE T4 AS
SELECT
  *
FROM
  ALL_OBJECTS
WHERE
  ROWNUM<=10;

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T4') 

ALTER SYSTEM FLUSH SHARED_POOL;

Now we need some sort of an elegant way to test the bulleted items that are listed at the start of this article.  Of course, a VBS script that is executed on the poor Windows client computer that is sitting under the finger-smudged iPad is the perfect way to run a test.  In the script below replace MyUsername with a valid username for your database, replace MyPassword with the password for that user, and replace MyDatabase with the database alias (net service name) as it appears in the TNSNAMES.ORA file.  The script performs the following actions:

  1. Connects to the database
  2. Determines the SID and SERIAL# for the session and writes those values to the C:\SessionCursorCache.txt text file.
  3. Enables a 10046 trace at level 1 (minimal detail with no wait events)
  4. Writes the session’s current values for the “session cursor cache hits” and “session cursor cache count” to the text file.
  5. Writes to the text file all SQL statements from V$OPEN_CURSOR where the SID matches the SID for the current session.
  6. Opens two cursors for the SQL statement “SELECT * FROM T4”
  7. Writes the session’s current values for the “session cursor cache hits” and “session cursor cache count” to the text file.  This should indicate if opening the cursors in step 6 triggered a hit on the cached cursors.
  8. Retrieves all rows from the two cursors and explicitly closes those cursors (potentially alternating between fetches from the two cursors if table T4 contained more than 1,000 rows).
  9. Repeat steps 4 through 8 two more times to see how repeated executions affect the statistics and the cursors that are cached.
  10. Disconnects from the database.

The script follows (script download, save with .vbs extension SessionCachedCursorsTest.vbs):

Dim strSQL
Dim strSQL2
Dim strSQL3
Dim strUsername
Dim strPassword
Dim strDatabase
Dim snpData
Dim snpData2
Dim snpDataCursorCache
Dim snpDataOpenSQL
Dim snpDataSID
Dim dbDatabase
Dim lngCnt
Dim intPass
Dim objFSO
Dim objFile

Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objFile=objFSO.CreateTextFile("C:\SessionCursorCache.txt", True)

Set snpData = CreateObject("ADODB.Recordset")
Set snpData2 = CreateObject("ADODB.Recordset")
Set snpDataCursorCache = CreateObject("ADODB.Recordset")
Set snpDataOpenSQL = CreateObject("ADODB.Recordset")
Set snpDataSID = CreateObject("ADODB.Recordset")
Set dbDatabase = CreateObject("ADODB.Connection")

strUsername = "MyUsername"
strPassword = "MyPassword"
strDatabase = "MyDatabase"
dbDatabase.ConnectionString = "Provider=OraOLEDB.Oracle;Data Source=" & strDatabase & ";User ID=" & strUsername & ";Password=" & strPassword & ";FetchSize=5000;"

dbDatabase.Open

strSQL = "SELECT" & VBCrLf
strSQL = strSQL & "  S.SID," & VBCrLf
strSQL = strSQL & "  S.SERIAL# SERIALN" & VBCrLf
strSQL = strSQL & "FROM" & VBCrLf
strSQL = strSQL & "  V$SESSION S," & VBCrLf
strSQL = strSQL & "  (SELECT" & VBCrLf
strSQL = strSQL & "    SID" & VBCrLf
strSQL = strSQL & "  FROM" & VBCrLf
strSQL = strSQL & "    V$MYSTAT" & VBCrLf
strSQL = strSQL & "  WHERE" & VBCrLf
strSQL = strSQL & "    ROWNUM=1) M" & VBCrLf
strSQL = strSQL & "WHERE" & VBCrLf
strSQL = strSQL & "  S.SID=M.SID" & VBCrLf

snpDataSID.Open strSQL, dbDatabase
If snpDataSID.State = 1 Then
  Do While Not(snpDataSID.EOF)
    objFile.Write "SID: " & snpDataSID("sid") & ",  SERIAL#: " & snpDataSID("serialn") & VBCrLf
    snpDataSID.MoveNext
  Loop
  snpDataSID.Close
  objFile.Write " " & VBCrLf
End If

dbDatabase.Execute "ALTER SESSION SET TRACEFILE_IDENTIFIER = 'SessionCursorCache'"
dbDatabase.Execute "ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT FOREVER, LEVEL 1'"

strSQL2 = "SELECT" & VBCrLf
strSQL2 = strSQL2 & "  SN.NAME STAT_NAME," & VBCrLf
strSQL2 = strSQL2 & "  M.VALUE" & VBCrLf
strSQL2 = strSQL2 & "FROM" & VBCrLf
strSQL2 = strSQL2 & "  V$MYSTAT M," & VBCrLf
strSQL2 = strSQL2 & "  V$STATNAME SN" & VBCrLf
strSQL2 = strSQL2 & "WHERE" & VBCrLf
strSQL2 = strSQL2 & "  SN.NAME IN ('session cursor cache hits','session cursor cache count')" & VBCrLf
strSQL2 = strSQL2 & "  AND SN.STATISTIC#=M.STATISTIC#" & VBCrLf
strSQL2 = strSQL2 & "ORDER BY" & VBCrLf
strSQL2 = strSQL2 & "  SN.NAME"

strSQL3 = "SELECT" & VBCrLf
strSQL3 = strSQL3 & "  SQL_TEXT" & VBCrLf
strSQL3 = strSQL3 & "FROM" & VBCrLf
strSQL3 = strSQL3 & "  V$OPEN_CURSOR" & VBCrLf
strSQL3 = strSQL3 & "WHERE" & VBCrLf
strSQL3 = strSQL3 & "  SID=" & VBCrLf
strSQL3 = strSQL3 & "    (SELECT" & VBCrLf
strSQL3 = strSQL3 & "       SID" & VBCrLf
strSQL3 = strSQL3 & "     FROM" & VBCrLf
strSQL3 = strSQL3 & "       V$MYSTAT" & VBCrLf
strSQL3 = strSQL3 & "     WHERE" & VBCrLf
strSQL3 = strSQL3 & "       ROWNUM=1)" & VBCrLf
strSQL3 = strSQL3 & "ORDER BY" & VBCrLf
strSQL3 = strSQL3 & "  SQL_TEXT"

For intPass = 1 to 3
snpDataCursorCache.Open strSQL2, dbDatabase
If snpDataCursorCache.State = 1 Then
  objFile.Write "Before Pass #" & intPass & VBCrLf
  Do While Not(snpDataCursorCache.EOF)
    objFile.Write snpDataCursorCache("stat_name") & " " & snpDataCursorCache("value") & VBCrLf
    snpDataCursorCache.MoveNext
  Loop
  snpDataCursorCache.Close
  objFile.Write " " & VBCrLf
End If

snpDataOpenSQL.Open strSQL3, dbDatabase
lngCnt = 0
If snpDataOpenSQL.State = 1 Then
  Do While Not(snpDataOpenSQL.EOF)
    lngCnt = lngCnt + 1
    objFile.Write lngCnt & "  " & snpDataOpenSQL("sql_text") & VBCrLf
    snpDataOpenSQL.MoveNext
  Loop
  snpDataOpenSQL.Close
  objFile.Write " " & VBCrLf
End If

strSQL = "SELECT * FROM T4"
snpData.Open strSQL, dbDatabase
snpData2.Open strSQL, dbDatabase
snpDataCursorCache.Open strSQL2, dbDatabase
If snpDataCursorCache.State = 1 Then
  objFile.Write "After Open Pass #" & intPass & VBCrLf
  Do While Not(snpDataCursorCache.EOF)
    objFile.Write snpDataCursorCache("stat_name") & " " & snpDataCursorCache("value") & VBCrLf
    snpDataCursorCache.MoveNext
  Loop
  snpDataCursorCache.Close
  objFile.Write " " & VBCrLf
End If

lngCnt = 0
If snpData.State = 1 Then
  Do While Not(snpData.EOF)
    Do While (lngCnt < 1000) and Not(snpData.EOF)
      lngCnt = lngCnt + 1
      snpData.MoveNext
    Loop
    lngCnt = 0

    Do While (lngCnt < 1000) and Not(snpData2.EOF)
      lngCnt = lngCnt + 1
      snpData2.MoveNext
    Loop
    lngCnt = 0
  Loop
End If

snpData.Close
snpData2.Close

Next

dbDatabase.Execute "ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT OFF'"
objFile.Close

Set snpData = Nothing
Set snpData2 = Nothing
Set snpDataCursorCache = Nothing
Set snpDataOpenSQL = Nothing
Set snpDataSID = Nothing

dbDatabase.Close
Set dbDatabase = Nothing
Set objFile = Nothing
Set objFSO = Nothing

If you save the script as SessionCachedCursorsTest.vbs you can execute it from a Windows command line with the following command:

CSCRIPT SessionCachedCursorsTest.vbs

The script will generate a file named C:\SessionCursorCache.txt (if User Access Control is enabled on the Windows computer, you will need to modify the script to create the file in a different location).  What might you find in the generated file?  This is what I saw after the script ran shortly after flushing the shared pool (64 bit Oracle Database 11.2.0.2):

SID: 66,  SERIAL#: 7863

Before Pass #1
session cursor cache count 49
session cursor cache hits 240

1  SELECT    SQL_TEXT  FROM    V$OPEN_CURSOR  WHERE    SID=   
2  insert into sys.aud$( sessionid,entryid,statement,ntimestamp
3  select /*+ rule */ bucket, endpoint, col#, epvalue from hist
4  select /*+ rule */ bucket, endpoint, col#, epvalue from hist
5  select /*+ rule */ bucket_cnt, row_cnt, cache_cnt, null_cnt,
6  select /*+ rule */ bucket_cnt, row_cnt, cache_cnt, null_cnt,
7  select /*+ rule */ bucket_cnt, row_cnt, cache_cnt, null_cnt,
8  select col#, grantee#, privilege#,max(mod(nvl(option$,0),2))
9  select col#, grantee#, privilege#,max(mod(nvl(option$,0),2))
10  select col#, grantee#, privilege#,max(mod(nvl(option$,0),2))
11  select col#,intcol#,charsetid,charsetform from col$ where ob
12  select col#,intcol#,charsetid,charsetform from col$ where ob
13  select col#,intcol#,charsetid,charsetform from col$ where ob
14  select col#,intcol#,ntab# from ntab$ where obj#=:1 order by
15  select col#,intcol#,ntab# from ntab$ where obj#=:1 order by
16  select col#,intcol#,ntab# from ntab$ where obj#=:1 order by
17  select col#,intcol#,reftyp,stabid,expctoid from refcon$ wher
18  select col#,intcol#,reftyp,stabid,expctoid from refcon$ wher
19  select col#,intcol#,reftyp,stabid,expctoid from refcon$ wher
20  select col#,intcol#,toid,version#,packed,intcols,intcol#s,fl
21  select col#,intcol#,toid,version#,packed,intcols,intcol#s,fl
22  select col#,intcol#,toid,version#,packed,intcols,intcol#s,fl
23  select cols,audit$,textlength,intcols,property,flags,rowid f
24  select cols,audit$,textlength,intcols,property,flags,rowid f
25  select con#,obj#,rcon#,enabled,nvl(defer,0),spare2,spare3 fr
26  select con#,obj#,rcon#,enabled,nvl(defer,0),spare2,spare3 fr
27  select con#,type#,condlength,intcols,robj#,rcon#,match#,refa
28  select con#,type#,condlength,intcols,robj#,rcon#,match#,refa
29  select distinct(-privilege#),nvl(option$,0) from sysauth$ wh
30  select grantee#,privilege#,nvl(col#,0),max(mod(nvl(option$,0
31  select grantee#,privilege#,nvl(col#,0),max(mod(nvl(option$,0
32  select grantee#,privilege#,nvl(col#,0),max(mod(nvl(option$,0
33  select intcol#, toid, version#, intcols, intcol#s, flags, sy
34  select intcol#, toid, version#, intcols, intcol#s, flags, sy
35  select intcol#, toid, version#, intcols, intcol#s, flags, sy
36  select intcol#,type,flags,lobcol,objcol,extracol,schemaoid,
37  select intcol#,type,flags,lobcol,objcol,extracol,schemaoid,
38  select intcol#,type,flags,lobcol,objcol,extracol,schemaoid,
39  select l.col#, l.intcol#, l.lobj#, l.ind#, l.ts#, l.file#, l
40  select name,intcol#,segcol#,type#,length,nvl(precision#,0),d
41  select name,intcol#,segcol#,type#,length,nvl(precision#,0),d
42  select node,owner,name from syn$ where obj#=:1
43  select node,owner,name from syn$ where obj#=:1
44  select obj#,type#,ctime,mtime,stime, status, dataobj#, flags
45  select obj#,type#,ctime,mtime,stime, status, dataobj#, flags
46  select order#,columns,types from access$ where d_obj#=:1
47  select owner#,name,namespace,remoteowner,linkname,p_timestam
48  select rowcnt, blkcnt, empcnt, avgspc, chncnt, avgrln, analy
49  select text from view$ where rowid=:1
50  select text from view$ where rowid=:1
51  select timestamp, flags from fixed_obj$ where obj#=:1
52  select timestamp, flags from fixed_obj$ where obj#=:1

After Open Pass #1
session cursor cache count 49
session cursor cache hits 279

Before Pass #2
session cursor cache count 49
session cursor cache hits 279

1  SELECT    SN.NAME STAT_NAME,    M.VALUE  FROM    V$MYSTAT M,
2  SELECT    SQL_TEXT  FROM    V$OPEN_CURSOR  WHERE    SID=   
3  insert into sys.aud$( sessionid,entryid,statement,ntimestamp
4  select /*+ rule */ bucket, endpoint, col#, epvalue from hist
5  select /*+ rule */ bucket_cnt, row_cnt, cache_cnt, null_cnt,
6  select /*+ rule */ bucket_cnt, row_cnt, cache_cnt, null_cnt,
7  select /*+ rule */ bucket_cnt, row_cnt, cache_cnt, null_cnt,
8  select col#, grantee#, privilege#,max(mod(nvl(option$,0),2))
9  select col#, grantee#, privilege#,max(mod(nvl(option$,0),2))
10  select col#, grantee#, privilege#,max(mod(nvl(option$,0),2))
11  select col#,intcol#,charsetid,charsetform from col$ where ob
12  select col#,intcol#,charsetid,charsetform from col$ where ob
13  select col#,intcol#,charsetid,charsetform from col$ where ob
14  select col#,intcol#,ntab# from ntab$ where obj#=:1 order by
15  select col#,intcol#,ntab# from ntab$ where obj#=:1 order by
16  select col#,intcol#,reftyp,stabid,expctoid from refcon$ wher
17  select col#,intcol#,reftyp,stabid,expctoid from refcon$ wher
18  select col#,intcol#,toid,version#,packed,intcols,intcol#s,fl
19  select col#,intcol#,toid,version#,packed,intcols,intcol#s,fl
20  select cols,audit$,textlength,intcols,property,flags,rowid f
21  select cols,audit$,textlength,intcols,property,flags,rowid f
22  select con#,obj#,rcon#,enabled,nvl(defer,0),spare2,spare3 fr
23  select con#,obj#,rcon#,enabled,nvl(defer,0),spare2,spare3 fr
24  select con#,obj#,rcon#,enabled,nvl(defer,0),spare2,spare3 fr
25  select con#,type#,condlength,intcols,robj#,rcon#,match#,refa
26  select con#,type#,condlength,intcols,robj#,rcon#,match#,refa
27  select con#,type#,condlength,intcols,robj#,rcon#,match#,refa
28  select distinct(-privilege#),nvl(option$,0) from sysauth$ wh
29  select grantee#,privilege#,nvl(col#,0),max(mod(nvl(option$,0
30  select grantee#,privilege#,nvl(col#,0),max(mod(nvl(option$,0
31  select grantee#,privilege#,nvl(col#,0),max(mod(nvl(option$,0
32  select i.obj#,i.ts#,i.file#,i.block#,i.intcols,i.type#,i.fla
33  select intcol#, toid, version#, intcols, intcol#s, flags, sy
34  select intcol#, toid, version#, intcols, intcol#s, flags, sy
35  select intcol#,nvl(pos#,0),col#,nvl(spare1,0) from ccol$ whe
36  select intcol#,type,flags,lobcol,objcol,extracol,schemaoid,
37  select intcol#,type,flags,lobcol,objcol,extracol,schemaoid,
38  select intcol#,type,flags,lobcol,objcol,extracol,schemaoid,
39  select name,intcol#,segcol#,type#,length,nvl(precision#,0),d
40  select name,intcol#,segcol#,type#,length,nvl(precision#,0),d
41  select name,intcol#,segcol#,type#,length,nvl(precision#,0),d
42  select node,owner,name from syn$ where obj#=:1
43  select node,owner,name from syn$ where obj#=:1
44  select obj#,type#,ctime,mtime,stime, status, dataobj#, flags
45  select obj#,type#,ctime,mtime,stime, status, dataobj#, flags
46  select obj#,type#,ctime,mtime,stime, status, dataobj#, flags
47  select order#,columns,types from access$ where d_obj#=:1
48  select owner#,name,namespace,remoteowner,linkname,p_timestam
49  select t.ts#,t.file#,t.block#,nvl(t.bobj#,0),nvl(t.tab#,0),t
50  select text from view$ where rowid=:1
51  select timestamp, flags from fixed_obj$ where obj#=:1
52  select type#,blocks,extents,minexts,maxexts,extsize,extpct,u

After Open Pass #2
session cursor cache count 50
session cursor cache hits 280

Before Pass #3
session cursor cache count 50
session cursor cache hits 281

1  SELECT    SN.NAME STAT_NAME,    M.VALUE  FROM    V$MYSTAT M,
2  SELECT    SQL_TEXT  FROM    V$OPEN_CURSOR  WHERE    SID=   
3  SELECT * FROM T4
4  SELECT * FROM T4
5  insert into sys.aud$( sessionid,entryid,statement,ntimestamp
6  select /*+ rule */ bucket, endpoint, col#, epvalue from hist
7  select /*+ rule */ bucket_cnt, row_cnt, cache_cnt, null_cnt,
8  select /*+ rule */ bucket_cnt, row_cnt, cache_cnt, null_cnt,
9  select /*+ rule */ bucket_cnt, row_cnt, cache_cnt, null_cnt,
10  select col#, grantee#, privilege#,max(mod(nvl(option$,0),2))
11  select col#, grantee#, privilege#,max(mod(nvl(option$,0),2))
12  select col#, grantee#, privilege#,max(mod(nvl(option$,0),2))
13  select col#,intcol#,charsetid,charsetform from col$ where ob
14  select col#,intcol#,charsetid,charsetform from col$ where ob
15  select col#,intcol#,ntab# from ntab$ where obj#=:1 order by
16  select col#,intcol#,ntab# from ntab$ where obj#=:1 order by
17  select col#,intcol#,reftyp,stabid,expctoid from refcon$ wher
18  select col#,intcol#,reftyp,stabid,expctoid from refcon$ wher
19  select col#,intcol#,toid,version#,packed,intcols,intcol#s,fl
20  select col#,intcol#,toid,version#,packed,intcols,intcol#s,fl
21  select cols,audit$,textlength,intcols,property,flags,rowid f
22  select cols,audit$,textlength,intcols,property,flags,rowid f
23  select con#,obj#,rcon#,enabled,nvl(defer,0),spare2,spare3 fr
24  select con#,obj#,rcon#,enabled,nvl(defer,0),spare2,spare3 fr
25  select con#,obj#,rcon#,enabled,nvl(defer,0),spare2,spare3 fr
26  select con#,type#,condlength,intcols,robj#,rcon#,match#,refa
27  select con#,type#,condlength,intcols,robj#,rcon#,match#,refa
28  select con#,type#,condlength,intcols,robj#,rcon#,match#,refa
29  select distinct(-privilege#),nvl(option$,0) from sysauth$ wh
30  select grantee#,privilege#,nvl(col#,0),max(mod(nvl(option$,0
31  select grantee#,privilege#,nvl(col#,0),max(mod(nvl(option$,0
32  select grantee#,privilege#,nvl(col#,0),max(mod(nvl(option$,0
33  select i.obj#,i.ts#,i.file#,i.block#,i.intcols,i.type#,i.fla
34  select intcol#, toid, version#, intcols, intcol#s, flags, sy
35  select intcol#, toid, version#, intcols, intcol#s, flags, sy
36  select intcol#,nvl(pos#,0),col#,nvl(spare1,0) from ccol$ whe
37  select intcol#,type,flags,lobcol,objcol,extracol,schemaoid,
38  select intcol#,type,flags,lobcol,objcol,extracol,schemaoid,
39  select name,intcol#,segcol#,type#,length,nvl(precision#,0),d
40  select name,intcol#,segcol#,type#,length,nvl(precision#,0),d
41  select name,intcol#,segcol#,type#,length,nvl(precision#,0),d
42  select node,owner,name from syn$ where obj#=:1
43  select node,owner,name from syn$ where obj#=:1
44  select obj#,type#,ctime,mtime,stime, status, dataobj#, flags
45  select obj#,type#,ctime,mtime,stime, status, dataobj#, flags
46  select obj#,type#,ctime,mtime,stime, status, dataobj#, flags
47  select order#,columns,types from access$ where d_obj#=:1
48  select owner#,name,namespace,remoteowner,linkname,p_timestam
49  select t.ts#,t.file#,t.block#,nvl(t.bobj#,0),nvl(t.tab#,0),t
50  select text from view$ where rowid=:1
51  select timestamp, flags from fixed_obj$ where obj#=:1
52  select type#,blocks,extents,minexts,maxexts,extsize,extpct,u

After Open Pass #3
session cursor cache count 50
session cursor cache hits 284

Reading the above, before our newly created session even selected from table T4, the statistics indicate that somehow there were already 49 cursors in that session’s cursor cache (many of the SQL statements appeared multiple times) and there were already 240 hits on those cached cursors.  Did this session really perform more than 387 (49*3 + 240) executions of SQL statements before our first selection from T4, or does V$OPEN_CURSOR perform differently than described in the documentation?  By the time the two cursors selecting from T4 were opened, the number of cursor cache hits jumped by 39.  After the two cursors were closed (Before Pass #2) we are able to confirm that the select from table T4 was not added to the session cursor cache and there were still 49 cursors in that cache.
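
Since V$OPEN_CURSOR mixes genuinely open cursors with cached cursors, raw counts from the view are hard to interpret.  If the CURSOR_TYPE column is present in your release’s edition of the view (it appears in the 11.2 documentation), the two populations can be separated (a minimal sketch):

SELECT
  CURSOR_TYPE,
  COUNT(*) CURSORS
FROM
  V$OPEN_CURSOR
WHERE
  SID=
    (SELECT
       SID
     FROM
       V$MYSTAT
     WHERE
       ROWNUM=1)
GROUP BY
  CURSOR_TYPE
ORDER BY
  CURSOR_TYPE;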

On the second pass, we are able to confirm that the select from table T4 is not yet in the session cursor cache.  Immediately after the two cursors selecting from table T4 were opened, we see that the session experienced one more cursor cache hit, and another cursor was added to the session cursor cache.

At the start of the third pass, we see that our select from T4 is now in the session cursor cache – not once, but twice – there were 50 cursors in the session cursor cache, and there was one more session cursor cache hit.  When the two cursors selecting from table T4 were opened, we see that the session experienced three more cursor cache hits.

Let’s take a quick look at portions of the 10046 trace file that are related to the SQL statements that were executed by our script (just primarily focusing on the parse and close calls):

PARSING IN CURSOR #429272512 len=212 dep=0 uid=64 oct=3 lid=64 tim=2769436992465 hv=3255970530 ad='3ed509198' sqlid='c2a4dv3114ar2'
SELECT
  SN.NAME STAT_NAME,
  M.VALUE
FROM
  V$MYSTAT M,
  V$STATNAME SN
WHERE
  SN.NAME IN ('session cursor cache hits','session cursor cache count')
  AND SN.STATISTIC#=M.STATISTIC#
ORDER BY
  SN.NAME
END OF STMT
CLOSE #429272512:c=0,e=16,dep=0,type=0,tim=2769436996834

PARSING IN CURSOR #429272512 len=161 dep=0 uid=64 oct=3 lid=64 tim=2769437006931 hv=3838443513 ad='3eb511630' sqlid='babcumbkcmzzt'
SELECT
  SQL_TEXT
FROM
  V$OPEN_CURSOR
WHERE
  SID=
    (SELECT
       SID
     FROM
       V$MYSTAT
     WHERE
       ROWNUM=1)
ORDER BY
  SQL_TEXT
END OF STMT
CLOSE #429272512:c=0,e=19,dep=0,type=0,tim=2769437013363

PARSING IN CURSOR #429272512 len=16 dep=0 uid=64 oct=3 lid=64 tim=2769437025761 hv=1886046789 ad='3eba6b810' sqlid='49fgb3ts6pkk5'
SELECT * FROM T4
END OF STMT

PARSING IN CURSOR #441175000 len=16 dep=0 uid=64 oct=3 lid=64 tim=2769437031138 hv=1886046789 ad='3eba6b810' sqlid='49fgb3ts6pkk5'
SELECT * FROM T4
END OF STMT

PARSING IN CURSOR #440835720 len=212 dep=0 uid=64 oct=3 lid=64 tim=2769437036683 hv=3255970530 ad='3ed509198' sqlid='c2a4dv3114ar2'
SELECT
  SN.NAME STAT_NAME,
  M.VALUE
FROM
  V$MYSTAT M,
  V$STATNAME SN
WHERE
  SN.NAME IN ('session cursor cache hits','session cursor cache count')
  AND SN.STATISTIC#=M.STATISTIC#
ORDER BY
  SN.NAME
END OF STMT

CLOSE #441175000:c=0,e=11,dep=0,type=0,tim=2769437040272
CLOSE #429272512:c=0,e=8,dep=0,type=0,tim=2769437040299
CLOSE #440835720:c=0,e=6,dep=0,type=0,tim=2769437040318
=====================
PARSING IN CURSOR #429272512 len=212 dep=0 uid=64 oct=3 lid=64 tim=2769437040366 hv=3255970530 ad='3ed509198' sqlid='c2a4dv3114ar2'
SELECT
  SN.NAME STAT_NAME,
  M.VALUE
FROM
  V$MYSTAT M,
  V$STATNAME SN
WHERE
  SN.NAME IN ('session cursor cache hits','session cursor cache count')
  AND SN.STATISTIC#=M.STATISTIC#
ORDER BY
  SN.NAME
END OF STMT

CLOSE #429272512:c=0,e=13,dep=0,type=1,tim=2769437042471
=====================
PARSING IN CURSOR #441175000 len=161 dep=0 uid=64 oct=3 lid=64 tim=2769437042547 hv=3838443513 ad='3eb511630' sqlid='babcumbkcmzzt'
SELECT
  SQL_TEXT
FROM
  V$OPEN_CURSOR
WHERE
  SID=
    (SELECT
       SID
     FROM
       V$MYSTAT
     WHERE
       ROWNUM=1)
ORDER BY
  SQL_TEXT
END OF STMT
CLOSE #441175000:c=0,e=19,dep=0,type=0,tim=2769437051038
=====================
PARSING IN CURSOR #441175000 len=16 dep=0 uid=64 oct=3 lid=64 tim=2769437051137 hv=1886046789 ad='3eba6b810' sqlid='49fgb3ts6pkk5'
SELECT * FROM T4
END OF STMT

PARSING IN CURSOR #440600648 len=16 dep=0 uid=64 oct=3 lid=64 tim=2769437053863 hv=1886046789 ad='3eba6b810' sqlid='49fgb3ts6pkk5'
SELECT * FROM T4
END OF STMT

CLOSE #440600648:c=0,e=10,dep=0,type=1,tim=2769437060397
CLOSE #441175000:c=0,e=14,dep=0,type=1,tim=2769437060431
CLOSE #429272512:c=0,e=14,dep=0,type=3,tim=2769437060459
PARSE #429272512:c=0,e=15,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=1338012530,tim=2769437060499
CLOSE #429272512:c=0,e=31,dep=0,type=3,tim=2769437062969
=====================
PARSING IN CURSOR #440639688 len=161 dep=0 uid=64 oct=3 lid=64 tim=2769437063067 hv=3838443513 ad='3eb511630' sqlid='babcumbkcmzzt'
SELECT
  SQL_TEXT
FROM
  V$OPEN_CURSOR
WHERE
  SID=
    (SELECT
       SID
     FROM
       V$MYSTAT
     WHERE
       ROWNUM=1)
ORDER BY
  SQL_TEXT
END OF STMT
CLOSE #440639688:c=0,e=20,dep=0,type=1,tim=2769437071488

PARSE #440600648:c=0,e=25,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=2560505625,tim=2769437071547
PARSE #441175000:c=0,e=16,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=2560505625,tim=2769437073533
PARSE #429272512:c=0,e=19,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=1338012530,tim=2769437076014
CLOSE #441175000:c=0,e=7,dep=0,type=3,tim=2769437079257
CLOSE #440600648:c=0,e=4,dep=0,type=3,tim=2769437079281

What does the above show?  Notice that the numbers following type= on the CLOSE lines seem to change – what do those numbers mean?  If we refer to page 284 of the book “Secrets of the Oracle Database“, we find the following information:

type=0: cursor is actually closed.
type=1: cursor is added to the session cursor cache without pushing an existing cursor out of the session cursor cache.
type=2: cursor is added to the session cursor cache, in the process pushing an existing cursor out of the session cursor cache.
type=3: cursor was already in the session cursor cache.

So, the above indicates that cursors are actually added to the session cursor cache NOT during a parse call, but instead when an attempt is made to explicitly close the cursor.  The documentation describing the SESSION_CACHED_CURSORS parameter, referenced above, appears to be misleading.  It is a bit odd that our select from T4 was added to the session cursor cache twice.  It is also a bit odd that the large cursor numbers introduced with Oracle Database 11.2.0.2, which I understand to represent the address of the cursor, were reused for different SQL statements (this might have significance in a later blog article).

Just for fun, let’s re-execute the VBS script:

CSCRIPT SessionCachedCursorsTest.vbs

This is now the script’s output:

SID: 192,  SERIAL#: 693

Before Pass #1
session cursor cache count 2
session cursor cache hits 1

1  SELECT    SN.NAME STAT_NAME,    M.VALUE  FROM    V$MYSTAT M,
2  SELECT    SQL_TEXT  FROM    V$OPEN_CURSOR  WHERE    SID=   
3  insert into sys.aud$( sessionid,entryid,statement,ntimestamp
4  select /*+ connect_by_filtering */ privilege#,level from sys
5  select privilege# from sysauth$ where (grantee#=:1 or grante

After Open Pass #1
session cursor cache count 4
session cursor cache hits 2

Before Pass #2
session cursor cache count 6
session cursor cache hits 3

1  SELECT    SN.NAME STAT_NAME,    M.VALUE  FROM    V$MYSTAT M,
2  SELECT    SQL_TEXT  FROM    V$OPEN_CURSOR  WHERE    SID=   
3  SELECT * FROM T4
4  SELECT * FROM T4
5  insert into sys.aud$( sessionid,entryid,statement,ntimestamp
6  select /*+ connect_by_filtering */ privilege#,level from sys
7  select privilege# from sysauth$ where (grantee#=:1 or grante

After Open Pass #2
session cursor cache count 6
session cursor cache hits 7

Before Pass #3
session cursor cache count 6
session cursor cache hits 8

1  SELECT    SN.NAME STAT_NAME,    M.VALUE  FROM    V$MYSTAT M,
2  SELECT    SQL_TEXT  FROM    V$OPEN_CURSOR  WHERE    SID=   
3  SELECT * FROM T4
4  SELECT * FROM T4
5  insert into sys.aud$( sessionid,entryid,statement,ntimestamp
6  select /*+ connect_by_filtering */ privilege#,level from sys
7  select privilege# from sysauth$ where (grantee#=:1 or grante

After Open Pass #3
session cursor cache count 6
session cursor cache hits 12

In the above, you will likely first notice that the SID and SERIAL# for our newly created session are not the same as they were earlier.  At the start of the output, before the first execution of the select from T4, we see that there were two cursors in the session’s cursor cache, and one hit on one of those two cursors – quite a change from the first execution of the script.  Immediately after opening the cursors that selected from T4 we see that there were 4 cursors in the session cursor cache, and 2 hits on those cursors.

Now another big change: before the start of the second pass, we see that there were 6 cursors in the session cursor cache, including the two selects from T4 – previously, those cursors were not added to the session cursor cache until just before the start of the third pass.  After opening the two cursors selecting from T4, there were 4 more hits on the session cursor cache.

So, what are the apparent take-aways from this test case?

  • After flushing the shared pool, a session’s cursor cache may contain a large number of SQL statements, duplicated in some cases, that may have been executed only a single time by the session at a depth greater than 0 (as indicated in a 10046 trace file).
  • After a SQL statement has been parsed and closed three times, the SQL statement becomes eligible for the session cursor cache at the third close.
  • If a SQL statement was previously parsed and closed three times by another session, the SQL statement is eligible for a different session’s cursor cache the first time a cursor executing that SQL statement is closed in that session.
  • Oracle Database 11.2.0.2’s huge cursor numbers that are written to 10046 trace files may be reused for entirely different SQL statements, and that reuse does not necessarily indicate a cursor leak.
  • If a SQL statement is opened multiple times in a session without being closed between opens, it is quite possible that all of the cursors using the SQL statement will be added to the session cursor cache (the SQL statement will appear multiple times in V$OPEN_CURSOR) when those cursors are explicitly closed – see the query sketch after this list.
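One possible way to confirm that last take-away, rather than relying on the SQL_TEXT alone: if memory serves, on Oracle Database 11.1 and later V$OPEN_CURSOR exposes a CURSOR_TYPE column that distinguishes cursors that are genuinely open from cursors that are merely held in the session cursor cache.  A minimal sketch (again using a SID value of 192 purely as a placeholder):

SELECT
  CURSOR_TYPE,
  SQL_TEXT
FROM
  V$OPEN_CURSOR
WHERE
  SID=192
ORDER BY
  CURSOR_TYPE,
  SQL_TEXT;

Rows reported with a CURSOR_TYPE of OPEN are truly open, while rows reported as SESSION CURSOR CACHED (or one of the other CACHED variations) only occupy a slot in the session cursor cache.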

At this point, you are probably wondering what might happen when we open 12 cursors that select from T4.  Here is the modified script (script download – save it with a .vbs extension as SessionCachedCursorsTest2.vbs):

Dim strSQL
Dim strSQL2
Dim strSQL3
Dim strUsername
Dim strPassword
Dim strDatabase
Dim snpData
Dim snpData2
Dim snpData3
Dim snpData4
Dim snpData5
Dim snpData6
Dim snpData7
Dim snpData8
Dim snpData9
Dim snpData10
Dim snpData11
Dim snpData12
Dim snpDataCursorCache
Dim snpDataOpenSQL
Dim snpDataSID
Dim dbDatabase
Dim lngCnt
Dim intPass
Dim objFSO
Dim objFile

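'Create the text file that will receive the test output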
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objFile=objFSO.CreateTextFile("C:\SessionCursorCache2.txt", True)

Set snpData = CreateObject("ADODB.Recordset")
Set snpData2 = CreateObject("ADODB.Recordset")
Set snpData3 = CreateObject("ADODB.Recordset")
Set snpData4 = CreateObject("ADODB.Recordset")
Set snpData5 = CreateObject("ADODB.Recordset")
Set snpData6 = CreateObject("ADODB.Recordset")
Set snpData7 = CreateObject("ADODB.Recordset")
Set snpData8 = CreateObject("ADODB.Recordset")
Set snpData9 = CreateObject("ADODB.Recordset")
Set snpData10 = CreateObject("ADODB.Recordset")
Set snpData11 = CreateObject("ADODB.Recordset")
Set snpData12 = CreateObject("ADODB.Recordset")
Set snpDataCursorCache = CreateObject("ADODB.Recordset")
Set snpDataOpenSQL = CreateObject("ADODB.Recordset")
Set snpDataSID = CreateObject("ADODB.Recordset")
Set dbDatabase = CreateObject("ADODB.Connection")

strUsername = "MyUsername"
strPassword = "MyPassword"
strDatabase = "MyDatabase"
dbDatabase.ConnectionString = "Provider=OraOLEDB.Oracle;Data Source=" & strDatabase & ";User ID=" & strUsername & ";Password=" & strPassword & ";FetchSize=5000;"

dbDatabase.Open

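'Retrieve the SID and SERIAL# of the current session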
strSQL = "SELECT" & VBCrLf
strSQL = strSQL & "  S.SID," & VBCrLf
strSQL = strSQL & "  S.SERIAL# SERIALN" & VBCrLf
strSQL = strSQL & "FROM" & VBCrLf
strSQL = strSQL & "  V$SESSION S," & VBCrLf
strSQL = strSQL & "  (SELECT" & VBCrLf
strSQL = strSQL & "    SID" & VBCrLf
strSQL = strSQL & "  FROM" & VBCrLf
strSQL = strSQL & "    V$MYSTAT" & VBCrLf
strSQL = strSQL & "  WHERE" & VBCrLf
strSQL = strSQL & "    ROWNUM=1) M" & VBCrLf
strSQL = strSQL & "WHERE" & VBCrLf
strSQL = strSQL & "  S.SID=M.SID" & VBCrLf

snpDataSID.Open strSQL, dbDatabase
If snpDataSID.State = 1 Then
  Do While Not(snpDataSID.EOF)
    objFile.Write "SID: " & snpDataSID("sid") & ",  SERIAL#: " & snpDataSID("serialn") & VBCrLf
    snpDataSID.MoveNext
  Loop
  snpDataSID.Close
  objFile.Write " " & VBCrLf
End If

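'Enable a level 1 (minimal detail) 10046 trace with an easily identified trace file name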
dbDatabase.Execute "ALTER SESSION SET TRACEFILE_IDENTIFIER = 'SessionCursorCache2'"
dbDatabase.Execute "ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT FOREVER, LEVEL 1'"

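'SQL statement to retrieve the session cursor cache statistics for the current session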
strSQL2 = "SELECT" & VBCrLf
strSQL2 = strSQL2 & "  SN.NAME STAT_NAME," & VBCrLf
strSQL2 = strSQL2 & "  M.VALUE" & VBCrLf
strSQL2 = strSQL2 & "FROM" & VBCrLf
strSQL2 = strSQL2 & "  V$MYSTAT M," & VBCrLf
strSQL2 = strSQL2 & "  V$STATNAME SN" & VBCrLf
strSQL2 = strSQL2 & "WHERE" & VBCrLf
strSQL2 = strSQL2 & "  SN.NAME IN ('session cursor cache hits','session cursor cache count')" & VBCrLf
strSQL2 = strSQL2 & "  AND SN.STATISTIC#=M.STATISTIC#" & VBCrLf
strSQL2 = strSQL2 & "ORDER BY" & VBCrLf
strSQL2 = strSQL2 & "  SN.NAME"

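'SQL statement to list the SQL text of the cursors that V$OPEN_CURSOR associates with the current session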
strSQL3 = "SELECT" & VBCrLf
strSQL3 = strSQL3 & "  SQL_TEXT" & VBCrLf
strSQL3 = strSQL3 & "FROM" & VBCrLf
strSQL3 = strSQL3 & "  V$OPEN_CURSOR" & VBCrLf
strSQL3 = strSQL3 & "WHERE" & VBCrLf
strSQL3 = strSQL3 & "  SID=" & VBCrLf
strSQL3 = strSQL3 & "    (SELECT" & VBCrLf
strSQL3 = strSQL3 & "       SID" & VBCrLf
strSQL3 = strSQL3 & "     FROM" & VBCrLf
strSQL3 = strSQL3 & "       V$MYSTAT" & VBCrLf
strSQL3 = strSQL3 & "     WHERE" & VBCrLf
strSQL3 = strSQL3 & "       ROWNUM=1)" & VBCrLf
strSQL3 = strSQL3 & "ORDER BY" & VBCrLf
strSQL3 = strSQL3 & "  SQL_TEXT"

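'Three passes: capture the statistics, open 12 cursors selecting from T4, capture the statistics again, fetch the rows, then close the cursors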
For intPass = 1 to 3
snpDataCursorCache.Open strSQL2, dbDatabase
If snpDataCursorCache.State = 1 Then
  objFile.Write "Before Pass #" & intPass & VBCrLf
  Do While Not(snpDataCursorCache.EOF)
    objFile.Write snpDataCursorCache("stat_name") & " " & snpDataCursorCache("value") & VBCrLf
    snpDataCursorCache.MoveNext
  Loop
  snpDataCursorCache.Close
  objFile.Write " " & VBCrLf
End If

snpDataOpenSQL.Open strSQL3, dbDatabase
lngCnt = 0
If snpDataOpenSQL.State = 1 Then
  Do While Not(snpDataOpenSQL.EOF)
    lngCnt = lngCnt + 1
    objFile.Write lngCnt & "  " & snpDataOpenSQL("sql_text") & VBCrLf
    snpDataOpenSQL.MoveNext
  Loop
  snpDataOpenSQL.Close
  objFile.Write " " & VBCrLf
End If

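'Open 12 cursors that all execute the same SQL statement, without closing any of the cursors between the opens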
strSQL = "SELECT * FROM T4"
snpData.Open strSQL, dbDatabase
snpData2.Open strSQL, dbDatabase
snpData3.Open strSQL, dbDatabase
snpData4.Open strSQL, dbDatabase
snpData5.Open strSQL, dbDatabase
snpData6.Open strSQL, dbDatabase
snpData7.Open strSQL, dbDatabase
snpData8.Open strSQL, dbDatabase
snpData9.Open strSQL, dbDatabase
snpData10.Open strSQL, dbDatabase
snpData11.Open strSQL, dbDatabase
snpData12.Open strSQL, dbDatabase

snpDataCursorCache.Open strSQL2, dbDatabase
If snpDataCursorCache.State = 1 Then
  objFile.Write "After Open Pass #" & intPass & VBCrLf
  Do While Not(snpDataCursorCache.EOF)
    objFile.Write snpDataCursorCache("stat_name") & " " & snpDataCursorCache("value") & VBCrLf
    snpDataCursorCache.MoveNext
  Loop
  snpDataCursorCache.Close
  objFile.Write " " & VBCrLf
End If

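'Fetch up to 1000 rows at a time from each of the 12 cursors in round-robin order until the first cursor reaches its end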
lngCnt = 0
If snpData.State = 1 Then
  Do While Not(snpData.EOF)
    Do While (lngCnt < 1000) and Not(snpData.EOF)
      lngCnt = lngCnt + 1
      snpData.MoveNext
    Loop
    lngCnt = 0

    Do While (lngCnt < 1000) and Not(snpData2.EOF)
      lngCnt = lngCnt + 1
      snpData2.MoveNext
    Loop
    lngCnt = 0

    Do While (lngCnt < 1000) and Not(snpData3.EOF)
      lngCnt = lngCnt + 1
      snpData3.MoveNext
    Loop
    lngCnt = 0

    Do While (lngCnt < 1000) and Not(snpData4.EOF)
      lngCnt = lngCnt + 1
      snpData4.MoveNext
    Loop
    lngCnt = 0

    Do While (lngCnt < 1000) and Not(snpData5.EOF)
      lngCnt = lngCnt + 1
      snpData5.MoveNext
    Loop
    lngCnt = 0

    Do While (lngCnt < 1000) and Not(snpData6.EOF)
      lngCnt = lngCnt + 1
      snpData6.MoveNext
    Loop
    lngCnt = 0

    Do While (lngCnt < 1000) and Not(snpData7.EOF)
      lngCnt = lngCnt + 1
      snpData7.MoveNext
    Loop
    lngCnt = 0

    Do While (lngCnt < 1000) and Not(snpData8.EOF)
      lngCnt = lngCnt + 1
      snpData8.MoveNext
    Loop
    lngCnt = 0

    Do While (lngCnt < 1000) and Not(snpData9.EOF)
      lngCnt = lngCnt + 1
      snpData9.MoveNext
    Loop
    lngCnt = 0

    Do While (lngCnt < 1000) and Not(snpData10.EOF)
      lngCnt = lngCnt + 1
      snpData10.MoveNext
    Loop
    lngCnt = 0

    Do While (lngCnt < 1000) and Not(snpData11.EOF)
      lngCnt = lngCnt + 1
      snpData11.MoveNext
    Loop
    lngCnt = 0

    Do While (lngCnt < 1000) and Not(snpData12.EOF)
      lngCnt = lngCnt + 1
      snpData12.MoveNext
    Loop
    lngCnt = 0
  Loop
End If

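'Explicitly close the 12 cursors - based on the earlier test results, this is the point at which the cursors become eligible for the session cursor cache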
snpData.Close
snpData2.Close
snpData3.Close
snpData4.Close
snpData5.Close
snpData6.Close
snpData7.Close
snpData8.Close
snpData9.Close
snpData10.Close
snpData11.Close
snpData12.Close

Next

dbDatabase.Execute "ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT OFF'"
objFile.Close

Set snpData = Nothing
Set snpData2 = Nothing
Set snpData3 = Nothing
Set snpData4 = Nothing
Set snpData5 = Nothing
Set snpData6 = Nothing
Set snpData7 = Nothing
Set snpData8 = Nothing
Set snpData9 = Nothing
Set snpData10 = Nothing
Set snpData11 = Nothing
Set snpData12 = Nothing
Set snpDataCursorCache = Nothing
Set snpDataOpenSQL = Nothing
Set snpDataSID = Nothing

dbDatabase.Close
Set dbDatabase = Nothing
Set objFile = Nothing
Set objFSO = Nothing

The following shows the output that I received on the first execution (the shared pool was not flushed prior to executing the script):

SID: 159,  SERIAL#: 5853

Before Pass #1
session cursor cache count 7
session cursor cache hits 1

1  SELECT    S.SID,    S.SERIAL# SERIALN  FROM    V$SESSION S,
2  SELECT    SN.NAME STAT_NAME,    M.VALUE  FROM    V$MYSTAT M,
3  SELECT    SQL_TEXT  FROM    V$OPEN_CURSOR  WHERE    SID=   
4  SELECT PARAMETER, VALUE FROM SYS.NLS_DATABASE_PARAMETERS WHE
5  insert into sys.aud$( sessionid,entryid,statement,ntimestamp
6  select /*+ connect_by_filtering */ privilege#,level from sys
7  select SYS_CONTEXT('USERENV', 'SERVER_HOST'), SYS_CONTEXT('U
8  select decode(failover_method, NULL, 0 , 'BASIC', 1, 'PRECON
9  select privilege# from sysauth$ where (grantee#=:1 or grante
10  select value$ from props$ where name = 'GLOBAL_DB_NAME'

After Open Pass #1
session cursor cache count 9
session cursor cache hits 2

Before Pass #2
session cursor cache count 21
session cursor cache hits 3

1  SELECT    S.SID,    S.SERIAL# SERIALN  FROM    V$SESSION S,
2  SELECT    SN.NAME STAT_NAME,    M.VALUE  FROM    V$MYSTAT M,
3  SELECT    SQL_TEXT  FROM    V$OPEN_CURSOR  WHERE    SID=   
4  SELECT * FROM T4
5  SELECT * FROM T4
6  SELECT * FROM T4
7  SELECT * FROM T4
8  SELECT * FROM T4
9  SELECT * FROM T4
10  SELECT * FROM T4
11  SELECT * FROM T4
12  SELECT * FROM T4
13  SELECT * FROM T4
14  SELECT * FROM T4
15  SELECT * FROM T4
16  SELECT PARAMETER, VALUE FROM SYS.NLS_DATABASE_PARAMETERS WHE
17  insert into sys.aud$( sessionid,entryid,statement,ntimestamp
18  select /*+ connect_by_filtering */ privilege#,level from sys
19  select SYS_CONTEXT('USERENV', 'SERVER_HOST'), SYS_CONTEXT('U
20  select decode(failover_method, NULL, 0 , 'BASIC', 1, 'PRECON
21  select privilege# from sysauth$ where (grantee#=:1 or grante
22  select value$ from props$ where name = 'GLOBAL_DB_NAME'

After Open Pass #2
session cursor cache count 21
session cursor cache hits 17

Before Pass #3
session cursor cache count 21
session cursor cache hits 18

1  SELECT    S.SID,    S.SERIAL# SERIALN  FROM    V$SESSION S,
2  SELECT    SN.NAME STAT_NAME,    M.VALUE  FROM    V$MYSTAT M,
3  SELECT    SQL_TEXT  FROM    V$OPEN_CURSOR  WHERE    SID=   
4  SELECT * FROM T4
5  SELECT * FROM T4
6  SELECT * FROM T4
7  SELECT * FROM T4
8  SELECT * FROM T4
9  SELECT * FROM T4
10  SELECT * FROM T4
11  SELECT * FROM T4
12  SELECT * FROM T4
13  SELECT * FROM T4
14  SELECT * FROM T4
15  SELECT * FROM T4
16  SELECT PARAMETER, VALUE FROM SYS.NLS_DATABASE_PARAMETERS WHE
17  insert into sys.aud$( sessionid,entryid,statement,ntimestamp
18  select /*+ connect_by_filtering */ privilege#,level from sys
19  select SYS_CONTEXT('USERENV', 'SERVER_HOST'), SYS_CONTEXT('U
20  select decode(failover_method, NULL, 0 , 'BASIC', 1, 'PRECON
21  select privilege# from sysauth$ where (grantee#=:1 or grante
22  select value$ from props$ where name = 'GLOBAL_DB_NAME'

After Open Pass #3
session cursor cache count 21
session cursor cache hits 32

Is that what you expected to see?

We still have one remaining question – is a larger shared pool required when increasing the SESSION_CACHED_CURSORS parameter?  (The parameter was left at its default value of 50 for this blog article.)  To help you decide, refer to the following:

AskTom Article

Blog article written by Jonathan Lewis, specifically this comment:

“Every individiual holding a cursor open has an entry in x$kgllk – which is in the SGA – and these entries seem to be 172 bytes long in 10g (152 in 9i). So, clearly, if you hold more cursors open, you will be using more memory for these structures.”
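If we accept the roughly 172 bytes per entry figure from that quote (a 10g-era number, so treat it as an assumption on other releases), a very rough estimate of the x$kgllk memory attributable to all sessions’ cursor caches might be sketched as follows:

SELECT
  SUM(SS.VALUE) CACHED_CURSORS,
  SUM(SS.VALUE)*172 APPROX_BYTES
FROM
  V$SESSTAT SS,
  V$STATNAME SN
WHERE
  SN.NAME='session cursor cache count'
  AND SN.STATISTIC#=SS.STATISTIC#;

Keep in mind that this only estimates the cached-cursor contribution – cursors that are held open, rather than cached, consume entries in x$kgllk as well.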
