Install the Oracle Client on a Synology DiskStation DS1813+ or DS412+ for Nagios to Monitor Databases

July 26, 2013

(Back to the Previous Post in the Series)

In the previous article of this series I provided steps to install and run the Nagios network monitoring utility on either a Synology DiskStation DS1813+ or a DS412+ for the purpose of pinging network devices to verify that the devices respond to ping requests.  That article may have seemed a bit out of place on a blog that is primarily intended to include notes about using Oracle Database.  In the previous article I hinted that there is a check_oracle plugin for Nagios, suggesting that a Synology DiskStation DS1813+ or DS412+ could be used to monitor Oracle databases.

This blog article does NOT use the check_oracle plugin.  There is a small handful of official Nagios plugins for Oracle Database, but this article does not use any of those plugins either.  Instead, I took a chance at crafting a custom Perl script to monitor Oracle databases, the first time I have ever used Perl.  I thought back to some of the earlier blog articles where I introduced a couple of Oracle Database monitoring scripts that were written in VBScript, including Working with Oracle’s Time Model Data 3 and Oracle Statistics Chart Viewer, and I started seeing the possibilities of using Nagios to alert DBAs about potential issues.  Yet I had never used Perl, and at the time did not even have a clue how to concatenate strings in the language (it turns out that there are more than two techniques).

Verify that Perl is installed on the DiskStation by accessing the Package Center in the DiskStation’s DSM web interface.  If there is an Install button under the Perl heading, that indicates that Perl is not yet installed – click the Install button and wait until that button changes to Installed.

Let’s start by installing the Oracle Client on the DiskStation.  I will use the Oracle Instant Client version 11.2.0.3 (the 12.1.0.1 Instant Client throws a different error message than the 11.2.0.3 and 11.1.0.7 clients did, so I reverted to the 11.2.0.3 version).  The Oracle Instant Client for Linux is available from the Instant Client downloads page on the Oracle Technology Network (OTN).  You will need an OTN account to download the client files – I downloaded the files using a desktop computer, and then transferred the files to a custom-created config share that I set up on the DiskStation.  This article requires the Basic Instant Client package and the SQL*Plus add-on package for the Instant Client.

Connect to the DiskStation using Telnet as the root user (see the previous article for Telnet connection directions).  For consistency, the .zip files for the Oracle Instant Client will be copied to the downloads directory that was created in the previous article.  We will create an oracle directory in the downloads directory:

mkdir /volume1/downloads/oracle/

Next, the downloaded files for the Oracle Instant Client are copied from the config share that I created on the DiskStation (to allow transporting files from a desktop computer) to the /volume1/downloads/oracle directory that was just created, and then the Oracle Instant Client files are unzipped, which automatically creates the instantclient_11_2 directory:

cp /volume1/config/instantclient-basic-linux-11.2.0.3.0.zip /volume1/downloads/oracle/
cp /volume1/config/instantclient-sqlplus-linux-11.2.0.3.0.zip /volume1/downloads/oracle/
cd /volume1/downloads/oracle/
unzip -o instantclient-basic-linux-11.2.0.3.0.zip
unzip -o instantclient-sqlplus-linux-11.2.0.3.0.zip
cd instantclient_11_2

The Oracle Instant Client 11.2.0.3 (and 11.1.0.7) requires a file named libaio.so.1 that is not installed by default on the DiskStation.  After searching the Internet for a while, I found a copy of that file in the /lib directory on a Red Hat Enterprise Linux 3 server.  Using a desktop computer, I copied that file to the DiskStation’s config share, along with a copy of the tnsnames.ora file that includes the connection information for the databases that must be monitored.  The libaio.so.1 file will be placed in the /volume1/downloads/oracle/instantclient_11_2 directory, and the tnsnames.ora file will be placed in the /etc directory (one of the directories that is automatically searched).

cp /volume1/config/libaio.so.1 .
cp /volume1/config/tnsnames.ora /etc

Let’s start up SQL*Plus:

./sqlplus /nolog
./sqlplus: error while loading shared libraries: libsqlplus.so: cannot open shared object file: No such file or directory

An error…

ls
BASIC_README    adrci   glogin.sql   libclntsh.so.11.1  libocci.so.11.1  libocijdbc11.so  libsqlplusic.so  ojdbc6.jar  sqltest.sql  xstreams.jar
SQLPLUS_README  genezi  libaio.so.1  libnnz11.so        libociei.so      libsqlplus.so    ojdbc5.jar       sqlplus     uidrvci

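The file mentioned in the error message exists in the correct directory.  The error appears because the LD_LIBRARY_PATH environment variable is not defined, so the dynamic loader does not search the Instant Client directory for shared libraries.  Fixing the error and trying again: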

LD_LIBRARY_PATH="/volume1/downloads/oracle/instantclient_11_2"
export LD_LIBRARY_PATH
./sqlplus /nolog

SQL*Plus: Release 11.2.0.3.0 Production on Fri Jul 26 13:22:16 2013

Copyright (c) 1982, 2011, Oracle.  All rights reserved.

SQL>

Try to connect to one of the databases that are defined in the tnsnames.ora file, and then try executing a simple SQL statement before exiting SQL*Plus (change testuser, password, and DBName as appropriate for your environment):

CONNECT testuser/password@DBName
Connected.
SELECT SYSDATE FROM DUAL;

SYSDATE
---------
26-JUL-13

EXIT
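
The LD_LIBRARY_PATH export shown earlier lasts only for the current Telnet session.  As a minimal sketch (not something the Instant Client provides), a small shell function could set the variable for a single SQL*Plus invocation and first check that the library named in the error message is present.  The names run_sqlplus and ORACLE_CLIENT_DIR are hypothetical; the directory is the one used in this article.

```shell
# Directory that holds the unzipped Instant Client (the path used in this article).
ORACLE_CLIENT_DIR="${ORACLE_CLIENT_DIR:-/volume1/downloads/oracle/instantclient_11_2}"

# run_sqlplus is a hypothetical helper name, not part of the Instant Client.
run_sqlplus() {
  # Fail early with a clear message if the library is not where we expect it.
  if [ ! -f "$ORACLE_CLIENT_DIR/libsqlplus.so" ]; then
    echo "libsqlplus.so not found in $ORACLE_CLIENT_DIR" >&2
    return 1
  fi
  # A variable assignment placed in front of a command applies to that one process only.
  LD_LIBRARY_PATH="$ORACLE_CLIENT_DIR" "$ORACLE_CLIENT_DIR/sqlplus" "$@"
}
```

With the function defined, run_sqlplus /nolog should behave like the ./sqlplus /nolog command above, without leaving LD_LIBRARY_PATH exported in the session.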

Now that we have verified that the Instant Client version of SQL*Plus works from the DiskStation, we will create a simple SQL script to verify that the Instant Client version of SQL*Plus is able to accept script names from a command line.  The script will be created in the same directory where the Oracle Instant Client is located (see the basic directions for using vi, as found in the previous article):

vi /volume1/downloads/oracle/instantclient_11_2/sqltest.sql

The first line in the script instructs SQL*Plus to abort the execution of the script when an error is encountered; if the connection attempt fails, there is no point in attempting to execute any SQL statements that follow.  The second line connects to the database (change testuser, password, and DBName as appropriate for your environment).  The third line executes a simple SQL statement, selecting from a table (rather than the virtual table DUAL) – change the SQL statement to a valid statement for your database.  The last statement exits SQL*Plus once the SQL statement finishes executing:

WHENEVER SQLERROR EXIT SQL.SQLCODE
CONNECT testuser/password@DBName
SELECT ID, DESCRIPTION FROM T1 WHERE ID LIKE '8X%';
EXIT
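
Because the script contains a database password, it may be worth creating the file with restrictive permissions from the start.  The following is a sketch only: write_check_script is a hypothetical helper, and the SELECT statement is simply the one shown above.  Note the assumption that the account that runs Nagios (not named in this article) owns the file or can otherwise read it; if it cannot, the ownership must be adjusted or the check will fail.

```shell
# write_check_script is a hypothetical helper.
# Arguments: target file, database user, password, tnsnames.ora alias.
write_check_script() {
  target="$1"; user="$2"; pass="$3"; tns="$4"
  (
    # Files created inside this subshell are readable and writable by the owner only.
    umask 077
    cat > "$target" <<EOF
WHENEVER SQLERROR EXIT SQL.SQLCODE
CONNECT $user/$pass@$tns
SELECT ID, DESCRIPTION FROM T1 WHERE ID LIKE '8X%';
EXIT
EOF
  ) && chmod 600 "$target"   # also tighten a file that already existed
}
```

For example, write_check_script /volume1/downloads/oracle/instantclient_11_2/sqltest.sql testuser password DBName would produce the same four lines as the script above.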

Try using SQL*Plus to execute the script:

./sqlplus /nolog @/volume1/downloads/oracle/instantclient_11_2/sqltest.sql

The results from the SQL statement should display in the Telnet window, and then the normal DiskStation prompt should appear.  If the database instance was not running at the time of the execution, you would see something like this rather than the results of the SQL statement:

ERROR:
ORA-01034: ORACLE not available
ORA-27101: shared memory realm does not exist
Process ID: 0
Session ID: 0 Serial number: 0

Nagios apparently expects its plugins to return one of four exit codes, as explained in this article, to determine whether or not the checked object is behaving as expected:

Exit Code Status
        0 OK
        1 WARNING
        2 CRITICAL
        3 UNKNOWN
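
To make the table concrete, here is a small sketch that translates a plugin's exit code into the status name that Nagios assigns.  The function name nagios_status_name is hypothetical, and the catch-all label is borrowed from the wording Nagios uses for codes outside 0 through 3.

```shell
# nagios_status_name is a hypothetical helper that maps a plugin exit code to its status.
nagios_status_name() {
  case "$1" in
    0) echo "OK" ;;
    1) echo "WARNING" ;;
    2) echo "CRITICAL" ;;
    3) echo "UNKNOWN" ;;
    *) echo "out of bounds" ;;   # anything else is not a valid plugin result
  esac
}
```

Running any plugin by hand and then calling nagios_status_name $? shows how Nagios would classify that run.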

The exit function in Perl permits returning a number, so that is one problem solved.  I want the Perl plugin to accept a command line parameter from Nagios so that I am able to check more than one database using the same Perl script – the Nagios command line parameter will specify the script to execute.  This article explains how to receive a command line parameter from Nagios, so that is a second problem that is solved.

Let’s put together a simple (OK, not simple for me) script that executes a SQL*Plus script that is specified by Nagios (the sqltest.sql file that was just created), and then returns 0 if no ORA- type error message is returned when the Oracle Instant Client’s SQL*Plus executes a script.  If an ORA- type error message is returned, then the script’s exit code is set to 2 and the ORA- errors are returned to Nagios.  Nagios apparently only accepts one line of output from the script, reading what is written by Perl’s print command.  As we saw earlier, the script may return multiple nested ORA- type errors if the database instance is down, so we need these multiple error messages to appear on one line of output from the Perl script.
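
Before looking at the Perl version, the filtering logic just described can be sketched in a few lines of shell.  This is an illustration of the idea rather than the plugin itself: summarize_ora_errors is a hypothetical function that reads SQL*Plus output on standard input, joins every line that begins with an ORA- error code onto one line, and returns 2 (CRITICAL) if any were found, otherwise 0 (OK).

```shell
# summarize_ora_errors is a hypothetical helper that mirrors the plugin's logic.
summarize_ora_errors() {
  summary=""
  while IFS= read -r line; do
    case "$line" in
      # Keep only lines that start with ORA- followed by five digits.
      ORA-[0-9][0-9][0-9][0-9][0-9]*) summary="$summary$line ; " ;;
    esac
  done
  # Print everything on one line, without a trailing newline.
  printf '%s' "$summary"
  [ -z "$summary" ] && return 0
  return 2
}
```

Piping the output of the earlier ./sqlplus /nolog @/volume1/downloads/oracle/instantclient_11_2/sqltest.sql command into summarize_ora_errors, and then checking $?, gives a quick feel for what the plugin needs to report.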

vi /opt/libexec/check_oracle.pl

This is the script that I crafted after spending, quite literally, days searching the Internet for Perl command syntax that would execute on the DiskStation:

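#!/usr/bin/perl
# Nagios plugin: run a SQL*Plus script and report any ORA- errors on one line.
use strict;
use warnings;
use Getopt::Long qw(:config no_ignore_case);

my $client_dir = "/volume1/downloads/oracle/instantclient_11_2";
my $output = "";
my $ExitValue = 0;   # 0 = OK, 2 = CRITICAL, 3 = UNKNOWN
my ($script, $in);

GetOptions("s|script=s" => \$script);

# Without a script name there is nothing to test, so report UNKNOWN.
if (!defined($script))
  {
  print "No SQL script specified (use -s scriptname.sql)";
  exit(3);
  }

# SQL*Plus cannot find libsqlplus.so unless LD_LIBRARY_PATH is set.
$ENV{"LD_LIBRARY_PATH"} = $client_dir;

print "Test Script: " . $script . ":  ";

# The list form of open starts SQL*Plus without a shell; \@ keeps Perl from treating @ as an array.
if (!open($in, "-|", "$client_dir/sqlplus", "/nolog", "\@$client_dir/$script"))
  {
  print "Unable to start SQL*Plus: $!";
  exit(3);
  }

while (my $line = <$in>)
  {
  # Collect every line that begins with an ORA- error code.
  if ($line =~ /^(ORA-\d{5})/)
    {
    #$line =~ s/\v//g;
    chomp($line);
    $output = $output . $line . " ; ";
    $ExitValue = 2;
    }
  }

close($in);
print $output;
exit($ExitValue);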

The script sets the LD_LIBRARY_PATH environment variable itself, so the Nagios user should (hopefully) have no problems executing the Perl script.  There are apparently a half-dozen different ways to remove the end of line characters from the SQL*Plus output; chomp($line); worked for my test, but $line =~ s/\v//g; did not work quite as expected.

Save the script and exit vi.  Next, we need to make the Perl script executable:

chmod +x /opt/libexec/check_oracle.pl
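
Before involving Nagios, it can help to run the plugin by hand and look at the two things Nagios pays attention to: the line of output and the exit code.  The sketch below assumes nothing beyond a POSIX shell; run_plugin is a hypothetical helper name.

```shell
# run_plugin is a hypothetical helper: run a plugin command line and
# show the first line of its output together with its exit code.
run_plugin() {
  if output=$("$@"); then
    code=0
  else
    code=$?
  fi
  echo "output: $(printf '%s\n' "$output" | head -n 1)"
  echo "exit code: $code"
}
```

For example, run_plugin /opt/libexec/check_oracle.pl -s sqltest.sql should report exit code 0 while the database is reachable, and exit code 2 along with the ORA- messages when it is not.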

In the earlier article we had to modify a couple of the Nagios configuration files – this time we will need to modify some of the same files, starting with the commands.cfg file.

vi /opt/etc/objects/commands.cfg

Locate the ‘check_ping’ command definition that we modified in the earlier article.  Below the } character for that command definition, add the following, which will tell Nagios that our Perl script exists, and that the script command should be recognized by Nagios as check_oracle_ch:

# check_oracle_ch command definition by Charles Hooper
define command{
        command_name    check_oracle_ch
        command_line    $USER1$/check_oracle.pl -s $ARG1$
        }
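
To illustrate how the command definition turns into an actual command line, the sketch below mimics the macro substitution for the simple one-argument case.  It is not how Nagios is implemented; expand_check_command is a hypothetical helper, and the value /opt/libexec for $USER1$ in the usage example is an assumption based on where the Perl script was saved.

```shell
# expand_check_command is a hypothetical helper that imitates macro expansion.
# Arguments: command_line template, value of $USER1$, check_command value.
expand_check_command() {
  template="$1"; user1="$2"; check="$3"
  rest="${check#*!}"     # drop the command name and the first !
  arg1="${rest%%!*}"     # keep only the first argument
  # Note: this simple sed substitution breaks if a value contains | or &.
  printf '%s\n' "$template" |
    sed -e 's|\$USER1\$|'"$user1"'|' -e 's|\$ARG1\$|'"$arg1"'|'
}
```

Calling expand_check_command '$USER1$/check_oracle.pl -s $ARG1$' /opt/libexec 'check_oracle_ch!sqltest.sql' prints /opt/libexec/check_oracle.pl -s sqltest.sql, which is the command that can also be run by hand for testing.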

Save the commands.cfg file and exit vi.  Let’s create another Nagios configuration file that will be used to list the Oracle Databases (and their scripts) that will be checked:

vi /opt/etc/objects/oracle.cfg

On a new line in the oracle.cfg file, add the following:

define service{
        use                     generic-service ; Inherit values from a template
        host_name               server123       ; This is a server name that is defined in the server.cfg file
        service_description     CHECK_DB_TEST   ; A unique name given to the server, database, and script combination
        check_command           check_oracle_ch!sqltest.sql    ; The command definition that was added to the commands.cfg file followed by ! and the name of the SQL script to execute
        normal_check_interval   5               ; Execute the script every 5 minutes under normal conditions
        retry_check_interval    1               ; Re-execute the script after receiving a return code of 2, every minute until its final/hard state is determined
        }

Save the oracle.cfg file and exit vi.  We also need to tell Nagios that there is now an additional configuration file to read when starting, so open the nagios.cfg file:

vi /opt/etc/nagios.cfg

Below the cfg_file=/opt/etc/objects/switch.cfg line in the file (added in the previous article), add the following line:

cfg_file=/opt/etc/objects/oracle.cfg

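Save the nagios.cfg file and exit vi.  Since we have modified the Nagios configuration, we must restart Nagios.  Find the first Nagios process ID in the output of the ps command, then kill that process (as was described in the previous article); 2683 was the process ID on my DiskStation, so substitute the value that ps reports: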

ps

kill 2683

Verify that Nagios does not return an error message when reading the configuration files:

/opt/bin/nagios -v /opt/etc/nagios.cfg

If no errors are reported, then start Nagios:

/opt/bin/nagios -d /opt/etc/nagios.cfg

Nagios should be writing log entries to a file named nagios.log (as defined in the nagios.cfg file).  The tail command permits viewing the last few (100 in this case) lines from a specified file:

tail -n 100 /opt/var/nagios.log
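
Once several services are defined, the log can get busy.  As a rough sketch, the function below narrows the log to a single service.  It assumes that the service description appears between semicolons in the log lines (as in Nagios SERVICE ALERT entries); service_log_lines is a hypothetical helper name.

```shell
# service_log_lines is a hypothetical helper.
# Arguments: log file, service_description, number of lines to show (default 10).
service_log_lines() {
  # -F treats the pattern as a fixed string; the surrounding semicolons
  # keep CHECK_DB_TEST from also matching CHECK_DB_TEST42.
  grep -F ";$2;" "$1" | tail -n "${3:-10}"
}
```

For example, service_log_lines /opt/var/nagios.log CHECK_DB_TEST 20 would show at most the 20 most recent matching lines.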

If Perl is not installed, Nagios may send an email with the following in the Additional Info section:

(Return code of 127 is out of bounds - plugin may be missing)

If the database instance is down, Nagios should send an email, possibly with the following in the Additional Info section (note that the script name is listed first – a different script will be used to test each database):

Test Script: sqltest.sql:  ORA-01034: ORACLE not available ; ORA-27101: shared memory realm does not exist ;

When the database instance is running, and the SQL statement(s) in the script executes successfully for the first time, Nagios should send an email with just the test script listed in the Additional Info section:

Test Script: sqltest.sql:

Now that we have confirmed that Nagios and the Perl script work OK together, create additional SQL scripts in the /volume1/downloads/oracle/instantclient_11_2/ directory to connect to the other databases to be monitored (strongly consider changing the permissions for these files, since the files include usernames and passwords for your databases).  For each of those SQL scripts, create an additional service entry in the oracle.cfg file.  For example, if a script named sqltest42.sql is created to test a database on server42, the service entry might be created as shown below:

define service{
        use                     generic-service ; Inherit values from a template
        host_name               server42        ; This is a server name that is defined in the server.cfg file
        service_description     CHECK_DB_TEST42 ; A unique name given to the server, database, and script combination
        check_command           check_oracle_ch!sqltest42.sql  ; The command definition that was added to the commands.cfg file followed by ! and the name of the SQL script to execute
        normal_check_interval   5               ; Execute the script every 5 minutes under normal conditions
        retry_check_interval    1               ; Re-execute the script after receiving a return code of 2, every minute until its final/hard state is determined
        }
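
With more than a few databases, typing these entries by hand becomes repetitive.  The following sketch prints a service entry from three values; service_entry is a hypothetical helper, and the template name and check intervals are simply copied from the entries above.

```shell
# service_entry is a hypothetical helper.
# Arguments: host_name, service_description, SQL script name.
service_entry() {
  cat <<EOF
define service{
        use                     generic-service
        host_name               $1
        service_description     $2
        check_command           check_oracle_ch!$3
        normal_check_interval   5
        retry_check_interval    1
        }
EOF
}
```

For example, service_entry server42 CHECK_DB_TEST42 sqltest42.sql >> /opt/etc/objects/oracle.cfg would append an entry equivalent to the one above, without the comments.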

Don’t forget to verify the Nagios configuration before restarting Nagios.

Anyone up to the challenge of rewriting my Working with Oracle’s Time Model Data 3 script in Perl so that the script will work on the DiskStation?







