On the Topic of Technology… 6

March 16, 2014

(Back to the Previous Post in the Series)

It has been a while since my last post on this blog – the simple answer is that I was busy with a lot of non-Oracle Database related items, and was suffering from a bit of writer’s block (nothing that a block dump can’t fix?).  I expect to soon receive the annual bill from WordPress for keeping this blog free of advertisements, as well as a bill for allowing the customized blog theme.

So, given the number of months since my last blog post, I took the time to update the list of the top five most viewed articles for the past quarter.  The number one article shows how to install the Nagios network monitoring software on a Synology NAS (actually three different Synology NAS units), which means that a low cost NAS unit could be used to verify not only that a server used with Oracle Database responds to a ping request, but also that an Oracle database is reachable and healthy enough to provide a resultset for a simple SQL statement.  The number two article shows how to do a little mathematics with the help of Oracle Database, approximating the distance between two longitude and latitude coordinates.  The number three article shows how to use a programming language that was last updated in the late 1990s with the latest Microsoft operating system and what was the latest version of Oracle Database.
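
For the curious, below is a minimal sketch of that sort of calculation – the haversine formula with two arbitrary sample coordinate pairs, and not necessarily the exact method used in the number two article:

SELECT
  2 * 3958.8 * ASIN(SQRT(
    POWER(SIN((LAT2 - LAT1) / 2 * ACOS(-1) / 180), 2)
    + COS(LAT1 * ACOS(-1) / 180) * COS(LAT2 * ACOS(-1) / 180)
      * POWER(SIN((LON2 - LON1) / 2 * ACOS(-1) / 180), 2))) DISTANCE_MILES
FROM
  (SELECT 42.9634 LAT1, -85.6681 LON1, 43.0125 LAT2, -83.6875 LON2 FROM DUAL);

(3958.8 is an approximation of the Earth’s radius in miles, and ACOS(-1) conveniently returns the value of pi.)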

The advancement of technology certainly means that it is important for IT professionals to try to stay on top of their technology niche, without completely cutting ties with the technology of the past, as illustrated by the current number three article on this blog.  For me, that means buying and then reading cover to cover various books, reading articles, and experimenting with technology.  It helps that I am an IT manager in addition to being an Oracle DBA, so my technology niche is rather broad.  In December 2013 I placed an order for the updated version of “Troubleshooting Oracle Performance”, in part because I enjoyed the first version of that book so much that I read it twice, and also because I have not had sufficient time to experiment with Oracle Database 12c – it appears that the second edition might ship next month.  Someone recently left a comment about another book that I reviewed here and on Amazon – I tried ordering that book twice without success, and now there is apparently a new version of the book on Amazon that includes coverage of Oracle Database 12c, and the book is in stock!  Someone will have to spend the $56, write a review, and let me know if the author fixed the items that I and readers of this blog so patiently and clearly mentioned in 2010.  Anyone interested in the challenge?

As I mentioned, the scope of my job responsibilities extends far beyond that of Oracle Database.  I just recently migrated the company’s email system from Microsoft Exchange 2007 to Microsoft Exchange 2013 SP1.  Anyone who remembers the fun of typing cryptic code on a command line would enjoy this experience.  Simply moving the public folders from the old server to the new server was an excellent example of command line fun, reminding me of the fun that I had years ago trying to compile X.509 certificate support into a Linux kernel.  One book that I read and reviewed covered the topic of public folders in extensive detail, yet the commands found in the book returned an error message at step 1.  The other book that I read and reviewed more or less skimmed the topic of public folders, so it was of no help for the task at hand.  No problem, I will just go to the source, Microsoft, for the solution.  A recent article on Microsoft’s site clearly listed all of the steps required to move the public folders from Exchange Server 2007 to Exchange Server 2013… all except for one very important step.  So, I am running command after command on the servers trying to move the public folders from the one server to the next, having only a partial idea of what these commands are doing.  Everything is going great, until I execute the last command listed here:

Get-PublicFolder -Recurse | Export-CliXML C:\PFMigration\Legacy_PFStructure.xml
Get-PublicFolderStatistics | Export-CliXML C:\PFMigration\Legacy_PFStatistics.xml
Get-PublicFolder -Recurse | Get-PublicFolderClientPermission | Select-Object Identity,User -ExpandProperty AccessRights | Export-CliXML C:\PFMigration\Legacy_PFPerms.xml
Get-PublicFolderDatabase | ForEach {Get-PublicFolderStatistics -Server $_.Server | Where {$_.Name -like "*\*"}}
Set-PublicFolder -Identity <public folder identity> -Name <new public folder name>
Get-OrganizationConfig | Format-List PublicFoldersLockedforMigration, PublicFolderMigrationComplete
Set-OrganizationConfig -PublicFoldersLockedforMigration:$false -PublicFolderMigrationComplete:$false
Get-PublicFolderMigrationRequest | Remove-PublicFolderMigrationRequest -Confirm:$false
Get-Mailbox -PublicFolder 
Get-PublicFolder
Get-Mailbox -PublicFolder | Where{$_.IsRootPublicFolderMailbox -eq $false} | Remove-Mailbox -PublicFolder -Force -Confirm:$false
Get-Mailbox -PublicFolder | Remove-Mailbox -PublicFolder -Force -Confirm:$false
.\Export-PublicFolderStatistics.ps1 <Folder to size map path> <FQDN of source server>
...

Spot the error?  Why is this server telling me that I need to provide a comma separated list of parameters when I execute the Export-PublicFolderStatistics.ps1 script?  So, I submit the script again with commas separating the parameters – no, the same error is returned.  Must be a problem where I need to specify the parameters in double quotes also – no, the same error is returned.  What the four letter word?  That is right, the return of trying to compile X.509 certificate support into the Linux kernel roughly a decade ago, only now on Microsoft’s premium messaging platform.

So, what is the missing step?  Exchange Server 2007 ships with Microsoft PowerShell 1.0 – this command requires Microsoft PowerShell 2.0 to execute, yet that requirement was never mentioned.  Oh yeah, we forgot a step, get over it – you have another set of 10 cryptic commands to enter – only to be greeted with a failure message during the public folder migration, stating that the migration failed because some folder name that once existed on Microsoft Exchange 5.5 contains a character that is now considered invalid.  These problems never happen with an upgrade in the Oracle Database world, do they?  Advancement of technology, or Back to the Command Line?

I have also spent a bit of time experimenting with IP security cameras.  I put one in my vehicle and went for a drive.  Ah, 1969 – someone obviously has not finished compiling the timekeeping feature into the camera’s firmware? (Click the picture for a larger view.)

[Image: NC-239WF-LicensePlateTest-Mod]


Let’s try a different stop light – these two cars are either turning the wrong direction (obviously an indication of a bug in the camera’s firmware), or are running a red light. (Click the picture for a larger view.)

[Image: NC-239WF-RedLightTest]

The camera did not pick up much of interest in the vehicle, so I set it up just in time to catch a game of what appears to be football… or maybe it was a game of sock-her? (Click the picture for a larger view.)

[Image: NC-239WF-DeerFightTest]

Technology is fun, except when it hits you in the nose.





Oracle Enqueue Rules Leading to Deadlock Changes Again in 12.1?

August 1, 2013

Four years ago, while co-writing two chapters for an Oracle Database book, I put together an interesting test case to demonstrate how enqueues are handled when multiple sessions are competing for some of the same database resources – the test case was intended to show the neat, orderly, and predictable process implemented by Oracle Database.  Oracle Database 10.2.0.1 through 10.2.0.4 behaved essentially the same when executing the test case script, while Oracle Database 11.1.0.6, 11.1.0.7, and 11.2.0.1 behaved differently when executing the test case script (11.2.0.2 and 11.2.0.3 behave just like 11.2.0.1 when executing the script).

Oracle Database 11.1.0.x and 11.2.0.x terminated the test case early with a deadlock error.  Jonathan Lewis offered an exceptional explanation of what happened with my test case script in his Lock Horror article.  He also mentioned a couple of interesting notes in comments attached to my article, one of which stated that Oracle Database 9.2.0.5 (or 9.2.0.4) had also introduced a change in enqueue rules related to foreign keys.  Richard Foote put together a related article that explained the changes in locking mode with foreign keys in his Oracle11g: New Locking Modes When Policing FK Constraints article.  Trying not to forget related articles (see the note at the end of this blog article), Tom Kyte authored a blog article titled Something I recently unlearned… that is also related to the test case script that appeared in the Expert Oracle Practices book.  So, the rules have changed, and deadlocks should now be expected when executing the test case script.

I just started experimenting with Oracle Database 12.1.0.1.  My very first experiment with the new database version was to test the behavior of the enqueue script to verify that it results in a deadlock on Oracle Database 12.1.0.1.  Consistency is one of the key factors in a database application – if an application executes:

SELECT 1 + 1 FROM DUAL;

a developer should be reasonably certain that Oracle Database will either return the value 2 or the binary number 10 when the above SQL statement is executed, regardless of whether Oracle Database 6.x or Oracle Database 12.x is used by an application written by the developer.  The same developer should have a reasonable expectation that his application, if it is compatible with Oracle Database 9.0.1, should work pretty much the same way on Oracle Database 12.1.0.1.  But, what happens when Oracle Corp. fixes bugs in Oracle Database?

Consistency is important, but it is also important to recognize that behavior may change consistently.

Let’s reintroduce the test case script from my earlier article (which also appeared in the Expert Oracle Practices book).  Executing this script requires four sessions, each of which is logged in as a different user (to minimize the difficulty associated with deciphering the results of the SQL statements):

  • Session 1 connected as TESTUSER (the owner of the tables)
  • Session 2 connected as USER2
  • Session 3 connected as USER3
  • Session 4 connected as SYS (only to query the various performance views)

First, we need a couple of test tables with a declared foreign key relationship, without an index on the foreign key column.  In Session 1:

CREATE TABLE T1(
  C1 NUMBER(10) PRIMARY KEY);

INSERT INTO
  T1
SELECT
  ROWNUM
FROM
  DUAL
CONNECT BY
  LEVEL<=100000;

COMMIT;

CREATE TABLE T2(
  C1 NUMBER(10) PRIMARY KEY,
  C2 NUMBER(10),
  CONSTRAINT FK_T1_C1 FOREIGN KEY(C2) REFERENCES T1(C1) ENABLE);

INSERT INTO
  T2
SELECT
  ROWNUM,
  ROWNUM
FROM
  DUAL
CONNECT BY
  LEVEL<=100000;

COMMIT;

GRANT ALL ON T1 TO PUBLIC;
GRANT ALL ON T2 TO PUBLIC;

CREATE PUBLIC SYNONYM T1 FOR T1;
CREATE PUBLIC SYNONYM T2 FOR T2;
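
As an aside, the foreign key column C2 is intentionally left unindexed here – creating an index on that column (such as the hypothetical index below) would change the TM enqueue behavior that this test case is designed to demonstrate, so do not create the index if you intend to follow along:

CREATE INDEX IND_T2_C2 ON T2(C2);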

In Session 1, insert a row into the parent and child tables without issuing a commit:

INSERT INTO T1 VALUES(100010);
INSERT INTO T2 VALUES(100010,100010);

In Session 2, insert a row into the parent and child tables, delete a row in the child table so that the primary key column value of the associated row in the parent table may be modified, and modify the parent row’s primary key column value:

INSERT INTO T1 VALUES(100020);
INSERT INTO T2 VALUES(100020,100020);
DELETE FROM T2 WHERE C1=50;
UPDATE T1 SET C1=100030 WHERE C1=50;

(Session 2 is hung)

Session 3 cannot see that a row was inserted by the now hung session 2, so session 3 attempts to insert a row with the same primary key value.  In Session 3:

INSERT INTO T1 VALUES(100020);

(Session 3 is hung)

The enqueues at this point on Oracle Database 10.2.0.4 differ from those found on Oracle Database 11.1.0.6, and (for consistency) those enqueues differ from those found on Oracle Database 12.1.0.1.

In Session 4, let’s take a look at the enqueues (note that the join to the V$SESSION_WAIT performance view is unnecessary starting with Oracle Database 10.1.0.x; however, I left that join in place to allow easy adaptation of the SQL statement so that it may work with older Oracle Database versions).

SET LINESIZE 165
SET PAGESIZE 1000
SET TRIMSPOOL ON
COLUMN SID FORMAT 9999
COLUMN PROGRAM FORMAT A11
COLUMN USERNAME FORMAT A8
COLUMN LMODE FORMAT 99
COLUMN REQUEST FORMAT 99
COLUMN CN FORMAT 99
COLUMN EVENT FORMAT A30
COLUMN STATE FORMAT A8
COLUMN S_I_W FORMAT 9999
COLUMN WT FORMAT 9999
COLUMN OBJ# FORMAT 999999
COLUMN FILE# FORMAT 999
COLUMN BLOCK# FORMAT 9999999
COLUMN P2 FORMAT 9999999
COLUMN P3 FORMAT 999999

SELECT
  S.SID,
  S.USERNAME,
  S.PROGRAM,
  S.STATUS,
  SW.EVENT,
  SW.WAIT_TIME WT,
  SW.STATE,
  SW.SECONDS_IN_WAIT S_I_W,
  S.SQL_ID,
  S.SQL_CHILD_NUMBER CN,
  S.ROW_WAIT_OBJ# OBJ#,
  S.ROW_WAIT_FILE# FILE#,
  S.ROW_WAIT_BLOCK# BLOCK#,
  S.ROW_WAIT_ROW# ROW#,
  SW.P1,
  SW.P2,
  SW.P3
FROM
  V$SESSION_WAIT SW,
  V$SESSION S
WHERE
  S.USERNAME IS NOT NULL
  AND SW.SID=S.SID
  AND SW.EVENT NOT LIKE '%SQL*Net%'
  AND SW.EVENT NOT IN ('Streams AQ: waiting for messages in the queue', 'wait for unread message on broadcast channel');

  SID USERNAME PROGRAM     STATUS   EVENT                             WT STATE    S_I_W SQL_ID         CN    OBJ# FILE#   BLOCK# ROW#         P1       P2      P3
----- -------- ----------- -------- ------------------------------ ----- -------- ----- ------------- --- ------- ----- -------- ---- ---------- -------- -------
  184 USER2    sqlplus.exe ACTIVE   enq: TM - contention               0 WAITING     89 4rtg0hv0atfkx   0      -1     0        0    0 1414332421    91943       0
  243 USER3    sqlplus.exe ACTIVE   enq: TX - row lock contention      0 WAITING     49 cv338j6z2530g   0      -1     0        0    0 1415053316    65548    1531

For comparison, here is the output of the same SQL statement on 11.1.0.7 from the earlier blog article:

  SID USERNAME PROGRAM     STATUS   EVENT                             WT STATE    S_I_W SQL_ID         CN    OBJ# FILE#   BLOCK# ROW#         P1       P2      P3
----- -------- ----------- -------- ------------------------------ ----- -------- ----- ------------- --- ------- ----- -------- ---- ---------- -------- -------
  307 USER2    sqlplus.exe ACTIVE   enq: TM - contention               0 WAITING    422 4rtg0hv0atfkx   0      -1     0        0    0 1414332421    82913       0
  314 USER3    sqlplus.exe ACTIVE   enq: TM - contention               0 WAITING    407 cv338j6z2530g   0      -1     0        0    0 1414332419    82913       0

For comparison, here is the output of the same SQL statement on 10.2.0.4 from the earlier blog article:

  SID USERNAME PROGRAM     STATUS   EVENT                             WT STATE    S_I_W SQL_ID         CN    OBJ# FILE#   BLOCK# ROW#         P1       P2      P3
----- -------- ----------- -------- ------------------------------ ----- -------- ----- ------------- --- ------- ----- -------- ---- ---------- -------- -------
  204 USER2    sqlplus.exe ACTIVE enq: TM - contention                 0 WAITING    213 4rtg0hv0atfkx   0      -1     0        0    0 1414332421    16472       0
  217 USER3    sqlplus.exe ACTIVE enq: TM - contention                 0 WAITING    201 cv338j6z2530g   0      -1     0        0    0 1414332418    16472       0

… USER3 on 12.1.0.1 seems not to be waiting on a table level (TM) enqueue.

In Session 4, let’s try the second SQL statement to examine the lock modes:

SELECT
  S.SID,
  S.USERNAME,
  S.PROGRAM,
  S.SQL_ID,
  S.SQL_CHILD_NUMBER CN,
  S.ROW_WAIT_OBJ#,
  S.ROW_WAIT_FILE#,
  S.ROW_WAIT_BLOCK#,
  S.ROW_WAIT_ROW#,
  L.LMODE,
  L.REQUEST,
  L.ID1,
  L.ID2,
  L.TYPE,
  L.BLOCK
FROM
  V$LOCK L,
  V$SESSION S
WHERE
  (L.ID1, L.ID2, L.TYPE) IN
    (SELECT
      ID1,
      ID2,
      TYPE
    FROM
      V$LOCK
    WHERE
      REQUEST > 0)
  AND L.SID=S.SID;

  SID USERNAME PROGRAM     SQL_ID         CN ROW_WAIT_OBJ# ROW_WAIT_FILE# ROW_WAIT_BLOCK# ROW_WAIT_ROW# LMODE REQUEST        ID1        ID2 TY    BLOCK
----- -------- ----------- ------------- --- ------------- -------------- --------------- ------------- ----- ------- ---------- ---------- -- --------
  127 TESTUSER sqlplus.exe                              -1              0               0             0     3       0      91943          0 TM        1
  184 USER2    sqlplus.exe 4rtg0hv0atfkx   0            -1              0               0             0     3       5      91943          0 TM        1
  184 USER2    sqlplus.exe 4rtg0hv0atfkx   0            -1              0               0             0     6       0      65548       1531 TX        1
  243 USER3    sqlplus.exe cv338j6z2530g   0            -1              0               0             0     2       0      91943          0 TM        0
  243 USER3    sqlplus.exe cv338j6z2530g   0            -1              0               0             0     0       4      65548       1531 TX        0

For comparison, here is the output of the same SQL statement on 11.1.0.7 from the earlier blog article:

  SID USERNAME PROGRAM     SQL_ID         CN ROW_WAIT_OBJ# ROW_WAIT_FILE# ROW_WAIT_BLOCK# ROW_WAIT_ROW# LMODE REQUEST        ID1        ID2 TY    BLOCK
----- -------- ----------- ------------- --- ------------- -------------- --------------- ------------- ----- ------- ---------- ---------- -- --------
  320 TESTUSER sqlplus.exe 0vbusv12hnbk6   0         12517              1           29656             0     3       0      82913          0 TM        1
  307 USER2    sqlplus.exe 4rtg0hv0atfkx   0            -1              0               0             0     3       5      82913          0 TM        1
  314 USER3    sqlplus.exe cv338j6z2530g   0            -1              0               0             0     0       3      82913          0 TM        0

For comparison, here is the output of the same SQL statement on 10.2.0.4 from the earlier blog article:

  SID USERNAME PROGRAM     SQL_ID         CN ROW_WAIT_OBJ# ROW_WAIT_FILE# ROW_WAIT_BLOCK# ROW_WAIT_ROW# LMODE REQUEST        ID1        ID2 TY    BLOCK
----- -------- ----------- ------------- --- ------------- -------------- --------------- ------------- ----- ------- ---------- ---------- -- --------
  213 TESTUSER sqlplus.exe 0vbusv12hnbk6   0             0              2             799             0     3       0      16472          0 TM        1
  204 USER2    sqlplus.exe 4rtg0hv0atfkx   0            -1              0               0             0     3       5      16472          0 TM        0
  217 USER3    sqlplus.exe cv338j6z2530g   0            -1              0               0             0     0       2      16472          0 TM        0

So, on Oracle Database 12.1.0.1, USER3 is not waiting on a table level (TM) enqueue – it is holding a level 2 TM enqueue, while on 10.2.0.4 that session was trying to acquire a level 2 TM lock, and on 11.1.0.7 that session was trying to acquire a level 3 TM lock.  Instead, USER3 is waiting on a row level (TX – transaction) enqueue that is held in exclusive mode (level 6) by USER2.  Quite honestly, that result is similar to what I was hoping to see four years ago when the test script was put together, because USER3 really is waiting due to a potential primary key collision if Session 2 issues a commit.
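
As a quick cross-check from Session 4 (assuming Oracle Database 10.1 or later, where the BLOCKING_SESSION column is available in V$SESSION), the blocking relationship may also be summarized with a query similar to the following:

SELECT
  SID,
  USERNAME,
  EVENT,
  BLOCKING_SESSION
FROM
  V$SESSION
WHERE
  BLOCKING_SESSION IS NOT NULL;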

Session 2 is hung, so it cannot issue a commit.  In Session 1:

COMMIT;

Session 2 is no longer hung, and now shows:

1 row updated.

Session 3, of course, remains hung, waiting for Session 2 to issue either a COMMIT or a ROLLBACK to resolve the potential primary key collision.

For comparison, Session 3 on 11.1.0.7 from the earlier blog article showed the following:

INSERT INTO T1 VALUES(100020)
            *
ERROR at line 1:
ORA-00060: deadlock detected while waiting for resource

… and Session 2 showed the following on 11.1.0.7:

1 row updated.

Is it possible that Session 2 could have displayed a deadlock message, rather than Session 3 showing a deadlock message?  I seem to remember reading that there is a good chance that the session that has been in the enqueue the longest could be selected as the “victim” of the deadlock, but I might be incorrectly remembering that rule.  Oracle Database really did not completely resolve the deadlock, nor did it “kill” Session 3 (as too many books and blog articles state would happen) when the deadlock message appeared in that session’s SQL*Plus session.

For comparison, Oracle Database 10.2.0.4 behaved similarly to Oracle Database 12.1.0.1, with Session 2 showing 1 row updated, and Session 3 remaining hung.  The more things change, the more they stay the same?

Thoughts?  Curious about the test cases found on the web pages that I linked to earlier?  Does Oracle Database 12.1.0.1 behave more like 11.2.0.3 or 10.2.0.4 when foreign key columns are not properly indexed?
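
For anyone wanting to check their own schemas, below is a rough sketch of a query (executed from a DBA account) that attempts to locate foreign key columns without a matching index – a starting point only, since the query pairs columns by name and position, so composite key edge cases may warrant closer inspection; change TESTUSER as appropriate:

SELECT
  C.OWNER,
  C.TABLE_NAME,
  CC.COLUMN_NAME,
  C.CONSTRAINT_NAME
FROM
  DBA_CONSTRAINTS C,
  DBA_CONS_COLUMNS CC
WHERE
  C.CONSTRAINT_TYPE='R'
  AND C.OWNER=CC.OWNER
  AND C.CONSTRAINT_NAME=CC.CONSTRAINT_NAME
  AND C.OWNER='TESTUSER'
  AND NOT EXISTS (
    SELECT
      1
    FROM
      DBA_IND_COLUMNS IC
    WHERE
      IC.TABLE_OWNER=CC.OWNER
      AND IC.TABLE_NAME=CC.TABLE_NAME
      AND IC.COLUMN_NAME=CC.COLUMN_NAME
      AND IC.COLUMN_POSITION=CC.POSITION);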





Install the Oracle Client on a Synology DiskStation DS1813+ or DS412+ for Nagios to Monitor Databases

July 26, 2013

(Back to the Previous Post in the Series)

In the previous article of this series I provided steps to install and run the Nagios network monitoring utility on either a Synology DiskStation DS1813+ or a DS412+ for the purpose of pinging network devices to verify that the devices respond to ping requests.  That article may have seemed a bit out of place on a blog that is primarily intended to include notes about using Oracle Database.  In the previous article I hinted that there is a check_oracle plugin for Nagios, suggesting that a Synology DiskStation DS1813+ or DS412+ could be used to monitor Oracle databases.

This blog article does NOT use the check_oracle plugin.  There are a small handful of official Nagios plugins for Oracle Database, but this article does not use any of those plugins either.  Instead, I took a chance at crafting a custom Perl script to monitor Oracle databases – the first time I had ever used Perl.  I thought back to some of the earlier blog articles where I introduced a couple of Oracle Database monitoring scripts that were written in VBScript, including Working with Oracle’s Time Model Data 3 and Oracle Statistics Chart Viewer – I started seeing the possibilities of using Nagios to alert DBAs about potential issues.  Yet, I had never used Perl, and at the time did not even have a clue how to concatenate strings in the language (it turns out that there are more than two techniques).

Verify that Perl is installed on the DiskStation by accessing the Package Center in the DiskStation’s DSM web interface.  If there is an Install button under the Perl heading, that indicates that Perl is not yet installed – click the Install button and wait until that button changes to Installed.

[Image: InstallOracleClient-1]

Let’s start by installing the Oracle Client on the DiskStation.  I will use the Oracle Instant Client version 11.2.0.3 (the 12.1.0.1 Instant Client throws a different error message than did the 11.2.0.3 and 11.1.0.7 clients, so I reverted to the 11.2.0.3 version).  To download the Oracle Instant Client for Linux, visit this link.  You will need an OTN account to download the client files – I downloaded the files using a desktop computer, and then transferred the files to a custom created Config share that I set up on the DiskStation.  This article requires the Basic Instant Client package and the SQL*Plus add-on package for the Instant Client.

Connect to the DiskStation using Telnet as the root user (see the previous article for Telnet connection directions).  For consistency, the .zip files for the Oracle Instant Client will be copied to the downloads directory that was created in the previous article.  We will create an oracle directory in the downloads directory:

mkdir /volume1/downloads/oracle/

Next, the downloaded files for the Oracle Instant Client are copied from the config share that I created on the DiskStation (to allow transporting files from a desktop computer) to the /volume1/downloads/oracle directory that was just created, and then the Oracle Instant Client files are unzipped, which automatically creates the instantclient_11_2 directory:

cp /volume1/config/instantclient-basic-linux-11.2.0.3.0.zip /volume1/downloads/oracle/
cp /volume1/config/instantclient-sqlplus-linux-11.2.0.3.0.zip /volume1/downloads/oracle/
cd /volume1/downloads/oracle/
unzip -o instantclient-basic-linux-11.2.0.3.0.zip
unzip -o instantclient-sqlplus-linux-11.2.0.3.0.zip
cd instantclient_11_2

The Oracle Instant Client 11.2.0.3 (and 11.1.0.7) requires a file named libaio.so.1 that is not installed by default on the DiskStation.  After searching the Internet for a while, I found a copy of that file in the /lib directory on a Red Hat Enterprise Linux 3 server.  Using a desktop computer, I copied that file to the DiskStation’s config share, as well as a copy of the tnsnames.ora file that includes the database connection information for the databases that must be monitored.  The libaio.so.1 file will be placed in the /volume1/downloads/oracle/instantclient_11_2 directory, and the tnsnames.ora file will be placed in the /etc directory (one of the directories that is automatically searched).

cp /volume1/config/libaio.so.1 .
cp /volume1/config/tnsnames.ora /etc

Let’s start up SQL*Plus:

./sqlplus /nolog
./sqlplus: error while loading shared libraries: libsqlplus.so: cannot open shared object file: No such file or directory

An error…

ls
BASIC_README    adrci   glogin.sql   libclntsh.so.11.1  libocci.so.11.1  libocijdbc11.so  libsqlplusic.so  ojdbc6.jar  sqltest.sql  xstreams.jar
SQLPLUS_README  genezi  libaio.so.1  libnnz11.so        libociei.so      libsqlplus.so    ojdbc5.jar       sqlplus     uidrvci

The file mentioned in the error message exists in the correct directory.  That error is caused by an undefined environment variable.  Fixing the error and trying again:

LD_LIBRARY_PATH="/volume1/downloads/oracle/instantclient_11_2"
export LD_LIBRARY_PATH
./sqlplus /nolog

SQL*Plus: Release 11.2.0.3.0 Production on Fri Jul 26 13:22:16 2013

Copyright (c) 1982, 2011, Oracle.  All rights reserved.

SQL>

Try to connect to one of the databases that are defined in the tnsnames.ora file, and then try executing a simple SQL statement before exiting SQL*Plus (change testuser, password, and DBName as appropriate for your environment):

CONNECT testuser/password@DBName
Connected.
SELECT SYSDATE FROM DUAL;

SYSDATE
---------
26-JUL-13

EXIT

Now that we have verified that the Instant Client version of SQL*Plus works from the DiskStation, we will create a simple SQL script to verify that it is able to accept script names from a command line.  The script will be created in the same directory where the Oracle Instant Client is located (see the basic directions for using vi, as found in the previous article):

vi /volume1/downloads/oracle/instantclient_11_2/sqltest.sql

The first line in the script instructs SQL*Plus to abort the execution of the script when an error is encountered; if the connection attempt fails, there is no point in attempting to execute any SQL statements that follow.  The second line connects to the database (change testuser, password, and DBName as appropriate for your environment).  The third line executes a simple SQL statement, selecting from a table (rather than the virtual table DUAL) – change the SQL statement to a valid statement for your database.  The last statement exits SQL*Plus once the SQL statement finishes executing:

WHENEVER SQLERROR EXIT SQL.SQLCODE
CONNECT testuser/password@DBName
SELECT ID, DESCRIPTION FROM T1 WHERE ID LIKE '8X%';
EXIT
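
If no convenient application table exists to select from, one hypothetical variation of the script (assuming that the monitoring user has been granted SELECT access to V$INSTANCE) simply confirms that the instance reports its status:

WHENEVER SQLERROR EXIT SQL.SQLCODE
CONNECT testuser/password@DBName
SELECT INSTANCE_NAME, STATUS FROM V$INSTANCE;
EXIT

Either version of the script is executed the same way.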

Try using SQL*Plus to execute the script:

./sqlplus /nolog @/volume1/downloads/oracle/instantclient_11_2/sqltest.sql

The results from the SQL statement should display in the Telnet window, and then the normal DiskStation prompt should appear.  If the database instance was not running at the time of the execution, you would see something like this rather than the results of the SQL statement:

ERROR:
ORA-01034: ORACLE not available
ORA-27101: shared memory realm does not exist
Process ID: 0
Session ID: 0 Serial number: 0

Nagios apparently expects its plugins to return one of four exit codes, as explained in this article, to determine whether or not the checked object is behaving as expected:

Exit Code Status
        0 OK
        1 WARNING
        2 CRITICAL
        3 UNKNOWN

The exit function in Perl permits returning a number, so that is one problem solved.  I want the Perl plugin to accept a command line parameter from Nagios so that I am able to check more than one database using the same Perl script – the Nagios command line parameter will specify the script to execute.  This article explains how to receive a command line parameter from Nagios, so that is the second problem solved.

Let’s put together a simple (OK, not simple for me) script that executes a SQL*Plus script that is specified by Nagios (the sqltest.sql file that was just created), and then returns 0 if no ORA- type error message appears in the output when the Oracle Instant Client’s SQL*Plus executes the script.  If an ORA- type error message is returned, then the script’s exit code is set to 2 and the ORA- errors are returned to Nagios.  Nagios apparently only accepts one line of output from the script, reading what is written by Perl’s print command.  As we saw earlier, the script may return multiple nested ORA- type errors if the database instance is down, so we need these multiple error messages to appear on one line of output from the Perl script.

vi /opt/libexec/check_oracle.pl

This is the script that I crafted after spending, quite literally, hours (OK, days) searching the Internet for Perl command syntax that would execute on the DiskStation:

#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Long qw(:config no_ignore_case);
my $output = "";
my ($script);

# Retrieve the name of the SQL script to execute from the -s (--script) command line parameter
my $result = GetOptions(
 "s|script=s"         => \$script,
  );

# Define the environment variable required by the Instant Client version of SQL*Plus
$ENV{"LD_LIBRARY_PATH"} = "/volume1/downloads/oracle/instantclient_11_2/";

my $ExitValue = 0;
print "Test Script: " . $script . ":  ";

# Execute the specified script with SQL*Plus, piping the SQL*Plus output back to this script
open my $in, "/volume1/downloads/oracle/instantclient_11_2/sqlplus /nolog @/volume1/downloads/oracle/instantclient_11_2/$script |";

# Collect any lines that begin with an ORA- error onto a single line of output
while (my $line = <$in>)
  {
  if ($line =~ /^(ORA-\d{5})/)
    {
    #$line =~ s/\v//g;
    chomp($line);
    $output = $output . $line . " ; ";
    $ExitValue = 2;    # CRITICAL status for Nagios
    }
  }

close($in);
print $output;
exit($ExitValue);

The script defines the LD_LIBRARY_PATH environment variable, so the Nagios user should (hopefully) have no problems executing the Perl script.  There are apparently a half-dozen different ways to remove the end of line characters from the SQL*Plus output; chomp($line); worked for my test, but $line =~ s/\v//g; did not work quite as expected.

Save the script and exit vi.  Next, we need to make the Perl script executable:

chmod +x /opt/libexec/check_oracle.pl

In the earlier article we had to modify a couple of the Nagios configuration files – this time we will need to modify some of the same files, starting with the commands.cfg file.

vi /opt/etc/objects/commands.cfg

Locate the ‘check_ping’ command definition that we modified in the earlier article.  Below the } character for that command definition, add the following, which will tell Nagios that our Perl script exists, and that the script command should be recognized by Nagios as check_oracle_ch:

# check_oracle_ch command definition by Charles Hooper
define command{
        command_name    check_oracle_ch
        command_line    $USER1$/check_oracle.pl -s $ARG1$
        }

Save the commands.cfg file and exit vi.  Let’s create another Nagios configuration file that will be used to list the Oracle Databases (and their scripts) that will be checked:

vi /opt/etc/objects/oracle.cfg

On a new line in the oracle.cfg file, add the following:

define service{
        use                     generic-service ; Inherit values from a template
        host_name               server123       ; This is a server name that is defined in the server.cfg file
        service_description     CHECK_DB_TEST   ; A unique name given to the server, database, and script combination
        check_command           check_oracle_ch!sqltest.sql    ; The command definition that was added to the commands.cfg file followed by ! and the name of the SQL script to execute
        normal_check_interval   5               ; Execute the script every 5 minutes under normal conditions
        retry_check_interval    1               ; Re-execute the script after receiving a return code of 2, every minute until its final/hard state is determined
        }

Save the oracle.cfg file and exit vi.  Additionally, we need to instruct Nagios that there is now an additional configuration file to read when starting.  Execute the following command:

vi /opt/etc/nagios.cfg

Below the cfg_file=/opt/etc/objects/switch.cfg line in the file (added in the previous article), add the following line:

cfg_file=/opt/etc/objects/oracle.cfg

Save the nagios.cfg file and exit vi.  Since we have modified the Nagios configuration, we must restart Nagios.  Find the first Nagios process ID, then kill that process (as was described in the previous article):

ps

kill 2683

Verify that Nagios does not return an error message when reading the configuration files:

/opt/bin/nagios -v /opt/etc/nagios.cfg

If no errors are reported, then start Nagios:

/opt/bin/nagios -d /opt/etc/nagios.cfg

Nagios should be writing log entries to a file named nagios.log (as defined in the nagios.cfg file).  The tail command permits viewing the last few (100 in this case) lines of a specified file:

tail -n 100 /opt/var/nagios.log

If Perl is not installed, Nagios may send an email with the following in the Additional Info section:

(Return code of 127 is out of bounds - plugin may be missing)

If the database instance is down, Nagios should send an email, possibly with the following in the Additional Info section (note that the script name is listed first – a different script will be used to test each database):

Test Script: sqltest.sql:  ORA-01034: ORACLE not available : ORA-27101: shared memory realm does not exist :

When the database instance is running, and the SQL statement(s) in the script executes successfully for the first time, Nagios should send an email with just the test script listed in the Additional Info section:

Test Script: sqltest.sql:

Now that we have confirmed that Nagios and the Perl script work OK together, create additional SQL scripts in the /volume1/downloads/oracle/instantclient_11_2/ directory to connect to the other databases to be monitored (strongly consider changing the permissions for these files, since the files include usernames and passwords for your databases).  For each of those SQL scripts, create an additional service entry in the oracle.cfg file.  For example, if a script named sqltest42.sql is created to test a database on server42, the service entry might be created as shown below, with a sample of the matching SQL script after it:

define service{
        use                     generic-service ; Inherit values from a template
        host_name               server42        ; This is a server name that is defined in the server.cfg file
        service_description     CHECK_DB_TEST42 ; A unique name given to the server, database, and script combination
        check_command           check_oracle_ch!sqltest42.sql  ; The command definition that was added to the commands.cfg file followed by ! and the name of the SQL script to execute
        normal_check_interval   5               ; Execute the script every 5 minutes under normal conditions
        retry_check_interval    1               ; Re-execute the script after receiving a return code of 2, every minute until its final/hard state is determined
        }
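
The matching sqltest42.sql script (hypothetical – the username, password, and net service name below are placeholders) would follow the same pattern as the sqltest.sql script created earlier, with only the connection details and the test SQL statement changed:

WHENEVER SQLERROR EXIT SQL.SQLCODE
CONNECT testuser42/password42@DBName42
SELECT ID, DESCRIPTION FROM T1 WHERE ID LIKE '8X%';
EXIT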

Don’t forget to verify the Nagios configuration before restarting Nagios.

Anyone up to the challenge of rewriting my Working with Oracle’s Time Model Data 3 script in Perl so that the script will work on the DiskStation?





Install Nagios on a Synology DiskStation DS1813+ or DS412+

July 19, 2013 (Modified July 27, 2013, July 28, 2013, November 19, 2013)

(Forward to the Next Post in the Series)

—-

Update July 27, 2013:

  • I now have Nagios running on an ARM based Synology DiskStation DS212+.  Most of the steps are the same as outlined below; however, there are a few additional errors that must be addressed (see the additional DS212+ steps below).
  • All of the ./configure commands should have included --prefix=/opt (rather than --prefix=/usr/local or completely omitting that parameter).  That change eliminates the need to copy the Nagios plugins to the correct location.  Possibly related, the -i parameter was unnecessary for the snmp and Nagios plugins make and make install commands when the ./configure command included the --prefix=/opt parameter.
  • The wget http://sourceforge.net/projects/dsgpl/files/DSM%204.1%20Tool%20Chains/Intel%20x86%20Linux%203.2.11… download step for the gcc compiler is apparently unnecessary, at least on the Synology DiskStation DS212+ (see below).

—-

This article describes how to compile and run Nagios on a Synology DiskStation DS1813+ (64 bit) or Synology DiskStation DS412+ (32 bit, the 32 bit steps should also apply to the DS1812+) NAS, both of which utilize Intel Atom processors (cat /proc/cpuinfo indicates that the DS412+ is using a 2.13GHz Atom D2700, while the DS1813+ is using a 2.13GHz Atom D2701), and utilize the DSM 4.2 operating system.  Not all Synology DiskStation NAS devices use Intel based CPUs – some of the less expensive DiskStations use ARM type processors (see this link to determine the type of CPU installed in a specific DiskStation).  It may be possible to produce a working version of Nagios on NAS devices that do not have Intel 32 bit or 64 bit processors, but I have not yet fully tested the procedure.

Warning: A lot of what follows is based on experimentation, with the end goal of having Nagios running on a Synology DiskStation having the ability to ping devices on the network or the Internet, with an email sent to an administrator when a device stops responding to ping requests, and to send a second email when the device resumes responding to ping requests.  This functionality represents a small fraction of Nagios’ capabilities through the use of plugins.  File paths vary from one Linux distribution to the next, so that adds a bit of challenge to make certain that the files are placed in the required directory.  Copying a file to the wrong directory may temporarily disable the DiskStation and require the reinstallation of the Synology DSM operating system.  The directions below are not final, and quite likely do not represent the most efficient approaches to accomplish the end goal – but the directions will hopefully be “close enough to correct” to allow the average reader of this blog to ping and send email alerts from a DiskStation.

I have relied on the free Nagios network monitoring solution since 2002 to provide an early warning of problems associated with network attached equipment including servers, production floor computers, switches, printers, wireless access points, IP cameras, Internet connection stability, etc.  While I rely on Nagios’ alerting system, I am not an expert at configuring the Nagios network monitoring system; the Nagios configuration documentation may be downloaded here.

First, make certain that the Telnet Service (or SSH Service if that is preferred) is enabled on the DiskStation.  In the DiskStation’s Control Panel, click Terminal.

[Image: InstallNagiosDiskStation1]

Place a checkmark next to Enable Telnet service (if the item is not already checked), and then click the Apply button.

[Image: InstallNagiosDiskStation2]

Verify that the computer that you intend to use has a Telnet client.  For Windows 7, access the Programs link in the Control Panel, and then click the Turn Windows features on or off link.  Make certain that there is a checkmark next to Telnet Client, then click the OK button.

[Image: InstallNagiosDiskStation3]

Open a command line (in Windows, Start – Run – type  cmd  and press the Enter key).  On the command line, type telnet followed by either the name of the DiskStation or the IP address of the DiskStation, then press the Enter key.  When prompted for a username, type root and press the Enter key.  Type the admin user’s password (that is used to access the DSM interface in a web browser) and press the Enter key.

[Image: InstallNagiosDiskStation4]

The command line on the DiskStation is very similar to the command line on a Unix or Linux computer, and is somewhat similar to a Windows command line or MS-DOS command line (use / rather than \, use ls rather than dir, use vi rather than edit):

[Image: InstallNagiosDiskStation5]

We first need to add ipkg support to the DiskStation, detailed directions may be viewed at this link.  The exact directions may be different for other DiskStation models, but the following directions work for both the DS1813+ and DS412+ (note that all files downloaded from the Internet will be placed on volume1 in the downloads directory – copy and paste the lines to the Telnet session, one line at a time):

cd /volume1
mkdir downloads
cd downloads
wget http://ipkg.nslu2-linux.org/feeds/optware/syno-i686/cross/unstable/syno-i686-bootstrap_1.2-7_i686.xsh
chmod +x syno-i686-bootstrap_1.2-7_i686.xsh
sh syno-i686-bootstrap_1.2-7_i686.xsh

The vi editor is used on the DiskStation to modify files; that vi editor is a bit challenging to use at first sight, so you may need help with a couple of basic commands (see this quick reference for other commands).  The commands in vi are case sensitive (i is not the same as I).  When a file is opened, press the i key on the keyboard to allow making changes to the file (such as typing commands, or deleting commands).  When finished making changes to the file press the Esc key.  Once the Esc key is pressed, type ZZ to save the changed file and quit, or :q! to quit without saving the changes.

Next, we must modify the file that establishes the environment for the root user, when that user connects to the DiskStation.  This change is needed as part of the ipkg installation.  Edit the .profile file used by the root user:

vi /root/.profile

Add a # character in front of the two lines that contain the word PATH, then save the file (see the brief directions above to switch between command and insert mode in vi):

[Image: InstallNagiosDiskStation6]

Next, reboot the DiskStation by clicking the Restart button in the Synology DSM interface (note: it should be possible to type reboot in the Telnet interface; however, the DiskStation locked up the one time I attempted to execute that command).

[Image: InstallNagiosDiskStation7]

Once the DiskStation reboots, reconnect to the DiskStation using Telnet, connecting as the root user, just as was done earlier.

The ipkg command should now work on the command line.  First, request that an updated list of available packages is downloaded, then display that list of packages:

ipkg update
ipkg list

Next, download a few packages that will be used by the Nagios network monitoring tool.  Note that using ipkg to install packages is a lot easier than compiling source code, so have fun with the ipkg utility.  When installing the optware-devel package, an error may appear stating that there is an incompatibility between wget and wget-ssl – just ignore that error for now.

ipkg install wget-ssl
ipkg install optware-devel
ipkg install gcc
ipkg install libtool
ipkg install mysql

Next, we need to compile a file and copy a couple of files:

cd /opt/share/libtool/libltdl/
./configure --prefix=/opt
make all
make install

cp /usr/syno/apache/modules/mod_ext_filter.so /opt/libexec/mod_ext_filter.so
cp /usr/syno/apache/modules/*.* /opt/libexec/

Now, install the Apache package:

ipkg install apache

If an error message is displayed on screen about mod_ext_filter.so, then modify the /opt/etc/apache2/httpd.conf file and add a # in front of the line LoadModule ext_filter_module libexec/mod_ext_filter.so and save the file.  Re-execute the ipkg install apache command (note that the up arrow on the keyboard may be pressed to quickly retype one of the previously executed commands).

[Image: InstallNagiosDiskStation8]

Using the DiskStation’s Control Panel, create a nagios group and a nagcmd group (the nagcmd group probably will not be used for anything specific).  These groups do not require any special DiskStation permissions.

[Image: InstallNagiosDiskStation9]

Using the DiskStation’s Control Panel, create a nagios user and add that user to the nagios and nagcmd groups.  The nagios user does not require any specific DiskStation permissions.

Next, switch back to the Telnet session, download the Nagios source code, and compile the source code:

DiskStation DS212+ Notes:

The following ./configure call was used on the DS212+:

./configure --prefix=/opt --with-command-group=nagios --disable-nanosleep --enable-nanosleep=no

The ./configure aborted with the following error message:

checking for pthread_create in -lpthread... no
checking for pthread_mutex_init in -lpthread... no
checking for pthread_create in -lpthreads... no
checking for pthread_create in -llthread... no
checking if we need -pthread for threads... no
checking for library containing nanosleep... no
Error: nanosleep() needed for timing operations.

The test that threw the error is located roughly 63% of the way through the configure file (on roughly line 5635).  If the exit 1 line in the configure file is commented out, then the configure step will complete.  However, the make all command will then fail with the following error messages:

/volume1/downloads/nagios/base/nebmods.c:363: undefined reference to `dlclose'
nebmods.o: In function `neb_load_module':
/volume1/downloads/nagios/base/nebmods.c:218: undefined reference to `dlopen'
/volume1/downloads/nagios/base/nebmods.c:249: undefined reference to `dlsym'
/volume1/downloads/nagios/base/nebmods.c:266: undefined reference to `dlsym'
/volume1/downloads/nagios/base/nebmods.c:299: undefined reference to `dlsym'
/volume1/downloads/nagios/base/nebmods.c:225: undefined reference to `dlerror'
/opt/lib/gcc/arm-none-linux-gnueabi/4.2.3/../../../../arm-none-linux-gnueabi/lib/libpthread.so: undefined reference to `__default_sa_restorer_v2@GLIBC_PRIVATE'
/opt/lib/gcc/arm-none-linux-gnueabi/4.2.3/../../../../arm-none-linux-gnueabi/lib/libpthread.so: undefined reference to `__default_rt_sa_restorer_v2@GLIBC_PRIVATE'
/opt/lib/gcc/arm-none-linux-gnueabi/4.2.3/../../../../arm-none-linux-gnueabi/lib/libpthread.so: undefined reference to `__default_rt_sa_restorer_v1@GLIBC_PRIVATE'
/opt/lib/gcc/arm-none-linux-gnueabi/4.2.3/../../../../arm-none-linux-gnueabi/lib/libpthread.so: undefined reference to `__default_sa_restorer_v1@GLIBC_PRIVATE'
collect2: ld returned 1 exit status
make[1]: *** [nagios] Error 1
make[1]: Leaving directory `/volume1/downloads/nagios/base'
make: *** [all] Error 2

After a bit of searching on the Internet, I found a page that suggested making the following changes (note that I unsuccessfully tried a couple of other steps that may have also partially corrected the issue):

mkdir /opt/arm-none-linux-gnueabi/lib_disabled
mv /opt/arm-none-linux-gnueabi/lib/libpthread* /opt/arm-none-linux-gnueabi/lib_disabled

cp /lib/libpthread.so.0 /opt/arm-none-linux-gnueabi/lib/
cd /opt/arm-none-linux-gnueabi/lib/
ln -s libpthread.so.0 libpthread.so
ln -s libpthread.so.0 libpthread-2.5.so

After making the above changes, I was able to run the configure and make all commands without receiving an error.

cd /volume1/downloads
wget http://prdownloads.sourceforge.net/sourceforge/nagios/nagios-3.5.0.tar.gz
tar xzf nagios-3.5.0.tar.gz
cd nagios
./configure --prefix=/opt --with-command-group=nagios
make all
make install
make install-init
make install-config
make install-commandmode

We apparently need to copy a couple of files to different locations at this point:

cp /opt/lib/libltdl.so.3 /opt/local/lib/libltdl.so.3
cp /opt/lib/libltdl.so.3 /usr/lib/libltdl.so.3
cp /opt/lib/libltdl.so /usr/lib/

Undo the changes that were earlier made to the /root/.profile file, where # characters were added in front of any line that contained the word PATH.  Remove those # characters and save the file:

vi /root/.profile

(This part still needs some fine tuning to make the web interface work with Nagios.)  Edit the Nagios Makefile, change the line beginning with HTTPD_CONF to show HTTPD_CONF=/opt/etc/apache2/conf.d and then save the file.

cd /volume1/downloads/nagios
vi Makefile

[Image: InstallNagiosDiskStation10]

Execute the following command:

make install-webconf

Create a nagiosadmin user for the web administration, specify a password when prompted:

htpasswd -c /usr/local/etc/htpasswd.users nagiosadmin

Update November 19, 2013:

GabrielM reported in a comment below that it may be necessary to specify the full path to the htpasswd program:

/usr/syno/apache/bin/htpasswd -c /usr/local/etc/htpasswd.users nagiosadmin

Install a couple of additional ipkg packages that will be used by Nagios (the last package adds a ping utility that may be used by Nagios – the security permissions on the DiskStation prevent non-root users from using the built-in ping utility):

ipkg install openssl
ipkg install openssl-dev
ipkg install sendmail
ipkg install inetutils

A step that may or may not be required is to download a functioning C++ compiler (some of the commands below point to files provided with the C++ compiler) – it appears that there should already be a compiler on the DiskStation at this point (in /opt/bin), so successfully completing this download might not be required.

DiskStation DS212+ Notes:

These wget and tar steps were completely skipped on the DS212+

For the DiskStation DS1813+ 64 bit:

cd /volume1/downloads
wget http://sourceforge.net/projects/dsgpl/files/DSM%204.1%20Tool%20Chains/Intel%20x86%20Linux%203.2.11%20%28Cedarview%29/gcc420_glibc236_x64_cedarview-GPL.tgz
tar zxpf gcc420_glibc236_x64_cedarview-GPL.tgz -C /usr/local/

For the DiskStation DS412+ 32 bit:

cd /volume1/downloads
wget http://sourceforge.net/projects/dsgpl/files/DSM%204.2%20Tool%20Chains/Intel%20x86%20Linux%203.2.11%20%28Bromolow%29/gcc421_glibc236_x86_bromolow-GPL.tgz
tar zxpf gcc421_glibc236_x86_bromolow-GPL.tgz -C /usr/local/

Now the net-snmp source code is downloaded and extracted:

DiskStation DS212+ Notes:

The ./configure call on the DS212 (might also work on the other DiskStation models):

./configure --prefix=/opt

The make call threw several errors, including:

/bin/sh: arm-none-linux-gnueabi-ld: not found
make[2]: *** [../blib/arch/auto/NetSNMP/default_store/default_store.so] Error 127

Before running the make command on the DS212+, execute the following command:

ln -s /opt/bin/ld /opt/bin/arm-none-linux-gnueabi-ld

The -i parameter may be omitted when running the make and make install commands.

cd /volume1/downloads
wget http://sourceforge.net/projects/net-snmp/files/net-snmp/5.7.2/net-snmp-5.7.2.tar.gz
tar xzf net-snmp-5.7.2.tar.gz
cd net-snmp-5.7.2

For the DiskStation DS1813+ 64 bit, execute the following to compile the net-snmp source (note that this command uses the compiler that was downloaded):

env CC=/usr/local/x86_64-linux-gnu/bin/x86_64-linux-gnu-gcc \
LD=/usr/local/x86_64-linux-gnu/bin/x86_64-linux-gnu-ld \
RANLIB=/usr/local/x86_64-linux-gnu/bin/x86_64-linux-gnu-ranlib \
CFLAGS="-I/usr/local/x86_64-linux-gnu/include" \
LDFLAGS="-L/usr/local/x86_64-linux-gnu/lib" \
./configure --host=x86_64-linux-gnu --target=x86_64-linux-gnu --build=x86_64-pc-linux --prefix=/usr/local

For the DiskStation DS412+ 32 bit, execute the following to compile the net-snmp source (note: I could not use any of the different compilers that I tried downloading due to the compilers crashing with one of two error messages, so this command uses the compiler in /opt/bin):

env CC=/opt/bin/i686-linux-gnu-gcc \
LD=/usr/local/i686-linux-gnu/bin/i686-linux-gnu-ld \
RANLIB=/usr/local/i686-linux-gnu/bin/i686-linux-gnu-ranlib \
CFLAGS="-I/usr/local/i686-linux-gnu/include" \
LDFLAGS="-L/usr/local/i686-linux-gnu/lib" \
./configure --host=i686-linux-gnu --target=i686-linux-gnu --build=i686-linux-gnu --prefix=/usr/local

Several prompts will appear on the screen when either of the two commands is executed.  I entered the following for the prompts:

Default version of SNMP to use (3): 3
System Contact Information: (Enter)
System Location (Unknown): (Enter)
Location to write logfile (/var/log/snmpd.log): /opt/var/snmpd.log
Location to write persistent information (/var/net-snmp): (Enter)

Two additional commands to execute:

make -i
make install -i

Now we need to download the source code for the Nagios plugins (check_apt, check_breeze, check_by_ssh, check_clamd, check_cluster, check_dhcp, check_disk, check_disk_smb, check_dns, check_dummy, check_file_age, check_flexlm, check_ftp, check_http, check_icmp, check_ide_smart, check_ifoperstatus, check_ifstatus, check_imap, check_ircd, check_jabber, check_ldap, check_ldaps, check_load, check_log, check_mailq, check_mrtg, check_mrtgtraf, check_mysql, check_mysql_query, check_nagios, check_nntp, check_nntps, check_nt, check_ntp, check_ntp_peer, check_ntp_time, check_nwstat, check_oracle, check_overcr, check_ping, check_pop, check_procs, check_real, check_rpc, check_sensors, check_simap, check_smtp, check_snmp, check_spop, check_ssh, check_ssmtp, check_swap, check_tcp, check_time, check_udp, check_ups, check_users, check_wave) that allow Nagios to perform various monitoring tasks:

cd /volume1/downloads
wget http://prdownloads.sourceforge.net/sourceforge/nagiosplug/nagios-plugins-1.4.16.tar.gz
tar xzf nagios-plugins-1.4.16.tar.gz
cd nagios-plugins-1.4.16/

Update November 19, 2013:

GabrielM reported in a comment below that the occasionally changing “current version” of the Nagios plugins makes it difficult to download the plugins from the source shown above.  If you open the http://prdownloads.sourceforge.net/sourceforge/nagiosplug/ web page in a web browser, the web browser will be redirected to http://sourceforge.net/projects/nagiosplug/files/ which contains the following statement:

“The Nagios Plugins are no longer distributed via SourceForge. For downloads and other information, please visit: https://www.nagios-plugins.org/
Source: README.md, updated 2013-10-01”

If you follow that link and then click the Download heading at the top of the page, there should be a link on the page that allows access to the current version of the Nagios plugins.  That link is currently: https://www.nagios-plugins.org/download/nagios-plugins-1.5.tar.gz

The command that GabrielM provided should work:

wget https://www.nagios-plugins.org/download/nagios-plugins-1.5.tar.gz

DiskStation DS212+ Notes:

The following configure, make, and make install commands were used:

./configure --prefix=/opt --with-openssl=/usr/syno/bin/openssl --with-nagios-user=nagios --with-nagios-group=nagios --with-ping-command="/opt/bin/ping -c %d %s" --psdir=/bin --with-ps-varlist="&procpid,&procppid,&procvsz,&procrss,procprog,&pos" --with-ps-cols=6 --with-ps-format="%d %d %d %d %s %n" --with-ps-command="/bin/ps -w"
make
make install

For the DiskStation DS1813+ 64 bit:

./configure --with-openssl=/usr/syno/bin/openssl --with-nagios-user=nagios --with-nagios-group=nagios --with-ping-command="/opt/bin/ping -c %d %s" --psdir=/bin --with-ps-varlist="&procpid,&procppid,&procvsz,&procrss,procprog,&pos" --with-ps-cols=6 --with-ps-format="%d %d %d %d %s %n" --with-ps-command="/bin/ps -w" --host=x86_64-linux-gnu --target=x86_64-linux-gnu --build=x86_64-pc-linux
make -i 
make install -i

For the DiskStation DS412+ 32 bit:

./configure --with-openssl=/usr/syno/bin/openssl --with-nagios-user=nagios --with-nagios-group=nagios --with-ping-command="/opt/bin/ping -c %d %s" --psdir=/bin --with-ps-varlist="&procpid,&procppid,&procvsz,&procrss,procprog,&pos" --with-ps-cols=6 --with-ps-format="%d %d %d %d %s %n" --with-ps-command="/bin/ps -w" --host=i686-linux-gnu --target=i686-linux-gnu --build=i686-linux-gnu --prefix=/usr/local
make -i 
make install -i

Copy the Nagios plugins to the location expected by Nagios:

DiskStation DS212+ Notes:

The plugins were installed in the correct location on the DS212+

cp /usr/local/nagios/libexec/*.* /opt/libexec
cp /usr/local/nagios/libexec/* /opt/libexec
cp /usr/local/libexec/check_* /opt/libexec

Update November 19, 2013:

GabrielM reported in a comment below that the third command above may fail.  Depending on the compile options used, either the first two commands or the third command may fail.  The first two commands are intended to accomplish the same task as the third command; one group or the other is expected to fail, but all three commands should not fail.  I should have explained this potential area of concern better.

Copy the Nagios startup script to the correct location so that Nagios will automatically start when the DiskStation is rebooted:

cp /usr/local/etc/rc.d/nagios /opt/etc/init.d/S81nagios

Verify that the ownership of the nagios directory is set correctly:

DiskStation DS212+ Notes:

The nagios directory is actually under /opt/bin on the DS212+, so use the first command below instead (the second command applies when Nagios was installed under /usr/local):

chown nagios:nagios /opt/bin/nagios/nagios -R
chown nagios:nagios /usr/local/nagios -R

In addition to the main /opt/etc/nagios.cfg Nagios file, there are several other configuration files that are potentially used by Nagios (defined in the nagios.cfg file):

/opt/etc/objects/commands.cfg
/opt/etc/objects/contacts.cfg
/opt/etc/objects/timeperiods.cfg
/opt/etc/objects/templates.cfg
/opt/etc/objects/localhost.cfg
/opt/etc/objects/windows.cfg
/opt/etc/objects/server.cfg
/opt/etc/objects/switch.cfg
/opt/etc/objects/printer.cfg

We need to make a couple of adjustments in the /opt/etc/objects/commands.cfg file.

vi /opt/etc/objects/commands.cfg

Change the ‘notify-host-by-email’ command definition section as follows:

define command{
    command_name notify-host-by-email
    command_line /usr/bin/printf "%b" "Subject: $NOTIFICATIONTYPE$ Host Alert: $HOSTNAME$ is $HOSTSTATE$\n\n***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\nHost: $HOSTNAME$\nState: $HOSTSTATE$\nAddress: $HOSTADDRESS$\nInfo: $HOSTOUTPUT$\n\nDate/Time: $LONGDATETIME$\n" | /opt/sbin/sendmail -vt $CONTACTEMAIL$
    }

Change the ‘notify-service-by-email’ command definition section as follows:

define command{
    command_name notify-service-by-email
    command_line /usr/bin/printf "%b" "Subject: $NOTIFICATIONTYPE$ Service Alert: $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$\n\n***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$\n" | /opt/sbin/sendmail -vt $CONTACTEMAIL$
    }

Change the ‘check_ping’ command definition section as follows (feel free to read the documentation for check_ping and specify different values):

define command{
        command_name    check_ping
        command_line    $USER1$/check_ping -H $HOSTADDRESS$ -w 3000,25% -c 5000,90% -p 3 
        }

Save the file and exit vi.

At this point, the Nagios network monitoring utility will likely experience an error similar to the following when attempting to send an alert email:

output=collect: Cannot write ./dfr6BFFPC7027203 (bfcommit, uid=1026, gid=25): Permission denied

Execute the following commands, which should fix the above problem:

chmod g+w /opt/var/spool/clientmqueue
chmod 444 /opt/etc/mail/*.cf
chmod 7555 /opt/sbin/sendmail

We will need to use su to test the execution of various commands as the nagios user.  Without this fix (described here), you might see the following error message:

su: warning: cannot change directory to /var/services/homes/nagios: No such file or directory
su: /sbin/nologin: No such file or directory

Enter the following commands:

mkdir /var/services/homes
mkdir /var/services/homes/nagios
chown nagios:nagios /var/services/homes/nagios -R
vi /etc/passwd

Locate the line in the passwd file for the Nagios user.  Near the end of the line, /sbin/nologin should appear.  Replace that text with /bin/ash then save and exit vi.
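For illustration, the nagios line in the passwd file might change from the first line below to the second (the uid and gid values shown are taken from the earlier Permission denied message, and may differ on your DiskStation):

nagios:x:1026:25::/var/services/homes/nagios:/sbin/nologin
nagios:x:1026:25::/var/services/homes/nagios:/bin/ash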

Verify that the Nagios user is able to execute the check_ping plugin.  Replace MyDeviceHere with either an IP address or a network device name that is on your network:

su - nagios -c "/opt/libexec/check_ping -H MyDeviceHere -w 5000,80% -c 5000,80% -p 5"

If the ping command (called by check_ping) is not able to resolve a network device name, and the fully qualified DNS name was not specified (MyDeviceHere.MyDomainHere.com), edit the /etc/resolv.conf file:

vi /etc/resolv.conf

On a new line in the file, add the following (replacing MyDomainHere.com with your network’s DNS domain name):

search MyDomainHere.com

Verify that sendmail works for the Nagios user.  At the prompt that appears, type a short message, press the Enter key, type a period, then press the Enter key again (replace MyEmailAddressHere@MyDomainHere.com with your email address):

su - nagios -c "/opt/sbin/sendmail -vt MyEmailAddressHere@MyDomainHere.com"

—-

It is important to always verify the Nagios configuration before starting Nagios (or restarting it after a configuration change).  To verify the configuration, type the following:

/opt/bin/nagios -v /opt/etc/nagios.cfg

To start up Nagios as a background task (daemon), execute the following:

/opt/bin/nagios -d /opt/etc/nagios.cfg

To stop Nagios that is executing as a background task, type:

ps

InstallNagiosDiskStation11

Then search through the list of processes for the first line that shows /opt/bin/nagios -d /opt/etc/nagios.cfg.  The number at the left of that line, 31152 in this case, is used to stop Nagios.  To stop Nagios, type the following (replace 31152 with the number shown on your screen):

kill 31152
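If the process list is long, the process number can also be picked out with grep and awk, assuming that those utilities are present on the DiskStation (a sketch, not a requirement – the [n] prevents grep from matching its own process):

kill `ps | grep '[n]agios -d' | awk '{print $1}'`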

Side note: I tried installing quite a few different C++ compilers that supposedly work with the Synology DSM (see here).  As a result, I needed a way to remove a directory, that directory’s subdirectories, and its files.  The following command will completely remove the /usr/local/i686-linux-gnu directory, should the need arise:

rm -rf /usr/local/i686-linux-gnu

At this point, Nagios will hopefully run as a background task, and it should be able to ping and send email alerts.  However, if you have been following the above directions, Nagios has not yet been told which devices to monitor, or to whom the alert emails should be sent.  The next step is to define the email contacts by modifying the /opt/etc/objects/contacts.cfg file (see the documentation for assistance):

vi /opt/etc/objects/contacts.cfg

After setting up the contacts, we should probably tell Nagios which devices to monitor.  If there are a lot of devices on your network to be monitored, you might find that using Microsoft Excel rather than vi to create the object definitions makes the task more manageable.  Set up a simple worksheet with four columns.  Column A will be used to specify the short host_name for the object to be monitored.  Column B will be used to specify the alias (long description for the object).  Column C will be used to either specify the IP address for the device or the network name for the device.  Column D will be used to identify the group to which the object belongs and the file name to which the definition is saved (the Excel macro supports the following groups: ap, camera, computer, external, other, printer, server, switch).

InstallNagiosDiskStation13

The Excel macro is set up to read a tab delimited file, rather than reading the object description directly from the Excel worksheet.  Highlight all of the rows in the worksheet except for the top header row, and press Ctrl C (or Edit – Copy) to copy the definitions to the Windows clipboard in tab delimited format.  Start Notepad (Start – Run – Notepad), and then press Ctrl V (or Edit – Paste) to paste the tab delimited object descriptions into Notepad.  The Excel macro code expects the text file to be saved as nagioshosts.txt.
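As an example, the first few lines of the nagioshosts.txt file might look similar to the following, with a tab character separating the four columns (the host names, aliases, and addresses shown are hypothetical):

AP-OFFICE	Office Wireless Access Point	192.168.1.20	ap
CAM-DOCK	Loading Dock IP Camera	192.168.1.31	camera
PRINT-ENG	Engineering Laser Printer	192.168.1.45	printer
SRV-ORA1	Oracle Database Server	192.168.1.5	server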

The Excel macro code follows (I imagine that not many computers still have a second floppy drive installed, so change the B:\Hardware Documentation\Synology\ path as appropriate for your environment):

Private Sub cmdProcessText_Click()
    Dim intFileNumRead As Integer
    Dim intFileNumAP As Integer
    Dim intFileNumCamera As Integer
    Dim intFileNumComputer As Integer
    Dim intFileNumExternal As Integer
    Dim intFileNumOther As Integer
    Dim intFileNumPrinter As Integer
    Dim intFileNumServer As Integer
    Dim intFileNumSwitch As Integer
    Dim intFileNumWrite As Integer

    Dim strLine As String
    Dim strItem() As String

    intFileNumRead = FreeFile
    Open "B:\Hardware Documentation\Synology\nagioshosts.txt" For Input As #intFileNumRead

    intFileNumAP = FreeFile
    Open "B:\Hardware Documentation\Synology\ap.cfg" For Output As intFileNumAP
    Print #intFileNumAP, "###############################################################################"; Chr(10);
    Print #intFileNumAP, "# ap.cfg - lists the wireless access points to be monitored"; Chr(10);
    Print #intFileNumAP, "#"; Chr(10);
    Print #intFileNumAP, "# Last Modified: "; Now; Chr(10);
    Print #intFileNumAP, "###############################################################################"; Chr(10);
    Print #intFileNumAP, "#"; Chr(10); Chr(10);
    Print #intFileNumAP, "###############################################################################"; Chr(10);
    Print #intFileNumAP, "#"; Chr(10);
    Print #intFileNumAP, "# HOST GROUP DEFINITIONS"; Chr(10);
    Print #intFileNumAP, "#"; Chr(10);
    Print #intFileNumAP, "###############################################################################"; Chr(10);
    Print #intFileNumAP, "#"; Chr(10); Chr(10);
    Print #intFileNumAP, "define hostgroup{"; Chr(10);
    Print #intFileNumAP, "        hostgroup_name  ap                      ; The name of the hostgroup"; Chr(10);
    Print #intFileNumAP, "        alias           Local Access Points       ; Long name of the group"; Chr(10);
    Print #intFileNumAP, "        }"; Chr(10); Chr(10); Chr(10);

    intFileNumCamera = FreeFile
    Open "B:\Hardware Documentation\Synology\camera.cfg" For Output As intFileNumCamera
    Print #intFileNumCamera, "###############################################################################"; Chr(10);
    Print #intFileNumCamera, "# camera.cfg - lists the IP cameras to be monitored"; Chr(10);
    Print #intFileNumCamera, "#"; Chr(10);
    Print #intFileNumCamera, "# Last Modified: "; Now; Chr(10);
    Print #intFileNumCamera, "###############################################################################"; Chr(10);
    Print #intFileNumCamera, "#"; Chr(10); Chr(10);
    Print #intFileNumCamera, "###############################################################################"; Chr(10);
    Print #intFileNumCamera, "#"; Chr(10);
    Print #intFileNumCamera, "# HOST GROUP DEFINITIONS"; Chr(10);
    Print #intFileNumCamera, "#"; Chr(10);
    Print #intFileNumCamera, "###############################################################################"; Chr(10);
    Print #intFileNumCamera, "#"; Chr(10); Chr(10);
    Print #intFileNumCamera, "define hostgroup{"; Chr(10);
    Print #intFileNumCamera, "        hostgroup_name  camera                  ; The name of the hostgroup"; Chr(10);
    Print #intFileNumCamera, "        alias           Local IP Cameras          ; Long name of the group"; Chr(10);
    Print #intFileNumCamera, "        }"; Chr(10); Chr(10); Chr(10);

    intFileNumComputer = FreeFile
    Open "B:\Hardware Documentation\Synology\computer.cfg" For Output As intFileNumComputer
    Print #intFileNumComputer, "###############################################################################"; Chr(10);
    Print #intFileNumComputer, "# computer.cfg - lists the shop floor computers to be monitored"; Chr(10);
    Print #intFileNumComputer, "#"; Chr(10);
    Print #intFileNumComputer, "# Last Modified: "; Now; Chr(10);
    Print #intFileNumComputer, "###############################################################################"; Chr(10);
    Print #intFileNumComputer, "#"; Chr(10); Chr(10);
    Print #intFileNumComputer, "###############################################################################"; Chr(10);
    Print #intFileNumComputer, "#"; Chr(10);
    Print #intFileNumComputer, "# HOST GROUP DEFINITIONS"; Chr(10);
    Print #intFileNumComputer, "#"; Chr(10);
    Print #intFileNumComputer, "###############################################################################"; Chr(10);
    Print #intFileNumComputer, "#"; Chr(10); Chr(10);
    Print #intFileNumComputer, "define hostgroup{"; Chr(10);
    Print #intFileNumComputer, "        hostgroup_name  computer               ; The name of the hostgroup"; Chr(10);
    Print #intFileNumComputer, "        alias           Domain Computers          ; Long name of the group"; Chr(10);
    Print #intFileNumComputer, "        }"; Chr(10); Chr(10); Chr(10);

    intFileNumExternal = FreeFile
    Open "B:\Hardware Documentation\Synology\external.cfg" For Output As intFileNumExternal
    Print #intFileNumExternal, "###############################################################################"; Chr(10);
    Print #intFileNumExternal, "# external.cfg - lists the devices external to the LAN network to be monitored"; Chr(10);
    Print #intFileNumExternal, "#"; Chr(10);
    Print #intFileNumExternal, "# Last Modified: "; Now; Chr(10);
    Print #intFileNumExternal, "###############################################################################"; Chr(10);
    Print #intFileNumExternal, "#"; Chr(10); Chr(10);
    Print #intFileNumExternal, "###############################################################################"; Chr(10);
    Print #intFileNumExternal, "#"; Chr(10);
    Print #intFileNumExternal, "# HOST GROUP DEFINITIONS"; Chr(10);
    Print #intFileNumExternal, "#"; Chr(10);
    Print #intFileNumExternal, "###############################################################################"; Chr(10);
    Print #intFileNumExternal, "#"; Chr(10); Chr(10);
    Print #intFileNumExternal, "define hostgroup{"; Chr(10);
    Print #intFileNumExternal, "        hostgroup_name  external               ; The name of the hostgroup"; Chr(10);
    Print #intFileNumExternal, "        alias           Monitored devices External to the Network ; Long name of the group"; Chr(10);
    Print #intFileNumExternal, "        }"; Chr(10); Chr(10); Chr(10);

    intFileNumOther = FreeFile
    Open "B:\Hardware Documentation\Synology\other.cfg" For Output As intFileNumOther
    Print #intFileNumOther, "###############################################################################"; Chr(10);
    Print #intFileNumOther, "# other.cfg - lists the miscellaneous devices to be monitored"; Chr(10);
    Print #intFileNumOther, "#"; Chr(10);
    Print #intFileNumOther, "# Last Modified: "; Now; Chr(10);
    Print #intFileNumOther, "###############################################################################"; Chr(10);
    Print #intFileNumOther, "#"; Chr(10); Chr(10);
    Print #intFileNumOther, "###############################################################################"; Chr(10);
    Print #intFileNumOther, "#"; Chr(10);
    Print #intFileNumOther, "# HOST GROUP DEFINITIONS"; Chr(10);
    Print #intFileNumOther, "#"; Chr(10);
    Print #intFileNumOther, "###############################################################################"; Chr(10);
    Print #intFileNumOther, "#"; Chr(10); Chr(10);
    Print #intFileNumOther, "define hostgroup{"; Chr(10);
    Print #intFileNumOther, "        hostgroup_name  other                 ; The name of the hostgroup"; Chr(10);
    Print #intFileNumOther, "        alias           Miscellaneous Devices ; Long name of the group"; Chr(10);
    Print #intFileNumOther, "        }"; Chr(10); Chr(10); Chr(10);

    intFileNumPrinter = FreeFile
    Open "B:\Hardware Documentation\Synology\printer.cfg" For Output As intFileNumPrinter
    Print #intFileNumPrinter, "###############################################################################"; Chr(10);
    Print #intFileNumPrinter, "# printer.cfg - lists the printer devices to be monitored"; Chr(10);
    Print #intFileNumPrinter, "#"; Chr(10);
    Print #intFileNumPrinter, "# Last Modified: "; Now; Chr(10);
    Print #intFileNumPrinter, "###############################################################################"; Chr(10);
    Print #intFileNumPrinter, "#"; Chr(10); Chr(10);
    Print #intFileNumPrinter, "###############################################################################"; Chr(10);
    Print #intFileNumPrinter, "#"; Chr(10);
    Print #intFileNumPrinter, "# HOST GROUP DEFINITIONS"; Chr(10);
    Print #intFileNumPrinter, "#"; Chr(10);
    Print #intFileNumPrinter, "###############################################################################"; Chr(10);
    Print #intFileNumPrinter, "#"; Chr(10); Chr(10);
    Print #intFileNumPrinter, "define hostgroup{"; Chr(10);
    Print #intFileNumPrinter, "        hostgroup_name  printer               ; The name of the hostgroup"; Chr(10);
    Print #intFileNumPrinter, "        alias           Printers and Copiers  ; Long name of the group"; Chr(10);
    Print #intFileNumPrinter, "        }"; Chr(10); Chr(10); Chr(10);

    intFileNumServer = FreeFile
    Open "B:\Hardware Documentation\Synology\server.cfg" For Output As intFileNumServer
    Print #intFileNumServer, "###############################################################################"; Chr(10);
    Print #intFileNumServer, "# server.cfg - lists the servers to be monitored"; Chr(10);
    Print #intFileNumServer, "#"; Chr(10);
    Print #intFileNumServer, "# Last Modified: "; Now; Chr(10);
    Print #intFileNumServer, "###############################################################################"; Chr(10);
    Print #intFileNumServer, "#"; Chr(10); Chr(10);
    Print #intFileNumServer, "###############################################################################"; Chr(10);
    Print #intFileNumServer, "#"; Chr(10);
    Print #intFileNumServer, "# HOST GROUP DEFINITIONS"; Chr(10);
    Print #intFileNumServer, "#"; Chr(10);
    Print #intFileNumServer, "###############################################################################"; Chr(10);
    Print #intFileNumServer, "#"; Chr(10); Chr(10);
    Print #intFileNumServer, "define hostgroup{"; Chr(10);
    Print #intFileNumServer, "        hostgroup_name  server               ; The name of the hostgroup"; Chr(10);
    Print #intFileNumServer, "        alias           Server and Similar Devices ; Long name of the group"; Chr(10);
    Print #intFileNumServer, "        }"; Chr(10); Chr(10); Chr(10);

    intFileNumSwitch = FreeFile
    Open "B:\Hardware Documentation\Synology\switch.cfg" For Output As intFileNumSwitch
    Print #intFileNumSwitch, "###############################################################################"; Chr(10);
    Print #intFileNumSwitch, "# switch.cfg - lists the network equipment type devices to be monitored"; Chr(10);
    Print #intFileNumSwitch, "#"; Chr(10);
    Print #intFileNumSwitch, "# Last Modified: "; Now; Chr(10);
    Print #intFileNumSwitch, "###############################################################################"; Chr(10);
    Print #intFileNumSwitch, "#"; Chr(10); Chr(10);
    Print #intFileNumSwitch, "###############################################################################"; Chr(10);
    Print #intFileNumSwitch, "#"; Chr(10);
    Print #intFileNumSwitch, "# HOST GROUP DEFINITIONS"; Chr(10);
    Print #intFileNumSwitch, "#"; Chr(10);
    Print #intFileNumSwitch, "###############################################################################"; Chr(10);
    Print #intFileNumSwitch, "#"; Chr(10); Chr(10);
    Print #intFileNumSwitch, "define hostgroup{"; Chr(10);
    Print #intFileNumSwitch, "        hostgroup_name  switch               ; The name of the hostgroup"; Chr(10);
    Print #intFileNumSwitch, "        alias           Switche and Similar Devices ; Long name of the group"; Chr(10);
    Print #intFileNumSwitch, "        }"; Chr(10); Chr(10); Chr(10);

    Do While Not (EOF(intFileNumRead))
        Line Input #intFileNumRead, strLine
        strItem = Split(strLine, vbTab)
        'strItem(0) = host_name
        'strItem(1) = alias
        'strItem(2) = address
        'strItem(3) = hostgroups
        Select Case strItem(3)
            Case "ap"
                intFileNumWrite = intFileNumAP
            Case "camera"
                intFileNumWrite = intFileNumCamera
            Case "computer"
                intFileNumWrite = intFileNumComputer
            Case "external"
                intFileNumWrite = intFileNumExternal
            Case "other"
                intFileNumWrite = intFileNumOther
            Case "printer"
                intFileNumWrite = intFileNumPrinter
            Case "server"
                intFileNumWrite = intFileNumServer
            Case "switch"
                intFileNumWrite = intFileNumSwitch
        End Select

        Print #intFileNumWrite, "define host{"; Chr(10);
        Select Case strItem(3)
            Case "ap"
                Print #intFileNumWrite, "        use             ap              ; Inherit default values from a template"; Chr(10);
            Case "camera"
                Print #intFileNumWrite, "        use             camera          ; Inherit default values from a template"; Chr(10);
            Case "computer"
                Print #intFileNumWrite, "        use             computer        ; Inherit default values from a template"; Chr(10);
            Case "external"
                Print #intFileNumWrite, "        use             external        ; Inherit default values from a template"; Chr(10);
            Case "other"
                Print #intFileNumWrite, "        use             other           ; Inherit default values from a template"; Chr(10);
            Case "printer"
                Print #intFileNumWrite, "        use             printer         ; Inherit default values from a template"; Chr(10);
            Case "server"
                Print #intFileNumWrite, "        use             server          ; Inherit default values from a template"; Chr(10);
            Case "switch"
                Print #intFileNumWrite, "        use             switch          ; Inherit default values from a template"; Chr(10);
        End Select
        Print #intFileNumWrite, "        host_name       "; strItem(0); "         ; The name we're giving to this device"; Chr(10);
        Print #intFileNumWrite, "        alias           "; strItem(1); "         ; A longer name associated with the device"; Chr(10);
        Print #intFileNumWrite, "        address         "; strItem(2); "         ; IP address of the device"; Chr(10);
        Print #intFileNumWrite, "        hostgroups      "; strItem(3); "         ; Host groups this device is associated with"; Chr(10);
        Print #intFileNumWrite, "        }"; Chr(10); Chr(10);

        Print #intFileNumWrite, "define service{"; Chr(10);
        Print #intFileNumWrite, "        use                     generic-service ; Inherit values from a template"; Chr(10);
        Print #intFileNumWrite, "        host_name               "; strItem(0); "        ; The name of the host the service is associated with"; Chr(10);
        Print #intFileNumWrite, "        service_description     PING            ; The service description"; Chr(10);
        Print #intFileNumWrite, "        check_command           check_ping!3000,25%!5000,90%    ; The command used to monitor the service"; Chr(10);
        Print #intFileNumWrite, "        normal_check_interval   5               ; Check the service every 5 minutes under normal conditions"; Chr(10);
        Print #intFileNumWrite, "        retry_check_interval    1               ; Re-check the service every minute until its final/hard state is determined"; Chr(10);
        Print #intFileNumWrite, "        }"; Chr(10); Chr(10);
    Loop

    Close #intFileNumRead
    Close #intFileNumAP
    Close #intFileNumCamera
    Close #intFileNumComputer
    Close #intFileNumExternal
    Close #intFileNumOther
    Close #intFileNumPrinter
    Close #intFileNumServer
    Close #intFileNumSwitch
End Sub

The files that are created use the Unix/Linux line feed end-of-line character, rather than the Windows carriage return/line feed pair, so opening the generated files in Notepad is not advised.  Copy the generated files back to the /opt/etc/objects/ path on the DiskStation (copy the files to a Shared Folder on the DiskStation, then use the cp command to copy the files from the share location to /opt/etc/objects/ – the Shared Folders are typically created as subdirectories in the /volume1/ directory).
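For example, if the generated files were copied to a Shared Folder named files, the copy might look similar to the following (the Shared Folder name is hypothetical):

cp /volume1/files/*.cfg /opt/etc/objects/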

If you decided to use some of the non-standard Nagios group names (as I did), those non-standard group names must be defined in the /opt/etc/objects/templates.cfg file:

vi /opt/etc/objects/templates.cfg

A portion of the additional entries that I made in this file includes the following:

define host{
       name                    ap      ; The name of this host template
       use                     generic-host    ; Inherit default values from the generic-host template
       check_period            24x7            ; By default, access points are monitored round the clock
       check_interval          5               ; Actively check the access point every 5 minutes
       retry_interval          1               ; Schedule host check retries at 1 minute intervals
       max_check_attempts      10              ; Check each access point 10 times (max)
       check_command           check_ping      ; Default command to check if access points are "alive"
       notification_period     24x7            ; Send notification out at any time - day or night
       notification_interval   30              ; Resend notifications every 30 minutes
       notification_options    d,r             ; Only send notifications for specific host states
       contact_groups          admins          ; Notifications get sent to the admins by default
       hostgroups              ap ; Host groups that access points should be a member of
       register                0               ; DONT REGISTER THIS - ITS JUST A TEMPLATE
       }

define host{
       name                    camera  ; The name of this host template
       use                     generic-host    ; Inherit default values from the generic-host template
       check_period            24x7            ; By default, cameras are monitored round the clock
       check_interval          60              ; Actively check the device every 60 minutes
       retry_interval          1               ; Schedule host check retries at 1 minute intervals
       max_check_attempts      10              ; Check each device 10 times (max)
       check_command           check_ping      ; Default command to check if devices are "alive"
       notification_period     24x7            ; Send notification out at any time - day or night
       notification_interval   240             ; Resend notifications every 240 minutes
       notification_options    d,r             ; Only send notifications for specific host states
       contact_groups          admins          ; Notifications get sent to the admins by default
       hostgroups              camera ; Host groups that cameras should be a member of
       register                0               ; DONT REGISTER THIS - ITS JUST A TEMPLATE
       }

Nagios will not know that it should read the additional configuration files until it is told to do so by modifying the /opt/etc/nagios.cfg file.

vi /opt/etc/nagios.cfg

Add the following lines to the nagios.cfg file:

# Charles Hooper's object types
cfg_file=/opt/etc/objects/ap.cfg
cfg_file=/opt/etc/objects/camera.cfg
cfg_file=/opt/etc/objects/computer.cfg
cfg_file=/opt/etc/objects/external.cfg
cfg_file=/opt/etc/objects/other.cfg
cfg_file=/opt/etc/objects/printer.cfg
cfg_file=/opt/etc/objects/server.cfg
cfg_file=/opt/etc/objects/switch.cfg

We have made a large number of changes to the configuration files, so it is important to verify that there are no errors in the configuration:

/opt/bin/nagios -v /opt/etc/nagios.cfg

If no errors are found in the configuration, terminate (kill) Nagios and then restart it as described above.

—-

Update July 28, 2013:

When attempting to start Nagios in daemon mode (/opt/bin/nagios -d /opt/etc/nagios.cfg) I encountered a couple of problems related to permissions for the Nagios user.  The nagios process was not listed when I used the ps command.  I then tried executing the following commands:

touch /opt/var/nagios.log
chown nagios:nagios /opt/var/nagios.log

Nagios was then able to start in daemon mode, but wrote messages similar to the following in the /opt/var/nagios.log file:

[1375058364] Warning: Could not open object cache file '/opt/var/objects.cache' for writing!
[1375058364] Failed to obtain lock on file /opt/var/nagios.lock: Permission denied
[1375058364] Bailing out due to errors encountered while attempting to daemonize... (PID=11451)
[1375058656] Nagios 3.5.0 starting... (PID=12936)
[1375058656] Local time is Sun Jul 28 20:44:16 EDT 2013
[1375058656] LOG VERSION: 2.0
[1375058656] Warning: Could not open object cache file '/opt/var/objects.cache' for writing!
[1375058656] Failed to obtain lock on file /opt/var/nagios.lock: Permission denied
[1375058656] Bailing out due to errors encountered while attempting to daemonize... (PID=12936)
[1375060107] Error: Unable to create temp file for writing status data: Permission denied
[1375060117] Error: Unable to create temp file for writing status data: Permission denied
[1375060127] Error: Unable to create temp file for writing status data: Permission denied
[1375060137] Error: Unable to create temp file for writing status data: Permission denied
[1375060147] Error: Unable to create temp file for writing status data: Permission denied
[1375060157] Error: Unable to create temp file for writing status data: Permission denied

I tried to set the permissions for a couple of other files, only to find another long list of Permission denied messages:

touch /opt/var/objects.cache
touch /opt/var/nagios.lock
touch /opt/var/nagios.tmp
chown nagios:nagios /opt/var/objects.cache
chown nagios:nagios /opt/var/nagios.lock
chown nagios:nagios /opt/var/nagios.tmp

I then recalled that I had seen similar messages on the DiskStation DS412+, so I tried a different approach: creating a nagios directory in the /opt/var directory, creating a couple of subdirectories in that directory, and then assigning nagios as the owner of that directory structure:

mkdir /opt/var/nagios
mkdir /opt/var/nagios/archives
mkdir /opt/var/nagios/spool
mkdir /opt/var/nagios/spool/checkresults
chown nagios:nagios /opt/var/nagios -R
vi /opt/etc/nagios.cfg

In the nagios.cfg file, I made the following changes:

log_file=/opt/var/nagios/nagios.log
status_file=/opt/var/nagios/status.dat
lock_file=/opt/var/nagios/nagios.lock
temp_file=/opt/var/nagios/nagios.tmp
log_archive_path=/opt/var/nagios/archives
check_result_path=/opt/var/nagios/spool/checkresults
state_retention_file=/opt/var/nagios/retention.dat
debug_file=/opt/var/nagios/nagios.debug

After saving the file and exiting vi, I restarted Nagios in daemon mode.  Reading the last 100 lines of the Nagios log file is now accomplished with this command:

tail -n 100 /opt/var/nagios/nagios.log

—-

There are a lot of seemingly interesting Nagios plugins, including check_oracle (I believe that this plugin requires the Oracle client to be installed – good luck with that install).  On one of the DiskStations the check_snmp plugin did not compile, while on the other DiskStation the check_http plugin did not compile.

It might be interesting to see what solutions readers are able to develop from the above starting point.  The above information is the result of many hours of experimentation as well as a couple of minutes reading through sections of the Nagios documentation (it reads like the Oracle Database documentation, so it should be an easy read once I am in the right mood) and hopelessly scanning the ‘net for information about obscure error messages.  Have fun, and try not to put the DiskStation out of service due to a mistaken file copy.

Update November 19, 2013:

Installing an updated version of the Synology DSM operating system may temporarily disable Nagios.  Make backups of all Nagios configuration files (copying the files with the cp command to a directory in /volume1 is generally safe) before installing different versions of the Synology DSM operating system.
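A minimal backup might look similar to the following, assuming that a Shared Folder named backup exists under /volume1 (the folder name is hypothetical; the -R switch copies the entire /opt/etc directory tree, including the objects subdirectory):

mkdir -p /volume1/backup/nagios-config
cp -R /opt/etc /volume1/backup/nagios-config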

The DSM 4.3 operating system installation apparently removed the /var/services/homes directory.  That directory removal makes it impossible for the Nagios user to log in to run various commands.  I assume that the removal of the homes directory is intentional, so here is a workaround for that problem:

mkdir /var/services/home
mkdir /var/services/home/nagios
chown nagios:nagios /var/services/home/nagios -R
vi /etc/passwd

In the /etc/passwd file, change all /homes/ entries to /home/ then save and exit vi.

The installation of the different DSM version (including versions before 4.3) will likely also replace/remove the libltdl.* files located in /opt/local/lib and /usr/lib, so we need to copy those files back into the correct directories:

cp /opt/lib/libltdl.so.3 /opt/local/lib/libltdl.so.3
cp /opt/lib/libltdl.so.3 /usr/lib/libltdl.so.3
cp /opt/lib/libltdl.so /usr/lib/

Once the above items are copied, try executing the check_ping command as the nagios user (replace MyDeviceHere with either an IP address or the name of a device on your network):

su - nagios -c "/opt/libexec/check_ping -H MyDeviceHere -w 5000,80% -c 5000,80% -p 5"

If the DiskStation reports that the check_ping command was not found, then copy that file back to the /opt/libexec/ directory.  If the above command was successful, try verifying the Nagios configuration:

/opt/bin/nagios -v /opt/etc/nagios.cfg

If the verification was successful, start Nagios as a daemon:

/opt/bin/nagios -d /opt/etc/nagios.cfg

Execute the ps command and verify that the above command is listed in the running processes:

ps

Finally, verify that Nagios is still set to start automatically as a daemon:

ls /opt/etc/init.d/S81nagios

If a file is listed when the above command is executed, then Nagios should now be fully repaired.






On the Topic of Technology… 5

12 07 2013

July 7, 2013

(Back to the Previous Post in the Series) (Forward to the Next Post in the Series)

As many readers of this blog are probably aware, Oracle Corporation has so far released Oracle Database 12.1.0.1 for the Linux, Solaris, and Windows platforms.  Oracle Database 12.1.0.1 may be downloaded from Oracle’s OTN website.  This article is not about Oracle Database 12.1.0.1, at least not specifically about that version.

In the previous article in this blog series last year, I mentioned experimenting with a Synology DiskStation DS212+, as well as a couple of IP based 640×480 resolution security cameras.  Since that time I have had the opportunity to purchase a few additional NAS devices, including a Synology DiskStation DS112J, Synology DiskStation DS412+, and Synology DiskStation DS1813+.  The DS212+ and DS112J NAS devices have ARM type processors, while the DS412+ (32 bit?) and DS1813+ (64 bit) have Intel Atom D2700 series processors (the processor series for other Synology models may be determined by visiting this link).  The processor type in the NAS partially determines the native capabilities of the NAS, as well as what else may be done with the NAS.  Setting up the Synology NAS devices to support FTP server functionality is fairly easy to accomplish, regardless of the processor type.  That FTP server functionality helps to support the upload functionality of the IP based security cameras.

As an experiment shortly after buying the Synology DiskStation DS212+, I attempted to install the network monitoring tool Nagios, in part to allow keeping track of which IP cameras were offline.  I hit a bit of a brick wall trying to find a precompiled package to permit the Nagios server functionality to run on the Synology DiskStations, which at the core run a version of Linux.  The closest thing that I could find was a plugin for Nagios to permit Nagios running on another machine to monitor a Synology NAS.  I first worked with Red Hat Linux in 1999, implemented dual inline manually-coded iptables firewalls based on a stripped down Red Hat Linux in early 2002, compiled/built a Linux based X.509 certificate supporting VPN server before the Linux kernel supported X.509 certificates (I tried compiling a version of the Red Hat kernel patched with X.509 support, but eventually gave up and compiled the Working Overloaded Kernel), and even tried running Red Hat Enterprise Linux with Samba and Winbind as a member of the company’s file server team.  I first worked with Nagios in 2002, when one of my brothers introduced me to the Linux based network monitoring tool (previously called NetSaint).  Needless to say, I have experience working with Linux and manually compiling software on that platform, but that experience is apparently quite rusty.  The attempt to compile the Nagios source code on the Synology DiskStation DS212+ came to an abrupt halt when I received a message during the compile process essentially stating that the ARM type CPU (Marvell Kirkwood mv6282) did not support fine timer resolutions.

A couple of months later, I tried compiling the Nagios source on the Synology DiskStation DS412+, which features an Intel CPU architecture.  I encountered a couple of unexpected snags in the compile process, and had to put the project on hold for several months.  The paths to the various files on the Linux operating system running on the DiskStation differ a bit from the paths used by the Red Hat variants of Linux – that lack of standardization across the various Linux distributions has frustrated me from time to time over the years.

I recently purchased and reviewed a Synology DiskStation DS1813+.  In the review, I stated the following before testing the theory:

“Additionally, ipkg support permits the installation of roughly 900 additional applications, including C++ compilers – which in theory suggests that the source for the Nagios network monitoring utility can be downloaded and compiled on the DS1813+.”

I am curious to know whether or not anyone is able to get the Nagios server software to run on a Synology DiskStation DS412+ or DS1813+.

I suppose that I should not have proposed that the Nagios network monitoring utility might work on the DiskStation without actually confirming that the utility will work.  I am now able to confirm that the Nagios network monitoring utility will execute on the Synology DiskStation DS1813+, although the check_http plugin failed to compile.  The installation is anything but straightforward – no how-tos that are close to being useful, and no Setup.exe to double-click.  The following screen capture also does not help (non-root users are not permitted to use the ping command on the DiskStations):

TopicOfTechnology5-1

At this time, I cannot provide a detailed instruction list for running the Nagios network monitoring utility on a Synology DiskStation.  However, as a starting point it is necessary to add ipkg support to the DiskStation.  The following ipkg items might be necessary: optware-devel, gcc, libtool, mysql, apache, openssl, openssl-dev, sendmail, inetutils.  With a bit of experimentation (and luck), you might see something similar to this when typing the ps command in a telnet session (I actually typed the command a second time so that the column headings would be visible – there certainly are a lot of background processes on the DiskStation):

TopicOfTechnology5-2
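Stepping back to the package installation, adding those ipkg items from a telnet session might look similar to the following (a sketch – exact package availability depends on the DiskStation model and DSM version):

ipkg update
ipkg install optware-devel gcc libtool mysql apache openssl openssl-dev sendmail inetutils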

As I found out, just because Nagios is in the process list, that does not mean that it is able to do much of anything useful.  A work-around for the non-root ping issue is needed (I might have hinted at part of the solution when listing the various ipkgs), as well as a work-around for the non-root sendmail problem that I did not mention.

When Nagios is working properly, unplugging a monitored device should result in an email message being sent (of course, if you unplug your computer, you probably will not receive an email message stating that the computer is down :-) ):

TopicOfTechnology5-3

There appear to be several Nagios plugins to monitor Oracle databases, although I have not had a chance yet to determine if any of those plugins will compile and work on a Synology DiskStation.  In theory it should wor… wait, I am not headed down that path yet!

In addition to a Synology DiskStation DS212+, the previous article in this series also showed a couple of smart 640×480 resolution IP cameras.  At the time of the previous article, I did not fully comprehend the usefulness of smart IP cameras.  Roughly 30 IP based cameras later, I now have a better understanding of their usefulness and limitations.  Last year I wrote reviews for three 640×480 model cameras here (it appears that Amazon now has this review attached to a different camera), here (it appears that Amazon now has this review attached to a different camera), and here (OK, there is a fourth camera included in this review due to a model changeover).  I was also burned badly (at a loss of $1343) when I bought two 1080P cameras last year that could not meet (or even approach) the manufacturer’s claims for the product.  All of those reviews include video samples produced by the cameras.

This year I bought and reviewed a couple of smart 720P resolution IP cameras, as well as a couple of different (from last year’s 1080P) smart 1080P resolution IP cameras.  As before, the reviews include sample video clips recorded by the cameras (the 720P and 1080P video was uploaded at the native resolution, but it appears that Amazon uses a pretty aggressive compression algorithm, which leads to some lost video quality).  The new 720P and 1080P cameras are not perfect, but the manufacturer appears to be taking steps to address the weaknesses that I outlined in the reviews.  I was sent another updated firmware for the 1080P cameras, as well as an updated PDF that includes the instructions that were missing from the included printed manual.  The support person for the camera company also stated that their website is currently under development, and will probably be online in the next 30 days.  My review mentioned the lack of success at using the recommended P2PCam264 app on a Motorola Xoom tablet for viewing the live video feed from the smart 720P and 1080P cameras.  The support person suggested using the AnyScene app on the Motorola Xoom tablet for viewing the live feed – that app seems to work.  The AnyScene app, while seemingly lacking the sound feed from the cameras, might even work a little too well.  I brought the Xoom tablet to a different network, only to find that the app is somehow able to still pull a live video feed from any of the configured cameras on the other network without poking any holes in the firewall on either network, and even though Universal Plug and Play (UPnP) is disabled (below is a low resolution cell phone captured picture).  I am now left wondering what level of security risk this plug and play technology might pose.

TopicOfTechnology5-4

Sample PNG Generated from 720P Camera’s Video (Click to Display Larger Version):

topicoftechnology5-8

Sample PNG Generated from 1080P Camera’s Video (Same Scene as the Above Example – Click to Display Larger Version):

topicoftechnology5-9

Sample JPG 720P Image from an Edited Video (the 1080P video suffers from fewer out of focus problems and is the same resolution – just with a roughly 50% wider and taller viewing area):

TopicOfTechnology5-6

TriVision User Manual for PC – Aug 2012
TriVision User Manual for PC and Mac – Dec 2013





Everything or Nothing in SQL

23 05 2013

May 23, 2013

The following question recently came through an ERP mailing list (significantly rephrased):

I would like to use the Microsoft Query tool in Microsoft Excel to extract records from the ERP database.  I would like the list of parent records to be retrieved into Excel when all of the specified child records belong to the parent.

That sounds as though it would be an easy request.  The OP in the ERP mailing list later followed up with the following:

“If I may ask for another variation to pick your brain….

How would I rewrite this to show ONLY the work order(s) that have all of the 3 Parts, but instead of showing all of them, show only the ones that don’t include a specific other part?

For example, Include all the work order retrieved from your query, but then subtract those work orders that don’t contain a part or parts in another separate list.

I.E. All order containing parts (’147555′,’227643′,’155328′) but not containing ANY of these parts (’12345′, ’456789′)”

In the above follow up, the term “work order(s)” refers to the “parent records” that were previously mentioned, and the term “parts” refers to the “child records” that were previously mentioned.  Below is a graphical mock up, with the light-green colored rectangles representing the “parts”, and the white colored rectangle at the bottom left representing the “work order”.  The other white colored rectangles represent subassemblies that belong to the “work order”, and the cyan colored rectangles represent the operations that utilize the “parts”.  Through normalization, the white colored rectangles represent rows in one database table (WORK_ORDER – the “parent” table), the cyan colored rectangles represent rows in another table (OPERATION – the “child” table), and the light-green colored rectangles represent rows in a third table (REQUIREMENT – the “grandchild” table).

EverythingOrNothing

The first request still seems to be easy to solve, even though there are three tables that are potentially involved.  Let’s simplify the problem further and just focus on the “grandchild” table.  Let’s create a small version of that table and populate the table with some sample rows:

CREATE TABLE T1 (
  WORKORDER_TYPE VARCHAR2(1),
  WORKORDER_BASE_ID VARCHAR2(15),
  WORKORDER_LOT_ID VARCHAR2(3),
  WORKORDER_SUB_ID VARCHAR2(3),
  PIECE_NO NUMBER,
  PART_ID VARCHAR2(15),
  QTY NUMBER(12,4));

INSERT INTO
  T1
SELECT
  'W' WORKORDER_TYPE,
  TRIM(TO_CHAR(TRUNC(ROWNUM / 100 + 1)*1000,'00000')) || 'W' WORKORDER_BASE_ID,
  TO_CHAR(MOD(TRUNC(ROWNUM / 25 + 1), 999) + 1) WORKORDER_LOT_ID,
  TO_CHAR(TRUNC(MOD(ROWNUM - 1, 25) / 5) + 1) WORKORDER_SUB_ID,
  MOD(ROWNUM - 1, 25) + 1 PIECE_NO,
  CHR(65 + MOD(ROWNUM - 1, 26)) || CHR(65 + MOD(ROWNUM - 1, 15)) || MOD(ROWNUM - 1, 10)  PART_ID,
  TRUNC(DBMS_RANDOM.VALUE*100) + 1 QTY
FROM
  DUAL
CONNECT BY
  LEVEL<=1000;

In the above, the combination of the WORKORDER_BASE_ID column and the WORKORDER_LOT_ID column represent the “work order” that was mentioned by the OP.  The WORKORDER_SUB_ID column identifies the subassembly number (answering the question of to which of the white colored rectangles does the “part” belong).  The PIECE_NO column is used to uniquely identify the rows that belong to a specific subassembly number (that is the final column in the REQUIREMENT table’s primary key).  The PART_ID column is the specific “part” that is attached to a specific “work order”.  The QTY column indicates the number of the “part” that is required by the “work order” – with the sample data, this is the only column that will differ from one execution of the table creation script to the next execution.

The sample data is a bit too consistent, however, we will ignore any issues that consistency may present for now.  Let’s execute a simple SQL statement that indicates how many from the following “parts” list appear in the “work orders”: MI8, NJ9, KG6 (note the use of DISTINCT – a single PART_ID could appear on more than one subassembly in a single WORKORDER_BASE_ID, WORKORDER_LOT_ID combination, and we do not want to repeatedly count those “parts”):

SELECT
  T.WORKORDER_TYPE,
  T.WORKORDER_BASE_ID,
  T.WORKORDER_LOT_ID,
  T.WORKORDER_SUB_ID,
  T.PART_ID,
  COUNT(DISTINCT T.PART_ID) OVER (PARTITION BY T.WORKORDER_TYPE, T.WORKORDER_BASE_ID, T.WORKORDER_LOT_ID) AS C
FROM
  T1 T
WHERE
  T.PART_ID IN ('MI8','NJ9','KG6');

W WORKORDER_BASE_ WOR WOR PART_ID                  C
- --------------- --- --- --------------- ----------
W 01000W          3   3   KG6                      3
W 01000W          3   3   MI8                      3
W 01000W          3   3   NJ9                      3
W 05000W          19  1   KG6                      3
W 05000W          19  1   MI8                      3
W 05000W          19  1   NJ9                      3
W 09000W          34  4   KG6                      3
W 09000W          34  4   MI8                      3
W 09000W          34  4   NJ9                      3

Only 9 rows returned, with three different WORKORDER_BASE_ID, WORKORDER_LOT_ID combinations.  For a moment, let’s ignore that the three “parts” appear in all three “work orders”.  If we wanted a list of those “work orders” where _all_ three of the “parts” are present, and not just _one or more_ of the three “parts”, we could simply slide the above SQL statement into an inline view and specify that our C generated column must be equal to (or greater than – the use of greater than will be explained later) 3:

SELECT
  WORKORDER_TYPE,
  WORKORDER_BASE_ID,
  WORKORDER_LOT_ID,
  WORKORDER_SUB_ID,
  PART_ID
FROM
  (SELECT
    T.WORKORDER_TYPE,
    T.WORKORDER_BASE_ID,
    T.WORKORDER_LOT_ID,
    T.WORKORDER_SUB_ID,
    T.PART_ID,
    COUNT(DISTINCT T.PART_ID) OVER (PARTITION BY T.WORKORDER_TYPE, T.WORKORDER_BASE_ID, T.WORKORDER_LOT_ID) AS C
  FROM
    T1 T
  WHERE
    T.PART_ID IN ('MI8','NJ9','KG6'))
WHERE
  C>=3
ORDER BY
  WORKORDER_BASE_ID,
  WORKORDER_LOT_ID,
  WORKORDER_SUB_ID;

W WORKORDER_BASE_ WOR WOR PART_ID
- --------------- --- --- -------
W 01000W          3   3   KG6
W 01000W          3   3   MI8
W 01000W          3   3   NJ9
W 05000W          19  1   KG6
W 05000W          19  1   MI8
W 05000W          19  1   NJ9
W 09000W          34  4   KG6
W 09000W          34  4   MI8
W 09000W          34  4   NJ9

While trying to determine the OP’s business case for the requests, I wondered if the OP was attempting to replace a specific “part” with another “part” within the ERP package, and simply required a report to understand the progress of that manual “part” replacement procedure in the ERP package.  Let’s move forward with this assumption by replacing a row containing a specific PART_ID value with a new PART_ID value, but only for a single “work order”:

DELETE FROM
  T1
WHERE
  WORKORDER_TYPE='W'
  AND WORKORDER_BASE_ID='09000W'
  AND WORKORDER_LOT_ID='34'
  AND PART_ID='NJ9';

1 row deleted.

INSERT INTO T1 VALUES (
  'W',
  '09000W',
  '34',
  '4',
  10,
  'REPLACEMENT-NJ9',
  10);

1 row created.

Let’s check the result of the SQL statement that we were in the process of developing:

SELECT
  WORKORDER_TYPE,
  WORKORDER_BASE_ID,
  WORKORDER_LOT_ID,
  WORKORDER_SUB_ID,
  PART_ID
FROM
  (SELECT
    T.WORKORDER_TYPE,
    T.WORKORDER_BASE_ID,
    T.WORKORDER_LOT_ID,
    T.WORKORDER_SUB_ID,
    T.PART_ID,
    COUNT(DISTINCT T.PART_ID) OVER (PARTITION BY T.WORKORDER_TYPE, T.WORKORDER_BASE_ID, T.WORKORDER_LOT_ID) AS C
  FROM
    T1 T
  WHERE
    T.PART_ID IN ('MI8','NJ9','KG6'))
WHERE
  C>=3
ORDER BY
  WORKORDER_BASE_ID,
  WORKORDER_LOT_ID,
  WORKORDER_SUB_ID;

W WORKORDER_BASE_ WOR WOR PART_ID
- --------------- --- --- -------
W 01000W          3   3   KG6
W 01000W          3   3   MI8
W 01000W          3   3   NJ9
W 05000W          19  1   KG6
W 05000W          19  1   MI8
W 05000W          19  1   NJ9

Now, there are only six rows returned from two “work orders” – the third “work order” is no longer included in the query results.

With the use of the C >= n syntax, we are able to find “work orders” with at least n number of the specified “parts”.  So, if we were interested in finding cases where at least two of the three listed “parts” are attached to a “work order”, we just need to change the 3 to 2:

SELECT
  WORKORDER_TYPE,
  WORKORDER_BASE_ID,
  WORKORDER_LOT_ID,
  WORKORDER_SUB_ID,
  PART_ID
FROM
  (SELECT
    T.WORKORDER_TYPE,
    T.WORKORDER_BASE_ID,
    T.WORKORDER_LOT_ID,
    T.WORKORDER_SUB_ID,
    T.PART_ID,
    COUNT(DISTINCT T.PART_ID) OVER (PARTITION BY T.WORKORDER_TYPE, T.WORKORDER_BASE_ID, T.WORKORDER_LOT_ID) AS C
  FROM
    T1 T
  WHERE
    T.PART_ID IN ('MI8','NJ9','KG6'))
WHERE
  C>=2
ORDER BY
  WORKORDER_BASE_ID,
  WORKORDER_LOT_ID,
  WORKORDER_SUB_ID;

W WORKORDER_BASE_ WOR WOR PART_ID
- --------------- --- --- -------
W 01000W          3   3   KG6
W 01000W          3   3   MI8
W 01000W          3   3   NJ9
W 05000W          19  1   KG6
W 05000W          19  1   NJ9
W 05000W          19  1   MI8
W 09000W          34  4   KG6
W 09000W          34  4   MI8

The third “work order” is again included in the query resultset.

Other than putting the results into Microsoft Excel, the sample query satisfies the first request of the OP.  If we were not interested in returning the WORKORDER_SUB_ID and PART_ID columns, we could have simply used the following SQL statement (COUNT(*) is sufficient here because each of the three parts appears at most once per work order in the sample data – if a part could appear on more than one subassembly, COUNT(DISTINCT T.PART_ID) would be the safer choice):

SELECT
  T.WORKORDER_TYPE,
  T.WORKORDER_BASE_ID,
  T.WORKORDER_LOT_ID
FROM
  T1 T
WHERE
  T.PART_ID IN ('MI8','NJ9','KG6')
GROUP BY
  T.WORKORDER_TYPE,
  T.WORKORDER_BASE_ID,
  T.WORKORDER_LOT_ID
HAVING
  COUNT(*)>=3;

W WORKORDER_BASE_ WOR
- --------------- ---
W 05000W          19
W 01000W          3

Now to work on the second request, only retrieving the “work orders” when all of a specified list of “parts” are attached to the “work order” and none of another list of “parts” are attached to the “work order”.  As a starting point, we will re-use the inline view from the previous solution.  The IN list will be modified to include the “parts” that must be attached to the “work order”, as well as the “parts” that must not be attached to the “work order”.  The COUNT analytic function must be modified to include either a DECODE or CASE structure to divide the “parts” into the “must” and “must not” lists so that a count of each may be obtained:

SELECT
  T.WORKORDER_TYPE,
  T.WORKORDER_BASE_ID,
  T.WORKORDER_LOT_ID,
  T.WORKORDER_SUB_ID,
  T.PART_ID,
  COUNT(DISTINCT (CASE WHEN T.PART_ID IN ('MI8','NJ9','KG6') THEN T.PART_ID ELSE NULL END)) OVER (PARTITION BY T.WORKORDER_TYPE, T.WORKORDER_BASE_ID, T.WORKORDER_LOT_ID) AS C,
  COUNT(DISTINCT (CASE WHEN T.PART_ID IN ('REPLACEMENT-NJ9') THEN T.PART_ID ELSE NULL END)) OVER (PARTITION BY T.WORKORDER_TYPE, T.WORKORDER_BASE_ID, T.WORKORDER_LOT_ID) AS NC
FROM
  T1 T
WHERE
  T.PART_ID IN ('MI8','KG6',   'REPLACEMENT-NJ9');

W WORKORDER_BASE_ WOR WOR PART_ID                  C         NC
- --------------- --- --- --------------- ---------- ----------
W 01000W          3   3   KG6                      2          0
W 01000W          3   3   MI8                      2          0
W 05000W          19  1   KG6                      2          0
W 05000W          19  1   MI8                      2          0
W 09000W          34  4   KG6                      2          1
W 09000W          34  4   MI8                      2          1
W 09000W          34  4   REPLACEMENT-NJ9          2          1

To satisfy the OP’s second request, we are interested in those rows where the C column value is (at least) 2 and the NC column value is 0.  Just as before, we will slide the above SQL statement into an inline view and add the specifications for the C and NC generated columns:

SELECT
  WORKORDER_TYPE,
  WORKORDER_BASE_ID,
  WORKORDER_LOT_ID,
  WORKORDER_SUB_ID,
  PART_ID
FROM
  (SELECT
    T.WORKORDER_TYPE,
    T.WORKORDER_BASE_ID,
    T.WORKORDER_LOT_ID,
    T.WORKORDER_SUB_ID,
    T.PART_ID,
    COUNT(DISTINCT (CASE WHEN T.PART_ID IN ('MI8','NJ9','KG6') THEN T.PART_ID ELSE NULL END)) OVER (PARTITION BY T.WORKORDER_TYPE, T.WORKORDER_BASE_ID, T.WORKORDER_LOT_ID) AS C,
    COUNT(DISTINCT (CASE WHEN T.PART_ID IN ('REPLACEMENT-NJ9') THEN T.PART_ID ELSE NULL END)) OVER (PARTITION BY T.WORKORDER_TYPE, T.WORKORDER_BASE_ID, T.WORKORDER_LOT_ID) AS NC
  FROM
    T1 T
  WHERE
    T.PART_ID IN ('MI8','KG6',   'REPLACEMENT-NJ9'))
WHERE
  C>=2
  AND NC=0
ORDER BY
  WORKORDER_BASE_ID,
  WORKORDER_LOT_ID,
  WORKORDER_SUB_ID;

W WORKORDER_BASE_ WOR WOR PART_ID
- --------------- --- --- -------
W 01000W          3   3   KG6
W 01000W          3   3   MI8
W 05000W          19  1   KG6
W 05000W          19  1   MI8

The above shows those “work orders” where the NJ9 “part” had not yet been replaced with the REPLACEMENT-NJ9 “part”.  If desired, we are also able to easily modify the OP’s second request to see the “work orders” where the “part” was already replaced just by changing NC=0 to NC=1:

SELECT
  WORKORDER_TYPE,
  WORKORDER_BASE_ID,
  WORKORDER_LOT_ID,
  WORKORDER_SUB_ID,
  PART_ID
FROM
  (SELECT
    T.WORKORDER_TYPE,
    T.WORKORDER_BASE_ID,
    T.WORKORDER_LOT_ID,
    T.WORKORDER_SUB_ID,
    T.PART_ID,
    COUNT(DISTINCT (CASE WHEN T.PART_ID IN ('MI8','NJ9','KG6') THEN T.PART_ID ELSE NULL END)) OVER (PARTITION BY T.WORKORDER_TYPE, T.WORKORDER_BASE_ID, T.WORKORDER_LOT_ID) AS C,
    COUNT(DISTINCT (CASE WHEN T.PART_ID IN ('REPLACEMENT-NJ9') THEN T.PART_ID ELSE NULL END)) OVER (PARTITION BY T.WORKORDER_TYPE, T.WORKORDER_BASE_ID, T.WORKORDER_LOT_ID) AS NC
  FROM
    T1 T
  WHERE
    T.PART_ID IN ('MI8','KG6','REPLACEMENT-NJ9'))
WHERE
  C>=2
  AND NC=1
ORDER BY
  WORKORDER_BASE_ID,
  WORKORDER_LOT_ID,
  WORKORDER_SUB_ID;

W WORKORDER_BASE_ WOR WOR PART_ID
- --------------- --- --- ---------------
W 09000W          34  4   KG6
W 09000W          34  4   REPLACEMENT-NJ9
W 09000W          34  4   MI8

What other related information are we able to return?  What about all “parts” (the full bill of material) for each of the work orders that match the OP’s second request?  That modification is simple: just slide the majority of the above SQL statement into a subquery (found in the first WHERE clause):

SELECT
  *
FROM
  T1
WHERE
  (WORKORDER_TYPE,
   WORKORDER_BASE_ID,
   WORKORDER_LOT_ID) IN
    (SELECT
      WORKORDER_TYPE,
      WORKORDER_BASE_ID,
      WORKORDER_LOT_ID
    FROM
      (SELECT
        T.WORKORDER_TYPE,
        T.WORKORDER_BASE_ID,
        T.WORKORDER_LOT_ID,
        T.WORKORDER_SUB_ID,
        T.PART_ID,
        COUNT(DISTINCT (CASE WHEN T.PART_ID IN ('MI8','NJ9','KG6') THEN T.PART_ID ELSE NULL END)) OVER (PARTITION BY T.WORKORDER_TYPE, T.WORKORDER_BASE_ID, T.WORKORDER_LOT_ID) AS C,
        COUNT(DISTINCT (CASE WHEN T.PART_ID IN ('REPLACEMENT-NJ9') THEN T.PART_ID ELSE NULL END)) OVER (PARTITION BY T.WORKORDER_TYPE, T.WORKORDER_BASE_ID, T.WORKORDER_LOT_ID) AS NC
      FROM
        T1 T
      WHERE
        T.PART_ID IN ('MI8','KG6','REPLACEMENT-NJ9'))
    WHERE
      C>=2
      AND NC=0)
ORDER BY
  WORKORDER_BASE_ID,
  WORKORDER_LOT_ID,
  WORKORDER_SUB_ID,
  PIECE_NO;

W WORKORDER_BASE_ WOR WOR   PIECE_NO PART_ID                QTY
- --------------- --- --- ---------- --------------- ----------
W 01000W          3   1            1 ZK5                     48
W 01000W          3   1            2 AL6                     60
W 01000W          3   1            3 BM7                     42
W 01000W          3   1            4 CN8                     26
W 01000W          3   1            5 DO9                     49
W 01000W          3   2            6 EA0                     66
W 01000W          3   2            7 FB1                     39
W 01000W          3   2            8 GC2                     57
W 01000W          3   2            9 HD3                     42
W 01000W          3   2           10 IE4                     52
W 01000W          3   3           11 JF5                     16
W 01000W          3   3           12 KG6                     61
W 01000W          3   3           13 LH7                     39
W 01000W          3   3           14 MI8                     75
W 01000W          3   3           15 NJ9                     94
W 01000W          3   4           16 OK0                     67
W 01000W          3   4           17 PL1                     12
W 01000W          3   4           18 QM2                     41
W 01000W          3   4           19 RN3                     97
W 01000W          3   4           20 SO4                     15
W 01000W          3   5           21 TA5                     54
W 01000W          3   5           22 UB6                     59
W 01000W          3   5           23 VC7                     23
W 01000W          3   5           24 WD8                      7
W 01000W          3   5           25 YJ4                     22
W 05000W          19  1            1 JF5                     31
W 05000W          19  1            2 KG6                     42
W 05000W          19  1            3 LH7                     37
W 05000W          19  1            4 MI8                     85
W 05000W          19  1            5 NJ9                    100
W 05000W          19  2            6 OK0                     43
W 05000W          19  2            7 PL1                     58
W 05000W          19  2            8 QM2                     76
W 05000W          19  2            9 RN3                     66
W 05000W          19  2           10 SO4                     75
W 05000W          19  3           11 TA5                     50
W 05000W          19  3           12 UB6                     55
W 05000W          19  3           13 VC7                     18
W 05000W          19  3           14 WD8                     33
W 05000W          19  3           15 XE9                      8
W 05000W          19  4           16 YF0                     86
W 05000W          19  4           17 ZG1                      7
W 05000W          19  4           18 AH2                     25
W 05000W          19  4           19 BI3                     38
W 05000W          19  4           20 CJ4                     34
W 05000W          19  5           21 DK5                     88
W 05000W          19  5           22 EL6                     91
W 05000W          19  5           23 FM7                     52
W 05000W          19  5           24 GN8                     71
W 05000W          19  5           25 IE4                     55

Just as before, we are able to see all “parts” for the “work orders” that already have the replacement part by changing NC=0 to NC=1:

SELECT
  *
FROM
  T1
WHERE
  (WORKORDER_TYPE,
   WORKORDER_BASE_ID,
   WORKORDER_LOT_ID) IN
    (SELECT
      WORKORDER_TYPE,
      WORKORDER_BASE_ID,
      WORKORDER_LOT_ID
    FROM
      (SELECT
        T.WORKORDER_TYPE,
        T.WORKORDER_BASE_ID,
        T.WORKORDER_LOT_ID,
        T.WORKORDER_SUB_ID,
        T.PART_ID,
        COUNT(DISTINCT (CASE WHEN T.PART_ID IN ('MI8','NJ9','KG6') THEN T.PART_ID ELSE NULL END)) OVER (PARTITION BY T.WORKORDER_TYPE, T.WORKORDER_BASE_ID, T.WORKORDER_LOT_ID) AS C,
        COUNT(DISTINCT (CASE WHEN T.PART_ID IN ('REPLACEMENT-NJ9') THEN T.PART_ID ELSE NULL END)) OVER (PARTITION BY T.WORKORDER_TYPE, T.WORKORDER_BASE_ID, T.WORKORDER_LOT_ID) AS NC
      FROM
        T1 T
      WHERE
        T.PART_ID IN ('MI8','KG6','REPLACEMENT-NJ9'))
    WHERE
      C>=2
      AND NC=1)
ORDER BY
  WORKORDER_BASE_ID,
  WORKORDER_LOT_ID,
  WORKORDER_SUB_ID,
  PIECE_NO;

W WORKORDER_BASE_ WOR WOR   PIECE_NO PART_ID                QTY
- --------------- --- --- ---------- --------------- ----------
W 09000W          34  1            1 UF0                     80
W 09000W          34  1            2 VG1                     75
W 09000W          34  1            3 WH2                     75
W 09000W          34  1            4 XI3                     77
W 09000W          34  1            5 YJ4                     46
W 09000W          34  2            6 ZK5                     57
W 09000W          34  2            7 AL6                     64
W 09000W          34  2            8 BM7                     14
W 09000W          34  2            9 CN8                     30
W 09000W          34  2           10 DO9                     62
W 09000W          34  3           11 EA0                     45
W 09000W          34  3           12 FB1                     68
W 09000W          34  3           13 GC2                     25
W 09000W          34  3           14 HD3                     73
W 09000W          34  3           15 IE4                     32
W 09000W          34  4           10 REPLACEMENT-NJ9         10
W 09000W          34  4           16 JF5                     49
W 09000W          34  4           17 KG6                     23
W 09000W          34  4           18 LH7                     60
W 09000W          34  4           19 MI8                     75
W 09000W          34  5           21 OK0                     91
W 09000W          34  5           22 PL1                     66
W 09000W          34  5           23 QM2                     45
W 09000W          34  5           24 RN3                     49
W 09000W          34  5           25 TE9                      6

Using the sample table and data provided, are there any other solutions to the OP’s requests?
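
As one possible answer to that question: a GROUP BY with conditional counts in a HAVING clause should be able to identify the matching “work orders” without analytic functions.  The following is an untested sketch, treating MI8 and KG6 as the required “parts” (consistent with the C>=2 test above):

SELECT
  WORKORDER_TYPE,
  WORKORDER_BASE_ID,
  WORKORDER_LOT_ID
FROM
  T1
GROUP BY
  WORKORDER_TYPE,
  WORKORDER_BASE_ID,
  WORKORDER_LOT_ID
HAVING
  COUNT(DISTINCT CASE WHEN PART_ID IN ('MI8','KG6') THEN PART_ID END)=2          -- both required parts present
  AND COUNT(DISTINCT CASE WHEN PART_ID='REPLACEMENT-NJ9' THEN PART_ID END)=0;    -- excluded part absent

The key values returned by that sketch could replace the inline view in the IN subquery shown above to retrieve the full bill of material.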





Grouping Data Sets by Week Number of the Month

1 05 2013

May 1, 2013

I saw a decent SQL brain teaser this morning in the comp.databases.oracle.server Usenet group.  The OP in the message thread is attempting to summarize data in one of his tables, with the summarizations broken down by month and then by week within that month.  Increasing the challenge, the OP required that each week’s start and end dates fall within the month being summarized:

“Today is 4/30/2013.  The we_end [last day of the week containing April 30, 2013] value is 5/4/2013.  What I am really looking for is the weeks within the calendar month.  So, the first week for April 2013 should show 4/1 – 4/6.  And the last week should show 4/28 – 4/30”

How would you solve that particular problem?  Is there an Oracle built-in function that provides the grouping requested by the OP?

-

-

-

-

There are a couple of different methods to solve the problem posed by the OP.  One method uses a virtual lookup table to determine the start date and end date of each week in the month, adjusting the first and last weeks so that their start and end dates fall within that month.  Interestingly, with the restrictions specified by the OP, there are 62 weeks this year.

As with all SQL statements, there must be a sensible starting point.  First, let’s determine the start and end dates of each month in 2013.  The month start date is easy to determine, while outputting the month end date is a little more challenging – that date is one day less than the start of the next month:

SELECT 
  ADD_MONTHS(TO_DATE('01-01-2013','MM-DD-YYYY'),(LEVEL-1)) MONTH_START, 
  ADD_MONTHS(TO_DATE('01-01-2013','MM-DD-YYYY'),LEVEL)-1 MONTH_END 
FROM 
  DUAL 
CONNECT BY 
  LEVEL<=12; 

MONTH_STA MONTH_END 
--------- --------- 
01-JAN-13 31-JAN-13 
01-FEB-13 28-FEB-13 
01-MAR-13 31-MAR-13 
01-APR-13 30-APR-13 
01-MAY-13 31-MAY-13 
01-JUN-13 30-JUN-13 
01-JUL-13 31-JUL-13 
01-AUG-13 31-AUG-13 
01-SEP-13 30-SEP-13 
01-OCT-13 31-OCT-13 
01-NOV-13 30-NOV-13 
01-DEC-13 31-DEC-13

There are a couple of approaches that could be taken for the next step, finding the start of the first, second, third, fourth, fifth, and potentially sixth week in each of those months.  I am interested in determining the start of the second week – we are able to accomplish that task by using the NEXT_DAY function to find the next Sunday after the first of the month.  The end of the first week will also be calculated, even though it is not used in later calculations.  Note that this solution is subject to problems depending on the session’s NLS settings – the day name passed to the NEXT_DAY function must match the language set by NLS_DATE_LANGUAGE (an NLS-independent alternative appears later in this article):

SELECT 
  ADD_MONTHS(TO_DATE('01-01-2013','MM-DD-YYYY'),(LEVEL-1)) MONTH_START, 
  ADD_MONTHS(TO_DATE('01-01-2013','MM-DD-YYYY'),LEVEL)-1 MONTH_END, 
  NEXT_DAY(ADD_MONTHS(TO_DATE('01-01-2013','MM-DD-YYYY'),(LEVEL-1))-1,'SATURDAY') END_FIRST_WEEK,
  NEXT_DAY(ADD_MONTHS(TO_DATE('01-01-2013','MM-DD-YYYY'),(LEVEL-1)),'SUNDAY') START_SECOND_WEEK
 FROM 
  DUAL 
CONNECT BY 
  LEVEL<=12; 

MONTH_STA MONTH_END END_FIRST START_SEC 
--------- --------- --------- --------- 
01-JAN-13 31-JAN-13 05-JAN-13 06-JAN-13 
01-FEB-13 28-FEB-13 02-FEB-13 03-FEB-13 
01-MAR-13 31-MAR-13 02-MAR-13 03-MAR-13 
01-APR-13 30-APR-13 06-APR-13 07-APR-13 
01-MAY-13 31-MAY-13 04-MAY-13 05-MAY-13 
01-JUN-13 30-JUN-13 01-JUN-13 02-JUN-13 
01-JUL-13 31-JUL-13 06-JUL-13 07-JUL-13 
01-AUG-13 31-AUG-13 03-AUG-13 04-AUG-13 
01-SEP-13 30-SEP-13 07-SEP-13 08-SEP-13 
01-OCT-13 31-OCT-13 05-OCT-13 06-OCT-13 
01-NOV-13 30-NOV-13 02-NOV-13 03-NOV-13 
01-DEC-13 31-DEC-13 07-DEC-13 08-DEC-13

The above may appear to be a collection of related but useless dates; however, that is not the case.  Next, we will need a method to count through as many as six weeks per month.  This should work:

SELECT 
  LEVEL WEEK_NUMBER 
FROM 
  DUAL 
CONNECT BY 
  LEVEL<=6; 

WEEK_NUMBER 
----------- 
          1 
          2 
          3 
          4 
          5 
          6

Next, a Cartesian join will be created between the two row sources to permit stepping through each month, and each week within that month (we will refine the output later… note that this unrefined output provides a clue for a second method of solving the problem posed by the OP):

SELECT 
  MONTHS.MONTH_START, 
  WN.WEEK_NUMBER, 
  MONTHS.START_SECOND_WEEK + (WN.WEEK_NUMBER-2)*7 START_WEEK, 
  MONTHS.START_SECOND_WEEK + (WN.WEEK_NUMBER-1)*7 - 1 END_WEEK 
FROM 
  (SELECT 
    ADD_MONTHS(TO_DATE('01-01-2013','MM-DD-YYYY'),(LEVEL-1)) MONTH_START, 
    ADD_MONTHS(TO_DATE('01-01-2013','MM-DD-YYYY'),LEVEL)-1 MONTH_END, 
    NEXT_DAY(ADD_MONTHS(TO_DATE('01-01-2013','MM-DD-YYYY'),(LEVEL-1))-1,'SATURDAY') END_FIRST_WEEK,
    NEXT_DAY(ADD_MONTHS(TO_DATE('01-01-2013','MM-DD-YYYY'),(LEVEL-1)),'SUNDAY') START_SECOND_WEEK
   FROM 
    DUAL 
  CONNECT BY 
    LEVEL<=12) MONTHS, 
  (SELECT 
    LEVEL WEEK_NUMBER 
  FROM 
    DUAL 
  CONNECT BY 
    LEVEL<=6) WN 
WHERE 
  (MONTHS.START_SECOND_WEEK + (WN.WEEK_NUMBER-2)*7)<=MONTHS.MONTH_END; 

MONTH_STA WEEK_NUMBER START_WEE END_WEEK 
--------- ----------- --------- --------- 
01-JAN-13           1 30-DEC-12 05-JAN-13 
01-JAN-13           2 06-JAN-13 12-JAN-13 
01-JAN-13           3 13-JAN-13 19-JAN-13 
01-JAN-13           4 20-JAN-13 26-JAN-13 
01-JAN-13           5 27-JAN-13 02-FEB-13 
01-FEB-13           1 27-JAN-13 02-FEB-13 
01-FEB-13           2 03-FEB-13 09-FEB-13 
01-FEB-13           3 10-FEB-13 16-FEB-13 
01-FEB-13           4 17-FEB-13 23-FEB-13 
01-FEB-13           5 24-FEB-13 02-MAR-13 
01-MAR-13           1 24-FEB-13 02-MAR-13 
01-MAR-13           2 03-MAR-13 09-MAR-13 
01-MAR-13           3 10-MAR-13 16-MAR-13 
01-MAR-13           4 17-MAR-13 23-MAR-13 
01-MAR-13           5 24-MAR-13 30-MAR-13 
01-MAR-13           6 31-MAR-13 06-APR-13 
01-APR-13           1 31-MAR-13 06-APR-13 
01-APR-13           2 07-APR-13 13-APR-13 
01-APR-13           3 14-APR-13 20-APR-13 
01-APR-13           4 21-APR-13 27-APR-13 
01-APR-13           5 28-APR-13 04-MAY-13 
01-MAY-13           1 28-APR-13 04-MAY-13 
01-MAY-13           2 05-MAY-13 11-MAY-13 
01-MAY-13           3 12-MAY-13 18-MAY-13 
01-MAY-13           4 19-MAY-13 25-MAY-13 
01-MAY-13           5 26-MAY-13 01-JUN-13 
01-JUN-13           1 26-MAY-13 01-JUN-13 
...

The above output has a slight problem – the Sunday in the first week of the month may not be in the month of interest; additionally, the Saturday in the last week of the month may not be in the month of interest.  We may overcome those problems with the help of the DECODE function (the CASE expression may be used instead, if so desired):

SELECT 
  MONTHS.MONTH_START, 
  WN.WEEK_NUMBER, 
  DECODE(WN.WEEK_NUMBER, 1, MONTHS.MONTH_START, MONTHS.START_SECOND_WEEK + (WN.WEEK_NUMBER-2)*7) START_WEEK,
  DECODE(SIGN(MONTHS.MONTH_END - (MONTHS.START_SECOND_WEEK + (WN.WEEK_NUMBER-1)*7)), -1, MONTHS.MONTH_END, MONTHS.START_SECOND_WEEK + (WN.WEEK_NUMBER-1)*7 - 1) END_WEEK
FROM 
  (SELECT 
    ADD_MONTHS(TO_DATE('01-01-2013','MM-DD-YYYY'),(LEVEL-1)) MONTH_START, 
    ADD_MONTHS(TO_DATE('01-01-2013','MM-DD-YYYY'),LEVEL)-1 MONTH_END, 
    NEXT_DAY(ADD_MONTHS(TO_DATE('01-01-2013','MM-DD-YYYY'),(LEVEL-1))-1,'SATURDAY') END_FIRST_WEEK,
     NEXT_DAY(ADD_MONTHS(TO_DATE('01-01-2013','MM-DD-YYYY'),(LEVEL-1)),'SUNDAY') START_SECOND_WEEK
   FROM 
    DUAL 
  CONNECT BY 
    LEVEL<=12) MONTHS, 
  (SELECT 
    LEVEL WEEK_NUMBER 
  FROM 
    DUAL 
  CONNECT BY 
    LEVEL<=6) WN 
WHERE 
  (MONTHS.START_SECOND_WEEK + (WN.WEEK_NUMBER-2)*7)<=MONTHS.MONTH_END; 

MONTH_STA WEEK_NUMBER START_WEE END_WEEK 
--------- ----------- --------- --------- 
01-JAN-13           1 01-JAN-13 05-JAN-13 
01-JAN-13           2 06-JAN-13 12-JAN-13 
01-JAN-13           3 13-JAN-13 19-JAN-13 
01-JAN-13           4 20-JAN-13 26-JAN-13 
01-JAN-13           5 27-JAN-13 31-JAN-13 
01-FEB-13           1 01-FEB-13 02-FEB-13 
01-FEB-13           2 03-FEB-13 09-FEB-13 
01-FEB-13           3 10-FEB-13 16-FEB-13 
01-FEB-13           4 17-FEB-13 23-FEB-13 
01-FEB-13           5 24-FEB-13 28-FEB-13 
01-MAR-13           1 01-MAR-13 02-MAR-13 
01-MAR-13           2 03-MAR-13 09-MAR-13 
01-MAR-13           3 10-MAR-13 16-MAR-13 
01-MAR-13           4 17-MAR-13 23-MAR-13 
01-MAR-13           5 24-MAR-13 30-MAR-13 
01-MAR-13           6 31-MAR-13 31-MAR-13 
01-APR-13           1 01-APR-13 06-APR-13 
01-APR-13           2 07-APR-13 13-APR-13 
01-APR-13           3 14-APR-13 20-APR-13 
01-APR-13           4 21-APR-13 27-APR-13 
01-APR-13           5 28-APR-13 30-APR-13 
01-MAY-13           1 01-MAY-13 04-MAY-13 
01-MAY-13           2 05-MAY-13 11-MAY-13 
01-MAY-13           3 12-MAY-13 18-MAY-13 
01-MAY-13           4 19-MAY-13 25-MAY-13 
01-MAY-13           5 26-MAY-13 31-MAY-13 
... 
01-DEC-13           5 29-DEC-13 31-DEC-13 

62 rows selected.

Yes, this solution added 10 extra weeks to the year.  :-)
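
One more note about the NLS warning mentioned earlier: the day name passed to NEXT_DAY must match the language set by the session’s NLS_DATE_LANGUAGE, so ‘SUNDAY’ will not be recognized in a non-English session.  If that is a concern, the TRUNC function with the ‘IW’ format model, which always returns the Monday that starts a date’s ISO week regardless of NLS settings, may be substituted – the first Sunday strictly after a date X is TRUNC(X+8,’IW’)-1.  The following quick sanity-check sketch (assuming an English-language session for the NEXT_DAY column) should show matching dates in both calculated columns:

SELECT
  ADD_MONTHS(TO_DATE('01-01-2013','MM-DD-YYYY'),(LEVEL-1)) MONTH_START,
  -- day name must match NLS_DATE_LANGUAGE
  NEXT_DAY(ADD_MONTHS(TO_DATE('01-01-2013','MM-DD-YYYY'),(LEVEL-1)),'SUNDAY') NLS_DEPENDENT,
  -- TRUNC(X+8,'IW')-1 = first Sunday strictly after X, in any language
  TRUNC(ADD_MONTHS(TO_DATE('01-01-2013','MM-DD-YYYY'),(LEVEL-1))+8,'IW')-1 NLS_INDEPENDENT
FROM
  DUAL
CONNECT BY
  LEVEL<=12;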

It is easy to extend this solution to cover more than just the year 2013.  For example, the following modification sets the starting period to January 2000, and runs for 20 years:

SELECT 
  MONTHS.MONTH_START, 
  WN.WEEK_NUMBER, 
  DECODE(WN.WEEK_NUMBER, 1, MONTHS.MONTH_START, MONTHS.START_SECOND_WEEK + (WN.WEEK_NUMBER-2)*7) START_WEEK,
  DECODE(SIGN(MONTHS.MONTH_END - (MONTHS.START_SECOND_WEEK + (WN.WEEK_NUMBER-1)*7)), -1, MONTHS.MONTH_END, MONTHS.START_SECOND_WEEK + (WN.WEEK_NUMBER-1)*7 - 1) END_WEEK
FROM 
  (SELECT 
    ADD_MONTHS(TO_DATE('01-01-2000','MM-DD-YYYY'),(LEVEL-1)) MONTH_START, 
    ADD_MONTHS(TO_DATE('01-01-2000','MM-DD-YYYY'),LEVEL)-1 MONTH_END, 
    NEXT_DAY(ADD_MONTHS(TO_DATE('01-01-2000','MM-DD-YYYY'),(LEVEL-1))-1,'SATURDAY') END_FIRST_WEEK,
     NEXT_DAY(ADD_MONTHS(TO_DATE('01-01-2000','MM-DD-YYYY'),(LEVEL-1)),'SUNDAY') START_SECOND_WEEK
   FROM 
    DUAL 
  CONNECT BY 
    LEVEL<=(12*20)) MONTHS, 
  (SELECT 
    LEVEL WEEK_NUMBER 
  FROM 
    DUAL 
  CONNECT BY 
    LEVEL<=6) WN 
WHERE 
  (MONTHS.START_SECOND_WEEK + (WN.WEEK_NUMBER-2)*7)<=MONTHS.MONTH_END;

With the virtual lookup table complete, the next step would be to slide the above SQL statement into an inline view, and then join it to the data set that will be analyzed.  Assuming that the data set has a column named DATE_SENT, the WHERE clause may appear similar to the following:

WHERE
  DATE_SENT BETWEEN START_WEEK AND END_WEEK
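
To make that concrete, below is a minimal, untested sketch of the complete join, assuming a hypothetical table named MESSAGES with a DATE_SENT column (that table name and column are assumptions for illustration only); the unused END_FIRST_WEEK column is omitted from the lookup view.  If DATE_SENT carries a time-of-day component, wrap it in TRUNC so that rows late in the day on END_WEEK are not lost:

SELECT
  W.MONTH_START,
  W.WEEK_NUMBER,
  COUNT(*) ITEMS_SENT
FROM
  (SELECT
    MONTHS.MONTH_START,
    WN.WEEK_NUMBER,
    DECODE(WN.WEEK_NUMBER, 1, MONTHS.MONTH_START, MONTHS.START_SECOND_WEEK + (WN.WEEK_NUMBER-2)*7) START_WEEK,
    DECODE(SIGN(MONTHS.MONTH_END - (MONTHS.START_SECOND_WEEK + (WN.WEEK_NUMBER-1)*7)), -1, MONTHS.MONTH_END, MONTHS.START_SECOND_WEEK + (WN.WEEK_NUMBER-1)*7 - 1) END_WEEK
  FROM
    (SELECT
      ADD_MONTHS(TO_DATE('01-01-2013','MM-DD-YYYY'),(LEVEL-1)) MONTH_START,
      ADD_MONTHS(TO_DATE('01-01-2013','MM-DD-YYYY'),LEVEL)-1 MONTH_END,
      NEXT_DAY(ADD_MONTHS(TO_DATE('01-01-2013','MM-DD-YYYY'),(LEVEL-1)),'SUNDAY') START_SECOND_WEEK
    FROM
      DUAL
    CONNECT BY
      LEVEL<=12) MONTHS,
    (SELECT
      LEVEL WEEK_NUMBER
    FROM
      DUAL
    CONNECT BY
      LEVEL<=6) WN
  WHERE
    (MONTHS.START_SECOND_WEEK + (WN.WEEK_NUMBER-2)*7)<=MONTHS.MONTH_END) W,
  MESSAGES M       -- hypothetical table to be analyzed
WHERE
  M.DATE_SENT BETWEEN W.START_WEEK AND W.END_WEEK
GROUP BY
  W.MONTH_START,
  W.WEEK_NUMBER
ORDER BY
  W.MONTH_START,
  W.WEEK_NUMBER;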

The virtual lookup table approach is one solution; are there other solutions?  What about a solution that uses a little simple mathematics?  Such a solution could be considerably more efficient, since it avoids the join to a generated lookup table.  First, we will create a simple table for experimentation with 1000 rows containing somewhat random dates on or after January 1, 2013:

CREATE TABLE T1 AS
SELECT
  TRUNC(TO_DATE('01-01-2013','MM-DD-YYYY')+DBMS_RANDOM.VALUE*366) DATE_SENT
FROM
  DUAL
CONNECT BY
  LEVEL<=1000;

Next, we will write a SQL statement that produces output somewhat similar to the unrefined output found in the first solution (note that the SQL statement uses the DISTINCT keyword to eliminate duplicate rows – there could be several rows in the row source with the same random date value):

SELECT DISTINCT
  DATE_SENT,
  TRUNC(DATE_SENT,'MM') MONTH_START,
  ADD_MONTHS(TRUNC(DATE_SENT,'MM'),1)-1 MONTH_END,
  NEXT_DAY(TRUNC(DATE_SENT,'MM')-7,'SUNDAY') MONTH_WEEK_START
FROM
  T1
ORDER BY
  DATE_SENT;

DATE_SENT MONTH_STA MONTH_END MONTH_WEE
--------- --------- --------- ---------
01-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
02-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
03-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
04-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
05-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
06-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
07-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
08-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
09-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
10-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
11-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
12-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
13-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
14-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
15-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
16-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
17-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
18-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
19-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
20-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
21-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
22-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
23-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
24-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
25-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
26-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
27-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
28-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
29-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
30-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
31-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12
01-FEB-13 01-FEB-13 28-FEB-13 27-JAN-13
02-FEB-13 01-FEB-13 28-FEB-13 27-JAN-13
03-FEB-13 01-FEB-13 28-FEB-13 27-JAN-13
...
29-DEC-13 01-DEC-13 31-DEC-13 01-DEC-13
30-DEC-13 01-DEC-13 31-DEC-13 01-DEC-13
31-DEC-13 01-DEC-13 31-DEC-13 01-DEC-13
01-JAN-14 01-JAN-14 31-JAN-14 29-DEC-13

In the above, the MONTH_WEEK_START column shows the Sunday that starts the week containing the first day of the DATE_SENT value’s month.  At this point, it makes no difference if the MONTH_WEEK_START column value is in a different month than the DATE_SENT column value.

Next, we will determine the week within the month in which the DATE_SENT column value appears: subtract MONTH_WEEK_START from DATE_SENT, divide by seven, truncate the result to a whole number, and then add one.  For example, 06-JAN-13 yields TRUNC((06-JAN-13 – 30-DEC-12)/7)+1 = TRUNC(7/7)+1 = 2, placing that date in the second week:

SELECT DISTINCT
  DATE_SENT,
  TRUNC(DATE_SENT,'MM') MONTH_START,
  ADD_MONTHS(TRUNC(DATE_SENT,'MM'),1)-1 MONTH_END,
  NEXT_DAY(TRUNC(DATE_SENT,'MM')-7,'SUNDAY') MONTH_WEEK_START,
  TRUNC((DATE_SENT-(NEXT_DAY(TRUNC(DATE_SENT,'MM')-7,'SUNDAY')))/7)+1 WEEK_NUMBER
FROM
  T1
ORDER BY
  DATE_SENT;

DATE_SENT MONTH_STA MONTH_END MONTH_WEE WEEK_NUMBER
--------- --------- --------- --------- -----------
01-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           1
02-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           1
03-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           1
04-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           1
05-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           1
06-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           2
07-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           2
08-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           2
09-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           2
10-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           2
11-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           2
12-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           2
13-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           3
14-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           3
15-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           3
16-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           3
17-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           3
18-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           3
19-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           3
20-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           4
21-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           4
22-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           4
23-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           4
24-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           4
25-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           4
26-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           4
27-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           5
28-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           5
29-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           5
30-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           5
31-JAN-13 01-JAN-13 31-JAN-13 30-DEC-12           5
01-FEB-13 01-FEB-13 28-FEB-13 27-JAN-13           1
02-FEB-13 01-FEB-13 28-FEB-13 27-JAN-13           1
03-FEB-13 01-FEB-13 28-FEB-13 27-JAN-13           2
04-FEB-13 01-FEB-13 28-FEB-13 27-JAN-13           2
...
24-DEC-13 01-DEC-13 31-DEC-13 01-DEC-13           4
26-DEC-13 01-DEC-13 31-DEC-13 01-DEC-13           4
27-DEC-13 01-DEC-13 31-DEC-13 01-DEC-13           4
28-DEC-13 01-DEC-13 31-DEC-13 01-DEC-13           4
29-DEC-13 01-DEC-13 31-DEC-13 01-DEC-13           5
30-DEC-13 01-DEC-13 31-DEC-13 01-DEC-13           5
31-DEC-13 01-DEC-13 31-DEC-13 01-DEC-13           5
01-JAN-14 01-JAN-14 31-JAN-14 29-DEC-13           1

Heading toward the request made by the OP in the Usenet thread, the following SQL statement will determine a simple COUNT of the number of rows that have a DATE_SENT date in each week of each month in the T1 table’s data set (note that if there are no rows in a given week, that week will be skipped in the output):

SELECT
  TRUNC(DATE_SENT,'MM') MONTH_START,
  TRUNC((DATE_SENT-(NEXT_DAY(TRUNC(DATE_SENT,'MM')-7,'SUNDAY')))/7)+1 WEEK_NUMBER,
  COUNT(*) DAYS_IN_SET
FROM
  T1
GROUP BY
  TRUNC(DATE_SENT,'MM'),
  TRUNC((DATE_SENT-(NEXT_DAY(TRUNC(DATE_SENT,'MM')-7,'SUNDAY')))/7)+1
ORDER BY
  1,
  2;

MONTH_STA WEEK_NUMBER DAYS_IN_SET
--------- ----------- -----------
01-JAN-13           1          13
01-JAN-13           2          15
01-JAN-13           3          23
01-JAN-13           4          21
01-JAN-13           5          13
01-FEB-13           1           5
01-FEB-13           2          21
01-FEB-13           3          15
01-FEB-13           4          14
01-FEB-13           5          11
01-MAR-13           1           4
01-MAR-13           2          18
01-MAR-13           3          19
01-MAR-13           4          20
01-MAR-13           5          20
01-MAR-13           6           4
01-APR-13           1          14
01-APR-13           2          26
01-APR-13           3          15
01-APR-13           4          23
01-APR-13           5           9
01-MAY-13           1          12
01-MAY-13           2          15
01-MAY-13           3          22
01-MAY-13           4          23
01-MAY-13           5          17
01-JUN-13           1           1
01-JUN-13           2          14
01-JUN-13           3          21
01-JUN-13           4          18
01-JUN-13           5          21
01-JUN-13           6           4
01-JUL-13           1          15
01-JUL-13           2          21
01-JUL-13           3          13
01-JUL-13           4          24
01-JUL-13           5          14
01-AUG-13           1           6
01-AUG-13           2          10
01-AUG-13           3          21
01-AUG-13           4          22
01-AUG-13           5          28
01-SEP-13           1          16
01-SEP-13           2          16
01-SEP-13           3          22
01-SEP-13           4          23
01-SEP-13           5           8
01-OCT-13           1          15
01-OCT-13           2          15
01-OCT-13           3          17
01-OCT-13           4          18
01-OCT-13           5          21
01-NOV-13           1           6
01-NOV-13           2          20
01-NOV-13           3          20
01-NOV-13           4          17
01-NOV-13           5          18
01-DEC-13           1          22
01-DEC-13           2          15
01-DEC-13           3          19
01-DEC-13           4          20
01-DEC-13           5           6
01-JAN-14           1           1

It might be just my opinion, but the mathematical solution appears to be easier to understand than the method that uses the virtual lookup table.  We are still able to derive the start date and end date of each week, if needed, to produce the output requested by the OP.
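
As a sketch of that derivation (untested, reusing the expressions already shown), GREATEST and LEAST clamp each week’s Sunday start and Saturday end to the month boundaries; because the clamped dates are constant within each group, MIN and MAX simply return those constant values:

SELECT
  TRUNC(DATE_SENT,'MM') MONTH_START,
  TRUNC((DATE_SENT-(NEXT_DAY(TRUNC(DATE_SENT,'MM')-7,'SUNDAY')))/7)+1 WEEK_NUMBER,
  -- Sunday of the week containing DATE_SENT, clamped to the first day of the month
  MIN(GREATEST(TRUNC(DATE_SENT,'MM'),
    NEXT_DAY(TRUNC(DATE_SENT,'MM')-7,'SUNDAY')
      + TRUNC((DATE_SENT-(NEXT_DAY(TRUNC(DATE_SENT,'MM')-7,'SUNDAY')))/7)*7)) START_WEEK,
  -- the following Saturday, clamped to the last day of the month
  MAX(LEAST(ADD_MONTHS(TRUNC(DATE_SENT,'MM'),1)-1,
    NEXT_DAY(TRUNC(DATE_SENT,'MM')-7,'SUNDAY')
      + TRUNC((DATE_SENT-(NEXT_DAY(TRUNC(DATE_SENT,'MM')-7,'SUNDAY')))/7)*7 + 6)) END_WEEK,
  COUNT(*) DAYS_IN_SET
FROM
  T1
GROUP BY
  TRUNC(DATE_SENT,'MM'),
  TRUNC((DATE_SENT-(NEXT_DAY(TRUNC(DATE_SENT,'MM')-7,'SUNDAY')))/7)+1
ORDER BY
  1,
  2;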

Might there be other solutions to the OP’s SQL brain teaser?
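
Regarding the earlier question about an Oracle built-in function: the TO_CHAR(date,’W’) format model does return a week-of-month number, but it numbers weeks as fixed seven-day blocks counted from the first of the month (days 1 through 7 are week 1) rather than by Sunday boundaries, so it will not reproduce the OP’s groupings exactly.  The following sketch (untested against the OP’s data) should show where the two definitions diverge in early January 2013:

SELECT DISTINCT
  DATE_SENT,
  TO_CHAR(DATE_SENT,'W') BUILT_IN_WEEK,                                          -- fixed blocks: days 1-7 = week 1
  TRUNC((DATE_SENT-(NEXT_DAY(TRUNC(DATE_SENT,'MM')-7,'SUNDAY')))/7)+1 OP_WEEK    -- Sunday-based weeks
FROM
  T1
WHERE
  DATE_SENT<TO_DATE('01-15-2013','MM-DD-YYYY')
ORDER BY
  DATE_SENT;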







