Monitoring Changes to Table Data

22 03 2012

March 22, 2012

Suppose that you receive a request stating that a particular table in one of your databases must be monitored for changes.  One such table might be the one that lists the ERP system’s suppliers and their addresses – you would not want someone to be able to set up a legitimate supplier, and later have the billing remit to name and address changed without being detected.

What approach would you take to solve the above problem?  A question similar to the above arrived recently in an ERP mailing list email – the original poster (OP) is using a SQL Server RDBMS, so that might change the proposed solution just a bit.

Here is the “How simple is too simple?” suggestion that I offered:

Periodically, create a VENDOR_SAVE table (drop it if it already exists, or delete all rows and re-insert from the original source table):

CREATE TABLE VENDOR_SAVE AS
SELECT
  *
FROM
  VENDOR;

Now, just wait a while.

The question then becomes, how do we detect:

  • A new row (record) added to the original source (VENDOR) table.
  • An old row (record) deleted from the original source (VENDOR) table.
  • A change to any column (program field) in the original source (VENDOR) table since the last time the VENDOR_SAVE table was created/refreshed.

Let’s start with the first two bullet points.  I will write the SQL statements so that the statements should work with Oracle Database 9.0.1 and above, and SQL Server with very few changes, even if I feel a bit ANSI about doing so.  I will use the COALESCE function, rather than the NVL function, and CASE syntax rather than the equivalent DECODE syntax.

To identify cases where a row has appeared in, or disappeared from the original source (VENDOR) table, we can simply perform a full outer join between the original source table and the historical mirror image of the original table (VENDOR_SAVE).  We are only interested in cases where the primary key column (ID) is found in exactly one of the two tables:

SELECT
  COALESCE(V.ID,V2.ID) AS ID,
  COALESCE(V.NAME,V2.NAME) AS NAME,
  COALESCE(V.ADDR_1,V2.ADDR_1) AS ADDR_1,
  CASE WHEN V.ID IS NULL THEN 'VENDOR ADDED' ELSE 'VENDOR DELETED' END AS PROBLEM
FROM
  VENDOR V
FULL OUTER JOIN
  VENDOR_SAVE V2
ON (V.ID=V2.ID)
WHERE
  V.ID IS NULL
  OR V2.ID IS NULL;

So, the above SQL statement satisfies the first two bullet points.  The third bullet point is a little more challenging to accomplish… unless of course we employ UNION labor.  If we have two row sources with identical columns, and UNION the row sources together, the resulting row source will be absent of any entirely duplicated rows from the two original row sources (two rows will be reduced to a single row).  If there were no changes to any of the column values (or if the row was added to or deleted from the original source table), there will be a single row for the primary key column value.  If any columns were changed, there will be two rows containing the primary key column value.

Let’s build a SQL statement that UNIONs the rows from the two tables together, and counts the number of rows for each primary key value:

SELECT
  COUNT(*) OVER (PARTITION BY VL.ID) AS CNT,
  VL.ID,
  VL.NAME,
  VL.ADDR_1
FROM
  (SELECT
    *
  FROM
    VENDOR
  UNION
  SELECT
    *
  FROM
    VENDOR_SAVE) VL;

To complete the requirement for bullet point 3 above, we need to eliminate all rows from the result set where there is a single row for the primary key value:

SELECT DISTINCT
  VL2.ID,
  VL2.NAME,
  VL2.ADDR_1,
  'VALUE CHANGE' PROBLEM
FROM
  (SELECT
    COUNT(*) OVER (PARTITION BY VL.ID) AS CNT,
    VL.ID,
    VL.NAME,
    VL.ADDR_1
  FROM
    (SELECT
      *
    FROM
      VENDOR
    UNION
    SELECT
      *
    FROM
      VENDOR_SAVE) VL) VL2
WHERE
  VL2.CNT>1;

As a final step, we should join the two resultsets into a single resultset using UNION ALL:

SELECT
  COALESCE(V.ID,V2.ID) AS ID,
  COALESCE(V.NAME,V2.NAME) AS NAME,
  COALESCE(V.ADDR_1,V2.ADDR_1) AS ADDR_1,
  CASE WHEN V.ID IS NULL THEN 'VENDOR ADDED' ELSE 'VENDOR DELETED' END AS PROBLEM
FROM
  VENDOR V
FULL OUTER JOIN
  VENDOR_SAVE V2
ON (V.ID=V2.ID)
WHERE
  V.ID IS NULL
  OR V2.ID IS NULL
UNION ALL
SELECT DISTINCT
  VL2.ID,
  VL2.NAME,
  VL2.ADDR_1,
  'VALUE CHANGE' PROBLEM
FROM
  (SELECT
    COUNT(*) OVER (PARTITION BY VL.ID) AS CNT,
    VL.ID,
    VL.NAME,
    VL.ADDR_1
  FROM
    (SELECT
      *
    FROM
      VENDOR
    UNION
    SELECT
      *
    FROM
      VENDOR_SAVE) VL) VL2
WHERE
  VL2.CNT>1
ORDER BY
  ID;

While somewhat limited in usefulness, the above approach will indicate which rows in the original source table should be examined because the column values in those rows changed (or were added or deleted).

Another, potentially more useful approach involves setting up a logging trigger and logging table.  I previously shared a VBS script that helps to automate and standardize the process of creating the logging trigger and logging table.  A very similar feature is built into my Hyper-Extended Oracle Performance Monitor program – but, much like the VBS script, this solution is useless for the OP who uses a SQL Server backend.

What are the other options?  Oracle Database’s built-in auditing capabilities.  Auditing built into the ERP system (this particular ERP system writes all audit/change records to a single table that uses a VARCHAR2 column to capture the before/after images of the values stored in columns).  Any other options?  (The mess that someone can create with a bit of idle time on their hands…)








Follow

Get every new post delivered to your Inbox.

Join 144 other followers