Battling the Symptoms or Addressing the Root Cause

3 04 2010

April 3, 2010

A non-Oracle specific question arrived in an email from an ERP mailing list – I think that the user who wrote the email was probably running SQL Server, but that probably does not imply much other than potential differences in read-consistency and trigger code when compared to a user running the same ERP package with Oracle Database.

Paraphrasing the question:

I need to be able to automate the running of a utility (VMFIXOHQ) on a Windows client computer.  The utility does not offer a command line interface for specifying parameters, so the method of automation must be able to enter text into screen fields, click program buttons, and activate menu items so that the utility will automatically run on a nightly basis.  A program named Automate is able to accomplish this task, but is too expensive for this specific task.

I have in the past written task schedulers that would do exactly what the author of the email requested, but I did not offer the task scheduler to the original poster.  Why, well it is hard to describe why.  With a lead-in, I offered the following analogy:

IT Guy: “Doctor, I have a splitting headache and I seem to be having trouble remembering things.”
Doctor1: “Let me write you a prescription for a new desk with a padded writing surface.”
IT Guy: “OK, but I don’t see how that will help my headache.”
Doctor1: “You just told me that you only have splitting headaches while sitting at your desk.”
IT Guy: “That new desk works great.  The headaches still happen, but don’t last quite as long.”
Doctor1: “Now let me prescribe something to cure the imprint of the paperclip and the stapler on your forehead.”
Doctor2:Wouldn’t it be easier to find out why he keeps banging his head on his desk?

In the above analogy, Doctor1 was treating the symptoms of the problem.  Maybe he notice the IT guy’s red forehead, and thought that if the IT guy must bang his head on his desk, he really should have a softer surface for his forehead to hit.  Once the original problem was mitigated, a secondary, related problem remained – obviously, the IT guy should make certain to clear his desk before banging his head.

Doctor2, on the other hand, suggested a root cause analysis.  If the IT guy is banging his head on his desk, determine what triggers the IT guy to bang his head on his desk.  Maybe he can’t find the flyswatter.  Maybe he once hit his head and then by coincidence found a solution to a perplexing problem.  Maybe he is frustrated (he might have worn too small of a size of shoes, causing his feet to hurt).  Maybe someone is forcing him to bang his head?  Wouldn’t it be better to find out why, rather than just trying a number of things that might make the problem less severe, but never actually fix the problem?

An Oracle database example of this is simply throwing hardware at a performance problem because a root cause analysis is perceived as requiring too much time and being too expensive (computer hardware costs are decreasing while at the same time IT labor costs are increasing).  Sure, replace the server with one having 4 times as many CPUs and 4 times as much memory – after all, hardware is cheap compared to the perceived cost of a root cause analysis (at least that is what it says on the news).  Forget that such a cheap upgrade will require 4 times as many Oracle Database CPU licenses, accompanied by 4 times as much for annual Oracle support/maintenance fees.  On second thought, maybe a root cause analysis is really a much better and less costly approach, no matter if the performance problem is caused by a change to daylight savings time, someone verbally abusing the SAN, an upgrade of the Oracle Database version, or something else.

It might seem that I drifted a bit from the topic of the email that arrived from the ERP mailing list about scheduling the execution of the VMFIXOHQ utility.  That utility is not one that should be run daily, not one that should be run weekly, not one that should be run monthly, and not even one that should be run yearly (this doesn’t sound like anything in the Oracle Database universe, does it?).  That utility has a very specific purpose – it fixes the results of application and/or database trigger bugs that caused the stored on hand inventory counts for a specific part to differ from what is sitting on the shelf.  More accurately, a transaction is recorded in the database whenever parts are added to inventory, removed from inventory, or moved from one warehouse location to another, causing the on hand inventory counts for the parts are adjusted accordingly.  This VMFIXOHQ utility runs through these transactions from day 1 and effectively determines how many of the part should be sitting on the shelf based on the supporting inventory transactions.  Scheduling the running of the VMFIXOHQ utility does not address the real reason for the inventory counts being inaccurate; rather it is a band-aid (a padded desk, if you will) for the real problem – a code bug, missing trigger, improperly handled deadlock, or multi-session read-consistency issues.

Was I wrong not to tell the original poster how to schedule the running of this utility?  🙂