On the Topic of Programming 1

26 08 2012

August 26, 2012

(Forward to the Next Post in the Series)

As those of you who have read this blog’s About page probably know, my day to day job responsibilities involve a lot of activities that are not specific to Oracle Database performance tuning, or even remotely Oracle DBA type activities.  Those extra acttivites are part of what keeps the job fresh and interesting, whether I am swapping in a new roll of labels into a Zebra label printer (that was a three minute fun task this morning), troubleshooting phone system problems (from a fork lift truck “disabling” a phone, to programming the PBX with a clever “message delivery system” to waste the time of persistent telemarketers), swapping out the power supply in a computer very early in the morning, or engaged in a marathon of application programming into the evening and weekend hours.

One of the recent programming projects involves the continuing effort of decreasing the number of data islands, allowing the data previously contained in the islands to be accessed and cross-referenced with data generated by other applications.  One of the data island generators that I have been working to phase out is Superbase 3.0, a database platform that started life on a Commodore 64 in 1983.  Superbase 3.0 is a 16 bit client-side database, so it does not play well with the 64 bit Windows 7 that ships with most new desktop computers (16 bit applications will not run natively on 64 bit Windows, instead the 16 bit applications must be run inside a virtual machine such as Windows XP Mode, or run remotely using remote access software such as a VNC client).

The data contained in the old Superbase databases is critical to the company’s processes, so that data must be salvaged – meaning that the data must be transformed and imported into an Oracle database.  Unlike what a developer would likely create in a relational database, often with multiple tables used to store one “record” in the database, the long departed designer of the Superbase databases used a single row in a single database table to store one “record” in the database.  That organization reminds me a lot of the Microsoft Works package’s database from the early 1990s, with its fancy data entry forms which allowed users to escape the dull spreadsheet-like data entry screen.  Microsoft Excel from the early/mid 1990s could magically transform a dull spreadsheet data entry screen into a simple data entry form, in the process transforming the expensive Microsoft Excel into essentially a cheap database program.  It is a bit more of a challenge to locate the automatic data entry form creator in Excel 2010 than I recall it being in the early/mid 1990s version of Excel, but I suppose that helps further reduce the number of potential data islands:

So, what does the above discussion of Microsoft Excel have to do with anything related to Oracle Database?  The data contained in the Superbase databases must be transformed and inserted into an Oracle database.  It is good news that Superbase is able to export data to Microsoft Excel format.  The bad news is that the exported format is designed to work with Microsoft Excel 2.0 – a very old version of Microsoft Excel that seems to date back to 1987!  Time for a lot of manual data entry if that data must end up in an Oracle Database 11.2.0.x database… unless…

Microsoft Excel 2003 (version 12.0 if I remember correctly) is able to open Excel 2.0 files… success, even if the success is minor.  Now, how to go about tranferring the data from Excel into Oracle Database?  I suppose that I could have created a macro in Microsoft Excel to insert the data into Oracle Database, but at the time I was not interested in writing a macro that accomplished the task “the right way” using bind variables.  And just look at that data – some of the date values were imported as very small (roughly -65000) numbers, in some cases nearly 20 different spellings for the same customer name, and alpha-numeric text in columns that should be numeric.

So, how did I import the Superbase data that was now in Excel 2003 into the Oracle Database 11.2.0.x database without writing an Excel macro?  The particular computer with Excel 2003 that I was using also had a copy of Access 2003 installed.  Access 2003 is able to create a table “link” to an Excel 2003 spreadsheet’s worksheet, and handle that worksheet essentially the same as if it were a database table.  Now the data is “in” Microsoft Access 2003, but still not in an Oracle database.  Previous experience with this process pays off – before bringing the data into Microsoft Access, type each of the Oracle Database destination table’s column names into the first row of the Microsoft Excel spreadsheet, above the appropriate column’s data.  Importing the data into the Oracle database then becomes a simple four step process (assuming that no other data transformation is necessary)

  1. Link to the Excel spreadsheet’s worksheet and the destination table in the Oracle database.
  2. Create an Access view (stored Query) that selects all of the columns from the Excel worksheet that must be inserted into the Oracle database.
  3. Convert the view (stored Query) type to an Append type and select the linked Oracle Database table as the destination – Access will automatically find the correct destination column in the Oracle table, if the source column name (from the first row in the Excel worksheet) matches the destination column name.
  4. Execute the append type view.

A simple transformation of the data from 1994 database technology to 1987, 2003, and then on to 2011 in Oracle Database – and without writing a single line of code.  Remember that problem that I mentioned about alpha-numeric text in columns that should be numeric, such as “10&20” in a column named OPERATION_NUMBER (or OPERATION_SEQ_NO) – it turns out that that bit of inconsistent data cannot just be thrown out (thanks Microsoft Access 2003).  To fix that problem, I would need to add another column to the Oracle Database table, and then have Microsoft Access update that table using the Microsoft Excel spreadsheet data (fixing the “10&20”, “10 & 20”, “10  &20”, “10 AND 20”, “10,20” and “10, 20” variants into a standard format.  The SQL dialect in Microsoft Access is a bit odd at times, and I could not remember if the odd syntax applies to UPDATE statements also.  As an example of the odd syntax, the simple CREATE TABLEAS SELECT:

CREATE TABLE
  T2 AS
SELECT
  T1.SID,
  T1.SERIAL,
  T1.USERNAME,
  T1.PROGRAM,
  MIN(T1.CHECK_COUNT) AS CHECK_COUNT_START,
  MAX(T1.CHECK_COUNT) AS CHECK_COUNT_END
FROM
  T1
GROUP BY
  T1.SID,
  T1.SERIAL,
  T1.USERNAME,
  T1.PROGRAM;

Must be written like the following in Microsoft Access:

SELECT
  T1.SID,
  T1.SERIAL,
  T1.USERNAME,
  T1.PROGRAM,
  MIN(T1.CHECK_COUNT) AS CHECK_COUNT_START,
  MAX(T1.CHECK_COUNT) AS CHECK_COUNT_END
INTO
  T2
FROM
  T1
GROUP BY
  T1.SID,
  T1.SERIAL,
  T1.USERNAME,
  T1.PROGRAM;

Since I am not 100% confident in my SQL authoring skills in Microsoft Access, how do I move the data from the Excel spreadsheet into the new column of the Oracle Database table… and without writing a single line of programming code?  I simply created a temporary table (not a true temporary table, because the table data must be visible to more than one session) that contained the primary key column and a second column for the non-numeric numeric data.  Once the data was in the temporary Oracle table (using the simple four step process outlined above), I simply executed an UPDATE statement similar to this:

UPDATE
  T1
SET
  NON_NUMERIC_NUMERIC=(
    SELECT
      NON_NUMERIC_NUMERIC
    FROM
      T1_TEMP TT
    WHERE
      T1.PRIMARY_KEY=TT.PRIMARY_KEY)
WHERE
  T1.PRIMARY_KEY IN (
    SELECT
      PRIMARY_KEY
    FROM
      T1_TEMP);

With the data successfully transferred into an Oracle database table, the programming continues.  That brings me to the next article in this series, the internal conflicts of the “best” way to accomplish the programming task.


Actions

Information

3 responses

26 08 2012
Mladen Gogala

I’ve had tasks like that. My solution was Perl. Spreadsheet::ParseExcel module can read all known versions of Excel and putting the result into the database was a breeze.

27 08 2012
Charles Hooper

Mladen,

Thank you for sharing the approach that you used to perform a similar data transfer. It is helpful for people to share their experiences of what works and what does not.

I have never worked with Perl, but I understand that the language is well regarded (one of the “P”s in the abbreviation LAMP http://en.wikipedia.org/wiki/LAMP_(software_bundle) ). I wonder how well the Spreadsheet::ParseExcel module would do with the Excel 2.0 format – Microsoft Access cannot directly link to that data format (if I remember correctly), but that does not mean that other products cannot do the backward format integration more completely.

5 09 2012
Tony Sleight

I have started to use perl quite a lot in my current role and have found it most useful. We are currently researching a tool called Pentaho, which allows you to build a task from various products and bring them together as a kind of script which can be executed via windows task scheduler on a regular basis.
One example we have used it for is to extract XML from a database package it into a zip file and send the zip file off to various locations. The zip file is saved in a folder for one month as a simple archive process. Pentaho sequences the PL/SQL, zip file creation, zip file copy and archive maintenance. A log file is created and the user can create individual actions on failures of any of the sequence steps.

Leave a comment