Name that Table’s Column


November 9, 2012

I have not had a lot of time to browse through forums lately, but I noticed an interesting thread in the comp.databases.oracle.server Usenet group.  The OP in the thread was curious why quotes (double quotes) were required around a particular column name when the column was referenced in a SQL statement, because specifying quotes around the column name is apparently a little challenging in the PHP scripting language.

I thought about this issue a bit, wondering “how did that happen”, and then thought about the possibilities opened by this approach.  In theory, a sophisticated programmer could build a 16 column table using a single four character column name – with two possible capitalizations for each of the four letters, there are 2^4 = 16 distinct quoted spellings.  Such an approach is sure to evoke a couple of choice four letter words!
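
Just for fun, here is a sketch of a query that generates the 16 case permutations of the four letter name “data” as quoted column definitions – BITAND against a row counter is simply one way to enumerate the 2^4 combinations (paste the output into a CREATE TABLE statement at your own risk):

SELECT
  '"'||
  DECODE(BITAND(LEVEL-1,8),0,'d','D')||
  DECODE(BITAND(LEVEL-1,4),0,'a','A')||
  DECODE(BITAND(LEVEL-1,2),0,'t','T')||
  DECODE(BITAND(LEVEL-1,1),0,'a','A')||
  '" NUMBER,' COLUMN_DEF
FROM
  DUAL
CONNECT BY
  LEVEL<=16;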

I thought that I would throw together a quick example table:

CREATE TABLE T1(
  My_Data NUMBER,
  "My_Data" NUMBER,
  "my_data" NUMBER,
  "MY_DATA" NUMBER);

That almost worked:

SQL> CREATE TABLE T1(
  2    My_Data NUMBER,
  3    "My_Data" NUMBER,
  4    "my_data" NUMBER,
  5    "MY_DATA" NUMBER);
  "MY_DATA" NUMBER)
  *
ERROR at line 5:
ORA-00957: duplicate column name

Columns 1 and 4 have the same name.  Let’s fix that problem and try again:

CREATE TABLE T1(
  My_Data NUMBER,
  "My_Data" NUMBER,
  "my_data" NUMBER,
  "MY_DATa" NUMBER);

Table created.

That worked; we now have a table with four columns whose names are spelled the same, differing only in capitalization.  Notice that the first column name was not wrapped in quotes.
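
To verify how the column names were actually stored, we can check the data dictionary – the unquoted column name should be stored in all uppercase letters, while the quoted column names should be stored exactly as typed:

SELECT
  COLUMN_ID,
  COLUMN_NAME
FROM
  USER_TAB_COLUMNS
WHERE
  TABLE_NAME='T1'
ORDER BY
  COLUMN_ID;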

Let’s insert a row into the table:

INSERT INTO T1 VALUES(
  1,
  2,
  3,
  4);

1 row created.

Let’s see what happens when we query the table:

SELECT
  *
FROM
  T1
WHERE
  MY_DATA=1;

   MY_DATA    My_Data    my_data    MY_DATa
---------- ---------- ---------- ----------
         1          2          3          4

Notice that a row was returned, even though the capitalization of the column name in the WHERE clause did not match the capitalization of the first column name when the T1 table was created (because the first column name was not wrapped in quotes, it was stored as if it were specified in all uppercase letters).

Let’s see if a row is returned if we try the following (notice that matching the fourth column’s capitalization in the WHERE clause, without quotes, fails to restrict the query results based on the contents of the fourth column in the table):

SELECT
  *
FROM
  T1
WHERE
  MY_DATa=1;

   MY_DATA    My_Data    my_data    MY_DATa
---------- ---------- ---------- ----------
         1          2          3          4

Let’s try again, this time wrapping the column name found in the WHERE clause in quotes:

SELECT
  *
FROM
  T1
WHERE
  "MY_DATA"=1;

   MY_DATA    My_Data    my_data    MY_DATa
---------- ---------- ---------- ----------
         1          2          3          4

A row was still returned (as expected).  So then, how do we determine which rows in the table have a value of 1 in the fourth column?  We need to place the column name found in the WHERE clause within quotes as follows:

SELECT
  *
FROM
  T1
WHERE
  "MY_DATa"=1;

no rows selected

Notice that this time no rows were returned (as intended, and expected).

I guess that the lesson here is to be careful when creating tables in Oracle Database.  Just because other database products may place column names within quotes as standard practice, that does not mean that you, as the developer, should continue that practice when working with Oracle Database.  Now get out there and create some 16 column tables with choice four letter words.  :-)





oci-24550 [3221225477] [Unhandled exception: Code=c0000005 Flags=0


October 22, 2012

Lately, I have been busy with a couple of items that are not completely Oracle Database specific.  I tried experimenting with high definition 1080p security cameras.  After wasting an unacceptable amount of money to purchase two Y-Cam Wireless High Definition 1080p video cameras (model YCBLHD6), and spending in excess of 38 hours fighting with the cameras to make them work as advertised, I wrote a video review of the product.  After the review was written, Y-Cam’s tech support representative contacted me again, so I pulled the cameras back out of the box and wasted another five hours trying to make the cameras work wirelessly – a feature that was lost after installing the third firmware version on the cameras.  The cameras magically connected wirelessly very early the next morning – but wireless communication was only one of several problems that I was experiencing with this model of security camera.  What might cause a procedure to work magically when that same procedure failed several times in the past – and might this somehow have an odd relation to Oracle Database?  This is the explanation that I provided to Y-Cam’s tech support representative and posted as a follow up on Y-Cam’s support forum:

The camera’s Ethernet port and wireless adapter both have the same hardcoded MAC address. This MAC address is used by network switches to determine which devices are attached to each of the switch ports so that network traffic addressed to a specific MAC address is only sent down the switch port where that MAC address is known to be connected. By contrast a network hub broadcasts network traffic to all ports, rather than just the port to which a hardware device’s MAC address is attached; a hub is less likely to have an issue with two different network adapters having the same MAC address (Y-CAM support reported that they confirmed that the camera with the latest firmware worked with a non-broadcasted WPA2 protected wireless network). The network switch maintains a MAC address table for each switch port – when the camera’s Ethernet port is disconnected and the camera is power cycled, the switch’s MAC address tables may not be immediately updated to remove the camera’s MAC address from the wired Ethernet port so that the MAC address may be added to the switch port used for wireless connectivity. Power cycling the network switch clears the switch’s MAC address tables, allowing the camera’s MAC address to be added to the switch port used for wireless connectivity.

In short, it is necessary to power cycle the switch (or the Cisco Linksys E2000 router acting as only a wireless access point with integrated 4 port gigabit switch in my case) when moving the camera from a wired Ethernet connection to a wireless connection.

Makes a bit of sense… now if only motion detection worked as advertised and stability improved to the point that the camera does not require two or three reboots per day.  Thankfully, most other consumer products tend to work out of the box; imagine the ramifications if every time the weather conditions included rain, a vehicle would sputter, shake violently, and reduce power to 10% of normal when the driver attempted to accelerate.

Changing gears a bit.  You might have noticed that the title of this article appears to be some sort of error message, probably produced by an Oracle product.  I am in the process of writing a program that interfaces with an Oracle Database, rapidly retrieving a variety of information from various database tables – using bind variables in the SQL statements, of course.

The problem?  My program seems to crash randomly on computers running either the 32 bit or 64 bit version of Windows 7 (the program works without issue on computers running the 32 bit version of Windows XP).  The randomness of the crashes makes it a bit difficult to troubleshoot this particular problem, but the vast majority of the crashes are associated with populating the treeview control.  Is the crash caused by a typo (bug) in my program logic?  Is the crash caused by the Windows 7 version of the treeview control?  Is the crash caused by Oracle Database 11.2.0.2, as a result of receiving several different SQL statements in rapid-fire fashion?  Could there be another explanation?

To start the investigation, I first enabled a 10046 trace at level 12.  I then entered an employee ID and started selecting different date ranges until the program crashed.  The crash message varied (as did the point at which the crash happened), and while Windows offered to contact the developer to report the error, I declined the offer.  Unfortunately, I did not capture any of the exact error messages, although I did open one of the debug files created during a crash – “Unhandled exception at 0x0fc01b3d”:

mov cx, word ptr [edi] – Thanks Windows, I think that I will go back to the safety of the 10046 trace files (my best guess is that this instruction is attempting to copy a 16 bit value from the memory address held in the EDI register – quite possibly an invalid address, given the c0000005 access violation code).
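
For reference, here is one way to enable a 10046 trace at level 12 for the current session (level 12 captures both bind variable values and wait events).  The TRACEFILE_IDENTIFIER value is arbitrary, and DBMS_MONITOR.SESSION_TRACE_ENABLE offers a supported alternative when tracing a different session:

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'CRASH_HUNT';
ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT FOREVER, LEVEL 12';

-- Reproduce the problem, then:
ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT OFF';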

Digging into the 10046 trace file, I found that this was the last SQL statement executed by my program before the crash:

CLOSE #1301509800:c=0,e=10,dep=0,type=1,tim=4032143047926
=====================
PARSING IN CURSOR #1301508512 len=1452 dep=0 uid=172 oct=3 lid=172 tim=4032143048002 hv=929599856 ad='3eb8207a0' sqlid='4y6m7dsvqj4bh'
SELECT 
  EMPLOYEE_ID, 
  SHIFT_DATE, 
  CLOCK_IN, 
  CLOCK_OUT, 
  HOURS_WORKED, 
  HOURS_VACATION, 
  HOURS_HOLIDAY, 
  HOURS_BEREAVEMENT, 
  HOURS_JURY_DUTY, 
  DECODE(SHIFT_DATE_CODE,'VAC',1,0) VACATION, 
  DECODE(SHIFT_DATE_CODE,'HOL',1,0) HOLIDAY, 
  DECODE(SHIFT_DATE_CODE,'BER',1,0) BEREAVEMENT, 
  DECODE(SHIFT_DATE_CODE,'JUR',1,0) JURY, 
  DECODE(SHIFT_DATE_CODE,'ABS',1,0) ABSENT, 
  DECODE(SHIFT_DATE_CODE,'EXC',1,0) EXCUSED, 
  SUM(DECODE(SHIFT_DATE_CODE,'VAC',1,0)) OVER (PARTITION BY EMPLOYEE_ID) VACATION_COUNT, 
  SUM(HOURS_VACATION) OVER (PARTITION BY EMPLOYEE_ID) VACATION_HOURS, 
  SUM(DECODE(SHIFT_DATE_CODE,'HOL',1,0)) OVER (PARTITION BY EMPLOYEE_ID) HOLIDAY_COUNT, 
  SUM(HOURS_HOLIDAY) OVER (PARTITION BY EMPLOYEE_ID) HOLIDAY_HOURS, 
  SUM(DECODE(SHIFT_DATE_CODE,'BER',1,0)) OVER (PARTITION BY EMPLOYEE_ID) BEREAVEMENT_COUNT, 
  SUM(HOURS_BEREAVEMENT) OVER (PARTITION BY EMPLOYEE_ID) BEREAVEMENT_HOURS, 
  SUM(DECODE(SHIFT_DATE_CODE,'JUR',1,0)) OVER (PARTITION BY EMPLOYEE_ID) JURY_COUNT, 
  SUM(HOURS_JURY_DUTY) OVER (PARTITION BY EMPLOYEE_ID) JURY_HOURS, 
  SUM(DECODE(SHIFT_DATE_CODE,'ABS',1,0)) OVER (PARTITION BY EMPLOYEE_ID) ABSENT_COUNT, 
  SUM(DECODE(SHIFT_DATE_CODE,'EXC',1,0)) OVER (PARTITION BY EMPLOYEE_ID) EXCUSED_COUNT 
FROM 
  PAY_HOURS
WHERE 
  EMPLOYEE_ID= :1 
  AND SHIFT_DATE BETWEEN :2 AND :3 
  AND COALESCE(SHIFT_DATE_CODE,'ZZZ')<>'NOL' 
ORDER BY 
  COALESCE(SHIFT_DATE_CODE,'ZZZ'), 
  SHIFT_DATE
END OF STMT
PARSE #1301508512:c=0,e=42,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=2997266572,tim=4032143048001
BINDS #1301508512:
 Bind#0
  oacdty=96 mxl=32(06) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=1000010 frm=01 csi=178 siz=48 off=0
  kxsbbbfp=4d936d40  bln=32  avl=06  flg=05
  value="DUDE"
 Bind#1
  oacdty=12 mxl=07(07) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=1000010 frm=00 csi=178 siz=0 off=32
  kxsbbbfp=4d936d60  bln=07  avl=07  flg=01
  value="10/19/2012 0:0:0"
 Bind#2
  oacdty=12 mxl=07(07) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=1000010 frm=00 csi=178 siz=0 off=40
  kxsbbbfp=4d936d68  bln=07  avl=07  flg=01
  value="10/19/2012 0:0:0"
EXEC #1301508512:c=0,e=129,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=2997266572,tim=4032143048323
WAIT #1301508512: nam='SQL*Net message to client' ela= 2 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=4032143048369
FETCH #1301508512:c=0,e=33,p=0,cr=3,cu=0,mis=0,r=0,dep=0,og=1,plh=2997266572,tim=4032143048448
STAT #1301508512 id=1 cnt=0 pid=0 pos=1 obj=0 op='SORT ORDER BY (cr=3 pr=0 pw=0 time=43 us cost=6 size=50 card=1)'
STAT #1301508512 id=2 cnt=0 pid=1 pos=1 obj=0 op='WINDOW BUFFER (cr=3 pr=0 pw=0 time=34 us cost=6 size=50 card=1)'
STAT #1301508512 id=3 cnt=0 pid=2 pos=1 obj=0 op='FILTER  (cr=3 pr=0 pw=0 time=21 us)'
STAT #1301508512 id=4 cnt=0 pid=3 pos=1 obj=70154 op='TABLE ACCESS BY INDEX ROWID PAY_HOURS (cr=3 pr=0 pw=0 time=20 us cost=4 size=50 card=1)'
STAT #1301508512 id=5 cnt=0 pid=4 pos=1 obj=71786 op='INDEX RANGE SCAN SYS_C0022057 (cr=3 pr=0 pw=0 time=20 us cost=3 size=0 card=1)'
WAIT #1301508512: nam='SQL*Net message from client' ela= 1448 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=4032143050020
PARSE #1301509800:c=0,e=20,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=266456858,tim=4032143050081
WAIT #1301509800: nam='SQL*Net message to client' ela= 2 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=4032143050142

*** 2012-10-19 09:43:33.522
WAIT #1301509800: nam='SQL*Net message from client' ela= 5761077 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=4032148811261  

Great, I will take a closer look at the part of my program that executes that SQL statement.  But wait, someone famous once said: “Insanity is repeating the same mistakes and expecting different results.”  Let’s run my program again with a 10046 trace at level 12 enabled.  This time, I found that the following was the last SQL statement executed by my program before the crash:

PARSING IN CURSOR #1078422680 len=1257 dep=0 uid=172 oct=3 lid=172 tim=4031964081030 hv=4064672182 ad='3a68828c8' sqlid='4cj96t7t4bydq'
SELECT 
  LT.TRANSACTION_ID, 
  WO.PART_ID, 
  LT.WORKORDER_BASE_ID, 
  LT.WORKORDER_LOT_ID, 
  LT.WORKORDER_SUB_ID, 
  LT.OPERATION_SEQ_NO, 
  LT.TYPE, 
  LT.INDIRECT_ID, 
  LT.RESOURCE_ID, 
  LT.CLOCK_IN, 
  NVL2(LT.HOURS_WORKED,LT.CLOCK_OUT,NULL) CLOCK_OUT, 
  NVL(LT.HOURS_WORKED,ROUND((SYSDATE-LT.CLOCK_IN)*24,2)) HOURS_WORKED, 
  LT.GOOD_QTY, 
  COUNT(*) OVER (PARTITION BY LT.EMPLOYEE_ID) LABOR_TICKETS, 
  SUM(NVL(LT.HOURS_WORKED,ROUND((SYSDATE-LT.CLOCK_IN)*24,2))) OVER (PARTITION BY LT.EMPLOYEE_ID) TOTAL_HOURS, 
  SUM(DECODE(LT.TYPE,'I',NVL(LT.HOURS_WORKED,ROUND((SYSDATE-LT.CLOCK_IN)*24,2)),0)) OVER (PARTITION BY LT.EMPLOYEE_ID) TOTAL_INDIRECT_HOURS, 
  SUM(DECODE(LT.TYPE,'S',NVL(LT.HOURS_WORKED,ROUND((SYSDATE-LT.CLOCK_IN)*24,2)),0)) OVER (PARTITION BY LT.EMPLOYEE_ID) TOTAL_SETUP_HOURS, 
  SUM(DECODE(LT.TYPE,'R',NVL(LT.HOURS_WORKED,ROUND((SYSDATE-LT.CLOCK_IN)*24,2)),0)) OVER (PARTITION BY LT.EMPLOYEE_ID) TOTAL_RUN_HOURS 
FROM 
  LABOR_TICKET LT, 
  WORK_ORDER WO 
WHERE 
  LT.EMPLOYEE_ID= :1 
  AND LT.SHIFT_DATE BETWEEN :2 AND :3 
  AND WO.TYPE(+)='W' 
  AND WO.SUB_ID(+)='0' 
  AND LT.WORKORDER_TYPE=WO.TYPE(+) 
  AND LT.WORKORDER_BASE_ID=WO.BASE_ID(+) 
  AND LT.WORKORDER_LOT_ID=WO.LOT_ID(+) 
  AND LT.WORKORDER_SPLIT_ID=WO.SPLIT_ID(+)
END OF STMT
PARSE #1078422680:c=0,e=51,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3199062715,tim=4031964081030
BINDS #1078422680:
 Bind#0
  oacdty=96 mxl=32(06) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=1000010 frm=01 csi=178 siz=48 off=0
  kxsbbbfp=40476838  bln=32  avl=06  flg=05
  value="DUDE"
 Bind#1
  oacdty=12 mxl=07(07) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=1000010 frm=00 csi=178 siz=0 off=32
  kxsbbbfp=40476858  bln=07  avl=07  flg=01
  value="10/19/2012 0:0:0"
 Bind#2
  oacdty=12 mxl=07(07) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=1000010 frm=00 csi=178 siz=0 off=40
  kxsbbbfp=40476860  bln=07  avl=07  flg=01
  value="10/19/2012 0:0:0"
EXEC #1078422680:c=0,e=188,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=4162394183,tim=4031964081369
WAIT #1078422680: nam='SQL*Net message to client' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=4031964081423
FETCH #1078422680:c=0,e=79,p=0,cr=4,cu=0,mis=0,r=1,dep=0,og=1,plh=4162394183,tim=4031964081539
STAT #1078422680 id=1 cnt=1 pid=0 pos=1 obj=0 op='WINDOW BUFFER (cr=4 pr=0 pw=0 time=81 us cost=9 size=101 card=1)'
STAT #1078422680 id=2 cnt=1 pid=1 pos=1 obj=0 op='FILTER  (cr=4 pr=0 pw=0 time=35 us)'
STAT #1078422680 id=3 cnt=1 pid=2 pos=1 obj=0 op='NESTED LOOPS OUTER (cr=4 pr=0 pw=0 time=33 us cost=8 size=101 card=1)'
STAT #1078422680 id=4 cnt=1 pid=3 pos=1 obj=69863 op='TABLE ACCESS BY INDEX ROWID LABOR_TICKET (cr=4 pr=0 pw=0 time=28 us cost=7 size=73 card=1)'
STAT #1078422680 id=5 cnt=1 pid=4 pos=1 obj=70966 op='INDEX RANGE SCAN IND_LT_SHIFT_DATE (cr=3 pr=0 pw=0 time=21 us cost=6 size=0 card=1)'
STAT #1078422680 id=6 cnt=0 pid=3 pos=2 obj=70072 op='TABLE ACCESS BY INDEX ROWID WORK_ORDER (cr=0 pr=0 pw=0 time=3 us cost=1 size=28 card=1)'
STAT #1078422680 id=7 cnt=0 pid=6 pos=1 obj=71722 op='INDEX UNIQUE SCAN SYS_C0021989 (cr=0 pr=0 pw=0 time=2 us cost=0 size=0 card=1)'

*** 2012-10-19 09:40:38.424
WAIT #1078422680: nam='SQL*Net message from client' ela= 9641490 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=4031973723245 

A different SQL statement was found in the trace file when my program crashed… do the SQL statements have anything in common?  Both SQL statements are using analytic functions, and both are using date-type bind variables.  Interesting… what was that saying about insanity?  Let’s try another test of my program with an enabled 10046 trace to see the last SQL statement found in the trace before the crash:

PARSING IN CURSOR #881825936 len=210 dep=1 uid=0 oct=3 lid=0 tim=4030801312382 hv=864012087 ad='3eddb9a08' sqlid='96g93hntrzjtr'
select /*+ rule */ bucket_cnt, row_cnt, cache_cnt, null_cnt, timestamp#, sample_size, minimum, maximum, distcnt, lowval, hival, density, col#, spare1, spare2, avgcln from hist_head$ where obj#=:1 and intcol#=:2
END OF STMT
PARSE #881825936:c=0,e=46,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=3,plh=2239883476,tim=4030801312381
BINDS #881825936:
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=0001 frm=00 csi=00 siz=48 off=0
  kxsbbbfp=348f64a0  bln=22  avl=04  flg=05
  value=166995
 Bind#1
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=0001 frm=00 csi=00 siz=0 off=24
  kxsbbbfp=348f64b8  bln=22  avl=02  flg=01
  value=2
EXEC #881825936:c=0,e=95,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=3,plh=2239883476,tim=4030801312551
WAIT #881825936: nam='db file sequential read' ela= 6526 file#=1 block#=98252 blocks=1 obj#=427 tim=4030801319112
WAIT #881825936: nam='db file sequential read' ela= 759 file#=1 block#=100271 blocks=1 obj#=425 tim=4030801319923
FETCH #881825936:c=0,e=7396,p=2,cr=3,cu=0,mis=0,r=1,dep=1,og=3,plh=2239883476,tim=4030801319963
STAT #881825936 id=1 cnt=1 pid=0 pos=1 obj=425 op='TABLE ACCESS BY INDEX ROWID HIST_HEAD$ (cr=3 pr=2 pw=0 time=7394 us)'
STAT #881825936 id=2 cnt=1 pid=1 pos=1 obj=427 op='INDEX RANGE SCAN I_HH_OBJ#_INTCOL# (cr=2 pr=1 pw=0 time=6583 us)'
CLOSE #881825936:c=0,e=46,dep=1,type=3,tim=4030801320046
...
BINDS #881825936:
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=0001 frm=00 csi=00 siz=48 off=0
  kxsbbbfp=348f64a0  bln=22  avl=04  flg=05
  value=166995
 Bind#1
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=0001 frm=00 csi=00 siz=0 off=24
  kxsbbbfp=348f64b8  bln=22  avl=02  flg=01
  value=11
EXEC #881825936:c=0,e=90,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=3,plh=2239883476,tim=4030801321733
FETCH #881825936:c=0,e=19,p=0,cr=3,cu=0,mis=0,r=1,dep=1,og=3,plh=2239883476,tim=4030801321767
CLOSE #881825936:c=0,e=10,dep=1,type=3,tim=4030801321802
BINDS #881825936:
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=0001 frm=00 csi=00 siz=48 off=0
  kxsbbbfp=348f64a0  bln=22  avl=04  flg=05
  value=166995
 Bind#1
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=0001 frm=00 csi=00 siz=0 off=24
  kxsbbbfp=348f64b8  bln=22  avl=02  flg=01
  value=33
EXEC #881825936:c=0,e=152,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=3,plh=2239883476,tim=4030801321992
FETCH #881825936:c=0,e=19,p=0,cr=3,cu=0,mis=0,r=1,dep=1,og=3,plh=2239883476,tim=4030801322027
CLOSE #881825936:c=0,e=10,dep=1,type=3,tim=4030801322062
BINDS #881825936:
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=0001 frm=00 csi=00 siz=48 off=0
  kxsbbbfp=348f64a0  bln=22  avl=04  flg=05
  value=166995
 Bind#1
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=0001 frm=00 csi=00 siz=0 off=24
  kxsbbbfp=348f64b8  bln=22  avl=02  flg=01
  value=35
EXEC #881825936:c=0,e=89,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=3,plh=2239883476,tim=4030801322195
FETCH #881825936:c=0,e=18,p=0,cr=3,cu=0,mis=0,r=1,dep=1,og=3,plh=2239883476,tim=4030801322229
CLOSE #881825936:c=0,e=10,dep=1,type=3,tim=4030801322265
WAIT #881751720: nam='SQL*Net message to client' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=4030801323194

*** 2012-10-19 09:21:14.806
WAIT #881751720: nam='SQL*Net message from client' ela= 8851219 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=4030810174461 

You might have noticed that the trace file showed dep=1 in the PARSING IN CURSOR line, which indicates that this is recursive SQL – the statement was executed by Oracle itself, outside of my program’s control, likely while attempting to optimize another SQL statement that was submitted by my program.

Let’s test that insanity theory again.  Another execution of my program with an enabled 10046 trace at level 12 showed this as the last SQL statement executed:

PARSING IN CURSOR #767713824 len=503 dep=0 uid=172 oct=3 lid=172 tim=4026166358219 hv=2534735629 ad='3b58041b0' sqlid='7x2t5dqbj9zsd'
SELECT 
  N.ID NCMT_ID, 
  N.NCMT_DATE, 
  N.PART_ID, 
  N.WORKORDER_BASE_ID, 
  N.WORKORDER_LOT_ID, 
  N.WORKORDER_PIECE_NO, 
  N.QTY, 
  N.DISCREPANCY_TYPE, 
  N.RESOURCE_ID, 
  N.CUSTOMER_ID, 
  NVL(N.TOTAL_COST,0) NCMT_COST, 
  COUNT(N.ID) OVER (PARTITION BY N.EMPLOYEE_ID) NCMT_COUNT, 
  SUM(NVL(N.TOTAL_COST,0)) OVER (PARTITION BY N.EMPLOYEE_ID) TOTAL_COST 
FROM 
  NON_CONFORMING N 
WHERE 
  N.EMPLOYEE_ID= :1 
  AND N.NCMT_DATE BETWEEN :2 AND :3 
  AND UPPER(N.ERROR_BY)='OPERATOR' 
ORDER BY 
  ID
END OF STMT
PARSE #767713824:c=0,e=41,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3960464466,tim=4026166358219
BINDS #767713824:
 Bind#0
  oacdty=96 mxl=32(06) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=1000010 frm=01 csi=178 siz=48 off=0
  kxsbbbfp=2dc25d58  bln=32  avl=06  flg=05
  value="DUDE"
 Bind#1
  oacdty=12 mxl=07(07) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=1000010 frm=00 csi=178 siz=0 off=32
  kxsbbbfp=2dc25d78  bln=07  avl=07  flg=01
  value="10/19/2012 0:0:0"
 Bind#2
  oacdty=12 mxl=07(07) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=1000010 frm=00 csi=178 siz=0 off=40
  kxsbbbfp=2dc25d80  bln=07  avl=07  flg=01
  value="10/19/2012 0:0:0"
...
PARSE #767713824:c=0,e=19,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3960464466,tim=4026237045440
BINDS #767713824:
 Bind#0
  oacdty=96 mxl=32(07) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=1000010 frm=01 csi=178 siz=48 off=0
  kxsbbbfp=2dc21f50  bln=32  avl=07  flg=05
  value="DUDE2"
 Bind#1
  oacdty=12 mxl=07(07) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=1000010 frm=00 csi=178 siz=0 off=32
  kxsbbbfp=2dc21f70  bln=07  avl=07  flg=01
  value="6/22/2012 0:0:0"
 Bind#2
  oacdty=12 mxl=07(07) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=1000010 frm=00 csi=178 siz=0 off=40
  kxsbbbfp=2dc21f78  bln=07  avl=07  flg=01
  value="10/19/2012 0:0:0"
EXEC #767713824:c=0,e=125,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3960464466,tim=4026237045593
WAIT #767713824: nam='SQL*Net message to client' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=71274 tim=4026237045634
FETCH #767713824:c=0,e=57,p=0,cr=10,cu=0,mis=0,r=0,dep=0,og=1,plh=3960464466,tim=4026237045727
WAIT #767713824: nam='SQL*Net message from client' ela= 3080 driver id=1413697536 #bytes=1 p3=0 obj#=71274 tim=4026237048857
CLOSE #767713824:c=0,e=7,dep=0,type=3,tim=4026237048899
PARSE #767713824:c=0,e=18,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3960464466,tim=4026237048951
WAIT #767713824: nam='SQL*Net message to client' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=71274 tim=4026237051177

*** 2012-10-19 08:06:40.151
WAIT #767713824: nam='SQL*Net message from client' ela= 98732762 driver id=1413697536 #bytes=1 p3=0 obj#=71274 tim=4026335783988 

Well, that SQL statement also used date-type bind variables and analytic functions… maybe there is a problem with Oracle Database 11.2.0.2?  But, why is this suddenly a problem… what has changed?  Nearly two years ago I wrote a blog article that described an ODBC update problem that was present in the Oracle 11.2.0.1 Client – the first Oracle Client version that could be easily downloaded and installed on Windows 7 (it is also possible to patch the downloadable 10.2.0.3/4 client to 10.2.0.5).  The Oracle 11.2.0.1 patch 7 (10155837 for 32 bit Windows clients) fixes the false ODBC lock violation errors that are reported in Microsoft Access (and various other programs) when UPDATE SQL statements are executed, and those SQL statements contain bind variables.  That same Oracle Client version and patch version are still installed on the computer (and all other computers here that run Windows 7), so that is not the source of the change.  On a side note, have you found it difficult to locate updated 11.2.0.2 or 11.2.0.3 client versions in MOS (Metalink)?

So, the 10046 trace failed to find a consistent source for the crashes, and the Oracle client has not been changed on the Windows 7 computer in nearly two years.  Program bug?  Analytic query bug?  Bind variable related bug?  I went digging for the sqlnet.log file to see if that file provided any clues.  This is what I found:

Fri Oct 19 13:26:52 2012
Directory does not exist for read/write [C:\Oracle\product\11.2.0\client_1\log] [C:\Oracle\product\11.2.0\client_1\log\diag\clients]
Errors in file c:\users\hooper\oracle\oradiag_hooper\diag\clients\user_hooper\host_1131635621_76\trace\ora_824_8864.trc  (incident=401):
oci-24550 [3221225477] [Unhandled exception: Code=c0000005 Flags=0
] [] [] [] [] [] [] [] [] [] []
Incident details in: c:\users\hooper\oracle\oradiag_hooper\diag\clients\user_hooper\host_1131635621_76\incident\incdir_401\ora_824_8864_i401.trc

That file’s contents proved to be a bit helpful, I think.  Let’s take a look at the contents of the file mentioned above:

Dump file c:\users\hooper\oracle\oradiag_hooper\diag\clients\user_hooper\host_1131635621_76\incident\incdir_401\ora_824_8864_i401.trc
Dump continued from file: c:\users\hooper\oracle\oradiag_hooper\diag\clients\user_hooper\host_1131635621_76\trace\ora_824_8864.trc
oci-24550 [3221225477] [Unhandled exception: Code=c0000005 Flags=0
] [] [] [] [] [] [] [] [] [] []
========= Dump for incident 401 (oci 24550 [3221225477]) ========
Tracing is in restricted mode!
----- Short Call Stack Trace -----
_dbgexPhaseII()+850<-_dbgexProcessError()+2061<-_dbgeExecuteForError()+43<-_dbgePostErrorDirect()+2368<-_kpeDbgSignalHandler()+225<-_skgesig_Win_UnhandledExceptionFilter()+140<-75D6003D<-7295FA2E<-0040AD0D<-77309EF0<-77309EC0<-00000000<-773079DD<-77306E72<-772FE0ED<-275C83DD<-275D219F<-275D1D88<-275D0E0E<-2759C677<-004CB035<-004DC4A3<-72991D31<-00410FD8<-7299202F----- START Event Driven Actions Dump ----
---- END Event Driven Actions Dump ----
----- START DDE Actions Dump -----
Executing SYNC actions
Executing ASYNC actions
----- END DDE Actions Dump (total 0 csec) -----
End of Incident Dump

Might the above indicate that my program logic is not the cause of the problem, but instead that it is the Oracle client that is crashing?  I started digging through various websites looking for a similar oci-24550 error message, but I found nothing that was helpful.  I searched through the bug database and technical article database on MOS (Metalink), and found nothing that was helpful.  By chance, I widened the search on MOS a bit and stumbled upon Mark Powell’s message in this Oracle Support Community thread (you need a MOS account to access the message).  That message pointed to two other MOS articles, at least one of which stated that the problem is (possibly – the error message is not exactly the same) corrected when the Oracle 11.2.0.3 Client is installed.  I struggled to find the 11.2.0.3 Client in MOS, and then found that it is located on disk 3 of 6 of the 11.2.0.3 server download (direct link – just specify the operating system version and download disk 3 – p10404530_112030_platform_3of6.zip (requires a MOS account to download)).  The 11.2.0.3 client version magically corrected the seemingly random crashes in my program – programmer error averted.

Insanity?  No more so than an expensive security camera that fails at motion detection, or a vehicle that decelerates when the driver commands the vehicle to accelerate during a light rain.  ;-)

P.S.: Sorry about the formatting of the blog article.  A WordPress update apparently disabled my custom CSS that expanded the article to the full width of the browser window – I am still in search of a fix for that problem.





Undo Quiz – Bringing the Right Tools for the Job


September 17, 2012

Sometimes humor is a good recipe for undo.  A mid-50s man was driving down the road, riding a purple mule, when he saw a house with a new window.  Uninvited, the man decided to take a closer look using an obvious tool of choice, a tire iron (YPDNTI).  Elapsed time, roughly 60 seconds.  The man is obviously familiar with Oracle Database’s concept of transactions and issuing a ROLLBACK, but did not account for video recording in several autonomous transactions.  Even though a COMMIT was not issued, that does not imply that nothing happened in those 60 seconds.

Riding a purple mule… strange.

On to the quiz.  Assume that there are three and only three users connected to an Oracle database (preferably a test database of sorts).  The first session is connected as the SYS user, and the other two sessions are connected as a normal database user (the same user ID, but with access permissions for DBA_SOURCE).  Assume that session 1 creates a new undo tablespace that cannot grow beyond 1MB in size:

CREATE UNDO TABLESPACE SMALL_UNDO2 DATAFILE 'C:\Oracle\OraData\SAMPLE\small_undo2.dbf' SIZE 1M AUTOEXTEND OFF; 

And then that tablespace is set as the system default undo tablespace:

ALTER SYSTEM SET UNDO_TABLESPACE=SMALL_UNDO2;

In session 2, table T10 is created based on DBA_SOURCE:

CREATE TABLE T10 AS
SELECT
  *
FROM
  DBA_SOURCE;

In my case, I used the Oracle sample schema in Oracle Database 11.2.0.3 (the tablespace for T10 is ASSM AUTOALLOCATE).  Let’s check the amount of space consumed (not necessarily all used for data storage yet) by the T10 table:

SELECT
  ROUND(SUM(BYTES)/1024/1024,2) SIZE_MB
FROM
  USER_EXTENTS
WHERE
  SEGMENT_NAME='T10';

   SIZE_MB
----------
        88 

88MB – a nice even number.  With a 1MB maximum size for the undo tablespace, let’s see if the user is able to select all of the rows from the T10 table:

SET ARRAYSIZE 1000
SET AUTOTRACE TRACEONLY STATISTICS

SELECT
  *
FROM
  T10;

633054 rows selected.

Statistics
---------------------------------------------------
         11  recursive calls
          0  db block gets
      11352  consistent gets
      10635  physical reads
          0  redo size
   66576339  bytes sent via SQL*Net to client
       7322  bytes received via SQL*Net from client
        635  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
     633054  rows processed 

That worked, but why?  Why didn’t the session receive an error when performing the 11,352 consistent gets to provide a consistent view of the data?

Maybe on the second execution we will see an error?

SELECT
  *
FROM
  T10;

633054 rows selected.

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      11271  consistent gets
          0  physical reads
          0  redo size
   66576339  bytes sent via SQL*Net to client
       7322  bytes received via SQL*Net from client
        635  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
     633054  rows processed

—–

The first quiz question is simple: How is it possible for Oracle Database to provide a read consistent view of a table that requires approximately 88MB of storage space when the file supporting the undo tablespace cannot exceed 1MB?  Maybe someone will be kind enough to share the answer with the OP in this message thread:
dbaforums.org/oracle/index.php?s=439a689327b6fbd09ba0018b320041d8&showtopic=21972

The second quiz question is also simple: Why did the first select of the table require 81 more consistent gets than the second select from that table?

—–

Continuing with the test case, this time working in session 3.  Let’s delete some rows from the table created in session 2:

DELETE FROM
  T10
WHERE
  ROWNUM<=300000; 

Oracle Database responds with the following message:

SQL> DELETE FROM
  2    T10
  3  WHERE
  4    ROWNUM<=300000;
  T10
  *
ERROR at line 2:
ORA-30036: unable to extend segment by 8 in undo tablespace 'SMALL_UNDO2'

The third question is slightly more challenging: Why is Oracle attempting to extend a segment when the session is deleting rows; also, what does the number 8 signify?

Let’s try again with a smaller number of rows:

DELETE FROM
  T10
WHERE
  ROWNUM<=30;

30 rows deleted. 

That worked.  Interestingly, the first time I tried the test case, I named the tablespace SMALL_UNDO and specified “RETENTION GUARANTEE” when creating the undo tablespace, and the following message appeared when executing the above command:

SQL> DELETE FROM
  2    T10
  3  WHERE
  4    ROWNUM<=30;
  T10
  *
ERROR at line 2:
ORA-30036: unable to extend segment by 8 in undo tablespace 'SMALL_UNDO'

After the above message appeared, session 3 successfully deleted 3 rows, 30 rows, and then 300 rows from table T10 without receiving an error:

SQL> DELETE FROM
  2    T10
  3  WHERE
  4    ROWNUM<=3;

3 rows deleted.

SQL> DELETE FROM
  2    T10
  3  WHERE
  4    ROWNUM<=30;

30 rows deleted.

SQL> DELETE FROM
  2    T10
  3  WHERE
  4    ROWNUM<=300;

300 rows deleted. 

The fourth question is again slightly more challenging: Why was the 30 row delete from table T10 successful when the SMALL_UNDO2 undo tablespace was the default undo tablespace, yet that same delete failed when the SMALL_UNDO tablespace was set as the default undo tablespace – until 3 rows were first deleted from the table?

Going back to the forum thread, another interesting question arises.  The fifth question is again challenging: What type of compression is used in undo logs?

I was curious about the last question.  In session 1 I executed the following:

SELECT
  R.NAME,
  T.XIDUSN,
  T.XIDSLOT,
  T.XIDSQN
FROM
  V$TRANSACTION T,
  V$ROLLNAME R
WHERE
  T.XIDUSN=R.USN; 

NAME                               XIDUSN    XIDSLOT     XIDSQN
------------------------------ ---------- ---------- ----------
_SYSSMU15_897375467$                   15          1          8

ALTER SESSION SET TRACEFILE_IDENTIFIER='UNDO_BLOCKS';
ALTER SYSTEM DUMP UNDO BLOCK '_SYSSMU15_897375467$' XID 15 1 8;

A portion of the generated trace file follows:

********************************************************************************
Undo Segment:  _SYSSMU15_897375467$ (15)
xid: 0x000f.001.00000008
Low Blk   :   (0, 0) 
High Blk  :   (2, 7) 
Object Id :   ALL 
Layer     :   ALL 
Opcode    :   ALL 
Level     :   2 

********************************************************************************
UNDO BLK:  Extent: 1   Block: 1   dba (file#, block#): 6,0x00000071
xid: 0x000f.001.00000008  seq: 0x4   cnt: 0x2c  irb: 0xa   icl: 0x0   flg: 0x0000

 Rec Offset      Rec Offset      Rec Offset      Rec Offset      Rec Offset
---------------------------------------------------------------------------
0x01 0x1f24     0x02 0x1e64     0x03 0x1dbc     0x04 0x1d34     0x05 0x1c68     
0x06 0x1b9c     0x07 0x1ae4     0x08 0x1a5c     0x09 0x1998     0x0a 0x18d8     
0x0b 0x1830     0x0c 0x17a8     0x0d 0x16f0     0x0e 0x162c     0x0f 0x15a4     
0x10 0x14ec     0x11 0x1424     0x12 0x1378     0x13 0x12f0     0x14 0x1234     
0x15 0x1168     0x16 0x10b8     0x17 0x1030     0x18 0x0f74     0x19 0x0ecc     
0x1a 0x0e44     0x1b 0x0d88     0x1c 0x0cbc     0x1d 0x0be8     0x1e 0x0b44     
0x1f 0x0a88     0x20 0x09bc     0x21 0x08e8     0x22 0x0840     0x23 0x0784     
0x24 0x06b8     0x25 0x05e4     0x26 0x0540     0x27 0x0484     0x28 0x03b8     
0x29 0x02e4     0x2a 0x023c     0x2b 0x0180     0x2c 0x00b4     

*-----------------------------
* Rec #0xa  slt: 0x01  objn: 79928(0x00013838)  objd: 79928  tblspc: 4(0x00000004)
*       Layer:  11 (Row)   opc: 1   rci 0x09   
Undo type:  Regular undo   Last buffer split:  No 
Temp Object:  No 
Tablespace Undo:  No 
rdba: 0x00000000
*-----------------------------
KDO undo record:
KTB Redo 
op: 0x02  ver: 0x01  
compat bit: 4 (post-11) padding: 0
op: C  uba: 0x01800071.0004.09
KDO Op code: IRP row dependencies Disabled
  xtype: XA flags: 0x00000000  bdba: 0x01091a2e  hdba: 0x01091a2a
itli: 2  ispac: 0  maxfr: 4858
tabn: 0 slot: 16(0x10) size/delt: 89
fb: --H-FL-- lb: 0x0  cc: 5
null: -----
col  0: [ 3]  53 59 53
col  1: [ 8]  53 54 41 4e 44 41 52 44
col  2: [ 7]  50 41 43 4b 41 47 45
col  3: [ 3]  c2 04 1f
col  4: [60]
 20 20 20 20 70 72 61 67 6d 61 20 42 55 49 4c 54 49 4e 28 27 58 4f 52 27 2c
 38 2c 20 33 2c 20 39 29 3b 20 2d 2d 20 50 45 4d 53 5f 49 4e 54 45 47 45 52
 2c 20 49 4e 54 5f 58 4f 52 0a

*-----------------------------
* Rec #0x9  slt: 0x01  objn: 79928(0x00013838)  objd: 79928  tblspc: 4(0x00000004)
*       Layer:  11 (Row)   opc: 1   rci 0x08   
Undo type:  Regular undo   Last buffer split:  No 
Temp Object:  No 
Tablespace Undo:  No 
rdba: 0x00000000
*-----------------------------
KDO undo record:
KTB Redo 
op: 0x02  ver: 0x01  
compat bit: 4 (post-11) padding: 0
op: C  uba: 0x01800071.0004.08
KDO Op code: IRP row dependencies Disabled
  xtype: XA flags: 0x00000000  bdba: 0x01091a2e  hdba: 0x01091a2a
itli: 2  ispac: 0  maxfr: 4858
tabn: 0 slot: 15(0xf) size/delt: 90
fb: --H-FL-- lb: 0x0  cc: 5
null: -----
col  0: [ 3]  53 59 53
col  1: [ 8]  53 54 41 4e 44 41 52 44
col  2: [ 7]  50 41 43 4b 41 47 45
col  3: [ 3]  c2 04 1e
col  4: [61]
 20 20 66 75 6e 63 74 69 6f 6e 20 58 4f 52 20 28 4c 45 46 54 20 42 4f 4f 4c
 45 41 4e 2c 20 52 49 47 48 54 20 42 4f 4f 4c 45 41 4e 29 20 72 65 74 75 72
 6e 20 42 4f 4f 4c 45 41 4e 3b 0a

*-----------------------------
* Rec #0x8  slt: 0x01  objn: 79928(0x00013838)  objd: 79928  tblspc: 4(0x00000004)
*       Layer:  11 (Row)   opc: 1   rci 0x07   
Undo type:  Regular undo   Last buffer split:  No 
Temp Object:  No 
Tablespace Undo:  No 
rdba: 0x00000000
*-----------------------------
KDO undo record:
KTB Redo 
op: 0x02  ver: 0x01  
compat bit: 4 (post-11) padding: 0
op: C  uba: 0x01800071.0004.07
KDO Op code: IRP row dependencies Disabled
  xtype: XA flags: 0x00000000  bdba: 0x01091a2e  hdba: 0x01091a2a
itli: 2  ispac: 0  maxfr: 4858
tabn: 0 slot: 14(0xe) size/delt: 30
fb: --H-FL-- lb: 0x0  cc: 5
null: -----
col  0: [ 3]  53 59 53
col  1: [ 8]  53 54 41 4e 44 41 52 44
col  2: [ 7]  50 41 43 4b 41 47 45
col  3: [ 3]  c2 04 1d
col  4: [ 1]  0a

*-----------------------------
* Rec #0x7  slt: 0x01  objn: 79928(0x00013838)  objd: 79928  tblspc: 4(0x00000004)
*       Layer:  11 (Row)   opc: 1   rci 0x06   
Undo type:  Regular undo   Last buffer split:  No 
Temp Object:  No 
Tablespace Undo:  No 
rdba: 0x00000000
*-----------------------------
KDO undo record:
KTB Redo 
op: 0x02  ver: 0x01  
compat bit: 4 (post-11) padding: 0
op: C  uba: 0x01800071.0004.06
KDO Op code: IRP row dependencies Disabled
  xtype: XA flags: 0x00000000  bdba: 0x01091a2e  hdba: 0x01091a2a
itli: 2  ispac: 0  maxfr: 4858
tabn: 0 slot: 13(0xd) size/delt: 78
fb: --H-FL-- lb: 0x0  cc: 5
null: -----
col  0: [ 3]  53 59 53
col  1: [ 8]  53 54 41 4e 44 41 52 44
col  2: [ 7]  50 41 43 4b 41 47 45
col  3: [ 3]  c2 04 1c
col  4: [49]
 20 20 2d 2d 20 20 77 68 65 6e 20 74 68 65 79 20 6f 63 63 75 72 20 69 6e 20
 63 6f 6e 64 69 74 69 6f 6e 61 6c 20 73 74 61 74 65 6d 65 6e 74 73 2e 0a 

It would not surprise me too much if a guide existed to assist in decompressing the undo logs (whatever that term implies).
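
As a side note, the col n entries in the above dump appear to be nothing more than hex renderings of the column bytes.  A quick check (assuming a character set in which these bytes map directly to ASCII) translates the first three column values back into readable text:

SELECT
  UTL_RAW.CAST_TO_VARCHAR2(HEXTORAW('535953')) COL0,
  UTL_RAW.CAST_TO_VARCHAR2(HEXTORAW('5354414E44415244')) COL1,
  UTL_RAW.CAST_TO_VARCHAR2(HEXTORAW('5041434B414745')) COL2
FROM
  DUAL;

The values decode to SYS, STANDARD, and PACKAGE – the OWNER, NAME, and TYPE values of the DBA_SOURCE rows that were copied into table T10 and later deleted.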





On the Topic of Programming 3


September 3, 2012

(Back to the Previous Post in the Series)

In the previous article in this series, a question was posed regarding how many SQL statements would be required to generate the following report, which displays in Microsoft Excel:

A significant number of items regarding the report require explanation:

  • Employee IDs and employee names appear in column A (the left-most column).  There could be many employees, but only those employees reporting labor transactions for machining operations in production batches (work order lots) that completed during one of the weeks should be listed.
  • Part IDs, part descriptions, and operations (Sub n OP nn in the above example report) are listed in column B (the second column from the left).  The number of operations listed for an employee could range from one to possibly 50 (or more, depending on the date range of the report), but that list will only include machining type operations (not welding, blasting, painting, laser cutting, etc.).
  • Total hours are reported in column C.  The second output row for an employee shows the sum of the numbers directly below for the employee.  The numbers on the 3rd, 6th, 9th, etc. rows for the employee in column C are the sum of the hours the employee worked in the operation for all manufacturing batches that completed in the report time period.
  • The first row for each employee, in columns E through H in the above report (which could very well extend beyond column H, depending on the number of weeks selected), shows the Monday through Sunday dates for each week.
  • The second row for each employee, in columns E through H in the above report show the weighted average efficiency for the employee in that week.  The weighting is based on the percentage of hours the employee spent in the operations, for those operations in batches that completed during the week.  This number of hours is calculated, but not printed in the report.
  • The numbers on the 4th, 7th, 10th, etc. rows for each employee, in columns E through H, shows the average number of hours the operation required, for those batches that completed during the week.  If an employee did not work in the operation one week, but did another, then a blank should appear in the week that the employee did not work in the operation.  All employees that worked in the operation for the week will have the same average printed.  For example, Cary Grant and James Stewart both worked on WIDGET101 operation Sub 0 OP 30.  The average hours for the first week is 10.50, and that value appears for both employees.  During the second week, only James Stewart worked on that operation, and the average hours for that week decreased to 7.56.
  • The numbers on the 3rd, 6th, 9th, etc. rows for each employee show the ratio of the engineering standard hours divided by the average hours for the operation in that week.  The same values will appear for all employees reporting time to the operation in the week.
  • The percentage in column I is the simple average of the weighted averages to the left.  There may be cases where a blank will appear for one of the weighted averages for an employee, so it is not possible to simply sum the weighted averages and divide by the number of weeks.

This appeared to be a simple report when it was just a sketch on a sheet of paper!

How many SQL statement executions will be necessary to generate the above report?  Before answering that question, let’s take another look at how the data is organized and related between the tables:

OK, so the above diagram might be a little confusing.

The engineering standard hours are found by querying the OPERATION table with WORKORDER_TYPE='W' and the WORKORDER_BASE_ID equal to the part ID that is of interest (the WORK_ORDER table also contains engineering standard information, accessed by searching for TYPE='M' and the BASE_ID equal to the part ID that is of interest).  In theory, there could be multiple engineering standards for the same part ID and operation, which is why we might need to join to the PART table to make certain that the current engineering standard is retrieved, but we will ignore that potential issue for now.  It would be a good idea to also check the WORK_ORDER table to obtain a distinct list of part IDs that had a close date in the time range that is of interest (there is no sense in forcing the database server to retrieve the engineering standards for parts that have not been produced in the last 17 years), so we will need to specify that the TYPE column in the WORK_ORDER table is equal to 'W' and that the CLOSE_DATE is in the date range of the report.  Many operations are NOT machining operations, and we can only determine the type of operation by determining the type of machine that is selected for use, so we must also query the SHOP_RESOURCE table to determine if the operation is processed at a machining center.

The actual production hours are found by querying the LABOR_TICKET table with the WORKORDER_TYPE='W' predicate (to eliminate spurious indirect labor) and joined to the SHOP_RESOURCE table to make certain that the operation is a machining type operation (as explained above).  To determine the employees’ names, the LABOR_TICKET table is joined to the EMPLOYEE table.  To determine the part ID that is produced in the labor ticket transaction, the LABOR_TICKET table must be joined to the WORK_ORDER table, with the TYPE column in the WORK_ORDER table set to 'W' and the SUB_ID column in the WORK_ORDER table set to '0' so that the main part ID for the work order is returned rather than a subordinate part ID.  To retrieve the description of the part ID that is produced, the WORK_ORDER table must be joined to the PART table.

That sketch of a report on a sheet of paper sure looked simple, but the degree of difficulty is all relative (or relational).

How many query executions will be required to produce the report that will appear in Microsoft Excel?  Should we retrieve the entire database over the network and process the data client-side?  I don’t program in the Java programming language, but I am guessing that the majority of Java programmers would not attempt to retrieve the entire database.  Would the number of SQL statement executions depend on the number of employees?  Would the number of SQL statement executions depend on the number of different part operations whose batches closed in a particular week?  Would the number of SQL statement executions depend on the number of weeks included in the report?

In the words of the TV show “Name that Tune“, I can name that tune in 1 note.  Because there are many to many joins between the various tables, it is not possible to construct a single simple SQL statement that retrieves the correct result.  However, it is possible to construct a single complex query using inline views or subquery factoring (WITH blocks), along with analytic functions, to retrieve the correct, necessary information to the client for presentation in Microsoft Excel.  The correct answer to the above question must:

  • Minimize the amount of data flowing across the network.  SQL*Net compression may help, but of course that can only possibly help if more than one row is retrieved at a time.  Ideally, the array fetch size should be set to a large value (possibly 100 up to the maximum of 5,000) to make the best use of SQL*Net compression.
  • Minimize as best as possible the performance impact on the database server, the database server is a shared resource without unlimited capacity.  Setting a reasonably large array fetch size may help reduce the number of memory accesses (specifically consistent gets) on the database server.  Make certain that available indexes are usable by the query – be careful about using functions (such as TRUNC) on date type columns in the WHERE clause.
  • When a query accesses many tables, and also when there are multiple inline views (or factored subqueries), the Oracle optimizer may not always find the optimal join order.  The _OPTIMIZER_MAX_PERMUTATIONS hidden parameter defaults to a value of 2,000, which of course restricts the number of possible join permutations attempted per query block in a query, so with many tables in a query block (more than 6, for instance), the limit for the maximum number of permutations may easily be hit.  Statistics must be reasonably accurate for cardinality estimates to be close to accurate, otherwise the query optimizer may select not only an inappropriate join order but also an inappropriate join type.  Histograms may help (or harm) the query optimizer, indexes may help (or harm), establishing default cardinality and cost for custom PL/SQL functions may help, and adding hints to help control the cardinality estimates or join orders may also help.
  • Lastly, and most importantly, the queries must provide correct information.

I struggled with this problem a bit.  The report is intended to measure efficiency, so it would be somewhat laughable if the report performed inefficiently.  How could I minimize execution time for this particular report?  If Oracle Database is licensed by CPU core (or CPU socket in the case of the Standard Edition), does it make sense to perform all of the computations on the database server, or does it make sense to off-load some of that processing to the client’s dual, quad, or eight core CPU?  If the report were rendered by a shared resource, such as a web server, application server, or Citrix server, would I want that same CPU offloading as I would have planned with a dedicated client computer?  The report shows the information in a (modified) pivot table format, should I use the PIVOT function that was introduced with Oracle Database 11.1; or maybe I should use client-side control break logic that was taught in one of my university-level structured programming classes years ago?
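
For illustration, here is a minimal sketch of what the PIVOT approach might look like – the EFFICIENCY_DETAIL inline view and its column names are hypothetical stand-ins for the real query, and notice that the IN list must enumerate the Monday dates of the weeks in advance (one of the reasons that client-side control break logic remained attractive):

SELECT
  *
FROM
  (SELECT
    EMPLOYEE_ID,
    CLOSE_WEEK,
    EMP_EFF_WEEK
  FROM
    EFFICIENCY_DETAIL)
PIVOT
  (MAX(EMP_EFF_WEEK) FOR CLOSE_WEEK IN (
    DATE '2012-07-30' AS WEEK_1,
    DATE '2012-08-06' AS WEEK_2,
    DATE '2012-08-13' AS WEEK_3,
    DATE '2012-08-20' AS WEEK_4));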

That sketch of a report on a sheet of paper sure looked simple, now I need a control break (there must be a simple one-key solution for that problem – check your keyboard :-) ).

It is possible to group the manufacturing lots by the Monday of the work week of the completion date (CLOSE_DATE) with the following function call:

NEXT_DAY(TRUNC(CLOSE_DATE),'MONDAY')-7

Looks fairly simple.  If I had a date range of four weeks in mind, I might be inclined to do something like this in the WHERE clause of the query:

CLOSE_DATE BETWEEN (NEXT_DAY(TO_DATE('20-AUG-2012','DD-MON-YYYY'),'MONDAY')-28) AND (NEXT_DAY(TO_DATE('20-AUG-2012','DD-MON-YYYY'),'MONDAY')-1)

However, I did not do that.  Why not?  One reason is that the Oracle query optimizer would likely have a hard time determining the approximate number of rows that would be returned from the table with the CLOSE_DATE column.  The NEXT_DAY(TO_DATE()) combination hides the actual date range of interest, so the optimizer falls back to using default cardinality percentages for that predicate.  Secondly, what happens when the user of the program is interested in something other than four weeks – the solution may be cumbersome to maintain.  Thirdly, the above did not make use of bind variables, so every time the date range is changed, the query optimizer will perform a hard parse of the query.  A better approach calculates the date range on the client-side and submits the WHERE clause like this:

CLOSE_DATE BETWEEN TO_DATE('30-JUL-2012','DD-MON-YYYY') AND TO_DATE('26-AUG-2012','DD-MON-YYYY')

Or, better yet using bind variables:

CLOSE_DATE BETWEEN :START_DATE AND :END_DATE
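
As a quick sanity check of the week grouping expression (keeping in mind that the day name passed to NEXT_DAY depends on the NLS_DATE_LANGUAGE setting), the following maps a mid-week close date to the Monday of its week:

SELECT
  NEXT_DAY(TRUNC(TO_DATE('22-AUG-2012','DD-MON-YYYY')),'MONDAY')-7 WEEK_OF
FROM
  DUAL;

August 22, 2012 fell on a Wednesday, so the expression returns August 20, 2012 – the Monday of that week.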

So, how many query executions will this report require?  Just one query, executed a single time.  The particular programming environment that I used requires ? to appear in bind variable positions, so when you see a ? just read it as either :START_DATE or :END_DATE.

SELECT
  L.EMPLOYEE_ID,
  E.LAST_NAME||', '||E.FIRST_NAME EMPLOYEE_NAME,
  L.PART_ID,
  L.WORKORDER_SUB_ID,
  L.OPERATION_SEQ_NO,
  L.DESCRIPTION,
  L.CLOSE_WEEK,
  L.PART_EFF_WEEK,
  L.AVG_HRS_PC,
  L.EMP_ACCUM_EFF_WEEK,
  SUM(L.HOURS_WORKED) OVER (PARTITION BY L.EMPLOYEE_ID) EMP_HOURS_WORKED,
  SUM(L.HOURS_WORKED) OVER (PARTITION BY L.EMPLOYEE_ID,L.PART_ID, L.WORKORDER_SUB_ID,L.OPERATION_SEQ_NO) EMP_PART_HOURS_WORKED,
  SUM(L.EMP_ACCUM_EFF_WEEK) OVER (PARTITION BY L.EMPLOYEE_ID,L.CLOSE_WEEK) EMP_EFF_WEEK
FROM
  (SELECT
    O.PART_ID,
    O.DESCRIPTION,
    O.WORKORDER_SUB_ID,
    O.OPERATION_SEQ_NO,
    O.CLOSE_WEEK,
    O.RECEIVED_QTY,
    O.ACT_RUN_HRS,
    L.EMPLOYEE_ID,
    L.HOURS_WORKED,
    ROUND(O.ACT_RUN_HRS/O.RECEIVED_QTY,2) AVG_HRS_PC,
    ENG.ENG_HRS,
    ROUND(DECODE((O.ACT_RUN_HRS/O.RECEIVED_QTY),0,1,ENG.ENG_HRS/(O.ACT_RUN_HRS/O.RECEIVED_QTY)),4) PART_EFF_WEEK,
    SUM(L.HOURS_WORKED) OVER (PARTITION BY L.EMPLOYEE_ID,L.CLOSE_WEEK) EMP_HRS_WEEK,
    ROUND(L.HOURS_WORKED/(SUM(L.HOURS_WORKED) OVER (PARTITION BY L.EMPLOYEE_ID,L.CLOSE_WEEK)),4) EMP_PERCENT_WEEK,
    ROUND((L.HOURS_WORKED/(SUM(L.HOURS_WORKED) OVER (PARTITION BY L.EMPLOYEE_ID,L.CLOSE_WEEK))) * DECODE((O.ACT_RUN_HRS/O.RECEIVED_QTY),0,1,ENG.ENG_HRS/(O.ACT_RUN_HRS/O.RECEIVED_QTY)),4) EMP_ACCUM_EFF_WEEK
  FROM
    (SELECT
      PART_ID,
      DESCRIPTION,
      WORKORDER_SUB_ID,
      OPERATION_SEQ_NO,
      NEXT_DAY(TRUNC(CLOSE_DATE),'MONDAY')-7 CLOSE_WEEK,
      SUM(RECEIVED_QTY) RECEIVED_QTY,
      SUM(ACT_RUN_HRS) ACT_RUN_HRS
    FROM
      (SELECT
        WO.TYPE,
        WO.BASE_ID,
        WO.LOT_ID,
        WO.SPLIT_ID,
        WO.RECEIVED_QTY,
        WO.CLOSE_DATE,
        SUM(O.ACT_SETUP_HRS+O.ACT_RUN_HRS) ACT_RUN_HRS,
        WO.PART_ID,
        O.WORKORDER_SUB_ID,
        O.SEQUENCE_NO OPERATION_SEQ_NO,
        P.DESCRIPTION
      FROM
        WORK_ORDER WO,
        PART P,
        OPERATION O,
        SHOP_RESOURCE SR
      WHERE
        WO.TYPE='W'
        AND WO.SUB_ID='0'
        AND WO.CLOSE_DATE BETWEEN ? AND ?
        AND WO.RECEIVED_QTY>0
        AND WO.TYPE=O.WORKORDER_TYPE
        AND WO.BASE_ID=O.WORKORDER_BASE_ID
        AND WO.LOT_ID=O.WORKORDER_LOT_ID
        AND WO.SPLIT_ID=O.WORKORDER_SPLIT_ID
        AND WO.PART_ID=P.ID
        AND O.RESOURCE_ID=SR.ID
        AND SR.BUILDING_ID='Machine'
        AND SR.LOGICAL_GROUP IN ('10 TURNING CENTERS','20 VERTICAL M.C.','30 630MM PALLET HMC','40 800MM-1000MM PALLET HMC','50 1000MM PALLET HMC','55 LARGE 2-TABLE HMC/HBM','60 SINGLE TABLE HBM','60 SINGLE TABLE HDM','70 LARGE FLOOR MACHINES')
      GROUP BY
        WO.TYPE,
        WO.BASE_ID,
        WO.LOT_ID,
        WO.SPLIT_ID,
        WO.CLOSE_DATE,
        WO.RECEIVED_QTY,
        WO.PART_ID,
        O.WORKORDER_SUB_ID,
        O.SEQUENCE_NO,
        P.DESCRIPTION)
    GROUP BY
      PART_ID,
      DESCRIPTION,
      WORKORDER_SUB_ID,
      OPERATION_SEQ_NO,
      NEXT_DAY(TRUNC(CLOSE_DATE),'MONDAY')-7) O,
    (SELECT /*+ LEADING(WO LT) */
      WO.PART_ID,
      LT.EMPLOYEE_ID,
      LT.WORKORDER_SUB_ID,
      LT.OPERATION_SEQ_NO,
      NEXT_DAY(TRUNC(CLOSE_DATE),'MONDAY')-7 CLOSE_WEEK,
      SUM(LT.HOURS_WORKED) HOURS_WORKED
    FROM
      (SELECT
        WO.TYPE,
        WO.BASE_ID,
        WO.LOT_ID,
        WO.SPLIT_ID,
        WO.CLOSE_DATE,
        WO.PART_ID
      FROM
        WORK_ORDER WO,
        OPERATION O,
        SHOP_RESOURCE SR
      WHERE
        WO.TYPE='W'
        AND WO.SUB_ID='0'
        AND WO.CLOSE_DATE BETWEEN ? AND ?
        AND WO.RECEIVED_QTY>0
        AND WO.TYPE=O.WORKORDER_TYPE
        AND WO.BASE_ID=O.WORKORDER_BASE_ID
        AND WO.LOT_ID=O.WORKORDER_LOT_ID
        AND WO.SPLIT_ID=O.WORKORDER_SPLIT_ID
        AND O.RESOURCE_ID=SR.ID
        AND SR.BUILDING_ID='Machine'
        AND SR.LOGICAL_GROUP IN ('10 TURNING CENTERS','20 VERTICAL M.C.','30 630MM PALLET HMC','40 800MM-1000MM PALLET HMC','50 1000MM PALLET HMC','55 LARGE 2-TABLE HMC/HBM','60 SINGLE TABLE HBM','60 SINGLE TABLE HDM','70 LARGE FLOOR MACHINES')
      GROUP BY
        WO.TYPE,
        WO.BASE_ID,
        WO.LOT_ID,
        WO.SPLIT_ID,
        WO.CLOSE_DATE,
        WO.PART_ID) WO,
      LABOR_TICKET LT,
      SHOP_RESOURCE SR
    WHERE
      WO.TYPE=LT.WORKORDER_TYPE
      AND WO.BASE_ID=LT.WORKORDER_BASE_ID
      AND WO.LOT_ID=LT.WORKORDER_LOT_ID
      AND WO.SPLIT_ID=LT.WORKORDER_SPLIT_ID
      AND LT.TYPE IN ('R','S')
      AND LT.HOURS_WORKED<>0
      AND SR.LOGICAL_GROUP IN ('10 TURNING CENTERS','20 VERTICAL M.C.','30 630MM PALLET HMC','40 800MM-1000MM PALLET HMC','50 1000MM PALLET HMC','55 LARGE 2-TABLE HMC/HBM','60 SINGLE TABLE HBM','60 SINGLE TABLE HDM','70 LARGE FLOOR MACHINES')
      AND LT.RESOURCE_ID=SR.ID
    GROUP BY
      WO.PART_ID,
      LT.WORKORDER_SUB_ID,
      LT.OPERATION_SEQ_NO,
      LT.EMPLOYEE_ID,
      NEXT_DAY(TRUNC(CLOSE_DATE),'MONDAY')-7) L,
    (SELECT /*+ LEADING(WO) */
      WO2.TYPE,
      WO2.BASE_ID,
      O.WORKORDER_SUB_ID,
      O.SEQUENCE_NO OPERATION_SEQ_NO,
      SUM(O.SETUP_HRS+O.RUN_HRS) ENG_HRS,
      WO.PART_ID
    FROM
      (SELECT
        WO.PART_ID
      FROM
        WORK_ORDER WO
      WHERE
        WO.TYPE='W'
        AND WO.SUB_ID='0'
        AND WO.CLOSE_DATE BETWEEN ? AND ?
        AND WO.RECEIVED_QTY>0
      GROUP BY
        WO.PART_ID) WO,
      WORK_ORDER WO2,
      OPERATION O,
      SHOP_RESOURCE SR
    WHERE
      WO2.TYPE='M'
      AND WO.PART_ID=WO2.BASE_ID
      AND WO2.LOT_ID='0'
      AND WO2.SPLIT_ID='0'
      AND WO2.SUB_ID='0'
      AND WO2.TYPE=O.WORKORDER_TYPE
      AND WO2.BASE_ID=O.WORKORDER_BASE_ID
      AND WO2.LOT_ID=O.WORKORDER_LOT_ID
      AND WO2.SPLIT_ID=O.WORKORDER_SPLIT_ID
      AND O.RESOURCE_ID=SR.ID
      AND SR.BUILDING_ID='Machine'
      AND SR.LOGICAL_GROUP IN ('10 TURNING CENTERS','20 VERTICAL M.C.','30 630MM PALLET HMC','40 800MM-1000MM PALLET HMC','50 1000MM PALLET HMC','55 LARGE 2-TABLE HMC/HBM','60 SINGLE TABLE HBM','60 SINGLE TABLE HDM','70 LARGE FLOOR MACHINES')
    GROUP BY
      WO2.TYPE,
      WO2.BASE_ID,
      WO.PART_ID,
      O.WORKORDER_SUB_ID,
      O.SEQUENCE_NO) ENG
  WHERE
    O.PART_ID=L.PART_ID
    AND O.WORKORDER_SUB_ID=L.WORKORDER_SUB_ID
    AND O.OPERATION_SEQ_NO=L.OPERATION_SEQ_NO
    AND O.CLOSE_WEEK=L.CLOSE_WEEK
    AND O.PART_ID=ENG.PART_ID
    AND O.WORKORDER_SUB_ID=ENG.WORKORDER_SUB_ID
    AND O.OPERATION_SEQ_NO=ENG.OPERATION_SEQ_NO) L,
  EMPLOYEE E
WHERE
  L.EMPLOYEE_ID=E.ID
ORDER BY
  L.EMPLOYEE_ID,
  L.PART_ID,
  TO_NUMBER(L.WORKORDER_SUB_ID),
  L.OPERATION_SEQ_NO,
  L.CLOSE_WEEK;

I suppose that I could have performed more of the processing on the database server by sliding the above query into an inline view and using the PIVOT function, and/or additional analytic functions, to calculate each employee’s weighted efficiency average per week and the simple averages of those weighted averages.  Doing so would have eliminated some potentially complex client-side logic that only needed to be programmed a single time; the counterpoint is that every time the report executed, it would require more of the database server’s resources than were absolutely required.
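For illustration, below is a minimal, self-contained sketch of the PIVOT technique (available since Oracle Database 11.1) – the WEEKLY_EFF rows, the week numbering, and the column aliases are hypothetical stand-ins for the output of the query above:

WITH WEEKLY_EFF AS (
  SELECT 'EMP1' EMPLOYEE_ID, 1 WEEK_NO, 0.85 EFF FROM DUAL UNION ALL
  SELECT 'EMP1', 2, 1.10 FROM DUAL UNION ALL
  SELECT 'EMP2', 1, 0.95 FROM DUAL)
SELECT
  *
FROM
  WEEKLY_EFF
  PIVOT(
    AVG(EFF)
    FOR WEEK_NO IN (1 AS EFF_WEEK1, 2 AS EFF_WEEK2, 3 AS EFF_WEEK3, 4 AS EFF_WEEK4))
ORDER BY
  EMPLOYEE_ID;

Each distinct EMPLOYEE_ID becomes one output row, with the per-week efficiencies spread across the EFF_WEEKn columns – the shape of the sketched report.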

That sketch of a report on a sheet of paper sure looked simple, but it turned into a three part blog article series.  Interesting, now the person would like to analyze the data by part ID, listing the employees working on the operations for the part.  Do I smell another three part blog article series?  Sorry, no – I just changed the ORDER BY clause so that the PART_ID column was listed first, and made a small change to the client-side control break logic.  Now the person wants to analyze 104 weeks’ worth of data rather than just four weeks, and only a single part ID or employee.  Too late – those changes were anticipated and included in the original specification for the client-side programming, implemented with a simple change to the query and pre-planning for a variable number of weeks in the report.





On the Topic of Programming 2

2 09 2012

September 2, 2012

(Back to the Previous Post in the Series) (Forward to the Next Post in the Series)

Over the course of the last six months I developed several efficiency reports for the machining facility where I work, measuring the average time required to complete a customer’s part compared to the engineering standard “target” run time for producing the part.  Sounds simple, right?  If the engineering target is 30 widgets per hour, and only 20 widgets per hour are produced, then the average efficiency is 20/30 = 66.67%.  It does not take a person with a degree in mathematics to arrive at the conclusion that the widgets are not being produced as quickly as expected.  What is the scope of the measurement time frame for the average: a year, a month, a week, a day, one work shift in a day, a single machining center for a day, a single machining center and employee, or something else?

OK, now flip the engineering standard run times a bit so that the times are a little more consistent with the machining facility where I work.  It might take four hours, six hours, 10 hours, 20 hours, or possibly even 200 hours to complete a single operation at a machining center (one of several operations in the manufacturing process for the widget) to produce a single widget.  With the long run times of the operations, calculating the efficiency of a machining center or an employee for a specified period of time was a daunting task… a task that I was asked to solve roughly 12 years ago (with the help of Oracle Database 8.0.5).

The hours per widget run time (rather than widgets per hour) situation presented various problems for calculating employee efficiency, especially when an operation at a machining center did not complete before the end of an employee’s shift.  Consider a situation where an operation is expected to require eight hours to complete, but only when the tooling used at the machining center is new.  As such, the first shift employee installs new tooling in the machining center every morning before starting a new widget.  The first shift employee spends about 30 minutes finishing up the widget that was started the previous night, changes the tooling, and then starts the machining on the next widget.  So, the first shift employee reports that one widget (started the previous night) completed after 0.5 hours and the second widget completed in 7.5 hours.  The first shift employee’s efficiency, considering that two widgets were completed during his shift, is easily calculated as (8 + 8) / (0.5 + 7.5) * 100 = 200%.  The second shift employee’s efficiency is a consistent 0%, because the operation for the widget never completes during his shift – the sharpness of the tooling deteriorates through the day (thus causing the machining operation to take longer).  This obviously leads to odd questions: 1) Why is the second shift employee more efficient when the first shift employee is on vacation (equivalent to the question in Oracle performance tuning: why does my report run faster when it rains Monday mornings?)?  2) Why is the second shift employee more efficient when working a nine or 10 hour shift, rather than an eight hour shift?  The fun questions that one is able to answer when familiar with the data…

When an employee starts working on a machining operation, a labor ticket transaction is created detailing the fact that the work for the machining operation is in-process.  When the machining operation completes (or when it is time to leave for the day), the employee closes the labor ticket transaction and reports the number of widgets that completed.  These labor ticket transactions are conveniently stored in a table named LABOR_TICKET.  Back in 2006 or 2007, I decided to take another attempt at solving the efficiency problem – after all, I now had access to analytic functions in Oracle Database (analytic functions did not exist in Oracle Database 8.0.5).  I expected that by extending the time range beyond a single employee and/or shift, I might be able to extract useful efficiency information from the database.  I put together a rather long SQL statement that looked something like this:

SELECT DISTINCT
  1 TYPE,
  WO.PART_ID,
  TO_NUMBER(LT.WORKORDER_SUB_ID) WORKORDER_SUB_ID,
  LT.OPERATION_SEQ_NO,
  TRUNC(SYSDATE-30) SHIFT_DATE,
  LT.EMPLOYEE_ID,
  LT.RESOURCE_ID,
  SUM(LT.HOURS_WORKED) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,EMPLOYEE_ID) HOURS_WORKED,
  SUM(LT.GOOD_QTY) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,EMPLOYEE_ID) GOOD_QTY,
  NULL HRS_PC,
  ROUND(DECODE(SUM(LT.GOOD_QTY) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO),0,0,
    SUM(LT.HOURS_WORKED) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO)/
      SUM(LT.GOOD_QTY) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO)),2) AVG_HRS_PC_TIME_ALL,
  ROUND(DECODE(SUM(LT.GOOD_QTY) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,EMPLOYEE_ID),0,0,
    SUM(LT.HOURS_WORKED) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,EMPLOYEE_ID)/
      SUM(LT.GOOD_QTY) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,EMPLOYEE_ID)),2) AVG_HRS_PC_TIME_EMP,
  ROUND(DECODE(SUM(LT.GOOD_QTY) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,RESOURCE_ID),0,0,
    SUM(LT.HOURS_WORKED) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,RESOURCE_ID)/
      SUM(LT.GOOD_QTY) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,RESOURCE_ID)),2) AVG_HRS_PC_TIME_RES,
  ROUND(DECODE(SUM(LT.GOOD_QTY) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,EMPLOYEE_ID,RESOURCE_ID),0,0,
    SUM(LT.HOURS_WORKED) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,EMPLOYEE_ID,RESOURCE_ID)/
      SUM(LT.GOOD_QTY) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,EMPLOYEE_ID,RESOURCE_ID)),2) AVG_HRS_PC_TIME_EMP_RES
FROM
  WORK_ORDER WO,
  LABOR_TICKET LT
WHERE
  WO.TYPE='W'
  AND WO.SUB_ID='0'
  AND WO.PART_ID='WIDGET1'
  AND LT.SHIFT_DATE BETWEEN TRUNC(SYSDATE-30) AND TRUNC(SYSDATE)
  AND WO.TYPE=LT.WORKORDER_TYPE
  AND WO.BASE_ID=LT.WORKORDER_BASE_ID
  AND WO.LOT_ID=LT.WORKORDER_LOT_ID
  AND WO.SPLIT_ID=LT.WORKORDER_SPLIT_ID
UNION ALL
SELECT DISTINCT
  2 TYPE,
  WO.PART_ID,
  TO_NUMBER(LT.WORKORDER_SUB_ID) WORKORDER_SUB_ID,
  LT.OPERATION_SEQ_NO,
  TRUNC(SYSDATE-60) SHIFT_DATE,
  LT.EMPLOYEE_ID,
  LT.RESOURCE_ID,
  SUM(LT.HOURS_WORKED) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,EMPLOYEE_ID) HOURS_WORKED,
  SUM(LT.GOOD_QTY) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,EMPLOYEE_ID) GOOD_QTY,
  NULL HRS_PC,
  ROUND(DECODE(SUM(LT.GOOD_QTY) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO),0,0,
    SUM(LT.HOURS_WORKED) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO)/
      SUM(LT.GOOD_QTY) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO)),2) AVG_HRS_PC_TIME_ALL,
  ROUND(DECODE(SUM(LT.GOOD_QTY) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,EMPLOYEE_ID),0,0,
    SUM(LT.HOURS_WORKED) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,EMPLOYEE_ID)/
      SUM(LT.GOOD_QTY) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,EMPLOYEE_ID)),2) AVG_HRS_PC_TIME_EMP,
  ROUND(DECODE(SUM(LT.GOOD_QTY) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,RESOURCE_ID),0,0,
    SUM(LT.HOURS_WORKED) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,RESOURCE_ID)/
      SUM(LT.GOOD_QTY) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,RESOURCE_ID)),2) AVG_HRS_PC_TIME_RES,
  ROUND(DECODE(SUM(LT.GOOD_QTY) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,EMPLOYEE_ID,RESOURCE_ID),0,0,
    SUM(LT.HOURS_WORKED) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,EMPLOYEE_ID,RESOURCE_ID)/
      SUM(LT.GOOD_QTY) OVER (PARTITION BY WO.PART_ID,LT.WORKORDER_SUB_ID,LT.OPERATION_SEQ_NO,EMPLOYEE_ID,RESOURCE_ID)),2) AVG_HRS_PC_TIME_EMP_RES
FROM
  WORK_ORDER WO,
  LABOR_TICKET LT
WHERE
  WO.TYPE='W'
  AND WO.SUB_ID='0'
  AND WO.PART_ID='WIDGET1'
  AND LT.SHIFT_DATE BETWEEN TRUNC(SYSDATE-60) AND TRUNC(SYSDATE-31)
  AND WO.TYPE=LT.WORKORDER_TYPE
  AND WO.BASE_ID=LT.WORKORDER_BASE_ID
  AND WO.LOT_ID=LT.WORKORDER_LOT_ID
  AND WO.SPLIT_ID=LT.WORKORDER_SPLIT_ID
...

The above SQL statement grew in length substantially based on the number of time periods that I selected to compare.  This data was returned to a program for display purposes, so reformatting the output was always a possibility.  Consider a situation where the programmer is unfamiliar with Oracle-specific SQL and is asked to generate the same output in a program that he is constructing.  For a program seeking to measure efficiency, the programmer’s solution would likely involve repeated executions of SQL statements to retrieve the bits and pieces of information that need to be presented – this row-by-row processing will likely be very slow over a high-latency WAN (VPN) connection (but certainly faster than never receiving the information, so the programmer will still be a hero), and could very well hinder the performance of the Oracle Database server.
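To make the contrast concrete, below is a sketch of that row-by-row approach – written in PL/SQL for brevity (a client-side program would issue each of these queries as a separate network round trip), reusing the table and column names from the query above, with the WHERE clause details simplified:

DECLARE
  V_HOURS NUMBER;
BEGIN
  -- One separate query execution per work order row returned by the driving cursor
  FOR WO_REC IN (SELECT TYPE, BASE_ID, LOT_ID, SPLIT_ID FROM WORK_ORDER WHERE TYPE='W' AND SUB_ID='0') LOOP
    SELECT
      NVL(SUM(LT.HOURS_WORKED),0)
    INTO
      V_HOURS
    FROM
      LABOR_TICKET LT
    WHERE
      LT.WORKORDER_TYPE=WO_REC.TYPE
      AND LT.WORKORDER_BASE_ID=WO_REC.BASE_ID
      AND LT.WORKORDER_LOT_ID=WO_REC.LOT_ID
      AND LT.WORKORDER_SPLIT_ID=WO_REC.SPLIT_ID;

    NULL;  -- the client-side processing of V_HOURS would happen here
  END LOOP;
END;
/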

The analytic function solution above is good, but problems such as the progressive wearing of the tooling cutter, as mentioned earlier, could still lead to unexpected differences in the efficiency of employees who are working as fast as is possible.  Fast forward a few years.  The right questions are asked of people with a more complete understanding of efficiency measurements – examine the efficiency calculation problem from an entirely different angle.  Not long ago I was handed an example of a simple efficiency report, and asked to reproduce that report with live data, output in a Microsoft Excel spreadsheet.  The example was little more than a simple sketch, so for this blog article I put together a color-coded example of the report format in Microsoft Excel:

A little explanation is required.  The entirely different angle for calculating employee efficiency in the execution of widget operations involves not looking at the date of the labor ticket transaction, or even the number of widgets produced by a particular employee in a particular time frame.  Instead, the date of the manufacturing batch’s (lot in this particular ERP system) completion date, the quantity produced in the batch, and the total hours to execute a single machining operation for the lot become the criteria when comparing against the engineering standards to determine efficiency.  The manufacturing batch’s completion date is used to divide the batches into specific timeframes (in this case weeks: Monday through Sunday).  All employees working on a particular operation, where the batch completion date is in a certain week, will receive the same efficiency rating for that particular operation (EFF_WEEK1, EFF_WEEK2, EFF_WEEK3, etc.) as all other employees working on the same operation with the same batch completion date range.  It is not uncommon for employees to generate labor transactions for multiple operations for production of the same part, as well as operations for different parts that have batch completion dates in the same week, so the employee’s efficiency rating (W_AVG_EFF1, W_AVG_EFF2, W_AVG_EFF3, etc.) weights the individual part efficiencies based on the percentage of machining time the employee spent in a given operation compared to all of the other operations the employee worked.  The employee’s efficiency rating (Avg W_VE_EFF) is the simple average of the employee’s weekly efficiency ratings.
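As a hypothetical illustration of the weighting (the numbers are invented): suppose that in one week an employee spent 30 hours on an operation rated at 80% efficiency and 10 hours on an operation rated at 120% efficiency.  The employee’s weighted efficiency for that week would be:

(0.80 * 30 + 1.20 * 10) / (30 + 10) = 36 / 40 = 0.90 = 90%

If the employee’s weekly weighted ratings for four weeks were 90%, 100%, 95%, and 115%, the simple average of those weighted averages – the employee’s overall rating – would be (90 + 100 + 95 + 115) / 4 = 100%.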

In the above example report, the blue colored text is a static label.  The red colored text is a calculated static label that shows the Monday through Sunday date range for the week.  The black colored text is a simple “rip and read” from the database – no calculation is required.  The purple colored text indicates a calculated value.  The numbers at the right will not appear on the completed report, but are included to indicate which EFF_WEEKn and AVG_WEEKn values will be identical if the employee worked on an operation whose batch closed in the given week (notice that there are blanks in columns of the sample report, indicating that the employee did not work on that operation in the batch completion week).

The programming challenge is to determine the number of SQL statements that would be needed to retrieve the information from the database, and the number of times those SQL statements would need to be executed.  Would the number of SQL statement executions depend on the number of employees?  Would the number of SQL statement executions depend on the number of different part operations whose batches closed in a particular week?  Would the number of SQL statement executions depend on the number of weeks included in the report?  Before you think about the problem, we should probably investigate the data organization in the various tables.  The picture below shows the tables (and the specific columns) required for the report, showing the tables for the engineering standards in green boxes and the tables needed for the labor transaction analysis in black boxes.  The lines show the association of the data in the different tables.

(The above image was adjusted 7 hours after this blog article was initially posted.  The linking between the green OPERATION table and the green WORK_ORDER table was corrected, and the red colored text was added to indicate whether the table was to be used for retrieving the engineering master standards (WORKORDER_TYPE=’M’) or information from the production work order batches (WORKORDER_TYPE=’W’ or TYPE=’W’).)

It might also be interesting to think about what processing will be performed on the data returned by the SQL statements – that could be a pivotal design decision for the SQL statements.  Curve balls are to be expected – what is an examination of four weeks’ worth of efficiency numbers today might be an examination of 52 weeks tomorrow.  Oh, here is a thought: what if, on the following day, it becomes important to focus on the various part efficiencies and the employees who worked on the operations, rather than focusing on the employees and the various parts that the employees worked on?

Part 3 of this series will attempt to answer some of the above questions.

Late addition, sample of a completed report in Excel:





On the Topic of Programming 1

26 08 2012

August 26, 2012

(Forward to the Next Post in the Series)

As those of you who have read this blog’s About page probably know, my day to day job responsibilities involve a lot of activities that are not specific to Oracle Database performance tuning, or even remotely Oracle DBA type activities.  Those extra activities are part of what keeps the job fresh and interesting, whether I am swapping a new roll of labels into a Zebra label printer (that was a three minute fun task this morning), troubleshooting phone system problems (from a fork lift truck “disabling” a phone, to programming the PBX with a clever “message delivery system” to waste the time of persistent telemarketers), swapping out the power supply in a computer very early in the morning, or engaged in a marathon of application programming into the evening and weekend hours.

One of the recent programming projects involves the continuing effort of decreasing the number of data islands, allowing the data previously contained in the islands to be accessed and cross-referenced with data generated by other applications.  One of the data island generators that I have been working to phase out is Superbase 3.0, a database platform that started life on a Commodore 64 in 1983.  Superbase 3.0 is a 16 bit client-side database, so it does not play well with the 64 bit Windows 7 that ships with most new desktop computers (16 bit applications will not run natively on 64 bit Windows, instead the 16 bit applications must be run inside a virtual machine such as Windows XP Mode, or run remotely using remote access software such as a VNC client).

The data contained in the old Superbase databases is critical to the company’s processes, so that data must be salvaged – meaning that the data must be transformed and imported into an Oracle database.  Unlike what a developer would likely create in a relational database, often with multiple tables used to store one “record” in the database, the long departed designer of the Superbase databases used a single row in a single database table to store one “record” in the database.  That organization reminds me a lot of the Microsoft Works package’s database from the early 1990s, with its fancy data entry forms which allowed users to escape the dull spreadsheet-like data entry screen.  Microsoft Excel from the early/mid 1990s could magically transform a dull spreadsheet data entry screen into a simple data entry form, in the process transforming the expensive Microsoft Excel into essentially a cheap database program.  It is a bit more of a challenge to locate the automatic data entry form creator in Excel 2010 than I recall it being in the early/mid 1990s version of Excel, but I suppose that helps further reduce the number of potential data islands:

So, what does the above discussion of Microsoft Excel have to do with anything related to Oracle Database?  The data contained in the Superbase databases must be transformed and inserted into an Oracle database.  It is good news that Superbase is able to export data to Microsoft Excel format.  The bad news is that the exported format is designed to work with Microsoft Excel 2.0 – a very old version of Microsoft Excel that seems to date back to 1987!  Time for a lot of manual data entry if that data must end up in an Oracle Database 11.2.0.x database… unless…

Microsoft Excel 2003 (version 11.0) is able to open Excel 2.0 files… success, even if the success is minor.  Now, how to go about transferring the data from Excel into Oracle Database?  I suppose that I could have created a macro in Microsoft Excel to insert the data into Oracle Database, but at the time I was not interested in writing a macro that accomplished the task “the right way” using bind variables.  And just look at that data – some of the date values were imported as very small (roughly -65000) numbers, there were in some cases nearly 20 different spellings for the same customer name, and there was alpha-numeric text in columns that should be numeric.

So, how did I import the Superbase data that was now in Excel 2003 into the Oracle Database 11.2.0.x database without writing an Excel macro?  The particular computer with Excel 2003 that I was using also had a copy of Access 2003 installed.  Access 2003 is able to create a table “link” to an Excel 2003 spreadsheet’s worksheet, and handle that worksheet essentially the same as if it were a database table.  Now the data is “in” Microsoft Access 2003, but still not in an Oracle database.  Previous experience with this process pays off – before bringing the data into Microsoft Access, type each of the Oracle Database destination table’s column names into the first row of the Microsoft Excel spreadsheet, above the appropriate column’s data.  Importing the data into the Oracle database then becomes a simple four step process (assuming that no other data transformation is necessary; a sketch of the generated append query follows the list):

  1. Link to the Excel spreadsheet’s worksheet and the destination table in the Oracle database.
  2. Create an Access view (stored Query) that selects all of the columns from the Excel worksheet that must be inserted into the Oracle database.
  3. Convert the view (stored Query) type to an Append type and select the linked Oracle Database table as the destination – Access will automatically find the correct destination column in the Oracle table, if the source column name (from the first row in the Excel worksheet) matches the destination column name.
  4. Execute the append type view.
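As a sketch of what Access effectively generates in steps 2 through 4 (the linked table names EXCEL_WORKSHEET and ORA_DEST_TABLE, and the column names, are hypothetical placeholders), the append type query is essentially an INSERT INTO … SELECT:

INSERT INTO ORA_DEST_TABLE (PART_ID, DESCRIPTION, OPERATION_SEQ_NO)
SELECT
  EW.PART_ID,
  EW.DESCRIPTION,
  EW.OPERATION_SEQ_NO
FROM
  EXCEL_WORKSHEET EW;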

A simple transformation of the data from 1994 database technology to 1987, 2003, and then on to 2011 in Oracle Database – and without writing a single line of code.  Remember that problem that I mentioned about alpha-numeric text in columns that should be numeric, such as “10&20” in a column named OPERATION_NUMBER (or OPERATION_SEQ_NO)?  It turns out that that bit of inconsistent data cannot just be thrown out (thanks Microsoft Access 2003).  To fix that problem, I would need to add another column to the Oracle Database table, and then have Microsoft Access update that table using the Microsoft Excel spreadsheet data (fixing the “10&20”, “10 & 20”, “10  &20”, “10 AND 20”, “10,20” and “10, 20” variants into a standard format).  The SQL dialect in Microsoft Access is a bit odd at times, and I could not remember if the odd syntax applies to UPDATE statements also.  As an example of the odd syntax, the simple CREATE TABLE … AS SELECT:

CREATE TABLE
  T2 AS
SELECT
  T1.SID,
  T1.SERIAL,
  T1.USERNAME,
  T1.PROGRAM,
  MIN(T1.CHECK_COUNT) AS CHECK_COUNT_START,
  MAX(T1.CHECK_COUNT) AS CHECK_COUNT_END
FROM
  T1
GROUP BY
  T1.SID,
  T1.SERIAL,
  T1.USERNAME,
  T1.PROGRAM;

must be written like the following in Microsoft Access:

SELECT
  T1.SID,
  T1.SERIAL,
  T1.USERNAME,
  T1.PROGRAM,
  MIN(T1.CHECK_COUNT) AS CHECK_COUNT_START,
  MAX(T1.CHECK_COUNT) AS CHECK_COUNT_END
INTO
  T2
FROM
  T1
GROUP BY
  T1.SID,
  T1.SERIAL,
  T1.USERNAME,
  T1.PROGRAM;

Since I am not 100% confident in my SQL authoring skills in Microsoft Access, how do I move the data from the Excel spreadsheet into the new column of the Oracle Database table… and without writing a single line of programming code?  I simply created a temporary table (not a true temporary table, because the table data must be visible to more than one session) that contained the primary key column and a second column for the non-numeric numeric data.  Once the data was in the temporary Oracle table (using the simple four step process outlined above), I simply executed an UPDATE statement similar to this:

UPDATE
  T1
SET
  NON_NUMERIC_NUMERIC=(
    SELECT
      NON_NUMERIC_NUMERIC
    FROM
      T1_TEMP TT
    WHERE
      T1.PRIMARY_KEY=TT.PRIMARY_KEY)
WHERE
  T1.PRIMARY_KEY IN (
    SELECT
      PRIMARY_KEY
    FROM
      T1_TEMP);
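With the inconsistent values loaded into the new column, something like the following might then standardize the variants – a sketch only, assuming that the new column is named NON_NUMERIC_NUMERIC and using Oracle’s REGEXP_REPLACE function:

UPDATE
  T1
SET
  NON_NUMERIC_NUMERIC=
    REGEXP_REPLACE(
      UPPER(NON_NUMERIC_NUMERIC),
      '[[:space:]]*(&|AND|,)[[:space:]]*',
      ' AND ')
WHERE
  NON_NUMERIC_NUMERIC IS NOT NULL;

After that normalization, all six of the variants listed earlier collapse into the single form 10 AND 20.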

With the data successfully transferred into an Oracle database table, the programming continues.  That brings me to the next article in this series, the internal conflicts of the “best” way to accomplish the programming task.





Parallel Parking – What is causing the Query to be Executed in Parallel?

31 07 2012

July 31, 2012

I encountered an interesting OTN thread that caused me to stop and think for a couple of minutes.  The OP (original poster) in the OTN thread wondered why parallel execution was used for a query when the table accessed by the query, and that table’s indexes, were specified with a parallel degree of 1.  For some reason, while the OP provided DBMS_XPLAN generated execution plan output, the Note section from the output was not included in the OTN post.  With parallel execution enabled in the session (ALTER SESSION ENABLE PARALLEL QUERY;) the execution plan showed a cost of 2,025.  With parallel execution disabled in the session (ALTER SESSION DISABLE PARALLEL QUERY;) the execution plan showed a cost of 36,504.  From this, we can determine that the parallel degree for the query is 20 (36504 / 2025 / 0.9 = 20.03).  The table is partitioned on a date column, apparently partitioned by month.
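That derivation relies on the way the optimizer commonly scales the I/O cost for parallel execution – a simplification, but one that fits the numbers used throughout this article:

PARALLEL COST ≈ SERIAL COST / (DOP * 0.9)

DOP ≈ SERIAL COST / PARALLEL COST / 0.9 = 36504 / 2025 / 0.9 ≈ 20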

Interesting?

Could the parallel execution be caused by the PARALLEL_AUTOMATIC_TUNING parameter’s value?  The DBMS_XPLAN output indicates that the OP is using Oracle Database 11.2.0.2, where that parameter is deprecated.  What about the PARALLEL_DEGREE_POLICY parameter?  When that parameter is set to AUTO, the query optimizer is free to plan a parallel execution for a query, even when the tables (and their indexes) accessed by the query are set to a parallel degree of 1.

A table creation script for a couple of tests (note that the table created in the test case is using INTERVAL partitioning using an approach very similar to that found on page 576 in the book Expert Oracle Database Architecture: Oracle Database Programming 9i, 10g, and 11g Techniques and Solutions, Second Edition except that I have a DATE column in the table rather than a TIMESTAMP column):

DROP TABLE T1 PURGE;

CREATE TABLE T1 (
  C1 NUMBER,
  C2 VARCHAR2(10),
  C3 DATE,
  C4 VARCHAR2(20))
  PARTITION BY RANGE(C3)
  INTERVAL(NUMTOYMINTERVAL(1, 'MONTH'))
  ( PARTITION p0 VALUES LESS THAN (TO_DATE('1-1-2007', 'DD-MM-YYYY')));

INSERT INTO
  T1
SELECT
  ROWNUM,
  RPAD(ROWNUM,10,'A'),
  TRUNC(SYSDATE)+MOD(ROWNUM-1,3000),
  RPAD('A',20,'A')
FROM
  (SELECT
    1 C1
  FROM
    DUAL
  CONNECT BY
    LEVEL<100000),
  (SELECT
    1 C1
  FROM
    DUAL
  CONNECT BY
    LEVEL<1000);

COMMIT; 

Let’s see, a Cartesian join between a rowsource with 1,000 rows and a rowsource with 100,000 rows produces (on 11.2.0.1, 11.2.0.2, and 11.2.0.3):

99899001 rows created. 

Interesting, I might have to investigate that oddity at a later time.  I was expecting 1,000 * 100,000 = 100,000,000 rows.
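The strict less-than comparisons in the two CONNECT BY clauses appear to explain the shortfall: each rowsource stops one row short of the round number, and 99,999 * 999 = 99,899,001.  A quick check of the smaller rowsource:

SELECT COUNT(*) FROM DUAL CONNECT BY LEVEL<1000;

  COUNT(*)
----------
       999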

Let’s try a simple test:

select count(*) from t1;

SET LINESIZE 140
SET PAGESIZE 1000

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'TYPICAL'));

SQL_ID  5bc0v4my7dvr5, child number 0
-------------------------------------
select count(*) from t1

Plan hash value: 2705263620

-------------------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Cost (%CPU)| Time     | Pstart| Pstop |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |       | 59271 (100)|          |       |       |
|   1 |  SORT AGGREGATE      |      |     1 |            |          |       |       |
|   2 |   PARTITION RANGE ALL|      |   115M| 59271   (2)| 00:07:55 |     1 |1048575|
|   3 |    TABLE ACCESS FULL | T1   |   115M| 59271   (2)| 00:07:55 |     1 |1048575|
-------------------------------------------------------------------------------------

Note
-----
   - dynamic sampling used for this statement (level=2)

1,048,575 partitions… I was expecting about 100 partitions because there should be 3,000 distinct dates in column C3 (1,048,575 is the maximum number of partitions permitted for a table, which is apparently what is displayed in the Pstart/Pstop columns when the partition numbers cannot be determined in advance for an interval partitioned table).  Maybe I should find some time to investigate?  Moving on…

To test the theory that the unexpected parallel execution might be caused by a non-default value for the PARALLEL_DEGREE_POLICY parameter, we need to first use the DBMS_RESOURCE_MANAGER.CALIBRATE_IO procedure to indicate performance metrics for the server.  To save time, I found a blog article (by another OakTable Network member) that shows a back-door approach that does not require the CALIBRATE_IO procedure (the usual warnings apply here – we are technically modifying objects in the SYS schema).  The back-door approach, when connected as the SYS user (note that the script first selects from RESOURCE_IO_CALIBRATE$ so that if values are currently present, those values may be restored after testing completes):

SELECT
  *
FROM
  RESOURCE_IO_CALIBRATE$;

DELETE FROM
  RESOURCE_IO_CALIBRATE$;

INSERT INTO
  RESOURCE_IO_CALIBRATE$
VALUES(
  CURRENT_TIMESTAMP,
  CURRENT_TIMESTAMP,
  0,
  0,
  200,
  0,
  0);

COMMIT; 
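For reference, the supported approach uses the DBMS_RESOURCE_MANAGER.CALIBRATE_IO procedure – a sketch follows (the procedure requires asynchronous I/O and TIMED_STATISTICS, the run may take several minutes, and the NUM_PHYSICAL_DISKS value is an assumption that should be adjusted to match the server):

SET SERVEROUTPUT ON

DECLARE
  V_MAX_IOPS        PLS_INTEGER;
  V_MAX_MBPS        PLS_INTEGER;
  V_ACTUAL_LATENCY  PLS_INTEGER;
BEGIN
  DBMS_RESOURCE_MANAGER.CALIBRATE_IO(
    NUM_PHYSICAL_DISKS => 4,   -- assumption: the number of disks backing the database
    MAX_LATENCY        => 20,  -- maximum tolerable latency in milliseconds
    MAX_IOPS           => V_MAX_IOPS,
    MAX_MBPS           => V_MAX_MBPS,
    ACTUAL_LATENCY     => V_ACTUAL_LATENCY);

  DBMS_OUTPUT.PUT_LINE('MAX_IOPS: '||V_MAX_IOPS||
    '  MAX_MBPS: '||V_MAX_MBPS||
    '  ACTUAL_LATENCY: '||V_ACTUAL_LATENCY);
END;
/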

I will start by posting results that I obtained from Oracle Database 11.2.0.3.

Let’s try the previous select from T1 again (note that I am first setting the PARALLEL_DEGREE_POLICY parameter at the session level to its default value):

ALTER SESSION SET PARALLEL_DEGREE_POLICY=MANUAL;

select count(*) from t1;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR('5bc0v4my7dvr5',NULL,'TYPICAL'));

SQL_ID  5bc0v4my7dvr5, child number 0
-------------------------------------
select count(*) from t1

Plan hash value: 2705263620

-------------------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Cost (%CPU)| Time     | Pstart| Pstop |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |       | 59271 (100)|          |       |       |
|   1 |  SORT AGGREGATE      |      |     1 |            |          |       |       |
|   2 |   PARTITION RANGE ALL|      |   115M| 59271   (2)| 00:00:03 |     1 |1048575|
|   3 |    TABLE ACCESS FULL | T1   |   115M| 59271   (2)| 00:00:03 |     1 |1048575|
-------------------------------------------------------------------------------------

Note
-----
   - dynamic sampling used for this statement (level=2)

Let’s try again with a modified value for that parameter:

ALTER SESSION SET PARALLEL_DEGREE_POLICY=AUTO;

select count(*) from t1;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR('5bc0v4my7dvr5',NULL,'TYPICAL'));

SQL_ID  5bc0v4my7dvr5, child number 2
-------------------------------------
select count(*) from t1

Plan hash value: 974985148

------------------------------------------------------------------------------------------------------------------------
| Id  | Operation              | Name     | Rows  | Cost (%CPU)| Time     | Pstart| Pstop |    TQ  |IN-OUT| PQ Distrib |
------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |          |       | 32790 (100)|          |       |       |        |      |            |
|   1 |  SORT AGGREGATE        |          |     1 |            |          |       |       |        |      |            |
|   2 |   PX COORDINATOR       |          |       |            |          |       |       |        |      |            |
|   3 |    PX SEND QC (RANDOM) | :TQ10000 |     1 |            |          |       |       |  Q1,00 | P->S | QC (RAND)  |
|   4 |     SORT AGGREGATE     |          |     1 |            |          |       |       |  Q1,00 | PCWP |            |
|   5 |      PX BLOCK ITERATOR |          |   115M| 32790   (2)| 00:00:02 |     1 |1048575|  Q1,00 | PCWC |            |
|*  6 |       TABLE ACCESS FULL| T1       |   115M| 32790   (2)| 00:00:02 |     1 |1048575|  Q1,00 | PCWP |            |
------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   6 - access(:Z>=:Z AND :Z<=:Z)

Note
-----
   - dynamic sampling used for this statement (level=2)
   - automatic DOP: Computed Degree of Parallelism is 2 

In the above, the Note section indicates that the automatic degree of parallelism was set to 2, and that dynamic sampling was performed at level 2.  I have yet to collect statistics on table T1, so that explains why dynamic sampling was used when the query was executed with and without parallel execution.  If you take a close look at the OTN thread that is linked to at the start of this blog article, you will see that the dynamic sampling level was set to 6, but only for the parallel execution.  The simple answer for this change in behavior is that it is expected, as described in this article by Maria Colgan (another OakTable Network member).  We can confirm that the cost calculation is working as expected: 59271 / 32790 / 0.9 = 2.01.

Let’s try executing the query of table T1 using a WHERE clause that is similar to what is found in the OTN thread:

ALTER SESSION SET PARALLEL_DEGREE_POLICY=MANUAL;

select count(*) from t1
WHERE C3 BETWEEN to_date('01/08/2012', 'dd/mm/yyyy') and to_date('14/10/2012', 'dd/mm/yyyy');

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'TYPICAL +OUTLINE'));

SQL_ID  1ggk1hy0r22kt, child number 0
-------------------------------------
select count(*) from t1 WHERE C3 BETWEEN to_date('01/08/2012',
'dd/mm/yyyy') and to_date('14/10/2012', 'dd/mm/yyyy')

Plan hash value: 2744578615

--------------------------------------------------------------------------------------------------
| Id  | Operation                 | Name | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |      |       |       |   235K(100)|          |       |       |
|   1 |  SORT AGGREGATE           |      |     1 |     9 |            |          |       |       |
|   2 |   PARTITION RANGE ITERATOR|      |  2286K|    19M|   235K(100)| 00:00:10 |    69 |    71 |
|*  3 |    TABLE ACCESS FULL      | T1   |  2286K|    19M|   235K(100)| 00:00:10 |    69 |    71 |
--------------------------------------------------------------------------------------------------

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('11.2.0.3')
      DB_VERSION('11.2.0.3')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$1")
      FULL(@"SEL$1" "T1"@"SEL$1")
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter(("C3">=TO_DATE(' 2012-08-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "C3"<=TO_DATE(' 2012-10-14 00:00:00', 'syyyy-mm-dd hh24:mi:ss')))

Note
-----
   - dynamic sampling used for this statement (level=2) 

Let’s try again, this time with the non-default value for the PARALLEL_DEGREE_POLICY:

ALTER SESSION SET PARALLEL_DEGREE_POLICY=AUTO;

select count(*) from t1
WHERE C3 BETWEEN to_date('01/08/2012', 'dd/mm/yyyy') and to_date('14/10/2012', 'dd/mm/yyyy');

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'TYPICAL +OUTLINE'));

SQL_ID  1ggk1hy0r22kt, child number 1
-------------------------------------
select count(*) from t1 WHERE C3 BETWEEN to_date('01/08/2012',
'dd/mm/yyyy') and to_date('14/10/2012', 'dd/mm/yyyy')

Plan hash value: 2946076746

--------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation              | Name     | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |    TQ  |IN-OUT| PQ Distrib |
--------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |          |       |       |   835K(100)|          |       |       |        |      |            |
|   1 |  SORT AGGREGATE        |          |     1 |     9 |            |          |       |       |        |      |            |
|   2 |   PX COORDINATOR       |          |       |       |            |          |       |       |        |      |            |
|   3 |    PX SEND QC (RANDOM) | :TQ10000 |     1 |     9 |            |          |       |       |  Q1,00 | P->S | QC (RAND)  |
|   4 |     SORT AGGREGATE     |          |     1 |     9 |            |          |       |       |  Q1,00 | PCWP |            |
|   5 |      PX BLOCK ITERATOR |          |   112M|   968M|   835K(100)| 00:00:33 |    69 |    71 |  Q1,00 | PCWC |            |
|*  6 |       TABLE ACCESS FULL| T1       |   112M|   968M|   835K(100)| 00:00:33 |    69 |    71 |  Q1,00 | PCWP |            |
--------------------------------------------------------------------------------------------------------------------------------

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('11.2.0.3')
      DB_VERSION('11.2.0.3')
      ALL_ROWS
      SHARED(16)
      OUTLINE_LEAF(@"SEL$1")
      FULL(@"SEL$1" "T1"@"SEL$1")
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   6 - access(:Z>=:Z AND :Z<=:Z)
       filter(("C3">=TO_DATE(' 2012-08-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "C3"<=TO_DATE(' 2012-10-14
              00:00:00', 'syyyy-mm-dd hh24:mi:ss')))

Note
-----
   - dynamic sampling used for this statement (level=2)
   - automatic DOP: Computed Degree of Parallelism is 16 because of degree limit 

Notice anything strange… really strange about the above output?  The new version of the query is limited to just 3 partitions, yet the calculated cost of the non-parallel execution is approximately 235,000 (much higher than the calculated cost of selecting from the full table without the WHERE clause) – that is just one reason why relying on the calculated cost for performance tuning purposes (as suggested by a couple of books) is potentially misleading.  But wait, the plan using parallel execution with an automatic degree of parallelism of 16 has a plan cost of approximately 835,000 and the optimizer still selected that plan!  I strongly suspect that a query optimizer bug caused a miscalculation of the plan cost; I could not reproduce the odd cost calculation on Oracle Database 11.2.0.1 or 11.2.0.2.  Still suspicious of the cost in the execution plan, at the end of testing I dropped and recreated the table.  After recreating the test table, the following execution plan was generated with a less crazy cost calculation:

SQL_ID  1ggk1hy0r22kt, child number 1
-------------------------------------
select count(*) from t1 WHERE C3 BETWEEN to_date('01/08/2012',
'dd/mm/yyyy') and to_date('14/10/2012', 'dd/mm/yyyy')

Plan hash value: 2946076746

--------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation              | Name     | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |    TQ  |IN-OUT| PQ Distrib |
--------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |          |       |       | 23822 (100)|          |       |       |        |      |            |
|   1 |  SORT AGGREGATE        |          |     1 |     9 |            |          |       |       |        |      |            |
|   2 |   PX COORDINATOR       |          |       |       |            |          |       |       |        |      |            |
|   3 |    PX SEND QC (RANDOM) | :TQ10000 |     1 |     9 |            |          |       |       |  Q1,00 | P->S | QC (RAND)  |
|   4 |     SORT AGGREGATE     |          |     1 |     9 |            |          |       |       |  Q1,00 | PCWP |            |
|   5 |      PX BLOCK ITERATOR |          |  2286K|    19M| 23822 (100)| 00:00:01 |    69 |    71 |  Q1,00 | PCWC |            |
|*  6 |       TABLE ACCESS FULL| T1       |  2286K|    19M| 23822 (100)| 00:00:01 |    69 |    71 |  Q1,00 | PCWP |            |
--------------------------------------------------------------------------------------------------------------------------------

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('11.2.0.3')
      DB_VERSION('11.2.0.3')
      ALL_ROWS
      SHARED(11)
      OUTLINE_LEAF(@"SEL$1")
      FULL(@"SEL$1" "T1"@"SEL$1")
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   6 - access(:Z>=:Z AND :Z<=:Z)
       filter(("C3">=TO_DATE(' 2012-08-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "C3"<=TO_DATE(' 2012-10-14
              00:00:00', 'syyyy-mm-dd hh24:mi:ss')))

Note
-----
   - dynamic sampling used for this statement (level=2)
   - automatic DOP: Computed Degree of Parallelism is 11 

Let’s double-check the calculated cost in the parallel execution plan: 235,000 / 23822 / 0.9 = 10.96 – that calculated cost is consistent with the Computed Degree of Parallelism.

I am curious about whether or not dynamic sampling might affect just the parallel execution.  Let’s collect statistics on table T1 with 100% sampling (note that it could take a long time for this statistics collection to complete):

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1',ESTIMATE_PERCENT=>NULL)

Let’s try our query again (note that I have forced a hard parse of the SQL statement by changing the capitalization of a couple parts of the SQL statement):

ALTER SESSION SET PARALLEL_DEGREE_POLICY=MANUAL;

SELECT COUNT(*) from t1
WHERE C3 BETWEEN to_date('01/08/2012', 'dd/mm/yyyy') and to_date('14/10/2012', 'dd/mm/yyyy');

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'TYPICAL +OUTLINE'));

SQL_ID  2q2uz15vyzqsy, child number 1
-------------------------------------
SELECT COUNT(*) from t1 WHERE C3 BETWEEN to_date('01/08/2012',
'dd/mm/yyyy') and to_date('14/10/2012', 'dd/mm/yyyy')

Plan hash value: 2744578615

--------------------------------------------------------------------------------------------------
| Id  | Operation                 | Name | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |      |       |       |  1816 (100)|          |       |       |
|   1 |  SORT AGGREGATE           |      |     1 |     8 |            |          |       |       |
|   2 |   PARTITION RANGE ITERATOR|      |  2531K|    19M|  1816   (3)| 00:00:01 |    69 |    71 |
|*  3 |    TABLE ACCESS FULL      | T1   |  2531K|    19M|  1816   (3)| 00:00:01 |    69 |    71 |
--------------------------------------------------------------------------------------------------

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('11.2.0.3')
      DB_VERSION('11.2.0.3')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$1")
      FULL(@"SEL$1" "T1"@"SEL$1")
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter(("C3"<=TO_DATE(' 2012-10-14 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "C3">=TO_DATE(' 2012-08-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')))

Notice in the above that the execution plan for the non-parallel execution has a calculated cost that decreased from 235,000 to just 1,816, even though the same Pstart (69) and Pstop (71) columns are listed, and approximately the same estimated number of rows (~2,286,000 vs ~2,531,000) are expected to be returned.  So, if you are attempting to tune performance by simply reducing the calculated cost of a query (not a valid approach), you might be led to believe that collecting statistics improved the performance of this query by a factor of 129.4 (the performance did improve a little due to dynamic sampling no longer being required, but the performance did not improve nearly that much).

Let’s try the query again with a modified value for the PARALLEL_DEGREE_POLICY parameter:

ALTER SESSION SET PARALLEL_DEGREE_POLICY=AUTO;

select COUNT(*) from t1
WHERE C3 BETWEEN to_date('01/08/2012', 'dd/mm/yyyy') and to_date('14/10/2012', 'dd/mm/yyyy');

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'TYPICAL +OUTLINE'));

SQL_ID  2q2uz15vyzqsy, child number 0
-------------------------------------
SELECT COUNT(*) from t1 WHERE C3 BETWEEN to_date('01/08/2012',
'dd/mm/yyyy') and to_date('14/10/2012', 'dd/mm/yyyy')

Plan hash value: 2744578615

--------------------------------------------------------------------------------------------------
| Id  | Operation                 | Name | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |      |       |       |  1816 (100)|          |       |       |
|   1 |  SORT AGGREGATE           |      |     1 |     8 |            |          |       |       |
|   2 |   PARTITION RANGE ITERATOR|      |  2531K|    19M|  1816   (3)| 00:00:01 |    69 |    71 |
|*  3 |    TABLE ACCESS FULL      | T1   |  2531K|    19M|  1816   (3)| 00:00:01 |    69 |    71 |
--------------------------------------------------------------------------------------------------

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('11.2.0.3')
      DB_VERSION('11.2.0.3')
      ALL_ROWS
      NO_PARALLEL
      OUTLINE_LEAF(@"SEL$1")
      FULL(@"SEL$1" "T1"@"SEL$1")
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter(("C3"<=TO_DATE(' 2012-10-14 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "C3">=TO_DATE(' 2012-08-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')))

Note
-----
   - automatic DOP: Computed Degree of Parallelism is 1 because of parallel threshold

The Note section indicates that parallel execution was considered, but not used.  So, after collecting statistics, the parallel degree for this query decreased from 16 (or 11 in the non-bug affected execution plan) to 1 – a non-parallel execution.

The above output was generated with Oracle Database 11.2.0.3 using the following system (CPU) statistics:

COLUMN PNAME FORMAT A14
COLUMN PVAL2 FORMAT A20

SELECT
  PNAME,
  PVAL1,
  PVAL2
FROM
  SYS.AUX_STATS$;

PNAME               PVAL1 PVAL2
-------------- ---------- ----------------
STATUS                    COMPLETED
DSTART                    07-11-2012 15:11
DSTOP                     07-11-2012 15:11
FLAGS                   1
CPUSPEEDNW     1720.20725
IOSEEKTIM              10
IOTFRSPEED           4096
SREADTIM                8
MREADTIM               10
CPUSPEED             2664
MBRC                   16
MAXTHR           19181568
SLAVETHR

While experimenting with Oracle Database 11.2.0.2, I used the following system (CPU) statistics:

COLUMN PNAME FORMAT A14
COLUMN PVAL2 FORMAT A20

SELECT
  PNAME,
  PVAL1,
  PVAL2
FROM
  SYS.AUX_STATS$;

PNAME               PVAL1 PVAL2
-------------- ---------- ----------------
STATUS                    COMPLETED
DSTART                    02-29-2012 19:07
DSTOP                     02-29-2012 19:07
FLAGS                   1
CPUSPEEDNW     2116.57559
IOSEEKTIM              10
IOTFRSPEED           4096
SREADTIM               .4
MREADTIM               .8
CPUSPEED             1922
MBRC                   16
MAXTHR          155910144
SLAVETHR

11.2.0.2 showed the following execution plan after collecting statistics for table T1 and setting the PARALLEL_DEGREE_POLICY parameter to AUTO:

SQL_ID  81dw7f2yz0jxp, child number 1
-------------------------------------
select COUNT(*) from t1 WHERE C3 BETWEEN to_date('01/08/2012',
'dd/mm/yyyy') and to_date('14/10/2012', 'dd/mm/yyyy')

Plan hash value: 2744578615

--------------------------------------------------------------------------------------------------
| Id  | Operation                 | Name | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |      |       |       |  4082 (100)|          |       |       |
|   1 |  SORT AGGREGATE           |      |     1 |     8 |            |          |       |       |
|   2 |   PARTITION RANGE ITERATOR|      |  2531K|    19M|  4082  (29)| 00:00:01 |    69 |    71 |
|*  3 |    TABLE ACCESS FULL      | T1   |  2531K|    19M|  4082  (29)| 00:00:01 |    69 |    71 |
--------------------------------------------------------------------------------------------------

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('11.2.0.2')
      DB_VERSION('11.2.0.2')
      ALL_ROWS
      NO_PARALLEL
      OUTLINE_LEAF(@"SEL$1")
      FULL(@"SEL$1" "T1"@"SEL$1")
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter(("C3"<=TO_DATE(' 2012-10-14 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "C3">=TO_DATE(' 2012-08-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')))

Note
-----
   - automatic DOP: Computed Degree of Parallelism is 1 because of parallel threshold

In the above execution plan, notice that the calculated cost is 4,082 (this is higher than the calculated cost returned by Oracle Database 11.2.0.3).  The SREADTIM and MREADTIM system statistics were manually set to very small numbers – due to a bug in Oracle Database 11.2.0.1 and 11.2.0.2 these system statistics could be set to much larger values, such as 2,491 and 9,163, respectively.  What might happen to the calculated cost if we were to increase the SREADTIM and MREADTIM system statistics by a factor of 10?  Such a change would indicate to the query optimizer that each disk access will require 10 times longer to complete.  Should the calculated cost of this query increase or decrease after changing the system statistics?

First, let’s manually alter the value of the system statistics:

EXEC DBMS_STATS.SET_SYSTEM_STATS('SREADTIM',4)
EXEC DBMS_STATS.SET_SYSTEM_STATS('MREADTIM',8)

Next, execute the query and display the execution plan (notice that capitalization of the word SELECT was changed to force a hard parse of the query):

ALTER SESSION SET PARALLEL_DEGREE_POLICY=AUTO;

SELECT COUNT(*) from t1
WHERE C3 BETWEEN to_date('01/08/2012', 'dd/mm/yyyy') and to_date('14/10/2012', 'dd/mm/yyyy');

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'TYPICAL +OUTLINE'));

SQL_ID  2q2uz15vyzqsy, child number 1
-------------------------------------
SELECT COUNT(*) from t1 WHERE C3 BETWEEN to_date('01/08/2012',
'dd/mm/yyyy') and to_date('14/10/2012', 'dd/mm/yyyy')

Plan hash value: 2744578615

--------------------------------------------------------------------------------------------------
| Id  | Operation                 | Name | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |      |       |       |  3020 (100)|          |       |       |
|   1 |  SORT AGGREGATE           |      |     1 |     8 |            |          |       |       |
|   2 |   PARTITION RANGE ITERATOR|      |  2531K|    19M|  3020   (4)| 00:00:01 |    69 |    71 |
|*  3 |    TABLE ACCESS FULL      | T1   |  2531K|    19M|  3020   (4)| 00:00:01 |    69 |    71 |
--------------------------------------------------------------------------------------------------

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('11.2.0.2')
      DB_VERSION('11.2.0.2')
      ALL_ROWS
      NO_PARALLEL
      OUTLINE_LEAF(@"SEL$1")
      FULL(@"SEL$1" "T1"@"SEL$1")
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter(("C3"<=TO_DATE(' 2012-10-14 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "C3">=TO_DATE(' 2012-08-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')))

Note
-----
   - automatic DOP: Computed Degree of Parallelism is 1 because of parallel threshold

Notice the changes in the execution plan?  The calculated cost shown in the execution plan decreased from 4,082 to 3,020, and the %CPU for the third line in the execution plan decreased from 29 to 4.  Why did the calculated cost decrease when the optimizer is informed that disk accesses require 10 times as long to complete?  For the answer, take a look at this blog article written by Randolf Geist (another OakTable Network member).
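A plausible reading, based on the commonly published form of the optimizer’s CPU costing formula (a simplification, so treat the arithmetic as approximate):

COST ≈ #MRds * (MREADTIM / SREADTIM) + #CPUCycles / (CPUSPEED * SREADTIM)

The I/O component depends only on the MREADTIM / SREADTIM ratio, which is unchanged by the adjustment (0.8 / 0.4 = 8 / 4 = 2), while the CPU component is divided by SREADTIM and therefore shrinks by a factor of 10.  Working backward from the 11.2.0.2 plans: 29% of 4,082 is roughly 1,184 (the CPU component), leaving roughly 2,898 (the I/O component); after the change, 2,898 + 1,184 / 10 ≈ 3,016, close to the reported cost of 3,020, and the %CPU becomes approximately 118 / 3,020 ≈ 4.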

OK, so I drifted a bit from the focus of the OTN message thread that provoked this article.  What do you think caused parallel execution to be used when the OP executed the query found in the OTN thread?







