October 22, 2012
Lately, I have been busy with a couple of items that are not completely Oracle Database specific. I tried experimenting with high definition 1080p security cameras. After wasting an unacceptable amount of money to purchase two Y-Cam Wireless High Definition 1080p video cameras (model YCBLHD6), and spending in excess of 38 hours fighting with the cameras to make them work as advertised, I wrote a video review of the product. After writing a review, Y-Cam’s tech support representative contacted me again, so I pulled the cameras back out of the box and wasted another five hours trying to make the cameras work wirelessly – a feature that was lost after installing the third firmware version on the cameras. The cameras magically connected wirelessly very early the next morning – but wireless communiation was only one of several problems that I was experiencing with this model of security camera. What might cause a procedure to work magically when that same procedure failed several times in the past – and might this somehow have an odd relation to Oracle Database? This is the explanation that I provided to Y-Cam’s tech support representative and posted as a follow up on Y-Cam’s support forum:
The camera’s Ethernet port and wireless adapter both have the same hardcoded MAC address. This MAC address is used by network switches to determine which devices are attached to each of the switch ports so that network traffic addressed to a specific MAC address is only sent down the switch port where that MAC address is known to be connected. By contrast a network hub broadcasts network traffic to all ports, rather than just the port to which a hardware device’s MAC address is attached; a hub is less likely to have an issue with two different network adapters having the same MAC address (Y-CAM support reported that they confirmed that the camera with the latest firmware worked with a non-broadcasted WPA2 protected wireless network). The network switch maintains a MAC address table for each switch port – when the camera’s Ethernet port is disconnected and the camera is power cycled, the switch’s MAC address tables may not be immediately updated to remove the camera’s MAC address from the wired Ethernet port so that the MAC address may be added to the switch port used for wireless connectivity. Power cycling the network switch clears the switch’s MAC address tables, allowing the camera’s MAC address to be added to the switch port used for wireless connectivity.
In short, it is necessary to power cycle the switch (or the Cisco Linksys E2000 router acting as only a wireless access point with integrated 4 port gigabit switch in my case) when moving the camera from a wired Ethernet connection to a wireless connection.
Makes a bit of sense… now if only motion detection worked as advertised and stability improved to the point that the camera does not require two or three reboots per day. Thankfully, most other consumer products tend to work out of the box; imagine the ramifications if every time the weather conditions included rain, a vehicle would sputter, shake violently, and reduce power to 10% of normal when the driver attempted to accelerate.
—
Changing gears a bit. You might have noticed that the title of this article appears to be some sort of error message, probably produced by an Oracle product. I am in the process of writing a program that interfaces with an Oracle Database, rapidly retrieving a variety of information from various database tables – using bind variables in the SQL statements, of course:
The problem? My program seems to crash randomly on computers running either the 32 bit or 64 bit version of Windows 7 (the program works without issue on computers running the 32 bit version of Windows XP). The randomness of the crashes makes it a bit difficult to troubleshoot this particular problem, but the vast majority of the crashes are associated with populating the treeview control. Is the crash caused by a typo (bug) in my program logic? Is the crash caused by the Windows 7 version of the treeview control? Is the crash caused by Oracle Database 11.2.0.2, as a result of receiving several different SQL statements in rapid-fire fashion? Could there be another explanation?
To start the investigation, I first enabled a 10046 trace at level 12. I then entered an employee ID and started selecting different date ranges until the program crashed. The crash message varied (as did the point at which the crash happened), and while Windows offered to contact the developer to report the error, I declined the offer. Unfortunately, I did not capture any of the exact error messages, although I did open one of the debug files created during a crash – “Unhandled exception at 0x0fc01b3d”:
mov cx, word ptr [edi] – Thanks Windows, I think that I will go back to the safety of the 10046 trace files (my best guess is that this instruction is attempting to copy a value from a variable memory pointer).
–
Digging into the 10046 trace file, I found that this was the last SQL statement executed by my program before the crash:
CLOSE #1301509800:c=0,e=10,dep=0,type=1,tim=4032143047926 ===================== PARSING IN CURSOR #1301508512 len=1452 dep=0 uid=172 oct=3 lid=172 tim=4032143048002 hv=929599856 ad='3eb8207a0' sqlid='4y6m7dsvqj4bh' SELECT EMPLOYEE_ID, SHIFT_DATE, CLOCK_IN, CLOCK_OUT, HOURS_WORKED, HOURS_VACATION, HOURS_HOLIDAY, HOURS_BEREAVEMENT, HOURS_JURY_DUTY, DECODE(SHIFT_DATE_CODE,'VAC',1,0) VACATION, DECODE(SHIFT_DATE_CODE,'HOL',1,0) HOLIDAY, DECODE(SHIFT_DATE_CODE,'BER',1,0) BEREAVEMENT, DECODE(SHIFT_DATE_CODE,'JUR',1,0) JURY, DECODE(SHIFT_DATE_CODE,'ABS',1,0) ABSENT, DECODE(SHIFT_DATE_CODE,'EXC',1,0) EXCUSED, SUM(DECODE(SHIFT_DATE_CODE,'VAC',1,0)) OVER (PARTITION BY EMPLOYEE_ID) VACATION_COUNT, SUM(HOURS_VACATION) OVER (PARTITION BY EMPLOYEE_ID) VACATION_HOURS, SUM(DECODE(SHIFT_DATE_CODE,'HOL',1,0)) OVER (PARTITION BY EMPLOYEE_ID) HOLIDAY_COUNT, SUM(HOURS_HOLIDAY) OVER (PARTITION BY EMPLOYEE_ID) HOLIDAY_HOURS, SUM(DECODE(SHIFT_DATE_CODE,'BER',1,0)) OVER (PARTITION BY EMPLOYEE_ID) BEREAVEMENT_COUNT, SUM(HOURS_BEREAVEMENT) OVER (PARTITION BY EMPLOYEE_ID) BEREAVEMENT_HOURS, SUM(DECODE(SHIFT_DATE_CODE,'JUR',1,0)) OVER (PARTITION BY EMPLOYEE_ID) JURY_COUNT, SUM(HOURS_JURY_DUTY) OVER (PARTITION BY EMPLOYEE_ID) JURY_HOURS, SUM(DECODE(SHIFT_DATE_CODE,'ABS',1,0)) OVER (PARTITION BY EMPLOYEE_ID) ABSENT_COUNT, SUM(DECODE(SHIFT_DATE_CODE,'EXC',1,0)) OVER (PARTITION BY EMPLOYEE_ID) EXCUSED_COUNT FROM PAY_HOURS WHERE EMPLOYEE_ID= :1 AND SHIFT_DATE BETWEEN :2 AND :3 AND COALESCE(SHIFT_DATE_CODE,'ZZZ')<>'NOL' ORDER BY COALESCE(SHIFT_DATE_CODE,'ZZZ'), SHIFT_DATE END OF STMT PARSE #1301508512:c=0,e=42,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=2997266572,tim=4032143048001 BINDS #1301508512: Bind#0 oacdty=96 mxl=32(06) mxlc=00 mal=00 scl=00 pre=00 oacflg=00 fl2=1000010 frm=01 csi=178 siz=48 off=0 kxsbbbfp=4d936d40 bln=32 avl=06 flg=05 value="DUDE" Bind#1 oacdty=12 mxl=07(07) mxlc=00 mal=00 scl=00 pre=00 oacflg=00 fl2=1000010 frm=00 csi=178 siz=0 off=32 kxsbbbfp=4d936d60 bln=07 avl=07 flg=01 value="10/19/2012 0:0:0" Bind#2 oacdty=12 mxl=07(07) mxlc=00 mal=00 scl=00 pre=00 oacflg=00 fl2=1000010 frm=00 csi=178 siz=0 off=40 kxsbbbfp=4d936d68 bln=07 avl=07 flg=01 value="10/19/2012 0:0:0" EXEC #1301508512:c=0,e=129,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=2997266572,tim=4032143048323 WAIT #1301508512: nam='SQL*Net message to client' ela= 2 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=4032143048369 FETCH #1301508512:c=0,e=33,p=0,cr=3,cu=0,mis=0,r=0,dep=0,og=1,plh=2997266572,tim=4032143048448 STAT #1301508512 id=1 cnt=0 pid=0 pos=1 obj=0 op='SORT ORDER BY (cr=3 pr=0 pw=0 time=43 us cost=6 size=50 card=1)' STAT #1301508512 id=2 cnt=0 pid=1 pos=1 obj=0 op='WINDOW BUFFER (cr=3 pr=0 pw=0 time=34 us cost=6 size=50 card=1)' STAT #1301508512 id=3 cnt=0 pid=2 pos=1 obj=0 op='FILTER (cr=3 pr=0 pw=0 time=21 us)' STAT #1301508512 id=4 cnt=0 pid=3 pos=1 obj=70154 op='TABLE ACCESS BY INDEX ROWID PAY_HOURS (cr=3 pr=0 pw=0 time=20 us cost=4 size=50 card=1)' STAT #1301508512 id=5 cnt=0 pid=4 pos=1 obj=71786 op='INDEX RANGE SCAN SYS_C0022057 (cr=3 pr=0 pw=0 time=20 us cost=3 size=0 card=1)' WAIT #1301508512: nam='SQL*Net message from client' ela= 1448 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=4032143050020 PARSE #1301509800:c=0,e=20,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=266456858,tim=4032143050081 WAIT #1301509800: nam='SQL*Net message to client' ela= 2 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=4032143050142 *** 2012-10-19 09:43:33.522 WAIT #1301509800: nam='SQL*Net message from client' ela= 5761077 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=4032148811261
Great, I will take a closer look at the part of my program that executes that SQL statement. But wait, someone famous once said: “Insanity is repeating the same mistakes and expecting different results.” Let’s run my program again with a 10046 trace at level 12 enabled. This time, I found that the following was the last SQL statement executed by my program before the crash:
PARSING IN CURSOR #1078422680 len=1257 dep=0 uid=172 oct=3 lid=172 tim=4031964081030 hv=4064672182 ad='3a68828c8' sqlid='4cj96t7t4bydq' SELECT LT.TRANSACTION_ID, WO.PART_ID, LT.WORKORDER_BASE_ID, LT.WORKORDER_LOT_ID, LT.WORKORDER_SUB_ID, LT.OPERATION_SEQ_NO, LT.TYPE, LT.INDIRECT_ID, LT.RESOURCE_ID, LT.CLOCK_IN, NVL2(LT.HOURS_WORKED,LT.CLOCK_OUT,NULL) CLOCK_OUT, NVL(LT.HOURS_WORKED,ROUND((SYSDATE-LT.CLOCK_IN)*24,2)) HOURS_WORKED, LT.GOOD_QTY, COUNT(*) OVER (PARTITION BY LT.EMPLOYEE_ID) LABOR_TICKETS, SUM(NVL(LT.HOURS_WORKED,ROUND((SYSDATE-LT.CLOCK_IN)*24,2))) OVER (PARTITION BY LT.EMPLOYEE_ID) TOTAL_HOURS, SUM(DECODE(LT.TYPE,'I',NVL(LT.HOURS_WORKED,ROUND((SYSDATE-LT.CLOCK_IN)*24,2)),0)) OVER (PARTITION BY LT.EMPLOYEE_ID) TOTAL_INDIRECT_HOURS, SUM(DECODE(LT.TYPE,'S',NVL(LT.HOURS_WORKED,ROUND((SYSDATE-LT.CLOCK_IN)*24,2)),0)) OVER (PARTITION BY LT.EMPLOYEE_ID) TOTAL_SETUP_HOURS, SUM(DECODE(LT.TYPE,'R',NVL(LT.HOURS_WORKED,ROUND((SYSDATE-LT.CLOCK_IN)*24,2)),0)) OVER (PARTITION BY LT.EMPLOYEE_ID) TOTAL_RUN_HOURS FROM LABOR_TICKET LT, WORK_ORDER WO WHERE LT.EMPLOYEE_ID= :1 AND LT.SHIFT_DATE BETWEEN :2 AND :3 AND WO.TYPE(+)='W' AND WO.SUB_ID(+)='0' AND LT.WORKORDER_TYPE=WO.TYPE(+) AND LT.WORKORDER_BASE_ID=WO.BASE_ID(+) AND LT.WORKORDER_LOT_ID=WO.LOT_ID(+) AND LT.WORKORDER_SPLIT_ID=WO.SPLIT_ID(+) END OF STMT PARSE #1078422680:c=0,e=51,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3199062715,tim=4031964081030 BINDS #1078422680: Bind#0 oacdty=96 mxl=32(06) mxlc=00 mal=00 scl=00 pre=00 oacflg=00 fl2=1000010 frm=01 csi=178 siz=48 off=0 kxsbbbfp=40476838 bln=32 avl=06 flg=05 value="DUDE" Bind#1 oacdty=12 mxl=07(07) mxlc=00 mal=00 scl=00 pre=00 oacflg=00 fl2=1000010 frm=00 csi=178 siz=0 off=32 kxsbbbfp=40476858 bln=07 avl=07 flg=01 value="10/19/2012 0:0:0" Bind#2 oacdty=12 mxl=07(07) mxlc=00 mal=00 scl=00 pre=00 oacflg=00 fl2=1000010 frm=00 csi=178 siz=0 off=40 kxsbbbfp=40476860 bln=07 avl=07 flg=01 value="10/19/2012 0:0:0" EXEC #1078422680:c=0,e=188,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=4162394183,tim=4031964081369 WAIT #1078422680: nam='SQL*Net message to client' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=4031964081423 FETCH #1078422680:c=0,e=79,p=0,cr=4,cu=0,mis=0,r=1,dep=0,og=1,plh=4162394183,tim=4031964081539 STAT #1078422680 id=1 cnt=1 pid=0 pos=1 obj=0 op='WINDOW BUFFER (cr=4 pr=0 pw=0 time=81 us cost=9 size=101 card=1)' STAT #1078422680 id=2 cnt=1 pid=1 pos=1 obj=0 op='FILTER (cr=4 pr=0 pw=0 time=35 us)' STAT #1078422680 id=3 cnt=1 pid=2 pos=1 obj=0 op='NESTED LOOPS OUTER (cr=4 pr=0 pw=0 time=33 us cost=8 size=101 card=1)' STAT #1078422680 id=4 cnt=1 pid=3 pos=1 obj=69863 op='TABLE ACCESS BY INDEX ROWID LABOR_TICKET (cr=4 pr=0 pw=0 time=28 us cost=7 size=73 card=1)' STAT #1078422680 id=5 cnt=1 pid=4 pos=1 obj=70966 op='INDEX RANGE SCAN IND_LT_SHIFT_DATE (cr=3 pr=0 pw=0 time=21 us cost=6 size=0 card=1)' STAT #1078422680 id=6 cnt=0 pid=3 pos=2 obj=70072 op='TABLE ACCESS BY INDEX ROWID WORK_ORDER (cr=0 pr=0 pw=0 time=3 us cost=1 size=28 card=1)' STAT #1078422680 id=7 cnt=0 pid=6 pos=1 obj=71722 op='INDEX UNIQUE SCAN SYS_C0021989 (cr=0 pr=0 pw=0 time=2 us cost=0 size=0 card=1)' *** 2012-10-19 09:40:38.424 WAIT #1078422680: nam='SQL*Net message from client' ela= 9641490 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=4031973723245
A different SQL statement was found in the trace file when my program crashed… do the SQL statements have anything in common? Both SQL statements are using analytic functions, and both are using date-type bind variables. Interesting… what was that saying about insanity? Let’s try another test of my program with an enabled 10046 trace to see the last SQL statement found in the trace before the crash:
PARSING IN CURSOR #881825936 len=210 dep=1 uid=0 oct=3 lid=0 tim=4030801312382 hv=864012087 ad='3eddb9a08' sqlid='96g93hntrzjtr' select /*+ rule */ bucket_cnt, row_cnt, cache_cnt, null_cnt, timestamp#, sample_size, minimum, maximum, distcnt, lowval, hival, density, col#, spare1, spare2, avgcln from hist_head$ where obj#=:1 and intcol#=:2 END OF STMT PARSE #881825936:c=0,e=46,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=3,plh=2239883476,tim=4030801312381 BINDS #881825936: Bind#0 oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00 oacflg=00 fl2=0001 frm=00 csi=00 siz=48 off=0 kxsbbbfp=348f64a0 bln=22 avl=04 flg=05 value=166995 Bind#1 oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00 oacflg=00 fl2=0001 frm=00 csi=00 siz=0 off=24 kxsbbbfp=348f64b8 bln=22 avl=02 flg=01 value=2 EXEC #881825936:c=0,e=95,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=3,plh=2239883476,tim=4030801312551 WAIT #881825936: nam='db file sequential read' ela= 6526 file#=1 block#=98252 blocks=1 obj#=427 tim=4030801319112 WAIT #881825936: nam='db file sequential read' ela= 759 file#=1 block#=100271 blocks=1 obj#=425 tim=4030801319923 FETCH #881825936:c=0,e=7396,p=2,cr=3,cu=0,mis=0,r=1,dep=1,og=3,plh=2239883476,tim=4030801319963 STAT #881825936 id=1 cnt=1 pid=0 pos=1 obj=425 op='TABLE ACCESS BY INDEX ROWID HIST_HEAD$ (cr=3 pr=2 pw=0 time=7394 us)' STAT #881825936 id=2 cnt=1 pid=1 pos=1 obj=427 op='INDEX RANGE SCAN I_HH_OBJ#_INTCOL# (cr=2 pr=1 pw=0 time=6583 us)' CLOSE #881825936:c=0,e=46,dep=1,type=3,tim=4030801320046 ... BINDS #881825936: Bind#0 oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00 oacflg=00 fl2=0001 frm=00 csi=00 siz=48 off=0 kxsbbbfp=348f64a0 bln=22 avl=04 flg=05 value=166995 Bind#1 oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00 oacflg=00 fl2=0001 frm=00 csi=00 siz=0 off=24 kxsbbbfp=348f64b8 bln=22 avl=02 flg=01 value=11 EXEC #881825936:c=0,e=90,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=3,plh=2239883476,tim=4030801321733 FETCH #881825936:c=0,e=19,p=0,cr=3,cu=0,mis=0,r=1,dep=1,og=3,plh=2239883476,tim=4030801321767 CLOSE #881825936:c=0,e=10,dep=1,type=3,tim=4030801321802 BINDS #881825936: Bind#0 oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00 oacflg=00 fl2=0001 frm=00 csi=00 siz=48 off=0 kxsbbbfp=348f64a0 bln=22 avl=04 flg=05 value=166995 Bind#1 oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00 oacflg=00 fl2=0001 frm=00 csi=00 siz=0 off=24 kxsbbbfp=348f64b8 bln=22 avl=02 flg=01 value=33 EXEC #881825936:c=0,e=152,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=3,plh=2239883476,tim=4030801321992 FETCH #881825936:c=0,e=19,p=0,cr=3,cu=0,mis=0,r=1,dep=1,og=3,plh=2239883476,tim=4030801322027 CLOSE #881825936:c=0,e=10,dep=1,type=3,tim=4030801322062 BINDS #881825936: Bind#0 oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00 oacflg=00 fl2=0001 frm=00 csi=00 siz=48 off=0 kxsbbbfp=348f64a0 bln=22 avl=04 flg=05 value=166995 Bind#1 oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00 oacflg=00 fl2=0001 frm=00 csi=00 siz=0 off=24 kxsbbbfp=348f64b8 bln=22 avl=02 flg=01 value=35 EXEC #881825936:c=0,e=89,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=3,plh=2239883476,tim=4030801322195 FETCH #881825936:c=0,e=18,p=0,cr=3,cu=0,mis=0,r=1,dep=1,og=3,plh=2239883476,tim=4030801322229 CLOSE #881825936:c=0,e=10,dep=1,type=3,tim=4030801322265 WAIT #881751720: nam='SQL*Net message to client' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=4030801323194 *** 2012-10-19 09:21:14.806 WAIT #881751720: nam='SQL*Net message from client' ela= 8851219 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=4030810174461
You might have noticed that the trace file showed dep=1 in the PARSING IN CURSOR line, which indicates that the SQL statement was executed automatically outside of my program’s control – likely when attempting to optimize another SQL statement that was submitted by my program.
Let’s test that insanity theory again. Another execution of my program with an enabled 10046 trace at level 12 showed this as the last SQL statement executed:
PARSING IN CURSOR #767713824 len=503 dep=0 uid=172 oct=3 lid=172 tim=4026166358219 hv=2534735629 ad='3b58041b0' sqlid='7x2t5dqbj9zsd' SELECT N.ID NCMT_ID, N.NCMT_DATE, N.PART_ID, N.WORKORDER_BASE_ID, N.WORKORDER_LOT_ID, N.WORKORDER_PIECE_NO, N.QTY, N.DISCREPANCY_TYPE, N.RESOURCE_ID, N.CUSTOMER_ID, NVL(N.TOTAL_COST,0) NCMT_COST, COUNT(N.ID) OVER (PARTITION BY N.EMPLOYEE_ID) NCMT_COUNT, SUM(NVL(N.TOTAL_COST,0)) OVER (PARTITION BY N.EMPLOYEE_ID) TOTAL_COST FROM NON_CONFORMING N WHERE N.EMPLOYEE_ID= :1 AND N.NCMT_DATE BETWEEN :2 AND :3 AND UPPER(N.ERROR_BY)='OPERATOR' ORDER BY ID END OF STMT PARSE #767713824:c=0,e=41,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3960464466,tim=4026166358219 BINDS #767713824: Bind#0 oacdty=96 mxl=32(06) mxlc=00 mal=00 scl=00 pre=00 oacflg=00 fl2=1000010 frm=01 csi=178 siz=48 off=0 kxsbbbfp=2dc25d58 bln=32 avl=06 flg=05 value="DUDE" Bind#1 oacdty=12 mxl=07(07) mxlc=00 mal=00 scl=00 pre=00 oacflg=00 fl2=1000010 frm=00 csi=178 siz=0 off=32 kxsbbbfp=2dc25d78 bln=07 avl=07 flg=01 value="10/19/2012 0:0:0" Bind#2 oacdty=12 mxl=07(07) mxlc=00 mal=00 scl=00 pre=00 oacflg=00 fl2=1000010 frm=00 csi=178 siz=0 off=40 kxsbbbfp=2dc25d80 bln=07 avl=07 flg=01 value="10/19/2012 0:0:0" ... PARSE #767713824:c=0,e=19,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3960464466,tim=4026237045440 BINDS #767713824: Bind#0 oacdty=96 mxl=32(07) mxlc=00 mal=00 scl=00 pre=00 oacflg=00 fl2=1000010 frm=01 csi=178 siz=48 off=0 kxsbbbfp=2dc21f50 bln=32 avl=07 flg=05 value="DUDE2" Bind#1 oacdty=12 mxl=07(07) mxlc=00 mal=00 scl=00 pre=00 oacflg=00 fl2=1000010 frm=00 csi=178 siz=0 off=32 kxsbbbfp=2dc21f70 bln=07 avl=07 flg=01 value="6/22/2012 0:0:0" Bind#2 oacdty=12 mxl=07(07) mxlc=00 mal=00 scl=00 pre=00 oacflg=00 fl2=1000010 frm=00 csi=178 siz=0 off=40 kxsbbbfp=2dc21f78 bln=07 avl=07 flg=01 value="10/19/2012 0:0:0" EXEC #767713824:c=0,e=125,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3960464466,tim=4026237045593 WAIT #767713824: nam='SQL*Net message to client' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=71274 tim=4026237045634 FETCH #767713824:c=0,e=57,p=0,cr=10,cu=0,mis=0,r=0,dep=0,og=1,plh=3960464466,tim=4026237045727 WAIT #767713824: nam='SQL*Net message from client' ela= 3080 driver id=1413697536 #bytes=1 p3=0 obj#=71274 tim=4026237048857 CLOSE #767713824:c=0,e=7,dep=0,type=3,tim=4026237048899 PARSE #767713824:c=0,e=18,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3960464466,tim=4026237048951 WAIT #767713824: nam='SQL*Net message to client' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=71274 tim=4026237051177 *** 2012-10-19 08:06:40.151 WAIT #767713824: nam='SQL*Net message from client' ela= 98732762 driver id=1413697536 #bytes=1 p3=0 obj#=71274 tim=4026335783988
Well, that SQL statement also used date-type bind variables and analytic functions… maybe there is a problem with Oracle Database 11.2.0.2? But, why is this suddenly a problem… what has changed? Nearly two years ago I wrote a blog article that described an ODBC update problem that was present in the Oracle 11.2.0.1 Client – the first Oracle Client version that could be easily downloaded and installed on Windows 7 (it is also possible to patch the downloadable 10.2.0.3/4 client to 10.2.0.5). The Oracle 11.2.0.1 patch 7 (10155837 for 32 bit Windows clients) fixes the false ODBC lock violation errors that are reported in Microsoft Access (and various other programs) when UPDATE SQL statements are executed, and those SQL statements contain bind variables. That same Oracle Client version and patch version are still installed on the computer (and all other computers here that run Windows 7), so that is not the source of the change. On a side note, have you found it difficult to locate updated 11.2.0.2 or 11.2.0.3 client versions in MOS (Metalink)?
So, the 10046 trace failed to find a consistent source for the crashes, and the Oracle client has not been changed on the Windows 7 computer in nearly two years. Program bug? Analytic query bug? Bind variable related bug? I went digging for the sqlnet.log file to see if that file provided any clues. This is what I found:
Fri Oct 19 13:26:52 2012 Directory does not exist for read/write [C:\Oracle\product\11.2.0\client_1\log] [C:\Oracle\product\11.2.0\client_1\log\diag\clients] Errors in file c:\users\hooper\oracle\oradiag_hooper\diag\clients\user_hooper\host_1131635621_76\trace\ora_824_8864.trc (incident=401): oci-24550 [3221225477] [Unhandled exception: Code=c0000005 Flags=0 ] [] [] [] [] [] [] [] [] [] [] Incident details in: c:\users\hooper\oracle\oradiag_hooper\diag\clients\user_hooper\host_1131635621_76\incident\incdir_401\ora_824_8864_i401.trc
That file’s contents proved to be a bit helpful, I think. Let’s take a look at the contents of the file mentioned above:
Dump file c:\users\hooper\oracle\oradiag_hooper\diag\clients\user_hooper\host_1131635621_76\incident\incdir_401\ora_824_8864_i401.trc Dump continued from file: c:\users\hooper\oracle\oradiag_hooper\diag\clients\user_hooper\host_1131635621_76\trace\ora_824_8864.trc oci-24550 [3221225477] [Unhandled exception: Code=c0000005 Flags=0 ] [] [] [] [] [] [] [] [] [] [] ========= Dump for incident 401 (oci 24550 [3221225477]) ======== Tracing is in restricted mode! ----- Short Call Stack Trace ----- _dbgexPhaseII()+850<-_dbgexProcessError()+2061<-_dbgeExecuteForError()+43<-_dbgePostErrorDirect()+2368<-_kpeDbgSignalHandler()+225<-_skgesig_Win_UnhandledExceptionFilter()+140<-75D6003D<-7295FA2E<-0040AD0D<-77309EF0<-77309EC0<-00000000<-773079DD<-77306E72<-772FE0ED<-275C83DD<-275D219F<-275D1D88<-275D0E0E<-2759C677<-004CB035<-004DC4A3<-72991D31<-00410FD8<-7299202F----- START Event Driven Actions Dump ---- ---- END Event Driven Actions Dump ---- ----- START DDE Actions Dump ----- Executing SYNC actions Executing ASYNC actions ----- END DDE Actions Dump (total 0 csec) ----- End of Incident Dump
Might the above indicate that my program logic is not the cause of the problem, but instead that it is the Oracle client that is crashing? I started digging through various websites looking for a similar oci-24500 error message, but I found nothing that was helpful. I searched through the bug database and technical article database on MOS (Metalink), and found nothing that was helpful. By chance, I widened the search on MOS a bit and stumbled upon Mark Powell’s message in this Oracle Support Community thread (you need a MOS account to access the message). That message pointed to two other MOS articles, at least one of which stated that the problem is (possibly – not exactly the same error message) corrected when the Oracle 11.2.0.3 Client is installed. I struggled finding the 11.2.0.3 Client in MOS, and then found that it is located on disk 3 of 6 of the 11.2.0.3 server download (direct link – just specify operating system version and download disk 3 – p10404530_112030_platform_3of6.zip (requires a MOS account to download)). The 11.2.0.3 client version magically corrected the seemingly random crashes in my program – programmer error averted.
Insanity? No more so than an expensive security camera that fails at motion detection, or a vehicle that decelerates when the driver commands the vehicle to accelerate during a light rain. 😉
P.S.: Sorry about the formatting of the blog article. A WordPress update apparently disabled my custom CSS that expanded the article to the full width of the browser window – I am still in search of a fix for that problem.
Recent Comments