June 24, 2016
I have been working with an Enterprise Resource Planning (ERP) system platform for a bit over 16 years. Through various methods of observation, including the use of 10046 extended Oracle SQL traces, Process Monitor traces, Wireshark captures, and general observation of the various components of the ERP system, I noticed a strong emphasis on code modularization. In some cases the modularization is excessive, causing interesting patterns to appear in Oracle SQL traces and Process Monitor traces. For example, a Process Monitor trace might show a repeating pattern: access to the same Windows registry entries, followed by file path searches to access the same set of files, followed by communication with the Oracle Database server, followed by accesses to additional Windows registry entries and additional files – this sequence of events might repeat hundreds, thousands, or millions of times, as if the program were executing in a tight loop.
I am currently in the role of a senior developer, with opportunities for Oracle DBA work, direct support of the same ERP system, Citrix Server support, and whatever else requires attention from a software perspective (plus I have the opportunity to work with a great team of fellow IT people and dabble a bit with the hardware side as well). This transition took place a couple of months ago, right around the time of one of my previous blog posts. I have written a couple of interesting programs in the last couple of months, including a system for the Maintenance department to use to track planned and unplanned equipment maintenance, which tightly integrates with the ERP system. I have also spent some time reviewing the programming source code and modifying programs written by a fairly well known consulting firm that specializes in writing custom software for the ERP system.
The developer of those programs was brilliant in his programming style – so brilliant, that as a mere senior developer I have difficulty tracking some of his programs’ execution from one black box procedure to another, with some variables being passed by memory address location and modified in the second black box, and other variables packaged into a new variable as comma delimited lists that are then passed into the second black box. The second black box may call a third black box that separates the comma delimited list passed in by the first black box. The second black box, now having the de-delimited list of variables, may then pass a SQL statement to another black box to retrieve a value from the database, and then call that same black box again to retrieve another value from the same table row in the database. This programmer was brilliant, burying some important code three, four, five, or six black box levels deep, far beyond my capability as an outside developer to track the program execution (at least without instrumenting the code to write to a log file, indicating I am now in black box ABCD, I received values “M,N,O,P,Q” and translated that into R=M, S=O, T=P, U=Q, and am preparing to execute black box EFGH passing in variables J, K, and L). It is a brilliant design, until someone has to debug the execution. Why did this label print claiming that there were 50 parts in this weighed parts bin, when the parts bin next to it from the previous work order lot, filled to about the same level with the same type of part, has a label indicating that it contained 390 parts? (This is an actual problem that I investigated in the last week – buried deeply in the black box design I found the culprit, a SQL statement with ROWNUM in the WHERE clause where the SQL statement also contained an ORDER BY clause; ROWNUM is assigned before the ORDER BY is applied, so the “first” row returned is essentially arbitrary.)
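The ROWNUM pitfall behind that label mystery is worth spelling out: Oracle assigns ROWNUM to rows before the ORDER BY clause is applied, so a query that filters on ROWNUM and also sorts returns an essentially arbitrary row, which is then sorted. A minimal Python sketch with made-up part-bin data (the lot names and piece weights are hypothetical, not from the actual application) illustrates the difference:

```python
# Hypothetical part-bin rows: (lot, piece_weight).  Oracle assigns ROWNUM
# *before* the ORDER BY is applied, so "WHERE ROWNUM = 1 ... ORDER BY lot"
# keeps whichever row the database happened to produce first, then sorts
# that single row -- it does NOT return the first row of the sorted result.
rows = [("LOT-2", 0.128), ("LOT-1", 0.010), ("LOT-3", 0.128)]

# What the buggy SQL effectively does: take one row first, then "order" it.
rownum_then_order = sorted(rows[:1])

# What the developer presumably intended: order first, then take one row.
order_then_rownum = sorted(rows)[:1]

print(rownum_then_order)  # [('LOT-2', 0.128)] -- an arbitrary row
print(order_then_rownum)  # [('LOT-1', 0.01)]  -- the intended row
```

In Oracle the usual fix is to push the ORDER BY into an inline view and apply the ROWNUM filter outside of it (or use FETCH FIRST n ROWS ONLY in 12c and later).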
While debugging another programming project written by the same consulting firm, I found an interesting example of brilliant programming style and crafted black boxes that left me wondering: is brilliant programming style with crafted black boxes more important than application performance? Is an inefficiency in application performance not a bug?
I stumbled across a short code block in the other programming project that was retrieving information from an Oracle database. As an outsider, I thought that roughly 21.5 seconds was an excessive amount of time for this short code block to execute, retrieving information about 3,083 objects in a database table and populating a treeview control in the user interface. Here is the VB.Net code that I found:
Private Sub TubsToolStripMenuItem_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles TubsToolStripMenuItem.Click
    Try
        Dim i As Integer
        Dim strSQL As String, rs As ADODB.Recordset, rs2 As ADODB.Recordset
        Dim Anchor As Point = New Point(0, MenuStrip1.Height)
        Dim rsTub As Data.DataTable
        Dim dt As Data.DataRow
        Dim OldCursor As Windows.Forms.Cursor

        OldCursor = Windows.Forms.Cursor.Current
        Windows.Forms.Cursor.Current = Windows.Forms.Cursors.WaitCursor

        Label1.Visible = False
        Label2.Visible = False
        DataGridView1.Visible = False
        DataGridView2.Visible = False
        Button1.Visible = False
        Button3.Visible = False
        btnAddNew.Visible = False

        '
        ' Display the information for all the tubs.
        '
        tv1.BeginUpdate()
        tv1.Nodes.Clear()

        '
        ' Add the root level node.
        '
        tv1.Nodes.Add("Tubs")

        '
        ' Get the information from the database.
        '
        strSQL = "Select Distinct TubID from MTS_TubAttributes"
        rs = DBSelect(strSQL)

        '
        ' Add this information to rsTub.
        '
        rsTub = New Data.DataTable
        rsTub.Columns.Add("TUBID", Type.GetType("System.String"))
        rsTub.Columns.Add("TUB", Type.GetType("System.Int32"))

        While Not rs.EOF
            dt = rsTub.NewRow
            dt("TUBID") = rs("TubID").Value
            dt("TUB") = CDbl(rs("TubID").Value)
            rsTub.Rows.Add(dt)
            rs.MoveNext()
        End While
        Call SafeRSCloseAndClean(rs)

        '
        ' Done with that, sort the tubs by tub
        '
        Dim strExpr As String
        Dim strSort As String
        strExpr = "TubID IS NOT NULL"
        ' Sort descending by column named CompanyName.
        strSort = "Tub ASC"
        Dim foundRows() As DataRow

        ' Use the Select method to find all rows matching the filter.
        foundRows = rsTub.Select(strExpr, strSort)

        For i = 0 To foundRows.GetUpperBound(0)
            tv1.Nodes(0).Nodes.Add(foundRows(i).Item("TubID"))

            '
            ' Add the details.
            '
            strSQL = "Select * from MTS_TubAttributes Where TubID = '" & foundRows(i).Item("tubID") & "' Order By AttributeID"
            rs2 = DBSelect(strSQL)

            While Not rs2.EOF
                tv1.Nodes(0).Nodes(i).Nodes.Add(rs2("AttributeID").Value & " - " & rs2("AttributeValue").Value)
                rs2.MoveNext()
            End While
            Call SafeRSCloseAndClean(rs2)
        Next
        foundRows = Nothing
        rsTub.Dispose()

        tv1.EndUpdate()
        MakeTV_Visible()
        Windows.Forms.Cursor.Current = OldCursor
    Catch ex As Exception
        Call LogError("TubsMenuItem", Err.Number, Err.Description)
    End Try
End Sub
Is that brilliant style with beautiful black boxes?
1. Create a SQL statement to select a distinct list of TUBID objects from the MTS_TUBATTRIBUTES table
2. Pass that SQL statement to a black box to retrieve a recordset from the database
3. Manipulate the results into a sorted list
4. Loop through the sorted list building SQL statements (with literals rather than bind variables) to retrieve the attributes rows that describe the TUBID from the same MTS_TUBATTRIBUTES table
5. Pass the generated SQL statement to a black box to retrieve a second recordset from the database
6. Create the treeview control nodes from the second recordset rows.
7. Close the second recordset by passing it to another black box.
8. Jump back to step 4 another 3,082 times.
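The cost of this pattern is mostly in statement count rather than row count. A small Python sketch makes the count visible, using SQLite as a stand-in for Oracle and a hypothetical table mirroring MTS_TubAttributes (the sketch uses a bind placeholder where the original concatenated literals; the round-trip pattern is the same):

```python
import sqlite3

# Build a throwaway table shaped like MTS_TubAttributes: 100 tubs, 3 attributes each.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mts_tubattributes (tubid TEXT, attributeid TEXT, attributevalue TEXT)")
conn.executemany("INSERT INTO mts_tubattributes VALUES (?, ?, ?)",
                 [(str(t), "A%d" % a, "v") for t in range(1, 101) for a in range(3)])

# The pattern above: one DISTINCT query, then one query per TubID.
tubs = [r[0] for r in conn.execute("SELECT DISTINCT tubid FROM mts_tubattributes")]
statements = 1
for tub in tubs:
    conn.execute("SELECT * FROM mts_tubattributes WHERE tubid = ? ORDER BY attributeid",
                 (tub,)).fetchall()
    statements += 1

print(statements)  # 101 statements for 100 tubs -- with 3,083 tubs, 3,084 statements
```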
Now imagine, instead of the program running on a computer that is on the same local network as the database server (with a typical ping time of less than 0.001 seconds), what would happen if this program were run on a computer that has a WAN connection to the database server (let’s assume a 0.060 second, or 60ms typical ping time). In a best case scenario (which is not possible), just sending the 3,083 distinct SQL statements to the database to be parsed (without waiting for a recordset to be returned) will waste at least 185 seconds (3,083 * 0.060 = 184.98), even if it takes the Oracle Database server 0.000000 seconds to hard parse each of the 3,083 unique SQL statements. That is a fantastic way to look busy without actually completing much, if any, useful work.
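The arithmetic is easy to check; the ping times below are the assumptions stated above, and parse, execute, and fetch time is deliberately ignored:

```python
# Back-of-envelope round-trip cost, one network round trip per statement,
# ignoring parse/execute/fetch time entirely.
statements = 3083
lan_ping = 0.001   # seconds, the same-LAN assumption
wan_ping = 0.060   # seconds, the 60 ms WAN assumption

print(round(statements * lan_ping, 2))  # 3.08 seconds of LAN round trips
print(round(statements * wan_ping, 2))  # 184.98 seconds of WAN round trips
```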
I have a hard time accepting inefficient design as anything but a bug, even if it requires deviating from brilliant style with fantastic black boxes, so I retrofitted the brilliant style using something that I seem to recall being called “control break logic”. I decreased the 3,084 SQL statement count to a single SQL statement, and pre-expanded the top-most node in the treeview control to save the end user one extra click. I did not remove the DBSelect or the SafeRSCloseAndClean black boxes (one might wonder if the DBSelect black box is opening a database connection to send the SQL statement to the database, and the SafeRSCloseAndClean black box is closing that database connection), nor did I convert the code to use Oracle’s .Net Oracle.DataAccess objects rather than ADO, or clean up much of the formatting to match my typical conventions. Here is the end result of the “control break logic” implementation:
Private Sub TubsToolStripMenuItem_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles TubsToolStripMenuItem.Click
    Try
        Dim i As Integer
        Dim intNodeCount As Integer = -1
        Dim strLastTub As String = ""
        Dim strSQL As String, rs As ADODB.Recordset
        Dim Anchor As Point = New Point(0, MenuStrip1.Height)
        Dim OldCursor As Windows.Forms.Cursor

        OldCursor = Windows.Forms.Cursor.Current
        Windows.Forms.Cursor.Current = Windows.Forms.Cursors.WaitCursor

        Label1.Visible = False
        Label2.Visible = False
        DataGridView1.Visible = False
        DataGridView2.Visible = False
        Button1.Visible = False
        Button3.Visible = False
        btnAddNew.Visible = False

        '
        ' Display the information for all the tubs.
        '
        tv1.BeginUpdate()
        tv1.Nodes.Clear()

        '
        ' Add the root level node.
        '
        tv1.Nodes.Add("Tubs")

        '
        ' Get the information from the database.
        '
        strSQL = "SELECT" & vbCrLf
        strSQL = strSQL & " *" & vbCrLf
        strSQL = strSQL & "FROM" & vbCrLf
        strSQL = strSQL & " MTS_TUBATTRIBUTES" & vbCrLf
        strSQL = strSQL & "WHERE" & vbCrLf
        strSQL = strSQL & " TUBID IS NOT NULL" & vbCrLf
        strSQL = strSQL & "ORDER BY" & vbCrLf
        strSQL = strSQL & " TO_NUMBER(TUBID)," & vbCrLf
        strSQL = strSQL & " ATTRIBUTEID," & vbCrLf
        strSQL = strSQL & " ATTRIBUTEVALUE"
        rs = DBSelect(strSQL)

        strLastTub = ""
        Do While Not (rs.EOF)
            If strLastTub <> rs("tubid").Value Then
                'New tubid
                intNodeCount = intNodeCount + 1
                tv1.Nodes(0).Nodes.Add(rs("TubID").Value)
                strLastTub = rs("tubid").Value
            End If
            tv1.Nodes(0).Nodes(intNodeCount).Nodes.Add(rs("AttributeID").Value & " - " & rs("AttributeValue").Value)
            rs.MoveNext()
        Loop
        Call SafeRSCloseAndClean(rs)

        If tv1.Nodes(0).Nodes.Count > 2 Then
            tv1.Nodes(0).Expand()
        End If

        tv1.EndUpdate()
        MakeTV_Visible()
        Windows.Forms.Cursor.Current = OldCursor
    Catch ex As Exception
        Call LogError("TubsMenuItem", Err.Number, Err.Description)
    End Try
End Sub
Did the above code change result in a performance improvement? The client computer (which is a bit slow) completed the above procedure in roughly 5.2 seconds, a pretty healthy improvement over the 21.5 seconds observed with the original code. The result would have been even more impressive running over a WAN connection with a 0.060 second (60ms) ping time – the 184.98 seconds wasted just sending the 3,083 distinct SQL statements to the database to be parsed (actual execution and retrieval of the resultsets would have added significantly to that time) was completely eliminated, and properly setting the array fetch size (rather than leaving it at the ADO default) would have further enhanced performance over the WAN connection.
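The array fetch size point can be modeled the same way: if each fetch call is one network round trip carrying arraysize rows, the round-trip cost of retrieving a result set falls almost linearly as the fetch size grows. The row count below is an illustrative guess, not a measurement from this application:

```python
import math

# Rough round-trip model for fetching `rows` rows over a link with a given
# ping time: each fetch call retrieves `arraysize` rows in one round trip.
def fetch_time(rows, arraysize, ping):
    return math.ceil(rows / arraysize) * ping

rows = 10000   # assumed total attribute rows (illustrative only)
ping = 0.060   # the 60 ms WAN assumption from the text

print(round(fetch_time(rows, 1, ping), 1))    # 600.0 seconds: row-at-a-time fetching
print(round(fetch_time(rows, 100, ping), 1))  # 6.0 seconds: array fetch size of 100
```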
Maybe a 4.13 factor performance improvement is not significant enough to consider this change worthwhile? As a fun experiment, I commented out the following lines in my modified version of the code so that the treeview control is not populated:
tv1.Nodes(0).Nodes.Add(rs("TubID").Value)
tv1.Nodes(0).Nodes(intNodeCount).Nodes.Add(rs("AttributeID").Value & " - " & rs("AttributeValue").Value)
I then re-timed the execution of the modified procedure – it now completes in less than 0.2 seconds. With that in mind, on this particular computer populating the treeview probably takes about 4.9 seconds (4.9 + 0.2 seconds is within 0.1 seconds of the 5.2 seconds measured for the modified procedure), regardless of whether the code is running in the original unmodified procedure or my modified version of that procedure. If we subtract out the uncontrollable treeview update time (there is a way to improve that performance further), then the unmodified procedure completes in 21.5 – 4.9 = 16.6 seconds, while my modified version of the same procedure completes in less than 0.2 seconds – an actual 83 factor improvement when the program is run on a computer located on the same LAN as the Oracle Database server. Something is bugging me, but it is not this procedure’s performance problem any longer.
Thoughts? Is excessive code procedure modularization a goal to achieve? Is excessive code procedure modularization something to avoid as much as possible? Or is there a happy medium (a play on words here, using this definition: a person claiming to be in contact with the spirits of the dead and to communicate between the dead and the living) to the headache that excessive code procedure modularization seems to cause for me?
This level of brilliance is certainly not unique. Many years ago we saw an app that was so brilliant, it had won an award for its brilliant design and execution. Unfortunately for the users, the design did not include performance. Moving the app to the database server (that was a hard sell with my fellow DBAs) resulted in an approximately 35% increase in performance.
A rather good PL/SQL developer was also at this company, and he and I looked at the app and determined we could rewrite the entire core data processing part of it in PL/SQL in about 2 months. The performance increase would have been rather noticeable.
The owner of the award-winning app was not interested.
I can tell similar stories from a Java / J2EE shop. A brilliant Java programmer wrote code for us that checked all the boxes on using the latest and greatest frameworks; the maintenance programmer just wrote everything over so he could understand it. The same brilliant guy later came back as a technical architect. When I asked him if he still did some programming, he said he couldn’t, because he had not kept up with the new frameworks. So maintenance programmers need to become framework historians, or even framework archeologists, to even hope to understand the code they are responsible for?
One of the principles of these frameworks is the “abstraction layer” – another term for Charles’ black boxes. These layers make coding easier, I suppose, but make testing and debugging infinitely harder because you have to drill through multiple layers to see what is going on. Now, since we spend a lot more time testing and debugging than coding…
Hello,
In my naive view of software development, supportability and maintainability are significantly more important than easier coding (code reuse), demonstrated “brilliance”, or, in many if not most cases, performance. As Mr. Ashton notes (that is, if he is not being sarcastic – sometimes hard to tell in text), “since we spend a lot more time testing and debugging…” I would emphasize this point when discussing code written by a consultant, who will not be supporting and maintaining the code long term. There is a slight chance that a developer who writes brilliant code as you describe, but is forced to support and maintain that code, might learn that it is just not worth it…