SQL – Programmatic Row By Row to MERGE INTO

27 07 2010

July 27, 2010

A question in an email from an ERP mailing list combined with Cary Millsap’s latest blog article inspired this blog article.  The question from the ERP mailing list asked the following question:

Does anyone have Oracle syntax for the ‘upsert‘ command?  I have found a few examples, but little success yet.

Using VB.net, I want to have one command which will see if data exists, and if yes, update, if no, then insert.

There are several ways to approach this particular problem, some of which may be more efficient than others.  For example, assume that we have a table defined like this:

CREATE TABLE T2(
  ID NUMBER,
  COL2 NUMBER,
  COL3 NUMBER,
  COL4 NUMBER,
  PRIMARY KEY (ID));

Then we insert 5 rows using the following SQL statement (if you receive a primary key violation, just try executing the INSERT statement again) and then create a table that will allow us to quickly restore the original values for various repeated tests:

INSERT INTO
  T2
SELECT
  TRUNC(DBMS_RANDOM.VALUE(1,30)),
  TRUNC(DBMS_RANDOM.VALUE(1,1000)),
  TRUNC(DBMS_RANDOM.VALUE(1,1000)),
  TRUNC(DBMS_RANDOM.VALUE(1,1000))
FROM
  DUAL
CONNECT BY
  LEVEL<=5;

CREATE TABLE
  T2_BACKUP
AS
SELECT
  *
FROM
  T2;

The five rows created by the above will have random numeric values in the COL2, COL3, and COL4 columns.  The rows might look something like this:

SELECT
  *
FROM
  T2
ORDER BY
  ID;

ID       COL2       COL3       COL4
-- ---------- ---------- ----------
 1        993        718        103
10        583        924        458
13         27        650        861
16        141        348        813
28        716        517        204

Now we want to fill in the missing rows, so that ID values 1 through 30 appear in the table, but if the row already exists, we will modify the column values as follows:

  • COL2 will be set to a value of 0
  • COL3 will be set to a value of COL2 + COL3
  • COL4 will be set to a random value

How might we make these changes?  Well, we might do something silly, as demonstrated by the following VB Script code (this code may be executed with wscript or cscript on the Windows platform – it is also compatible with Visual Basic 6 and the Excel macro language, but the late binding should be changed to early binding, and variable types should be declared):

Const adOpenKeyset = 1
Const adLockOptimistic = 3

Dim dbDatabase
Dim dynData
Dim intS_ID
Dim intS_C2
Dim intS_C3
Dim intS_C4
Dim i
Dim strSQL
Dim strUsername
Dim strPassword
Dim strDatabase

Set dbDatabase = CreateObject("ADODB.Connection")
Set dynData = CreateObject("ADODB.Recordset")

'Database configuration
strUsername = "MyUsername"
strPassword = "MyPassword"
strDatabase = "MyDB"

dbDatabase.ConnectionString = "Provider=OraOLEDB.Oracle;Data Source=" & strDatabase & ";User ID=" & strUsername & ";Password=" & strPassword & ";"
dbDatabase.Open
'Should verify that the connection attempt was successful, but I will leave that for someone else to code

dbDatabase.BeginTrans

For i = 1 To 30
    strSQL = "SELECT" & vbCrLf
    strSQL = strSQL & "  ID," & vbCrLf
    strSQL = strSQL & "  COL2," & vbCrLf
    strSQL = strSQL & "  COL3," & vbCrLf
    strSQL = strSQL & "  COL4" & vbCrLf
    strSQL = strSQL & "FROM" & vbCrLf
    strSQL = strSQL & "  T2" & vbCrLf
    strSQL = strSQL & "WHERE" & vbCrLf
    strSQL = strSQL & "  ID=" & CStr(i)

    dynData.Open strSQL, dbDatabase, adOpenKeyset, adLockOptimistic

    intS_ID = i
    intS_C2 = Int(Rnd * 1000) + 1
    intS_C3 = Int(Rnd * 1000) + 1
    intS_C4 = Int(Rnd * 1000) + 1

    If Not (dynData.BOF And dynData.EOF) Then
        dynData("col2") = 0
        dynData("col3") = dynData("col2") + dynData("col3")
        dynData("col4") = intS_C4
    Else
        'No row found, need to add
        dynData.AddNew

        dynData("id") = i
        dynData("col2") = intS_C2
        dynData("col3") = intS_C3
        dynData("col4") = intS_C4
    End If
    dynData.Update

    dynData.Close
Next

dbDatabase.CommitTrans

dbDatabase.Close

Set dynData = Nothing
Set dbDatabase = Nothing

There are a couple of problems with the above, beyond the lack of bind variable usage.  At least 30 SQL statements are sent to the database.  If a row is tested to exist (the recordset’s BOF and EOF properties are not both true) then the row’s values are updated, otherwise a row is inserted.  This is the row by row (slow by slow) method of accomplishing the task.  When the script is executed, the table contents might look like this:

SELECT
  *
FROM
  T2
ORDER BY
  ID;

ID       COL2       COL3       COL4
-- ---------- ---------- ----------
 1          0        718        580
 2        290        302        775
 3         15        761        815
 4        710         46        415
 5        863        791        374
 6        962        872         57
 7        950        365        525
 8        768         54        593
 9        469        299        623
10          0        924        280
11        830        825        590
12        987        911        227
13          0        650        244
14        534        107       1000
15        677         16        576
16          0        348        799
17        285         46        296
18        383        301        949
19        980        402        279
20        161        163        647
21        411        413        713
22        327        634        208
23        187        584         81
24        458        906        262
25        786        379        290
26        920        632        628
27        429         98        562
28          0        517        835
29         23        544        917
30        431        678        503

Let’s return to the original starting point for table T2 so that we may try another test:

DELETE FROM T2;

INSERT INTO
  T2
SELECT
  *
FROM
  T2_BACKUP;

COMMIT;

Let’s eliminate the majority of the 30+ SQL statements that are sent to the database by modifying the VBS script:

Const adOpenKeyset = 1
Const adLockOptimistic = 3

Dim dbDatabase
Dim dynData
Dim intS_ID
Dim intS_C2
Dim intS_C3
Dim intS_C4
Dim i
Dim intMissing(30)
Dim strSQL
Dim strUsername
Dim strPassword
Dim strDatabase

Set dbDatabase = CreateObject("ADODB.Connection")
Set dynData = CreateObject("ADODB.Recordset")

'Database configuration
strUsername = "MyUsername"
strPassword = "MyPassword"
strDatabase = "MyDB"

dbDatabase.ConnectionString = "Provider=OraOLEDB.Oracle;Data Source=" & strDatabase & ";User ID=" & strUsername & ";Password=" & strPassword & ";"
dbDatabase.Open
'Should verify that the connection attempt was successful, but I will leave that for someone else to code

dbDatabase.BeginTrans

strSQL = "SELECT" & vbCrLf
strSQL = strSQL & "  ID," & vbCrLf
strSQL = strSQL & "  COL2," & vbCrLf
strSQL = strSQL & "  COL3," & vbCrLf
strSQL = strSQL & "  COL4" & vbCrLf
strSQL = strSQL & "FROM" & vbCrLf
strSQL = strSQL & "  T2" & vbCrLf
strSQL = strSQL & "ORDER BY" & vbCrLf
strSQL = strSQL & "  ID"

dynData.Open strSQL, dbDatabase, adOpenKeyset, adLockOptimistic

For i = 1 To 30
    intS_C2 = Int(Rnd * 1000) + 1
    intS_C3 = Int(Rnd * 1000) + 1
    intS_C4 = Int(Rnd * 1000) + 1

    If Not (dynData.EOF) Then
        If i = CInt(dynData("id")) Then
            intMissing(i) = False
            dynData("col2") = 0
            dynData("col3") = dynData("col2") + dynData("col3")
            dynData("col4") = intS_C4
            dynData.Update

            dynData.MoveNext
        Else
            intMissing(i) = True
        End If
    Else
        intMissing(i) = True
    End If
Next

'Add the missing rows
For i = 1 To 30
    intS_C2 = Int(Rnd * 1000) + 1
    intS_C3 = Int(Rnd * 1000) + 1
    intS_C4 = Int(Rnd * 1000) + 1

    If intMissing(i) = True Then
        dynData.AddNew
        dynData("id") = i
        dynData("col2") = intS_C2
        dynData("col3") = intS_C3
        dynData("col4") = intS_C4
        dynData.Update
    End If
Next

dynData.Close
dbDatabase.CommitTrans

dbDatabase.Close
Set dynData = Nothing
Set dbDatabase = Nothing

That certainly is better.  Here is the output showing the table’s contents:

SELECT
  *
FROM
  T2
ORDER BY
  ID;

ID       COL2       COL3       COL4
-- ---------- ---------- ----------
 1          0        718        580
 2        405        270         56
 3        244        980         61
 4        391        365        490
 5        156        475        258
 6        629        543        157
 7        939        655        507
 8        391        108        784
 9        460        754        597
10          0        924        280
11         74        106        332
12        129          1        537
13          0        650        244
14         82        192        679
15        455        358        150
16          0        348        799
17         90        758        402
18        462        493        208
19        330         96        590
20        170        928         98
21        444        273        873
22        751        273        674
23        257         90         31
24        323        791        298
25        236        481        255
26        341         45        483
27        207        865        589
28          0        517        835
29        543         81        635
30        411        961        115

Better, but not good enough.  There are too many round-trips between the client and server.  Let’s reset the T2 test table and try again:

DELETE FROM T2;

INSERT INTO
  T2
SELECT
  *
FROM
  T2_BACKUP;

COMMIT;

A third attempt collapses a lot of client-side code into two SQL statement:

Dim dbDatabase

Dim strSQL
Dim strUsername
Dim strPassword
Dim strDatabase

Set dbDatabase = CreateObject("ADODB.Connection")

'Database configuration
strUsername = "MyUsername"
strPassword = "MyPassword"
strDatabase = "MyDB"

dbDatabase.ConnectionString = "Provider=OraOLEDB.Oracle;Data Source=" & strDatabase & ";User ID=" & strUsername & ";Password=" & strPassword & ";"
dbDatabase.Open
'Should verify that the connection attempt was successful, but I will leave that for someone else to code

dbDatabase.BeginTrans

strSQL = "UPDATE" & vbCrLf
strSQL = strSQL & "  T2" & vbCrLf
strSQL = strSQL & "SET" & vbCrLf
strSQL = strSQL & "  COL2=0," & vbCrLf
strSQL = strSQL & "  COL3=COL2+COL3," & vbCrLf
strSQL = strSQL & "  COL4=TRUNC(DBMS_RANDOM.VALUE(1,1000))"
dbDatabase.Execute strSQL

strSQL = "INSERT INTO" & vbCrLf
strSQL = strSQL & "  T2" & vbCrLf
strSQL = strSQL & "SELECT" & vbCrLf
strSQL = strSQL & "  S.S_ID," & vbCrLf
strSQL = strSQL & "  S.S_C2," & vbCrLf
strSQL = strSQL & "  S.S_C3," & vbCrLf
strSQL = strSQL & "  S.S_C4" & vbCrLf
strSQL = strSQL & "FROM" & vbCrLf
strSQL = strSQL & "  (SELECT" & vbCrLf
strSQL = strSQL & "    ROWNUM S_ID," & vbCrLf
strSQL = strSQL & "    TRUNC(DBMS_RANDOM.VALUE(1,1000)) S_C2," & vbCrLf
strSQL = strSQL & "    TRUNC(DBMS_RANDOM.VALUE(1,1000)) S_C3," & vbCrLf
strSQL = strSQL & "    TRUNC(DBMS_RANDOM.VALUE(1,1000)) S_C4" & vbCrLf
strSQL = strSQL & "  FROM" & vbCrLf
strSQL = strSQL & "    DUAL" & vbCrLf
strSQL = strSQL & "  CONNECT BY" & vbCrLf
strSQL = strSQL & "    LEVEL<=30) S," & vbCrLf
strSQL = strSQL & "  T2" & vbCrLf
strSQL = strSQL & "WHERE" & vbCrLf
strSQL = strSQL & "  S.S_ID=T2.ID(+)" & vbCrLf
strSQL = strSQL & "  AND T2.ID IS NULL"
dbDatabase.Execute strSQL

dbDatabase.CommitTrans

Set dbDatabase = Nothing

Here is the output:

SELECT
  *
FROM
  T2
ORDER BY
  ID;

ID       COL2       COL3       COL4
-- ---------- ---------- ----------
 1          0       1711        202
 2        944        284        604
 3        612        909        576
 4        828        606        970
 5        433        868        446
 6        304        770        397
 7        502        257        474
 8        541        906        761
 9        283        614        819
10          0       1507        841
11        772         52        635
12        325         45        792
13          0        677        320
14        691        433        234
15        733        673        416
16          0        489        483
17        257         50         99
18        429        861        108
19        244          4        858
20        323        697        493
21        565        384        960
22        211        153        651
23        762        231        488
24         85        994        204
25        630        235        930
26        890        778        374
27         64        540        663
28          0       1233        955
29         70         16         56
30        493        647        742

Look closely at the above output.  Are you able to spot the “logic bug” in the first two code examples?

I like the above code sample, but we are able to improve it a bit by using a single SQL statement.  First, let’s reset the test table again:

DELETE FROM T2;

INSERT INTO
  T2
SELECT
  *
FROM
  T2_BACKUP;

COMMIT;

Now the code sample that uses a single SQL statement:

Dim dbDatabase

Dim strSQL
Dim strUsername
Dim strPassword
Dim strDatabase

Set dbDatabase = CreateObject("ADODB.Connection")

'Database configuration
strUsername = "MyUsername"
strPassword = "MyPassword"
strDatabase = "MyDB"

dbDatabase.ConnectionString = "Provider=OraOLEDB.Oracle;Data Source=" & strDatabase & ";User ID=" & strUsername & ";Password=" & strPassword & ";"
dbDatabase.Open
'Should verify that the connection attempt was successful, but I will leave that for someone else to code

dbDatabase.BeginTrans

strSQL = "MERGE INTO" & vbCrLf
strSQL = strSQL & "  T2" & vbCrLf
strSQL = strSQL & "USING" & vbCrLf
strSQL = strSQL & "  (SELECT" & vbCrLf
strSQL = strSQL & "    ROWNUM S_ID," & vbCrLf
strSQL = strSQL & "    TRUNC(DBMS_RANDOM.VALUE(1,1000)) S_C2," & vbCrLf
strSQL = strSQL & "    TRUNC(DBMS_RANDOM.VALUE(1,1000)) S_C3," & vbCrLf
strSQL = strSQL & "    TRUNC(DBMS_RANDOM.VALUE(1,1000)) S_C4" & vbCrLf
strSQL = strSQL & "  FROM" & vbCrLf
strSQL = strSQL & "    DUAL" & vbCrLf
strSQL = strSQL & "  CONNECT BY" & vbCrLf
strSQL = strSQL & "    LEVEL<=30) S" & vbCrLf
strSQL = strSQL & "ON" & vbCrLf
strSQL = strSQL & "  (T2.ID=S.S_ID)" & vbCrLf
strSQL = strSQL & "WHEN MATCHED THEN" & vbCrLf
strSQL = strSQL & "  UPDATE SET" & vbCrLf
strSQL = strSQL & "    T2.COL2=0," & vbCrLf
strSQL = strSQL & "    T2.COL3=T2.COL2+T2.COL3," & vbCrLf
strSQL = strSQL & "    T2.COL4=S.S_C4" & vbCrLf
strSQL = strSQL & "WHEN NOT MATCHED THEN" & vbCrLf
strSQL = strSQL & "  INSERT (ID, COL2, COL3, COL4) VALUES" & vbCrLf
strSQL = strSQL & "    (S.S_ID," & vbCrLf
strSQL = strSQL & "    S.S_C2," & vbCrLf
strSQL = strSQL & "    S.S_C3," & vbCrLf
strSQL = strSQL & "    S.S_C4)"
dbDatabase.Execute strSQL

dbDatabase.CommitTrans

Set dbDatabase = Nothing

The output of the above looks like this:

SELECT
  *
FROM
  T2
ORDER BY
  ID;

ID       COL2       COL3       COL4
-- ---------- ---------- ----------
 1          0       1711        286
 2        419         68        698
 3        849        296        986
 4         92         87        433
 5        425        786        802
 6        758        862        868
 7        450        327        978
 8        102        618        382
 9        276        563        620
10          0       1507        629
11        292        591        300
12        521        599        941
13          0        677        438
14        182        905        135
15        716        121        964
16          0        489        165
17        552        661         95
18        332        572        255
19        126        624        463
20        906        422        368
21        328        141        886
22        286        612        685
23        375        868        904
24        240        940        768
25          4        166        447
26        942        754        124
27        547        828        225
28          0       1233        872
29        883        417        215
30        762        427         21

At this point you are probably wondering why I even bothered to use VBScript for such a simple SQL statement.  Let’s reset the test table again:

DELETE FROM T2;

INSERT INTO
  T2
SELECT
  *
FROM
  T2_BACKUP;

COMMIT;

If I was trying to be as efficient as possible, I probably should have just executed the following in SQL*Plus:

MERGE INTO
  T2
USING
  (SELECT
    ROWNUM S_ID,
    TRUNC(DBMS_RANDOM.VALUE(1,1000)) S_C2,
    TRUNC(DBMS_RANDOM.VALUE(1,1000)) S_C3,
    TRUNC(DBMS_RANDOM.VALUE(1,1000)) S_C4
  FROM
    DUAL
  CONNECT BY
    LEVEL<=30) S
ON
  (T2.ID=S.S_ID)
WHEN MATCHED THEN
  UPDATE SET
    T2.COL2=0,
    T2.COL3=T2.COL2+T2.COL3,
    T2.COL4=S.S_C4
WHEN NOT MATCHED THEN
  INSERT (ID, COL2, COL3, COL4) VALUES
    (S.S_ID,
    S.S_C2,
    S.S_C3,
    S.S_C4);

The following shows the modifications made by the above:

SELECT
  *
FROM
  T2
ORDER BY
  ID;

ID       COL2       COL3       COL4
-- ---------- ---------- ----------
 1          0       1711        849
 2        502        487        567
 3        273        966        847
 4        236        544        198
 5        191        970        986
 6        820        316        468
 7        833        651         82
 8         46        583        368
 9         63        685        148
10          0       1507        249
11        111        409         88
12        219        795        409
13          0        677        571
14        771         26        313
15        373        962        186
16          0        489        514
17        230        970        824
18         92        715        131
19        355        220        206
20        996         87        841
21        815        384        375
22        935        455        339
23        606        190        720
24        558        591        341
25        780        207        614
26        267        430        371
27        881        292        655
28          0       1233         70
29        379        466        628
30        293        216        881

We are certainly able to arrive at the correct answer many different ways (and the incorrect answer at least once), but what is the right way to achieve the task placed in front of us is not always easy to see.  The MERGE INTO syntax is one that I have not used often enough, and probably deserves a greater investment of experimentation.

Have you found the logic bug with the first two code samples yet?

dynData("col2") = 0
dynData("col3") = dynData("col2") + dynData("col3")
dynData("col4") = intS_C4

The above works correctly when the columns are updated in that order in a SQL statement, but VBScript requires a minor adjustment to produce the correct, expected results:

dynData("col3") = dynData("col2") + dynData("col3")
dynData("col2") = 0
dynData("col4") = intS_C4

Picky, picky, picky…  🙂