July 27, 2010
A question in an email from an ERP mailing list combined with Cary Millsap’s latest blog article inspired this blog article. The question from the ERP mailing list asked the following question:
Does anyone have Oracle syntax for the ‘upsert‘ command? I have found a few examples, but little success yet.
Using VB.net, I want to have one command which will see if data exists, and if yes, update, if no, then insert.
There are several ways to approach this particular problem, some of which may be more efficient than others. For example, assume that we have a table defined like this:
CREATE TABLE T2( ID NUMBER, COL2 NUMBER, COL3 NUMBER, COL4 NUMBER, PRIMARY KEY (ID));
Then we insert 5 rows using the following SQL statement (if you receive a primary key violation, just try executing the INSERT statement again) and then create a table that will allow us to quickly restore the original values for various repeated tests:
INSERT INTO T2 SELECT TRUNC(DBMS_RANDOM.VALUE(1,30)), TRUNC(DBMS_RANDOM.VALUE(1,1000)), TRUNC(DBMS_RANDOM.VALUE(1,1000)), TRUNC(DBMS_RANDOM.VALUE(1,1000)) FROM DUAL CONNECT BY LEVEL<=5; CREATE TABLE T2_BACKUP AS SELECT * FROM T2;
The five rows created by the above will have random numeric values in the COL2, COL3, and COL4 columns. The rows might look something like this:
SELECT * FROM T2 ORDER BY ID; ID COL2 COL3 COL4 -- ---------- ---------- ---------- 1 993 718 103 10 583 924 458 13 27 650 861 16 141 348 813 28 716 517 204
Now we want to fill in the missing rows, so that ID values 1 through 30 appear in the table, but if the row already exists, we will modify the column values as follows:
- COL2 will be set to a value of 0
- COL3 will be set to a value of COL2 + COL3
- COL4 will be set to a random value
How might we make these changes? Well, we might do something silly, as demonstrated by the following VB Script code (this code may be executed with wscript or cscript on the Windows platform – it is also compatible with Visual Basic 6 and the Excel macro language, but the late binding should be changed to early binding, and variable types should be declared):
Const adOpenKeyset = 1 Const adLockOptimistic = 3 Dim dbDatabase Dim dynData Dim intS_ID Dim intS_C2 Dim intS_C3 Dim intS_C4 Dim i Dim strSQL Dim strUsername Dim strPassword Dim strDatabase Set dbDatabase = CreateObject("ADODB.Connection") Set dynData = CreateObject("ADODB.Recordset") 'Database configuration strUsername = "MyUsername" strPassword = "MyPassword" strDatabase = "MyDB" dbDatabase.ConnectionString = "Provider=OraOLEDB.Oracle;Data Source=" & strDatabase & ";User ID=" & strUsername & ";Password=" & strPassword & ";" dbDatabase.Open 'Should verify that the connection attempt was successful, but I will leave that for someone else to code dbDatabase.BeginTrans For i = 1 To 30 strSQL = "SELECT" & vbCrLf strSQL = strSQL & " ID," & vbCrLf strSQL = strSQL & " COL2," & vbCrLf strSQL = strSQL & " COL3," & vbCrLf strSQL = strSQL & " COL4" & vbCrLf strSQL = strSQL & "FROM" & vbCrLf strSQL = strSQL & " T2" & vbCrLf strSQL = strSQL & "WHERE" & vbCrLf strSQL = strSQL & " ID=" & CStr(i) dynData.Open strSQL, dbDatabase, adOpenKeyset, adLockOptimistic intS_ID = i intS_C2 = Int(Rnd * 1000) + 1 intS_C3 = Int(Rnd * 1000) + 1 intS_C4 = Int(Rnd * 1000) + 1 If Not (dynData.BOF And dynData.EOF) Then dynData("col2") = 0 dynData("col3") = dynData("col2") + dynData("col3") dynData("col4") = intS_C4 Else 'No row found, need to add dynData.AddNew dynData("id") = i dynData("col2") = intS_C2 dynData("col3") = intS_C3 dynData("col4") = intS_C4 End If dynData.Update dynData.Close Next dbDatabase.CommitTrans dbDatabase.Close Set dynData = Nothing Set dbDatabase = Nothing
There are a couple of problems with the above, beyond the lack of bind variable usage. At least 30 SQL statements are sent to the database. If a row is tested to exist (the recordset’s BOF and EOF properties are not both true) then the row’s values are updated, otherwise a row is inserted. This is the row by row (slow by slow) method of accomplishing the task. When the script is executed, the table contents might look like this:
SELECT * FROM T2 ORDER BY ID; ID COL2 COL3 COL4 -- ---------- ---------- ---------- 1 0 718 580 2 290 302 775 3 15 761 815 4 710 46 415 5 863 791 374 6 962 872 57 7 950 365 525 8 768 54 593 9 469 299 623 10 0 924 280 11 830 825 590 12 987 911 227 13 0 650 244 14 534 107 1000 15 677 16 576 16 0 348 799 17 285 46 296 18 383 301 949 19 980 402 279 20 161 163 647 21 411 413 713 22 327 634 208 23 187 584 81 24 458 906 262 25 786 379 290 26 920 632 628 27 429 98 562 28 0 517 835 29 23 544 917 30 431 678 503
Let’s return to the original starting point for table T2 so that we may try another test:
DELETE FROM T2; INSERT INTO T2 SELECT * FROM T2_BACKUP; COMMIT;
Let’s eliminate the majority of the 30+ SQL statements that are sent to the database by modifying the VBS script:
Const adOpenKeyset = 1 Const adLockOptimistic = 3 Dim dbDatabase Dim dynData Dim intS_ID Dim intS_C2 Dim intS_C3 Dim intS_C4 Dim i Dim intMissing(30) Dim strSQL Dim strUsername Dim strPassword Dim strDatabase Set dbDatabase = CreateObject("ADODB.Connection") Set dynData = CreateObject("ADODB.Recordset") 'Database configuration strUsername = "MyUsername" strPassword = "MyPassword" strDatabase = "MyDB" dbDatabase.ConnectionString = "Provider=OraOLEDB.Oracle;Data Source=" & strDatabase & ";User ID=" & strUsername & ";Password=" & strPassword & ";" dbDatabase.Open 'Should verify that the connection attempt was successful, but I will leave that for someone else to code dbDatabase.BeginTrans strSQL = "SELECT" & vbCrLf strSQL = strSQL & " ID," & vbCrLf strSQL = strSQL & " COL2," & vbCrLf strSQL = strSQL & " COL3," & vbCrLf strSQL = strSQL & " COL4" & vbCrLf strSQL = strSQL & "FROM" & vbCrLf strSQL = strSQL & " T2" & vbCrLf strSQL = strSQL & "ORDER BY" & vbCrLf strSQL = strSQL & " ID" dynData.Open strSQL, dbDatabase, adOpenKeyset, adLockOptimistic For i = 1 To 30 intS_C2 = Int(Rnd * 1000) + 1 intS_C3 = Int(Rnd * 1000) + 1 intS_C4 = Int(Rnd * 1000) + 1 If Not (dynData.EOF) Then If i = CInt(dynData("id")) Then intMissing(i) = False dynData("col2") = 0 dynData("col3") = dynData("col2") + dynData("col3") dynData("col4") = intS_C4 dynData.Update dynData.MoveNext Else intMissing(i) = True End If Else intMissing(i) = True End If Next 'Add the missing rows For i = 1 To 30 intS_C2 = Int(Rnd * 1000) + 1 intS_C3 = Int(Rnd * 1000) + 1 intS_C4 = Int(Rnd * 1000) + 1 If intMissing(i) = True Then dynData.AddNew dynData("id") = i dynData("col2") = intS_C2 dynData("col3") = intS_C3 dynData("col4") = intS_C4 dynData.Update End If Next dynData.Close dbDatabase.CommitTrans dbDatabase.Close Set dynData = Nothing Set dbDatabase = Nothing
That certainly is better. Here is the output showing the table’s contents:
SELECT * FROM T2 ORDER BY ID; ID COL2 COL3 COL4 -- ---------- ---------- ---------- 1 0 718 580 2 405 270 56 3 244 980 61 4 391 365 490 5 156 475 258 6 629 543 157 7 939 655 507 8 391 108 784 9 460 754 597 10 0 924 280 11 74 106 332 12 129 1 537 13 0 650 244 14 82 192 679 15 455 358 150 16 0 348 799 17 90 758 402 18 462 493 208 19 330 96 590 20 170 928 98 21 444 273 873 22 751 273 674 23 257 90 31 24 323 791 298 25 236 481 255 26 341 45 483 27 207 865 589 28 0 517 835 29 543 81 635 30 411 961 115
Better, but not good enough. There are too many round-trips between the client and server. Let’s reset the T2 test table and try again:
DELETE FROM T2; INSERT INTO T2 SELECT * FROM T2_BACKUP; COMMIT;
A third attempt collapses a lot of client-side code into two SQL statement:
Dim dbDatabase Dim strSQL Dim strUsername Dim strPassword Dim strDatabase Set dbDatabase = CreateObject("ADODB.Connection") 'Database configuration strUsername = "MyUsername" strPassword = "MyPassword" strDatabase = "MyDB" dbDatabase.ConnectionString = "Provider=OraOLEDB.Oracle;Data Source=" & strDatabase & ";User ID=" & strUsername & ";Password=" & strPassword & ";" dbDatabase.Open 'Should verify that the connection attempt was successful, but I will leave that for someone else to code dbDatabase.BeginTrans strSQL = "UPDATE" & vbCrLf strSQL = strSQL & " T2" & vbCrLf strSQL = strSQL & "SET" & vbCrLf strSQL = strSQL & " COL2=0," & vbCrLf strSQL = strSQL & " COL3=COL2+COL3," & vbCrLf strSQL = strSQL & " COL4=TRUNC(DBMS_RANDOM.VALUE(1,1000))" dbDatabase.Execute strSQL strSQL = "INSERT INTO" & vbCrLf strSQL = strSQL & " T2" & vbCrLf strSQL = strSQL & "SELECT" & vbCrLf strSQL = strSQL & " S.S_ID," & vbCrLf strSQL = strSQL & " S.S_C2," & vbCrLf strSQL = strSQL & " S.S_C3," & vbCrLf strSQL = strSQL & " S.S_C4" & vbCrLf strSQL = strSQL & "FROM" & vbCrLf strSQL = strSQL & " (SELECT" & vbCrLf strSQL = strSQL & " ROWNUM S_ID," & vbCrLf strSQL = strSQL & " TRUNC(DBMS_RANDOM.VALUE(1,1000)) S_C2," & vbCrLf strSQL = strSQL & " TRUNC(DBMS_RANDOM.VALUE(1,1000)) S_C3," & vbCrLf strSQL = strSQL & " TRUNC(DBMS_RANDOM.VALUE(1,1000)) S_C4" & vbCrLf strSQL = strSQL & " FROM" & vbCrLf strSQL = strSQL & " DUAL" & vbCrLf strSQL = strSQL & " CONNECT BY" & vbCrLf strSQL = strSQL & " LEVEL<=30) S," & vbCrLf strSQL = strSQL & " T2" & vbCrLf strSQL = strSQL & "WHERE" & vbCrLf strSQL = strSQL & " S.S_ID=T2.ID(+)" & vbCrLf strSQL = strSQL & " AND T2.ID IS NULL" dbDatabase.Execute strSQL dbDatabase.CommitTrans Set dbDatabase = Nothing
Here is the output:
SELECT * FROM T2 ORDER BY ID; ID COL2 COL3 COL4 -- ---------- ---------- ---------- 1 0 1711 202 2 944 284 604 3 612 909 576 4 828 606 970 5 433 868 446 6 304 770 397 7 502 257 474 8 541 906 761 9 283 614 819 10 0 1507 841 11 772 52 635 12 325 45 792 13 0 677 320 14 691 433 234 15 733 673 416 16 0 489 483 17 257 50 99 18 429 861 108 19 244 4 858 20 323 697 493 21 565 384 960 22 211 153 651 23 762 231 488 24 85 994 204 25 630 235 930 26 890 778 374 27 64 540 663 28 0 1233 955 29 70 16 56 30 493 647 742
Look closely at the above output. Are you able to spot the “logic bug” in the first two code examples?
I like the above code sample, but we are able to improve it a bit by using a single SQL statement. First, let’s reset the test table again:
DELETE FROM T2; INSERT INTO T2 SELECT * FROM T2_BACKUP; COMMIT;
Now the code sample that uses a single SQL statement:
Dim dbDatabase Dim strSQL Dim strUsername Dim strPassword Dim strDatabase Set dbDatabase = CreateObject("ADODB.Connection") 'Database configuration strUsername = "MyUsername" strPassword = "MyPassword" strDatabase = "MyDB" dbDatabase.ConnectionString = "Provider=OraOLEDB.Oracle;Data Source=" & strDatabase & ";User ID=" & strUsername & ";Password=" & strPassword & ";" dbDatabase.Open 'Should verify that the connection attempt was successful, but I will leave that for someone else to code dbDatabase.BeginTrans strSQL = "MERGE INTO" & vbCrLf strSQL = strSQL & " T2" & vbCrLf strSQL = strSQL & "USING" & vbCrLf strSQL = strSQL & " (SELECT" & vbCrLf strSQL = strSQL & " ROWNUM S_ID," & vbCrLf strSQL = strSQL & " TRUNC(DBMS_RANDOM.VALUE(1,1000)) S_C2," & vbCrLf strSQL = strSQL & " TRUNC(DBMS_RANDOM.VALUE(1,1000)) S_C3," & vbCrLf strSQL = strSQL & " TRUNC(DBMS_RANDOM.VALUE(1,1000)) S_C4" & vbCrLf strSQL = strSQL & " FROM" & vbCrLf strSQL = strSQL & " DUAL" & vbCrLf strSQL = strSQL & " CONNECT BY" & vbCrLf strSQL = strSQL & " LEVEL<=30) S" & vbCrLf strSQL = strSQL & "ON" & vbCrLf strSQL = strSQL & " (T2.ID=S.S_ID)" & vbCrLf strSQL = strSQL & "WHEN MATCHED THEN" & vbCrLf strSQL = strSQL & " UPDATE SET" & vbCrLf strSQL = strSQL & " T2.COL2=0," & vbCrLf strSQL = strSQL & " T2.COL3=T2.COL2+T2.COL3," & vbCrLf strSQL = strSQL & " T2.COL4=S.S_C4" & vbCrLf strSQL = strSQL & "WHEN NOT MATCHED THEN" & vbCrLf strSQL = strSQL & " INSERT (ID, COL2, COL3, COL4) VALUES" & vbCrLf strSQL = strSQL & " (S.S_ID," & vbCrLf strSQL = strSQL & " S.S_C2," & vbCrLf strSQL = strSQL & " S.S_C3," & vbCrLf strSQL = strSQL & " S.S_C4)" dbDatabase.Execute strSQL dbDatabase.CommitTrans Set dbDatabase = Nothing
The output of the above looks like this:
SELECT * FROM T2 ORDER BY ID; ID COL2 COL3 COL4 -- ---------- ---------- ---------- 1 0 1711 286 2 419 68 698 3 849 296 986 4 92 87 433 5 425 786 802 6 758 862 868 7 450 327 978 8 102 618 382 9 276 563 620 10 0 1507 629 11 292 591 300 12 521 599 941 13 0 677 438 14 182 905 135 15 716 121 964 16 0 489 165 17 552 661 95 18 332 572 255 19 126 624 463 20 906 422 368 21 328 141 886 22 286 612 685 23 375 868 904 24 240 940 768 25 4 166 447 26 942 754 124 27 547 828 225 28 0 1233 872 29 883 417 215 30 762 427 21
At this point you are probably wondering why I even bothered to use VBScript for such a simple SQL statement. Let’s reset the test table again:
DELETE FROM T2; INSERT INTO T2 SELECT * FROM T2_BACKUP; COMMIT;
If I was trying to be as efficient as possible, I probably should have just executed the following in SQL*Plus:
MERGE INTO T2 USING (SELECT ROWNUM S_ID, TRUNC(DBMS_RANDOM.VALUE(1,1000)) S_C2, TRUNC(DBMS_RANDOM.VALUE(1,1000)) S_C3, TRUNC(DBMS_RANDOM.VALUE(1,1000)) S_C4 FROM DUAL CONNECT BY LEVEL<=30) S ON (T2.ID=S.S_ID) WHEN MATCHED THEN UPDATE SET T2.COL2=0, T2.COL3=T2.COL2+T2.COL3, T2.COL4=S.S_C4 WHEN NOT MATCHED THEN INSERT (ID, COL2, COL3, COL4) VALUES (S.S_ID, S.S_C2, S.S_C3, S.S_C4);
The following shows the modifications made by the above:
SELECT * FROM T2 ORDER BY ID; ID COL2 COL3 COL4 -- ---------- ---------- ---------- 1 0 1711 849 2 502 487 567 3 273 966 847 4 236 544 198 5 191 970 986 6 820 316 468 7 833 651 82 8 46 583 368 9 63 685 148 10 0 1507 249 11 111 409 88 12 219 795 409 13 0 677 571 14 771 26 313 15 373 962 186 16 0 489 514 17 230 970 824 18 92 715 131 19 355 220 206 20 996 87 841 21 815 384 375 22 935 455 339 23 606 190 720 24 558 591 341 25 780 207 614 26 267 430 371 27 881 292 655 28 0 1233 70 29 379 466 628 30 293 216 881
We are certainly able to arrive at the correct answer many different ways (and the incorrect answer at least once), but what is the right way to achieve the task placed in front of us is not always easy to see. The MERGE INTO syntax is one that I have not used often enough, and probably deserves a greater investment of experimentation.
Have you found the logic bug with the first two code samples yet?
dynData("col2") = 0 dynData("col3") = dynData("col2") + dynData("col3") dynData("col4") = intS_C4
The above works correctly when the columns are updated in that order in a SQL statement, but VBScript requires a minor adjustment to produce the correct, expected results:
dynData("col3") = dynData("col2") + dynData("col3") dynData("col2") = 0 dynData("col4") = intS_C4
Picky, picky, picky… 🙂
Recent Comments