July 6, 2011
Since there were so many unique solutions to the last blog article that posed a SQL challenge, I thought that I would try another blog article that asks a similar type of question. Assume that someone showed you the following output:
C2 D --- --- 100 0 150 50 200 50 201 1 300 99 350 50 400 50 500 100
You have the following table definition, and rows in the table:
CREATE TABLE T2 ( C1 NUMBER, C2 NUMBER); INSERT INTO T2 VALUES (1,100); INSERT INTO T2 VALUES (4,150); INSERT INTO T2 VALUES (7,200); INSERT INTO T2 VALUES (8,201); INSERT INTO T2 VALUES (10,300); INSERT INTO T2 VALUES (14,350); INSERT INTO T2 VALUES (18,400); INSERT INTO T2 VALUES (24,500); COMMIT;
Assume that you know nothing other than the fact that the C2 values are listed in ascending order when sorted by column C1. How many different ways can this particular problem be solved. Yes, there is an easy way, but assume that you were trying to “help educate” the person who provided the requested output.
My least-shortest-path solution follows:
SELECT C2, 0 D FROM T2 WHERE C1=(SELECT MIN(C1) FROM T2) UNION ALL SELECT V2.C2, V2.C2-MAX(T2.C2) D FROM T2, (SELECT C1, C2 FROM T2) V2 WHERE T2.C1<V2.C1 GROUP BY V2.C2 ORDER BY C2; C2 D ---- ---- 100 0 150 50 200 50 201 1 300 99 350 50 400 50 500 100 8 rows selected.
In the above, the row with the 0 in the D column was the hardest part of the solution. Why would I use UNION ALL and not UNION – what was not in the specification?
This blog article was inspired by an old question found in a Usenet group from 1998 – if you were answering the question in 1998, would your answer be any different? Be creative with your solution. While you are thinking about a solution, take a look at this old Usenet thread and consider how difficult it was to find the “50 highest paid workers” in the last century.
In Oracle, obviously a job for analytics.
Stew,
Nice example, and very close to what I had in mind when I wrote “Yes, there is an easy way”:
As can be seen by the above, I forgot that it was possible to supply a default value for the result of the LAG function on the first row.
Any other solutions? Possibly something that either uses different analytic functions, or no analytic functions at all (for those pre-Oracle 8i Enterprise Edition databases / pre-Oracle 9i Standard Edition databases?
I just realized that in the above example using the LAG analytic funtion, I *should* have ordered the rows by column C1 rather than C2 – the way that Stew ordered the rows was the *intended* ordering. The above SQL statement was an example of a quick note that I made while writing the article. In this case, Stew’s example has a couple of advantages over the LAG example that I had in mind.
Probably, everything what can be done with analytics, can as well be done with joins.
Assumed, lag would be not available,
you get that results with
Best regards
Maxim
Maxim,
That is a bit more efficient than what I posted in the blog article – nice example.
So far, we have:
* An example that demonstrates a query similar to what someone might write, if that person knows about UNION ALL, subqueries in the WHERE clause, and inline views, but not the syntax for an Oracle specific outer join (possibly the SQL in 10 Minutes book reader… that was me at one point, but I do not think that the book covers inline views, so that part of the SQL statement would need to be collapsed into the main query)
* A potentially efficient SQL statement using a self-join that works with both older and newer Oracle Database release versions
* A potentially (more) efficient SQL statement using the LAG analytic function to read the previous row’s value in a column that works with more recent Oracle Database release versions
Any other solutions? (yes!)
Scalar subquery rather than a regular join or inline view
Gary,
Nice example with a scalar subquery.
Are there any other techniques for solving this problem? Ask yourself, “What if Oracle does not have this capability, what would I do different?” If we set efficiency aside for a moment, and assume that the following features were not supported (or a developer did not know that these were possibilities), how else could this problem be solved:
* Scalar subqueries (Gary’s example)
* LAG/LEAD analytic functions to peek at previous and next row values for a column (Stew’s example)
* Self join (Maxim’s example)
For instance, could the following be used as a starting point for a solution?:
Note in the above that I used ROW_NUMBER rather than RANK or DENSE_RANK – the specification provided at the start of this article does not rule out the possibility of rows with duplicate values.
From your starting point Charles, I came up with the following:
It’s also possible to generate the result by unioning the table with itself, but generating row numbers for the two sets that are out of sequence with each-other by one. Then pivoting the results to get the values from each set as separate columns and then subtracting the column values:
ROW_NUMBER could have been used instead of the rownum … order by, but I wanted to construct an example without analytic functions
Chris,
Nice job finishing up the starting point that I left as a challenge. The fact that PRIOR can be used in the SELECT clause when using a CONNECT BY clause in the query probably is not well known (I occasionally forget the syntax).
Complicating your second example is the fact that the C1 values in the table have gaps in the sequence – this was intentional so that people could not create a simple self join (outer-join) that specified T1.C1 = T2.C1+1. What you posted is a very good work-around for the lack of gapless sequential values, and is a solution that I had not previously considered as a solution for this problem.
—
Have we listed all of the possible SQL only solutions at this point?
Well, since you were so nice to my first solution, here’s another:
As in two other solutions, you need gapless sequential numbers to navigate to the previous row.
By the way, I didn’t know about that use of PRIOR, thanks Chris!
Stew,
That is an interesting solution.
I have heard of the model clause, but I have never used it. I need to spend some time finding uses for the model clause to fully understand how it works and when to use it.
Here is a variation of Chris’ second solution. I use a cartesian join instead of union all (which should save a full scan) and I don’t really pivot since I can just make C2 negative when needed and use SUM.
select max(c2) c2, sum(decode(pass,1,decode(rn,1,0,C2),-C2)) D from
(select level pass from DUAL connect by level <= 2) a,
(select rownum RN, T2.* from T2)
group by RN+pass
having SUM(PASS) <> 2
order by max(C1);
On “when to use the Model clause”, here is a link to a problem where the MODEL clause is the solution I prefer.
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:13946369553642#3478381500346951056
Not sure what happened there, should be “having sum(pass) not equal to 2”
…and now miraculously the (not equals) sign has reappeared !?
I saw your comment and quietly fixed the SQL statement to match your description. The less than and greater than characters tend to get lost because those characters have special meaning in HTML. != is a good substitute for <> or you can use (without spaces):
& lt; (for <)
& gt; (for >)
I am wondering if this part of the SQL statement could cause problems if the data is not added to the table blocks in a completely sequential order possibly due to the amount of free space remaining in the table’s blocks when the new rows are added:
The current output of your SQL statement shows the expected result.
Now let’s change the order of the rows in the table’s blocks:
The output of your SQL statement:
What you posted is an interesting approach to the problem – I believe that your query can be fixed by removing the ROWNUM from “(select rownum RN, T2.* from T2)”, sliding that into an inline view with an ORDER BY clause, and adding ROWNUM outside the inline view. I have NOT attempted this change in your SQL statement, but I suspect that it will work.
Thank you for supplying the link to the AskTom thread. I need to find some time to take a look at that thread.
Thank you, Charles. This is poetic justice: I have recently pointed out the lack of ORDER BY in other queries, and in my turn I am guilty of disorderly conduct 🙂 Here is a corrected (and perhaps correct) version:
select max(C2) C2, SUM(DECODE(PASS,1,DECODE(RN,1,0,C2),-C2)) D from
( select level PASS from DUAL connect by level <= 2 ),
( select rownum RN, C1, C2 from ( select * from T2 order by C1 ) )
group by RN+PASS
having SUM(PASS) != 2
order by max(C1);
Stew,
Nice solution that produces the expected output (with the out-of-order inserted rows):
It appears that a small adjustment to the above SQL creates another possible solution (this is just a small change, and should not be considered a unique solution to the problem):