Analytic Functions – What is Wrong with this Statement?

1 01 2011

January 1, 2011

I was a bit excited to see the chapter discussing analytic functions in the book “Pro Oracle SQL”, which has a rather extensive coverage of most of Oracle Database’s analytic functions (something that I have not seen from other SQL books).  That chapter is very well assembled, with easier to understand descriptions of the various functions than what is found in the Oracle documentation.

However, there is one particular statement in the chapter that made be stop and think for a moment, and then ask “What, if anything, is wrong with this quote” from page 227 of the book:

“The default windowing clause [of analytic functions that support the windowing clause] is rows between unbounded preceding and current row. If you do not specify a window, you’ll get the default window. It is a good approach to specify this clause explicitly to avoid ambiguities.”

Before you answer, check the Oracle Database 11.2 documentation:

 “If you omit the windowing_clause entirely, then the default is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.”

The Oracle Database documentation is stating essentially the same thing as the book. 

What, if anything, is wrong with this quote (or the quote from the documentation)?


Actions

Information

7 responses

1 01 2011
Maxim

Well, you did it very hard to spot – the difference between range and rows ;-).
It seems, the question you like to ask, is “rows” correct, is “range” correct, and isn’t that the same thing anyway ? In my opinion, this time the documentation quote is correct, the quote from the book is wrong and yes, “range” and “rows” mean very different things (like logical offsets vs physical offsets, deterministic results vs nondeterministic results etc) Alex Nuijten asked last year in his blog exactly the same question – what is the default windowing clause

http://nuijten.blogspot.com/2010/06/last-tuesday-we-had-odtug-preview-mini.html#more

i think, the example from my comment can be used to illustrate, which quote is correct.

And btw,
all the best in the new year!

1 01 2011
Charles Hooper

Hi Maxim,

I am suggesting that the Oracle documentation *might* be incorrect. :-)

Consider this test table:

CREATE TABLE T1 AS
SELECT
  ROWNUM C1,
  10000-ROWNUM C2,
  TO_CHAR(TRUNC(SYSDATE+ROWNUM),'DAY') C3,
  LPAD('A',100,'A') PADDING
FROM
  DUAL
CONNECT BY
  LEVEL<=100;
 
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1')

And these test SQL statements:

SELECT
  C3,
  C1,
  SUM(C1) OVER (PARTITION BY C3 ORDER BY C1) SUM
FROM
  T1
ORDER BY
  C3,
  C1;
SELECT
  C3,
  C1,
  SUM(C1) OVER (PARTITION BY C3) SUM
FROM
  T1
ORDER BY
  C3,
  C1;

I saw something a couple of years ago, I do not remember if it was something that you showed me in one of the Oracle Usenet news groups, or if I saw it somewhere else. One of the above two SQL statements suggests that the Oracle documentation is correct, while the other suggests that the Oracle documentation is wrong.

Happy new year to you too!

1 01 2011
Maxim

If you mean the second query, i don’t think, it contradicts documentation. The windowing clause ( and therefore – default windowing clause) is applicable only with order by clause. If you omit order by clause entirely, the function will be calculated on the entire rowset in the partition. Maybe, this quote can be used here:
“Whenever the order_by_clause results in identical values for multiple rows, the function returns the same result for each of those rows”. Or, did i misunderstood your comment?

Best regards

Maxim

1 01 2011
Charles Hooper

Maxim,

Your comment in quotes differed slightly from my understanding. However, after performing the tests suggested by Jonathan, I now understand what you wrote – and my understanding has now changed as a result. Thank you for pointing out that the function’s behavior changes when the value in the ORDER BY changes from one row to the next.

I thought that introducing an ORDER BY into an analytic SUM function was a good method to generate a running SUM, but I see now that I need to make certain that the columns in the ORDER BY clause must result in a unique value for each row.
This:

SELECT
  C3,
  C1,
  SUM(C1) OVER (PARTITION BY C3 ORDER BY C1) SUM
FROM
  T1
ORDER BY
  C3,
  C1;

Actually needs to be rewritten into something like this:

SELECT
  C3,
  C1,
  SUM(C1) OVER (PARTITION BY C3 ORDER BY C1, ROWNUM) SUM
FROM
  T1
ORDER BY
  C3,
  C1;
2 01 2011
Maxim

Charles, i think, we have still talk about semantics ;-)
You have generated duplicate values and calculated running sum is suddenly not running but jumping. This is what happen, when more than one row are placed in the same window – due to equality in order by expression. That means, more than one record have the same value calculated for analytical sum function and the runnig sum is no more strictly increasing. You can do it strictly increasing again, if you compress your window to exactly one row. But then, you have a problem by non unique sorting key – which row gets first in its window? Well, in many cases, you can simply say – i don’t care, that is not so important, the only running sum matter. Then, you have two possibilities –
1) introduce pseudo uniqueness by means of rownum, as you suggested.
Here, you should be aware, that rownum will be arbitrary assigned , so you may get different results at different times/sessions
2) make use of physical offsets – here, you don’t have to fake the uniqueness – you get it for free, because, here, window for every row is different (the same non-determinism still apply ).
Consider, this minimalistic example:

SQL> with t as (
  2   select 1 id, 1 id2, 1 val from dual union all
  3   select 2,1,2 from dual union all
  4   select 3,2,3 from dual
  5  )
  6  select id,
  7  sum(val) over(order by id2) sum_range,
  8  sum(val) over(order by id2,rownum) sum_range_pseudo,
  9  sum(val) over(order by id2 rows between unbounded preceding and current row) sum_rows
 10  from t
 11  /

        ID  SUM_RANGE SUM_RANGE_PSEUDO   SUM_ROWS
---------- ---------- ---------------- ----------
         1          3                1          1
         2          3                3          3
         3          6                6          6

I personally prefer the 2nd semantics, as here, you are expressing more clear, what you intend to do.

Best regards

2 01 2011
Charles Hooper

Maxim,

Nice example.

This line:

sum(val) over(order by id2) sum_range,

May be replaced with this line without changing the output (which appears to confirm that “range between unbounded preceding and current row” is the default windowing clause):

sum(val) over(order by id2 range between unbounded preceding and current row) sum_range,

I actually considered the possibility that adding ROWNUM to the ORDER BY clause (in my previous example) might cause some rows in the running sum to appear out of sequential order… unfortunately, I did not think about that possible problem until 3 or 4 hours after my comment was posted.

Thank you for your participation in this blog article.

1 01 2011
Jonathan Lewis

Maxim,

Nice observation.

I was about to make the same comment regarding “windowing” and “order by”; but I hadn’t got as far as realising that the comment “Whenever the order_by_clause results in identical values for multiple rows, the function returns the same result for each of those rows” could by used to justify the behaviour in the case of the missing order_by_clause. It’s not what you’d call intuitively obvious, of course – but two further tests with Charles’ data help things along:
a) put “order by 1″ into the over() clause of the second – and it doesn’t fail.
b) put “insert into t1 select * from t1;” into the first test and notice how the results “double up” (in more ways than one ;)

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s




Follow

Get every new post delivered to your Inbox.

Join 141 other followers

%d bloggers like this: