Nice example.

This line:

sum(val) over(order by id2) sum_range,

May be replaced with this line without changing the output (which appears to confirm that “range between unbounded preceding and current row” is the default windowing clause):

sum(val) over(order by id2 range between unbounded preceding and current row) sum_range,
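The same equivalence can be checked outside Oracle. This is a minimal sketch using Python's built-in sqlite3 module. It assumes SQLite 3.25 or later, the first release with window functions, as a stand-in for Oracle. It uses the three-row data set from this thread and compares the default frame with the explicit RANGE clause:

```python
import sqlite3

# Three rows; ids 1 and 2 share the same id2 value (a tie in the ORDER BY).
ROWS = [(1, 1, 1), (2, 1, 2), (3, 2, 3)]


def compare_frames():
    conn = sqlite3.connect(":memory:")
    conn.execute("create table t (id integer, id2 integer, val integer)")
    conn.executemany("insert into t values (?, ?, ?)", ROWS)
    query = """
        select id,
               sum(val) over (order by id2) as sum_default,
               sum(val) over (order by id2
                              range between unbounded preceding and current row) as sum_range
        from t
        order by id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in compare_frames():
        print(row)  # (id, sum_default, sum_range)
```

If the two columns agree for every row, including the tied ids 1 and 2, that is consistent with the default frame being RANGE-based in this engine. Oracle's own behaviour is still best confirmed against Oracle.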

I did consider the possibility that adding ROWNUM to the ORDER BY clause (in my previous example) might cause some rows in the running sum to appear out of sequential order; unfortunately, that possible problem did not occur to me until 3 or 4 hours after my comment was posted.
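One way a running sum can appear out of sequence is when the tie-breaker inside OVER() disagrees with the final ORDER BY. The following is a minimal sketch in SQLite via Python. SQLite has no ROWNUM, so rowid is used as a stand-in tie-breaker; this is an assumption, since the two are not equivalent. The output is also deliberately sorted by id descending within each id2 value:

```python
import sqlite3


def out_of_sequence_demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("create table t (id integer, id2 integer, val integer)")
    # Insertion order fixes the rowid: id 1 gets rowid 1, id 2 gets rowid 2, ...
    conn.executemany("insert into t values (?, ?, ?)",
                     [(1, 1, 1), (2, 1, 2), (3, 2, 3)])
    query = """
        select id,
               sum(val) over (order by id2, rowid) as running_sum
        from t
        order by id2, id desc   -- the final sort breaks the id2 tie the other way
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(out_of_sequence_demo())
```

The running sum is computed in (id2, rowid) order but displayed in (id2, id desc) order, so the displayed sums are no longer monotonic. Oracle has an analogous risk, because ROWNUM is assigned before the final ORDER BY is applied.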

Thank you for your participation in this blog article.

You have generated duplicate values, and the calculated running sum is suddenly not running but jumping. This is what happens when more than one row falls into the same window because the ORDER BY expression has equal values: those records all get the same value from the analytic SUM function, so the running sum is no longer strictly increasing. You can make it strictly increasing again if the window advances by exactly one row at a time. But then you have a problem with the non-unique sort key: which row comes first within its window? Well, in many cases you can simply say: I don’t care, that is not so important, only the running sum matters. Then you have two possibilities:

1) Introduce pseudo-uniqueness by means of ROWNUM, as you suggested.

Here you should be aware that ROWNUM is assigned arbitrarily, so you may get different results at different times or in different sessions.

2) Use physical offsets (a ROWS window). Here you don’t have to fake uniqueness; you get it for free, because the window is different for every row (the same non-determinism still applies).
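The difference between a RANGE frame and a ROWS frame, both ending at CURRENT ROW, can be sketched without a database. In the hypothetical helper running_sums (not an Oracle API), a RANGE frame includes all peers of the current row, meaning all rows with an equal sort key. A ROWS frame stops at the current physical row, so tied rows get different sums depending on the order in which they happen to be processed:

```python
from itertools import accumulate


def by_id2(row):
    """Sort key: the id2 column of an (id, id2, val) tuple."""
    return row[1]


def running_sums(rows, frame):
    """Model SUM(val) OVER (ORDER BY id2 <frame> UNBOUNDED PRECEDING .. CURRENT ROW).

    rows:  list of (id, id2, val) in the order the engine happens to process them.
    frame: "range" -> tied rows (peers) share one result,
           "rows"  -> each row sums up to its own physical position.
    Returns {id: running sum}.
    """
    ordered = sorted(rows, key=by_id2)  # sorted() is stable: ties keep input order
    prefix = list(accumulate(r[2] for r in ordered))
    result = {}
    for i, row in enumerate(ordered):
        if frame == "rows":
            end = i
        elif frame == "range":
            # Extend the frame to the last peer sharing this row's sort key.
            end = i
            while end + 1 < len(ordered) and by_id2(ordered[end + 1]) == by_id2(row):
                end += 1
        else:
            raise ValueError(f"unknown frame: {frame}")
        result[row[0]] = prefix[end]
    return result


if __name__ == "__main__":
    data = [(1, 1, 1), (2, 1, 2), (3, 2, 3)]
    print(running_sums(data, "range"))
    print(running_sums(data, "rows"))
```

Option 1, ORDER BY id2 plus a pseudo-unique tie-breaker, amounts to sorting on (id2, position), which gives the same numbers as the "rows" frame here. Either way, the order among tied rows is whatever the processing order happens to be.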

Consider this minimal example:

SQL> with t as (
  select 1 id, 1 id2, 1 val from dual union all
  select 2,1,2 from dual union all
  select 3,2,3 from dual
  )
  select id,
  sum(val) over(order by id2) sum_range,
  sum(val) over(order by id2,rownum) sum_range_pseudo,
  sum(val) over(order by id2 rows between unbounded preceding and current row) sum_rows
  from t
  /

        ID  SUM_RANGE SUM_RANGE_PSEUDO   SUM_ROWS
---------- ---------- ---------------- ----------
         1          3                1          1
         2          3                3          3
         3          6                6          6

I personally prefer the second approach, because it expresses more clearly what you intend to do.
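For readers without an Oracle instance, the same query can be approximated in SQLite through Python's sqlite3 module. This sketch rests on two assumptions. First, it needs SQLite 3.25+ for window functions. Second, rowid stands in for Oracle's ROWNUM. They are not the same thing: ROWNUM is a pseudocolumn Oracle assigns as rows are retrieved, while rowid here simply follows insertion order.

```python
import sqlite3


def reproduce_example():
    conn = sqlite3.connect(":memory:")
    conn.execute("create table t (id integer, id2 integer, val integer)")
    conn.executemany("insert into t values (?, ?, ?)",
                     [(1, 1, 1), (2, 1, 2), (3, 2, 3)])
    query = """
        select id,
               sum(val) over (order by id2) as sum_range,
               -- rowid plays the role of Oracle's ROWNUM as a tie-breaker
               sum(val) over (order by id2, rowid) as sum_range_pseudo,
               sum(val) over (order by id2
                              rows between unbounded preceding and current row) as sum_rows
        from t
        order by id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in reproduce_example():
        print(row)  # (id, sum_range, sum_range_pseudo, sum_rows)
```

The sum_range and sum_range_pseudo columns are deterministic here. For the tied ids 1 and 2, the sum_rows values depend on which tied row the engine processes first, which is exactly the non-determinism mentioned above.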

Best regards

Your comment in quotes differed slightly from my understanding. However, after performing the tests suggested by Jonathan, I now understand what you wrote – and my understanding has now changed as a result. Thank you for pointing out that the function’s behavior changes when the value in the ORDER BY changes from one row to the next.

I thought that introducing an ORDER BY into an analytic SUM function was a good way to generate a running SUM, but I now see that I need to make certain that the columns in the ORDER BY clause produce a unique value for each row.

This:

SELECT C3, C1, SUM(C1) OVER (PARTITION BY C3 ORDER BY C1) SUM FROM T1 ORDER BY C3, C1;

Actually needs to be rewritten into something like this:

SELECT C3, C1, SUM(C1) OVER (PARTITION BY C3 ORDER BY C1, ROWNUM) SUM FROM T1 ORDER BY C3, C1;
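The rewrite only changes the result when C1 repeats within a C3 partition. The following is a small sketch in SQLite via Python. The data is made up, with one duplicate C1 value in the 'MON' partition, and rowid stands in for ROWNUM; both are assumptions. It compares the tied and tie-broken versions side by side:

```python
import sqlite3


def partitioned_running_sums():
    conn = sqlite3.connect(":memory:")
    conn.execute("create table t1 (c3 text, c1 integer)")
    # 'MON' contains a duplicate C1 value, so ORDER BY C1 alone is not unique.
    conn.executemany("insert into t1 values (?, ?)",
                     [("MON", 1), ("MON", 2), ("MON", 2), ("TUE", 5)])
    result = conn.execute("""
        select c3, c1,
               sum(c1) over (partition by c3 order by c1) as sum_ties,
               sum(c1) over (partition by c3 order by c1, rowid) as sum_tiebreak
        from t1
        order by c3, c1, rowid   -- same tie-breaker as the window keeps the output in sequence
    """).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in partitioned_running_sums():
        print(row)  # (c3, c1, sum_ties, sum_tiebreak)
```

Sorting the final output with the same tie-breaker as the window keeps the displayed running sum in sequence. This addresses the out-of-order concern raised earlier in the thread.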

Nice observation.

I was about to make the same comment regarding “windowing” and “order by”; but I hadn’t got as far as realising that the comment *“Whenever the order_by_clause results in identical values for multiple rows, the function returns the same result for each of those rows”* could be used to justify the behaviour in the case of the missing order_by_clause. It’s not what you’d call intuitively obvious, of course, but two further tests with Charles’ data help things along:

a) put “order by 1” into the over() clause of the second query, and it doesn’t fail (every row in a partition still gets the same sum).

b) put “insert into t1 select * from t1;” into the first test and notice how the results “double up” (in more ways than one ;)
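Both tests can be sketched in SQLite through Python's sqlite3 module. Two assumptions apply. First, the data is made up. Second, a bare numeric literal in a window ORDER BY may not be handled the same way in SQLite as in Oracle. Test a) therefore orders by C3 instead, which is constant within each C3 partition and so has the same effect as “order by 1”:

```python
import sqlite3


def jonathan_tests():
    conn = sqlite3.connect(":memory:")
    conn.execute("create table t1 (c3 text, c1 integer)")
    conn.executemany("insert into t1 values (?, ?)",
                     [("MON", 1), ("MON", 2), ("TUE", 3)])

    # Test a): ordering by a value that is constant within each partition
    # makes every row a peer, so the result matches having no ORDER BY at all.
    test_a = conn.execute("""
        select c3, c1,
               sum(c1) over (partition by c3) as no_order,
               sum(c1) over (partition by c3 order by c3) as constant_order
        from t1
        order by c3, c1
    """).fetchall()

    # Test b): duplicate every row, then compute the running sum by c1.
    conn.execute("insert into t1 select * from t1")
    test_b = conn.execute("""
        select c3, c1,
               sum(c1) over (partition by c3 order by c1) as running_sum
        from t1
        order by c3, c1
    """).fetchall()
    conn.close()
    return test_a, test_b


if __name__ == "__main__":
    a, b = jonathan_tests()
    print(a)
    print(b)
```

In test b), each pair of duplicated rows are peers, so both copies report the same sum. That sum already includes both of them, which is the “double up” effect.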

“Whenever the order_by_clause results in identical values for multiple rows, the function returns the same result for each of those rows”. Or did I misunderstand your comment?

Best regards

Maxim

I am suggesting that the Oracle documentation *might* be incorrect. :-)

Consider this test table:

CREATE TABLE T1 AS
SELECT
  ROWNUM C1,
  10000-ROWNUM C2,
  TO_CHAR(TRUNC(SYSDATE+ROWNUM),'DAY') C3,
  LPAD('A',100,'A') PADDING
FROM
  DUAL
CONNECT BY
  LEVEL<=100;

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1')

And these test SQL statements:

SELECT C3, C1, SUM(C1) OVER (PARTITION BY C3 ORDER BY C1) SUM FROM T1 ORDER BY C3, C1;

SELECT C3, C1, SUM(C1) OVER (PARTITION BY C3) SUM FROM T1 ORDER BY C3, C1;

I saw something a couple of years ago, but I do not remember whether it was something you showed me in one of the Oracle Usenet newsgroups or something I saw elsewhere. One of the above two SQL statements suggests that the Oracle documentation is correct, while the other suggests that it is wrong.
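A rough SQLite analogue of the T1 test table and the two queries, run through Python's sqlite3 module, makes the contrast easy to check. Several assumptions apply. A fixed start date of 2011-01-01 replaces SYSDATE so results are repeatable. Day names come from a hard-coded tuple rather than TO_CHAR(...,'DAY'). SQLite, not Oracle, evaluates the windows. The helper names are hypothetical.

```python
import sqlite3
from datetime import date, timedelta

# Index matches date.weekday() (Monday == 0).
DAY_NAMES = ("MONDAY", "TUESDAY", "WEDNESDAY", "THURSDAY",
             "FRIDAY", "SATURDAY", "SUNDAY")


def build_t1(conn, start=date(2011, 1, 1), n=100):
    """Rough analogue of T1: C1 = 1..n, C2 = 10000 - C1, C3 = day name of start + C1 days."""
    conn.execute("create table t1 (c1 integer, c2 integer, c3 text, padding text)")
    rows = [(i, 10000 - i, DAY_NAMES[(start + timedelta(days=i)).weekday()], "A" * 100)
            for i in range(1, n + 1)]
    conn.executemany("insert into t1 values (?, ?, ?, ?)", rows)


def run_both_queries():
    conn = sqlite3.connect(":memory:")
    build_t1(conn)
    # First statement: ORDER BY inside OVER() -> running sum per day name.
    with_order = conn.execute("""
        select c3, c1, sum(c1) over (partition by c3 order by c1) as s
        from t1 order by c3, c1""").fetchall()
    # Second statement: no ORDER BY -> every row sees the whole partition.
    without_order = conn.execute("""
        select c3, c1, sum(c1) over (partition by c3) as s
        from t1 order by c3, c1""").fetchall()
    conn.close()
    return with_order, without_order


if __name__ == "__main__":
    w, wo = run_both_queries()
    print(w[:3])
    print(wo[:3])
```

Because C1 is unique here, the first statement produces a true running sum. The second gives every row in a partition the partition total, the same result you would get if all rows in the partition were ORDER BY peers.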

Happy new year to you too!

It seems the question you would like to ask is: is “rows” correct, is “range” correct, and isn’t that the same thing anyway? In my opinion, this time the documentation quote is correct and the quote from the book is wrong. And yes, “range” and “rows” mean very different things (logical offsets vs. physical offsets, deterministic vs. nondeterministic results, etc.). Alex Nuijten asked exactly the same question on his blog last year: what is the default windowing clause?

http://nuijten.blogspot.com/2010/06/last-tuesday-we-had-odtug-preview-mini.html#more

I think the example from my comment can be used to illustrate which quote is correct.
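The difference between logical and physical offsets can be sketched in plain Python, along with why only logical offsets give deterministic results when the sort key has ties. The helper frame_sum is hypothetical and only models SUM with a frame ending at CURRENT ROW. A large offset stands in for UNBOUNDED PRECEDING. The sketch runs the three-row example under every possible order of the tied rows and collects the distinct outcomes:

```python
from itertools import permutations

UNBOUNDED = 10**9  # large offset standing in for UNBOUNDED PRECEDING
DATA = [(1, 1, 1), (2, 1, 2), (3, 2, 3)]  # (id, sort key, val)


def frame_sum(rows, mode, preceding):
    """rows: (id, key, val) tuples in processing order (ties in arbitrary order).

    mode="rows":  physical offset, i.e. the current row plus up to `preceding`
                  rows before it in the sorted sequence.
    mode="range": logical offset, i.e. every row whose key lies in
                  [current key - preceding, current key], peers included.
    Returns {id: sum}.
    """
    ordered = sorted(rows, key=lambda r: r[1])  # stable: ties keep processing order
    out = {}
    for i, (rid, key, _) in enumerate(ordered):
        if mode == "rows":
            window = ordered[max(0, i - preceding): i + 1]
        elif mode == "range":
            window = [r for r in ordered if key - preceding <= r[1] <= key]
        else:
            raise ValueError(f"unknown mode: {mode}")
        out[rid] = sum(r[2] for r in window)
    return out


def results_over_tie_orders(mode, preceding):
    """Distinct results across every possible processing order of DATA."""
    seen = []
    for perm in permutations(DATA):
        res = frame_sum(list(perm), mode, preceding)
        if res not in seen:
            seen.append(res)
    return seen


if __name__ == "__main__":
    for mode in ("range", "rows"):
        for preceding in (UNBOUNDED, 1):
            print(mode, preceding, results_over_tie_orders(mode, preceding))
```

The "range" mode yields a single outcome no matter how the tied rows are ordered. The "rows" mode yields two outcomes, one per order of the tied ids 1 and 2, with either the unbounded or the one-row offset.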

And btw, all the best in the new year!