December 2, 2009
A couple of years ago someone asked the following question in the comp.databases.oracle.misc Usenet group:
http://groups.google.com/group/comp.databases.oracle.misc/browse_thread/thread/576ea61b1a93469b
I need to write a query that returns only the students that have read all books by an author. I have these tables set up so far….
create table Books (
  BookTitle varchar2(20) PRIMARY KEY,
  author varchar2(20)
);

create table BookCamp (
  MemberName varchar2(20),
  BookTitle varchar2(20),
  CONSTRAINT fk_BookTitle FOREIGN KEY (BookTitle) REFERENCES Books(BookTitle)
);

insert into Books values ('Psycho', 'Brian');
insert into Books values ('Happy Rotter', 'Rocksteady');
insert into Books values ('Goblet', 'J.K Rowling');
insert into Books values ('Prisoner', 'J.K Rowling');

insert into BookCamp values ('Bob', 'Psycho');
insert into BookCamp values ('Chuck', 'Goblet');
insert into BookCamp values ('Chuck', 'Prisoner');
insert into BookCamp values ('Mike', 'Psycho');
insert into BookCamp values ('Mike', 'Goblet');
insert into BookCamp values ('Mike', 'Prisoner');
insert into BookCamp values ('Mary', 'Goblet');

So basically, if I inputted “J.K Rowling” the names “Chuck” and “Mike” should come up. If the author is “Brian” then the names “Bob” and “Mike” should come up. I’ve tried several things like…

select membername
from BookCamp
where BookTitle in (select BookTitle from Books where (author = 'J.K Rowling'));

but this obviously isn’t quite there…. Any help?
This message thread, like several others, generated suggestions from several people.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Maxim Demenko suggested the following:
SELECT DISTINCT membername
FROM bookcamp bc
WHERE NOT EXISTS (
  SELECT NULL
  FROM bookcamp bc1
  PARTITION BY (membername)
  RIGHT OUTER JOIN books b
  ON (bc1.booktitle=b.booktitle)
  WHERE b.author='J.K Rowling'
  AND bc.membername=bc1.membername AND bc1.booktitle IS NULL)
/

MEMBERNAME
--------------------
Chuck
Mike
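For comparison (this is not part of Maxim's reply), the same "has read every book" test is often written as a double NOT EXISTS, sometimes described as relational division: keep a member only if there is no book by the author that the member has not read. A minimal sketch, assuming the BOOKS and BOOKCAMP tables from the question; the aliases BC and BC1 are only illustrative:

```sql
-- Sketch only: assumes the BOOKS and BOOKCAMP tables created in the question.
-- A member qualifies when no book by the author is missing from that
-- member's rows in BOOKCAMP.
SELECT DISTINCT
  BC.MEMBERNAME
FROM
  BOOKCAMP BC
WHERE
  NOT EXISTS (
    SELECT
      NULL
    FROM
      BOOKS B
    WHERE
      B.AUTHOR='J.K Rowling'
      AND NOT EXISTS (
        SELECT
          NULL
        FROM
          BOOKCAMP BC1
        WHERE
          BC1.MEMBERNAME=BC.MEMBERNAME
          AND BC1.BOOKTITLE=B.BOOKTITLE))
ORDER BY
  BC.MEMBERNAME;
```

With the sample rows this should return Chuck and Mike. One caveat applies to this form as well as to the partitioned outer join: if the author has no rows in BOOKS, the inner subquery finds nothing to miss, so every member is returned.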
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
I then offered the following:
Maxim has provided a solution that produces the desired list. Let’s see if we can develop another method to solve this problem. First, a simple experiment using the analytic version of COUNT:
SELECT
  B.BOOKTITLE,
  B.AUTHOR,
  COUNT(B.BOOKTITLE) OVER (PARTITION BY B.AUTHOR) NUM_BOOKS
FROM
  BOOKS B
WHERE
  B.AUTHOR='J.K Rowling';

BOOKTITLE            AUTHOR                NUM_BOOKS
-------------------- -------------------- ----------
Prisoner             J.K Rowling                   2
Goblet               J.K Rowling                   2
Not too impressive yet, but let’s add in the second table:
SELECT
  BC.MEMBERNAME,
  B.BOOKTITLE,
  B.AUTHOR,
  COUNT(DISTINCT B.BOOKTITLE) OVER (PARTITION BY B.AUTHOR) AUTHOR_NUM_BOOKS,
  COUNT(BC.BOOKTITLE) OVER (PARTITION BY BC.MEMBERNAME, B.AUTHOR) MEMBER_NUM_BOOKS
FROM
  BOOKS B,
  BOOKCAMP BC
WHERE
  B.AUTHOR='J.K Rowling'
  AND B.BOOKTITLE=BC.BOOKTITLE(+);
Note that I changed the original COUNT(B.BOOKTITLE) to COUNT(DISTINCT B.BOOKTITLE) and changed the alias to AUTHOR_NUM_BOOKS. The BOOKTITLE and AUTHOR columns are left out of the output shown here:
MEMBERNAME           AUTHOR_NUM_BOOKS MEMBER_NUM_BOOKS
-------------------- ---------------- ----------------
Chuck                               2                2
Mary                                2                1
Mike                                2                2
Mike                                2                2
Chuck                               2                2
Now, we need a way to first eliminate all rows where AUTHOR_NUM_BOOKS is not equal to MEMBER_NUM_BOOKS, and then return a list of names without duplicates. This can be accomplished by sliding the above SQL statement into an inline view:
SELECT DISTINCT
  MEMBERNAME
FROM
  (SELECT
    BC.MEMBERNAME,
    COUNT(DISTINCT B.BOOKTITLE) OVER (PARTITION BY B.AUTHOR) AUTHOR_NUM_BOOKS,
    COUNT(BC.BOOKTITLE) OVER (PARTITION BY BC.MEMBERNAME, B.AUTHOR) MEMBER_NUM_BOOKS
  FROM
    BOOKS B,
    BOOKCAMP BC
  WHERE
    B.AUTHOR='J.K Rowling'
    AND B.BOOKTITLE=BC.BOOKTITLE(+))
WHERE
  AUTHOR_NUM_BOOKS=MEMBER_NUM_BOOKS;

MEMBERNAME
--------------------
Chuck
Mike
Let’s try again, this time without analytic functions. First, let’s find out how many of the author’s books were read by each membername:
SELECT
  BC.MEMBERNAME,
  B.AUTHOR,
  COUNT(*) MEMBER_NUM_BOOKS
FROM
  BOOKS B,
  BOOKCAMP BC
WHERE
  B.AUTHOR='J.K Rowling'
  AND BC.BOOKTITLE=B.BOOKTITLE
GROUP BY
  BC.MEMBERNAME,
  B.AUTHOR;

MEMBERNAME           AUTHOR               MEMBER_NUM_BOOKS
-------------------- -------------------- ----------------
Mike                 J.K Rowling                         2
Chuck                J.K Rowling                         2
Mary                 J.K Rowling                         1
Now, let’s determine the number of books written by each author:
SELECT
  AUTHOR,
  COUNT(*) AUTHOR_NUM_BOOKS
FROM
  BOOKS
GROUP BY
  AUTHOR;

AUTHOR               AUTHOR_NUM_BOOKS
-------------------- ----------------
Rocksteady                          1
Brian                               1
J.K Rowling                         2
Let’s put each into an inline view and pull out the membernames of interest:
SELECT DISTINCT
  BC.MEMBERNAME
FROM
  (SELECT
    BC.MEMBERNAME,
    B.AUTHOR,
    COUNT(*) MEMBER_NUM_BOOKS
  FROM
    BOOKS B,
    BOOKCAMP BC
  WHERE
    B.AUTHOR='J.K Rowling'
    AND BC.BOOKTITLE=B.BOOKTITLE
  GROUP BY
    BC.MEMBERNAME,
    B.AUTHOR) BC,
  (SELECT
    AUTHOR,
    COUNT(*) AUTHOR_NUM_BOOKS
  FROM
    BOOKS
  GROUP BY
    AUTHOR) B
WHERE
  B.AUTHOR=BC.AUTHOR
  AND B.AUTHOR_NUM_BOOKS=BC.MEMBER_NUM_BOOKS;

MEMBERNAME
--------------------
Chuck
Mike
Let’s try one more time. A simple starting point:
SELECT
  BC.MEMBERNAME,
  COUNT(BC.BOOKTITLE)
FROM
  BOOKS B,
  BOOKCAMP BC
WHERE
  B.AUTHOR='J.K Rowling'
  AND B.BOOKTITLE=BC.BOOKTITLE(+)
GROUP BY
  BC.MEMBERNAME;

MEMBERNAME           COUNT(BC.BOOKTITLE)
-------------------- -------------------
Chuck                                  2
Mary                                   1
Mike                                   2
[The above does not need to be an outer join]
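The starting point written as an inner join might look like the sketch below (same tables as in the question; the MEMBER_NUM_BOOKS alias is borrowed from the earlier queries). With the outer join, a book by the author that nobody has read would add a group with a NULL MEMBERNAME and a count of 0; the inner join simply drops it.

```sql
-- Sketch only: same starting point, with the (+) outer join marker removed.
-- Books that nobody has read produce no rows here, instead of a
-- NULL MEMBERNAME group with a count of 0.
SELECT
  BC.MEMBERNAME,
  COUNT(BC.BOOKTITLE) MEMBER_NUM_BOOKS
FROM
  BOOKS B,
  BOOKCAMP BC
WHERE
  B.AUTHOR='J.K Rowling'
  AND B.BOOKTITLE=BC.BOOKTITLE
GROUP BY
  BC.MEMBERNAME;
```

In the sample data both J.K Rowling books have been read by at least one member, so the two forms return the same three rows.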
Now, let’s add an inline view to retrieve the total number of books written by the author:
SELECT
  BC.MEMBERNAME,
  COUNT(BC.BOOKTITLE) MEMBER_NUM_BOOKS,
  NB.AUTHOR_NUM_BOOKS
FROM
  BOOKS B,
  BOOKCAMP BC,
  (SELECT
    COUNT(*) AUTHOR_NUM_BOOKS
  FROM
    BOOKS
  WHERE
    AUTHOR='J.K Rowling') NB
WHERE
  B.AUTHOR='J.K Rowling'
  AND B.BOOKTITLE=BC.BOOKTITLE(+)
GROUP BY
  BC.MEMBERNAME,
  NB.AUTHOR_NUM_BOOKS;

MEMBERNAME           MEMBER_NUM_BOOKS AUTHOR_NUM_BOOKS
-------------------- ---------------- ----------------
Chuck                               2                2
Mike                                2                2
Mary                                1                2
The final clean up is accomplished with a HAVING clause:
SELECT
  BC.MEMBERNAME
FROM
  BOOKS B,
  BOOKCAMP BC,
  (SELECT
    COUNT(*) AUTHOR_NUM_BOOKS
  FROM
    BOOKS
  WHERE
    AUTHOR='J.K Rowling') NB
WHERE
  B.AUTHOR='J.K Rowling'
  AND B.BOOKTITLE=BC.BOOKTITLE(+)
GROUP BY
  BC.MEMBERNAME,
  NB.AUTHOR_NUM_BOOKS
HAVING
  COUNT(BC.BOOKTITLE)=NB.AUTHOR_NUM_BOOKS;

MEMBERNAME
--------------------
Chuck
Mike
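One assumption is hiding in the counting approach: BOOKCAMP has no primary key, so nothing prevents the same MEMBERNAME and BOOKTITLE pair from being inserted twice. If Mary's Goblet row were entered twice, COUNT(BC.BOOKTITLE) would report 2 for her and she would incorrectly match. A hedged variation that counts distinct titles, and uses a scalar subquery for the author's total, might look like this:

```sql
-- Sketch only: guards against duplicate MEMBERNAME/BOOKTITLE rows in BOOKCAMP
-- by counting each title once per member.
SELECT
  BC.MEMBERNAME
FROM
  BOOKS B,
  BOOKCAMP BC
WHERE
  B.AUTHOR='J.K Rowling'
  AND B.BOOKTITLE=BC.BOOKTITLE
GROUP BY
  BC.MEMBERNAME
HAVING
  COUNT(DISTINCT BC.BOOKTITLE)=
    (SELECT
      COUNT(*)
    FROM
      BOOKS
    WHERE
      AUTHOR='J.K Rowling');
```

With the original sample rows this should still return Chuck and Mike.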
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Maxim Demenko then suggested the following:
Just for fun, yet another one:
SELECT MEMBERNAME
FROM (SELECT B.MEMBERNAME,
             CAST(COLLECT(booktitle) AS SYS.dbms_debug_vc2coll) BOOKLIST
      FROM BOOKCAMP B
      GROUP BY MEMBERNAME) M,
     (SELECT AUTHOR,
             CAST(COLLECT(booktitle) AS SYS.dbms_debug_vc2coll) BOOKLIST
      FROM BOOKS B
      GROUP BY AUTHOR) A
WHERE A.BOOKLIST SUBMULTISET OF M.BOOKLIST
AND AUTHOR = 'J.K Rowling'
/
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
I then offered the following:
Let’s see if there is another way – caution, this might be inefficient:
The starting point:
SELECT
  B.AUTHOR,
  B.BOOKTITLE,
  ROW_NUMBER() OVER (PARTITION BY B.AUTHOR ORDER BY B.BOOKTITLE) BOOK_NUM,
  COUNT(B.BOOKTITLE) OVER (PARTITION BY B.AUTHOR) BOOK_COUNT
FROM
  BOOKS B
WHERE
  AUTHOR='J.K Rowling';

AUTHOR               BOOKTITLE              BOOK_NUM BOOK_COUNT
-------------------- -------------------- ---------- ----------
J.K Rowling          Goblet                        1          2
J.K Rowling          Prisoner                      2          2
Now, let’s put the book list into a comma separated list:
SELECT
  SUBSTR(SYS_CONNECT_BY_PATH(BOOKTITLE,','),2) BOOK_LIST
FROM
  (SELECT
    B.AUTHOR,
    B.BOOKTITLE,
    ROW_NUMBER() OVER (PARTITION BY B.AUTHOR ORDER BY B.BOOKTITLE) BOOK_NUM,
    COUNT(B.BOOKTITLE) OVER (PARTITION BY B.AUTHOR) BOOK_COUNT
  FROM
    BOOKS B
  WHERE
    B.AUTHOR='J.K Rowling')
WHERE
  BOOK_NUM=BOOK_COUNT
CONNECT BY PRIOR
  BOOK_NUM=BOOK_NUM-1
START WITH
  BOOK_NUM=1;

BOOK_LIST
---------------
Goblet,Prisoner
We are now halfway done. Prepare to do the same with the BOOKCAMP table:
SELECT
  BC.MEMBERNAME,
  BC.BOOKTITLE,
  ROW_NUMBER() OVER (PARTITION BY BC.MEMBERNAME,B.AUTHOR ORDER BY B.BOOKTITLE) BOOK_NUM,
  COUNT(B.BOOKTITLE) OVER (PARTITION BY BC.MEMBERNAME,B.AUTHOR) BOOK_COUNT
FROM
  BOOKS B,
  BOOKCAMP BC
WHERE
  B.AUTHOR='J.K Rowling'
  AND B.BOOKTITLE=BC.BOOKTITLE;

MEMBERNAME           BOOKTITLE              BOOK_NUM BOOK_COUNT
-------------------- -------------------- ---------- ----------
Chuck                Goblet                        1          2
Chuck                Prisoner                      2          2
Mary                 Goblet                        1          1
Mike                 Goblet                        1          2
Mike                 Prisoner                      2          2
Generate a comma separated list for each MEMBERNAME:
SELECT
  MEMBERNAME,
  SUBSTR(SYS_CONNECT_BY_PATH(BOOKTITLE,','),2) BOOK_LIST
FROM
  (SELECT
    BC.MEMBERNAME,
    BC.BOOKTITLE,
    ROW_NUMBER() OVER (PARTITION BY BC.MEMBERNAME,B.AUTHOR ORDER BY B.BOOKTITLE) BOOK_NUM,
    COUNT(B.BOOKTITLE) OVER (PARTITION BY BC.MEMBERNAME,B.AUTHOR) BOOK_COUNT
  FROM
    BOOKS B,
    BOOKCAMP BC
  WHERE
    B.AUTHOR='J.K Rowling'
    AND B.BOOKTITLE=BC.BOOKTITLE)
WHERE
  BOOK_NUM=BOOK_COUNT
CONNECT BY PRIOR
  (MEMBERNAME||TO_CHAR(BOOK_NUM))=(MEMBERNAME||TO_CHAR(BOOK_NUM-1))
START WITH
  BOOK_NUM=1;

MEMBERNAME           BOOK_LIST
-------------------- ---------------
Chuck                Goblet,Prisoner
Mary                 Goblet
Mike                 Goblet,Prisoner
Now, let’s put it all together to see where the author book list matches the MEMBERNAME book lists:
SELECT
  BC.MEMBERNAME
FROM
  (SELECT
    SUBSTR(SYS_CONNECT_BY_PATH(BOOKTITLE,','),2) BOOK_LIST
  FROM
    (SELECT
      B.AUTHOR,
      B.BOOKTITLE,
      ROW_NUMBER() OVER (PARTITION BY B.AUTHOR ORDER BY B.BOOKTITLE) BOOK_NUM,
      COUNT(B.BOOKTITLE) OVER (PARTITION BY B.AUTHOR) BOOK_COUNT
    FROM
      BOOKS B
    WHERE
      B.AUTHOR='J.K Rowling')
  WHERE
    BOOK_NUM=BOOK_COUNT
  CONNECT BY PRIOR
    BOOK_NUM=BOOK_NUM-1
  START WITH
    BOOK_NUM=1) B,
  (SELECT
    MEMBERNAME,
    SUBSTR(SYS_CONNECT_BY_PATH(BOOKTITLE,','),2) BOOK_LIST
  FROM
    (SELECT
      BC.MEMBERNAME,
      BC.BOOKTITLE,
      ROW_NUMBER() OVER (PARTITION BY BC.MEMBERNAME,B.AUTHOR ORDER BY B.BOOKTITLE) BOOK_NUM,
      COUNT(B.BOOKTITLE) OVER (PARTITION BY BC.MEMBERNAME,B.AUTHOR) BOOK_COUNT
    FROM
      BOOKS B,
      BOOKCAMP BC
    WHERE
      B.AUTHOR='J.K Rowling'
      AND B.BOOKTITLE=BC.BOOKTITLE)
  WHERE
    BOOK_NUM=BOOK_COUNT
  CONNECT BY PRIOR
    (MEMBERNAME||TO_CHAR(BOOK_NUM))=(MEMBERNAME||TO_CHAR(BOOK_NUM-1))
  START WITH
    BOOK_NUM=1) BC
WHERE
  B.BOOK_LIST=BC.BOOK_LIST;

MEMBERNAME
--------------------
Chuck
Mike
Oddly, the above executes much faster than the CAST(COLLECT(booktitle) AS SYS.dbms_debug_vc2coll) solution. Maybe the dataset size should be increased, and the OP should post the performance results of each method to see how the first two solutions compare with the others. I think that it would be interesting to see if the CAST(COLLECT(booktitle) AS SYS.dbms_debug_vc2coll) method scales better than the other methods.
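As a side note that was not part of the thread: on Oracle releases that provide the LISTAGG aggregate function (11.2 and later), the same comma-separated list comparison can probably be written without SYS_CONNECT_BY_PATH. A minimal sketch, assuming the BOOKS and BOOKCAMP tables from the question; the inline view aliases A and M are only illustrative:

```sql
-- Sketch only: builds the author's ordered book list and each member's
-- ordered list of that author's books, then compares the two strings.
SELECT
  M.MEMBERNAME
FROM
  (SELECT
    LISTAGG(BOOKTITLE,',') WITHIN GROUP (ORDER BY BOOKTITLE) BOOK_LIST
  FROM
    BOOKS
  WHERE
    AUTHOR='J.K Rowling') A,
  (SELECT
    BC.MEMBERNAME,
    LISTAGG(BC.BOOKTITLE,',') WITHIN GROUP (ORDER BY BC.BOOKTITLE) BOOK_LIST
  FROM
    BOOKS B,
    BOOKCAMP BC
  WHERE
    B.AUTHOR='J.K Rowling'
    AND B.BOOKTITLE=BC.BOOKTITLE
  GROUP BY
    BC.MEMBERNAME) M
WHERE
  A.BOOK_LIST=M.BOOK_LIST
ORDER BY
  M.MEMBERNAME;
```

With the sample rows this should return Chuck and Mike. The list is a VARCHAR2, so an author with a very long list of titles could exceed the maximum string length, and I have not compared the performance of this form with the other methods.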
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Maxim Demenko then suggested the following:
This ties to the problem of set comparisons in SQL, which I believe (I don’t mean multiset operations) can’t be effectively solved in pure SQL.
Yet another approach (borrowed from http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:1…):
SELECT DISTINCT MEMBERNAME
FROM (SELECT BC.*,
             B.*,
             SUM(B_RNK) OVER(PARTITION BY MEMBERNAME, AUTHOR) M_RNK
      FROM BOOKCAMP BC,
           (SELECT B.*,
                   SUM(B_RNK) OVER(PARTITION BY AUTHOR) A_RNK
            FROM (SELECT B.*,
                         POWER(2, DENSE_RANK() OVER(ORDER BY BOOKTITLE) - 1) B_RNK
                  FROM BOOKS B) B) B
      WHERE BC.BOOKTITLE = B.BOOKTITLE)
WHERE AUTHOR = 'J.K Rowling'
AND A_RNK = M_RNK
/
However, it will have its limitations too, and on really big sets (bigger than 1000 members) I think all of the suggested solutions will not perform very well. For medium-sized sets (where the complete result sets fit into the PGA), the best performance I have seen so far for similar tasks comes from the MODEL clause.
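To make the bit-value idea in Maxim's last query a little more concrete, the two innermost inline views can be run on their own. This is only an illustrative sketch against the sample BOOKS table, reusing the B_RNK and A_RNK aliases from his query:

```sql
-- Sketch only: each title gets a distinct power of 2 (B_RNK), and the
-- author's "signature" (A_RNK) is the sum of those values.
SELECT
  B.BOOKTITLE,
  B.AUTHOR,
  B.B_RNK,
  SUM(B.B_RNK) OVER (PARTITION BY B.AUTHOR) A_RNK
FROM
  (SELECT
    BOOKTITLE,
    AUTHOR,
    POWER(2, DENSE_RANK() OVER (ORDER BY BOOKTITLE) - 1) B_RNK
  FROM
    BOOKS) B
ORDER BY
  B.BOOKTITLE;
```

With the four sample titles, B_RNK should come out as 1 (Goblet), 2 (Happy Rotter), 4 (Prisoner) and 8 (Psycho), so A_RNK for J.K Rowling is 1 + 4 = 5. A member matches when the values for the books that member has read by the author also sum to 5, which is true for Chuck and Mike but not for Mary (1). Because each additional title doubles the value and the NUMBER datatype carries roughly 38 significant digits, the sums are likely to stop being exact once there are more than about 126 distinct titles, which is one way to read the limitations Maxim mentions.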