December 2, 2009
A couple of years ago someone asked the following question in the comp.databases.oracle.misc Usenet group:
http://groups.google.com/group/comp.databases.oracle.misc/browse_thread/thread/576ea61b1a93469b
I need to write a query that returns only the students that have read all books by an author. I have these tables set up so far….
create table Books (
BookTitle varchar2(20) PRIMARY KEY,
author varchar2(20)
);
create table BookCamp (
MemberName varchar2(20),
BookTitle varchar2(20),
CONSTRAINT fk_BookTitle
FOREIGN KEY (BookTitle)
REFERENCES Books(BookTitle)
);
insert into Books values ('Psycho', 'Brian');
insert into Books values ('Happy Rotter', 'Rocksteady');
insert into Books values ('Goblet', 'J.K Rowling');
insert into Books values ('Prisoner', 'J.K Rowling');
insert into BookCamp values ('Bob', 'Psycho');
insert into BookCamp values ('Chuck', 'Goblet');
insert into BookCamp values ('Chuck', 'Prisoner');
insert into BookCamp values ('Mike', 'Psycho');
insert into BookCamp values ('Mike', 'Goblet');
insert into BookCamp values ('Mike', 'Prisoner');
insert into BookCamp values ('Mary', 'Goblet');
So basically, if I inputted “J.K Rowling” the names “Chuck” and “Mike” should come up. If the author is “Brian” then the names “Bob” and “Mike” should come up. I’ve tried several things like…
select membername from BookCamp where BookTitle in (select BookTitle from Books where (author = 'J.K Rowling'));
but this obviously isn’t quite there…. Any help?
This message thread, like several others, generated suggestions from several people.
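To see why the original attempt falls short, here is a minimal sketch that loads the sample data and runs the poster's IN query. It uses Python's sqlite3 module as a stand-in for Oracle (an assumption made only so the example is runnable anywhere); the function name members_via_in is hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for the Oracle schema (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Books (BookTitle TEXT PRIMARY KEY, author TEXT);
    CREATE TABLE BookCamp (MemberName TEXT,
                           BookTitle TEXT REFERENCES Books(BookTitle));
    INSERT INTO Books VALUES ('Psycho', 'Brian'), ('Happy Rotter', 'Rocksteady'),
                             ('Goblet', 'J.K Rowling'), ('Prisoner', 'J.K Rowling');
    INSERT INTO BookCamp VALUES ('Bob', 'Psycho'), ('Chuck', 'Goblet'),
                                ('Chuck', 'Prisoner'), ('Mike', 'Psycho'),
                                ('Mike', 'Goblet'), ('Mike', 'Prisoner'),
                                ('Mary', 'Goblet');
""")

# The poster's attempt: any member who read AT LEAST ONE book by the author.
ATTEMPT_SQL = """
    SELECT membername
    FROM BookCamp
    WHERE BookTitle IN (SELECT BookTitle FROM Books WHERE author = ?)
"""

def members_via_in(author):
    """Return the (sorted) member names produced by the IN query."""
    return sorted(row[0] for row in conn.execute(ATTEMPT_SQL, (author,)))

print(members_via_in('J.K Rowling'))
# ['Chuck', 'Chuck', 'Mary', 'Mike', 'Mike']
```

Mary appears even though she has read only one of the two books, and Chuck and Mike each appear twice. The IN test answers "read any book by the author", while the question asks for "read every book by the author".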
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Maxim Demenko suggested the following:
SELECT DISTINCT membername
FROM bookcamp bc
WHERE NOT EXISTS (
  SELECT NULL
  FROM bookcamp bc1
  PARTITION BY (membername)
  RIGHT OUTER JOIN books b
  ON (bc1.booktitle=b.booktitle)
  WHERE b.author='J.K Rowling'
  AND bc.membername=bc1.membername AND bc1.booktitle IS NULL)
/
MEMBERNAME
--------------------
Chuck
Mike
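
The partitioned outer join above is Oracle-specific syntax. The same "no unread book by this author exists" idea can also be written as a pair of nested NOT EXISTS subqueries. The sketch below is that swapped-in technique, not Maxim's query, and it runs against SQLite through Python's sqlite3 module purely as an assumption for portability; read_all is a hypothetical helper name.

```python
import sqlite3

# In-memory SQLite database standing in for the Oracle schema (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Books (BookTitle TEXT PRIMARY KEY, author TEXT);
    CREATE TABLE BookCamp (MemberName TEXT,
                           BookTitle TEXT REFERENCES Books(BookTitle));
    INSERT INTO Books VALUES ('Psycho', 'Brian'), ('Happy Rotter', 'Rocksteady'),
                             ('Goblet', 'J.K Rowling'), ('Prisoner', 'J.K Rowling');
    INSERT INTO BookCamp VALUES ('Bob', 'Psycho'), ('Chuck', 'Goblet'),
                                ('Chuck', 'Prisoner'), ('Mike', 'Psycho'),
                                ('Mike', 'Goblet'), ('Mike', 'Prisoner'),
                                ('Mary', 'Goblet');
""")

# For each member, look for a book by the author that the member has NOT read;
# keep the member only when no such book exists.
DIVISION_SQL = """
    SELECT DISTINCT bc.MemberName
    FROM BookCamp bc
    WHERE NOT EXISTS (
        SELECT 1
        FROM Books b
        WHERE b.author = :author
          AND NOT EXISTS (
              SELECT 1
              FROM BookCamp bc1
              WHERE bc1.MemberName = bc.MemberName
                AND bc1.BookTitle = b.BookTitle))
    ORDER BY bc.MemberName
"""

def read_all(author):
    """Members for whom no unread book by `author` can be found."""
    return [row[0] for row in conn.execute(DIVISION_SQL, {"author": author})]

print(read_all('J.K Rowling'))  # ['Chuck', 'Mike']
print(read_all('Brian'))        # ['Bob', 'Mike']
```

One caveat of this form: for an author with no rows in Books, the outer NOT EXISTS is trivially true, so every member is returned.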
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
I then offered the following:
Maxim has provided a solution that produces the desired list. Let’s see if we can develop another method to solve this problem. First, a simple experiment using the analytic version of COUNT:
SELECT
B.BOOKTITLE,
B.AUTHOR,
COUNT(B.BOOKTITLE) OVER (PARTITION BY B.AUTHOR) NUM_BOOKS
FROM
BOOKS B
WHERE
B.AUTHOR='J.K Rowling';
BOOKTITLE            AUTHOR                NUM_BOOKS
-------------------- -------------------- ----------
Prisoner             J.K Rowling                   2
Goblet               J.K Rowling                   2
Not too impressive yet, but let’s add in the second table:
SELECT
BC.MEMBERNAME,
B.BOOKTITLE,
B.AUTHOR,
COUNT(DISTINCT B.BOOKTITLE) OVER (PARTITION BY B.AUTHOR) AUTHOR_NUM_BOOKS,
COUNT(BC.BOOKTITLE) OVER (PARTITION BY BC.MEMBERNAME, B.AUTHOR) MEMBER_NUM_BOOKS
FROM
BOOKS B,
BOOKCAMP BC
WHERE
B.AUTHOR='J.K Rowling'
AND B.BOOKTITLE=BC.BOOKTITLE(+);
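Note that I changed the original COUNT(B.BOOKTITLE) to COUNT(DISTINCT B.BOOKTITLE), because the join to BOOKCAMP repeats each title once for every member who read it, and changed the alias to AUTHOR_NUM_BOOKS. The BOOKTITLE and AUTHOR columns are omitted from the output shown here: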
MEMBERNAME           AUTHOR_NUM_BOOKS MEMBER_NUM_BOOKS
-------------------- ---------------- ----------------
Chuck                               2                2
Mary                                2                1
Mike                                2                2
Mike                                2                2
Chuck                               2                2
Now, we need a way to first eliminate all rows where AUTHOR_NUM_BOOKS is not equal to MEMBER_NUM_BOOKS, and then return a list of names without duplicates. This can be accomplished by sliding the above SQL statement into an inline view:
SELECT DISTINCT
MEMBERNAME
FROM
(SELECT
BC.MEMBERNAME,
COUNT(DISTINCT B.BOOKTITLE) OVER (PARTITION BY B.AUTHOR)
AUTHOR_NUM_BOOKS,
COUNT(BC.BOOKTITLE) OVER (PARTITION BY BC.MEMBERNAME, B.AUTHOR)
MEMBER_NUM_BOOKS
FROM
BOOKS B,
BOOKCAMP BC
WHERE
B.AUTHOR='J.K Rowling'
AND B.BOOKTITLE=BC.BOOKTITLE(+))
WHERE
AUTHOR_NUM_BOOKS=MEMBER_NUM_BOOKS;
MEMBERNAME
--------------------
Chuck
Mike
Let’s try again, this time without analytic functions. First, let’s find out how many of the author’s books each member has read:
SELECT
BC.MEMBERNAME,
B.AUTHOR,
COUNT(*) MEMBER_NUM_BOOKS
FROM
BOOKS B,
BOOKCAMP BC
WHERE
B.AUTHOR='J.K Rowling'
AND BC.BOOKTITLE=B.BOOKTITLE
GROUP BY
BC.MEMBERNAME,
B.AUTHOR;
MEMBERNAME           AUTHOR               MEMBER_NUM_BOOKS
-------------------- -------------------- ----------------
Mike                 J.K Rowling                         2
Chuck                J.K Rowling                         2
Mary                 J.K Rowling                         1
Now, let’s determine the number of books written by each author:
SELECT
AUTHOR,
COUNT(*) AUTHOR_NUM_BOOKS
FROM
BOOKS
GROUP BY
AUTHOR;
AUTHOR               AUTHOR_NUM_BOOKS
-------------------- ----------------
Rocksteady                          1
Brian                               1
J.K Rowling                         2
Let’s put each into an inline view and pull out the membernames of interest:
SELECT DISTINCT
BC.MEMBERNAME
FROM
(SELECT
BC.MEMBERNAME,
B.AUTHOR,
COUNT(*) MEMBER_NUM_BOOKS
FROM
BOOKS B,
BOOKCAMP BC
WHERE
B.AUTHOR='J.K Rowling'
AND BC.BOOKTITLE=B.BOOKTITLE
GROUP BY
BC.MEMBERNAME,
B.AUTHOR) BC,
(SELECT
AUTHOR,
COUNT(*) AUTHOR_NUM_BOOKS
FROM
BOOKS
GROUP BY
AUTHOR) B
WHERE
B.AUTHOR=BC.AUTHOR
AND B.AUTHOR_NUM_BOOKS=BC.MEMBER_NUM_BOOKS;
MEMBERNAME
--------------------
Chuck
Mike
Let’s try one more time. A simple starting point:
SELECT
BC.MEMBERNAME,
COUNT(BC.BOOKTITLE)
FROM
BOOKS B,
BOOKCAMP BC
WHERE
B.AUTHOR='J.K Rowling'
AND B.BOOKTITLE=BC.BOOKTITLE(+)
GROUP BY
BC.MEMBERNAME;
MEMBERNAME           COUNT(BC.BOOKTITLE)
-------------------- -------------------
Chuck                                  2
Mary                                   1
Mike                                   2
[The above does not need to be an outer join]
Now, let’s add an inline view to retrieve the total number of books written by the author:
SELECT
BC.MEMBERNAME,
COUNT(BC.BOOKTITLE) MEMBER_NUM_BOOKS,
NB.AUTHOR_NUM_BOOKS
FROM
BOOKS B,
BOOKCAMP BC,
(SELECT
COUNT(*) AUTHOR_NUM_BOOKS
FROM
BOOKS
WHERE
AUTHOR='J.K Rowling') NB
WHERE
B.AUTHOR='J.K Rowling'
AND B.BOOKTITLE=BC.BOOKTITLE(+)
GROUP BY
BC.MEMBERNAME,
NB.AUTHOR_NUM_BOOKS;
MEMBERNAME           MEMBER_NUM_BOOKS AUTHOR_NUM_BOOKS
-------------------- ---------------- ----------------
Chuck                               2                2
Mike                                2                2
Mary                                1                2
The final clean up is accomplished with a HAVING clause:
SELECT
BC.MEMBERNAME
FROM
BOOKS B,
BOOKCAMP BC,
(SELECT
COUNT(*) AUTHOR_NUM_BOOKS
FROM
BOOKS
WHERE
AUTHOR='J.K Rowling') NB
WHERE
B.AUTHOR='J.K Rowling'
AND B.BOOKTITLE=BC.BOOKTITLE(+)
GROUP BY
BC.MEMBERNAME,
NB.AUTHOR_NUM_BOOKS
HAVING
COUNT(BC.BOOKTITLE)=NB.AUTHOR_NUM_BOOKS;
MEMBERNAME
--------------------
Chuck
Mike
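
The count-and-compare idea can also be tried outside of Oracle. The following is a minimal sketch, assuming SQLite (through Python's sqlite3 module) as a stand-in, an ANSI inner join in place of the (+) syntax, and a bound :author parameter in place of the literal. It also counts DISTINCT titles per member, which is an added precaution rather than part of the queries above: BOOKCAMP has no primary key, so nothing prevents the same member and title from being inserted twice. The helper name read_all_by_count is hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for the Oracle schema (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Books (BookTitle TEXT PRIMARY KEY, author TEXT);
    CREATE TABLE BookCamp (MemberName TEXT,
                           BookTitle TEXT REFERENCES Books(BookTitle));
    INSERT INTO Books VALUES ('Psycho', 'Brian'), ('Happy Rotter', 'Rocksteady'),
                             ('Goblet', 'J.K Rowling'), ('Prisoner', 'J.K Rowling');
    INSERT INTO BookCamp VALUES ('Bob', 'Psycho'), ('Chuck', 'Goblet'),
                                ('Chuck', 'Prisoner'), ('Mike', 'Psycho'),
                                ('Mike', 'Goblet'), ('Mike', 'Prisoner'),
                                ('Mary', 'Goblet');
""")

# Keep a member when the number of distinct titles read by the author equals
# the number of titles the author has written.
HAVING_SQL = """
    SELECT bc.MemberName
    FROM BookCamp bc
    JOIN Books b ON b.BookTitle = bc.BookTitle
    WHERE b.author = :author
    GROUP BY bc.MemberName
    HAVING COUNT(DISTINCT bc.BookTitle) =
           (SELECT COUNT(*) FROM Books WHERE author = :author)
    ORDER BY bc.MemberName
"""

def read_all_by_count(author):
    """Members whose distinct-title count matches the author's book count."""
    return [row[0] for row in conn.execute(HAVING_SQL, {"author": author})]

print(read_all_by_count('J.K Rowling'))  # ['Chuck', 'Mike']

# A duplicate row for Mary: a plain COUNT(*) would now report 2 for her,
# while COUNT(DISTINCT ...) still reports 1.
conn.execute("INSERT INTO BookCamp VALUES ('Mary', 'Goblet')")
print(read_all_by_count('J.K Rowling'))  # ['Chuck', 'Mike']
```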
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Maxim Demenko then suggested the following:
Just for fun, yet another one:
SELECT MEMBERNAME
FROM (SELECT B.MEMBERNAME,
             CAST(COLLECT(booktitle) AS SYS.dbms_debug_vc2coll) BOOKLIST
      FROM BOOKCAMP B
      GROUP BY MEMBERNAME) M,
     (SELECT AUTHOR,
             CAST(COLLECT(booktitle) AS SYS.dbms_debug_vc2coll) BOOKLIST
      FROM BOOKS B
      GROUP BY AUTHOR) A
WHERE A.BOOKLIST SUBMULTISET OF M.BOOKLIST
AND AUTHOR = 'J.K Rowling'
/
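
The logic of the query above is: gather each member's titles into one collection, gather each author's titles into another, and keep the members whose collection contains the author's. A rough Python analogy, using sets and the subset operator, is sketched below. It is only an illustration of the idea; plain sets ignore duplicates, whereas SUBMULTISET compares multisets, and the function name members_who_read_all is hypothetical.

```python
from collections import defaultdict

books = [('Psycho', 'Brian'), ('Happy Rotter', 'Rocksteady'),
         ('Goblet', 'J.K Rowling'), ('Prisoner', 'J.K Rowling')]
book_camp = [('Bob', 'Psycho'), ('Chuck', 'Goblet'), ('Chuck', 'Prisoner'),
             ('Mike', 'Psycho'), ('Mike', 'Goblet'), ('Mike', 'Prisoner'),
             ('Mary', 'Goblet')]

# Counterpart of COLLECT(booktitle) ... GROUP BY AUTHOR
titles_by_author = defaultdict(set)
for title, author in books:
    titles_by_author[author].add(title)

# Counterpart of COLLECT(booktitle) ... GROUP BY MEMBERNAME
titles_by_member = defaultdict(set)
for member, title in book_camp:
    titles_by_member[member].add(title)

def members_who_read_all(author):
    """Members whose title set contains every title by `author`."""
    if author not in titles_by_author:
        return []  # no author row, so nothing to join against
    wanted = titles_by_author[author]
    return sorted(m for m, read in titles_by_member.items() if wanted <= read)

print(members_who_read_all('J.K Rowling'))  # ['Chuck', 'Mike']
print(members_who_read_all('Brian'))        # ['Bob', 'Mike']
```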
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
I then offered the following:
Let’s see if there is another way – caution, this might be inefficient:
The starting point:
SELECT
B.AUTHOR,
B.BOOKTITLE,
ROW_NUMBER() OVER (PARTITION BY B.AUTHOR ORDER BY B.BOOKTITLE)
BOOK_NUM,
COUNT(B.BOOKTITLE) OVER (PARTITION BY B.AUTHOR) BOOK_COUNT
FROM
BOOKS B
WHERE
AUTHOR='J.K Rowling';
AUTHOR               BOOKTITLE              BOOK_NUM BOOK_COUNT
-------------------- -------------------- ---------- ----------
J.K Rowling          Goblet                        1          2
J.K Rowling          Prisoner                      2          2
Now, let’s put the book list into a comma separated list:
SELECT
SUBSTR(SYS_CONNECT_BY_PATH(BOOKTITLE,','),2) BOOK_LIST
FROM
(SELECT
B.AUTHOR,
B.BOOKTITLE,
ROW_NUMBER() OVER (PARTITION BY B.AUTHOR ORDER BY B.BOOKTITLE) BOOK_NUM,
COUNT(B.BOOKTITLE) OVER (PARTITION BY B.AUTHOR) BOOK_COUNT
FROM
BOOKS B
WHERE
B.AUTHOR='J.K Rowling')
WHERE
BOOK_NUM=BOOK_COUNT
CONNECT BY PRIOR
BOOK_NUM=BOOK_NUM-1
START WITH
BOOK_NUM=1;
BOOK_LIST
---------------
Goblet,Prisoner
We are now halfway done. Prepare to do the same with the BOOKCAMP table:
SELECT
BC.MEMBERNAME,
BC.BOOKTITLE,
ROW_NUMBER() OVER (PARTITION BY BC.MEMBERNAME,B.AUTHOR ORDER BY B.BOOKTITLE) BOOK_NUM,
COUNT(B.BOOKTITLE) OVER (PARTITION BY BC.MEMBERNAME,B.AUTHOR) BOOK_COUNT
FROM
BOOKS B,
BOOKCAMP BC
WHERE
B.AUTHOR='J.K Rowling'
AND B.BOOKTITLE=BC.BOOKTITLE;
MEMBERNAME           BOOKTITLE              BOOK_NUM BOOK_COUNT
-------------------- -------------------- ---------- ----------
Chuck                Goblet                        1          2
Chuck                Prisoner                      2          2
Mary                 Goblet                        1          1
Mike                 Goblet                        1          2
Mike                 Prisoner                      2          2
Generate a comma separated list for each MEMBERNAME:
SELECT
MEMBERNAME,
SUBSTR(SYS_CONNECT_BY_PATH(BOOKTITLE,','),2) BOOK_LIST
FROM
(SELECT
BC.MEMBERNAME,
BC.BOOKTITLE,
ROW_NUMBER() OVER (PARTITION BY BC.MEMBERNAME,B.AUTHOR ORDER BY B.BOOKTITLE) BOOK_NUM,
COUNT(B.BOOKTITLE) OVER (PARTITION BY BC.MEMBERNAME,B.AUTHOR) BOOK_COUNT
FROM
BOOKS B,
BOOKCAMP BC
WHERE
B.AUTHOR='J.K Rowling'
AND B.BOOKTITLE=BC.BOOKTITLE)
WHERE
BOOK_NUM=BOOK_COUNT
CONNECT BY PRIOR
(MEMBERNAME||TO_CHAR(BOOK_NUM))=(MEMBERNAME||TO_CHAR(BOOK_NUM-1))
START WITH
BOOK_NUM=1;
MEMBERNAME           BOOK_LIST
-------------------- ---------------
Chuck                Goblet,Prisoner
Mary                 Goblet
Mike                 Goblet,Prisoner
Now, let’s put it all together to see where the author book list matches the MEMBERNAME book lists:
SELECT
BC.MEMBERNAME
FROM
(SELECT
SUBSTR(SYS_CONNECT_BY_PATH(BOOKTITLE,','),2) BOOK_LIST
FROM
(SELECT
B.AUTHOR,
B.BOOKTITLE,
ROW_NUMBER() OVER (PARTITION BY B.AUTHOR ORDER BY B.BOOKTITLE) BOOK_NUM,
COUNT(B.BOOKTITLE) OVER (PARTITION BY B.AUTHOR) BOOK_COUNT
FROM
BOOKS B
WHERE
B.AUTHOR='J.K Rowling')
WHERE
BOOK_NUM=BOOK_COUNT
CONNECT BY PRIOR
BOOK_NUM=BOOK_NUM-1
START WITH
BOOK_NUM=1) B,
(SELECT
MEMBERNAME,
SUBSTR(SYS_CONNECT_BY_PATH(BOOKTITLE,','),2) BOOK_LIST
FROM
(SELECT
BC.MEMBERNAME,
BC.BOOKTITLE,
ROW_NUMBER() OVER (PARTITION BY BC.MEMBERNAME,B.AUTHOR ORDER BY B.BOOKTITLE) BOOK_NUM,
COUNT(B.BOOKTITLE) OVER (PARTITION BY BC.MEMBERNAME,B.AUTHOR) BOOK_COUNT
FROM
BOOKS B,
BOOKCAMP BC
WHERE
B.AUTHOR='J.K Rowling'
AND B.BOOKTITLE=BC.BOOKTITLE)
WHERE
BOOK_NUM=BOOK_COUNT
CONNECT BY PRIOR
(MEMBERNAME||TO_CHAR(BOOK_NUM))=(MEMBERNAME||TO_CHAR(BOOK_NUM-1))
START WITH
BOOK_NUM=1) BC
WHERE
B.BOOK_LIST=BC.BOOK_LIST;
MEMBERNAME
--------------------
Chuck
Mike
Oddly, the above executes much faster than the CAST(COLLECT(booktitle) AS SYS.dbms_debug_vc2coll) solution. Maybe the dataset size should be increased, and the OP should post the performance results of each method to see how the first two solutions compare with the others. I think that it would be interesting to see if the CAST(COLLECT(booktitle) AS SYS.dbms_debug_vc2coll) method scales better than the other methods.
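
For anyone who wants to try the comparison on a larger dataset, here is a small sketch of a data generator. Everything about it is an assumption made for illustration: the naming scheme ("Author n", "Book a-b", "Member n"), the default sizes, and the rule that each member reads every book by one randomly chosen author plus a random sample of other titles (so that the queries have matches to find). All generated names fit within VARCHAR2(20).

```python
import random

def generate_rows(num_authors=50, books_per_author=5, num_members=200,
                  extra_books=15, seed=42):
    """Build (books, camp) row lists for the Books and BookCamp tables."""
    rng = random.Random(seed)  # fixed seed so every run produces the same data
    books = [(f"Book {a}-{b}", f"Author {a}")
             for a in range(1, num_authors + 1)
             for b in range(1, books_per_author + 1)]
    titles = [title for title, _ in books]
    camp = []
    for m in range(1, num_members + 1):
        # Every member reads all books by one randomly chosen author...
        favorite = f"Author {rng.randint(1, num_authors)}"
        read = {title for title, author in books if author == favorite}
        # ...plus a random sample of other titles (the set removes overlaps).
        read.update(rng.sample(titles, min(extra_books, len(titles))))
        camp.extend((f"Member {m}", title) for title in sorted(read))
    return books, camp

books, camp = generate_rows()
statements = (
    [f"insert into Books values ('{t}', '{a}');" for t, a in books] +
    [f"insert into BookCamp values ('{m}', '{t}');" for m, t in camp]
)
print(len(books), "book rows,", len(camp), "bookcamp rows")
```

The statements list can be written to a script file and run in SQL*Plus. Increasing num_authors and num_members scales the test without changing the shape of the data.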
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Maxim Demenko then suggested the following:
This ties to the problem of set comparisons in SQL, which I believe (I don’t mean multiset operations) can’t be effectively solved in pure SQL.
Yet another approach (borrowed from http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:1…):
SELECT DISTINCT MEMBERNAME
FROM (SELECT BC.*,
             B.*,
             SUM(B_RNK) OVER(PARTITION BY MEMBERNAME, AUTHOR) M_RNK
      FROM BOOKCAMP BC,
           (SELECT B.*, SUM(B_RNK) OVER(PARTITION BY AUTHOR) A_RNK
            FROM (SELECT B.*,
                         POWER(2, DENSE_RANK() OVER(ORDER BY BOOKTITLE) - 1) B_RNK
                  FROM BOOKS B) B) B
      WHERE BC.BOOKTITLE = B.BOOKTITLE)
WHERE AUTHOR = 'J.K Rowling'
AND A_RNK = M_RNK
/
However, it will have its limitations too, and on really big sets (bigger than 1000 members) I think all of the suggested solutions will not perform very well. For medium-sized sets (where the complete result sets fit into the PGA), the best performance I have seen so far for similar tasks comes from the MODEL clause.
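
The query above gives every title its own power of two, so that the sum over a group of titles acts as a fingerprint of that group: two sums are equal only when the groups contain the same titles. A minimal Python sketch of that arithmetic follows. It is an illustration only; the names bit_of, a_rnk, m_rnk, and members_who_read_all are hypothetical, and duplicate (member, title) rows are dropped first because a duplicate would be added twice to the sum.

```python
from collections import defaultdict

books = [('Psycho', 'Brian'), ('Happy Rotter', 'Rocksteady'),
         ('Goblet', 'J.K Rowling'), ('Prisoner', 'J.K Rowling')]
book_camp = [('Bob', 'Psycho'), ('Chuck', 'Goblet'), ('Chuck', 'Prisoner'),
             ('Mike', 'Psycho'), ('Mike', 'Goblet'), ('Mike', 'Prisoner'),
             ('Mary', 'Goblet')]

# POWER(2, DENSE_RANK() OVER (ORDER BY BOOKTITLE) - 1): one bit per title.
# Goblet=1, Happy Rotter=2, Prisoner=4, Psycho=8
bit_of = {title: 2 ** i
          for i, title in enumerate(sorted({t for t, _ in books}))}
author_of = dict(books)

# A_RNK: sum of the bits of every title by the author.
a_rnk = defaultdict(int)
for title, author in books:
    a_rnk[author] += bit_of[title]

# M_RNK: sum of the bits of the titles a member read, per author.
m_rnk = defaultdict(int)
for member, title in set(book_camp):  # set() drops duplicate rows
    m_rnk[(member, author_of[title])] += bit_of[title]

def members_who_read_all(author):
    """Members whose per-author bit sum equals the author's bit sum."""
    return sorted(member for (member, a), total in m_rnk.items()
                  if a == author and total == a_rnk[a])

print(a_rnk['J.K Rowling'])                 # 5
print(members_who_read_all('J.K Rowling'))  # ['Chuck', 'Mike']
```

Python integers grow without limit, so this sketch does not show the ceiling that a fixed-precision numeric type reaches once the number of distinct titles gets large, which is presumably one of the limitations mentioned above.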