SQL – Methods of Reformatting into Equivalent Forms 2

2 12 2009

December 2, 2009

(Back to the Previous Post in the Series) (Forward to the Next Post in the Series)

A couple of years ago someone asked in the comp.databases.oracle.misc Usenet group the following question:
http://groups.google.com/group/comp.databases.oracle.misc/browse_thread/thread/576ea61b1a93469b

I need to write a query that returns only the students that have read all books by an author.  I have these tables set up so far….

create table Books (
        BookTitle varchar2(20) PRIMARY KEY,
        author varchar2(20)
);

create table BookCamp (
        MemberName varchar2(20),
        BookTitle varchar2(20),
        CONSTRAINT fk_BookTitle
          FOREIGN KEY (BookTitle)
          REFERENCES Books(BookTitle)
);

insert into Books values ('Psycho', 'Brian');
insert into Books values ('Happy Rotter', 'Rocksteady');
insert into Books values ('Goblet', 'J.K Rowling');
insert into Books values ('Prisoner', 'J.K Rowling');
insert into BookCamp values ('Bob', 'Psycho');
insert into BookCamp values ('Chuck', 'Goblet');
insert into BookCamp values ('Chuck', 'Prisoner');
insert into BookCamp values ('Mike', 'Psycho');
insert into BookCamp values ('Mike', 'Goblet');
insert into BookCamp values ('Mike', 'Prisoner');
insert into BookCamp values ('Mary', 'Goblet');

So basically, if I inputted “J.K Rowling” the names “Chuck” and “Mike” should come up.  If the author is “Brian” then the names “Bob” and “Mike” should come up.  I’ve tried several things like… select membername from BookCamp where BookTitle in(select BookTitle from Books where (author = ‘J.K Rowling’)); but this obviously isn’t quite there….Any Help?

This message thread, like several others, generated suggestions from several people.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Maxim Demenko suggested the following:

SQL> SELECT DISTINCT membername
   2  FROM bookcamp bc
   3  WHERE NOT EXISTS (
   4    SELECT NULL
   5    FROM bookcamp bc1
   6    PARTITION BY (membername)
   7    RIGHT OUTER JOIN books b
   8    ON (bc1.booktitle=b.booktitle)
   9    WHERE b.author='J.K Rowling'
  10    AND bc.membername=bc1.membername AND bc1.booktitle IS NULL)
  11  /

MEMBERNAME
--------------------
Chuck
Mike

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

I then offered the following:

Maxim has provided a solution that produces the desired list.  Let’s see if we can develop another method to solve this problem.  First, a simple experiment using the analytical version of COUNT:

SELECT
  B.BOOKTITLE,
  B.AUTHOR,
  COUNT(B.BOOKTITLE) OVER (PARTITION BY B.AUTHOR) NUM_BOOKS
FROM
  BOOKS B
WHERE
  B.AUTHOR='J.K Rowling';

BOOKTITLE            AUTHOR                NUM_BOOKS
-------------------- -------------------- ----------
Prisoner             J.K Rowling                   2
Goblet               J.K Rowling                   2

Not too impressive yet, but let’s add in the second table:

SELECT
  BC.MEMBERNAME,
  B.BOOKTITLE,
  B.AUTHOR,
  COUNT(DISTINCT B.BOOKTITLE) OVER (PARTITION BY B.AUTHOR) AUTHOR_NUM_BOOKS,
  COUNT(BC.BOOKTITLE) OVER (PARTITION BY BC.MEMBERNAME, B.AUTHOR) MEMBER_NUM_BOOKS
FROM
  BOOKS B,
  BOOKCAMP BC
WHERE
  B.AUTHOR='J.K Rowling'
  AND B.BOOKTITLE=BC.BOOKTITLE(+);

Note that I changed the original COUNT(B.BOOKTITLE) to COUNT(DISTINCT B.BOOKTITLE) and changed the alias to AUTHOR_NUM_BOOKS:

MEMBERNAME           AUTHOR_NUM_BOOKS MEMBER_NUM_BOOKS
-------------------- ---------------- ----------------
Chuck                               2                2
Mary                                2                1
Mike                                2                2
Mike                                2                2
Chuck                               2                2

Now, we need a way to first eliminate all rows where AUTHOR_NUM_BOOKS is not equal to MEMBER_NUM_BOOKS, and then return a list of names without duplicates.  This can be accomplished by sliding the above SQL statement into an inline view:

SELECT DISTINCT
  MEMBERNAME
FROM
  (SELECT
    BC.MEMBERNAME,
    COUNT(DISTINCT B.BOOKTITLE) OVER (PARTITION BY B.AUTHOR)
AUTHOR_NUM_BOOKS,
    COUNT(BC.BOOKTITLE) OVER (PARTITION BY BC.MEMBERNAME, B.AUTHOR)
MEMBER_NUM_BOOKS
  FROM
    BOOKS B,
    BOOKCAMP BC
  WHERE
    B.AUTHOR='J.K Rowling'
    AND B.BOOKTITLE=BC.BOOKTITLE(+))
WHERE
  AUTHOR_NUM_BOOKS=MEMBER_NUM_BOOKS;

MEMBERNAME
--------------------
Chuck
Mike

Let’s try again, this time without analytical functions.  First, let’s find out how many of the author’s books were read by each membername:

SELECT
  BC.MEMBERNAME,
  B.AUTHOR,
  COUNT(*) MEMBER_NUM_BOOKS
FROM
  BOOKS B,
  BOOKCAMP BC
WHERE
  B.AUTHOR='J.K Rowling'
  AND BC.BOOKTITLE=B.BOOKTITLE
GROUP BY
  BC.MEMBERNAME,
  B.AUTHOR;

MEMBERNAME           AUTHOR               MEMBER_NUM_BOOKS
-------------------- -------------------- ----------------
Mike                 J.K Rowling                         2
Chuck                J.K Rowling                         2
Mary                 J.K Rowling                         1

Now, let’s determine the number of books written by each author:

SELECT
  AUTHOR,
  COUNT(*) AUTHOR_NUM_BOOKS
FROM
  BOOKS
GROUP BY
  AUTHOR;

AUTHOR               AUTHOR_NUM_BOOKS
-------------------- ----------------
Rocksteady                          1
Brian                               1
J.K Rowling                         2

Let’s put each into an inline view and pull out the membernames of interest:

SELECT DISTINCT
  BC.MEMBERNAME
FROM
  (SELECT
    BC.MEMBERNAME,
    B.AUTHOR,
    COUNT(*) MEMBER_NUM_BOOKS
  FROM
    BOOKS B,
    BOOKCAMP BC
  WHERE
    B.AUTHOR='J.K Rowling'
    AND BC.BOOKTITLE=B.BOOKTITLE
  GROUP BY
    BC.MEMBERNAME,
    B.AUTHOR) BC,
  (SELECT
    AUTHOR,
    COUNT(*) AUTHOR_NUM_BOOKS
  FROM
    BOOKS
  GROUP BY
    AUTHOR) B
WHERE
  B.AUTHOR=BC.AUTHOR
  AND B.AUTHOR_NUM_BOOKS=BC.MEMBER_NUM_BOOKS;

MEMBERNAME
--------------------
Chuck
Mike

Let’s try one more time, A simple starting point:

SELECT
  BC.MEMBERNAME,
  COUNT(BC.BOOKTITLE)
FROM
  BOOKS B,
  BOOKCAMP BC
WHERE
  B.AUTHOR='J.K Rowling'
  AND B.BOOKTITLE=BC.BOOKTITLE(+)
GROUP BY
  BC.MEMBERNAME;

MEMBERNAME           COUNT(BC.BOOKTITLE)
-------------------- -------------------
Chuck                                  2
Mary                                   1
Mike                                   2

[The above does not need to be an outer join]
Now, let’s add an inline view to retrieve the total number of books written by the author:

SELECT
  BC.MEMBERNAME,
  COUNT(BC.BOOKTITLE) MEMBER_NUM_BOOKS,
  NB.AUTHOR_NUM_BOOKS
FROM
  BOOKS B,
  BOOKCAMP BC,
  (SELECT
    COUNT(*) AUTHOR_NUM_BOOKS
  FROM
    BOOKS
  WHERE
    AUTHOR='J.K Rowling') NB
WHERE
  B.AUTHOR='J.K Rowling'
  AND B.BOOKTITLE=BC.BOOKTITLE(+)
GROUP BY
  BC.MEMBERNAME,
  NB.AUTHOR_NUM_BOOKS;

MEMBERNAME           MEMBER_NUM_BOOKS AUTHOR_NUM_BOOKS
-------------------- ---------------- ----------------
Chuck                               2                2
Mike                                2                2
Mary                                1                2

The final clean up is accomplished with a HAVING clause:

 SELECT
  BC.MEMBERNAME
FROM
  BOOKS B,
  BOOKCAMP BC,
  (SELECT
    COUNT(*) AUTHOR_NUM_BOOKS
  FROM
    BOOKS
  WHERE
    AUTHOR='J.K Rowling') NB
WHERE
  B.AUTHOR='J.K Rowling'
  AND B.BOOKTITLE=BC.BOOKTITLE(+)
GROUP BY
  BC.MEMBERNAME,
  NB.AUTHOR_NUM_BOOKS
HAVING
  COUNT(BC.BOOKTITLE)=NB.AUTHOR_NUM_BOOKS;

MEMBERNAME
--------------------
Chuck
Mike

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Maxim Demenko then suggested the following:

Just for fun, yet another one:

SELECT MEMBERNAME
FROM   (SELECT B.MEMBERNAME,CAST(COLLECT(booktitle) AS
SYS.dbms_debug_vc2coll) BOOKLIST
         FROM   BOOKCAMP B
         GROUP  BY MEMBERNAME) M,
        (SELECT AUTHOR,CAST(COLLECT(booktitle) AS
SYS.dbms_debug_vc2coll) BOOKLIST
         FROM   BOOKS B
         GROUP  BY AUTHOR) A
WHERE  A.BOOKLIST SUBMULTISET OF M.BOOKLIST
AND    AUTHOR = 'J.K Rowling'
/

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

I then offered the following:

Let’s see if there is another way – caution, this might be inefficient:
The starting point:

SELECT
  B.AUTHOR,
  B.BOOKTITLE,
  ROW_NUMBER() OVER (PARTITION BY B.AUTHOR ORDER BY B.BOOKTITLE)
BOOK_NUM,
  COUNT(B.BOOKTITLE) OVER (PARTITION BY B.AUTHOR) BOOK_COUNT
FROM
  BOOKS B
WHERE
  AUTHOR='J.K Rowling';

AUTHOR               BOOKTITLE              BOOK_NUM BOOK_COUNT
-------------------- -------------------- ---------- ----------
J.K Rowling          Goblet                        1          2
J.K Rowling          Prisoner                      2          2

Now, let’s put the book list into a comma separated list:

SELECT
  SUBSTR(SYS_CONNECT_BY_PATH(BOOKTITLE,','),2) BOOK_LIST
FROM
  (SELECT
    B.AUTHOR,
    B.BOOKTITLE,
    ROW_NUMBER() OVER (PARTITION BY B.AUTHOR ORDER BY B.BOOKTITLE) BOOK_NUM,
    COUNT(B.BOOKTITLE) OVER (PARTITION BY B.AUTHOR) BOOK_COUNT
  FROM
    BOOKS B
  WHERE
    B.AUTHOR='J.K Rowling')
WHERE
  BOOK_NUM=BOOK_COUNT
CONNECT BY PRIOR
  BOOK_NUM=BOOK_NUM-1
START WITH
  BOOK_NUM=1;

BOOK_LIST
---------------
Goblet,Prisoner

We are now half way done.  Prepare to do the same with the BOOKCAMP
table:

SELECT
  BC.MEMBERNAME,
  BC.BOOKTITLE,
  ROW_NUMBER() OVER (PARTITION BY BC.MEMBERNAME,B.AUTHOR ORDER BY B.BOOKTITLE) BOOK_NUM,
  COUNT(B.BOOKTITLE) OVER (PARTITION BY BC.MEMBERNAME,B.AUTHOR) BOOK_COUNT
FROM
  BOOKS B,
  BOOKCAMP BC
WHERE
  B.AUTHOR='J.K Rowling'
  AND B.BOOKTITLE=BC.BOOKTITLE;

MEMBERNAME           BOOKTITLE              BOOK_NUM BOOK_COUNT
-------------------- -------------------- ---------- ----------
Chuck                Goblet                        1          2
Chuck                Prisoner                      2          2
Mary                 Goblet                        1          1
Mike                 Goblet                        1          2
Mike                 Prisoner                      2          2

Generate a comma separated list for each MEMBERNAME:

SELECT
  MEMBERNAME,
  SUBSTR(SYS_CONNECT_BY_PATH(BOOKTITLE,','),2) BOOK_LIST
FROM
  (SELECT
    BC.MEMBERNAME,
    BC.BOOKTITLE,
    ROW_NUMBER() OVER (PARTITION BY BC.MEMBERNAME,B.AUTHOR ORDER BY B.BOOKTITLE) BOOK_NUM,
    COUNT(B.BOOKTITLE) OVER (PARTITION BY BC.MEMBERNAME,B.AUTHOR) BOOK_COUNT
  FROM
    BOOKS B,
    BOOKCAMP BC
  WHERE
    B.AUTHOR='J.K Rowling'
    AND B.BOOKTITLE=BC.BOOKTITLE)
WHERE
  BOOK_NUM=BOOK_COUNT
CONNECT BY PRIOR
  (MEMBERNAME||TO_CHAR(BOOK_NUM))=(MEMBERNAME||TO_CHAR(BOOK_NUM-1))
START WITH
  BOOK_NUM=1;

MEMBERNAME  BOOK_LIST
--------------------
Chuck       Goblet,Prisoner
Mary        Goblet
Mike        Goblet,Prisoner

Now, let’s put it all together to see where the author book list matches the MEMBERNAME book lists:

SELECT
  BC.MEMBERNAME
FROM
(SELECT
  SUBSTR(SYS_CONNECT_BY_PATH(BOOKTITLE,','),2) BOOK_LIST
FROM
  (SELECT
    B.AUTHOR,
    B.BOOKTITLE,
    ROW_NUMBER() OVER (PARTITION BY B.AUTHOR ORDER BY B.BOOKTITLE) BOOK_NUM,
    COUNT(B.BOOKTITLE) OVER (PARTITION BY B.AUTHOR) BOOK_COUNT
  FROM
    BOOKS B
  WHERE
    B.AUTHOR='J.K Rowling')
WHERE
  BOOK_NUM=BOOK_COUNT
CONNECT BY PRIOR
  BOOK_NUM=BOOK_NUM-1
START WITH
  BOOK_NUM=1) B,
(SELECT
  MEMBERNAME,
  SUBSTR(SYS_CONNECT_BY_PATH(BOOKTITLE,','),2) BOOK_LIST
FROM
  (SELECT
    BC.MEMBERNAME,
    BC.BOOKTITLE,
    ROW_NUMBER() OVER (PARTITION BY BC.MEMBERNAME,B.AUTHOR ORDER BY B.BOOKTITLE) BOOK_NUM,
    COUNT(B.BOOKTITLE) OVER (PARTITION BY BC.MEMBERNAME,B.AUTHOR) BOOK_COUNT
  FROM
    BOOKS B,
    BOOKCAMP BC
  WHERE
    B.AUTHOR='J.K Rowling'
    AND B.BOOKTITLE=BC.BOOKTITLE)
WHERE
  BOOK_NUM=BOOK_COUNT
CONNECT BY PRIOR
  (MEMBERNAME||TO_CHAR(BOOK_NUM))=(MEMBERNAME||TO_CHAR(BOOK_NUM-1))
START WITH
  BOOK_NUM=1) BC
WHERE
  B.BOOK_LIST=BC.BOOK_LIST;

MEMBERNAME
--------------------
Chuck
Mike

Oddly, the above executes much faster than the CAST(COLLECT(booktitle) AS SYS.dbms_debug_vc2coll) solution.  Maybe the dataset size should be increased, and the OP should post the performance results of each method to see how the first two solutions compare with the others.  I think that it would be interesting to see if the CAST(COLLECT(booktitle) AS SYS.dbms_debug_vc2coll) method scales better than the other methods.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Maxim Demenko then suggested the following:

This ties to the problem of set comparisons in sql, which i believe ( i don’t mean multiset operations) can’t be effectively solved in pure sql.
Yet one approach (borrowed from
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:1…)

SELECT DISTINCT MEMBERNAME
FROM   (SELECT BC.*,
                B.*,
                SUM(B_RNK) OVER(PARTITION BY MEMBERNAME, AUTHOR) M_RNK
         FROM   BOOKCAMP BC,
                (SELECT B.*, SUM(B_RNK) OVER(PARTITION BY AUTHOR) A_RNK
                 FROM   (SELECT B.*,
                                POWER(2,
                                      DENSE_RANK() OVER(ORDER BY BOOKTITLE) - 1) B_RNK
                         FROM   BOOKS B) B) B
         WHERE  BC.BOOKTITLE = B.BOOKTITLE)
WHERE  AUTHOR = 'J.K Rowling'
AND    A_RNK = M_RNK
/

however, it’ll have its limitations too ( and on really big sets – bigger than 1000 members) – i think, all suggested solutions will not perform very well. For middle sized sets ( where the complete resultsets will fit into pga) – the best performance i saw until now ( for similar tasks) – has the model clause.


Actions

Information

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s




Follow

Get every new post delivered to your Inbox.

Join 147 other followers

%d bloggers like this: