December 26, 2009
A couple years ago the following question appeared on the comp.databases.oracle.misc Usenet group: http://groups.google.com/group/comp.databases.oracle.misc/browse_thread/thread/93bf8d1e75033d4c
I have a book table and in that table it has the book title, publisher, and type of book it is. example mystery, scifi, etc…
I am trying to write a query that brings back a list of every pair of books that have the same publisher and same book type. I have been able to get the following code to work:
select publisher_code, type
from book
group by publisher_code, type
having count(*) > 1;

which returns the following results:

PU TYP
-- ---
JP MYS
LB FIC
PE FIC
PL FIC
ST SFI
VB FIC

I can not figure out how to get the book title and book code for the books that this result list represents, everything I have tried throws out an error.
My initial response follows:
I see two possible methods:
- Slide the SQL statement that you have written into an inline view, join the inline view to your book table, and then use the publisher_code, type columns to drive back into your book table. The join syntax may look like one of the following: (publisher_code, type) IN (SELECT…) or b.publisher_code=ib.publisher_code and b.type=ib.type
- Use analytical functions (COUNT() OVER…) to determine the number of matches for the same publisher_code, type columns. Then slide this SQL statement into an inline view to retrieve only those records with the aliased COUNT() OVER greater than 1. This has the benefit of retrieving the matching rows in a single pass.
—
The original poster then attempted to create a query to meet the requirements, but the query generated an error:
SQL> select title
  2  from book
  3  where publisher_code, type in
  4  (select publisher_code, type
  5  from book
  6  group by publisher_code, type
  7  having count(*) > 1);
where publisher_code, type in
                    *
ERROR at line 3:
ORA-00920: invalid relational operator

My response continues:
Very close to what you need. However, Oracle expects the column names to be wrapped in () … like this: where (publisher_code, type) in
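With that correction applied, a complete version of the original poster's query might look like the sketch below. The BOOK_CODE column name is an assumption (the question mentions a "book code" but never names the column); TITLE, PUBLISHER_CODE, and TYPE are taken from the queries in the thread.

```sql
-- Sketch only: BOOK_CODE is an assumed column name, not confirmed by the thread.
SELECT
  book_code,
  title,
  publisher_code,
  type
FROM
  book
WHERE
  (publisher_code, type) IN
    (SELECT
       publisher_code,
       type
     FROM
       book
     GROUP BY
       publisher_code,
       type
     HAVING
       COUNT(*) > 1)
ORDER BY
  publisher_code,
  type,
  title;
```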
The above uses a subquery, which may perform slowly on some Oracle releases compared to the use of an inline view. Assume that I have a table named PART, which has columns ID, DESCRIPTION, PRODUCT_CODE, and COMMODITY_CODE, with ID as the primary key. I want to find ID, DESCRIPTION, and COMMODITY_CODE for all parts with the same DESCRIPTION and PRODUCT_CODE, where there are at least 3 matching parts in the group:
The starting point, which looks similar to your initial query:
SELECT
  DESCRIPTION,
  PRODUCT_CODE,
  COUNT(*) NUM_MATCHES
FROM
  PART
GROUP BY
  DESCRIPTION,
  PRODUCT_CODE
HAVING
  COUNT(*)>=3;
When the original query is slid into an inline view and joined to the original table, it looks like this:
SELECT
  P.ID,
  P.DESCRIPTION,
  P.COMMODITY_CODE
FROM
  (SELECT
    DESCRIPTION,
    PRODUCT_CODE,
    COUNT(*) NUM_MATCHES
  FROM
    PART
  GROUP BY
    DESCRIPTION,
    PRODUCT_CODE
  HAVING
    COUNT(*)>=3) IP,
  PART P
WHERE
  IP.DESCRIPTION=P.DESCRIPTION
  AND IP.PRODUCT_CODE=P.PRODUCT_CODE;
Here is the DBMS_XPLAN:
-------------------------------------------------------------------------------------------------------------------
| Id  | Operation             | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------
|*  1 |  HASH JOIN            |      |      1 |   1768 |  11525 |00:00:00.21 |    2748 |  1048K|  1048K| 1293K (0)|
|   2 |   VIEW                |      |      1 |   1768 |   1156 |00:00:00.11 |    1319 |       |       |          |
|*  3 |    FILTER             |      |      1 |        |   1156 |00:00:00.11 |    1319 |       |       |          |
|   4 |     HASH GROUP BY     |      |      1 |   1768 |  23276 |00:00:00.08 |    1319 |       |       |          |
|   5 |      TABLE ACCESS FULL| PART |      1 |  35344 |  35344 |00:00:00.04 |    1319 |       |       |          |
|   6 |   TABLE ACCESS FULL   | PART |      1 |  35344 |  35344 |00:00:00.04 |    1429 |       |       |          |
-------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("IP"."DESCRIPTION"="P"."DESCRIPTION" AND "IP"."PRODUCT_CODE"="P"."PRODUCT_CODE")
   3 - filter(COUNT(*)>=3)
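The Starts, A-Rows, A-Time, and Buffers columns in a plan like this one are runtime statistics. DBMS_XPLAN.DISPLAY_CURSOR can report them when the statement is executed with the GATHER_PLAN_STATISTICS hint (or with STATISTICS_LEVEL set to ALL). The exact commands used to capture the plans in this article are not shown, so the following is only a sketch of one common way to produce similar output in SQL*Plus on Oracle 10.2 or later:

```sql
-- Assumed reproduction steps; the article does not show how the plans were captured.
-- SERVEROUTPUT is turned off so that the most recent cursor is the test query itself.
SET SERVEROUTPUT OFF

-- Run the statement with runtime statistics collection enabled.
SELECT /*+ GATHER_PLAN_STATISTICS */
  P.ID,
  P.DESCRIPTION,
  P.COMMODITY_CODE
FROM
  (SELECT
    DESCRIPTION,
    PRODUCT_CODE,
    COUNT(*) NUM_MATCHES
  FROM
    PART
  GROUP BY
    DESCRIPTION,
    PRODUCT_CODE
  HAVING
    COUNT(*)>=3) IP,
  PART P
WHERE
  IP.DESCRIPTION=P.DESCRIPTION
  AND IP.PRODUCT_CODE=P.PRODUCT_CODE;

-- Display the plan and actual statistics for the last statement run in this session.
SELECT
  *
FROM
  TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'ALLSTATS LAST'));
```

Comparing E-Rows (the optimizer's estimate) with A-Rows (the actual row count) in the resulting output is a quick way to spot cardinality misestimates.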
The query format using the subquery looks like this:
SELECT
  P.ID,
  P.DESCRIPTION,
  P.COMMODITY_CODE
FROM
  PART P
WHERE
  (DESCRIPTION,PRODUCT_CODE) IN
    (SELECT
      DESCRIPTION,
      PRODUCT_CODE
    FROM
      PART
    GROUP BY
      DESCRIPTION,
      PRODUCT_CODE
    HAVING
      COUNT(*)>=3);

Here is the DBMS_XPLAN output; note that Oracle 10.2.0.2 transformed the query above:
-----------------------------------------------------------------------------------------------------------------------
| Id  | Operation             | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------
|*  1 |  HASH JOIN RIGHT SEMI |          |      1 |      1 |  11525 |00:00:00.21 |    2748 |  1048K|  1048K| 1214K (0)|
|   2 |   VIEW                | VW_NSO_1 |      1 |   1768 |   1156 |00:00:00.12 |    1319 |       |       |          |
|*  3 |    FILTER             |          |      1 |        |   1156 |00:00:00.12 |    1319 |       |       |          |
|   4 |     HASH GROUP BY     |          |      1 |   1768 |  23276 |00:00:00.09 |    1319 |       |       |          |
|   5 |      TABLE ACCESS FULL| PART     |      1 |  35344 |  35344 |00:00:00.04 |    1319 |       |       |          |
|   6 |   TABLE ACCESS FULL   | PART     |      1 |  35344 |  35344 |00:00:00.01 |    1429 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("DESCRIPTION"="$nso_col_1" AND "PRODUCT_CODE"="$nso_col_2")
   3 - filter(COUNT(*)>=3)
Without allowing the automatic transformations in Oracle 10.2.0.2, the query takes _much_ longer than 0.21 seconds to complete.
The method using analytical functions starts like this:
SELECT
  P.ID,
  P.DESCRIPTION,
  P.COMMODITY_CODE,
  COUNT(*) OVER (PARTITION BY DESCRIPTION, PRODUCT_CODE) NUM_MATCHES
FROM
  PART P;
Then, sliding the above into an inline view:
SELECT
  ID,
  DESCRIPTION,
  COMMODITY_CODE
FROM
  (SELECT
    P.ID,
    P.DESCRIPTION,
    P.COMMODITY_CODE,
    COUNT(*) OVER (PARTITION BY DESCRIPTION, PRODUCT_CODE) NUM_MATCHES
  FROM
    PART P)
WHERE
  NUM_MATCHES>=3;
The DBMS_XPLAN for the above looks like this:
-----------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------
|*  1 |  VIEW               |      |      1 |  35344 |  11525 |00:00:00.31 |    1319 |       |       |          |
|   2 |   WINDOW SORT       |      |      1 |  35344 |  35344 |00:00:00.27 |    1319 |  2533K|   726K| 2251K (0)|
|   3 |    TABLE ACCESS FULL| PART |      1 |  35344 |  35344 |00:00:00.04 |    1319 |       |       |          |
-----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("NUM_MATCHES">=3)

Note that there is only one TABLE ACCESS FULL of the PART table in the above. The query required 0.31 seconds to complete, which is longer than the first two approaches, but that is because the database server was still concurrently working on the subquery version with no transformations permitted (5+ minutes after it started).
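Applied to the original poster's BOOK table, the analytic function approach might be sketched as follows. BOOK_CODE is an assumed column name (the question mentions a "book code" without naming the column), and NUM_MATCHES is simply the alias carried over from the PART example:

```sql
-- Sketch only: BOOK_CODE is an assumed column name, not confirmed by the thread.
SELECT
  book_code,
  title,
  publisher_code,
  type
FROM
  (SELECT
    b.book_code,
    b.title,
    b.publisher_code,
    b.type,
    -- Number of books sharing this row's publisher and type.
    COUNT(*) OVER (PARTITION BY b.publisher_code, b.type) num_matches
  FROM
    book b)
WHERE
  num_matches > 1
ORDER BY
  publisher_code,
  type,
  title;
```

The filter on NUM_MATCHES has to sit in the outer query because analytic functions are evaluated after the WHERE clause of the query block in which they appear.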
Subquery method with no transformations permitted:
---------------------------------------------------------------------------------------
| Id  | Operation            | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------
|*  1 |  FILTER              |      |      1 |        |  11525 |00:46:21.46 |      38M|
|   2 |   TABLE ACCESS FULL  | PART |      1 |  35344 |  35344 |00:00:00.25 |    1429 |
|*  3 |   FILTER             |      |  29474 |        |   6143 |00:46:06.52 |      38M|
|   4 |    HASH GROUP BY     |      |  29474 |      1 |    613M|00:33:24.30 |      38M|
|   5 |     TABLE ACCESS FULL| PART |  29474 |  35344 |   1041M|00:00:02.54 |      38M|
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter( IS NOT NULL)
   3 - filter(("DESCRIPTION"=:B1 AND "PRODUCT_CODE"=:B2 AND COUNT(*)>=3))
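The article does not say how the automatic transformations were disabled for this test. One way a similar plan might be reproduced, offered here purely as an assumption, is a NO_UNNEST hint inside the subquery, which asks the optimizer not to convert the subquery into a join against an inline view:

```sql
-- Assumption: NO_UNNEST is one possible way to prevent the subquery transformation;
-- the article does not state which method was actually used.
SELECT /*+ GATHER_PLAN_STATISTICS */
  P.ID,
  P.DESCRIPTION,
  P.COMMODITY_CODE
FROM
  PART P
WHERE
  (DESCRIPTION,PRODUCT_CODE) IN
    (SELECT /*+ NO_UNNEST */
      DESCRIPTION,
      PRODUCT_CODE
    FROM
      PART
    GROUP BY
      DESCRIPTION,
      PRODUCT_CODE
    HAVING
      COUNT(*)>=3);
```

A NO_QUERY_TRANSFORMATION hint in the outer query block is another candidate. Whether either hint yields exactly the plan shown above depends on the Oracle release and optimizer settings.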
Maxim Demenko provided another possible solution for the problem experienced by the original poster.