Finding Rows with Common Attributes – Roman to Find a Solution in Something New

1 07 2011

July 1, 1011

A recent thread in the comp.databases.oracle.server Usenet group (actually two threads) asked an interesting question.  Assume that you had a detail table that contained several attributes for each of the unique key values.  How would one go about finding all of the unique key values that share the same set of attributes?  The sample set provided by the OP looks like this:

COL1   COL2
----  -----
I         a
I         b
I         c
II        a
II        b
III       a
III       b
III       c 

For the above, assume that the OP was interested in the attributes of “I”: a,b,c.  “II” lacks a “c” attribute, while “III” has the required “a”, “b”, and “c” attributes.  So, the OP would like to return C1 value “III” but not “II”.  I wonder if there is a simple solution for the OP?

First, let’s create our test data.  COL1 appears to contain Roman numbers – if we go beyond the number 3, those could be tricky to generate (unless of course you find the RN format parameter for the TO_CHAR function).  Let’s first create a temporary work table that contains the Roman numbers from 1 to 100 and a random number between 1 and 10:

CREATE TABLE T1_TEMP AS
SELECT
  TRIM(TO_CHAR(ROWNUM,'RN')) C1,
  TRUNC(DBMS_RANDOM.VALUE(1,10)+1) C2,
  ROWNUM C3
FROM
  DUAL
CONNECT BY
  LEVEL<=100; 

Let’s see what is in the T1_TEMP table:

COLUMN C1 FORMAT A10

SELECT
  *
FROM
  T1_TEMP
ORDER BY
  C3;

C1                 C2         C3
---------- ---------- ----------
I                  10          1
II                  4          2
III                 7          3
IV                  9          4
V                   8          5
VI                 10          6
VII                 9          7
VIII                4          8
IX                  4          9
X                  10         10
...
XCV                 5         95
XCVI                4         96
XCVII               8         97
XCVIII              7         98
XCIX               10         99
C                   4        100

100 rows selected. 

The row with the value “I” in column C1 has the number 10 in column C2, but that number might be a bit different in your temporary work table.  Column C2 will determine the number of attributes that are added for each of the values found in column C1 when we create the table T1 (note that we could have defined column C2 with the function CHR(96 + COUNTER) to place lowercase letters in that column, rather than numbers, to help reproduce the OP’s dataset):

CREATE TABLE T1 AS
SELECT
  T1_TEMP.C1,
  V1.COUNTER C2
FROM
  T1_TEMP,
  (SELECT
     ROWNUM COUNTER
   FROM
     DUAL
   CONNECT BY
     LEVEL<=10) V1
WHERE
  T1_TEMP.C2>=V1.COUNTER; 

Let’s see what is in table T1:

SELECT
  *
FROM
  T1
ORDER BY
  C1,
  C2;

C1                 C2
---------- ----------
C                   1
C                   2
C                   3
C                   4
I                   1
I                   2
I                   3
I                   4
I                   5
I                   6
I                   7
I                   8
I                   9
I                  10
...
XXXVII              1
XXXVII              2
XXXVII              3
XXXVIII             1
XXXVIII             2
XXXVIII             3
XXXVIII             4
XXXVIII             5
XXXVIII             6

634 rows selected.  

From the above output, you can see that we now have the number of rows in table T1 for each distinct value of C1 as was specified in table T1_TEMP.  An interesting side-note, the Roman number 100 (C) is less than the Roman number 1 (I) – I guess that explains why computers do not natively use Roman numbers for calculations.  🙂

For the next step, we need to collapse the different C2 values for each of the unique C1 values into a single row.  Oracle Database 11.2.0.1 introduced the LISTAGG function that makes easy work of this task, as shown in this earlier blog article.

COLUMN C2_LISTING FORMAT A22

SELECT
  C1,
  LISTAGG(TO_CHAR(C2), ',') WITHIN GROUP (ORDER BY C2) C2_LISTING
FROM
  T1
GROUP BY
  C1
ORDER BY
  C1;

C1         C2_LISTING
---------- --------------------
C          1,2,3,4
I          1,2,3,4,5,6,7,8,9,10
II         1,2,3,4
III        1,2,3,4,5,6,7
IV         1,2,3,4,5,6,7,8,9
IX         1,2,3,4
...
XXXV       1,2,3,4,5,6,7,8,9
XXXVI      1,2,3,4,5
XXXVII     1,2,3
XXXVIII    1,2,3,4,5,6

100 rows selected. 

The question remains, how can I find all of the unique C1 values that have all of the same attributes as the C1 value “I” – in this case 1,2,3,4,5,6,7,8,9,10?  One method slides the above query into a WITH block and then the WITH block is referenced twice in the main query:

WITH MY_VIEW AS
(SELECT
  C1,
  LISTAGG(TO_CHAR(C2), ',') WITHIN GROUP (ORDER BY C2) C2_LISTING
FROM
  T1
GROUP BY
  C1)
SELECT
  V2.C1,
  V2.C2_LISTING
FROM
  MY_VIEW V1,
  MY_VIEW V2
WHERE
  V1.C1='I'
  AND V1.C1<>V2.C1
  AND V1.C2_LISTING=V2.C2_LISTING
ORDER BY
  V2.C1;

C1         C2_LISTING
---------- --------------------
LVII       1,2,3,4,5,6,7,8,9,10
LXXI       1,2,3,4,5,6,7,8,9,10
LXXIII     1,2,3,4,5,6,7,8,9,10
VI         1,2,3,4,5,6,7,8,9,10
X          1,2,3,4,5,6,7,8,9,10
XCIX       1,2,3,4,5,6,7,8,9,10
XV         1,2,3,4,5,6,7,8,9,10
XXIX       1,2,3,4,5,6,7,8,9,10
XXXI       1,2,3,4,5,6,7,8,9,10

9 rows selected. 

How else might you solve the problem posted by the OP in the Usenet thread?