The Unique Result Oracle Database Coding Challenge

28 07 2011

July 28, 2011

I must say that I am impressed with the number of unique solutions that were developed for the previous coding challenge (FizzBuzz). While not all solutions were extremely efficient (a couple were intentionally designed to be as inefficient as possible), the various techniques provide views of different approaches to solving a problem that was not well defined (even though at first glance it appeared to be well defined). While not all of the solutions presented are optimal for the FizzBuzz challenge, derivatives of those solutions might be perfect for real-world problems (for instance, side-tracking a pesky DBA or developer with a performance challenge).

Time for another coding challenge. This challenge is adapted from one that was posed in a Usenet thread several years ago. Remember that if a specification is not well defined, feel free to interpret the specification – one of the goals of this blog article is to see a variety of solutions, but a secondary goal is to determine what might happen when weak specifications are provided in a request for a solution.

Assume that there are two tables, T1 and T2, as designed and populated below. The two tables contain the results from experiments performed by two different analysts.

DROP TABLE T1 PURGE;
DROP TABLE T2 PURGE;

CREATE TABLE T1 (
  C1 VARCHAR2(10),
  C2 VARCHAR2(10),
  C3 VARCHAR2(10),
  C4 VARCHAR2(10));

CREATE TABLE T2 (
  C1 VARCHAR2(10),
  C2 VARCHAR2(10),
  C3 VARCHAR2(10),
  C4 VARCHAR2(10));

INSERT INTO T1 VALUES ('TEST1A','TEST1B','TEST1C','TEST1D');
INSERT INTO T1 VALUES ('TEST2A','TEST2B','TEST2C','TEST2D');
INSERT INTO T1 VALUES ('TEST3A','TEST3B','TEST3C','TEST3D');
INSERT INTO T1 VALUES ('TEST4A','TEST4B','TEST4C','TEST4D');
INSERT INTO T1 VALUES ('TEST5A','TEST5B','TEST5C','TEST5D');

INSERT INTO T2 VALUES ('TEST1A','TEST1B','TEST1C','TEST1D');
INSERT INTO T2 VALUES ('TEST2A','TEST1B','TEST1C','TEST2D');
INSERT INTO T2 VALUES ('TEST3A','TEST3B','TEST3C','TEST3D');
INSERT INTO T2 VALUES ('TEST5A','TEST5B','TEST5C','TEST5D');
INSERT INTO T2 VALUES ('TEST6A','TEST5B','TEST5C','TEST6D');

This is a three part challenge.

Part 1: Return all rows from table T1 where the C2 value is not found in table T2’s column C2 values, and return all rows from table T2 where the C2 value is not found in table T1’s column C2 values. In short, we want the rows that have unique results found in column C2 so that we may identify the differences found by the two analysts.

Part 2: Using a single select statement, return all rows found in table T1 that are not duplicated in table T2, and all rows found in table T2 that are not duplicated in table T1. The resulting output must identify the table from which the row was found.

Part 3: A third analyst performed experiments and entered his results in table T3, as designed and populated below. Using a single SQL statement, retrieve the rows from tables T1, T2, and T3 that are not duplicated in the other tables – find the unique rows from the three tables. Each row should identify its source table (T1, T2, or T3).

DROP TABLE T3 PURGE;

CREATE TABLE T3 (
  C1 VARCHAR2(10),
  C2 VARCHAR2(10),
  C3 VARCHAR2(10),
  C4 VARCHAR2(10));

INSERT INTO T3 VALUES ('TEST1A','TEST1B','TEST1C','TEST1D');
INSERT INTO T3 VALUES ('TEST2A','TEST1B','TEST1C','TEST2D');
INSERT INTO T3 VALUES ('TEST6A','TEST6B','TEST6C','TEST6D');

When you post your solutions, please identify Part 1, Part 2, or Part 3 next to the solution.

General tips for comments:

Include your code portion of the solution in a monospaced font (Courier) to retain the alignment spaces in the solution. To do this, use a <pre> tag just before the code portion of the solution and a </pre> tag just after the solution ([s ourcecode] and [/s ourcecode] tags (without the space between the first two letters) should have the same result, just with a smaller font size and a non-scrolling code area).
Less than (<) and greater than (>) signs have a special meaning in HTML web pages. As such, specify < for a less than sign and > for a greater than sign in code sections to avoid having portions of your code sections magically disappear.

Think about how you would build the solutions before scrolling down to the comments section.

–

Part 2:

SELECT  NVL(T1.C1,T2.C1) C1, NVL(T1.C2,T2.C2) C2, NVL(T1.C3,T2.C3) C3, NVL(T1.C4,T2.C4) C4, NVL2(T1.C1,'T1','T2') FROM_TABLE FROM T1 FULL OUTER JOIN T2 ON T1.C1=T2.C1 AND T1.C2=T2.C2 AND T1.C3=T2.C3 AND T1.C4=T2.C4 WHERE T1.C1 IS NULL OR T2.C1 IS NULL;

–

Actions

Information

Date : July 28, 2011
Categories : SQL

13 responses

28 07 2011

Raj (08:35:59) :

INSERT INTO T2 VALUES ('TEST6A','TEST6B','TEST6C','TEST6D');

Not sure the above insert statement is intentional, however I changed it to as follows

INSERT INTO T3 VALUES ('TEST6A','TEST6B','TEST6C','TEST6D');

Part_1 :

with data
as
(
 select c1, c2, c3, c4, 'T1' src from t1
 union all
 select c1, c2, c3, c4, 'T2' from t2
),
filter as
(
 select c1, c2, c3, c4, src, count(distinct src) over(partition by c2) cnt from data
)
select * from filter
where cnt = 1;

Part_2 and Part_3 :

with data
as
(
 select c1, c2, c3, c4, 'T1' src from t1
 union all
 select c1, c2, c3, c4, 'T2' from t2
 union all
 select c1, c2, c3, c4, 'T3' from t3
),
filter as
(
 select c1, c2, c3, c4, src, count(distinct src) over(partition by c1, c2, c3, c4) cnt from data
)
select * from filter
where cnt = 1;

Regards

28 07 2011

Charles Hooper (08:40:45) :

Raj,

Thank you for catching the typo. I have corrected the original article.

28 07 2011

Charles Hooper (12:39:39) :

There are a lot of unique solutions for this particular challenge – I would say at least 10 solutions for the second part alone. The solutions posted by Raj are very good, and are not solutions that I had previously consider – he apparently read all three parts of the challenge, and carefully considered SQL code reuse.

Here is a sample of one of the other solutions for the second part of the challenge:

SELECT
  C1,
  C2,
  C3,
  C4,
  'T1' FROM_TABLE
FROM
  T1
WHERE
  (C1,
  C2,
  C3,
  C4)
NOT IN
  (SELECT
    C1,
    C2,
    C3,
    C4
  FROM
    T2)
UNION ALL
SELECT
  C1,
  C2,
  C3,
  C4,
  'T2' FROM_TABLE
FROM
  T2
WHERE
  (C1,
  C2,
  C3,
  C4)
NOT IN
  (SELECT
    C1,
    C2,
    C3,
    C4
  FROM
    T1);

The above is one possible outcome of expanding the solution for the first part of the challenge into a solution for the second part of the challenge. If we were to expand the above solution to work for the third part of the challenge, considered how might it appear.

If we know how to read a 10053 trace file, might we be able to cheat to find other solutions?

29 07 2011

Charles Hooper (07:53:58) :

Another for the second part of the challenge:

SELECT DISTINCT 
  NVL(T1.C1,T2.C1) COL1, 
  NVL(T1.C2,T2.C2) COL2, 
  NVL(T1.C3,T2.C3) COL3, 
  NVL(T1.C4,T2.C4) COL4, 
  NVL2(T1.C1,'T1','T2') FROM_TABLE 
FROM 
  T1 
FULL OUTER JOIN 
  T2 ON T1.C1=T2.C1 AND T1.C2=T2.C2 AND T1.C3=T2.C3 AND T1.C4=T2.C4
WHERE 
  UTL_I18N.RAW_TO_CHAR(DBMS_CRYPTO.HASH(UTL_I18N.STRING_TO_RAW(NVL(T1.C1,'1'),'AL32UTF8'),1)||
                       DBMS_CRYPTO.HASH(UTL_I18N.STRING_TO_RAW(NVL(T1.C2,'1'),'AL32UTF8'),1)||
                       DBMS_CRYPTO.HASH(UTL_I18N.STRING_TO_RAW(NVL(T1.C3,'1'),'AL32UTF8'),1)||
                       DBMS_CRYPTO.HASH(UTL_I18N.STRING_TO_RAW(NVL(T1.C4,'1'),'AL32UTF8'),1),'AL32UTF8') <>
  UTL_I18N.RAW_TO_CHAR(DBMS_CRYPTO.HASH(UTL_I18N.STRING_TO_RAW(NVL(T2.C1,'1'),'AL32UTF8'),1)||
                       DBMS_CRYPTO.HASH(UTL_I18N.STRING_TO_RAW(NVL(T2.C2,'1'),'AL32UTF8'),1)||
                       DBMS_CRYPTO.HASH(UTL_I18N.STRING_TO_RAW(NVL(T2.C3,'1'),'AL32UTF8'),1)||
                       DBMS_CRYPTO.HASH(UTL_I18N.STRING_TO_RAW(NVL(T2.C4,'1'),'AL32UTF8'),1),'AL32UTF8');

29 07 2011

Charles Hooper (09:30:25) :

Extending the NOT IN approach to the third part of the challenge:

SELECT
  C1,
  C2,
  C3,
  C4,
  'T1' FROM_TABLE
FROM
  T1
WHERE
  (C1,
  C2,
  C3,
  C4)
NOT IN
  (SELECT
    C1,
    C2,
    C3,
    C4
  FROM
    T2
  UNION ALL
  SELECT
    C1,
    C2,
    C3,
    C4
  FROM
    T3)
UNION ALL
SELECT
  C1,
  C2,
  C3,
  C4,
  'T2' FROM_TABLE
FROM
  T2
WHERE
  (C1,
  C2,
  C3,
  C4)
NOT IN
  (SELECT
    C1,
    C2,
    C3,
    C4
  FROM
    T1
  UNION ALL
  SELECT
    C1,
    C2,
    C3,
    C4
  FROM
    T3)
UNION ALL
SELECT
  C1,
  C2,
  C3,
  C4,
  'T3' FROM_TABLE
FROM
  T3
WHERE
  (C1,
  C2,
  C3,
  C4)
NOT IN
  (SELECT
    C1,
    C2,
    C3,
    C4
  FROM
    T1
  UNION ALL
  SELECT
    C1,
    C2,
    C3,
    C4
  FROM
    T2);

Does another solution EXIST that does not use IN? 🙂

29 07 2011

Charles Hooper (08:13:02) :

Another solution that becomes messy when introducing the third table:
Part 2:

SELECT
  C1,
  C2,
  C3,
  C4,
  'T1' FROM_TABLE
FROM
  T1
MINUS
SELECT
  C1,
  C2,
  C3,
  C4,
  'T1' FROM_TABLE
FROM
  T2
UNION ALL
SELECT
  C1,
  C2,
  C3,
  C4,
  'T2' FROM_TABLE
FROM
  T2
MINUS
SELECT
  C1,
  C2,
  C3,
  C4,
  'T2' FROM_TABLE
FROM
  T1;

Extending the above to Part 3:

SELECT
  C1,
  C2,
  C3,
  C4,
  'T1' FROM_TABLE
FROM
  T1
MINUS
SELECT
  *
FROM
(SELECT
  C1,
  C2,
  C3,
  C4,
  FROM_TABLE
FROM
  (SELECT
    C1,
    C2,
    C3,
    C4,
   'T1' FROM_TABLE
  FROM
    T2
  UNION ALL
  SELECT
    C1,
    C2,
    C3,
    C4,
   'T1' FROM_TABLE
  FROM
    T3))
UNION ALL
SELECT
  *
FROM
(SELECT
  C1,
  C2,
  C3,
  C4,
  'T2' FROM_TABLE
FROM
  T2
MINUS
SELECT
  C1,
  C2,
  C3,
  C4,
  FROM_TABLE
FROM
  (SELECT
    C1,
    C2,
    C3,
    C4,
   'T2' FROM_TABLE
  FROM
    T1
  UNION ALL
  SELECT
    C1,
    C2,
    C3,
    C4,
   'T2' FROM_TABLE
  FROM
    T3))
UNION ALL
SELECT
  *
FROM
(SELECT
  C1,
  C2,
  C3,
  C4,
  'T3' FROM_TABLE
FROM
  T3
MINUS
SELECT
  C1,
  C2,
  C3,
  C4,
  FROM_TABLE
FROM
  (SELECT
    C1,
    C2,
    C3,
    C4,
   'T3' FROM_TABLE
  FROM
    T1
  UNION ALL
  SELECT
    C1,
    C2,
    C3,
    C4,
   'T3' FROM_TABLE
  FROM
    T2));

There are of course still more solutions – probably not as compact as the solutions provided by Raj.

29 07 2011

Charles Hooper (08:27:30) :

Part 3 – Determine the list of items that are not wanted, outer join that with the list of items, and make certain that the item is not present in the unwanted list.

Defining the exclusion list:

SELECT
  C1,
  C2,
  C3,
  C4
FROM
  (SELECT
    C1,
    C2,
    C3,
    C4
  FROM
    T1
  INTERSECT
  SELECT
    C1,
    C2,
    C3,
    C4
  FROM
    T2)
UNION ALL
  (SELECT
    C1,
    C2,
    C3,
    C4
  FROM
    T1
  INTERSECT
  SELECT
    C1,
    C2,
    C3,
    C4
  FROM
    T3)
UNION ALL
  (SELECT
    C1,
    C2,
    C3,
    C4
  FROM
    T2
  INTERSECT
  SELECT
    C1,
    C2,
    C3,
    C4
  FROM
    T3);

The finished SQL statement:

WITH EXCLUSION AS
(SELECT
  C1,
  C2,
  C3,
  C4
FROM
  (SELECT
    C1,
    C2,
    C3,
    C4
  FROM
    T1
  INTERSECT
  SELECT
    C1,
    C2,
    C3,
    C4
  FROM
    T2)
UNION ALL
  (SELECT
    C1,
    C2,
    C3,
    C4
  FROM
    T1
  INTERSECT
  SELECT
    C1,
    C2,
    C3,
    C4
  FROM
    T3)
UNION ALL
  (SELECT
    C1,
    C2,
    C3,
    C4
  FROM
    T2
  INTERSECT
  SELECT
    C1,
    C2,
    C3,
    C4
  FROM
    T3))
SELECT
  T1.C1,
  T1.C2,
  T1.C3,
  T1.C4,
 'T1' FROM_TABLE
FROM
  T1,
  EXCLUSION
WHERE
  T1.C1=EXCLUSION.C1(+)
  AND T1.C2=EXCLUSION.C2(+)
  AND T1.C3=EXCLUSION.C3(+)
  AND T1.C4=EXCLUSION.C4(+)
  AND EXCLUSION.C1 IS NULL
UNION ALL
SELECT
  T2.C1,
  T2.C2,
  T2.C3,
  T2.C4,
 'T2' FROM_TABLE
FROM
  T2,
  EXCLUSION
WHERE
  T2.C1=EXCLUSION.C1(+)
  AND T2.C2=EXCLUSION.C2(+)
  AND T2.C3=EXCLUSION.C3(+)
  AND T2.C4=EXCLUSION.C4(+)
  AND EXCLUSION.C1 IS NULL
UNION ALL
SELECT
  T3.C1,
  T3.C2,
  T3.C3,
  T3.C4,
 'T3' FROM_TABLE
FROM
  T3,
  EXCLUSION
WHERE
  T3.C1=EXCLUSION.C1(+)
  AND T3.C2=EXCLUSION.C2(+)
  AND T3.C3=EXCLUSION.C3(+)
  AND T3.C4=EXCLUSION.C4(+)
  AND EXCLUSION.C1 IS NULL;

29 07 2011

Charles Hooper (09:23:29) :

Part 3
A similar idea to the solution provided by Raj:

SELECT
  C1,
  C2,
  C3,
  C4,
  FROM_TABLE
FROM
  (SELECT
    C1,
    C2,
    C3,
    C4,
    FROM_TABLE,
    COUNT(*) OVER (PARTITION BY C1, C2, C3, C4) CNT
  FROM
    (SELECT
      C1,
      C2,
      C3,
      C4,
      'T1' FROM_TABLE
    FROM
      T1
    UNION
    SELECT
      C1,
      C2,
      C3,
      C4,
      'T2' FROM_TABLE
    FROM
      T2
    UNION
    SELECT
      C1,
      C2,
      C3,
      C4,
      'T3' FROM_TABLE
    FROM
      T3))
WHERE
  CNT=1;

If using the COUNT analytic function is not an option:

SELECT
  C1,
  C2,
  C3,
  C4
FROM
  (SELECT
    C1,
    C2,
    C3,
    C4,
    FROM_TABLE,
    MAX(RN) OVER (PARTITION BY C1, C2, C3, C4) MRN
  FROM
    (SELECT
      C1,
      C2,
      C3,
      C4,
      FROM_TABLE,
      ROW_NUMBER() OVER (PARTITION BY C1, C2, C3, C4 ORDER BY FROM_TABLE) RN
    FROM
      (SELECT
        C1,
        C2,
        C3,
        C4,
        'T1' FROM_TABLE
      FROM
        T1
      UNION
      SELECT
        C1,
        C2,
        C3,
        C4,
        'T2' FROM_TABLE
      FROM
        T2
      UNION
      SELECT
        C1,
        C2,
        C3,
        C4,
        'T3' FROM_TABLE
      FROM
        T3)))
WHERE
  MRN=1;

29 07 2011

Charles Hooper (10:39:42) :

Part 3
Another solution that builds an exclusion list and joins the exclusion list to all rows from all tables:

SELECT
  A.C1,
  A.C2,
  A.C3,
  A.C4,
  A.FROM_TABLE
FROM
  (SELECT
    C1,
    C2,
    C3,
    C4,
    'T1' FROM_TABLE
  FROM
    T1
  UNION ALL
  SELECT
    C1,
    C2,
    C3,
    C4,
    'T2' FROM_TABLE
  FROM
    T2
  UNION ALL
  SELECT
    C1,
    C2,
    C3,
    C4,
    'T3' FROM_TABLE
  FROM
    T3) A,
  (SELECT
    T1.C1,
    T1.C2,
    T1.C3,
    T1.C4
  FROM
    T1,
    T2
  WHERE
    T1.C1=T2.C1
    AND T1.C2=T2.C2
    AND T1.C3=T2.C3
    AND T1.C4=T2.C4
  UNION ALL
  SELECT
    T1.C1,
    T1.C2,
    T1.C3,
    T1.C4
  FROM
    T1,
    T3
  WHERE
    T1.C1=T3.C1
    AND T1.C2=T3.C2
    AND T1.C3=T3.C3
    AND T1.C4=T3.C4
  UNION ALL
  SELECT
    T2.C1,
    T2.C2,
    T2.C3,
    T2.C4
  FROM
    T2,
    T3
  WHERE
    T3.C1=T2.C1
    AND T3.C2=T2.C2
    AND T3.C3=T2.C3
    AND T3.C4=T2.C4) E
WHERE
  A.C1=E.C1(+)
  AND A.C2=E.C2(+)
  AND A.C3=E.C3(+)
  AND A.C4=E.C4(+)
  AND E.C1 IS NULL;

15 08 2011

Stew Ashton (04:49:18) :

Just got back from a trip. In case you’re still reading the comments, here is a set of solutions inspired by an asktom thread. Basically, I use aggregation instead of analytics. In parts 2 and 3 I assume that I can throw out duplicate lines in each analyst’s results; this is not necessary, but allows me to add the third table with less change to the code. Besides, you want us to make dangerous assumptions, right?

By the way, do you have an RSS feed?

--PART 1: 
select max(C1) C1, C2, max(C3) C3, max(C4) C4, SUM(T1) T1, SUM(T2) T2 from (
  select C1, C2, C3, C4, 1 T1, 0 T2  from T1
  union all
  select C1, C2, C3, C4, 0 T1, 1 T2 from T2
) group by C2
having SUM(T1) = 0 or SUM(T2) = 0
order by T2;

--part 2:
select C1, C2, C3, C4, sum(t) T from (
  select C1, C2, C3, C4, 1 T from T1
  union
  select C1, C2, C3, C4, 2 T from T2
) group by C1, C2, C3, C4
having SUM(T) in (1, 2)
order by T;

--part 3:
select C1, C2, C3, C4, decode(sum(t),4,3,sum(t)) T from (
  select C1, C2, C3, C4, 1 T from T1
  union
  select C1, C2, C3, C4, 2 T from T2
  union
  select C1, C2, C3, C4, 4 T from T3
) group by C1, C2, C3, C4
having SUM(T) in (1, 2, 4)
order by T;

15 08 2011

Stew Ashton (04:52:41) :

Oops, part 1 should have UNION ALL, not UNION. Part 1 shows how I would do things if I did not eliminate duplicates.

15 08 2011

Charles Hooper (05:15:23) :

I adjusted part 1 per your request.

15 08 2011

Charles Hooper (05:54:52) :

Stew,

Nice solutions – a bit of binary math in part 3. 🙂

It appears that WordPress offers a standardized RSS feed address for all hosted blogs: blogname.wordpress.com/feed/rss/:
https://hoopercharles.wordpress.com/feed/rss/

My RSS feed only includes a small part of my articles for a variety of reasons. It seems that various sites strive to be more than blog aggregators, and have a habit of either not identifying the original author of some articles (oraclecommunity.in.ua and http://www.oracle.cc/the-new-order-oracle-coding-challenge-3-%E2%80%93-mind-boggle.html seems to come to mind), make it appear that people are writing articles specifically for a site (itnewscast.com/database/new-order-oracle-coding-challenge-2 seems to come to mind), or attempt to charge people to read articles that the author makes freely readable (sswug.org seems to come to mind). It is not uncommon for me (and I suspect several others who write technical blog articles) to spend several hours developing each article – limiting the RSS feed simply means that someone must make more of an effort to republish my articles without authorization.
(I have a much longer explanation why my RSS feed includes only partial articles, but this should be a sufficient explanation for now.

	Charles Hooper on Favorite Free Applications to…
	Zach on Favorite Free Applications to…
	Charles Hooper on Favorite Free Applications to…
	Zach on Favorite Free Applications to…
	Charles Hooper on Huge Presentation – Work…
	Dan on Huge Presentation – Work…
	Charles Hooper on Review of Synology DS415+, How…
	Matthias Clausen on Review of Synology DS415+, How…
	Charles Hooper on _SMALL_TABLE_THRESHOLD Paramet…
	mohd atif on _SMALL_TABLE_THRESHOLD Paramet…
	Charles Hooper on Free ANSI SQL to Oracle Specif…
	Kalin Agrawal on Free ANSI SQL to Oracle Specif…
	Charles Hooper on Large Page Support or Not…
	vaurob on Large Page Support or Not…
	Locating Oracle… on Finding a New Home for a Clien…

Charles Hooper's Oracle Notes