July 28, 2011
I must say that I am impressed with the number of unique solutions that were developed for the previous coding challenge (FizzBuzz). While not all solutions were extremely efficient (a couple were intentionally designed to be as inefficient as possible), the various techniques provide views of different approaches to solving a problem that was not well defined (even though at first glance it appeared to be well defined). While not all of the solutions presented are optimal for the FizzBuzz challenge, derivatives of those solutions might be perfect for real-world problems (for instance, side-tracking a pesky DBA or developer with a performance challenge).
Time for another coding challenge. This challenge is adapted from one that was posed in a Usenet thread several years ago. Remember that if a specification is not well defined, feel free to interpret the specification – one of the goals of this blog article is to see a variety of solutions, but a secondary goal is to determine what might happen when weak specifications are provided in a request for a solution.
Assume that there are two tables, T1 and T2, as designed and populated below. The two tables contain the results from experiments performed by two different analysts.
DROP TABLE T1 PURGE; DROP TABLE T2 PURGE; CREATE TABLE T1 ( C1 VARCHAR2(10), C2 VARCHAR2(10), C3 VARCHAR2(10), C4 VARCHAR2(10)); CREATE TABLE T2 ( C1 VARCHAR2(10), C2 VARCHAR2(10), C3 VARCHAR2(10), C4 VARCHAR2(10)); INSERT INTO T1 VALUES ('TEST1A','TEST1B','TEST1C','TEST1D'); INSERT INTO T1 VALUES ('TEST2A','TEST2B','TEST2C','TEST2D'); INSERT INTO T1 VALUES ('TEST3A','TEST3B','TEST3C','TEST3D'); INSERT INTO T1 VALUES ('TEST4A','TEST4B','TEST4C','TEST4D'); INSERT INTO T1 VALUES ('TEST5A','TEST5B','TEST5C','TEST5D'); INSERT INTO T2 VALUES ('TEST1A','TEST1B','TEST1C','TEST1D'); INSERT INTO T2 VALUES ('TEST2A','TEST1B','TEST1C','TEST2D'); INSERT INTO T2 VALUES ('TEST3A','TEST3B','TEST3C','TEST3D'); INSERT INTO T2 VALUES ('TEST5A','TEST5B','TEST5C','TEST5D'); INSERT INTO T2 VALUES ('TEST6A','TEST5B','TEST5C','TEST6D');
This is a three part challenge.
Part 1: Return all rows from table T1 where the C2 value is not found in table T2’s column C2 values, and return all rows from table T2 where the C2 value is not found in table T1’s column C2 values. In short, we want the rows that have unique results found in column C2 so that we may identify the differences found by the two analysts.
Part 2: Using a single select statement, return all rows found in table T1 that are not duplicated in table T2, and all rows found in table T2 that are not duplicated in table T1. The resulting output must identify the table from which the row was found.
Part 3: A third analyst performed experiments and entered his results in table T3, as designed and populated below. Using a single SQL statement, retrieve the rows from tables T1, T2, and T3 that are not duplicated in the other tables – find the unique rows from the three tables. Each row should identify its source table (T1, T2, or T3).
DROP TABLE T3 PURGE; CREATE TABLE T3 ( C1 VARCHAR2(10), C2 VARCHAR2(10), C3 VARCHAR2(10), C4 VARCHAR2(10)); INSERT INTO T3 VALUES ('TEST1A','TEST1B','TEST1C','TEST1D'); INSERT INTO T3 VALUES ('TEST2A','TEST1B','TEST1C','TEST2D'); INSERT INTO T3 VALUES ('TEST6A','TEST6B','TEST6C','TEST6D');
When you post your solutions, please identify Part 1, Part 2, or Part 3 next to the solution.
General tips for comments:
- Include your code portion of the solution in a monospaced font (Courier) to retain the alignment spaces in the solution. To do this, use a <pre> tag just before the code portion of the solution and a </pre> tag just after the solution ([s ourcecode] and [/s ourcecode] tags (without the space between the first two letters) should have the same result, just with a smaller font size and a non-scrolling code area).
- Less than (<) and greater than (>) signs have a special meaning in HTML web pages. As such, specify < for a less than sign and > for a greater than sign in code sections to avoid having portions of your code sections magically disappear.
Think about how you would build the solutions before scrolling down to the comments section.
–
–
–
–
–
–
–
–
Part 2:
SELECT NVL(T1.C1,T2.C1) C1, NVL(T1.C2,T2.C2) C2, NVL(T1.C3,T2.C3) C3, NVL(T1.C4,T2.C4) C4, NVL2(T1.C1,'T1','T2') FROM_TABLE FROM T1 FULL OUTER JOIN T2 ON T1.C1=T2.C1 AND T1.C2=T2.C2 AND T1.C3=T2.C3 AND T1.C4=T2.C4 WHERE T1.C1 IS NULL OR T2.C1 IS NULL;
–
–
–
–
–
–
Not sure the above insert statement is intentional, however I changed it to as follows
Part_1 :
Part_2 and Part_3 :
Regards
Raj,
Thank you for catching the typo. I have corrected the original article.
There are a lot of unique solutions for this particular challenge – I would say at least 10 solutions for the second part alone. The solutions posted by Raj are very good, and are not solutions that I had previously consider – he apparently read all three parts of the challenge, and carefully considered SQL code reuse.
Here is a sample of one of the other solutions for the second part of the challenge:
The above is one possible outcome of expanding the solution for the first part of the challenge into a solution for the second part of the challenge. If we were to expand the above solution to work for the third part of the challenge, considered how might it appear.
If we know how to read a 10053 trace file, might we be able to cheat to find other solutions?
Another for the second part of the challenge:
Extending the NOT IN approach to the third part of the challenge:
Does another solution EXIST that does not use IN? 🙂
Another solution that becomes messy when introducing the third table:
Part 2:
Extending the above to Part 3:
There are of course still more solutions – probably not as compact as the solutions provided by Raj.
Part 3 – Determine the list of items that are not wanted, outer join that with the list of items, and make certain that the item is not present in the unwanted list.
Defining the exclusion list:
The finished SQL statement:
Part 3
A similar idea to the solution provided by Raj:
If using the COUNT analytic function is not an option:
Part 3
Another solution that builds an exclusion list and joins the exclusion list to all rows from all tables:
Just got back from a trip. In case you’re still reading the comments, here is a set of solutions inspired by an asktom thread. Basically, I use aggregation instead of analytics. In parts 2 and 3 I assume that I can throw out duplicate lines in each analyst’s results; this is not necessary, but allows me to add the third table with less change to the code. Besides, you want us to make dangerous assumptions, right?
By the way, do you have an RSS feed?
Oops, part 1 should have UNION ALL, not UNION. Part 1 shows how I would do things if I did not eliminate duplicates.
I adjusted part 1 per your request.
Stew,
Nice solutions – a bit of binary math in part 3. 🙂
It appears that WordPress offers a standardized RSS feed address for all hosted blogs: blogname.wordpress.com/feed/rss/:
https://hoopercharles.wordpress.com/feed/rss/
My RSS feed only includes a small part of my articles for a variety of reasons. It seems that various sites strive to be more than blog aggregators, and have a habit of either not identifying the original author of some articles (oraclecommunity.in.ua and http://www.oracle.cc/the-new-order-oracle-coding-challenge-3-%E2%80%93-mind-boggle.html seems to come to mind), make it appear that people are writing articles specifically for a site (itnewscast.com/database/new-order-oracle-coding-challenge-2 seems to come to mind), or attempt to charge people to read articles that the author makes freely readable (sswug.org seems to come to mind). It is not uncommon for me (and I suspect several others who write technical blog articles) to spend several hours developing each article – limiting the RSS feed simply means that someone must make more of an effort to republish my articles without authorization.
(I have a much longer explanation why my RSS feed includes only partial articles, but this should be a sufficient explanation for now.