July 18, 2011
I am surprised at the significant number of unique solutions to the SQL problems that I have previously posed on this blog. For fun, I thought that I would give people another chance to demonstrate their unique approaches by solving another problem.
Suppose that a transaction table exists with the following definition:
CREATE TABLE T1 (
  TRANS_ID NUMBER,
  EMPLOYEE_ID VARCHAR2(10),
  PART_ID VARCHAR2(20),
  CLOCK_IN DATE,
  CLOCK_OUT DATE,
  QTY_COMPLETE NUMBER,
  PRIMARY KEY (TRANS_ID));
The above table simulates a transaction table that might capture production run times for people (or even robots) producing sets of parts. The requirement is rather simple – we want to know the average number of hours required to produce a particular part by day in a calendar-type layout. The only restriction is that the output must appear in a SQL*Plus window (or SQL*Plus may be used to spool the result to a text file). For example, the output might look like this (feel free to be creative):
PART_ID    W     SUNDAY     MONDAY    TUESDAY  WEDNESDAY   THURSDAY     FRIDAY   SATURDAY
-------- --- ---------- ---------- ---------- ---------- ---------- ---------- ----------
PART1      1        .98        .56        .67       1.21        .23        .98        .02
PART1      2        .57        .87        .76        .01        .45        .55        .77
PART1      3        .01        .77        .66        .54        .02        .44        .92
PART1      4        .55        .02        .88        .11        .05        .33        .89
PART1      5        .11        .43        .55        .14        .03        .90        .78
PART1      6        .87        .45        .63        .02        .82        .78        .55
PART1      7        .02        .01
If there was no transaction for a particular part on a particular date, a blank (NULL) should appear; to prevent divide-by-zero errors, the hours should be divided by one when the total QTY_COMPLETE is zero.
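As a quick sketch of that divide-by-zero guard (using the T1 columns defined above – the DECODE approach shown here is just one way to express it), the per-day average might be computed like this:

```sql
-- Sketch: average hours per piece for each part and day, dividing
-- by 1 (rather than by zero) when the day's total QTY_COMPLETE is 0.
SELECT
  PART_ID,
  TRUNC(CLOCK_IN) SHIFT_DATE,
  SUM(CLOCK_OUT-CLOCK_IN)*24
    / DECODE(SUM(QTY_COMPLETE),0,1,SUM(QTY_COMPLETE)) AVG_HOURS
FROM
  T1
GROUP BY
  PART_ID,
  TRUNC(CLOCK_IN);
```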
Let’s create some reproducible sample data in table T1:
DROP TABLE T1_EMP PURGE;

CREATE TABLE T1_EMP AS
SELECT
  'EMP'||TO_CHAR(ROWNUM) EMPLOYEE_ID
FROM
  DUAL
CONNECT BY
  LEVEL<=1000;

DROP TABLE T1_PART PURGE;

CREATE TABLE T1_PART AS
SELECT
  'PART'||TO_CHAR(ROWNUM) PART_ID
FROM
  DUAL
CONNECT BY
  LEVEL<=200;

INSERT INTO T1
SELECT /*+ LEADING(P E) */
  ROWNUM TRANS_ID,
  E.EMPLOYEE_ID,
  P.PART_ID,
  TO_DATE('01-JAN-2000','DD-MON-YYYY')+ROWNUM/500 CLOCK_IN,
  TO_DATE('01-JAN-2000','DD-MON-YYYY')+ROWNUM/500
    + ((MOD(ROWNUM,20)+1)*0.5)/24 CLOCK_OUT,
  ROUND(ABS(SIN(ROWNUM/180*3.141592)*10)) QTY_COMPLETE
FROM
  (SELECT EMPLOYEE_ID, ROW_NUMBER() OVER (ORDER BY EMPLOYEE_ID) RN FROM T1_EMP) E,
  (SELECT PART_ID, ROW_NUMBER() OVER (ORDER BY PART_ID) RN FROM T1_PART) P,
  (SELECT ROWNUM RN FROM DUAL CONNECT BY LEVEL<=10) D
WHERE
  MOD(E.RN,8)=MOD(P.RN,8);

COMMIT;
Taking a quick look at the sample data in table T1:
COLUMN PART_ID FORMAT A8
COLUMN HOURS FORMAT 90.00
SET PAGESIZE 1000
SET LINESIZE 140
SET TRIMSPOOL ON

SELECT
  PART_ID,
  EMPLOYEE_ID,
  TRUNC(CLOCK_IN) SHIFT_DATE,
  CLOCK_OUT-CLOCK_IN HOURS,
  QTY_COMPLETE
FROM
  T1
WHERE
  PART_ID='PART1'
ORDER BY
  CLOCK_IN;

PART_ID  EMPLOYEE_I SHIFT_DAT  HOURS QTY_COMPLETE
-------- ---------- --------- ------ ------------
PART1    EMP811     31-JAN-01   0.19            9
PART1    EMP811     31-JAN-01   0.21            9
PART1    EMP811     31-JAN-01   0.23            9
PART1    EMP819     04-FEB-01   0.04           10
PART1    EMP819     04-FEB-01   0.06           10
PART1    EMP819     04-FEB-01   0.08           10
PART1    EMP819     04-FEB-01   0.10           10
PART1    EMP819     04-FEB-01   0.13           10
PART1    EMP819     04-FEB-01   0.15           10
...
PART1    EMP912     28-MAR-01   0.19            2
PART1    EMP912     28-MAR-01   0.21            2
PART1    EMP912     28-MAR-01   0.23            2
PART1    EMP92      01-APR-01   0.04            0
PART1    EMP92      01-APR-01   0.06            0
PART1    EMP92      01-APR-01   0.08            1
PART1    EMP92      01-APR-01   0.10            1
...
PART1    EMP992     11-MAY-01   0.19            5
PART1    EMP992     11-MAY-01   0.21            5
PART1    EMP992     11-MAY-01   0.23            5

1250 rows selected.
Now just a quick example of how you might calculate the average hours per piece (note that the HOURS column actually shows the number of days, NOT hours – multiply the numbers in that column by 24 to convert to hours):
SELECT
  PART_ID,
  TRUNC(CLOCK_IN) SHIFT_DATE,
  SUM(CLOCK_OUT-CLOCK_IN) HOURS,
  SUM(QTY_COMPLETE) QTY_COMPLETE
FROM
  T1
WHERE
  PART_ID='PART1'
GROUP BY
  PART_ID,
  TRUNC(CLOCK_IN)
ORDER BY
  TRUNC(CLOCK_IN);

PART_ID  SHIFT_DAT  HOURS QTY_COMPLETE
-------- --------- ------ ------------
PART1    01-JAN-00   1.35           90
PART1    05-JAN-00   1.35          100
PART1    09-JAN-00   1.35           98
PART1    13-JAN-00   1.35           81
PART1    17-JAN-00   1.35           56
PART1    21-JAN-00   1.35           25
PART1    25-JAN-00   1.35           10
PART1    29-JAN-00   1.35           44
...
PART1    29-APR-01   1.35           56
PART1    03-MAY-01   1.35           25
PART1    07-MAY-01   1.35           10
PART1    11-MAY-01   1.35           44

125 rows selected.
Interesting how the HOURS column always sums to 1.35 per day…
OK, for you self-starters, create the calendar output. Think about any other kinds of analysis that might be done with this data – is it possible to determine which PART_ID takes the least (or the most) average time per piece on any given day?
–
–
–
For the non-self starters, scroll down.
–
–
–
–
–
–
–
–
–
–
–
–
–
One of the challenges that we face is determining the calendar row and column for any given shift date (the date associated with the CLOCK_IN date/time stamp). Since in this case we know that the first row of data has a date of 01-JAN-2000, we need to determine the Sunday of that week. One way to do that is to use the NEXT_DAY function, like this:
SELECT
  NEXT_DAY(TO_DATE('01-JAN-2000','DD-MON-YYYY')-7,'SUNDAY') S
FROM
  DUAL;

S
---------
26-DEC-99
The next step is to divide the shift dates into rows and columns based on the zero date of 26-DEC-99. TRUNC and MOD could be used for this calculation, but to keep things interesting I will use TO_CHAR rather than MOD:
COLUMN R FORMAT 99
COLUMN C FORMAT 9

SELECT
  PART_ID,
  NEXT_DAY(TO_DATE('01-JAN-2000','DD-MON-YYYY')-7,'SUNDAY') S,
  TRUNC(CLOCK_IN) SHIFT_DATE,
  TRUNC((TRUNC(CLOCK_IN)-NEXT_DAY(TO_DATE('01-JAN-2000','DD-MON-YYYY')-7,'SUNDAY'))/7)+1 R,
  TO_NUMBER(TO_CHAR(TRUNC(CLOCK_IN),'D')) C,
  SUM(CLOCK_OUT-CLOCK_IN) HOURS,
  SUM(QTY_COMPLETE) QTY_COMPLETE
FROM
  T1
WHERE
  PART_ID='PART1'
GROUP BY
  PART_ID,
  TRUNC(CLOCK_IN)
ORDER BY
  TRUNC(CLOCK_IN);

PART_ID  S         SHIFT_DAT   R  C  HOURS QTY_COMPLETE
-------- --------- --------- --- -- ------ ------------
PART1    26-DEC-99 01-JAN-00   1  7   1.35           90
PART1    26-DEC-99 05-JAN-00   2  4   1.35          100
PART1    26-DEC-99 09-JAN-00   3  1   1.35           98
PART1    26-DEC-99 13-JAN-00   3  5   1.35           81
PART1    26-DEC-99 17-JAN-00   4  2   1.35           56
PART1    26-DEC-99 21-JAN-00   4  6   1.35           25
PART1    26-DEC-99 25-JAN-00   5  3   1.35           10
...
PART1    26-DEC-99 16-NOV-00  47  5   1.35           72
PART1    26-DEC-99 20-NOV-00  48  2   1.35           90
PART1    26-DEC-99 24-NOV-00  48  6   1.35          100
PART1    26-DEC-99 28-NOV-00  49  3   1.35           98
PART1    26-DEC-99 02-DEC-00  49  7   1.35           81
...
PART1    26-DEC-99 25-APR-01  70  4   1.35           81
PART1    26-DEC-99 29-APR-01  71  1   1.35           56
PART1    26-DEC-99 03-MAY-01  71  5   1.35           25
PART1    26-DEC-99 07-MAY-01  72  2   1.35           10
PART1    26-DEC-99 11-MAY-01  72  6   1.35           44

125 rows selected.
The next step is to collapse all of the rows with the same R value into a single row – we will do this with a combination of the MAX and DECODE functions:
SELECT
  PART_ID,
  R,
  ROUND(MAX(DECODE(C,1,DECODE(QTY_COMPLETE,0,HOURS,HOURS/QTY_COMPLETE),NULL)),2) SUNDAY,
  ROUND(MAX(DECODE(C,2,DECODE(QTY_COMPLETE,0,HOURS,HOURS/QTY_COMPLETE),NULL)),2) MONDAY,
  ROUND(MAX(DECODE(C,3,DECODE(QTY_COMPLETE,0,HOURS,HOURS/QTY_COMPLETE),NULL)),2) TUESDAY,
  ROUND(MAX(DECODE(C,4,DECODE(QTY_COMPLETE,0,HOURS,HOURS/QTY_COMPLETE),NULL)),2) WEDNESDAY,
  ROUND(MAX(DECODE(C,5,DECODE(QTY_COMPLETE,0,HOURS,HOURS/QTY_COMPLETE),NULL)),2) THURSDAY,
  ROUND(MAX(DECODE(C,6,DECODE(QTY_COMPLETE,0,HOURS,HOURS/QTY_COMPLETE),NULL)),2) FRIDAY,
  ROUND(MAX(DECODE(C,7,DECODE(QTY_COMPLETE,0,HOURS,HOURS/QTY_COMPLETE),NULL)),2) SATURDAY
FROM
  (SELECT
     PART_ID,
     TRUNC(CLOCK_IN) SHIFT_DATE,
     TRUNC((TRUNC(CLOCK_IN)-NEXT_DAY(TO_DATE('01-JAN-2000','DD-MON-YYYY')-7,'SUNDAY'))/7)+1 R,
     TO_NUMBER(TO_CHAR(TRUNC(CLOCK_IN),'D')) C,
     SUM(CLOCK_OUT-CLOCK_IN) HOURS,
     SUM(QTY_COMPLETE) QTY_COMPLETE
   FROM
     T1
   WHERE
     PART_ID='PART1'
   GROUP BY
     PART_ID,
     TRUNC(CLOCK_IN))
GROUP BY
  PART_ID,
  R
ORDER BY
  PART_ID,
  R;

PART_ID    R     SUNDAY     MONDAY    TUESDAY  WEDNESDAY   THURSDAY     FRIDAY   SATURDAY
-------- --- ---------- ---------- ---------- ---------- ---------- ---------- ----------
PART1      1                                                                          .02
PART1      2                                         .01
PART1      3        .01                                         .02
PART1      4                   .02                                         .05
PART1      5                              .14                                         .03
PART1      6                                         .02
PART1      7        .02                                         .01
PART1      8                   .01                                         .02
...
PART1     69                              .01                                         .01
PART1     70                                         .02
PART1     71        .02                                         .05
PART1     72                   .14                                         .03

72 rows selected.
Of course there is a problem with the above – the scale of the time is actually in days, rather than hours, but that is easy enough to fix.
–
–
–
–
–
–
–
Anyone care to share their approach to solving the original problem?
–
–
Assume that the output will be used by a production supervisor – what other kinds of analysis can be performed? For example, could you produce an analysis like the one below that compares recent transactions with those from prior periods and color codes those prior periods based on how the current period (the Avg Hrs Pc column in the below screen capture) compares with the prior periods:
Be creative – performance, while important, is not the primary objective.
I don’t have time to do this right. The following “solution” doesn’t work unless NLS_TERRITORY is set to a country that has Sunday as the first day of the week. Also, if there was no work during some week, that week does not show up in the result set.
select * from (
SELECT
PART_ID,
TRUNC(CLOCK_IN) - to_number(to_char(clock_in, 'D')) + 1 week_starting,
TO_CHAR(CLOCK_IN, 'D') day,
numtodsinterval(round(SUM(CLOCK_OUT-CLOCK_IN)/greatest(SUM(QTY_COMPLETE),1)*24*60), 'minute') HOURS
FROM
T1
WHERE
PART_ID='PART1'
group by
(PART_ID,
TRUNC(CLOCK_IN) - TO_NUMBER(TO_CHAR(CLOCK_IN, 'D')),
TO_CHAR(CLOCK_IN, 'D'))
) S PIVOT (max(HOURS) for day in(1 as SUN, 2 as MON, 3 as TUE, 4 as WED, 5 as THU, 6 as FRI, 7 as SAT))
order by part_id, week_starting
Stew,
Nice example.
When I started writing this article I had intended to mention potential NLS problems regarding TO_CHAR with the ‘D’ parameter, but I forgot by the time I made it to the end of the article – thanks for drawing attention to this potential problem area.
There is also another potential problem that I intentionally did not mention – what happens if there is no transaction for a specific week. In my example, that week is simply skipped – maybe that is the desired result, and maybe it is not.
I had quite a bit of difficulty properly formatting the output from your SQL statement – in SQL*Plus how is the INTERVAL DAY TO SECOND datatype properly formatted for output so that it displays nicely on the screen? After an hour of searching I did not find the answer. Because I could not find a way to nicely format the output, I broke your SQL statement into steps:
To “cheat” with formatting the output, I used the EXTRACT function in the second to the last line of your SQL statement to pull out the hours and minutes from the INTERVAL DAY TO SECOND datatype:
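That modified statement is not reproduced here, but the EXTRACT calls presumably looked something like the following sketch (shown against DUAL with a NUMTODSINTERVAL value so that it is self-contained):

```sql
-- Sketch: EXTRACT pulls the hour and minute components out of an
-- INTERVAL DAY TO SECOND value so they can be displayed plainly
-- in SQL*Plus (95 minutes = 1 hour, 35 minutes).
SELECT
  EXTRACT(HOUR FROM NUMTODSINTERVAL(95,'MINUTE')) HOURS,
  EXTRACT(MINUTE FROM NUMTODSINTERVAL(95,'MINUTE')) MINUTES
FROM
  DUAL;
```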
The output of your SQL statement was then:
The addition of the WEEK_START is a nice touch, and the fact that you do not need to specify the “zero date” in the SQL statement is another benefit of your approach. The values shown by your SQL statement in hours and minutes do not exactly match up with the output of my sample SQL statement that shows fractions of days, but that is likely caused by differences in rounding. I was hoping that someone would post an example using the PIVOT operator that was introduced in Oracle Database 11.1 – that method is easier to understand than the MAX(DECODE()) technique.
A lot of people are looking at this article, but not many are sharing their solutions. I thought that I would share another one that:
* Eliminates the NLS problem
* Eliminates the problem with potential missing weeks in the calendar
* Fixes the fractional day output, replacing it with hours and minutes
First, let’s determine the date range for our test data, including the Sunday of the start week and the Saturday of the end week:
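The statement itself did not survive in this copy of the article; based on the description (and the MIN_CLOCK_IN expression quoted later in the comments), it presumably resembled this sketch:

```sql
-- Sketch: the Sunday of the week containing the earliest CLOCK_IN
-- and the Saturday of the week containing the latest CLOCK_IN.
SELECT
  NEXT_DAY(MIN(TRUNC(CLOCK_IN))-7,'SUNDAY') MIN_CLOCK_IN,
  NEXT_DAY(MAX(TRUNC(CLOCK_IN))-1,'SATURDAY') MAX_CLOCK_IN
FROM
  T1;
```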
Now let’s generate a list of all of the dates between the calculated MIN_CLOCK_IN and MAX_CLOCK_IN:
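Again the original statement is missing here; a CONNECT BY LEVEL row generator over the calculated date range would be one way to produce the list (a sketch, reusing the range query from the previous step as an inline view):

```sql
-- Sketch: generate one row per date between the calculated Sunday
-- and Saturday using a CONNECT BY LEVEL row generator.
SELECT
  MIN_CLOCK_IN+(LEVEL-1) CALENDAR_DATE
FROM
  (SELECT
     NEXT_DAY(MIN(TRUNC(CLOCK_IN))-7,'SUNDAY') MIN_CLOCK_IN,
     NEXT_DAY(MAX(TRUNC(CLOCK_IN))-1,'SATURDAY') MAX_CLOCK_IN
   FROM
     T1)
CONNECT BY
  LEVEL<=MAX_CLOCK_IN-MIN_CLOCK_IN+1;
```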
Let’s add in the PART_ID column to the above SQL statement, slide it into an inline view, and then outer join the list of all dates with the average hours per date from the table T1 (the second half of the query was explained in the article, but I am converting the days calculation to hours and also breaking out hours and minutes):
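The joined statement is also missing from this copy; restricted to a single part for brevity, it presumably looked something like this sketch (note that R and C are derived by date arithmetic from the generated Sunday, which sidesteps the TO_CHAR('D') NLS dependency):

```sql
-- Sketch: every calendar date with its row (R) and column (C),
-- outer joined to the per-day averages so idle days remain NULL
-- and empty weeks still appear in the calendar.
SELECT
  D.PART_ID,
  D.CALENDAR_DATE,
  TRUNC((D.CALENDAR_DATE-D.MIN_CLOCK_IN)/7)+1 R,
  MOD(D.CALENDAR_DATE-D.MIN_CLOCK_IN,7)+1 C,
  T.AVG_HOURS
FROM
  (SELECT
     'PART1' PART_ID,
     MIN_CLOCK_IN+(LEVEL-1) CALENDAR_DATE,
     MIN_CLOCK_IN
   FROM
     (SELECT
        NEXT_DAY(MIN(TRUNC(CLOCK_IN))-7,'SUNDAY') MIN_CLOCK_IN,
        NEXT_DAY(MAX(TRUNC(CLOCK_IN))-1,'SATURDAY') MAX_CLOCK_IN
      FROM
        T1)
   CONNECT BY
     LEVEL<=MAX_CLOCK_IN-MIN_CLOCK_IN+1) D,
  (SELECT
     PART_ID,
     TRUNC(CLOCK_IN) SHIFT_DATE,
     SUM(CLOCK_OUT-CLOCK_IN)*24
       / DECODE(SUM(QTY_COMPLETE),0,1,SUM(QTY_COMPLETE)) AVG_HOURS
   FROM
     T1
   WHERE
     PART_ID='PART1'
   GROUP BY
     PART_ID,
     TRUNC(CLOCK_IN)) T
WHERE
  D.PART_ID=T.PART_ID(+)
  AND D.CALENDAR_DATE=T.SHIFT_DATE(+)
ORDER BY
  D.CALENDAR_DATE;
```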
Pretty simple so far – we have the dates, as well as the row (R) and column (C) where we should place the AVG_HOURS. Now for the old style cross-tab output as in the article’s example:
Nice, but we are able to take it another step further:
Any other solutions? HTML output spooled to a file?
Charles,
How did you eliminate the NLS problem? Try "ALTER SESSION SET NLS_DATE_LANGUAGE='FRENCH'" and then run your query…
Stew,
Nice catch – apparently Sunday is not a valid day in the French language. 🙂
What I was thinking about when I started writing this blog article is this comment:
http://download.oracle.com/docs/cd/E11882_01/server.112/e10729/ch3globenv.htm#i1007222
So, the revised query worked around the potential problem where TO_CHAR(TRUNC(CLOCK_IN),'D') would return 1 for a Monday rather than for a Sunday – of course the calendar would then have Monday in the first position rather than Sunday, so I guess that it would not completely work around that specific NLS problem.
The spelling of the days of the week is still a problem, as you kindly pointed out.
Charles,
I must admit I have not yet managed to digest entire thought process but your last comment caught my attention.
If spelling is the only issue, I remember reading somewhere a workaround for the same. Could the following work?
NEXT_DAY(MIN(TRUNC(CLOCK_IN))-7,TO_CHAR('17-JUL-2011','DAY')) MIN_CLOCK_IN
in place of
NEXT_DAY(MIN(TRUNC(CLOCK_IN))-7,'SUNDAY') MIN_CLOCK_IN
Apologies Charles for not testing before suggesting the solution.
It should be:
NEXT_DAY(MIN(TRUNC(SYSDATE))-7,TO_CHAR(TO_DATE('17-07-2011','DD-MM-YYYY'),'DAY')) MIN_CLOCK_IN
Narendra,
Let’s test your idea. First, the portion of the SQL statement that triggered the ORA-01846:
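The failing fragment is not preserved in this copy of the thread, but the trigger would presumably have been the hard-coded English day name, along these lines (a sketch):

```sql
-- Sketch: with a non-English date language the literal 'SUNDAY' is
-- not a recognized day name, so NEXT_DAY raises
-- ORA-01846: not a valid day of the week.
ALTER SESSION SET NLS_DATE_LANGUAGE='FRENCH';

SELECT
  NEXT_DAY(MIN(TRUNC(CLOCK_IN))-7,'SUNDAY') MIN_CLOCK_IN
FROM
  T1;
```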
Putting in your suggested fix, the SQL statement returns the same result, so we have a good starting point:
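The tested statement is likewise missing here; substituting Narendra's expression into the fragment would give something like this sketch:

```sql
-- Narendra's workaround: derive the day name from 17-JUL-2011 (a
-- known Sunday), so the name automatically follows the session's
-- NLS_DATE_LANGUAGE setting instead of being hard-coded in English.
SELECT
  NEXT_DAY(MIN(TRUNC(CLOCK_IN))-7,
           TO_CHAR(TO_DATE('17-07-2011','DD-MM-YYYY'),'DAY')) MIN_CLOCK_IN
FROM
  T1;
```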
Let’s go back to the SQL statement that failed when testing Stew’s suggested change to the NLS_DATE_LANGUAGE:
The above worked as it should.
Now for the change to the NLS_DATE_LANGUAGE:
Narendra, it appears that your suggestion worked well. All that is needed is a date that is a known Sunday and a date that is a known Saturday.