SQL – Determining if Resources are in Use in 2 Hour Time Periods

10 02 2010

February 10, 2010

A question appeared in the comp.databases.oracle.misc Usenet group a couple of years ago that caught my attention, and not just because the subject line read “need urgent help“.  The original poster supplied the following request:

I am creating attendance sheet software for in-house use.

my data is like this:

-----------------------------------------------------------------------
| name     |   login time                   |   logout time           |
-----------------------------------------------------------------------
|  a       |  2007-11-10 12:00:00           |  2007-11-10 16:00:00    |
-----------------------------------------------------------------------
|  b       |  2007-11-10 15:00:00           |  2007-11-10 18:00:00    |
-----------------------------------------------------------------------

My requirement:

I want to generate an hourly report like this:

----------------------------------------------------------
 date             time range        total people logged in
----------------------------------------------------------
 2007-11-10          0 -2                      0
----------------------------------------------------------
 2007-12-10          2-4                       0
----------------------------------------------------------
.
.
----------------------------------------------------------
 2007-11-10         12-14                      1
----------------------------------------------------------
 2007-11-10         14-16                      2
----------------------------------------------------------
 2007-11-10         16-18                      1
----------------------------------------------------------
.
.
----------------------------------------------------------
 2007-11-10         22-24                      0
----------------------------------------------------------

This is what I want to create, but I don’t know how can I generate such kind of report.

Ed Prochak offered the following advice in the thread, advice that is probably something that is easily forgotten when confronted by a simple problem that seems to be impossible to solve.

HINT: Try resolving the problem in steps.

The key to this is understanding that the result set of a SELECT can be considered a table. You may already have the first step you need, but basically try to think of the problem in parts.

If we were to apply Ed’s suggestions, we might start off by trying to simplify the problem with a test table, something like this:

CREATE TABLE T1 (
  USERNAME VARCHAR2(15),
  LOGIN_TIME DATE,
  LOGOUT_TIME DATE);

INSERT INTO
  T1
VALUES(
  'a',
  TO_DATE('2007-11-10 12:00','YYYY-MM-DD HH24:MI'),
  TO_DATE('2007-11-10 16:00','YYYY-MM-DD HH24:MI'));

INSERT INTO
  T1
VALUES(
  'b',
  TO_DATE('2007-11-10 15:00','YYYY-MM-DD HH24:MI'),
  TO_DATE('2007-11-10 18:00','YYYY-MM-DD HH24:MI'));

COMMIT;

At this point, we might start thinking about what, if any, potential problems we might encounter.  One of the challenges that we will face is working around the need to generate up to 12 rows (1 for each of the 2 hour long possible time periods) for each row in the source table.  A second problem is how to handle logins that occur before midnight, with a corresponding logout that occurs after midnight.  If it were known that there would be no time periods that cross midnight, we might try to build a solution like this using our test table:

SELECT
  TRUNC(LOGIN_TIME) CHECK_DATE,
  TO_NUMBER(TO_CHAR(LOGIN_TIME,'HH24')) LOGIN_HOUR,
  TO_NUMBER(TO_CHAR(LOGOUT_TIME,'HH24')) LOGOUT_HOUR
FROM
  T1;

CHECK_DAT LOGIN_HOUR LOGOUT_HOUR
--------- ---------- -----------
10-NOV-07         12          16
10-NOV-07         15          18

The above just simplifies the input table into dates, login hour and logout hour.

Next, we need a way to generate 12 rows.  We could just use an existing table, and specify that we want to return all rows where ROWNUM<=12, but we will use CONNECT BY LEVEL, which could result in greater CPU consumption, but would likely be more portable:

SELECT
  (LEVEL-1)*2 LOGIN_COUNTER,
  (LEVEL-1)*2+2 LOGOUT_COUNTER
FROM
  DUAL
CONNECT BY
  LEVEL<=12;

LOGIN_COUNTER LOGOUT_COUNTER
------------- --------------
            0              2
            2              4
            4              6
            6              8
            8             10
           10             12
           12             14
           14             16
           16             18
           18             20
           20             22
           22             24

Now that we have the two simplified data sets, we just need to find where the two data sets intersect.  First, let’s find those records where the numbers from the counter fall between the LOGIN_HOUR and the LOGOUT_HOUR:

SELECT
  T.CHECK_DATE,
  T.LOGIN_HOUR,
  T.LOGOUT_HOUR,
  TO_CHAR(LOGIN_COUNTER,'99')||'-'||TO_CHAR(LOGOUT_COUNTER,'99') TIME_RANGE
FROM
  (SELECT
    TRUNC(LOGIN_TIME) CHECK_DATE,
    TO_NUMBER(TO_CHAR(LOGIN_TIME,'HH24')) LOGIN_HOUR,
    TO_NUMBER(TO_CHAR(LOGOUT_TIME,'HH24')) LOGOUT_HOUR
  FROM
    T1) T,
  (SELECT
    (LEVEL-1)*2 LOGIN_COUNTER,
    (LEVEL-1)*2+2 LOGOUT_COUNTER
  FROM
    DUAL
  CONNECT BY
    LEVEL<=12) C
WHERE
  C.LOGIN_COUNTER BETWEEN T.LOGIN_HOUR AND T.LOGOUT_HOUR
  AND C.LOGOUT_COUNTER BETWEEN T.LOGIN_HOUR AND T.LOGOUT_HOUR
ORDER BY
  1,
  4,
  2;

CHECK_DAT LOGIN_HOUR LOGOUT_HOUR TIME_RA
--------- ---------- ----------- -------
10-NOV-07         12          16  12- 14
10-NOV-07         12          16  14- 16
10-NOV-07         15          18  16- 18

You may notice that the above output is missing one row.  Let’s see if we can find a way to include the missing row:

SELECT
  T.CHECK_DATE,
  T.LOGIN_HOUR,
  T.LOGOUT_HOUR,
  TO_CHAR(LOGIN_COUNTER,'99')||'-'||TO_CHAR(LOGOUT_COUNTER,'99') TIME_RANGE
FROM
  (SELECT
    TRUNC(LOGIN_TIME) CHECK_DATE,
    TO_NUMBER(TO_CHAR(LOGIN_TIME,'HH24')) LOGIN_HOUR,
    TO_NUMBER(TO_CHAR(LOGOUT_TIME,'HH24')) LOGOUT_HOUR
  FROM
    T1) T,
  (SELECT
    (LEVEL-1)*2 LOGIN_COUNTER,
    (LEVEL-1)*2+2 LOGOUT_COUNTER
  FROM
    DUAL
  CONNECT BY
    LEVEL<=12) C
WHERE
  (C.LOGIN_COUNTER BETWEEN T.LOGIN_HOUR AND T.LOGOUT_HOUR
    AND C.LOGOUT_COUNTER BETWEEN T.LOGIN_HOUR AND T.LOGOUT_HOUR)
  OR T.LOGIN_HOUR BETWEEN C.LOGIN_COUNTER AND C.LOGOUT_COUNTER-1
  OR T.LOGOUT_HOUR BETWEEN C.LOGIN_COUNTER+1 AND C.LOGOUT_COUNTER
ORDER BY
  1,
  4,
  2;

CHECK_DAT LOGIN_HOUR LOGOUT_HOUR TIME_RA
--------- ---------- ----------- -------
10-NOV-07         12          16  12- 14
10-NOV-07         12          16  14- 16
10-NOV-07         15          18  14- 16
10-NOV-07         15          18  16- 18

By also allowing the LOGIN_HOUR to fall between the LOGIN_COUNTER and LOGOUT_COUNTER, or the LOGOUT_HOUR to fall between the LOGIN_COUNTER and LOGOUT_COUNTER (with a slight adjustment), we pick up the missing row.  Now, it is a simple matter to find the total number in each time period by sliding the above into an inline view:

SELECT
  T.CHECK_DATE,
  TO_CHAR(LOGIN_COUNTER,'99')||'-'||TO_CHAR(LOGOUT_COUNTER,'99') TIME_RANGE,
  COUNT(*) TOTAL_PEOPLE
FROM
  (SELECT
    TRUNC(LOGIN_TIME) CHECK_DATE,
    TO_NUMBER(TO_CHAR(LOGIN_TIME,'HH24')) LOGIN_HOUR,
    TO_NUMBER(TO_CHAR(LOGOUT_TIME,'HH24')) LOGOUT_HOUR
  FROM
    T1) T,
  (SELECT
    (LEVEL-1)*2 LOGIN_COUNTER,
    (LEVEL-1)*2+2 LOGOUT_COUNTER
  FROM
    DUAL
  CONNECT BY
    LEVEL<=12) C
WHERE
  (C.LOGIN_COUNTER BETWEEN T.LOGIN_HOUR AND T.LOGOUT_HOUR
    AND C.LOGOUT_COUNTER BETWEEN T.LOGIN_HOUR AND T.LOGOUT_HOUR)
  OR T.LOGIN_HOUR BETWEEN C.LOGIN_COUNTER AND C.LOGOUT_COUNTER-1
  OR T.LOGOUT_HOUR BETWEEN C.LOGIN_COUNTER+1 AND C.LOGOUT_COUNTER
GROUP BY
  T.CHECK_DATE,
  TO_CHAR(LOGIN_COUNTER,'99')||'-'||TO_CHAR(LOGOUT_COUNTER,'99')
ORDER BY
  1,
  2;

CHECK_DAT TIME_RA TOTAL_PEOPLE
--------- ------- ------------
10-NOV-07  12- 14            1
10-NOV-07  14- 16            2
10-NOV-07  16- 18            1

The above SQL statement is likely not the only solution to the problem.  Let’s take another look at the problem.  What if there is a need for the time intervals to cross midnight.  We need to make a couple of adjustments.  First, let’s add another row to the table for variety:

INSERT INTO
  T1
VALUES(
  'c',
  TO_DATE('2007-11-10 13:00','YYYY-MM-DD HH24:MI'),
  TO_DATE('2007-11-10 19:00','YYYY-MM-DD HH24:MI'));

We can start with the SELECT statement we used earlier:

SELECT
  LOGIN_TIME,
  TO_NUMBER(TO_CHAR(LOGIN_TIME,'HH24')) LOGIN_HOUR,
  TO_NUMBER(TO_CHAR(LOGOUT_TIME,'HH24')) LOGOUT_HOUR
FROM
  T1;

LOGIN_TIM LOGIN_HOUR LOGOUT_HOUR
--------- ---------- -----------
10-NOV-07         12          16
10-NOV-07         15          18
10-NOV-07         13          19

We will modified the SQL statement above to produce the same output, with a little bit more efficiency:

SELECT
  LOGIN_TIME,
  (LOGIN_TIME-TRUNC(LOGIN_TIME))*24 LOGIN_HOUR,
  (LOGOUT_TIME-TRUNC(LOGOUT_TIME))*24 LOGOUT_HOUR
FROM
  T1;

LOGIN_TIM LOGIN_HOUR LOGOUT_HOUR
--------- ---------- -----------
10-NOV-07         12          16
10-NOV-07         15          18
10-NOV-07         13          19

Now, we need to round the clock in and clock out times to two hour intervals – note that the LOGOUT_HOUR_A value of the last row was rounded up:

SELECT
  LOGIN_TIME,
  (LOGIN_TIME-TRUNC(LOGIN_TIME))*24 LOGIN_HOUR,
  (LOGOUT_TIME-TRUNC(LOGOUT_TIME))*24 LOGOUT_HOUR,
  FLOOR((LOGIN_TIME-TRUNC(LOGIN_TIME))*24/2)*2 LOGIN_HOUR_A,
  CEIL((LOGOUT_TIME-TRUNC(LOGOUT_TIME))*24/2)*2 LOGOUT_HOUR_A
FROM
  T1;

LOGIN_TIM LOGIN_HOUR LOGOUT_HOUR LOGIN_HOUR_A LOGOUT_HOUR_A
--------- ---------- ----------- ------------ -------------
10-NOV-07         12          16           12            16
10-NOV-07         15          18           14            18
10-NOV-07         13          19           12            20

Let’s take the above hours and translate them back into date/time values and determine the number of intervals between the adjusted LOGIN_HOUR_A and LOGOUT_HOUR_A:

SELECT
  LOGIN_TIME,
  LOGOUT_TIME,
  TRUNC(LOGIN_TIME)+(FLOOR((LOGIN_TIME-TRUNC(LOGIN_TIME))*24/2)*2)/24 LOGIN_HOUR_A,
  TRUNC(LOGOUT_TIME)+(CEIL((LOGOUT_TIME-TRUNC(LOGOUT_TIME))*24/2)*2)/24 LOGOUT_HOUR_A,
  ((TRUNC(LOGOUT_TIME)+(CEIL((LOGOUT_TIME-TRUNC(LOGOUT_TIME))*24/2)*2)/24)
    - (TRUNC(LOGIN_TIME)+(FLOOR((LOGIN_TIME-TRUNC(LOGIN_TIME))*24/2)*2)/24))*12 H
FROM
  T1;

LOGIN_TIME           LOGOUT_TIME          LOGIN_HOUR_A         LOGOUT_HOUR_A        H
-------------------- -------------------- -------------------- -------------------- -
10-NOV-2007 12:00:00 10-NOV-2007 16:00:00 10-NOV-2007 12:00:00 10-NOV-2007 16:00:00 2
10-NOV-2007 15:00:00 10-NOV-2007 18:00:00 10-NOV-2007 14:00:00 10-NOV-2007 18:00:00 2
10-NOV-2007 13:00:00 10-NOV-2007 19:00:00 10-NOV-2007 12:00:00 10-NOV-2007 20:00:00 4

We then combine the above with a simple counter that counts from 1 up to 12, only joining those rows from the counter that are less than or equal to the calculated number of intervals.  By adding the number of hours determined by the counter to the adjusted LOGIN_HOUR_A, values we obtain the time intervals:

SELECT
  T.LOGIN_TIME,
  T.LOGOUT_TIME,
  T.LOGIN_HOUR_A+(C.COUNTER*2-2)/24 TIME_START,
  T.LOGIN_HOUR_A+(C.COUNTER*2)/24 TIME_END
FROM
  (SELECT
    LOGIN_TIME,
    LOGOUT_TIME,
    TRUNC(LOGIN_TIME)+(FLOOR((LOGIN_TIME-TRUNC(LOGIN_TIME))*24/2)*2)/24 LOGIN_HOUR_A,
    TRUNC(LOGOUT_TIME)+(CEIL((LOGOUT_TIME-TRUNC(LOGOUT_TIME))*24/2)*2)/24 LOGOUT_HOUR_A,
    ((TRUNC(LOGOUT_TIME)+(CEIL((LOGOUT_TIME-TRUNC(LOGOUT_TIME))*24/2)*2)/24)
     - (TRUNC(LOGIN_TIME)+(FLOOR((LOGIN_TIME-TRUNC(LOGIN_TIME))*24/2)*2)/24))*12 H
  FROM
    T1) T,
  (SELECT
    LEVEL COUNTER
  FROM
    DUAL
  CONNECT BY
    LEVEL<=12) C
WHERE
  T.H>=C.COUNTER;

LOGIN_TIME           LOGOUT_TIME          TIME_START           TIME_END
==================== ==================== ==================== ====================
10-NOV-2007 12:00:00 10-NOV-2007 16:00:00 10-NOV-2007 12:00:00 10-NOV-2007 14:00:00
10-NOV-2007 15:00:00 10-NOV-2007 18:00:00 10-NOV-2007 14:00:00 10-NOV-2007 16:00:00
10-NOV-2007 13:00:00 10-NOV-2007 19:00:00 10-NOV-2007 12:00:00 10-NOV-2007 14:00:00
10-NOV-2007 12:00:00 10-NOV-2007 16:00:00 10-NOV-2007 14:00:00 10-NOV-2007 16:00:00
10-NOV-2007 15:00:00 10-NOV-2007 18:00:00 10-NOV-2007 16:00:00 10-NOV-2007 18:00:00
10-NOV-2007 13:00:00 10-NOV-2007 19:00:00 10-NOV-2007 14:00:00 10-NOV-2007 16:00:00
10-NOV-2007 13:00:00 10-NOV-2007 19:00:00 10-NOV-2007 16:00:00 10-NOV-2007 18:00:00
10-NOV-2007 13:00:00 10-NOV-2007 19:00:00 10-NOV-2007 18:00:00 10-NOV-2007 20:00:00

The final step is to perform a group by:

SELECT
  CHECK_DATE,
  TO_CHAR(TIME_START,'HH24')||'-'||TO_CHAR(TIME_END,'HH24') TIME_RANGE,
  COUNT(*) TOTAL_PEOPLE
FROM
(SELECT
  TRUNC(T.LOGIN_HOUR_A+(C.COUNTER*2-2)/24) CHECK_DATE,
  T.LOGIN_HOUR_A+(C.COUNTER*2-2)/24 TIME_START,
  T.LOGIN_HOUR_A+(C.COUNTER*2)/24 TIME_END
FROM
  (SELECT
    LOGIN_TIME,
    LOGOUT_TIME,
    TRUNC(LOGIN_TIME)+(FLOOR((LOGIN_TIME-TRUNC(LOGIN_TIME))*24/2)*2)/24 LOGIN_HOUR_A,
    TRUNC(LOGOUT_TIME)+(CEIL((LOGOUT_TIME-TRUNC(LOGOUT_TIME))*24/2)*2)/24 LOGOUT_HOUR_A,
    ((TRUNC(LOGOUT_TIME)+(CEIL((LOGOUT_TIME-TRUNC(LOGOUT_TIME))*24/2)*2)/24)
     - (TRUNC(LOGIN_TIME)+(FLOOR((LOGIN_TIME-TRUNC(LOGIN_TIME))*24/2)*2)/24))*12 H
  FROM
    T1) T,
  (SELECT
    LEVEL COUNTER
  FROM
    DUAL
  CONNECT BY
    LEVEL<=12) C
WHERE
  T.H>=C.COUNTER)
GROUP BY
  CHECK_DATE,
  TO_CHAR(TIME_START,'HH24')||'-'||TO_CHAR(TIME_END,'HH24')
ORDER BY
  1,
  2;

CHECK_DAT TIME_ TOTAL_PEOPLE
--------- ----- ------------
10-NOV-07 12-14            2
10-NOV-07 14-16            3
10-NOV-07 16-18            2
10-NOV-07 18-20            1

Now that we know that the above approach provides the desired results with the existing data, let’s add a row to the table where the values cross midnight:

INSERT INTO
  T1
VALUES(
  'c',
  TO_DATE('2007-11-10 19:00','YYYY-MM-DD HH24:MI'),
  TO_DATE('2007-11-11 04:00','YYYY-MM-DD HH24:MI'));

The results of our final SQL statement look like this following:

CHECK_DAT TIME_ TOTAL_PEOPLE
--------- ----- ------------
10-NOV-07 12-14            2
10-NOV-07 14-16            3
10-NOV-07 16-18            2
10-NOV-07 18-20            2
10-NOV-07 20-22            1
10-NOV-07 22-00            1
11-NOV-07 00-02            1
11-NOV-07 02-04            1

I don’t think that anyone mentioned it in the Usenet thread, but Ed Prochak’s suggestion was correct.  It does not matter much if someone is trying to solve an algebra problem (or even manually attempting to solve a long division problem), a performance tuning problem, or a SQL related problem – what is required is a logical, step-by-step approach to tackling the problem, with each step moving closer to the end result.








Follow

Get every new post delivered to your Inbox.

Join 139 other followers