SQL – Filling in Gaps in the Source Data

7 12 2009

December 7, 2009

Some time ago the following question was asked in the comp.databases.oracle.server Usenet group: http://groups.google.com/group/comp.databases.oracle.server/browse_thread/thread/2244cd033b534dae

Say we have a table with date/time related data

create table hits
(date_time              date,
 ip_address             varchar2(240));

And our data look like this:

date_time               ip_address
---------       ----------
31-oct-2008 13:08       192.168.0.1
31-oct-2008 13:08       192.168.0.1
31-oct-2008 13:11       192.168.0.1
31-oct-2008 13:12       192.168.0.1
31-oct-2008 13:15       192.168.0.1
31-oct-2008 13:15       192.168.0.1

I want to produce a report that shows the number of hits per minute but
include the "gaps" where there is no data recorded.
So I would like to see:

date_time               hits
---------       ----
31-oct-2008 13:08       2
31-oct-2008 13:09       0
31-oct-2008 13:10       0
31-oct-2008 13:11       1
31-oct-2008 13:12       1
31-oct-2008 13:13       0
31-oct-2008 13:14       0
31-oct-2008 13:15       2

Is it possible without outer-joining to another table with "time" data in it?

I was previously pointed to http://download.oracle.com/docs/cd/B19306_01/server.102/b14223/analysis.htm which relates to data densification - but this method involves getting the "blanks" from another table.

You need a way to generate rows with the missing time elements, for instance by creating an inline view with a simple counter:

SELECT
  LEVEL COUNTER
FROM
  DUAL
CONNECT BY
  LEVEL<=1000;

 

   COUNTER
==========
         1
         2
         3
         4
         5
         6
         7
         8
         9
        10
...

With a slight modification to the above:

SELECT
  DT.DATE_TIME
FROM
  (SELECT
    TO_DATE('31-OCT-2008','DD-MON-YYYY') + (LEVEL/1440) DATE_TIME
  FROM
    DUAL
  CONNECT BY
    LEVEL<=1000) DT;

 

====================
31-OCT-2008 00:01:00
31-OCT-2008 00:02:00
31-OCT-2008 00:03:00
31-OCT-2008 00:04:00
31-OCT-2008 00:05:00
31-OCT-2008 00:06:00
31-OCT-2008 00:07:00
31-OCT-2008 00:08:00
31-OCT-2008 00:09:00
31-OCT-2008 00:10:00

Once the rows with the missing time elements are available, you could outer join your table to this inline view, something like this:

SELECT
  DT.DATE_TIME,
  NVL(COUNT(*),0) HIT_COUNT
FROM
  (SELECT
    TO_DATE('31-OCT-2008','DD-MON-YYYY') + (LEVEL/1440) DATE_TIME
  FROM
    DUAL
  CONNECT BY
    LEVEL<=1000) DT,
  HITS
WHERE
  DT.DATE_TIME=HITS.DATE_TIME(+)
GROUP BY
  DT.DATE_TIME;

There are of course more than 1000 minutes in a day (right around 1440), so some adjustment will be necessary.


Actions

Information

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s




Follow

Get every new post delivered to your Inbox.

Join 139 other followers

%d bloggers like this: