SQL – Filling in Gaps in the Source Data

7 12 2009

December 7, 2009

Some time ago the following question was asked in the comp.databases.oracle.server Usenet group: http://groups.google.com/group/comp.databases.oracle.server/browse_thread/thread/2244cd033b534dae

Say we have a table with date/time related data

create table hits
(date_time              date,
 ip_address             varchar2(240));

And our data look like this:

date_time               ip_address
---------       ----------
31-oct-2008 13:08       192.168.0.1
31-oct-2008 13:08       192.168.0.1
31-oct-2008 13:11       192.168.0.1
31-oct-2008 13:12       192.168.0.1
31-oct-2008 13:15       192.168.0.1
31-oct-2008 13:15       192.168.0.1

I want to produce a report that shows the number of hits per minute but
include the “gaps” where there is no data recorded.
So I would like to see:

date_time               hits
---------       ----
31-oct-2008 13:08       2
31-oct-2008 13:09       0
31-oct-2008 13:10       0
31-oct-2008 13:11       1
31-oct-2008 13:12       1
31-oct-2008 13:13       0
31-oct-2008 13:14       0
31-oct-2008 13:15       2

Is it possible without outer-joining to another table with “time” data in it?

I was previously pointed to http://download.oracle.com/docs/cd/B19306_01/server.102/b14223/analysis.htm which relates to data densification – but this method involves getting the “blanks” from another table.

You need a way to generate rows with the missing time elements, for instance by creating an inline view with a simple counter:

SELECT
  LEVEL COUNTER
FROM
  DUAL
CONNECT BY
  LEVEL<=1000;

 

   COUNTER
==========
         1
         2
         3
         4
         5
         6
         7
         8
         9
        10
...

With a slight modification to the above:

SELECT
  DT.DATE_TIME
FROM
  (SELECT
    TO_DATE('31-OCT-2008','DD-MON-YYYY') + (LEVEL/1440) DATE_TIME
  FROM
    DUAL
  CONNECT BY
    LEVEL<=1000) DT;

 

====================
31-OCT-2008 00:01:00
31-OCT-2008 00:02:00
31-OCT-2008 00:03:00
31-OCT-2008 00:04:00
31-OCT-2008 00:05:00
31-OCT-2008 00:06:00
31-OCT-2008 00:07:00
31-OCT-2008 00:08:00
31-OCT-2008 00:09:00
31-OCT-2008 00:10:00

Once the rows with the missing time elements are available, you could outer join your table to this inline view, something like this:

SELECT
  DT.DATE_TIME,
  NVL(COUNT(*),0) HIT_COUNT
FROM
  (SELECT
    TO_DATE('31-OCT-2008','DD-MON-YYYY') + (LEVEL/1440) DATE_TIME
  FROM
    DUAL
  CONNECT BY
    LEVEL<=1000) DT,
  HITS
WHERE
  DT.DATE_TIME=HITS.DATE_TIME(+)
GROUP BY
  DT.DATE_TIME;

There are of course more than 1000 minutes in a day (right around 1440), so some adjustment will be necessary.


Actions

Information

One response

8 08 2016
Victor Herreros

Hello,

Great post, but just a few remarks regarding the final query.

In the SELECT section:
1. At least under Oracle 12, though DATE types store up to seconds, for showing it you have to convert them to CHAR type. If not, you’ll only get year, month & day .
2. The NVL function has to consider only the data from HINTS table. If it takes into account all records (*), i.e., all records from the inline DT table, every record will get 1, irrespectively of the existence or not of records from HINTS.

In the inline (DT) table:
3. It’s better, as you suggested, to have values up to 1440 for level, in order to cover every possible minute from the day 31-OCT-2008.

Final comment:
4. For ease of visualization, you’d better ORDER the records by DT.DATE_TIME.

So the whole query would become:

SELECT
TO_CHAR(DT.DATE_TIME, ‘DD-MM-YYYY HH24:MI’),
NVL(COUNT(HITS.DATE_TIME),0) HIT_COUNT
FROM
(SELECT
TO_DATE(’31-OCT-2008′,’DD-MON-YYYY’) + (LEVEL/1440) DATE_TIME
FROM
DUAL
CONNECT BY
LEVEL<=1440) DT,
HITS
WHERE
DT.DATE_TIME = HITS.DATE_TIME(+)
GROUP BY
DT.DATE_TIME
ORDER BY
DT.DATE_TIME;

Thanks for your great explanations!!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s




Follow

Get every new post delivered to your Inbox.

Join 179 other followers

%d bloggers like this: