SQL – Recursive Summing of Related Entities

5 12 2009

December 5, 2009

A recent thread in the comp.databases.oracle.server Usenet group asked the following:
http://groups.google.com/group/comp.databases.oracle.server/browse_thread/thread/fdce5b3d641c5f1a

Table dir_size stores the mbytes of storage used in a given directory. Table directories stores various directory names which may or may not exist in table dir_size.
For every directory in table directories, report the cumulative storage in that directory and all its subdirectories. This solution uses a cartesian join.  I imagine it will not scale well.

Thanks in advance!

create table dir_size (
  dir_name     varchar2(40),
  mbytes       number
  );

create table directories (
  dir_name    varchar2(40)
  );

insert into dir_size values ('c:\aaa\bbb\ccc\ddd', 100);
insert into dir_size values ('c:\aaa\bbb\ccc', 100);
insert into dir_size values ('c:\aaa\bbb', 100);
insert into dir_size values ('c:\aaa', 100);
insert into dir_size values ('c:\', 100);
insert into directories values ('c:\aaa\bbb\ccc\ddd');
insert into directories values ('c:\aaa\bbb\ccc');
insert into directories values ('c:\aaa\bbb');
insert into directories values ('c:\aaa');
insert into directories values ('c:\');
insert into directories values ('c:\xxx\yyy\zzz');
commit;

select dir_name, sum(mbytes) from (
  select directories.dir_name,
    instr(dir_size.dir_name, directories.dir_name) INSTR,
    mbytes
  from directories, dir_size
)
where INSTR = 1
group by dir_name
order by 1;

DIR_NAME                                 SUM(MBYTES)
---------------------------------------- -----------
c:\                                              500
c:\aaa                                           400
c:\aaa\bbb                                       300
c:\aaa\bbb\ccc                                   200
c:\aaa\bbb\ccc\ddd                               100

This appears to be a hard problem.  To avoid headaches, make certain that each of the DIR_NAMES ends with “\”

Let’s start here:

SELECT
  'c:\aaa\bbb\ccc\ddd\' DIR_NAME,
  100 MBYTES
FROM
  DUAL;

DIR_NAME                 MBYTES
-------------------- ----------
c:\aaa\bbb\ccc\ddd\         100

In your example, you would like to put 100MB into the following directories based on the above:

c:\
c:\aaa\
c:\aaa\bbb\
c:\aaa\bbb\ccc\
c:\aaa\bbb\ccc\ddd\

You somehow need to be able to break that one row into 5 rows.  The following might help

SELECT
  LEVEL L
FROM
  DUAL
CONNECT BY
  LEVEL<=20;

  L
---
  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20

If we join those two row sources together we might be able to create 5 rows from the one row:

SELECT
  SUBSTR(DIR_NAME,1,INSTR(DIR_NAME,'\',1,L)) DIR_NAME2,
  MBYTES
FROM
  (SELECT
    'c:\aaa\bbb\ccc\ddd\' DIR_NAME,
    100 MBYTES
  FROM
    DUAL) DIR_SIZE,
  (SELECT
    LEVEL L
  FROM
    DUAL
  CONNECT BY
    LEVEL<=20) C
WHERE
  SUBSTR(DIR_NAME,1,INSTR(DIR_NAME,'\',1,L)) IS NOT NULL;

DIR_NAME2                MBYTES
-------------------- ----------
c:\                         100
c:\aaa\                     100
c:\aaa\bbb\                 100
c:\aaa\bbb\ccc\             100
c:\aaa\bbb\ccc\ddd\         100

Now, if we performed the same process for all of the rows in the DIR_SIZE table, grouping on DIR_NAME2, we might be able to find the SUM of the MBYTES column.

(Note that I did not provide an exact/final answer to the original poster – my post was intended to push the OP in the right direction of a solution.)

The OP followed up with this comment:

Thanks for the suggestion.  I suspect the best way will involve some kind of recursive processing.  The tricky bit is the matching of the rows in the directories table to the rows in the dir_size table.  We need to do a “like” (which we can’t, of course) which is why I thought of the instr.

 The LIKE keyword is not necessary.

Notice how closely the output of the following SQL statement:

SELECT
  'c:\aaa\bbb\ccc\ddd\' DIR_NAME,
  100 MBYTES
FROM
  DUAL;

Matches the row created by one of your insert statements:

insert into dir_size values ('c:\aaa\bbb\ccc\ddd', 100);

You might try replacing in the above examples:

SELECT
  'c:\aaa\bbb\ccc\ddd\' DIR_NAME,
  100 MBYTES
FROM
  DUAL;

With a SQL statement that selects all of the rows from your DIR_SIZE table – the results might surprise you IF each of the DIR_NAME values end with a “\”.
You really need more variety in the insert statements to see what is happening, for example:

insert into dir_size values ('c:\ddd\', 800);
insert into dir_size values ('c:\ddd\kkk\', 300);

The first of the above SQL statements will increase the calculated SUM in the c:\ directory by 800, and the second insert statement will increase the SUM in both of the c:\ and c:\ddd\ directories by 300 if you modify my original example to use the DIR_SIZE table rather than the DUAL table.
The final part that I did not provide to the OP is below:

TRUNCATE TABLE DIR_SIZE;

insert into dir_size values ('c:\aaa\bbb\ccc\ddd\', 100);
insert into dir_size values ('c:\aaa\bbb\ccc\', 100);
insert into dir_size values ('c:\aaa\bbb\', 100);
insert into dir_size values ('c:\aaa\', 100);
insert into dir_size values ('c:\', 100);
insert into dir_size values ('c:\ddd\', 800);
insert into dir_size values ('c:\ddd\kkk\', 300);

Working with the hints provided and the final SQL statement in my post, we start with the following:

SELECT
  SUBSTR(DIR_NAME,1,INSTR(DIR_NAME,'\',1,L)) DIR_NAME2,
  MBYTES
FROM
  (SELECT
    *
  FROM
    DIR_SIZE) DIR_SIZE,
  (SELECT
    LEVEL L
  FROM
    DUAL
  CONNECT BY
    LEVEL<=20) C
WHERE
  SUBSTR(DIR_NAME,1,INSTR(DIR_NAME,'\',1,L)) IS NOT NULL;

DIR_NAME2                MBYTES
-------------------- ----------
c:\                         100
c:\                         100
c:\                         100
c:\                         100
c:\                         100
c:\                         800
c:\                         300
c:\aaa\                     100
c:\aaa\                     100
c:\aaa\                     100
c:\aaa\                     100
c:\ddd\                     800
c:\ddd\                     300
c:\aaa\bbb\                 100
c:\aaa\bbb\                 100
c:\aaa\bbb\                 100
c:\ddd\kkk\                 300
c:\aaa\bbb\ccc\             100
c:\aaa\bbb\ccc\             100
c:\aaa\bbb\ccc\ddd\         100

 

SELECT
  SUBSTR(DIR_NAME,1,INSTR(DIR_NAME,'\',1,L)) DIR_NAME2,
  SUM(MBYTES) MBYTES
FROM
  DIR_SIZE,
  (SELECT
    LEVEL L
  FROM
    DUAL
  CONNECT BY
    LEVEL<=20) C
WHERE
  SUBSTR(DIR_NAME,1,INSTR(DIR_NAME,'\',1,L)) IS NOT NULL
GROUP BY
  SUBSTR(DIR_NAME,1,INSTR(DIR_NAME,'\',1,L))
ORDER BY
  SUBSTR(DIR_NAME,1,INSTR(DIR_NAME,'\',1,L));

DIR_NAME2                MBYTES
-------------------- ----------
c:\                        1600
c:\aaa\                     400
c:\aaa\bbb\                 300
c:\aaa\bbb\ccc\             200
c:\aaa\bbb\ccc\ddd\         100
c:\ddd\                    1100
c:\ddd\kkk\                 300

Actions

Information

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s




Follow

Get every new post delivered to your Inbox.

Join 141 other followers

%d bloggers like this: