SQL – Reformatting to Improve Performance 4

4 12 2009

December 4, 2009

(Back to the Previous Post in the Series) (Forward to the Next Post in the Series)

Note: the Oracle cost-based query optimizer is able to automatically transform queries submitted by applications to produce an equivalent SQL statement that will hopefully improve SQL execution performance. The possible transformations are limited by the Oracle release version, the value of the OPTIMIZER_FEATURES_ENABLE parameter (and various other hidden parameters), the potential presence of NULL values, and hints embedded in the SQL statement.

Some time ago the following question appeared in a forum:
http://groups.google.com/group/comp.databases.oracle.server/browse_thread/thread/25ee58d1a12e1fd1

(The thread also contains an interesting discussion about the effects of over-indexing tables.)

I have the following two select statements in some code I’ve been looking at.  I don’t have access to a computer with Oracle on it currently so I can’t determine this myself.

Both selects do the same thing.  The function in the second version simply checks the condition what is in the first and send back a ‘Y’ or ‘N’ whereas the first has the check in the where clause itself.

My question is this:  is any one of the two selects more efficient (or less efficient) than the other?

SELECT a.mbr_dob,
       a.mbr_first_nm,
       a.mbr_gender,
       b.mbr_key,
       a.mbr_last_nm,
       b.mbr_sys_key,
       b.mbr_updated_dt,
       a.mbr_x_fst_nm,
       a.mbr_x_lst_nm,
       a.person_id,
       a.z_person_id
  FROM mbr_person a,
       mbr_system b
 WHERE a.person_id = b.person_id
   AND a.deactivate = 0
   AND b.deactivate = 0
   AND (
             a.mbr_last_nm NOT IN ( 'DATA','CORRECTION' )
         AND a.mbr_first_nm NOT IN ( 'DATA','CORRECTION' )
        );

 

SELECT a.mbr_dob,
       a.mbr_first_nm,
       a.mbr_gender,
       b.mbr_key,
       a.mbr_last_nm,
       b.mbr_sys_key,
       b.mbr_updated_dt,
       a.mbr_x_fst_nm,
       a.mbr_x_lst_nm,
       a.person_id,
       a.z_person_id
  FROM mbr_person a,
       mbr_system b
 WHERE a.person_id = b.person_id
   AND a.deactivate = 0
   AND b.deactivate = 0
   AND not_data_correction( a.mbr_last_nm, a.mbr_first_nm ) = 'Y';

——————————–

I would caution against creating too many indexes, as this will likely negatively impact performance of other parts of the system while offering minimal assistance to this SQL statement.  Histograms could be important on the columns MBR_PERSON.DEACTIVATE, MBR_SYSTEM.DEACTIVATE, MBR_PERSON.MBR_LAST_NM, and MBR_PERSON.MBR_FIRST_NM.  The histograms will give Oracle’s cost based optimizer a better picture of the contents of the columns, rather than assuming an even spread of the data values between the min and max for the column.

Keep in mind that Oracle may transform your SQL statement into another equivalent form, and may use constraints and transitive closure to generate additional predicates (think entries in the WHERE) clause for the SQL statement.  You may want to perform timing with alternate SQL syntax:

SELECT a.mbr_dob,
       a.mbr_first_nm,
       a.mbr_gender,
       b.mbr_key,
       a.mbr_last_nm,
       b.mbr_sys_key,
       b.mbr_updated_dt,
       a.mbr_x_fst_nm,
       a.mbr_x_lst_nm,
       a.person_id,
       a.z_person_id
  FROM mbr_person a,
       mbr_system b
 WHERE a.person_id = b.person_id
   AND a.deactivate = 0
   AND b.deactivate = 0
   AND A.DEACTIVATE = B.DEACTIVATE
   AND A.MBR_LAST_NM NOT IN ('DATA','CORRECTION')
   AND A.MBR_FIRST_NM NOT IN ('DATA','CORRECTION');

Indexes on the MBR_PERSON.DEACTIVATE, MBR_SYSTEM.DEACTIVATE columns may be helpful if most of the rows have something other than 0 in those columns.  Indexes on the MBR_PERSON.MBR_LAST_NM and MBR_PERSON.MBR_FIRST_NM columns will likely be of limited use.  An index on MBR_PERSON.PERSON_ID, MBR_SYSTEM.PERSON_ID will likely be very helpful.

My guess is that the above SQL statement will perform a full tablescan on both tables.  The full tablescans may be the most efficient way to retrieve the rows.  However, that will depend greatly on the composition of the data in the two tables.  Oracle may select to perform a hash join between the two tables, so a large HASH_AREA_SIZE may help.

As has been mentioned in this thread, avoid using PL/SQL for something that can be done efficiently in pure SQL.  Context switches (and I suppose on-the-fly compiling of the PL/SQL code if not using Native code compilation) are expensive on some operating system platforms, and may greatly decrease performance.

The best way to test the performance of the various methods requires access to the Oracle database, unless of course you want to dig through 10046 traces at level 8 or 12.

——————————–





SQL – Reformatting to Improve Performance 3

4 12 2009

December 4, 2009

(Back to the Previous Post in the Series) (Forward to the Next Post in the Series)

Note: the Oracle cost-based query optimizer is able to automatically transform queries submitted by applications to produce an equivalent SQL statement that will hopefully improve SQL execution performance. The possible transformations are limited by the Oracle release version, the value of the OPTIMIZER_FEATURES_ENABLE parameter (and various other hidden parameters), the potential presence of NULL values, and hints embedded in the SQL statement.

Some time ago the following question appeared in a forum:
http://groups.google.com/group/comp.databases.oracle.server/browse_thread/thread/8661f136bdaefae4

The query is against a view (MY_VIEW below) in the same schema where the procedure lives.  The view selects from another view (EXT_VIEW below) in an external DB via a DBLINK.

Both servers are running Oracle 10g.  The host server is Red Hat Linux.  The external server is Win 2003.  If exact versions become important, I can post them.

The query that works, but slowly, is:

SELECT bv.* FROM MY_VIEW bv
WHERE bv.objectid in
(select linkid from LINKS_TABLE
  where projectid = 116448
    AND blocklist = 'N')
AND bv.GISDATASETNAME = 'XXX' ORDER BY bv.physical_length DESC

However, this takes 12 seconds to return exactly one row from the external database.  Here’s the explain plan:

Operation          Object Name
SELECT STATEMENT Optimizer Mode=ALL_ROWS
 SORT ORDER BY
  NESTED LOOPS SEMI
   REMOTE    .EXT_VIEW   EXTDB.US.OPVANTEK.COM
   TABLE ACCESS BY INDEX ROWID  MY_SCHEMA.LINKS_TABLE
    INDEX UNIQUE SCAN           OPDSCP.PK_PGM 1 0

——————————–

Oracle appears to be using the results of the remote data to drive into the MY_SCHEMA.LINKS_TABLE table using an index on the LINKS_TABLE table.  Thus, Oracle is retrieving all rows from the remote database where GISDATASETNAME = ‘XXX’ and probing the MY_SCHEMA.LINKS_TABLE for matching LINKID values.  You might repeat your explain plan using DBMS_XPLAN with the ALLSTATS LAST parameters to see how it compares.

Is it possible to rewrite the SQL statement into a more efficient form?

First question – can any one tell me a way to force the first query above to use the faster explain plan?  We know there will not be very many linkids returned by the inner query (less than 10).

Having given up on re-writing the first query, I then wrote some PL/SQL to loop over the inner query with a cursor and construct a comma separate list of all the linkids, which I can then use as the IN() clause of my dyanamic sql statement.  That works and runs much faster from a SQL Editor window (we use TOAD).

Let’s take another look at rewriting the original query, something like this:

SELECT /*+ ORDERED */
  BV.*
FROM
  (SELECT DISTINCT
    LINKID
  FROM
    LINKS_TABLE
  WHERE
    PROJECTID = 116448
    AND BLOCKLIST = 'N') L,
  MY_VIEW BV
WHERE
  L.LINKID=BV.OBJECTID
  AND BV.GISDATASETNAME = 'XXX'
ORDER BY
  BV.PHYSICAL_LENGTH DESC;

The above will instruct Oracle to retrieve the 10 or fewer rows from the LINK_TABLE, and use the results to drive into MY_VIEW, which points to the remote database.

How does a DBMS_XPLAN with the ALLSTATS LAST parameters for your original query compare with my rewrite above?

——————————–





SQL – Reformatting to Improve Performance 2

4 12 2009

December 4, 2009

(Back to the Previous Post in the Series) (Forward to the Next Post in the Series)

Note: the Oracle cost-based query optimizer is able to automatically transform queries submitted by applications to produce an equivalent SQL statement that will hopefully improve SQL execution performance. The possible transformations are limited by the Oracle release version, the value of the OPTIMIZER_FEATURES_ENABLE parameter (and various other hidden parameters), the potential presence of NULL values, and hints embedded in the SQL statement.

Some time ago the following question appeared in a forum:
http://groups.google.com/group/comp.databases.oracle.server/browse_thread/thread/d4e3f3f275ed894d

The following query is performing poorly in the app. I have tried the scalar subquery approach (using formatted to_char)  with no luck. any help would be appreciated.

SELECT  MP_NO, MP_DESC, MP_ACTIVE AS "FLAG_YES_NO",
(
  ( SELECT COUNT(*)     FROM ST, STINV
                WHERE ST.ST_ID = STINV.ST_ID
                AND ST.ST_POSTED IS NOT NULL
                AND STINV.MP_NO = MPT.MP_NO )
   +
  ( SELECT COUNT(*) FROM PT, PTINV
                  WHERE PT.PT_ID = PTINV.PT_ID
                  AND PT.PT_POSTED IS NOT NULL
                  AND PTINV.MP_NO = MPT.MP_NO  )
)
FROM MPT
WHERE COMP_ID = 1
GROUP BY MP_NO, MP_DESC, MP_ACTIVE

——————————–

It might be interesting to see how the performance and DBMS_XPLAN of your SQL statement compares with the following:

SELECT
  MPT.MP_NO,
  MPT.MP_DESC,
  MPT.MP_ACTIVE "FLAG_YES_NO",
  NVL(S.V1,0) + NVL(P.V2,0)
FROM
  MPT,
  (SELECT
    STINV.MP_NO,
    COUNT(*) V1
  FROM
    ST,
    STINV
  WHERE
    ST.ST_ID = STINV.ST_ID
    AND ST.ST_POSTED IS NOT NULL
  GROUP BY
    STINV.MP_NO) S,
  (SELECT
    PTINV.MP_NO,
    COUNT(*) V2
  FROM
    PT,
    PTINV
  WHERE
    PT.PT_ID = PTINV.PT_ID
    AND PT.PT_POSTED IS NOT NULL
  GROUP BY
    PTINV.MP_NO) P
WHERE
  MPT.COMP_ID=1
  AND MPT.MP_NO=S.MP_NO(+)
  AND MPT.MP_NO=P.MP_NO(+)
GROUP BY
  MPT.MP_NO,
  MPT.MP_DESC,
  MPT.MP_ACTIVE;

If possible, the outer join [ (+) ] should be removed from the SQL statement that I posted above.

Is MPT.COMP_ID a column defined as NUMBER?

——————————–





SQL – Reformatting to Improve Performance 1

4 12 2009

December 4, 2009

(Forward to the Next Post in the Series)

Note: the Oracle cost-based query optimizer is able to automatically transform queries submitted by applications to produce an equivalent SQL statement that will hopefully improve SQL execution performance. The possible transformations are limited by the Oracle release version, the value of the OPTIMIZER_FEATURES_ENABLE parameter (and various other hidden parameters), the potential presence of NULL values, and hints embedded in the SQL statement.

Some time ago the following question appeared in a forum:
http://groups.google.com/group/oracle_dba_experts/browse_thread/thread/9b81c1b5f729813c

There are three tables A B C with 18 Million, 6 M and 8M records respectivly.

select count(1) from
( Select * from A
where
not exists ( select 'x' from B where join on Keys )
and not exists ( select 'x' from C where join on keys )
)

The explain plan shows that it is using all key indexes for all tables and cost is 7366 for table A Index full scan. And other tables have a index Range scan 3 and 4. What could be a reason for this discrepency.

We are using oracle 9i , Linux operating system. For confidentiality I have renamed the tables

Query:

select /*+ parallel */
*
from A    where not exists
    (select /*+ parallel */ 'X'
             from b
            where stud_id = A.stud_id
              and cpnt_typ_id = A.cpnt_typ_id
              and cpnt_id = A.cpnt_id
              and rev_dte = A.rev_dte
              and cmpl_stat_id = A.cmpl_stat_id
              and compl_dte = a.compl_dte)
      and not exists
    (select /*+ parallel */ 'X'
             from c
            where stud_id = A.stud_id
              and cpnt_typ_id =A.cpnt_typ_id
              and cpnt_id = A.cpnt_id
              and rev_dte = A.rev_dte
              and cmpl_stat_id = A.cmpl_stat_id
              and compl_dte = A.compl_dte)

Plan :

SELECT STATEMENT, GOAL = CHOOSE           Cost=7400  Cardinality=    1   Bytes=     62   IO cost=7400
 SORT AGGREGATE                                      Cardinality=    1   Bytes=     62
  FILTER
   INDEX FAST FULL SCAN         PK_A      Cost=7393  Cardinality=46754   Bytes=2898748   IO cost=7393
   INDEX RANGE SCAN             IDX_B_1   Cost=   3  Cardinality=    1   Bytes=     58   IO cost=   3
   TABLE ACCESS BY INDEX ROWID  PA_C      Cost=   4  Cardinality=    1   Bytes=     60   IO cost=   4
    INDEX RANGE SCAN    Object  IX_C      Cost=   3  Cardinality=    1                   IO cost=   3

With the information provided, the best that I was able to offer is the following, with a manually transformed version of the SQL statement from the NOT EXISTS syntax into inline views:

——————————–

The Oracle version in part determines how Oracle will optimize and execute the SQL statement that you submit.  Certain transformations are performed automatically in 9i for NOT EXISTS queries, regardless of whether or not the cost will increase due to the transformation.  The cardinality numbers reported in the explain plan (a DBMS Xplan would be better) do not appear to be consistent with three tables having 18 million, 6 million and 8 million rows.  When collecting statistics, make certain that the DBMS_STATS routine is called with CASCADE=>TRUE specified.

Jonathan Lewis has commented on his blog a couple times about using PARALLEL, which only resulted in even longer execution times.  You might try reformatting the SQL statement into an equivalent form, and removing the PARALLEL hints.  Something like this:

SELECT
  A.*
FROM
  A,
  (SELECT DISTINCT
    STUD_ID,
    CPNT_TYP_ID,
    CPNT_ID,
    REV_DTE,
    CMPL_STAT_ID,
    COMPL_DTE
  FROM
    B) B,
  (SELECT DISTINCT
    STUD_ID,
    CPNT_TYP_ID,
    CPNT_ID,
    REV_DTE,
    CMPL_STAT_ID,
    COMPL_DTE
  FROM
    C) C
WHERE
  A.STUD_ID = B.STUD_ID(+)
  AND A.CPNT_TYP_ID = B.CPNT_TYP_ID(+)
  AND A.CPNT_ID = B.CPNT_ID(+)
  AND A.REV_DTE = B.REV_DTE(+)
  AND A.CMPL_STAT_ID = B.CMPL_STAT_ID(+)
  AND A.COMPL_DTE = B.COMPL_DTE(+)
  AND B.STUD_ID IS NULL
  AND A.STUD_ID = C.STUD_ID(+)
  AND A.CPNT_TYP_ID = C.CPNT_TYP_ID(+)
  AND A.CPNT_ID = C.CPNT_ID(+)
  AND A.REV_DTE = C.REV_DTE(+)
  AND A.CMPL_STAT_ID = C.CMPL_STAT_ID(+)
  AND A.COMPL_DTE = C.COMPL_DTE(+)
  AND C.STUD_ID IS NULL;

Compare the performance of the above with the performance of your original SQL statement using a 10046 trace at level 8.  The above will likely result in 2 sorting operations in 9i, so the SORT_AREA_SIZE parameter may need to be adjusted.

——————————–

Any other ideas?








Follow

Get every new post delivered to your Inbox.

Join 144 other followers