December 4, 2009
Note: the Oracle cost-based query optimizer is able to automatically transform queries submitted by applications to produce an equivalent SQL statement that will hopefully improve SQL execution performance. The possible transformations are limited by the Oracle release version, the value of the OPTIMIZER_FEATURES_ENABLE parameter (and various other hidden parameters), the potential presence of NULL values, and hints embedded in the SQL statement.
Some time ago the following question appeared in a forum:
http://groups.google.com/group/comp.databases.oracle.server/browse_thread/thread/25ee58d1a12e1fd1
(The thread also contains an interesting discussion about the effects of over-indexing tables.)
I have the following two select statements in some code I’ve been looking at. I don’t have access to a computer with Oracle on it currently so I can’t determine this myself.
Both selects do the same thing. The function in the second version simply checks the same condition that appears in the first and sends back a 'Y' or 'N', whereas the first has the check in the WHERE clause itself.
My question is this: is either of the two selects more efficient (or less efficient) than the other?
SELECT
  a.mbr_dob, a.mbr_first_nm, a.mbr_gender, b.mbr_key, a.mbr_last_nm,
  b.mbr_sys_key, b.mbr_updated_dt, a.mbr_x_fst_nm, a.mbr_x_lst_nm,
  a.person_id, a.z_person_id
FROM
  mbr_person a,
  mbr_system b
WHERE
  a.person_id = b.person_id
  AND a.deactivate = 0
  AND b.deactivate = 0
  AND ( a.mbr_last_nm NOT IN ('DATA','CORRECTION')
    AND a.mbr_first_nm NOT IN ('DATA','CORRECTION') );
SELECT
  a.mbr_dob, a.mbr_first_nm, a.mbr_gender, b.mbr_key, a.mbr_last_nm,
  b.mbr_sys_key, b.mbr_updated_dt, a.mbr_x_fst_nm, a.mbr_x_lst_nm,
  a.person_id, a.z_person_id
FROM
  mbr_person a,
  mbr_system b
WHERE
  a.person_id = b.person_id
  AND a.deactivate = 0
  AND b.deactivate = 0
  AND not_data_correction( a.mbr_last_nm, a.mbr_first_nm ) = 'Y';
——————————–
I would caution against creating too many indexes, as doing so will likely hurt the performance of other parts of the system while offering minimal assistance to this SQL statement. Histograms could be important on the columns MBR_PERSON.DEACTIVATE, MBR_SYSTEM.DEACTIVATE, MBR_PERSON.MBR_LAST_NM, and MBR_PERSON.MBR_FIRST_NM. The histograms give Oracle's cost-based optimizer a better picture of the contents of the columns, rather than having it assume an even spread of the data values between the column's minimum and maximum.
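As a sketch, histograms on those columns might be collected with DBMS_STATS, assuming the tables belong to the current schema (the SIZE 254 bucket count and the decision to cascade to the indexes are illustrative choices, not requirements):

BEGIN
  /* Histograms on the skewed columns of MBR_PERSON */
  DBMS_STATS.GATHER_TABLE_STATS(
    OWNNAME    => USER,
    TABNAME    => 'MBR_PERSON',
    METHOD_OPT => 'FOR COLUMNS SIZE 254 DEACTIVATE MBR_LAST_NM MBR_FIRST_NM',
    CASCADE    => TRUE);

  /* Histogram on the DEACTIVATE column of MBR_SYSTEM */
  DBMS_STATS.GATHER_TABLE_STATS(
    OWNNAME    => USER,
    TABNAME    => 'MBR_SYSTEM',
    METHOD_OPT => 'FOR COLUMNS SIZE 254 DEACTIVATE',
    CASCADE    => TRUE);
END;
/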
Keep in mind that Oracle may transform your SQL statement into another equivalent form, and may use constraints and transitive closure to generate additional predicates (think entries in the WHERE clause) for the SQL statement. You may want to perform timing tests with alternate SQL syntax:
SELECT
  a.mbr_dob, a.mbr_first_nm, a.mbr_gender, b.mbr_key, a.mbr_last_nm,
  b.mbr_sys_key, b.mbr_updated_dt, a.mbr_x_fst_nm, a.mbr_x_lst_nm,
  a.person_id, a.z_person_id
FROM
  mbr_person a,
  mbr_system b
WHERE
  a.person_id = b.person_id
  AND a.deactivate = 0
  AND b.deactivate = 0
  AND A.DEACTIVATE = B.DEACTIVATE
  AND A.MBR_LAST_NM NOT IN ('DATA','CORRECTION')
  AND A.MBR_FIRST_NM NOT IN ('DATA','CORRECTION');
Indexes on the MBR_PERSON.DEACTIVATE, MBR_SYSTEM.DEACTIVATE columns may be helpful if most of the rows have something other than 0 in those columns. Indexes on the MBR_PERSON.MBR_LAST_NM and MBR_PERSON.MBR_FIRST_NM columns will likely be of limited use. An index on MBR_PERSON.PERSON_ID, MBR_SYSTEM.PERSON_ID will likely be very helpful.
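If timing tests show that the indexes help, they might be created as follows (the index names are hypothetical, and the DEACTIVATE indexes should only be considered when few rows satisfy DEACTIVATE = 0):

CREATE INDEX ind_mbr_person_person_id ON mbr_person(person_id);
CREATE INDEX ind_mbr_system_person_id ON mbr_system(person_id);

/* Only if most rows have a value other than 0 in DEACTIVATE: */
CREATE INDEX ind_mbr_person_deactivate ON mbr_person(deactivate);
CREATE INDEX ind_mbr_system_deactivate ON mbr_system(deactivate);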
My guess is that the above SQL statement will result in a full tablescan of both tables. The full tablescans may be the most efficient way to retrieve the rows; however, that will depend greatly on the composition of the data in the two tables. Oracle may elect to perform a hash join between the two tables, so a large HASH_AREA_SIZE may help.
As has been mentioned in this thread, avoid using PL/SQL for something that can be done efficiently in pure SQL. Context switches (and, I suppose, on-the-fly compilation of the PL/SQL code when native code compilation is not used) are expensive on some operating system platforms, and may greatly decrease performance.
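The poster did not show the NOT_DATA_CORRECTION function, but based on the first SQL statement it presumably looks something like the following sketch (the body is a guess, not the poster's actual code). Note that the function must be evaluated for every candidate row, forcing a SQL to PL/SQL context switch each time:

CREATE OR REPLACE FUNCTION not_data_correction(
  p_last_nm  IN VARCHAR2,
  p_first_nm IN VARCHAR2)
RETURN VARCHAR2 IS
BEGIN
  /* Return 'Y' when neither name is a data-correction placeholder */
  IF p_last_nm NOT IN ('DATA','CORRECTION')
     AND p_first_nm NOT IN ('DATA','CORRECTION') THEN
    RETURN 'Y';
  END IF;
  RETURN 'N';
END;
/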
The best way to test the performance of the various methods requires access to the Oracle database, unless, of course, you want to dig through 10046 traces at level 8 or 12.
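For reference, a 10046 trace at level 8 (binds excluded, waits included) may be enabled for the current session like this (the TRACEFILE_IDENTIFIER value is arbitrary and simply makes the trace file easier to find):

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'QUERY_TEST';
ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT FOREVER, LEVEL 8';

/* Execute the SQL statement to be examined, then: */
ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT OFF';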
——————————–