Unindexed Foreign Keys – What is Wrong with this Quote?

5 09 2011

September 5, 2011

I often look for easier ways to accomplish time consuming tasks, and I suspect that is part of the reason why I buy so many computer books.  In Recipe 2-5 (page 60) of the “Oracle Database 11g Performance Tuning Recipes” book there is a SQL statement to determine which database columns (owned by the user) have foreign key constraints without indexes.  The same SQL statement is also found on page 243 of the book Oracle SQL Recipes: A Problem-Solution Approach.  What caught my attention is the length of the SQL statement – it is considerably shorter than the SQL statement I had been using, so that should certainly reduce the amount of unnecessary typing (and free up a couple of bytes in the library cache).  The SQL statement as it appears in these two books:

select
  a.constraint_name cons_name
 ,a.table_name tab_name
 ,b.column_name cons_column
 ,nvl(c.column_name,'***No Index***') ind_column
from user_constraints a
     join
     user_cons_columns b on a.constraint_name = b.constraint_name
     left outer join
     user_ind_columns c on b.column_name = c.column_name
                       and b.table_name = c.table_name
where constraint_type = 'R'
order by 2,1;

Using A, B, and C for table aliases… that must be what is wrong with the above SQL statement.

What, if anything, is wrong with the above quote from the book?

The point of blog articles like this one is not to insult authors who have spent thousands of hours carefully constructing an accurate and helpful book, but instead to suggest that readers investigate when something stated does not exactly match what one believes to be true.  It could be that the author “took a long walk down a short pier”, or that the author is revealing accurate information which simply cannot be found through other resources (and may in the process be directly contradicting information sources you have used in the past).  If you do not investigate in such cases, you may lose an important opportunity to learn something that could prove to be extremely valuable.

In case you are wondering, the following is the SQL statement that I had been using to locate unindexed foreign key columns (this SQL statement is based on a SQL statement that was found in the Expert Oracle Database Architecture book).

SELECT
  DECODE(B.TABLE_NAME, NULL, '*Check*', 'OK' ) STATUS,
  A.OWNER,
  A.TABLE_NAME,
  A.COLUMNS,
  B.COLUMNS INDEX_COLUMNS
FROM
  (SELECT
    A.OWNER,
    SUBSTR(A.TABLE_NAME,1,30) TABLE_NAME,
    SUBSTR(A.CONSTRAINT_NAME,1,30) CONSTRAINT_NAME,
    MAX(DECODE(POSITION, 1,     SUBSTR(COLUMN_NAME,1,30),NULL)) ||
    MAX(DECODE(POSITION, 2,', '||SUBSTR(COLUMN_NAME,1,30),NULL)) ||
    MAX(DECODE(POSITION, 3,', '||SUBSTR(COLUMN_NAME,1,30),NULL)) ||
    MAX(DECODE(POSITION, 4,', '||SUBSTR(COLUMN_NAME,1,30),NULL)) ||
    MAX(DECODE(POSITION, 5,', '||SUBSTR(COLUMN_NAME,1,30),NULL)) ||
    MAX(DECODE(POSITION, 6,', '||SUBSTR(COLUMN_NAME,1,30),NULL)) ||
    MAX(DECODE(POSITION, 7,', '||SUBSTR(COLUMN_NAME,1,30),NULL)) ||
    MAX(DECODE(POSITION, 8,', '||SUBSTR(COLUMN_NAME,1,30),NULL)) ||
    MAX(DECODE(POSITION, 9,', '||SUBSTR(COLUMN_NAME,1,30),NULL)) ||
    MAX(DECODE(POSITION,10,', '||SUBSTR(COLUMN_NAME,1,30),NULL)) ||
    MAX(DECODE(POSITION,11,', '||SUBSTR(COLUMN_NAME,1,30),NULL)) ||
    MAX(DECODE(POSITION,12,', '||SUBSTR(COLUMN_NAME,1,30),NULL)) ||
    MAX(DECODE(POSITION,13,', '||SUBSTR(COLUMN_NAME,1,30),NULL)) ||
    MAX(DECODE(POSITION,14,', '||SUBSTR(COLUMN_NAME,1,30),NULL)) ||
    MAX(DECODE(POSITION,15,', '||SUBSTR(COLUMN_NAME,1,30),NULL)) ||
    MAX(DECODE(POSITION,16,', '||SUBSTR(COLUMN_NAME,1,30),NULL)) COLUMNS
  FROM
    DBA_CONS_COLUMNS A,
    DBA_CONSTRAINTS B
  WHERE
    A.CONSTRAINT_NAME = B.CONSTRAINT_NAME
    AND A.OWNER=B.OWNER
    AND B.CONSTRAINT_TYPE = 'R'
  GROUP BY
    A.OWNER,
    SUBSTR(A.TABLE_NAME,1,30),
    SUBSTR(A.CONSTRAINT_NAME,1,30) ) A,
    (SELECT
      TABLE_OWNER,
      SUBSTR(TABLE_NAME,1,30) TABLE_NAME,
      SUBSTR(INDEX_NAME,1,30) INDEX_NAME,
      MAX(DECODE(COLUMN_POSITION, 1,
      SUBSTR(COLUMN_NAME,1,30),NULL)) ||
      MAX(DECODE(COLUMN_POSITION, 2,', '||
      SUBSTR(COLUMN_NAME,1,30),NULL)) ||
      MAX(DECODE(COLUMN_POSITION, 3,', '||
      SUBSTR(COLUMN_NAME,1,30),NULL)) ||
      MAX(DECODE(COLUMN_POSITION, 4,', '||
      SUBSTR(COLUMN_NAME,1,30),NULL)) ||
      MAX(DECODE(COLUMN_POSITION, 5,', '||
      SUBSTR(COLUMN_NAME,1,30),NULL)) ||
      MAX(DECODE(COLUMN_POSITION, 6,', '||
      SUBSTR(COLUMN_NAME,1,30),NULL)) ||
      MAX(DECODE(COLUMN_POSITION, 7,', '||
      SUBSTR(COLUMN_NAME,1,30),NULL)) ||
      MAX(DECODE(COLUMN_POSITION, 8,', '||
      SUBSTR(COLUMN_NAME,1,30),NULL)) ||
      MAX(DECODE(COLUMN_POSITION, 9,', '||
      SUBSTR(COLUMN_NAME,1,30),NULL)) ||
      MAX(DECODE(COLUMN_POSITION,10,', '||
      SUBSTR(COLUMN_NAME,1,30),NULL)) ||
      MAX(DECODE(COLUMN_POSITION,11,', '||
      SUBSTR(COLUMN_NAME,1,30),NULL)) ||
      MAX(DECODE(COLUMN_POSITION,12,', '||
      SUBSTR(COLUMN_NAME,1,30),NULL)) ||
      MAX(DECODE(COLUMN_POSITION,13,', '||
      SUBSTR(COLUMN_NAME,1,30),NULL)) ||
      MAX(DECODE(COLUMN_POSITION,14,', '||
      SUBSTR(COLUMN_NAME,1,30),NULL)) ||
      MAX(DECODE(COLUMN_POSITION,15,', '||
      SUBSTR(COLUMN_NAME,1,30),NULL)) ||
      MAX(DECODE(COLUMN_POSITION,16,', '||
      SUBSTR(COLUMN_NAME,1,30),NULL)) COLUMNS
    FROM
      DBA_IND_COLUMNS
    GROUP BY
      TABLE_OWNER,
      SUBSTR(TABLE_NAME,1,30),
      SUBSTR(INDEX_NAME,1,30)) B
    WHERE
      A.TABLE_NAME = B.TABLE_NAME (+)
      AND A.OWNER=B.TABLE_OWNER(+)
      AND B.COLUMNS (+) LIKE A.COLUMNS || '%'
ORDER BY
  A.OWNER,
  A.TABLE_NAME;

Are you feeling that it might be an ANSI problem with the book’s SQL statement?  I certainly would prefer to use the shorter version of the SQL statement – I have an aversion to unnecessary typing.

While on the subject of creating indexes on foreign key columns, are those indexes created just to reduce the chance of deadlocks, or is it done primarily for SELECT query performance as described in the book (Google Books view)?  think about this question a bit.


Actions

Information

23 responses

5 09 2011
Kim Berg Hansen

A few points where the long SQL is better than the short:

1) The long SQL uses DBA_* rather than USER_*, so you do not have to run it once for each schema.
2) The long SQL can handle constraints with multiple columns.
3) The long SQL tests if the indexed column(s) is at the leading edge of the index.

The long SQL could also have been written with ANSI joins – no difference.
The short SQL is simply an approximate answer that in many cases may be sufficient, the long SQL more accurately pinpoints the unindexed FK’s.

Just my 2 cents 🙂

7 09 2011
Charles Hooper

Kim,

Well stated summary – a great way to start the discussion.

Considering that the DBA_ and USER_ views are typically a bit complex with Oracle specific join syntax, and there have in the past been issues (bugs) with various ANSI joins, especially when combined with Oracle specific join syntax, I wonder if there is any risk in using ANSI style joins with some of the various built-in views? I am thinkingh something along the lines of the following comment, where in this case a SQL statement would not run without a hint being specified:
https://hoopercharles.wordpress.com/2010/12/01/an-invalid-or-do-you-just-not-want-to-work/#comment-2538

5 09 2011
Dom Brooks

The quoted statement checks for the presence of the column in any index, any position.
So when you consider a composite index and the constrained column/s not at the beginning of the index, this might be a problem as it might claim that an FK is indexed when that index cannot be used to resolve the FK issue.

In addition, the structure of the first query is such that if the constrained columns are indexed more than once then you get multiple rows. Not itself really a problem but not exactly the best approach either.

Constraints can go across schemas of course so another slight issue.

From the example chapter, it sounds like the advice was aimed solely for query performance.

But the potential flaw mentioned applies regardless of whether it’s used to identify deadlock threats for simple query performance purposes.

I’ve not read the book or even the whole of the linked sample (yet) but you could also argue that there’s no caveat about the downside/overhead of always indexing foreign keys unnecessarily. Maybe it should warrant the briefest of mentions. The flip side of that is that you think of mentioning one little extra aspect and you risk opening pandora’s box.

On the other hand, the sql statement that you use checks for columns indexed in the same order as the constraint.
Now, whilst this might unnecessarily flag a foreign key constraint in the order of (A,B) indexed as (B,A), it is a better false positive.

In other words, better to flag something unnecessarily as needing checked than give a green light when there’s danger ahead.

Then again, database constraints are so not en vogue that if you’re lucky enough to have some…..
🙂

6 09 2011
Dom Brooks

> whether it’s used to identify deadlock threats for simple query performance purposes
Meant to say “whether it’s used to identify deadlock threats OR for simple query performance purposes”

7 09 2011
Charles Hooper

Dom,

Well stated.

I too noticed the duplicate rows in the output, but had not yet determined if the duplicates were caused by multi-column foreign keys or something else. I still have not spent the time working it out, but you could very well be right that the duplicates are caused by having multiple indexes on the columns. Nice catch regarding cross-schema contraints. If I recall correctly, the book did strongly imply (or state) that ALL foreign key columns should be indexed – I agree with your comment.

Thanks for reminding me about the problem with the SQL statement that I use – changed column order is of course a potential issue with the SQL statement. I suspect that the order of the columns in the index could be changed for better index compression.

The book did stress that constraints are important in a database, so that is a definite positive in the book.

5 09 2011
Mohamed Houri

Dom,

I didn’t clearly understand this
“On the other hand, the sql statement that you use checks for columns indexed in the same order as the constraint.
Now, whilst this might unnecessarily flag a foreign key constraint in the order of (A,B) indexed as (B,A), it is a better false positive”

In order to avoid the deadlock threat that might be introduced by an unindexed FK in an OLTP application, it is not necessary to have an index that match exactly the constraint. We need only to have the FK index starting with the constraint columns in any order. In other words if the FK constraint has been defined as (a,b) its index can be safely defined as (b,a) or (a,b) it doesn’t matter.

If this is what you mean then sorry for my misunderstanding

Best regards

Mohamed Houri

6 09 2011
Dom Brooks

Hi Mohammed,

> In other words if the FK constraint has been defined as (a,b) its index can be safely defined as (b,a) or (a,b) it doesn’t matter
This is exactly what I was trying to say, apologies for any lack of clarity.

I was just saying that the second script which Charles uses will say that it thinks an index is missing when in fact it’s ok (because the script works by concatenating the columns by position and then doing a like with a trailing ‘%’).

I don’t see that as a problem. I prefer scripts which say there may be a problem when there’s not compared to scripts which say everything’s ok when it isn’t.

Hopefully that’s clearer?
In summary, I agree.

Cheers,
Dominic

5 09 2011
Mohamed Houri

Charles,
Dom has nicely summarized what one has to point about the book FK script and about the performance reason the author are emphasizing.
Because the author’s reason of indexing FK is related to performance reason I have nothing to add. If it was related to deadlock then I would have pointed out few things like only b-tree index can protect us from the unindexed FK threat (bitmap and function based index will not protect us)

By the way, I have a simple SQL script that I use in order to avoid creating a duplicate index to cover the FK deadlock threat. It checks before creating this type of index if there is already an existing index starting with the FK columns, in any order, but starting with the FK constraint columns

Best regards

7 09 2011
Charles Hooper

Mohamed,

Thank you for the reminder that bitmap indexes and function based indexes will not protect against TM enqueues on the child table – that suggests that the script should also verify that the index is in fact a b*tree index and not a bitmap index.

If you have your script posted somewhere, please provide a link to the script. Or, if you would like, please post the script here. If you post the script here, please use a <pre> tag before the script and a </pre> tag after the script to retain the spacing in the script (or use one of the other common code tags).

19 01 2012
hourim

Hi Charles,

It is true that a bitmap index will not protect against TM enqueues on the child table. And how would it be the case when we know that bitmap indexes themselves generate deadlocks situation in an OLTP application. However, I was not completely true to say that Function based indexes do not protect against this kind of TM enqueue. They in fact do; they just need to start with the Foreign key column (or columns in case of a composite key). A very simple example can be found here
http://hourim.wordpress.com/2012/01/19/bitmap-indexes-and-foreign-key/
Best Regards

21 01 2012
Charles Hooper

Hi Mohamed,

Thanks for posting the link to your test case. It has been a couple of months since you posted the comment: “… function based index will not protect us”. If I remember correctly, I considered testing your statement, but instead decided that your comment was addressing the situation where the foreign key column was enclosed in a function within the index definition. I did wonder a bit what would happen if the function based index were a composite index.

5 09 2011
Franck Pachot

Hi,

About checking unindexed foreign keys, I’ve encountered situations that cannot be found by any of those queries.
Fore example a ‘delete on cascade set null’ can put a TM lock on the child even if the key is not concerned (see http://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:292016138754#2983641500346763748 for an exemple)

Recently, I used a less theorical approach:
– set event 10704 “Print out information about what enqueues are being obtained”
– execute a ‘delete from … where null is not null; rollback;’ for each tables
– grep the 10704 trace to get all the TM locks with mode >= 4
– get the table name from the hexadecimal object_id in the 10704 trace

Then I have a matrix showing which tables may lock other ones, blocking on any DML, when deleting from them. Even when it is not expected from theory/documentation. And included any bug/feature for your version.

With that matrix, taking them one by one, I can choose between:
– Indexing the foreign key because, anyway, given the high selectivity, an index will prevent a non optimal full scan on the child to check if there is no child.
– Forbid deletes on the parent: if delete is needed, then we can flag it to be deleted later with an offline batch.
– Index for locking reasons only. But those cases are rare: usually, either you have a small lookup table that should not have deletes during online operations, or you have a master-detail table where an index is needed for access performance reason.

Note that it does not concern only deletes on the parent, nor updates on its key, but I’ve seen some merge operations that acquires a TM-Share lock as well even if it is never deleted. On delete cascade does cascade the lock as well even if there is no reason for that. (what we call a bug, but MOS may call it a feature or an expected behaviour…)

Regards,
Franck.

7 09 2011
Charles Hooper

Franck,

Thank you for the extensive comment post.

The AskTom thread you linked to is very interesting, and very long. The following is a list of some of the items mentioned in that thread:
* The original version of Tom Kyte’s foreign key indexing script
* Problems with multi-column foreign keys and NULL values in one of the columns
* Problems with CURSOR_SHARING = SIMILAR or FORCE: ORA-01467: sort key too long (another one to add to the list in the article https://hoopercharles.wordpress.com/2011/07/03/the-cursor_sharing-parameter-is-a-silver-bullet-what-is-wrong-with-this-quote/ )
* Oracle Forms by default updates all columns, including primary key columns (this is mentioned in a couple of books, as I recall)
* First hint of table locking change due to unindexed foreign key columns in 11g in July 2009 (that was the same month when Randolf Geist and I wrote the section in the OakTable book about the locking change – we tripped over it by accident)
* Tom Kyte’s blog article about unindexed foreign keys and his script: http://tkyte.blogspot.com/2009/10/httpasktomoraclecomtkyteunindex.html
* Mohamed Houri’s script:
http://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:292016138754#3002983200346796154

I also found your post “Unindexed foreign key Share lock without delete/update on the parent” which was added “February 10, 2011 – 10am Central time zone”
I probably misunderstood Tom Kyte’s follow up comment to you considering what he later stated about the bug. It seemed that he was stating that you need to index the foreign key columns in the grand-child table. You then mentioned Metalink (MOS) Bug 12313173, which currently has a status of “32 – Not a Bug. To Filer”. Interesting situation – do you see the same results on 10g and 11g?

Setting event 10704 – I had not heard of that before (or I simply forgot). A web search helped me determine the purpose of that event:
http://www.juliandyke.com/Diagnostics/Events/EventReference.html#10704

5 09 2011
Charles Hooper

I must say that I am impressed so far with the comments attached to this blog article. The comments are very positive: this is what is good about the section from the book, this is what is wrong (or limiting), and this is what could be done to further enhance the contents of the book. Some of the comments have even taken the discussion started by the book quote well beyond the scope of the book – and that too is a desired outcome of these types of articles.

I will be posting my comments in roughly 14 hours. Something to consider: this is a book about performance. Is it a performance problem if one (or more) session is prone to wait on TM enqueues? Something else to consider, which was mentioned by Franck: if you are indexing foreign key columns for SELECT query performance reasons because those columns are often found in the WHERE clause when joining to the table with the Unique or Primary key constraint (the parent table), how much help is the index on the foreign key columns (consider the potential CLUSTERING_FACTOR and index cardinality)?

6 09 2011
Charles Hooper

The following are my book review notes for recipe 2-5, as it will appear in the final book review (the following was put together immediately before posting this article, so I missed some of the finer points highlighted in the other comments):

Recipe 2-5 includes a SQL statement that attempts to identify unindexed foreign key columns. This same SQL statement appears on page 243 of the book “Oracle SQL Recipes: A Problem-Solution Approach”. While the SQL statement only works for the current user’s schema, and A, B, and C generic aliases are used for table aliases, there are more significant problems with the SQL statement and the recipe as a whole. The recipe gives the impression that the primary reason for indexes on the foreign key columns is to improve performance – specifically the performance of SELECT statements where the foreign key columns are included in the WHERE clause. When you consider that the tables with the foreign key columns are likely detail (child) tables, there is a bit of a threat that each of the unique values found in the foreign key columns will be repeated many times through the various rows (scattered among the various rows, thus the CLUSTERING_FACTOR for the index might be close to the number of rows), making it a bit less likely that an index on the foreign key columns will help performance significantly (unless an [bitmap] index join is possible). The more common performance reason for indexing foreign key columns is to reduce the threat of TM enqueue contention on the child table when rows are updated or deleted in the parent table – this reason was not mentioned in this recipe. The recipe also gives the impression that all foreign keys should be indexed, while it is primarily those cases where the unique/primary column(s) in the parent table is subject to change that will benefit the most from indexing the foreign key columns. The problem with the SQL statement found in the recipe appears when concatenated (multi-column) indexes reference the foreign key columns. The SQL statement found in the book does not consider the foreign key column’s position in the concatenated index. While the optimizer could select to use an index skip scan (consider what might happen if the foreign key column is the sixth column in a concatenated index) for a SELECT statement that does not reference the leading columns of the index in the WHERE clause, the same is not true for TM enqueues. Thus, the SQL statement misidentifies cases where the foreign key columns should be indexed, but are not. A DISTINCT clause may be necessary to eliminate duplicate rows returned by the query (pages 59-60).

6 09 2011
Houri Mohamed

Charles,

As far as we are speaking about Foreign key indexes particularly for composite foreign keys there is a fairly chance that those FK columns will be repetitives. And as far as the index FK should only start with the relevant columns in any order, it is worth to put the most repetitive column as the leading index column and use the compress option to compress the index and to reduce its size and hence to reduce the amount of I/O on this index. We will be protected against the deadlock threat and we will certainly enhance the desirability of that index by the CBO

Best Regards

6 09 2011
Houri Mohamed

Charles,

I am sorry, just one another remark. Suppose that we have a FK constraint on column (cust_id) and we have an index on (cust_id, prd_id). Sure that for deadlock reason we don’t need to create an extra index on (cust_id) we have the index on (cust_id, prd_id) for that. However for performance reason, it might be possible that the Clustering Factor of index (cust_id, prd_id) is not as good as the Clustering Factor of an index on (cust_id) and it might be also possible that the CBO will not choose the index (cust_id, prd_id) when joinging parent to child. One has to take this in consideration

Best Regards

8 09 2011
Timo Raitalaakso

I have written a check query that avoids false negatives in case of constraint (A,B) indexed (B,A). Also a better performing query available. It uses 11.2 features.

9 09 2011
Log Buffer #237, A Carnival of the Vanities for DBAs | The Pythian Blog

[…] Foreign Keys – What is Wrong with this Quote? Charles Hooper enlightens […]

16 07 2012
Virtual column as a foreign key «

[…] a comment of Charles ‘Hooper blog article I wrote (a) first that a function based index cannot cover the deadlock threat induced […]

25 09 2012
Damir Vadas

Just once interesting point.

The whole Oracle EBS, has no a single foreign key. All is handled with app logic.

And the best of all … have seen EBS with 1000 connections with no few locking events-amazing and maybe the best point where to go when you want to build enterprise app for thousand of concurrent connections.

1 10 2012
Charles Hooper

Damir,

Interesting that Oracle EBS does not use declared foreign keys. It would seem that this is not a desirable practice from a data integrity point of view.

1 10 2012
Damir Vadas

Yes but no locking problem on thousand of connections..
🙂

Leave a reply to Dom Brooks Cancel reply