<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
		>
<channel>
	<title>Comments on: The Effects of Potential NULL Values in Row Sources During Updates Using an IN Subquery</title>
	<atom:link href="http://hoopercharles.wordpress.com/2010/01/10/the-effects-of-potential-null-values-in-row-sources-during-updates-using-an-in-subquery/feed/" rel="self" type="application/rss+xml" />
	<link>http://hoopercharles.wordpress.com/2010/01/10/the-effects-of-potential-null-values-in-row-sources-during-updates-using-an-in-subquery/</link>
	<description>Miscellaneous Random Oracle Topics: Stop, Think, ... Understand</description>
	<lastBuildDate>Thu, 23 May 2013 04:02:42 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
	<item>
		<title>By: Charles Hooper</title>
		<link>http://hoopercharles.wordpress.com/2010/01/10/the-effects-of-potential-null-values-in-row-sources-during-updates-using-an-in-subquery/#comment-162</link>
		<dc:creator><![CDATA[Charles Hooper]]></dc:creator>
		<pubDate>Sun, 10 Jan 2010 22:37:12 +0000</pubDate>
		<guid isPermaLink="false">http://hoopercharles.wordpress.com/?p=775#comment-162</guid>
		<description><![CDATA[I have not spent a lot of time analyzing why the UPDATE statement completed faster and with fewer consistent gets when the &quot;HASH JOIN RIGHT ANTI NA&quot; operation appeared in the execution plan, compared to the &quot;HASH JOIN RIGHT ANTI&quot; operation with either the NOT NULL constraint on the table, or specified in the WHERE clause (I mentioned it hoping that someone would be curious, and try to test to see why it happened).

Keep in mind that this is just one test case - you may find that if you change the order of the UPDATE statements so that the test statement with the IS NOT NULL predicate specified in the WHERE clause is executed first, it then becomes the most efficient method (it could be that the previously rolled back UPDATEs changed the high water mark of the tables).  It could be that if I ran the first set of tests on the other server (the one that performed 212,520,861 consistent or current mode gets in 4,853.41 seconds, rather than the one that only performed 114,905,284 consistent or current mode gets in 9,763.26 seconds) the results would be about even or in favor of the NOT NULL constraint or IS NOT NULL predicate.  It could be that if I had not changed STATISTICS_LEVEL from TYPICAL to ALL, the elapsed times would have been identical.  What is great about well-constructed test cases is that anyone can take the test case, run it on their system, make small modifications to the test case, and see how the performance changes.

Keep in mind that this test case only looked at the NOT IN type of subquery for an UPDATE statement (note that in one case the optimizer changed it to NOT EXISTS).  The results could be completely different for DELETE or SELECT statements.  As Greg Rahn confirmed, the potential for NULL values limits the options available to the cost-based optimizer for finding more efficient execution plans (and that can impact performance).  You might ask, what if one (or both) of the columns were indexed - how much would that impact the test, since by default NULL values are not included in single-column B*Tree indexes (and there is the potential for NULLs)?  With the above test case, you could make a couple of modifications, maybe add a more restrictive WHERE clause, to see how the potential for NULL values affects the optimizer.

Then, of course, there is always the risk that someone will set OPTIMIZER_FEATURES_ENABLE to a value less than 11.1.0.6, and suddenly, rather than executing in 4 minutes, the statement maxes out one of the CPUs for 3+ hours and is not even 10% finished when it has to be killed.

I would suggest that if a column should never include NULL values, it should have a constraint that prevents NULL values from being added to the column.  Doing so will help the query optimizer and reduce the need to add seemingly unneeded IS NOT NULL predicates to the WHERE clause.

Anyone else have an opinion?]]></description>
		<content:encoded><![CDATA[<p>I have not spent a lot of time analyzing why the UPDATE statement completed faster and with fewer consistent gets when the &#8220;HASH JOIN RIGHT ANTI NA&#8221; operation appeared in the execution plan, compared to the &#8220;HASH JOIN RIGHT ANTI&#8221; operation with either the NOT NULL constraint on the table, or specified in the WHERE clause (I mentioned it hoping that someone would be curious, and try to test to see why it happened).</p>
<p>Keep in mind that this is just one test case &#8211; you may find that if you change the order of the UPDATE statements so that the test statement with the IS NOT NULL predicate specified in the WHERE clause is executed first, it then becomes the most efficient method (it could be that the previously rolled back UPDATEs changed the high water mark of the tables).  It could be that if I ran the first set of tests on the other server (the one that performed 212,520,861 consistent or current mode gets in 4,853.41 seconds, rather than the one that only performed 114,905,284 consistent or current mode gets in 9,763.26 seconds) the results would be about even or in favor of the NOT NULL constraint or IS NOT NULL predicate.  It could be that if I had not changed STATISTICS_LEVEL from TYPICAL to ALL, the elapsed times would have been identical.  What is great about well-constructed test cases is that anyone can take the test case, run it on their system, make small modifications to the test case, and see how the performance changes.</p>
<p>Keep in mind that this test case only looked at the NOT IN type of subquery for an UPDATE statement (note that in one case the optimizer changed it to NOT EXISTS).  The results could be completely different for DELETE or SELECT statements.  As Greg Rahn confirmed, the potential for NULL values limits the options available to the cost-based optimizer for finding more efficient execution plans (and that can impact performance).  You might ask, what if one (or both) of the columns were indexed &#8211; how much would that impact the test, since by default NULL values are not included in single-column B*Tree indexes (and there is the potential for NULLs)?  With the above test case, you could make a couple of modifications, maybe add a more restrictive WHERE clause, to see how the potential for NULL values affects the optimizer.</p>
<p>Then, of course, there is always the risk that someone will set OPTIMIZER_FEATURES_ENABLE to a value less than 11.1.0.6, and suddenly, rather than executing in 4 minutes, the statement maxes out one of the CPUs for 3+ hours and is not even 10% finished when it has to be killed.</p>
<p>I would suggest that if a column should never include NULL values, it should have a constraint that prevents NULL values from being added to the column.  Doing so will help the query optimizer and reduce the need to add seemingly unneeded IS NOT NULL predicates to the WHERE clause.</p>
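<p>As a rough illustration of the constraint suggestion, the sketch below uses Python's built-in sqlite3 module as a stand-in for Oracle (an assumption made only so the example is self-contained). The table name t1, the column id, and the function try_insert_null are hypothetical. It shows only that a NOT NULL constraint rejects a NULL at insert time; it says nothing about how the Oracle optimizer uses the constraint.</p>

```python
import sqlite3


def try_insert_null():
    """Return (rejected, row_count) after attempting to insert a NULL."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table: the column is declared NOT NULL up front.
    conn.execute("CREATE TABLE t1 (id INTEGER NOT NULL)")
    conn.execute("INSERT INTO t1 VALUES (1)")
    try:
        # This insert violates the NOT NULL constraint.
        conn.execute("INSERT INTO t1 VALUES (NULL)")
        rejected = False
    except sqlite3.IntegrityError:
        rejected = True
    row_count = conn.execute("SELECT COUNT(*) FROM t1").fetchone()[0]
    conn.close()
    return rejected, row_count


if __name__ == "__main__":
    print(try_insert_null())
```

<p>Because the constraint guarantees that no NULL can ever be stored, a query against such a column does not need an extra IS NOT NULL predicate to rule NULLs out.</p>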
<p>Anyone else have an opinion?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Narendra</title>
		<link>http://hoopercharles.wordpress.com/2010/01/10/the-effects-of-potential-null-values-in-row-sources-during-updates-using-an-in-subquery/#comment-161</link>
		<dc:creator><![CDATA[Narendra]]></dc:creator>
		<pubDate>Sun, 10 Jan 2010 19:15:04 +0000</pubDate>
		<guid isPermaLink="false">http://hoopercharles.wordpress.com/?p=775#comment-161</guid>
		<description><![CDATA[Charles,

Nice one. Thanks.
But I am actually a bit confused by the results in 11.1.0.7.
Why was the number of consistent gets higher when we did the &quot;right&quot; thing (i.e. either used an IS NOT NULL predicate or defined a NOT NULL constraint) than when we did not bother to do the &quot;right&quot; thing (i.e. no NOT NULL constraint or IS NOT NULL condition)? Does this mean that from 11g onwards it is better not to use a NOT NULL constraint or an IS NOT NULL predicate?]]></description>
		<content:encoded><![CDATA[<p>Charles,</p>
<p>Nice one. Thanks.<br />
But I am actually a bit confused by the results in 11.1.0.7.<br />
Why was the number of consistent gets higher when we did the &#8220;right&#8221; thing (i.e. either used an IS NOT NULL predicate or defined a NOT NULL constraint) than when we did not bother to do the &#8220;right&#8221; thing (i.e. no NOT NULL constraint or IS NOT NULL condition)? Does this mean that from 11g onwards it is better not to use a NOT NULL constraint or an IS NOT NULL predicate?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Greg Rahn</title>
		<link>http://hoopercharles.wordpress.com/2010/01/10/the-effects-of-potential-null-values-in-row-sources-during-updates-using-an-in-subquery/#comment-160</link>
		<dc:creator><![CDATA[Greg Rahn]]></dc:creator>
		<pubDate>Sun, 10 Jan 2010 16:59:18 +0000</pubDate>
		<guid isPermaLink="false">http://hoopercharles.wordpress.com/?p=775#comment-160</guid>
		<description><![CDATA[&lt;blockquote&gt;
My last comment in the post suggested that the problem might be that even though there might not be any NULL values in the columns, the column definitions might permit NULL values, and that alone might restrict the options that are available to the optimizer for re-writing the SQL statement into a more efficient form.
&lt;/blockquote&gt;

This is true.  If columns can be determined to be NOT NULL, there are more query transformations that can take place.  Using NOT IN() leaves NULL as a possibility unless the column is declared NOT NULL or an AND ... IS NOT NULL predicate is used.  Execution times will generally be worse if the NOT NULL predicate cannot be applied (implicitly or explicitly), even when no NULL values exist.  I&#039;ve seen this countless times.]]></description>
		<content:encoded><![CDATA[<blockquote><p>
My last comment in the post suggested that the problem might be that even though there might not be any NULL values in the columns, the column definitions might permit NULL values, and that alone might restrict the options that are available to the optimizer for re-writing the SQL statement into a more efficient form.
</p></blockquote>
<p>This is true.  If columns can be determined to be NOT NULL, there are more query transformations that can take place.  Using NOT IN() leaves NULL as a possibility unless the column is declared NOT NULL or an AND ... IS NOT NULL predicate is used.  Execution times will generally be worse if the NOT NULL predicate cannot be applied (implicitly or explicitly), even when no NULL values exist.  I&#8217;ve seen this countless times.</p>
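<p>To make the NULL possibility concrete, here is a minimal sketch of why NOT IN has to account for NULLs. It uses Python's built-in sqlite3 module rather than Oracle (an assumption made only to keep the example self-contained), and the tables t1 and t2, the column id, and the function not_in_demo are hypothetical. It demonstrates the standard SQL three-valued logic only, not Oracle's execution plans or timings.</p>

```python
import sqlite3


def not_in_demo():
    """Compare NOT IN against a nullable column with and without a NULL filter."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    # Hypothetical tables; t2.id is nullable and holds one NULL.
    cur.execute("CREATE TABLE t1 (id INTEGER)")
    cur.execute("CREATE TABLE t2 (id INTEGER)")
    cur.executemany("INSERT INTO t1 VALUES (?)", [(1,), (2,), (3,)])
    cur.executemany("INSERT INTO t2 VALUES (?)", [(1,), (None,)])

    # A NULL in the subquery makes "id NOT IN (...)" evaluate to
    # unknown for every non-matching row, so no row qualifies.
    plain = cur.execute(
        "SELECT id FROM t1 "
        "WHERE id NOT IN (SELECT id FROM t2) ORDER BY id"
    ).fetchall()

    # Explicitly excluding NULLs restores the expected anti-join result.
    filtered = cur.execute(
        "SELECT id FROM t1 "
        "WHERE id NOT IN (SELECT id FROM t2 WHERE id IS NOT NULL) ORDER BY id"
    ).fetchall()

    conn.close()
    return plain, filtered


if __name__ == "__main__":
    plain, filtered = not_in_demo()
    print("NOT IN, nullable subquery:", plain)
    print("NOT IN, IS NOT NULL added:", filtered)
```

<p>The first query returns no rows because of the single NULL in t2, while the second returns the rows for 2 and 3. Any optimizer has to preserve that difference unless it can prove the column holds no NULLs.</p>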
]]></content:encoded>
	</item>
</channel>
</rss>
