True or False – Hash Joins

12 05 2010

May 12, 2010

(Back to the Previous Post in the Series) (Forward to the Next Post in the Series)

Continuing the true or false blog article series with more questions that seem to have conflicting answers.  Today’s quiz is on the topic of the hash joins, inspired by a recent thread that I found in a discussion forum.

When answering the questions, answer True or False, followed by a justification for the answer that you provided.

The Articles:

The Quiz:

1.  A hash join is ideal when joining a small table to a large table, and in such cases a hash join is typically faster than a nested loops join or a sort-merge join.

2.  When presented with very large tables in a production database, the Oracle query optimizer will attempt to avoid using hash joins unless the PGA_AGGREGATE_TARGET or HASH_AREA_SIZE (if PGA_AGGREGATE_TARGET is not used) is increased to a large value.

3.  When a PGA_AGGREGATE_TARGET is in use, one session cannot use all of the memory specified by PGA_AGGREGATE_TARGET nor all of the available memory in the server, whichever is the lower value.

4.  A hash join may use only 200MB of memory, or 5% of the PGA_AGGREGATE_TARGET, whichever is the lower value.

5.  The query optimizer will not select to use a hash join unless the hash join is able to complete in memory, as controlled by the PGA_AGGREGATE_TARGET and/or HASH_AREA_SIZE parameter values.

6.  A hash join is only faster than a nested loops join when the memory available for the hash join is at least 1.6 times the sum of bytes in the rows of the table to be hashed.

7.  A hash join’s performance is affected when the hash join overflows the memory established by the HASH_AREA_SIZE parameter, and the hash join spills to the TEMP tablespace.

8.  The degree of parallelism for a table (or individual SQL statement) must be adjusted when a full table scan is used to retrieve the rows during a hash join.  Hash joins typically require the parallel degree to be set to a value greater than 1 in order to improve performance over other join methods.

9.  The _PGA_MAX_SIZE parameter should be adjusted to improve performance when joining large tables with a hash join.

10. The Oracle query optimizer will commonly ignore USE_HASH hints.

11. The OPTIMIZER_INDEX_COST_ADJ parameter and OPTIMIZER_MAX_PERMUTATIONS parameters control the optimizer’s decision to use hash joins.

12. When the PGA_AGGREGATE_TARGET parameter is not used, the HASH_AREA_SIZE parameter should be set to 50% of the square root of the bytes in the smaller of the two row sources being joined, with a minimum value of 1MB.

13. The row source listed below the words HASH JOIN in an execution plan is the source for the hash table – the optimizer typically tries to select the smaller estimated row source of the two row sources as the source for the hash table. The second row source listed below the words HASH JOIN in an execution plan is fully scanned, probing the generated hash table in search of a match.

When attempting to answer the true or false questions, think about under what conditions the answer to the question will be true and under what conditions the answer to the question must be false.