May 4, 2011
In a previous article I shared a couple of the slides that I put together for an Oracle Database presentation for a specific ERP group – a presentation that as designed was far too long to fit into the permitted time slot. I thought that in today’s blog article I would share a couple of “light-weight” slides from the presentation on the topic of statistics collection.
The notes that go along with the above slide are as follows:
Starting with Oracle Database 10.1 the query optimizer uses system statistics by default. These statistics describe the performance characteristics of the server to the optimizer. If you do not collect system statistics, Oracle will automatically use standardized system statistics, known as NOWORKLOAD statistics. When your server is under a typical to heavy load, you should gather the system statistics, which may be accomplished using a SQL*Plus command similar to the command at the top of this slide – that command collects the statistics over a 60 minute time period and then sets the statistic values. You can check the current system statistics using the SQL statement at the left. If you see statistics like those at the right, your database is still using NOWORKLOAD statistics.
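Since the slide itself is not reproduced here, the commands described above might look something like the following sketch (the 60 minute interval matches the slide description; everything else is illustrative):

```sql
-- Gather workload system statistics over a 60 minute window;
-- the statistic values are set when the interval completes
EXEC DBMS_STATS.GATHER_SYSTEM_STATS('INTERVAL', interval => 60)

-- Check the current system statistic values
SELECT PNAME, PVAL1
FROM   SYS.AUX_STATS$
WHERE  SNAME = 'SYSSTATS_MAIN';
```

NOWORKLOAD statistics show NULL (or default) values for statistics such as SREADTIM, MREADTIM, and MBRC in the query output above.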
There is a bug in Oracle Database 11.2.0.1 and 11.2.0.2 related to the calculation of the SREADTIM – the average single-block read time expressed in milliseconds, and the MREADTIM – the average multi-block read time expressed in milliseconds.
—
The notes that go along with the above slide are as follows:
This slide shows an example of the statistics collection bug in Oracle Database 11.2.0.1 and Oracle Database 11.2.0.2. Notice the unusually large values for SREADTIM and MREADTIM. These values are typically in the range of 4 to 12 milliseconds, and would normally appear as one or two digit values. Also watch out for cases where the SREADTIM value is greater than the MREADTIM value – that is almost certainly an error in the statistics collection process, and could be caused by SAN read-ahead caching.
The calculated value of 78 for the MBRC, which is the average number of blocks read from disk during multi-block reads, might be a little high. A high value for this parameter could cause the optimizer to use full table scans a little too frequently, and could lead to excessive CPU consumption problems – an example of this problem appears when Visual no longer uses the index on the PART_ID column when querying the INVENTORY_TRANS table when the PART_ID column is specified in the WHERE clause of a SQL statement.
—
The notes that go along with the above slide are as follows:
If you find that the System statistics were collected incorrectly, or if you need to set the system statistics of a test server to match the collected statistics of the production server, you can manually set the statistic values as shown at the top of this slide.
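A sketch of what such a manual adjustment might look like (the specific values below are illustrative, not recommendations – on a production match you would copy the values gathered there):

```sql
-- Remove the incorrectly collected statistics,
-- then set the statistic values manually
EXEC DBMS_STATS.DELETE_SYSTEM_STATS
EXEC DBMS_STATS.SET_SYSTEM_STATS('SREADTIM', 8)
EXEC DBMS_STATS.SET_SYSTEM_STATS('MREADTIM', 10)
EXEC DBMS_STATS.SET_SYSTEM_STATS('MBRC', 16)
EXEC DBMS_STATS.SET_SYSTEM_STATS('CPUSPEED', 2000)
```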
—
The notes that go along with the above slide are as follows:
It is important to make certain that the fixed object statistics are collected. If the statistics are not collected, there is a strong chance that some queries of Oracle Database views will be slow, or even result in the crashing of the user’s session – see the link at the bottom of this slide for an example that shows what might happen if these statistics are not collected. You can verify that the statistics were collected by executing the SQL statement at the left – if the SQL statement returns no rows, then the statistics have not been collected.
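The collection command and a verification query of the sort described above might resemble the following sketch (checking DBA_TAB_STATISTICS is one approach; the slide may use a different query):

```sql
-- Collect the fixed object statistics
EXEC DBMS_STATS.GATHER_FIXED_OBJECTS_STATS

-- If this query returns no rows, the fixed object
-- statistics have not been collected
SELECT TABLE_NAME, LAST_ANALYZED
FROM   DBA_TAB_STATISTICS
WHERE  OBJECT_TYPE = 'FIXED TABLE'
AND    LAST_ANALYZED IS NOT NULL;
```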
—
The notes that go along with the above slide are as follows:
Once you have collected the fixed object statistics, the query at the left will return output like you see on the right side of this slide.
—
The notes that go along with the above slide are as follows:
You should collect statistics weekly for the objects in the SYSADM schema (where the ERP data is located), typically when the database experiences little activity. You can collect those statistics using the SQL*Plus command shown at the top of this slide. Starting with Oracle Database 10.1 it is also a good idea to collect statistics for the objects in the SYS schema, and that can be done using the second command.
Starting with Oracle Database 10.1 an automated job collects missing and stale (out of date) statistics for objects, typically starting around 10 PM. Even though statistics collection is now partially automated in recent Oracle Database releases, it is still a good practice to collect statistics weekly. You can verify that statistics were recently collected using the SQL statements at the bottom of this slide.
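The weekly collection commands described above might look something like this sketch (parameter choices are illustrative):

```sql
-- Weekly statistics collection for the ERP data schema
EXEC DBMS_STATS.GATHER_SCHEMA_STATS(ownname => 'SYSADM')

-- Statistics for the SYS schema (Oracle Database 10.1 and later)
EXEC DBMS_STATS.GATHER_SCHEMA_STATS(ownname => 'SYS')

-- Verify that statistics were recently collected
SELECT TABLE_NAME, LAST_ANALYZED
FROM   DBA_TABLES
WHERE  OWNER = 'SYSADM'
ORDER BY LAST_ANALYZED;
```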
—
The notes that go along with the above slide are as follows:
I have been mentioning this warning for the last couple of years on the ERP mailing list – do not use the Update Database Statistics menu item in Visual to collect statistics. Even in Visual 7.0, this menu command issues ANALYZE commands to collect the statistics, rather than using the DBMS_STATS package, which became the correct approach when Oracle 8i was introduced a decade ago. The ANALYZE command does not collect all of the statistics needed by the Oracle query optimizer, and collects some statistics that potentially cause problems for the query optimizer. You can execute the SQL statement shown on this slide to see if any tables in the SYSADM schema contain remnants from ANALYZE that need to be fixed. If the query returns any rows, execute the SQL statements that are returned.
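The SQL statement on the slide is not reproduced here; one common check for ANALYZE remnants (an assumption about what the slide shows, not a transcription of it) looks for statistic columns that only the ANALYZE command populates, such as AVG_SPACE and CHAIN_CNT, and generates the corrective commands:

```sql
-- Tables whose statistics show ANALYZE remnants; AVG_SPACE and
-- CHAIN_CNT are populated by ANALYZE but not by DBMS_STATS
SELECT 'ANALYZE TABLE SYSADM.' || TABLE_NAME || ' DELETE STATISTICS;' COMMAND
FROM   DBA_TABLES
WHERE  OWNER = 'SYSADM'
AND    (AVG_SPACE > 0 OR CHAIN_CNT > 0);
```

After executing the generated statements, the statistics for the affected tables would be re-gathered with DBMS_STATS.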
—–
Any comments about the above? How would you do things differently?
On slide 18, regarding making a test db behave like a production db, it might be useful to mention the behavior you are talking about is plan generation.
While I’m certain it is obvious to you, some of your readers might misinterpret your remark as an outlandish claim that you can magically make a test server of presumably less horsepower and likely with less speedy storage media actually perform better. Of course you’re making no such claim and you’re just pointing out that if you feed the CBO the same data on two drastically different machines, you can get it to produce the same plans. Since you’re not interested in the optimal plan for the test server, but rather in seeing what plans will be generated on the production machine, this is entirely correct.
Mark,
Excellent points, and I agree that I should have been more clear with the description – in this case to indicate that the goal was to produce the same execution plans on test and production. Your comment also reminded me of this 4 part series:
https://hoopercharles.wordpress.com/2010/11/21/different-performance-from-standard-edition-and-enterprise-edition-1/
Thank you for taking the time to help clarify the meaning of the blog article contents.
Nice work!
THX for sharing this info.
Damir Vadas
Charles,
would it add something to mention that there are two flavours of NOWORKLOAD system statistics (default and gathered)? – perhaps not …
the (small) difference between the two modes is described in Randolf Geist's blog: http://oracle-randolf.blogspot.com/2009/04/understanding-different-modes-of-system.html
Thank you for referencing Randolf’s article. Either I forgot, or I was unaware that it was possible to gather NOWORKLOAD system statistics.
In retrospect, I should have referenced Randolf’s blog article series about system statistics on slides 16-18.
Charles,
“…There is a bug in Oracle Database 11.2.0.1 and 11.2.0.2 related to the calculation of the SREADTIM…”
are you talking about bug 11794684 ?
or the related one bug 9842771 ?
for the latter one, there is a patch available
We applied it and started to work with gathered system stats, which resulted in really bad overall performance.
So we decided to work with the system stats of 10.2, which resulted in much better performance.
It would be interesting to hear if somebody else has experiences with Patch 9842771.
Sokrates,
Thank you for locating the Metalink (MOS) articles.
My comment in the presentation was based on my observation of SYS.AUX_STATS$, after I gathered system statistics on a system using SSD drives (I was curious to know just how low the numbers could go). My first thought was that those numbers just can’t be right. I checked the documentation for Oracle Database 11.2 and compared it with the documentation for 10.2 and 11.1, but I did not find any indication that there was an intentional scale change in the statistic values. I simply manually set the values to match what I encountered with Oracle Database 11.1.0.7 and then forgot about the problem until I read this article:
http://antognini.ch/2010/11/workload-system-statistics-in-11g/
I need to double-check, but I believe that the problem is still present in patch 4 for 11.2.0.2 on the Windows 64 bit platform (11896292 https://supporthtml.oracle.com/ep/faces/secure/ml3/patches/PatchDownload.jspx?_afPfm=1f# ). I see that patch 5 for 11.2.0.2 on the Windows platform was just recently released (12344273 https://supporthtml.oracle.com/ep/faces/secure/ml3/patches/PatchDownload.jspx?_afPfm=1f# ) so I might test that patch release.
OK, I double-checked the system statistics collection with Oracle Database 11.2.0.2 on 64 bit Windows with patch 4 (early April 2011) installed. Here are the collected statistics:
Unless it takes an average of 1.6 seconds to perform a single block read and 3.9 seconds to perform a multi-block read from SSD, the bug is still present in patch 4. I will test patch 5 shortly to see if the bug fix is part of that patch.
It appears that 11.2.0.2 on Windows x64 with patch 5 (late April 2011) still encounters the bug when calculating SREADTIM and MREADTIM:
Statistics collection for fixed objects should also (same with system statistics as you already mentioned) be done during a time with “representative workload” and not e.g. during the night when the DB is idle.
Andreas,
Thank you for the comment. Could you provide a little more detail to explain why fixed object statistics should be collected during a time of “representative workload” and not during the night when the DB is idle?
There is a good chance that I am overlooking something that is very obvious, but I thought that gathering fixed object statistics only populated SYS.TAB_STATS$ and SYS.IND_STATS$, so it would seem that the best time to collect that information is when the database instance is not very busy servicing normal database requests. Obviously, if I am wrong I would like to know more information so that I do not repeat the same mistaken recommendation again in the future.
In AskTom (http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:60121137844769#tom62191850899057) Mr. Kyte wrote (in 2006): “the x$ tables would change size in response to init.ora setting changes generally. Setting processes higher will add rows to various x$ views for example. so, they could be considered a one time thing unless you make a big change to your parameter settings.”
so the system load should not be significant – as I understand it
Charles,
there are articles in Metalink (e.g. How to gather statistics on SYS objects and fixed_objects? [ID 457926.1]) that mention gathering these statistics while there is a representative load (whatever a representative load actually is).
“Since fixed objects record current database activity; statistics gathering should be done when database has a representative load so that the statistics reflect the normal database activity”. My interpretation was that these statistics have to be gathered again when the “representative” load changes significantly or if there are changes to server parameters (as already mentioned by Martin).
Regards,
Andreas
Andreas,
Thank you for the Metalink (MOS) Doc ID. Interesting, yet I was still skeptical because I have previously seen incorrect information in Metalink (MOS). I looked at the link provided by Martin and scrolled down a bit to a test case that was posted by Tom Kyte.
I did a bit of experimentation on 11.2.0.2 using the test case posted by Tom Kyte.
OK, in the above I see the number 39, let’s re-collect the fixed object statistics and see what happens:
Now we are seeing the number 26 where we had previously seen the number 39. Let’s start up about 15 sessions performing database activity and collect fixed object statistics again:
Now, what was originally showing 39 shows 29, which matches the output for COUNT(*) FROM V$SESSION, but I can’t explain why the number of sessions counted only changed by 5 when there were an additional 15 sessions connected to the database.
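The general shape of the test was along these lines (a sketch of the approach, not Tom Kyte's exact script):

```sql
-- Cardinality estimate for V$SESSION before re-gathering;
-- the estimate reflects the session count at the time the
-- fixed object statistics were last gathered
EXPLAIN PLAN FOR SELECT COUNT(*) FROM V$SESSION;
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

-- Re-gather the fixed object statistics while the additional
-- sessions are connected, then check the estimate again
EXEC DBMS_STATS.GATHER_FIXED_OBJECTS_STATS

EXPLAIN PLAN FOR SELECT COUNT(*) FROM V$SESSION;
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```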
The above test case output does appear to agree with Andreas’ statement. I think that this is another case where the real value of the articles on my blog does not fully develop until comments start appearing on those articles.
[…] have got to read the blog post to see its beauty and the hard work that goes behind its production. Charles Hooper, one of my favorite bloggers does it […]
Charles,
I know, this is not AskCharles, but since you have written a lot about Excel-access on Oracle databases perhaps you know the answer to the following question – which is loosely related to the thread’s topic:
When I make an OLEDB connection from Excel (or SQL Server Analysis Services) to an Oracle database, Excel tries to get a list of accessible objects with a query like:
In 10.2.0.4 the query needs a few seconds (for a user with dba role) but in 11.1.0.7 it’s considerably slower:
For the 11.1.0.7 database fixed table and dictionary stats are created (and seem to include reasonable values). It’s quite clear that the difference comes from all_objects: when I compare the definitions in catalog.sql (10.2.0.4) and cdcore.sql (11.1.0.7) I see some changes in the definition (no_expand hint, new columns namespace, edition, new object_types for mining and olap elements etc. in 11g) – but when I create a test view with the old 10.2.0.4 definition in 11.1.0.7 the performance is similar to the (bad) performance of the new all_objects; so I guess the problem is not the view but the underlying objects.
The execution plan in 11.1.0.7 shows a complex FILTER operation (and the cardinalities seem to be quite ok when I use gather_plan_statistics) – but I don’t plan to optimize internal data dictionary queries …
I have no idea if there are other options to improve the performance – or if this is the expected behavior in 11g. Do you know the problem – and is this a problem?
Regards
Martin
Notice in the above execution plans that there are problems with the cardinality estimates, and the MERGE JOIN CARTESIAN join method appears quite frequently (this may not be a problem).
At this time, I do not have the answer for speeding up this particular SQL statement. Someone famous once said that the fastest way to do something is to not do it at all – you might see if that is an option for you.
Charles,
thank you for the links and your research. The question was more of academic interest than of practical need – and I think you are right: the best solution would be to avoid the query and to use and adapt existing connections. I don’t think it’s possible to change the query – but DBMS_ADVANCED_REWRITE would certainly be an option. For the sake of completeness, some additional observations:
– as already mentioned the performance is related to the user rights: a user with select_catalog_role (or dba role) gets the results faster than a user only with connect – and so the SQL Server forum suggestions are wrong: a reduction of the number of visible objects does not help
– a SQL_TRACE (or a look into the SQL Monitor) reveals that waits are not a problem: the query is almost completely CPU bound
– the use of dba_objects would certainly improve the performance: the view definition of dba_objects is quite simple and a SELECT count(*) needs only 2,600 consistent gets for dba_objects but 169,000 consistent gets for all_objects
– another option to improve the performance of the complete excel query would be to use a common table expression with a materialize hint to replace the all_objects references:
with
all_objects_mat
as
(select /*+ materialize */ *
from all_objects)
SELECT *
FROM (SELECT ...
FROM all_objects_mat o1
WHERE ...
UNION
SELECT ...
FROM all_objects_mat o2,
all_objects_mat o3,
all_synonyms s
WHERE ...) tables
ORDER BY 4,
2,
3 ;
For a DBA user the changed query needs 5 sec (and 175K consistent gets) instead of 15 sec (and 500K consistent gets) for the original query.
For the CONNECT user the changed query runs in 15 sec while the original query needs 55 sec (I have no idea why the numbers differ from the results I got yesterday, because the database is a single user system with little load).
– the rule hint in your query is interesting: what client version did you use (my query is from Excel 2007)?
Hi Charles,
from what I read so far, the values in CPUSPEED and CPUSPEEDNW do not actually represent the speed in MHz. It seems to be an internal value Oracle uses to compare the speed of a CPU in relation to others. See e.g. http://www.sql.ru/forum/actualthread.aspx?tid=682014&pg=5&mid=10015821#10015821 for a collection of those values gathered with different CPUs.
Could it be that setting CPUSPEED to e.g. 2664 manually (like in your case) might cause unexpected results with the optimizer, because it’s way too high? Just an idea.
– totally agree with manually setting SREADTIM and MREADTIM though!
Benny
Benny,
I believe that you are correct that the CPUSPEED and CPUSPEEDNW values are not directly derived from the CPU’s speed in MHz (or GHz) – I hope that this article did not give the impression that I was stating there was a direct relationship between the speed in MHz and the CPUSPEED/CPUSPEEDNW values. If I recall correctly, the CPUSPEED and CPUSPEEDNW values indicate the number of “Oracle operations” performed in a certain time period (perhaps the number of consistent gets performed per millisecond, although I am not sure that the scale is millisecond). I had previously mentioned that observation a couple of times, including in this article:
https://hoopercharles.wordpress.com/2009/12/27/high-value-for-mbrc-causes-high-bchr-high-cpu-usage-and-slow-performance/
You are correct that setting the CPUSPEED to an arbitrary value could lead to problems with execution plans (and performance) – but doing so might be helpful in certain situations, as mentioned in my discussion with Mark Farnham in the comments section of this article. The 2664 value that I manually specified is in fact higher than the values displayed in the article that you linked, however that value is smaller than the 2714 value that Oracle Database calculated automatically. I suppose that the higher calculated value might simply mean that the memory throughput (and access time) is faster in the system that I used than in the other systems (might be non-ECC vs. ECC type memory, triple/quad channel vs. single/dual channel, 1600MHz memory vs. 800MHz memory, etc.).
Hi Charles!
I didn’t read through all the comments, so I didn’t notice the discussion you mentioned. Thank you for the explanations!
Hey Charles,
Quick one:
do you know what is the difference between
BEGIN
DBMS_STATS.GATHER_FIXED_OBJECTS_STATS;
END;
and
BEGIN
DBMS_STATS.GATHER_FIXED_OBJECTS_STATS(null);
END;
Cheers,
Dani
Dani,
There should be no difference between the two calls – NULL is the default value of the first parameter. NULL may be specified for additional clarity regarding the intended call behavior. Documentation reference:
http://docs.oracle.com/cd/E11882_01/appdev.112/e25788/d_stats.htm#i1039162
Cheers Charles, as usual!!
Be well,
Dani