Buffer Cache Hit Ratio, Optimizer Index Cost Adj, Clustering Factor, Performance Testing – How to Break a Test Computer

August 14, 2010

My replacement laptop arrived this week to fix an occasional problem that I had with the old laptop.

I thought that I would perform a test on the new laptop, so I tried a slightly modified version of the script that appeared in the Buffer Cache Hit Ratio section of chapter 8 of the “Expert Oracle Practices” book.  The table creation part of the script tends to be very CPU intensive for an hour or two, so if you want to try the script, set aside plenty of time.

SET LINESIZE 150
SET PAGESIZE 2000
SET TRIMSPOOL ON
SET TIMING ON
SET ARRAYSIZE 100
SPOOL C:\BCHRTestShortAsync11Gr2.txt

CREATE TABLE T1 (
  ID NUMBER,
  DESCRIPTION VARCHAR2(80));

INSERT INTO T1
SELECT
  CEIL(ABS(SIN(ROWNUM/9.9999)*10000)),
  'This is the long description for this number '|| TO_CHAR(CEIL(ABS
(SIN(ROWNUM/9.9999)*10000)))
FROM
  (SELECT
    ROWNUM RN
  FROM
    DUAL
  CONNECT BY
    LEVEL<=10000),
  (SELECT
    ROWNUM RN
  FROM
    DUAL
  CONNECT BY
    LEVEL<=10000);

COMMIT;

CREATE INDEX IND_T1 ON T1(ID);

EXEC DBMS_STATS.GATHER_TABLE_STATS (OWNNAME=>USER,TABNAME=>'T1',CASCADE=>TRUE, METHOD_OPT=>'FOR ALL COLUMNS SIZE 1')

ANALYZE INDEX IND_T1 VALIDATE STRUCTURE;

ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH SHARED_POOL;

SET AUTOTRACE TRACEONLY EXPLAIN
ALTER SESSION SET OPTIMIZER_INDEX_COST_ADJ=100;

SELECT
  ID,
  DESCRIPTION
FROM
  T1
WHERE
  ID BETWEEN 1 AND 400;

ALTER SESSION SET OPTIMIZER_INDEX_COST_ADJ=5;

SELECT
  ID,
  DESCRIPTION
FROM
  T1
WHERE
  ID BETWEEN 1 AND 400;

SET ARRAYSIZE 100
SET AUTOTRACE TRACEONLY STATISTICS

SET TIMING ON

ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SESSION SET TRACEFILE_IDENTIFIER='IND_COST_LARGE100';
ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT FOREVER, LEVEL 12';

ALTER SESSION SET OPTIMIZER_INDEX_COST_ADJ=100;

SELECT
  ID,
  DESCRIPTION
FROM
  T1
WHERE
  ID BETWEEN 1 AND 400;

ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT OFF';
ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SESSION SET TRACEFILE_IDENTIFIER='IND_COST_LARGE5';
ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT FOREVER, LEVEL 12';

ALTER SESSION SET OPTIMIZER_INDEX_COST_ADJ=5;

SELECT
  ID,
  DESCRIPTION
FROM
  T1
WHERE
  ID BETWEEN 1 AND 400;

SPOOL OFF

The script builds a table that has 100,000,000 rows, displays the execution plan for a SQL statement that selects a small percentage of the table’s rows when OPTIMIZER_INDEX_COST_ADJ is set to 100 and again when it is set to 5, and finally executes the SQL statements with a 10046 trace enabled.
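
Since the point of the exercise is the buffer cache hit ratio, here is a minimal sketch of one common variation of the hit ratio formula (using the cache-specific statistics in V$SYSSTAT; this query is not part of the book’s script) that could be captured before and after each test query to compute the delta:

SELECT
  ROUND(100 * (1 - PR.VALUE / (DBG.VALUE + CG.VALUE)), 2) HIT_RATIO
FROM
  V$SYSSTAT PR,
  V$SYSSTAT DBG,
  V$SYSSTAT CG
WHERE
  PR.NAME='physical reads cache'
  AND DBG.NAME='db block gets from cache'
  AND CG.NAME='consistent gets from cache';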

A screen capture during the table creation with Oracle Database 11.2.0.1 (SQL*Plus, Windows 7 Task Manager, Windows 7 Resource Monitor):

Note that the laptop only managed to achieve 3MB per second while creating the table – that was a fairly consistent average during the table creation, which required 40 minutes and 29 seconds.

The execution plans generated are as follows:

ALTER SESSION SET OPTIMIZER_INDEX_COST_ADJ=100;

Execution Plan
----------------------------------------------------------
Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |  4000K|   206M|   222K  (1)| 00:44:29 |
|*  1 |  TABLE ACCESS FULL| T1   |  4000K|   206M|   222K  (1)| 00:44:29 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("ID"<=400 AND "ID">=1)

--------

ALTER SESSION SET OPTIMIZER_INDEX_COST_ADJ=5;

Execution Plan
----------------------------------------------------------
Plan hash value: 634656657

--------------------------------------------------------------------------------------
| Id  | Operation                   | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |        |  4000K|   206M|   202K  (1)| 00:40:25 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1     |  4000K|   206M|   202K  (1)| 00:40:25 |
|*  2 |   INDEX RANGE SCAN          | IND_T1 |  4067K|       |   427   (1)| 00:00:06 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("ID">=1 AND "ID"<=400)

So, the optimizer is predicting that an execution using a full table scan will complete in 44 minutes and 29 seconds, while an execution using an index range scan will complete in 40 minutes and 25 seconds.  Let’s see how the different execution plans performed.
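
As a quick sanity check on those predictions (assuming the documented behavior that OPTIMIZER_INDEX_COST_ADJ simply scales the cost of the index access path by the parameter value divided by 100), the index plan cost of roughly 202K at a setting of 5 implies an unadjusted cost of about:

202,004 * 100 / 5 = 4,040,080   (approximate index path cost at the default of 100)
222,369                         (full table scan cost)

With the unadjusted index path costed at roughly 18 times the full table scan, it is no surprise that the full scan is selected at the default setting.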

Full Table Scan:

SQL> ALTER SESSION SET OPTIMIZER_INDEX_COST_ADJ=100;

Session altered.

Elapsed: 00:00:00.00
SQL>
SQL> SELECT
  2    ID,
  3    DESCRIPTION
  4  FROM
  5    T1
  6  WHERE
  7    ID BETWEEN 1 AND 400;

2547158 rows selected.

Elapsed: 00:00:14.39

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
     838515  consistent gets
     813246  physical reads
          0  redo size
  140941880  bytes sent via SQL*Net to client
     280701  bytes received via SQL*Net from client
      25473  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
    2547158  rows processed

----

Index Range Scan:

SQL> ALTER SESSION SET OPTIMIZER_INDEX_COST_ADJ=5;

Session altered.

Elapsed: 00:00:00.00
SQL>
SQL> SELECT
  2    ID,
  3    DESCRIPTION
  4  FROM
  5    T1
  6  WHERE
  7    ID BETWEEN 1 AND 400;

2547158 rows selected.

Elapsed: 00:01:24.36

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
    2573595  consistent gets
     818246  physical reads
          0  redo size
   17418653  bytes sent via SQL*Net to client
     280701  bytes received via SQL*Net from client
      25473  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
    2547158  rows processed

The full table scan when selecting 2.55% of the rows in the table required 14.39 seconds while the index range scan required 1 minute 24.36 seconds.  I guess that is not a bad improvement for leaving the OPTIMIZER_INDEX_COST_ADJ parameter at the default value of 100.  But wait, the full table scan was CPU bound, not IO bound!  What if we allow the SQL statement to use more than 1 CPU?

SET ARRAYSIZE 100
SET AUTOTRACE TRACEONLY STATISTICS

SET TIMING ON

ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SESSION SET TRACEFILE_IDENTIFIER='IND_COST_PARALLEL100';
ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT FOREVER, LEVEL 12';

ALTER SESSION SET OPTIMIZER_INDEX_COST_ADJ=100;

SELECT /*+ PARALLEL(T1, 8 ) */
  ID,
  DESCRIPTION
FROM
  T1
WHERE
  ID BETWEEN 1 AND 400;

Here is a screen capture taken as the full table scan ended (I had to execute the above SQL statement several times so that the screen capture would show the very end of the execution):

That helped a little: the full table scan completed in 12.01 seconds and used about 24% of the CPU capacity.  It is interesting that the screen capture shows the network utilization at 207Mb/s – the laptop was connected to a wireless network even though Oracle Database 11.2.0.1 was running locally.  The full table scan might be IO bound now (note that there were only two datafiles, so I suppose that we could still be CPU bound if only two parallel processes could be used).
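
Rather than inferring the degree of parallelism from the Resource Monitor, the number of parallel execution servers used by the most recent parallel query can be checked with something like this sketch (run in the same session immediately after the parallel query, with AUTOTRACE off, since V$PQ_SESSTAT only describes the last parallel statement in the session):

SELECT
  STATISTIC,
  LAST_QUERY
FROM
  V$PQ_SESSTAT;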

Let’s see the storage requirements for the table and index (ASSM AUTO tablespace):

SET AUTOTRACE OFF

SELECT
  SEGMENT_NAME SEGMENT,
  COUNT(*) EXTENTS,
  BYTES/1024 EXT_SIZE_KB,
  (COUNT(*) * BYTES)/1048576 TOTAL_MB
FROM
  DBA_EXTENTS
WHERE
  OWNER=USER
  AND SEGMENT_NAME IN ('IND_T1','T1')
GROUP BY
  SEGMENT_NAME,
  BYTES
ORDER BY
  SEGMENT_NAME,
  BYTES;

SEGMENT       EXTENTS EXT_SIZE_KB   TOTAL_MB
---------- ---------- ----------- ----------
IND_T1             16          64          1
IND_T1             63       1,024         63
IND_T1            120       8,192        960
IND_T1             10      65,536        640
T1                 16          64          1
T1                 63       1,024         63
T1                120       8,192        960
T1                  1      60,416         59
T1                 83      65,536      5,312

It might be somewhat interesting to mention that we somehow ended up with one odd 59MB extent in the table; otherwise, the extent allocation followed the typical pattern of 16 extents at 64KB, 63 extents at 1MB, 120 extents at 8MB, and then 64MB extents.

Let’s take a quick peek at a couple of the index statistics:

SELECT
  HEIGHT,
  BLOCKS,
  LF_BLKS,
  LF_ROWS,
  DISTINCT_KEYS,
  MOST_REPEATED_KEY,
  PCT_USED
FROM
  INDEX_STATS
WHERE
  NAME='IND_T1';

HEIGHT     BLOCKS    LF_BLKS     LF_ROWS DISTINCT_KEYS MOST_REPEATED_KEY   PCT_USED
------ ---------- ---------- ----------- ------------- ----------------- ----------
     3     212992     208854 100,000,000        10,000           900,324         90

SELECT
  INDEX_TYPE,
  BLEVEL,
  LEAF_BLOCKS,
  DISTINCT_KEYS,
  AVG_LEAF_BLOCKS_PER_KEY,
  AVG_DATA_BLOCKS_PER_KEY,
  CLUSTERING_FACTOR,
  SAMPLE_SIZE
FROM
  USER_INDEXES
WHERE
  INDEX_NAME='IND_T1';

INDEX_TYPE     BLEVEL LEAF_BLOCKS DISTINCT_KEYS AVG_LEAF_BLOCKS_PER_KEY AVG_DATA_BLOCKS_PER_KEY CLUSTERING_FACTOR SAMPLE_SIZE
---------- ---------- ----------- ------------- ----------------------- ----------------------- ----------------- -----------
NORMAL              2     212,350        10,000                      21                  10,074       100,749,838     556,356

Interesting – the clustering factor is higher than the number of rows in the table – maybe we should have sampled 100% of the table and index when collecting statistics.
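
If we wanted to test that idea, the statistics could be regathered with a 100% sample – a sketch, not executed for this article, and on a 100,000,000 row table expect the compute to add a significant amount of time:

EXEC DBMS_STATS.GATHER_TABLE_STATS (OWNNAME=>USER,TABNAME=>'T1',CASCADE=>TRUE, ESTIMATE_PERCENT=>100, METHOD_OPT=>'FOR ALL COLUMNS SIZE 1')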

A quick look at the table statistics:

SELECT
  NUM_ROWS,
  BLOCKS,
  AVG_ROW_LEN,
  SAMPLE_SIZE
FROM
  USER_TABLES
WHERE
  TABLE_NAME='T1';

   NUM_ROWS     BLOCKS AVG_ROW_LEN SAMPLE_SIZE
----------- ---------- ----------- -----------
100,000,000    817,275          54 100,000,000

It might be a good idea to take a look at the TKPROF summaries for the executions.  Just as a reminder, the normal (non-parallel, serial) table scan required 14.39 seconds, the parallel table scan required 12.01 seconds, and the index range scan required 1 minute 24.36 seconds.
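
As a reminder of how such summaries are produced, the raw 10046 trace files (easy to locate in the trace directory thanks to the TRACEFILE_IDENTIFIER values set in the script) are fed through TKPROF with a command line similar to the following – the trace file name shown here is hypothetical:

tkprof orcl_ora_1234_ind_cost_large100.trc ind_cost_large100.txt sys=no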

TKPROF output full table scan: elapsed time 6.18 seconds (8.21 seconds less than what was reported by SQL*Plus) with 5.28 seconds of CPU time:

SELECT
  ID,
  DESCRIPTION
FROM
  T1
WHERE
  ID BETWEEN 1 AND 400

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch    25473      5.28       6.18     813246     838515          0     2547158
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total    25475      5.28       6.18     813246     838515          0     2547158

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 185 

Rows     Row Source Operation
-------  ---------------------------------------------------
2547158  TABLE ACCESS FULL T1 (cr=838515 pr=813246 pw=0 time=6130508 us cost=222369 size=216021546 card=4000399)

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                   25473        0.00          0.02
  db file sequential read                         3        0.00          0.00
  direct path read                             4080        0.00          0.91
  SQL*Net message from client                 25473        0.00          7.00
  db file scattered read                          1        0.00          0.00

TKPROF output parallel full table scan: elapsed time 3.42 seconds (8.59 seconds less than what was reported by SQL*Plus) with 1.63 seconds of CPU time:

SELECT /*+ PARALLEL(T1, 8 ) */
  ID,
  DESCRIPTION
FROM
  T1
WHERE
  ID BETWEEN 1 AND 400

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.04         11         25          0           0
Fetch    25473      1.63       3.42          0          0          0     2547158
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total    25475      1.63       3.47         11         25          0     2547158

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 185 

Rows     Row Source Operation
-------  ---------------------------------------------------
2547158  PX COORDINATOR  (cr=25 pr=11 pw=0 time=2345038 us)
      0   PX SEND QC (RANDOM) :TQ10000 (cr=0 pr=0 pw=0 time=0 us cost=30853 size=216021546 card=4000399)
      0    PX BLOCK ITERATOR (cr=0 pr=0 pw=0 time=0 us cost=30853 size=216021546 card=4000399)
      0     TABLE ACCESS FULL T1 (cr=0 pr=0 pw=0 time=0 us cost=30853 size=216021546 card=4000399)

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  db file sequential read                         3        0.00          0.00
  db file scattered read                          1        0.00          0.00
  os thread startup                               8        0.00          0.02
  PX Deq: Join ACK                                8        0.00          0.00
  PX Deq: Parse Reply                             8        0.00          0.01
  SQL*Net message to client                   25473        0.00          0.02
  PX Deq: Execute Reply                        5495        0.00          1.92
  SQL*Net message from client                 25473        0.00          7.32
  latch free                                      1        0.00          0.00
  PX qref latch                                   1        0.00          0.00
  PX Deq: Signal ACK RSG                          1        0.00          0.00
  PX Deq: Signal ACK EXT                          7        0.00          0.00
  PX Deq: Slave Session Stats                     3        0.00          0.00
  enq: PS - contention                            1        0.00          0.00

TKPROF output index range scan: elapsed time 76.85 seconds (7.51 seconds less than what was reported by SQL*Plus) with 29.28 seconds of CPU time:

SELECT
  ID,
  DESCRIPTION
FROM
  T1
WHERE
  ID BETWEEN 1 AND 400

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch    25473     29.28      76.85     818246    2573595          0     2547158
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total    25475     29.28      76.85     818246    2573595          0     2547158

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 185 

Rows     Row Source Operation
-------  ---------------------------------------------------
2547158  TABLE ACCESS BY INDEX ROWID T1 (cr=2573595 pr=818246 pw=0 time=31008522 us cost=202004 size=216021546 card=4000399)
2547158   INDEX RANGE SCAN IND_T1 (cr=30646 pr=5238 pw=0 time=2313805 us cost=427 size=0 card=4067270)(object id 77111)

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                   25473        0.00          0.03
  db file sequential read                     81689        0.10         32.96
  SQL*Net message from client                 25473        0.00          6.11
  db file parallel read                       36586        0.87         26.09

Numbers from the twilight zone?  But wait, there is more.  The numbers were so illogical that I took the database offline, rebooted, and was immediately greeted with a warning that the RAID 0 array in the new laptop had failed!  Try this experiment at your own risk.

-------------------------------------

So, what have we learned from this blog article?  To get you started, what was the buffer cache hit ratio for each of the executions?  How can you explain the decrease in CPU time when parallel execution was used?
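
As a head start on the first question, one common form of the buffer cache hit ratio formula, applied to the autotrace statistics shown earlier, gives:

BCHR = 1 - (physical reads / (consistent gets + db block gets))

Full table scan:  1 - 813,246 / (838,515 + 0)   = 0.030  (roughly a 3% hit ratio)
Index range scan: 1 - 818,246 / (2,573,595 + 0) = 0.682  (roughly a 68% hit ratio)

The execution that completed almost six times faster produced by far the lower hit ratio.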

—-

Edit August 14, 2010: In case you are wondering, the Oracle database was running on the laptop, not a remote server.  The 11.2.0.1 database was configured with Oracle’s default parameters, with the exception of SGA_TARGET which was set to 12G and PGA_AGGREGATE_TARGET which was set to 1800M.  Oracle Database on the Windows platform supports only direct, asynchronous I/O.  Windows 7 Ultimate running on the laptop does not offer large memory page support, unlike the Server type Windows operating systems.
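
For anyone attempting to reproduce that configuration, the two non-default parameters might be set with something like the following sketch (SCOPE=SPFILE followed by an instance restart is the safe route for a change of this size):

ALTER SYSTEM SET SGA_TARGET=12G SCOPE=SPFILE;
ALTER SYSTEM SET PGA_AGGREGATE_TARGET=1800M SCOPE=SPFILE;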


11 responses

15 08 2010
Micke

Just out of curiosity, what kind of monster laptop did you get..?

regards
/M

15 08 2010
Charles Hooper

It is a Dell Precision M6500 – Dell’s 17″ engineering class laptop with an Intel Core i7 940XM CPU.

The laptop has a lot of nice features. The only faults that I have found with it so far are:
* It needs a SATA 3.0 spec (6Gb/s) hard drive interface
* The slim power supply (power brick) for the laptop is about twice as wide and 1.5 times as long as the power supply for a Dell Inspiron 9400/1705 laptop – for comparison, the power supply for the Dell Inspiron 9400/1705 is probably twice as large as what is found on Dell’s current 17.3″ Inspiron laptops.
* The aluminum case on the laptop has fairly sharp edges and the laptop probably weighs 30% more than a Dell Inspiron 9400/1705 laptop.
* I can’t tell the difference between the high-end LCD screen in this laptop and the high-end LCD in the 3.5 year old Dell Inspiron 9400 laptop, other than the new screen has a matte finish while the LCD in the 9400 is glossy.

I was strongly considering the purchase of a Sager branded Clevo X7200 (http://www.clevo.com.tw/en/products/prodinfo.asp?productid=279) with a 6 core desktop CPU (I was also considering the older Clevo 900f with the same 6 core desktop CPU). The reasons why I did not buy the Sager:
* The X7200 was not available for sale at the end of July, and the X7200 page on Clevo’s site was blank.
* The X7200 only offered 1920 by 1080 resolution in a wide-screen 16:9 aspect ratio, and I suspected that the LCD quality was poor like that of the Clevo 900f.
* Offers USB 3.0, but does not offer SATA 3.0 spec (6Gb/s).
* The 300 watt power supply is supposedly huge, although I have not seen it.
* The laptop weighs close to twice as much as a Dell Inspiron 9400/1705 laptop.
* The laptop is probably 50%, possibly 100% thicker than a Dell Inspiron 9400/1705 laptop.
* I was a little concerned about the quality of construction – would it be sturdy enough to be used as a laptop computer, in one’s lap?
* The 12 cell battery only offers about 1 hour of run time.
* Would a laptop this large fit under an airplane seat when in a large enough laptop bag?

15 08 2010
Aman....

Charles,

Very impressive! Would you please have a look at this model and give your opinion on whether it is worth buying for an Oracle professional who would be running VMs on it?
http://www.sony.co.in/product/vpcz128gg

Regards
Aman….

15 08 2010
Charles Hooper

Aman,

It does not appear that Sony offers that laptop in the United States, but does offer the “Z” series (VPCZ1290X) which appears to have several features in common.

I looked at the link that you provided – it appears to be a very impressive laptop. I do not know much about Sony’s computer/laptop reliability. I know that the quality of most of their consumer electronics products (TV, DVD, Blu-Ray player, etc.) is very high.

When reviewing the specifications, this is what I found:
* The Core i7 CPU is a dual core 32nm processor with hyperthreading support. The 32nm processors were introduced in early 2010 and reduce the amount of power needed for the CPU, which should extend battery life. The Core i7 620 in the laptop should be faster than the Core i7 940XM in my laptop when there are only a couple of active processes (maybe up to 4) in the VMs. The Core i7 620 uses 1066MHz (or 800MHz) memory rather than the faster 1333MHz memory – so memory accesses could be a bit slow when you have several VMs running at the same time. The CPU supports Intel Virtualization Technology, which is required by some VM packages (Microsoft’s free VM no longer requires CPUs to offer this feature).
http://ark.intel.com/Product.aspx?id=43560&processor=i7-620M&spec-codes=SLBPD,SLBPE,SLBTQ
* The laptop uses 4 – 64GB SSD drives in a RAID 0 array. I do not think that 4 standard sized SSD drives will fit into such a small laptop, so these could be very expensive to replace if they break. Having 4 of these drives in the RAID 0 array will double your chances of encountering the problem that I had with the RAID 0 array in my laptop. In theory, this could be a very fast SSD configuration, offering up to 1GB/s of performance, but 64GB drives are typically 25% to 50% as fast as 256GB SSD drives. The number of drives should help when you have multiple VMs running at the same time if the RAID controller is optimized for performance.
* The laptop has an extremely high resolution LCD (13 inches) in a 16:9 aspect ratio, offering the same resolution as some of Dell’s high-end 17.3″ laptops.
* The laptop comes with 8GB of memory, which is the maximum supported by most laptops running 64 bit Windows. How many VMs will you be able to open if you want to allocate 2GB or 4GB per VM? Will that be a problem?
* Wireless and wired network options appear to offer the current standard for highest possible performance (802.11n and gigabit Ethernet)
* The video camera is low resolution at 640 x 480 pixels (0.3 megapixels). This could be too low a resolution for video conferencing.
* The laptop has both a VGA (15 pin D sub) and HDMI (high definition TV) output.
* Offers bluetooth support and USB 2 connectors, no USB 3 connectors.
* Very light weight at 1.43KG (3.15LBS)
* The 5 hour battery will probably provide 3 hours of light use while on battery.

It appears to be a very good laptop, offering a lot of power and capability in a very small package.

16 08 2010
Aman....

Charles,

Many thanks for looking into the laptop and giving the detailed review. I guess there are different variations of this series, which may be available in the United States, as I picked this one from Sony’s India website. About the performance of Sony laptops, there is one known issue with them, which is fan noise, but other than that they are pretty good and, in fact, becoming very popular, at least here in India.

>>The laptop comes with 8GB of memory, which is the maximum supported by most laptops running 64 bit Windows. How many VMs will you be able to open if you want to allocate 2GB or 4GB per VM? Will that be a problem?

I guess that’s the maximum RAM available at the moment in laptops. I shall not run more than 2 VMs at one time, with a max of 2GB RAM given to them, so I believe that shouldn’t impact the host machine too much. What are your thoughts?

>> The laptop uses 4 – 64GB SSD drives in a RAID 0 array. I do not think that 4 standard sized SSD drives will fit into such a small laptop, so these could be very expensive to replace if they break. Having 4 of these drives in the RAID 0 array will double your chances of encountering the problem that I had with the RAID 0 array in my laptop.

Okay, I didn’t get this one. How can the SSDs break? Can you please elaborate on this more, since the laptop is marketed to mobile professionals and this kind of thing shouldn’t happen for any reason. I do agree that Sony must have done something unique to fit those into such a small cabinet.

>> In theory, this could be a very fast SSD configuration, offering up to 1GB/s of performance, but 64GB drives are typically 25% to 50% as fast as 256GB SSD drives. The number of drives should help when you have multiple VMs running at the same time if the RAID controller is optimized for performance.

I won’t be able to do any such tuning myself I guess, at least not without reading about it beforehand. But I am sure, this should be a faster media to work with.

Regards

16 08 2010
Charles Hooper

>> Okay, I didn’t get this one. How can the SSDs break? Can you please this elaborate more since the laptop is projected for moving professionals and this kind of thing shouldn’t happen for any reasons. I do agree that Sony must has done some unique thing to fit those into such small cabinet.

I have been reading a lot about SSD drives in the last couple of months. Apparently, SSD drives do not immediately erase data when files are deleted (or replaced). The operating system, specifically Windows 7, will periodically send a TRIM command to the SSD drives when the drives have not been used for a while (maybe an hour), and that causes the SSD drives to start the cleanup process which helps to restore the SSD drive’s speed to nearly new condition. Not all SSD drives support the TRIM command, and if the drives are connected to a RAID array the TRIM command will very likely never reach the SSD drives (I heard that Intel’s latest driver update corrects this problem when the drives are connected using Intel’s motherboards, but I have not located the driver – possibly http://downloadcenter.intel.com/detail_desc.aspx?agr=N&ProductID=&DwnldID=15251). Some SSD drives support a built-in feature called garbage collection that performs a similar task, even when the drives are connected to a RAID controller – the drives themselves determine when to perform garbage collection.

How can SSD drives fail? I researched this a little after hearing about problems with Intel’s high-end SSD drives. The following are results from a recent search:
http://forum.notebookreview.com/sony/474808-brand-new-sony-vaio-z-2010-128gb-ssd-raidfailed.html
http://forum.notebookreview.com/sony/509224-sony-vaio-z-128gb-ssd-raid-0-failing-frequently.html
(A lot of good information) http://forum.notebookreview.com/sony/469272-sony-z11-long-term-ssd-performance-post-your-results.html
http://www.techradar.com/news/computing-components/storage/ssd-failure-rate-is-10-20-per-cent–268368
http://communities.intel.com/thread/5763
http://www.tomshardware.com/news/intel-firmware-ssd-bug-x25m,9016.html
http://www.ocztechnologyforum.com/forum/archive/index.php/t-52969.html
http://ssdtechnologyforum.com/threads/559-One-of-my-Patriot-Torqx-SSD-died-(Common-Patriot-SSD-failure-issue)
http://forum.crucial.com/t5/Solid-State-Drives-SSD/Crucial-CT256M225-SSD-drive-crash-failure-on-MacBook-Pro/td-p/5003

The most common reason for failure appears to be software related – the software that is built into the SSD drives has bugs, or is “flashed” by the owner to an updated version. Other causes include a bad batch of memory chips, a faulty capacitor (several 3 year old motherboards in Dell computers have had this problem, requiring a motherboard replacement), exceeding the maximum number of re-writes for a block (this should be rare if the drives are not full, since the SSD drive should try to even out the wear on all memory cells), and probably a couple of other causes for SSD drive failure.

I still do not know what happened to my laptop when I restarted it after the test for this blog article – only one drive was indicated as attached to the computer following the BIOS POST operation. I rebooted again and both drives appeared, but I received a message that no boot device was available. I went into the BIOS setup, where I found that booting from the hard drive was not an option. I rebooted again and the laptop was fine.

16 08 2010
joel garry

I’m wondering about that “even out the wear on all memory cells” in terms of Oracle controlfile and data file header blocks. Does that mean you inevitably get into a state where those blocks are continuously running around the SSD? Just thinking about that wears me out.

17 08 2010
Charles Hooper

Joel,

That is one of my concerns as well.

The less expensive, higher capacity MLC chips in most SSD devices offer fewer write cycles (by several orders of magnitude) than the SLC chips in some of the higher-end SSD devices (I think that Intel’s SSDs are SLC, although they recently partnered with Micron/Crucial to develop new 34nm chips for SSDs – I believe that Crucial’s high-end C300 line uses MLC chips, so I wonder if future Intel SSDs will use MLC also). It might be interesting to see if a file “update” on a particular SSD model is handled like an Oracle index update, where an index update is actually a delete followed by an insert – and the new index entry cannot occupy the same position as the old (unless there was a commit in between). If that is the case, wear leveling might still help. But of course this also raises the question of how datafile updates might be interpreted by the SSD.

I wonder if these might help explain some items, just found during a search:
http://en.wikipedia.org/wiki/Wear_leveling
http://en.wikipedia.org/wiki/Solid-state_drive

http://www.computerworld.com/s/article/9180278/Micron_ships_its_first_enterprise_class_SSD
“Because of the SLC NAND and write-leveling software, which spreads data out evenly so as to not wear out any one area of the drive, the P300 [200MB? drive] is rated to sustain over 3.5PB of data over a five-year period, or 1.9TB of write data every day for five years, Mankin said.”

It almost sounds like repeatedly updating a file will intentionally “fragment” the file, scattering the data all over the drive. It would seem that after a while this should degrade read performance – unless of course read performance is already constrained to a lower threshold by the SATA interface (or PCI Express bus or some other internal limit in the server).

20 08 2010
Aman....

Charles,

I did a bit more research and this is what I found where the project manager of Vaio Z series, explained about the underlying technology.
http://translate.google.com/translate?js=y&prev=_t&hl=en&ie=UTF-8&layout=1&eotf=1&u=http://chinese.engadget.com/2010/03/04/vaio-z-explained-by-project-leader/&sl=zh-CN&tl=en&swap=1

I know I am sounding a little fussy here, but I am not sure (yet) whether it would be the right purchase or not, and the reason for this confusion is just the SSD. I know it is good and fast, but it is a new piece of technology which I have never used – that is why I am more worried. I loved the rest of the notebook, and have read a couple of other reviews as well – so far, so good! What do you recommend, will it be the right purchase? I am afraid that the HDD model is not there and won’t be there as well, so if I have to get it, the only choice is the SSD!

PS: I have got my copy of “Expert Oracle Practices” from the Oak Table 🙂 . I had a quick glance at the chapter co-authored by you and Randolf and it just appeared “awesome”. I hope it will sink into my slow mind slowly 🙂 . Congratulations on the book, and I hope I shall write a review about it soon :).

23 08 2010
Charles Hooper

Aman,

I saw your comment here:
http://www.freelists.org/post/oracle-l/data-buffer-access-confusion
“Is it a correct thing to say that when the data buffer is being read from the disk, its first kept in the PGA and then later on, it would be copied from the PGA memory to the standard buffer cache?”

And the follow-up here:
http://www.freelists.org/post/oracle-l/data-buffer-access-confusion,2
“I asked this question because this came up at two different places, one in a conversation with an oracle professional and the 2nd, in a book which I have just purchased.”

Which book did you read that stated a block read into the PGA could be copied into the SGA? I read that somewhere also, and I thought that I wrote a comment about it, but I cannot find that comment right now. I am disappointed/frustrated that I cannot remember where I saw that statement.

20 08 2010
Charles Hooper

Aman,

Thank you for posting additional information about the laptop computer. If I understand the article correctly, there are actually only two SSD drives in the laptop (a total of two SATA 2 interfaces in use). Each of the two SSD drives is actually composed of two 64GB memory banks that are probably treated like dual channel SDRAM memory – or an internal RAID 0 array if you want to look at it that way. I understand that some of the SSD manufacturers are creating their SSD drives with a similar feature (treated like dual channel SDRAM memory) to help improve performance. While the manufacturer advertises 256GB of storage space, the actual storage space will probably be between 220GB and 230GB, because the SSD will likely reserve some of its storage capacity for use when some of the memory cells start failing. It sounds like the throughput is roughly the same as a single SSD drive with a SATA 2 connection – about 250MB per second read speed, and probably much slower write performance.

Will 220GB be enough storage capacity for all of the virtual machines’ operating systems, Oracle binaries, Oracle documentation, trace files, and database files? My 3.5 year old laptop has an upgraded 320GB hard drive and only 29GB of storage space available – and there are no videos, MP3s, or other files that take up a lot of room. My new laptop has 2 – 256GB Crucial C300 SSD drives in a RAID 0, and provides 476GB of storage space. The read throughput of one of these Crucial SSD drives when attached to a SATA 3 (6Gb/s) connection is a bit more than 360MB/s, but only about 250MB/s when connected to a SATA 2 connection. For comparison, a fast 7200 RPM hard drive might be able to achieve 130MB/s, and might offer between 500GB and 750GB of storage capacity in a size small enough for a laptop (2.5 inch form factor).

I suggest finding one of the Sony laptops and trying to type on it for a while. The keyboards on some laptops are extremely poor quality. My old Dell laptop has a very nice keyboard, even though some of the keys do not make contact very well (this might cause missing letters). My new Dell laptop has a very nice keyboard, but there is a pointer stick (like the old IBM Thinkpad laptops) in the center of the keyboard. It seems like I bump that pointer stick frequently, which causes the cursor to jump and then I start typing into the wrong part of the document. My Toshiba netbook has an OK keyboard – it even has full size keys on the keyboard, but I can feel the keys flex a little when typing. My boss at work just bought an HP laptop, where the keys have square corners, rather than tapered edges – that seems to make it easy to hit the wrong key on the keyboard.

I look forward to reading your review of the book. The first chapter that Randolf and I wrote was originally expected to be a quick 5 or 10 page introduction to the different options, and then the remaining 20 or 25 pages were intended to describe when someone should use one method (such as Statspack) or another (such as a review of a raw 10046 trace file) – there was only supposed to be a single chapter. If you saw the first of the two chapters, you probably know that the material for the 10 page introduction idea actually became the strength of the two chapters – the number of pages grew significantly. In my opinion, the 10053 trace and 10046 trace sections are the most impressive parts of the first chapter. Check the typos in Chapter 8 here – most of these typos were caught before printing, but the changes could not be incorporated at that point:
https://hoopercharles.wordpress.com/2010/01/01/expert-oracle-practices-oracle-database-administration-from-the-oak-table-book/
