October 9, 2015
In 2006 I did a bit of experimentation with an Oracle characterset that supports Unicode (AL32UTF8, if I remember correctly) using Oracle Database 10.2.0.2. That test, performed on a desktop computer with a bit less than 1GB of RDRAM (Rambus) memory, was a huge failure, and not just because the computer did not have enough memory to meet the minimum requirements for Oracle Database 10.2.0.2. Sure, database access was slower with a characterset that stores some characters in one byte, others in two bytes, and the rest in three bytes. There was another, more serious problem. I put together this article mostly as a reminder to myself of what I tested 9.5 years ago.
I am in the process of testing an upgrade to the most recent version of the ERP package used by the company where I work – in addition to being an Oracle DBA for the company, I am also the IT Manager and in-house expert for the ERP system. Reading through the ERP installation documentation, I found the following:
“It is highly recommended that you perform the upgrade procedures in a test environment before you upgrade your production environment. The upgrade procedure potentially involves deleting your existing database instance. Make sure you understand the ramifications of deleting the database instance before you upgrade your production environment… If you intend to use the same database instance name with your 8.0.0 installation as you did with your previous VISUAL installation, delete your existing database instance. Do not delete your existing instance until after you have verified that the data has been exported properly…”
“Create a new Oracle Instance. In 8.0.0, the AL32UTF8 character set and the AL16UTF16 universal national character set are used. You must create a new Oracle instance to specify these character sets.”
So, the ERP company would like me to export the production WE8MSWIN1252 characterset (AL16UTF16 national characterset) database using Data Pump, DROP DATABASE, create a new database with the AL32UTF8 characterset (AL16UTF16 national characterset), import the database using Data Pump, and then start the ERP upgrade conversion utility. The statement “you must create a new Oracle instance to specify these character sets” was a bit of a red flag. Something is not right with that statement.
If I followed those directions, something would certainly take a bite out of some of the data in a few critical tables:
SELECT * FROM NLS_DATABASE_PARAMETERS ORDER BY PARAMETER;

PARAMETER                VALUE
------------------------ ----------------------------
NLS_CALENDAR             GREGORIAN
NLS_CHARACTERSET         WE8MSWIN1252
NLS_COMP                 BINARY
NLS_CURRENCY             $
NLS_DATE_FORMAT          DD-MON-RR
NLS_DATE_LANGUAGE        AMERICAN
NLS_DUAL_CURRENCY        $
NLS_ISO_CURRENCY         AMERICA
NLS_LANGUAGE             AMERICAN
NLS_LENGTH_SEMANTICS     BYTE
NLS_NCHAR_CHARACTERSET   AL16UTF16
NLS_NCHAR_CONV_EXCP      FALSE
NLS_NUMERIC_CHARACTERS   .,
NLS_RDBMS_VERSION        188.8.131.52.0
NLS_SORT                 BINARY
NLS_TERRITORY            AMERICA
NLS_TIMESTAMP_FORMAT     DD-MON-RR HH.MI.SSXFF AM
NLS_TIMESTAMP_TZ_FORMAT  DD-MON-RR HH.MI.SSXFF AM TZR
NLS_TIME_FORMAT          HH.MI.SSXFF AM
NLS_TIME_TZ_FORMAT       HH.MI.SSXFF AM TZR
NLS_LENGTH_SEMANTICS: BYTE – so why is that a problem?
In this particular ERP system, which is designed for use by manufacturers, one of the critical tables in the database is named PART – there are quite a number of defined foreign keys in the database that point back to this particular table. What do the columns in this table look like before the conversion?
SQL> DESC PART
Name                    Null?    Type
----------------------- -------- ------------
ID                      NOT NULL VARCHAR2(30)
DESCRIPTION                      VARCHAR2(40)
STOCK_UM                NOT NULL VARCHAR2(15)
PLANNING_LEADTIME       NOT NULL NUMBER
ORDER_POLICY            NOT NULL CHAR(1)
ORDER_POINT                      NUMBER(14,4)
SAFETY_STOCK_QTY                 NUMBER(14,4)
FIXED_ORDER_QTY                  NUMBER(14,4)
...
Where is the potential problem with exporting everything, dropping the database, creating a new database with a different characterset, and then importing the data? When I tested the procedure in 2006, I found that one or two rows would not re-import into the PART table because the values of some columns were too long. Too long? Funny things (or not so funny things) happen when the foreign keys are created after the table data import if one or two rows referenced by other tables are not in the PART table. It took me a couple of minutes to recreate the SQL statement that I set up in 2006 to see what caused the import problem – that SQL statement is shown below:
SELECT
  ID,
  DESCRIPTION,
  LENGTHB(DESCRIPTION) CUR_BYTE_LEN,
  LENGTHB(CONVERT(DESCRIPTION, 'AL32UTF8', 'WE8MSWIN1252')) NEW_BYTE_LEN
FROM
  PART
WHERE
  LENGTHB(CONVERT(DESCRIPTION, 'AL32UTF8', 'WE8MSWIN1252')) > LENGTHB(DESCRIPTION);
The above SQL statement instructs Oracle Database to take a database column’s values (DESCRIPTION), consider the text stored in that column to be in the WE8MSWIN1252 characterset, translate the column values to the AL32UTF8 characterset, and then compare the byte lengths of the translated column values with the byte lengths of the original column. If there are differences, the primary key value (ID), DESCRIPTION, and column lengths of the DESCRIPTION column before and after conversion are output. The output was slightly different using Oracle SQL Developer, SQL*Plus on a Windows client, and SQL*Plus on a Synology NAS connected by a Windows Telnet session, so I thought that I would show screen captures of the output from all three utilities.
The DESCRIPTION column output of Oracle SQL Developer matches what appears in the ERP system. The degree character (°), center line/diameter character (Ø – might also be considered the empty set character), and registered trademark character (®) require a single byte for storage in the WE8MSWIN1252 characterset, but two bytes for storage in the AL32UTF8 characterset. Once translated to the AL32UTF8 characterset, the description for part ID LINED021274 will be too long to fit into the maximum 40 byte long DESCRIPTION column. Several utilities exist for scanning a database to identify similar characterset conversion issues.
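The same byte-length expansion can be confirmed outside the database with Python's codecs: cp1252 corresponds to Oracle's WE8MSWIN1252 characterset and UTF-8 to AL32UTF8. This is just a sanity-check sketch; the description string below is a hypothetical example (not the actual LINED021274 row), constructed to contain one of the problem characters.

```python
# Sanity check: the same characters that are 1 byte in cp1252 (WE8MSWIN1252)
# need 2 bytes in UTF-8 (AL32UTF8).
for ch in ("\u00b0", "\u00d8", "\u00ae"):  # degree, diameter/empty set, registered trademark
    print(repr(ch), len(ch.encode("cp1252")), len(ch.encode("utf-8")))

# A hypothetical part description containing one diameter character:
# the UTF-8 byte length is one greater than the cp1252 byte length, which is
# how a description that fits in VARCHAR2(40) BYTE can fail to import.
description = "LINED SHAFT, 1.25\u00d8 X 39.500, TYPE 304SS"
print(len(description), len(description.encode("cp1252")), len(description.encode("utf-8")))
```

Each additional non-ASCII character of this kind adds one more byte after conversion, so a description already at the 40-byte limit overflows the column.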
The Oracle Database 11.2 documentation lists three ways to change the characterset of the database:
- Migrating Character Data Using a Full Export and Import
- Migrating a Character Set Using the CSALTER Script
- Migrating Character Data Using the CSALTER Script and Selective Imports
I started wondering whether it really was necessary, in this case, to change the database characterset. I examined the table creation script that the ERP system installer would use to create a new database – below is a portion of the script to create the PART table:
CREATE TABLE PART(
  ID NVARCHAR2(30) NOT NULL,
  DESCRIPTION NVARCHAR2(40),
  STOCK_UM NVARCHAR2(15) NOT NULL,
  PLANNING_LEADTIME INTEGER DEFAULT 0 NOT NULL,
  ORDER_POLICY NCHAR(1) DEFAULT ' ' NOT NULL,
  ORDER_POINT NUMBER(14,4),
  ORDER_UP_TO_QTY NUMBER(14,4),
  SAFETY_STOCK_QTY NUMBER(14,4),
  FIXED_ORDER_QTY NUMBER(14,4),
...
Well, that table creation script would generate a slightly different table than the DESC PART output that I posted above – the VARCHAR2 columns would be created as NVARCHAR2, and the CHAR columns would be created as NCHAR. So, the new version of the ERP system will be working with NVARCHAR2 and NCHAR columns, which are already set up to use the AL16UTF16 national characterset.
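That is why the length problem disappears on the national characterset side: AL16UTF16 is big-endian UTF-16, which stores every character in the Basic Multilingual Plane in a fixed two bytes, so the troublesome characters no longer expand relative to plain ASCII text. A quick check with Python's utf-16-be codec (again just a sanity-check sketch, outside the database):

```python
# AL16UTF16 is big-endian UTF-16: every BMP character, ASCII or not,
# occupies exactly 2 bytes, so no character "expands" relative to the others.
for ch in ("A", "\u00b0", "\u00d8", "\u00ae"):
    assert len(ch.encode("utf-16-be")) == 2
print("all sample BMP characters are 2 bytes in UTF-16BE")
```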
I set up a test version of the database using the same Oracle Database version, and instructed the ERP conversion software to upgrade from version 6.5.2 to version 8.0 (temporarily passing the ERP database through versions 6.5.3, 6.5.4, 7.0, 7.1.1, and 7.1.2). I measured the amount of redo generated to make certain that I would not run out of disk space in the fast (flash) recovery area during the production database upgrade (because I still had the test database in noarchivelog mode, I simply checked the redo size statistic in V$SYSSTAT at various points during the conversion).
The redo generated to upgrade from version 6.5.2 through version 7.1.2 was about 1.75 times the byte size of the tables and indexes in the ERP schema in the database. The redo generated to upgrade from version 7.1.2 to 8.0 (the next version) required approximately 27.06 times the byte size of the tables and indexes in the ERP schema in the database! To put those numbers in perspective with simple numbers, assume that the byte size of the tables and indexes in the ERP schema in the database is about 100GB. The upgrade from version 6.5.2 to 7.1.2 will generate approximately 175GB of redo, so there must be at least that much space in the fast (flash) recovery area for the archived redo logs generated, if that is where the archived redo logs are placed. The first RMAN backup might also backup the archived redo logs, and if those backups are also stored in the fast (flash) recovery area there will be a need for at least an additional 175GB in the fast (flash) recovery area. For the upgrade from version 7.1.2 to 8.0 approximately 2706GB of redo would be generated, and again the first RMAN backup might also include those archived redo logs.
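The sizing arithmetic above can be reduced to a couple of multiplications; this sketch just applies the measured ratios to the assumed 100GB schema, with the doubling for the first RMAN backup being the assumption described in the paragraph:

```python
# Back-of-the-envelope FRA sizing using the measured redo ratios.
schema_gb = 100.0                     # assumed size of ERP tables + indexes
redo_652_to_712 = schema_gb * 1.75    # redo for the 6.5.2 -> 7.1.2 steps (~175 GB)
redo_712_to_80 = schema_gb * 27.06    # redo for the 7.1.2 -> 8.0 step (~2706 GB)

# If the first RMAN backup also writes the archived redo logs into the FRA,
# roughly double each figure when sizing the recovery area.
fra_needed_80 = 2 * redo_712_to_80
print(redo_652_to_712, redo_712_to_80, fra_needed_80)
```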
What is happening during the 7.1.2 to 8.0 upgrade? For each SYSADM (schema owner) table, the upgrade finds all of the CHAR and VARCHAR2 columns, sorts those columns alphabetically, and then processes them one at a time: it creates a new column (with a _NEW suffix in the column name) defined as NCHAR or NVARCHAR2, executes a SQL statement to set the new column value to the original column value, drops the original column, and renames the new column to match the old column name. This process is repeated for each of the CHAR and VARCHAR2 columns in the table before moving on to the next table. This is likely NOT the ideal way to change the column data types: it generates enormous amounts of redo because each data block is modified once per converted column (and it also likely leads to a lot of row chaining). If you are correctly running the database in archived redo log mode, it could very well completely fill your fast (flash) recovery area several times over with archived redo logs. The approach also causes the tables' primary key columns to be scattered through the table definition – with Oracle Database it is faster to access the primary key columns during a full table scan if those columns are the first columns in the table definition (reference and the follow-up article).
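The per-column shuffle just described can be sketched as a generator for the implied DDL sequence. This is a hypothetical reconstruction inferred from the observed behavior, not the vendor's actual conversion script; the UPDATE step is the part that rewrites every row (and therefore every block) once per converted column:

```python
# Hypothetical reconstruction of the per-column conversion the 7.1.2 -> 8.0
# upgrade appears to perform (inferred from observed behavior, not taken
# from the vendor's conversion utility).
def column_conversion_ddl(table, column, new_type):
    new_col = f"{column}_NEW"
    return [
        f"ALTER TABLE {table} ADD ({new_col} {new_type})",
        f"UPDATE {table} SET {new_col} = {column}",          # rewrites every row
        f"ALTER TABLE {table} DROP COLUMN {column}",
        f"ALTER TABLE {table} RENAME COLUMN {new_col} TO {column}",
    ]

# Example: the sequence for a single column of the PART table.
for stmt in column_conversion_ddl("PART", "DESCRIPTION", "NVARCHAR2(40)"):
    print(stmt + ";")
```

Repeating that four-statement cycle for every CHAR/VARCHAR2 column of every table is what drives the 27x redo figure, and the alphabetical processing order is what leaves the surviving columns sorted by name.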
There are two other methods that I can think of right now that might have been used to reduce the amount of redo generated and preserve the column order in the tables during the 7.1.2 to 8.0 upgrade (I wonder if the ERP vendor would be willing to modify their approach?). They could have done something like this to simply change all of the column definitions for a single table all at once in a single ALTER TABLE command:
ALTER TABLE MYCUSTOM.AUDIT_SHEET MODIFY(
  WORKORDER_TYPE NCHAR(1),
  WORKORDER_BASE_ID NVARCHAR2(30),
  WORKORDER_LOT_ID NVARCHAR2(3),
  WORKORDER_SPLIT_ID NVARCHAR2(3),
  WORKORDER_SUB_ID NVARCHAR2(3),
  PART_ID NVARCHAR2(30),
  CUSTOMER_ID NVARCHAR2(15)
...);
A second option is to create a new table (with a _NEW suffix in the name) with all of the columns from the original table (just with CHAR changed to NCHAR and VARCHAR2 changed to NVARCHAR2), execute a statement such as INSERT INTO PART_NEW SELECT * FROM PART;, drop the old table with the PURGE option, and then rename the new table to the old table name.
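Sketched the same way as a hypothetical DDL generator, this second option writes each row once per table rather than once per converted column, and it preserves the original column order because the _NEW table is created with the full column list up front:

```python
# Hypothetical sketch of the "new table, single INSERT, drop, rename"
# alternative; each data block is written once per table instead of once
# per CHAR/VARCHAR2 column.
def table_swap_ddl(table, column_defs):
    return [
        f"CREATE TABLE {table}_NEW ({', '.join(column_defs)})",
        f"INSERT INTO {table}_NEW SELECT * FROM {table}",
        f"DROP TABLE {table} PURGE",
        f"ALTER TABLE {table}_NEW RENAME TO {table}",
    ]

# Example with the first two PART columns from the installer script.
for stmt in table_swap_ddl("PART", ["ID NVARCHAR2(30) NOT NULL", "DESCRIPTION NVARCHAR2(40)"]):
    print(stmt + ";")
```

A real implementation would also have to recreate the indexes, constraints, triggers, and grants on the new table, which is presumably part of why the vendor chose the in-place approach.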
Due to the method implemented, the converted PART table after the upgrade to version 8.0 appears something like this, where the primary key column (ID) is now the 68th column in the table definition:
SQL> DESC PART
Name                      Null?    Type
------------------------- -------- --------------
PLANNING_LEADTIME         NOT NULL NUMBER
ORDER_POINT                        NUMBER(14,4)
SAFETY_STOCK_QTY                   NUMBER(14,4)
FIXED_ORDER_QTY                    NUMBER(14,4)
DAYS_OF_SUPPLY                     NUMBER
...
DEF_SLS_TAX_GRP_ID                 NVARCHAR2(15)
DEMAND_HISTORY            NOT NULL NCHAR(1)
DESCRIPTION                        NVARCHAR2(40)
DETAIL_ONLY               NOT NULL NCHAR(1)
DIMENSIONS_UM                      NVARCHAR2(15)
...
HTS_CODE                           NVARCHAR2(20)
ID                        NOT NULL NVARCHAR2(30)
INSPECTION_REQD           NOT NULL NCHAR(1)
Other tables, such as the OPERATION table, have composite primary keys composed of character and number columns. After the conversion to 8.0, the numeric SEQUENCE_NO column (the last column of the composite primary key WORKORDER_TYPE, WORKORDER_BASE_ID, WORKORDER_LOT_ID, WORKORDER_SPLIT_ID, WORKORDER_SUB_ID, SEQUENCE_NO) is listed first in the table definition, with the other primary key columns scattered in alphabetic order after the numeric and date columns.
Thoughts and opinions about the above? Am I being overly critical? I did not mention that it took about five or six tries for a successful ERP upgrade – two tries failed due to a bug in the conversion utility, and the other failed attempts were caused by the conversion utility stumbling over extensions (mostly logging triggers) that I added to the database over the years (but disabled before the ERP upgrade).
This blog is currently at 1,001,992 views, having crossed the 1,000,000 mark a couple of days ago. I would like to take this time to say thanks to the readers of the blog for visiting, and thanks to the ERP vendor for giving me something a bit different to think about, even if it was something that I thought about almost a decade ago.