Characterset Changes

9 10 2015

October 9, 2015

In 2006 I did a bit of experimentation with an Oracle characterset that supports Unicode (AL32UTF8 if I remember correctly) using Oracle Database  That test, performed on a desktop computer with a bit less than 1GB of RDRAM (Rambus) memory, was a huge failure, and not just because the computer did not have enough memory to meet the minimum requirements for Oracle Database  Sure, the database access was slower with a characterset that sometimes used one byte to store a single character, two bytes for other characters, and three bytes to store each of the remaining characters.  There was another, more serious problem.  I put together this article mostly as a reminder to myself to recall what I tested 9.5 years ago.

I am in the process of testing an upgrade to the most recent version of the ERP package used by the company where I work – in addition to being an Oracle DBA for the company, I am also the IT Manager and in-house expert for the ERP system.  Reading through the ERP installation documentation, I found the following:

“It is highly recommended that you perform the upgrade procedures in a test environment before you upgrade your production environment. The upgrade procedure potentially involves deleting your existing database instance. Make sure you understand the ramifications of deleting the database instance before you upgrade your production environment…  If you intend to use the same database instance name with your 8.0.0 installation as you did with your previous VISUAL installation, delete your existing database instance. Do not delete your existing instance until after you have verified that the data has been exported properly…”

“Create a new Oracle Instance. In 8.0.0, the AL32UTF8 character set and the Al16UTF16 universal national character set are used. You must create a new Oracle instance to specify these character sets.”

So, the ERP company would like for me to export the production WE8MSWIN1252 characterset (AL16UTF16 national characterset) database using Datapump, DROP DATABASE, create a new database with the AL32UTF8 characterset (AL16UTF16 national characterset), import the database using Datapump, and then start the ERP upgrade conversion utility.  The statement “you must create a new Oracle instance to specify these character sets,” was a bit of a red flag.  Something is not right with that statement.

If I followed those directions, something would certainly take a bite out of some of the data in a few critical tables:

PARAMETER                VALUE
------------------------ ----------------------------
NLS_COMP                 BINARY
NLS_CURRENCY             $
NLS_SORT                 BINARY

NLS_LENGTH_SEMANTICS: BYTE – so why is that a problem?

In this particular ERP system, which is designed for use by manufacturers, one of the critical tables in the database is named PART – there are quite a number of defined foreign keys in the database that point back to this particular table.  What do the columns in this table look like before the conversion?

 Name                    Null?    Type
 ----------------------- -------- ------------
 ID                      NOT NULL VARCHAR2(30)
 DESCRIPTION                      VARCHAR2(40)
 STOCK_UM                NOT NULL VARCHAR2(15)
 ORDER_POINT                      NUMBER(14,4)
 SAFETY_STOCK_QTY                 NUMBER(14,4)
 FIXED_ORDER_QTY                  NUMBER(14,4)

Where is the potential problem with exporting everything, dropping the database, creating a new database with a different characterset, and then importing the data.  When I tested the procedure in 2006 I found that one or two rows would not re-import into the PART table due to the values of some columns being too long.  Too long?  Funny things (or not so funny things) happen when the foreign keys are created after the table data import if one or two rows referenced by other tables are not in the PART table.  It took me a couple of minutes to recreate the SQL statement that I set up in 2006 to see what caused the import problem – that SQL statement is shown below:


The above SQL statement instructs Oracle Database to take a database column’s values (DESCRIPTION), consider the text stored in that column to be in the WE8MSWIN1252 characterset, translate the column values to the AL32UTF8 characterset, and then compare the byte lengths of the translated column values with the byte lengths of the original column.  If there are differences, the primary key value (ID), DESCRIPTION, and column lengths of the DESCRIPTION column before and after conversion are output.  The output was slightly different using Oracle SQL Developer, SQL*Plus on a Windows client, and SQL*Plus on a Synology NAS connected by a Windows Telnet session, so I thought that I would show screen captures of the output from all three utilities.

Oracle SQL Developer:

SQL*Plus on a Windows client:

SQL*Plus on a Synology NAS connected by a Windows Telnet session:

The DESCRIPTION column output of Oracle SQL Developer matches what appears in the ERP system.  The degree character (°), center line/diameter character (Ø – might also be considered the empty set character), and registered trademark character (®) require a single byte for storage in the WE8MSWIN1252 characterset, but two bytes for storage in the AL32UTF8 characterset.  Once translated to the AL32UTF8 characterset, the description for part ID LINED021274 will be too long to fit into the maximum 40 byte long DESCRIPTION column.  Several utilities exist for scanning a database to identify similar characterset conversion issues.

The Oracle Database 11.2 documentation lists three ways to change the characterset of the database:

  • Migrating Character Data Using a Full Export and Import
  • Migrating a Character Set Using the CSALTER Script
  • Migrating Character Data Using the CSALTER Script and Selective Imports

I started wondering if it really was necessary, in this case, to change the database characterset?  I examined the table creation script that the ERP system installer would use to create a new database – below is a portion of the script to create the PART table:


Well, that table creation script would generate a bit different table when compared with the DESC PART output that I posted above – the VARCHAR2 columns would be created as NVARCHAR2, and the CHAR columns would be created as NCHAR.  So, the new version of the ERP system will be working with NVARCHAR2 and NCHAR columns, which are already setup to use the AL16UTF16 national characterset.

I setup a test version of the database using the same Oracle Database version, and instructed the ERP conversion software to upgrade from version 6.5.2 to version 8.0 (temporarily passing the ERP database through versions 6.5.3, 6.5.4, 7.0, 7.1.1, and 7.1.2).  I measured the amount of redo generated to make certain that I would not run out of disk space in the fast (flash) recovery area during the production database upgrade (because I still had the test database in noarchivelog mode, I simply checked the redo size statistic in V$SYSSTAT at various points during the conversion).

The redo generated to upgrade from version 6.5.2 through version 7.1.2 was about 1.75 times the byte size of the tables and indexes in the ERP schema in the database.  The redo generated to upgrade from version 7.1.2 to 8.0 (the next version) required approximately 27.06 times the byte size of the tables and indexes in the ERP schema in the database!  To put those numbers in perspective with simple numbers, assume that the byte size of the tables and indexes in the ERP schema in the database is about 100GB.  The upgrade from version 6.5.2 to 7.1.2 will generate approximately 175GB of redo, so there must be at least that much space in the fast (flash) recovery area for the archived redo logs generated, if that is where the archived redo logs are placed.  The first RMAN backup might also backup the archived redo logs, and if those backups are also stored in the fast (flash) recovery area there will be a need for at least an additional 175GB in the fast (flash) recovery area.  For the upgrade from version 7.1.2 to 8.0 approximately 2706GB of redo would be generated, and again the first RMAN backup might also include those archived redo logs.

What is happening during the 7.1.2 to 8.0 upgrade?  The 7.1.2 to 8.0 upgrade finds all of the CHAR and VARCHAR2 columns in all SYSADM (schema owner) tables, and for each table, sorts those columns alphabetically, creates a new column for one of those columns (with a _NEW suffix in the column name) that is defined as NCHAR or NVARCHAR2, executes a SQL statement to set the new column value to be the same as the original column value, drops the original column, and renames the new column so that it matches the old column name.  This process is then repeated for each of the CHAR and VARCHAR2 columns in the table, before moving onto the next table.  This is likely NOT the ideal way to change the column data types as this method generates enormous amounts of redo due to the number of times each data block is modified (and also likely leads to a lot of row chaining), which if you are correctly running the database in archived redo log mode, could very well completely fill your fast (flash) recovery area several times over with the archived redo logs.  The approach implemented also causes the tables’ primary key columns to be scattered though the table definition – with Oracle Database it is faster to access the primary key columns during a full table scan if those columns are the first columns in the table definition (reference and the follow up article).

There are two other methods that I can think of right now that might have been used to reduce the amount of redo generated and preserve the column order in the tables during the 7.1.2 to 8.0 upgrade (I wonder if the ERP vendor would be willing to modify their approach?).  They could have done something like this to simply change all of the column definitions for a single table all at once in a single ALTER TABLE command:


A second option is to create a new table (with a _NEW suffix in the name) with all of the columns from the original table (just with CHAR changed to NCHAR and VARCHAR2 changed to NVARCHAR2), perform a statement such as INSERT INTO PART_NEW SELECT * FROM PART;, DROP PURGE the old table, then rename the new table as the old table.

Due to the method implemented, the converted PART table after the upgrade to version 8.0 appears something like this, where the primary key column (ID) is now the 68th column in the table definition:

 Name                      Null?    Type
 ------------------------- -------- --------------
 ORDER_POINT                        NUMBER(14,4)
 SAFETY_STOCK_QTY                   NUMBER(14,4)
 FIXED_ORDER_QTY                    NUMBER(14,4)
 DAYS_OF_SUPPLY                     NUMBER
 DEF_SLS_TAX_GRP_ID                 NVARCHAR2(15)
 DESCRIPTION                        NVARCHAR2(40)
 DETAIL_ONLY               NOT NULL NCHAR(1)
 DIMENSIONS_UM                      NVARCHAR2(15)
 HTS_CODE                           NVARCHAR2(20)
 ID                        NOT NULL NVARCHAR2(30)

Other tables, such as the OPERATION table, have composite primary keys that are composed of character and number columns – after the conversion to 8.0 the numeric column (the last column in the composite primary key – WORKORDER_TYPE, WORKORDER_BASE_ID, WORKORDER_LOT_ID, WORKORDER_SPLIT_ID, WORKORDER_SUB_ID, SEQUENCE_NO) is listed first in the table definition, with the other columns of the primary key scattered by alphabetic order after the numeric and date columns.

Thoughts and opinions about the above?  Am I being overly critical?  I did not mention that it took about five or six tries for a successful ERP upgrade – two tries failed due to a bug in the conversion utility, and the other failed attempts were caused by the conversion utility stumbling over extensions (mostly logging triggers) that I added to the database over the years (but disabled before the ERP upgrade).

This blog is currently at 1,001,992 views, having crossed the 1,000,000 mark a couple of days ago.  I would like to take this time to say thanks to the readers of the blog for visiting, and thanks to the ERP vendor for giving me something a bit different to think about, even if it was something that I thought about almost a decade ago.



3 responses

14 10 2015
Jose Rodriguez

It stunts me to find out how ERP vendors, in general, seem to be unable to hire a development DBA who may improve these things “in house” instead of having their customers to deal with a quite enough complicated process, and the burdens the lack of knowledge, from the ERP side, on how to do things at DB base more efficiently.

Now something I don’t understand of your process: why would you have to DB in archivelog mode? I would understand the process as an atomic one, either upgrade or not so, no archivelog and snapshots of the DB before, during if possible to pause the process, and after.

14 10 2015
Charles Hooper

Thanks for the comments. This particular ERP system used to support Oracle Database, SQL Server, and Centura (Gupta/Unify) SQLBase – support for SQLBase was dropped a couple of years ago. Infor (the company that offers the ERP system) is pushing customers that had been using SQLBase primarily to SQL Server (Oracle Corp. is a competitor in the ERP market, but so is Microsoft). Up until 12-13 months ago Infor had been heading in a direction with the ERP system to use a different underlying development platform (Mongoose), one where SQL Server is the primary database platform, and Oracle Database would be accidentally supported when support was added for PostgreSQL sometime later. As part of that upgrade, all database-side specific code (triggers, procedures, foreign keys, etc.) would be eliminated. Previous generations of the ERP platform made good use of triggers, procedures, foreign keys, logical and consistent naming of tables and columns, etc. Fortunately, after several years (starting around 2006-2008) of heading in the direction of treating the database as a generic data dumping ground, the company reversed course 12-13 months ago. We have more than 20 years of historical transaction data in the ERP system database, so it was a relief to see the company not throw away their decent data model.

I also shared a rewritten portion of this article with a Yahoo group mailing list for the ERP platform, hoping to attract the interest of a couple of other companies that are running the ERP platform with an Oracle Database backend, before I file a bug report with Infor asking that they fix their upgrade process. I gave an Oracle Database presentation (the only Oracle presentation) at their 2008 user conference, but suspect that does not carry as much weight today as it did in 2008. I am hoping that other customers will also file a similar bug report.

The Oracle Database containing the ERP data is currently in archivelog mode with a recovery catalog database, and I suspect (hope) that most other production databases used with the ERP system are also running in archivelog mode. You do have a good point, and switching to noarchivelog mode during the upgrade is one of the options that I am considering if Infor does not change the approach used during the final stage of the ERP system upgrade. Their upgrade process is not resumable in the event of an error, although it is possible to pause after each of the intermediate version upgrades (6.5.3, 6.5.4, 7.0, 7.1.1, 7.1.2) to create a backup, recover that backup, and then continue the upgrade from that point onward.

15 10 2015
Jose Rodriguez

I see the issue here and is not new to me. New development frameworks that completely ignore the database they are dropping their data may be good for small business, agile projects and toying around, but big business need a strong development team knowing what they do at all levels or, at least, at the three basic ones: client, application server, database.

I understand and encourage production systems to be in archivelog mode, of course, I was just showing how I would face the upgrade process itself, as you have correctly understood.

Anyway, good luck with the upgrade process.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: