Character set conversion in Oracle

Here is the procedure to convert an Oracle 11g database from US7ASCII to UTF8/AL32UTF8; similar steps can be used to convert to a different character set.

In this example we use the CSSCAN utility to find lossy and convertible data in the database, and the CSALTER script to change the character set.

  • Install the CSSCAN utility (creates the CSMIG migration schema):
SQL> @$ORACLE_HOME/rdbms/admin/csminst.sql
  • Verify current characterset :
SQL> select * from nls_database_parameters where parameter='NLS_CHARACTERSET';
PARAMETER                      VALUE
------------------------------ ----------------------------------------
NLS_CHARACTERSET               US7ASCII
  • Check for invalid objects :
SQL> SELECT owner,
object_name,
object_type,
status
FROM dba_objects
WHERE status ='INVALID'
/

Recompile all invalid objects in the database :

SQL> @?/rdbms/admin/utlrp.sql
  • Run DB healthcheck :
SQL> BEGIN
DBMS_HM.RUN_CHECK('Dictionary Integrity Check', 'my_run');
END;
/
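
The result of the dictionary check can be pulled back with DBMS_HM.GET_RUN_REPORT, using the same run name ('my_run') given above:

SQL> SET LONG 100000
SQL> SELECT DBMS_HM.GET_RUN_REPORT('my_run') FROM dual;

Review the report and resolve any dictionary findings before scanning.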
  • Run CSSCAN utility (US7ASCII to US7ASCII) :
csscan \"/ as sysdba\" FULL=Y FROMCHAR=US7ASCII TOCHAR=US7ASCII LOG=dbcheck CAPTURE=N ARRAY=1000000 PROCESS=2

With LOG=dbcheck, CSSCAN produces dbcheck.txt (scan summary), dbcheck.err (individual exceptions), and dbcheck.out (screen log). Verify the application data reported in these files and fix all lossy data, or ask the application teams to fix the lossy data listed in the [Application data individual exceptions] section of dbcheck.err. It is very important to fix all lossy data after running CSSCAN; otherwise CSALTER will not work.
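
As a rough way to locate offending rows yourself (a sketch only; schema1.table1 and col1 are placeholder names, and the authoritative ROWIDs are the ones listed in dbcheck.err), bytes that do not survive a round trip through US7ASCII can be flagged like this:

SQL> SELECT rowid, col1
FROM schema1.table1
WHERE col1 <> CONVERT(col1, 'US7ASCII');

Rows returned contain bytes outside 7-bit ASCII and are candidates for cleanup.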

  • Run CSSCAN utility (TOCHAR UTF8) :
csscan \"/ as sysdba\" FULL=Y TOCHAR=UTF8 LOG=TOUTF8_2 CAPTURE=Y ARRAY=1000000 PROCESS=2

We run CSSCAN again, this time with the target character set UTF8, to find data that would be lossy or convertible under the conversion. If this step reports any lossy data, it must be fixed first. All convertible data still needs to be exported and re-imported, which is done in the next steps.

  • Backup all impacted tables (identified in the CSSCAN report) and truncate them:
SQL> CREATE DIRECTORY backup_dir AS '/u01/backup/table_data';
expdp \"/ as sysdba\" tables=schema1.table1,schema2.table2 directory=backup_dir dumpfile=table.dmp logfile=table.log

Truncate the above tables containing convertible/lossy data.
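
The truncation itself is plain SQL (table names are the same placeholders used in the export above):

SQL> TRUNCATE TABLE schema1.table1;
SQL> TRUNCATE TABLE schema2.table2;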

  • Run csscan utility again:
csscan \"/ as sysdba\" FULL=Y TOCHAR=UTF8 LOG=TOUTF8_3 CAPTURE=Y ARRAY=1000000 PROCESS=2

This scan should produce the following output in the logs:
[Scan Summary]
1) All character type application data remain the same in the new character set
2) The data dictionary can be safely migrated using the CSALTER script
Note that CSALTER will fail if we don't see the above output.
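
Note also that CSALTER expects an otherwise idle instance; in the standard 11g procedure the database is opened in restricted mode first (verify this against the migration documentation for your exact release):

SQL> SHUTDOWN IMMEDIATE
SQL> STARTUP RESTRICT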

  • Run CSALTER:
@?/rdbms/admin/csalter.plb
  • Run Data Pump import to load the data back into the database:
impdp \"/ as sysdba\" tables=schema1.table1,schema2.table2 directory=backup_dir dumpfile=table.dmp logfile=import_table.log TABLE_EXISTS_ACTION=APPEND

We use TABLE_EXISTS_ACTION=APPEND to import only the rows, since the tables were merely truncated and still exist.

  • Restart the database and verify:
SQL> shut immediate
SQL> startup
SQL> select * from nls_database_parameters where parameter='NLS_CHARACTERSET';
PARAMETER                      VALUE
------------------------------ ----------------------------------------
NLS_CHARACTERSET               UTF8

See the Oracle 11g Character Set Migration documentation for further information.
