OPS-25: Unicode and the DataServer David Moloney Software Architect

Page 1

OPS-25: Unicode and the DataServer

David Moloney, Software Architect

Page 2

© 2008 Progress Software Corporation

Agenda

Unicode:
• How did we get here?
• What are its broader OpenEdge implications?
• What are its DataServer implications?
• Specific implementation in the DataServers for:
  – Oracle®
  – MS SQL Server

Unicode deployment with OpenEdge® DataServers

Page 3

Code Pages

[Diagram: character codes 0 to 255]

ASCII: 7-bit character set (codes 0–127)
• Special chars: 9 \t (Tab), 10 \n (NL), 13 \r (CR), 32 Space, 33 !, 34 ", 35 #, …
• Upper case: 65 A, 66 B, 67 C, 68 D, 69 E, 70 F, 71 G, …
• Lower case: 97 a, 98 b, 99 c, 100 d, …
• …, 125, 126, 127

Extended ASCII: 8-bit character sets (codes 128–255)
• 128 €, 130 ‚, 131 ƒ, 132 „, 133 …, … ü, 253 ý, 254, 255
• Character sets: ISO8859-1, 1250, IBM437/850

Page 4

8-bit Code Pages

Examples of character encoding:

Char  ISO8859-1  ISO8859-2  1252  1250  IBM437  IBM850  IBM852
a     61         61         61    61    61      61      61
á     E1         E1         E1    E1    A0      A0      A0
È     C8         n/a        C8    n/a   n/a     D4      n/a
Č     n/a        C8         n/a   C8    n/a     n/a     AC
“     n/a        n/a        93    93    n/a     n/a     n/a
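The table above can be reproduced with a short script. This is a minimal sketch, assuming that Python's built-in codecs (`iso8859-1`, `iso8859-2`, `cp1252`, `cp1250`, `cp437`, `cp850`, `cp852`) correspond to the code pages named on the slide; the helper name `byte_for` is made up for illustration.

```python
# Characters from the table, written as escapes so the source stays ASCII:
# a, a-acute, E-grave, C-caron, left double quotation mark.
CHARS = ["a", "\u00e1", "\u00c8", "\u010c", "\u201c"]

# Slide code page name -> Python codec name (an assumed correspondence).
CODE_PAGES = {
    "ISO8859-1": "iso8859-1",
    "ISO8859-2": "iso8859-2",
    "1252": "cp1252",
    "1250": "cp1250",
    "IBM437": "cp437",
    "IBM850": "cp850",
    "IBM852": "cp852",
}


def byte_for(char, codec):
    """Return the byte for char in codec as hex digits, or 'n/a' if unmapped."""
    try:
        return char.encode(codec).hex().upper()
    except UnicodeEncodeError:
        return "n/a"


# One row per character, one column per code page, in the slide's column order.
table = {ch: [byte_for(ch, codec) for codec in CODE_PAGES.values()] for ch in CHARS}

for ch, row in table.items():
    print("U+%04X" % ord(ch), " ".join(row))
```

The same byte value (for example C8) lands on different characters depending on the code page, which is why a byte stream is meaningless without knowing which code page produced it.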

Page 5

Data Corruption

[Diagram: the same byte exchanged between two countries]
• France (ISO8859-1): “è” = E8
• Czech Republic (1250): E8 = “č”

Avoid this!

Page 6

What is Unicode? (“Unique Code”)

A character encoding standard that:
• Replaces all legacy SBCS & MBCS systems
• Can assign more than a million numbers
  – Highest code point: U+10FFFF, so U+0000 to U+10FFFF gives 2^20 + 2^16 = 1,114,112 code points
• Gives one “unique” number to each text symbol/character
• Provides one internationalization process
• Is not platform, program, country or language specific
• Is essential to the Web (HTML, XML, etc.)
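The "more than a million numbers" figure is simple arithmetic. A minimal sketch (the variable names are illustrative only): the code space runs from U+0000 to U+10FFFF, which is the 2^16 code points of the first plane plus 16 further planes of 2^16 each.

```python
HIGHEST_CODE_POINT = 0x10FFFF

# Code points are numbered from U+0000, so the count is the highest value plus one.
code_point_count = HIGHEST_CODE_POINT + 1

# First plane (2**16 code points) plus 16 more planes of 2**16 each (2**20 in total).
first_plane = 2 ** 16
other_planes = 16 * 2 ** 16

print(code_point_count, first_plane + other_planes)
```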

Page 7

How is Unicode encoded?

“UTF-x”: UTF = Unicode Transformation Format; x = minimum length of the coding unit, in bits

[Diagram: the Unicode code space from U+0000 to U+10FFFF (1,114,112 code points), with these ranges labelled: Basic Latin; Extended ASCII (ISO8859-1), through U+00FF; BMP, through U+FFFF; Supplementary Range, through U+10FFFF]

Char  ANSI Number  Unicode Number  ANSI Hex  Unicode Hex
ÿ     255          255             0xFF      U+00FF

The Encoding Tradeoff: Ease of Use vs. Storage Space (UTF-32, UTF-16, UTF-8)

Page 8

UTF Encoding Examples

Unicode   UTF-8         UTF-16        UTF-32         Range
U+004D    4D            00 4D         00 00 00 4D    BMP
U+00A1    C2 A1         00 A1         00 00 00 A1    BMP
U+00E1    C3 A1         00 E1         00 00 00 E1    BMP
U+0470    D1 B0         04 70         00 00 04 70    BMP
U+4E9C    E4 BA 9C      4E 9C         00 00 4E 9C    BMP
U+10302   F0 90 8C 82   D8 00 DF 02   00 01 03 02    Supplementary
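The three encodings of each code point can be checked with Python's built-in codecs. This is a minimal sketch, assuming big-endian byte order for UTF-16 and UTF-32 (the order the slide appears to use); `hex_bytes` and `encodings` are illustrative helper names. Python's codecs give D1 B0 for U+0470 and F0 90 8C 82 for U+10302.

```python
CODE_POINTS = [0x004D, 0x00A1, 0x00E1, 0x0470, 0x4E9C, 0x10302]


def hex_bytes(data):
    """Format bytes as space-separated upper-case hex pairs."""
    return " ".join("%02X" % b for b in data)


def encodings(code_point):
    """Return the (UTF-8, UTF-16, UTF-32) byte sequences for one code point."""
    ch = chr(code_point)
    return (
        hex_bytes(ch.encode("utf-8")),
        hex_bytes(ch.encode("utf-16-be")),  # big-endian, no BOM
        hex_bytes(ch.encode("utf-32-be")),  # big-endian, no BOM
    )


rows = {cp: encodings(cp) for cp in CODE_POINTS}

for cp, (utf8, utf16, utf32) in rows.items():
    print("U+%04X | %s | %s | %s" % (cp, utf8, utf16, utf32))
```

UTF-8 grows from one to four bytes as the code point rises, UTF-16 needs a surrogate pair only outside the BMP, and UTF-32 is always four bytes.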

Page 9

UTF Encoding Examples

Unicode   UTF-8         UTF-16        UTF-32
U+10302   F0 90 8C 82   D8 00 DF 02   00 01 03 02

(Oracle) NLS_LANG:
• AL32UTF8, 4-byte “Standard”: F0 90 8C 82
• UTF8, 3-byte “Modified”: ED A0 80 ED BC 82 (one 3-byte sequence for each UTF-16 surrogate, D8 00 and DF 02)
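The difference between the 4-byte and the 3-byte forms can be sketched as follows. The assumption here, not stated on the slide, is that the "Modified" form encodes each UTF-16 surrogate of a supplementary character separately with the ordinary 3-byte UTF-8 bit layout (the CESU-8 style); `surrogate_pair` and `three_byte_form` are hypothetical helper names.

```python
def surrogate_pair(code_point):
    """Split a supplementary code point (above U+FFFF) into UTF-16 surrogates."""
    offset = code_point - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


def three_byte_form(code_unit):
    """Encode one 16-bit value using the 3-byte UTF-8 bit layout."""
    return bytes([
        0xE0 | (code_unit >> 12),          # 1110xxxx
        0x80 | ((code_unit >> 6) & 0x3F),  # 10xxxxxx
        0x80 | (code_unit & 0x3F),         # 10xxxxxx
    ])


code_point = 0x10302

# "Standard" form: one 4-byte UTF-8 sequence.
standard = chr(code_point).encode("utf-8")

# "Modified" form (assumed CESU-8 style): two 3-byte sequences, one per surrogate.
high, low = surrogate_pair(code_point)
modified = three_byte_form(high) + three_byte_form(low)

print(standard.hex(), modified.hex())
```

The practical consequence is that a supplementary character occupies four bytes in one form and six in the other, so column sizing and byte-level comparisons differ between the two settings.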

Page 10

Unicode Conversion

• All code pages convert to Unicode
• Unicode may not convert to other code pages

[Diagram: IBM437, IBM852, IBM850, 1250, 1252, ISO8859-2 and ISO8859-1 each convert into Unicode; converting from Unicode back out to those same code pages is marked “??”]
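The one-way nature of the conversion can be shown with a single character. A minimal sketch, assuming Python codec names stand in for the code pages in the diagram; `convertible` is an illustrative helper.

```python
LEGACY_CODECS = ["cp437", "cp852", "cp850", "cp1250", "cp1252", "iso8859-2", "iso8859-1"]


def convertible(text, codec):
    """Return True if every character of text exists in the target code page."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# Into Unicode: byte C8 from a 1250 source becomes U+010C (C with caron).
text = b"\xc8".decode("cp1250")

# Out of Unicode: only some target code pages contain that character.
result = {codec: convertible(text, codec) for codec in LEGACY_CODECS}

for codec, ok in result.items():
    print(codec, "ok" if ok else "no mapping")
```

Only the Central European code pages accept the character; the others have no slot for it, so a conversion has to fail or substitute.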

Page 11

Agenda

Unicode:
• How did we get there?
• What are its broader OpenEdge implications?
• What are its DataServer implications?
• Specific implementation in the DataServers for:
  – Oracle
  – MS SQL Server

The path to successful development & deployment

Page 12

The Unicode “Solution”?

YES!
• One-stop shopping for internationalization!

NO, there are considerations to be addressed:
• Operating system
• Web server (XML Schemas and HTML)
• Print drivers
• Data from/to other systems
• OCXs
• Terminal emulators

Page 13

OpenEdge Globalization Settings

Primary Parameters: -cpinternal, -cpstream, -cpcoll, -d, -E
Secondary Parameters: -cplog, -cpterm, -cpprint, -numsep, -numdec, -cprcodein, -cprcodeout, -lng
Database Settings: _db._db-xl-name, _db._db-coll-name

For more info: see the “Internationalizing Applications” guide

Existing OpenEdge constructs:
• Convmap.cp – Character processing tables
• Progress.ini – Fonts

New OpenEdge construct:
• ICU Library – For linguistic sorting

Page 14

Common Mistakes

Loading or importing data with the wrong code page

Bytes in the file: C4 8C 7A 65 63 68

Read as UTF-8:      Čzech
Read as 1250:       ÄŚzech
Read as ISO8859-1:  ÄŒzech
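The three readings of the same six bytes can be reproduced directly. A minimal sketch with one assumption to flag: the "Œ" shown for the ISO8859-1 case matches Windows-1252 (`cp1252`), since strict ISO8859-1 maps byte 8C to a control character, so `cp1252` is used for that reading here.

```python
raw = bytes.fromhex("C4 8C 7A 65 63 68")

as_utf8 = raw.decode("utf-8")    # correct reading: C-caron + "zech"
as_1250 = raw.decode("cp1250")   # two wrong characters replace the first one
as_1252 = raw.decode("cp1252")   # assumed stand-in for the slide's ISO8859-1 reading

# ascii() keeps the output printable on any console.
for label, value in [("UTF-8", as_utf8), ("1250", as_1250), ("1252", as_1252)]:
    print(label, ascii(value), len(value), "characters")
```

The wrong readings do not raise an error; they silently turn one character into two, which is what makes this mistake easy to load into a database unnoticed.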

Page 15

Byte Order Mark (BOM)

Bytes in the file: EF BB BF C4 8C 7A 65 63 68

[Diagram: with the BOM (EF BB BF) at the start of the file, the text is shown as “Čzech” in all three cases: ISO8859-1, UTF-8 and 1250]

Write:

OUTPUT TO text.txt CONVERT TARGET "UTF-8".
PUT CONTROL "~357~273~277". /* BOM */
PUT UNFORMATTED "UTF-8 text".
OUTPUT CLOSE.

Caution!
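The octal escapes in the PUT CONTROL statement are the three BOM bytes EF BB BF. A minimal Python sketch of the same idea, using the standard `utf-8-sig` codec, which writes the BOM on encode and strips it on decode:

```python
import codecs

text = "\u010czech"  # C-caron + "zech"

# Octal 357, 273, 277 are the UTF-8 BOM bytes EF BB BF.
bom_from_octal = bytes([0o357, 0o273, 0o277])

# Encoding with utf-8-sig prepends the BOM automatically.
with_bom = text.encode("utf-8-sig")

# A BOM-aware reader strips it; a plain UTF-8 reader keeps it as U+FEFF.
read_bom_aware = with_bom.decode("utf-8-sig")
read_plain = with_bom.decode("utf-8")

print(with_bom.hex(), ascii(read_bom_aware), ascii(read_plain))
```

A reader that does not expect the BOM sees an extra invisible character at the start of the first field, which is one reason for the caution on the slide.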

Page 16

Common Mistakes

Loading or importing data with the wrong code page

(…)
"imuller" "Ian Muller" "Y" "C" 1657 283200
"jdoe" "Jane Doe" "N" "U" 3275 450010
"jsmith" "John Smith" "Y" "C" 1450 323700
"jsanchez" "Juan Sánchez" "Y" "C" 4250 323900
.
PSC
filename=users
records=0000000001133
ldbname=mydatabase
timestamp=2007/03/28-20:55:03
numformat=44,46
dateformat=mdy-1950
map=NO-MAP
cpstream=ISO8859-1
.
0000143373
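One way to avoid loading a dump file with the wrong code page is to read the `cpstream` entry from its trailer first. This is a minimal sketch, assuming the trailer holds one `key=value` entry per line after a `PSC` marker line; `trailer_settings` is a hypothetical helper, not an OpenEdge utility.

```python
def trailer_settings(dump_text):
    """Collect the key=value entries that follow the 'PSC' marker line."""
    settings = {}
    in_trailer = False
    for line in dump_text.splitlines():
        line = line.strip()
        if line == "PSC":
            in_trailer = True
        elif in_trailer and "=" in line:
            key, value = line.split("=", 1)
            settings[key] = value
    return settings


# Sample text mirroring the dump file shown on the slide.
sample = "\n".join([
    '"jdoe" "Jane Doe" "N" "U" 3275 450010',
    ".",
    "PSC",
    "filename=users",
    "records=0000000001133",
    "ldbname=mydatabase",
    "numformat=44,46",
    "map=NO-MAP",
    "cpstream=ISO8859-1",
    ".",
    "0000143373",
])

settings = trailer_settings(sample)
print(settings["cpstream"])
```

The value found this way is the code page the data was written in, which is the code page the load session has to convert from.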

Page 17

Common Mistakes

Updating data with the wrong code page

[Diagram]
• OS = 1252: the user types “à” (E0)
• Client (_progres), -cpstream IBM850, -cpinternal IBM850: E0 is taken as “Ó”
• Server (_mprosrv), -cpinternal ISO8859-1, database _db-xl-name ISO8859-1: “Ó” is converted to D3
• Result: “à” (E0) is stored as “Ó” (D3)

Page 18

Common Mistakes

Updating data with the CORRECT code page

[Diagram]
• OS = 1252: the user types “à” (E0)
• Client (_progres), -cpstream 1252, -cpinternal IBM850: “à” (E0) is converted to “à” (85)
• Server (_mprosrv), -cpinternal ISO8859-1, database _db-xl-name ISO8859-1: “à” (85) is converted to E0
• Result: “à” is stored as “à” (E0)
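The wrong and the correct update paths can be sketched in a few lines. This simplifies the diagrams to a single hop (bytes typed on the OS, interpreted in the client's stream code page, then converted to the database code page) and assumes Python codec names for the code pages; `store` is an illustrative helper.

```python
def store(typed_bytes, stream_codepage, db_codepage="iso8859-1"):
    """Interpret the typed bytes in the stream code page, then convert for the database."""
    return typed_bytes.decode(stream_codepage).encode(db_codepage)


# "a-grave" typed on a Windows-1252 system arrives as byte E0.
typed = "\u00e0".encode("cp1252")

wrong = store(typed, "cp850")    # -cpstream IBM850: E0 is read as "O-acute"
right = store(typed, "cp1252")   # -cpstream 1252: E0 is read as "a-grave"

# With -cpinternal IBM850 the character passes through as byte 85 on the way.
internal = "\u00e0".encode("cp850")

print(typed.hex(), wrong.hex(), right.hex(), internal.hex())
```

The stream code page has to describe what the operating system actually delivers; the internal code page can differ as long as every hop converts correctly.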

Page 19

Real Life Story

ASCII Linefeed (0x0A) to EBCDIC Newline (0x25)

[Diagram]
• OpenEdge Client: -cpinternal iso8859-1, -cpstream iso8859-1 (Iso8859-1, ASCII); line endings are 0D 0A
• DataServer for ODBC: _db-xl-name IBM037
• Database: IBM037 (EBCDIC)
• 0x0A on the ASCII side arrives as 0x0A on the EBCDIC side

Text sent:    Hi Bob,CRLFHow are you?CRLFBye
Text stored:  Hi Bob,▐How are you?▐Bye

Page 20

Real Life Story

ASCII Linefeed (0x0A) to EBCDIC Newline (0x25)

[Diagram]
• OpenEdge Client: -cpinternal IBM850, -cpstream IBM850 (IBM850, ASCII); line endings are 0D 0A
• DataServer for ODBC: _db-xl-name IBM037
• Database: IBM037 (EBCDIC)
• 0x0A on the ASCII side is converted to 0x25 on the EBCDIC side

Text sent:    Hi Bob,CRLFHow are you?CRLFBye
Text stored, with its line breaks intact:
Hi Bob,
How are you?
Bye
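The linefeed mapping can be checked with Python's `cp037` codec, used here as an assumed stand-in for the IBM037 code page in the diagram. A proper conversion turns the ASCII linefeed into EBCDIC 0x25; a raw 0x0A copied across unconverted means something else on the EBCDIC side.

```python
text = "Hi Bob,\r\nHow are you?\r\nBye"

# Converted properly, every linefeed becomes EBCDIC 0x25.
ebcdic = text.encode("cp037")
linefeed_in_ebcdic = "\n".encode("cp037")

# A raw 0x0A byte read back as EBCDIC is no longer a linefeed.
raw_0a_as_ebcdic = b"\x0a".decode("cp037")

# A correctly converted message survives the round trip.
round_trip = ebcdic.decode("cp037")

print(linefeed_in_ebcdic.hex(), ascii(raw_0a_as_ebcdic), ebcdic.count(b"\x25"))
```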

Page 21

Tips & Hints

Un-corrupting data

ISO8859-1 database with data encoded in IBM850

Run on session with -cpinternal iso8859-1

FOR EACH myTable EXCLUSIVE-LOCK.
    RUN FixChar(INPUT-OUTPUT myTable.myField).
END.

PROCEDURE FixChar:
    DEF INPUT-OUTPUT PARAM c AS CHAR NO-UNDO.
    /* CODEPAGE-CONVERT(source, target code page, source code page) */
    c = CODEPAGE-CONVERT(c,"IBM850","ISO8859-1").
END PROCEDURE.
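The repair works by reversing the conversion that caused the corruption. A minimal Python sketch of the same step, assuming the damage was an unwanted IBM850 to ISO8859-1 conversion (byte E0 stored as D3); `fix_char` mirrors the FixChar procedure in name only and operates on raw bytes.

```python
def fix_char(stored):
    """Undo an unwanted IBM850 -> ISO8859-1 conversion on stored bytes."""
    # Read the stored bytes as ISO8859-1 and write them back as IBM850,
    # the same direction as CODEPAGE-CONVERT(c, "IBM850", "ISO8859-1").
    return stored.decode("iso8859-1").encode("cp850")


corrupted = b"\xd3"              # "O-acute" in an ISO8859-1 database
repaired = fix_char(corrupted)   # original byte as typed

# Read in the database code page, the repaired byte is "a-grave" again.
print(repaired.hex(), ascii(repaired.decode("iso8859-1")))
```

A repair like this is only safe on rows that were actually corrupted; applied to clean rows it corrupts them in the opposite direction.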

Page 22

Database Sorting Rules

Are not all the same

FOR EACH table WHERE name <= CHR(126).
FOR EACH table WHERE name >= CHR(126).

Sort order   # $ ~ Alphanumerics   (-cpinternal; MSS 1252)
Sort order   # $ Alphanumerics ~   (_Db._Db-collate; Iso8859-1 Basic)

Page 23

Agenda

Unicode:
• How did we get there?
• What are its broader OpenEdge implications?
• What are its DataServer implications?
• Specific implementation in the DataServers for:
  – Oracle
  – MS SQL Server

The path to successful development & deployment

Page 24

DISCLAIMER

Under Development

This talk includes information about potential future products and/or product enhancements.

What I am going to say reflects our current thinking, but the information contained herein is preliminary and subject to change. Any future products we ultimately deliver may be materially different from what is described here.

Page 25

Unicode Deliverables

10.0A:    Unicode
10.0B:    ICU Collation
10.1B03:  Unicode for MSS DataSrvr (limited)
10.1C:    Unicode for MSS + Oracle DataSrvr + CLOBs
10.1C01:  Oracle NCLOB Support
Future:   MSS CLOB Support + CLOB Params To Stored Proc.’s

Page 26

OpenEdge Settings

_db-xl-name, -cpinternal and -cpstream

[Diagram: an OpenEdge process (GUI and CHUI) running with -cpinternal; OpenEdge code page conversions sit between the process and everything around it]
• -cpstream: OS files, keyboard, screen, printer
• _db-xl-name: database

Page 27

OpenEdge Settings

_db-xl-name, -cpinternal and -cpstream

[Diagram: the same OpenEdge process (GUI and CHUI), now connected through a DataServer layer or process to a foreign data source]
• -cpinternal: OpenEdge process, with OpenEdge code page conversions
• -cpstream: OS files, keyboard, screen, printer
• Schema holder: _db-xl-name
• DB driver: driver conversions?
• Foreign data source: database CP
• “Needs to match” is marked at two points along the chain from the schema holder through the driver to the foreign data source

Page 28

OpenEdge Settings

[Diagram: each OpenEdge process has its own -cpinternal and -cpstream; the processes connect to OS files, keyboards, screens, printers and a Web browser]
• GUI CLIENT (prowin32): OS files, keyboard, screen
• CHUI CLIENT (_progres): OS files, keyboard, screen, printer
• WEBSPEED™ (_progres -web): OS files, Web browser
• APPSERVER™ (_proapsv): OS files
• DATASERVER (_orasrv): OS files
• Schema holder: _db-xl-name
• Driver
• ORACLE database: _db-xl-name
• “Match” is marked at two points between the schema holder, the driver and the ORACLE database

Page 29

Dictionary Utilities changed for Unicode

For both Oracle and MS SQL Server:
• Schema Migration *
  – Including Unicode batch mode parameters
• Update/Add Table Definitions +
• Verify Table Definitions +
• Adjust Schema +
• Generate delta.sql *
• Dump as Create Table Statement *

* “Use Unicode Types” GUI selection provided
+ Modified to handle Unicode types internally

Page 30

Comparing 10.1C Unicode: Oracle vs. MSS

Unicode Definitions
  OpenEdge: DB-Codepage (_db._db-xl-name)
  ORACLE:   DB-Codepage, Data Types
  MSS:      Data Types

Data Types
  OpenEdge: CHAR, LONGCHAR, CLOB
  ORACLE:   CHAR, VARCHAR2, LONG, CLOB; NCHAR, NVARCHAR2, NCLOB (in 10.1C01)
  MSS:      NCHAR, NVARCHAR, NVARCHAR(max) and NTEXT mapped to OpenEdge CHAR

Max. Char Size
  OpenEdge: CHAR: 30,000 bytes; LONGCHAR/CLOB: 1G
  ORACLE:   CHAR types: 4000 bytes; CLOB types: 4G
  MSS:      CHAR types: 8000 bytes; CLOB types: 2G

Max. Char Size for Unicode
  OpenEdge: Same as above but CHAR: 15,000 bytes using MSS DataServer
  ORACLE:   4000 bytes
  MSS:      4000 chars

Semantics
  OpenEdge: Character
  ORACLE:   Character or Byte
  MSS:      (double-byte) Character

Driver Settings
  OpenEdge: N/A
  ORACLE:   NLS_LANG=.AL32UTF8
  MSS:      ACP = Active Code Page

Database Code Pages
  OpenEdge: UTF-8
  ORACLE:   NLS_CHARACTERSETS: AL32UTF8 & UTF8; NLS_NCHAR_CHARACTERSETS: AL16UTF16 or UTF8
  MSS:      UCS-2 (partial UTF-16)

Page 31

Common Unicode Requirements

DataServer Migration

[Diagram]
• .d files: cpstream=ISO8859-1, cpstream=ISO8859-5
• OpenEdge process: -cpinternal UTF-8, -cpstream UTF-8, with OpenEdge code page conversions
• OpenEdge database (PRODB): _db-xl-name ANSI or UTF-8
• Schema holder: _db-xl-name UTF-8; build from $DLC/prolang/utf/empty
• DataServer layer or process
• DB driver: driver conversions?
• Foreign data source: database CP
• “Needs to match” is marked at two points along the chain from the schema holder through the driver to the foreign data source

Recommended: set the $DLCDB environment variable to $DLC/prolang/utf

Page 32

Agenda

Unicode:
• How did we get there?
• What are its broader OpenEdge implications?
• What are its DataServer implications?
• Specific implementation in the DataServers for:
  – Oracle
  – MS SQL Server

The path to successful development & deployment

Page 33

Oracle DataServer Migration

_db-xl-name, -cpinternal and -cpstream

[Diagram]
• .d files: cpstream=ISO8859-1, cpstream=ISO8859-5
• OpenEdge process: -cpinternal UTF-8, -cpstream UTF-8, with OpenEdge conversions
• OpenEdge database: _db-xl-name ANSI or UTF-8
• Schema holder: _db-xl-name UTF-8
• 10.1C ORACLE DataServer layer or process
• OCI client library: NLS_LANG=.AL32UTF8, with driver conversions
• ORACLE 9i+ database: Database Charset, National Charset; types VARCHAR, NVARCHAR, CLOB, CFILE, NCLOB
• The code page settings along this chain are labelled “Match”

Page 34

Oracle Unicode Migration

• What version of ORACLE: Unicode instance and Unicode drivers must be 9i or above
• Codepage for Schema Image: declares Unicode
• Collation Name: sets ICU collation

Page 35

Oracle Unicode Migration

Two ways to configure an ORACLE database to store Unicode:

Use Unicode Types
• Unchecked – Uses Database Charset
  NLS_CHARACTERSETS: AL32UTF8, UTF8
• Checked – Uses National Language Charset
  NLS_NCHAR_CHARACTERSETS: AL16UTF16, UTF8

Page 36

Oracle Unicode Migration

For field widths use: Width (recommended)
• Use SQL Width Tool

Char semantics
• Checked – CHAR(10) = 10 chars
  (w/UTF8 = 10–30 bytes)
  (w/AL32UTF8 = 10–40 bytes)
• Unchecked – CHAR(10) = 10 bytes
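The byte ranges quoted for character semantics follow from how many bytes UTF-8 needs per character. A minimal sketch using Python's standard UTF-8 codec (which corresponds to the 4-byte form); the sample strings and the helper name are illustrative.

```python
def utf8_byte_length(text):
    """Number of bytes the text occupies when encoded as standard UTF-8."""
    return len(text.encode("utf-8"))


# Three 10-character values, as for a CHAR(10) column with character semantics.
samples = {
    "ascii": "A" * 10,                      # 1 byte per character
    "bmp": "\u4e9c" * 10,                   # 3 bytes per character
    "supplementary": "\U00010302" * 10,     # 4 bytes per character
}

lengths = {name: utf8_byte_length(value) for name, value in samples.items()}

for name, size in lengths.items():
    print(name, len(samples[name]), "chars ->", size, "bytes")
```

With byte semantics the same CHAR(10) column could hold all ten ASCII characters but only three of the 3-byte characters, which is why character semantics matter once the data is Unicode.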

Page 37

Oracle Unicode Migration

Maximum char length
• Use Unicode Types checked = 2000 (assumes NCS = AL16UTF16)
• Use Unicode Types unchecked = 1000 (assumes DB CP = AL32UTF8)

Expand to CLOB
• Checked – Greater than Maximum char length produces CLOB
• Unchecked – Greater than Maximum char length produces LONG (backward compatible)
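The two maximum lengths are consistent with dividing a 4000-byte column limit by the widest character each character set is sized for. This is a sketch of that arithmetic only; the 4000-byte figure and the bytes-per-character values (2 for AL16UTF16, 4 for AL32UTF8) are assumptions used to illustrate the numbers, not a statement of how the migration tool computes them.

```python
# Assumed byte limit for an Oracle character column.
COLUMN_LIMIT_BYTES = 4000


def max_chars(limit_bytes, bytes_per_char):
    """Largest character count guaranteed to fit in the byte limit."""
    return limit_bytes // bytes_per_char


# Use Unicode Types checked: national character set AL16UTF16, 2 bytes per character assumed.
max_with_unicode_types = max_chars(COLUMN_LIMIT_BYTES, 2)

# Use Unicode Types unchecked: database character set AL32UTF8, up to 4 bytes per character.
max_without_unicode_types = max_chars(COLUMN_LIMIT_BYTES, 4)

print(max_with_unicode_types, max_without_unicode_types)
```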

Page 38

Agenda

Unicode:
• How did we get there?
• What are its broader OpenEdge implications?
• What are its DataServer implications?
• Specific implementation in the DataServers for:
  – Oracle
  – MS SQL Server

The path to successful development & deployment

Page 39

MS SQL Server DataServer Migration

_db-xl-name, -cpinternal and -cpstream

[Diagram]
• .d files: cpstream=ISO8859-1, cpstream=ISO8859-5
• OpenEdge process: -cpinternal UTF-8, -cpstream UTF-8, with OpenEdge conversions
• OpenEdge database: _db-xl-name ANSI or UTF-8
• Schema holder: _db-xl-name UTF-8
• 10.1C MSS DataServer layer or process
• ODBC driver: ACP = OS CP, with driver conversions
• MSS 2005 database: UCS-2 / UTF-16; types NCHAR, NVARCHAR, NTEXT, NVARCHAR(max)
• The code page settings along this chain are labelled “Implied Match”

Page 40

MS SQL Server Unicode Migration

• ODBC Data Source Name: must be a Unicode driver
• Codepage for Schema Image: declares Unicode
• Collation Name: sets ICU collation
• Use Unicode Types
  – Checked – Selects Unicode (changes Codepage to UTF-8); NVARCHAR types
  – Unchecked – Uses non-Unicode character types; VARCHAR types

Page 41

MS SQL Server Unicode Migration

• Maximum char length: with Use Unicode Types = 4000 (assumes MSS 2005 = UCS-2)
• For field widths use: Width (recommended)
  – Use SQL Width Tool
• Expand width (utf-8): Checked – Doubles the width defined for NVARCHAR types; NVARCHAR(1000) becomes NVARCHAR(2000)

Page 42

Linguistic Sorting and Collation

Sorting with Finnish collation

FOR EACH mytable BY COLLATE(myfield,"CASE-INSENSITIVE","ICU-fi"):
    DISPLAY myfield WITH FONT 8.
END.

Basic:    Aaa Ááá Äää Ççç Ĉĉĉ Bbb Ccc Zzz
ICU-fi:   Aaa Ááá Bbb Ccc Ĉĉĉ Ççç Zzz Äää
ICU-UCA:  Aaa Ááá Äää Bbb Ccc Ĉĉĉ Ççç Zzz

Page 43

Linguistic Sorting and Collation

Comparing with Finnish collation

FOR EACH mytable WHERE COMPARE(myfield,">=","C", "CASE-INSENSITIVE","ICU-fi")
    BY COLLATE(myfield,"CASE-INSENSITIVE","ICU-fi"):
    DISPLAY myfield WITH FONT 8.
END.

Basic:    Ccc Zzz
ICU-fi:   Ccc Ĉĉĉ Ççç Zzz Äää
ICU-UCA:  Ccc Ĉĉĉ Ççç Zzz

Page 44

Linguistic Sorting and Collation

Global Setup

[Diagram]
• Database: -cpcoll ICU-uca
• AppServer: -cpcoll ICU-uca; uses client collation in COMPARE and COLLATE; TEMP-TABLES
• English user: -cpcoll ICU-en; TEMP-TABLES
• French user: -cpcoll ICU-fr; TEMP-TABLES
• Czech user: -cpcoll ICU-cs; TEMP-TABLES
• Finnish user: -cpcoll ICU-fi; TEMP-TABLES

RUN ASprg.p ON hAppServer
    (INPUT SESSION:CPCOLL,
     INPUT USERID,
     INPUT <other parameters>,
     OUTPUT TABLE ttMytable).

Caution with performance!

Page 45

8-bit Code Pages

Where to find code page tables:
• 10.1B Internationalizing Applications manual (IBM850 and ISO8859-1)
• http://www.microsoft.com/globaldev/reference/cphome.mspx
• http://www-03.ibm.com/servers/eserver/iseries/software/globalization/codepages.html
• http://en.wikipedia.org
• http://www.fileformat.info/info/charset/index.htm

Where to find Unicode fonts:
• http://en.wikipedia.org/wiki/Code2000

Information about Windows fonts:
• http://www.microsoft.com/typography/fonts/default.aspx
• http://www.microsoft.com/globaldev/getwr/steps/wrg_font.mspx

Page 46

For More Information, go to…

PSDN:
• B2420-LV: From 26 to 96,000 Characters in 60 Minutes
• DEV-10: Supporting Multiple Languages in Your Application
• DEV-23: Global Applications and Code Pages

Progress eLearning Community:
• Understanding Internationalization – Salvador Vinals

Documentation:
• OpenEdge Data Management: DataServer for Oracle
• OpenEdge Data Management: DataServer for Microsoft SQL Server
• OpenEdge Development: Internationalizing Applications

Page 47

Questions?

Page 48

Thank You
