![Page 1: Implementation Issues](https://reader036.vdocuments.net/reader036/viewer/2022082711/56813997550346895da12d37/html5/thumbnails/1.jpg)
Implementation Issues
Mark Davis
2003-09-24
![Page 2: Implementation Issues](https://reader036.vdocuments.net/reader036/viewer/2022082711/56813997550346895da12d37/html5/thumbnails/2.jpg)
Properties
Core: Code Point, Name, Representative_Glyph, Block
General: Age, General_Category, Script, White_Space, Alphabetic, Hangul_Syllable_Type, Noncharacter_Code_Point, Default_Ignorable_Code_Point, Deprecated, Logical_Order_Exception
Shaping and Rendering: Join_Control, Joining_Group, Joining_Type, Line_Break, East_Asian_Width
Identifiers: ID_Continue, ID_Start, XID_Continue, XID_Start
Decomposition and Normalization: Canonical_Combining_Class, Decomposition_Mapping, Composition_Exclusion, Full_Composition_Exclusion, Decomposition_Type
Numeric: Numeric_Value, Numeric_Type, Hex_Digit, ASCII_Hex_Digit
Case: Uppercase, Lowercase, Lowercase_Mapping, Titlecase_Mapping, Uppercase_Mapping, Case_Folding, Simple_Lowercase_Mapping, Simple_Titlecase_Mapping, Simple_Uppercase_Mapping, Simple_Case_Folding, Special_Case_Condition, Soft_Dotted
CJK: Ideographic, Unified_Ideograph, Radical, IDS_Binary_Operator, IDS_Trinary_Operator, Unicode_Radical_Stroke
Misc: Math, Quotation_Mark, Dash, Hyphen, Terminal_Punctuation, Diacritic, Extender, Grapheme_Base, Grapheme_Extend, Grapheme_Link, Unicode_1_Name, ISO_Comment
Bidi: Bidi_Control, Bidi_Mirrored, Bidi_Class, Bidi_Mirroring_Glyph
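A few of the properties listed on this slide can be inspected directly from a program. The following is a minimal sketch using Python's standard `unicodedata` module; the helper name `describe` and the choice of sample characters are assumptions made for illustration, and the module exposes only a subset of the properties above.

```python
import unicodedata


def describe(ch: str) -> dict:
    """Collect a handful of Unicode Character Database properties for one character."""
    return {
        "code_point": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch, "<unnamed>"),
        "general_category": unicodedata.category(ch),
        "canonical_combining_class": unicodedata.combining(ch),
        "decomposition": unicodedata.decomposition(ch),
        "bidi_class": unicodedata.bidirectional(ch),
        "bidi_mirrored": bool(unicodedata.mirrored(ch)),
        "east_asian_width": unicodedata.east_asian_width(ch),
        # None when the character has no numeric value
        "numeric_value": unicodedata.numeric(ch, None),
    }


# Sample characters: DEVANAGARI LETTER KA, DEVANAGARI SIGN VIRAMA,
# LATIN SMALL LETTER A WITH GRAVE, LEFT PARENTHESIS
for ch in ("\u0915", "\u094D", "\u00E0", "("):
    print(describe(ch))
```

The values reported depend on the version of the Unicode Character Database bundled with the Python build in use.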
![Page 3: Implementation Issues](https://reader036.vdocuments.net/reader036/viewer/2022082711/56813997550346895da12d37/html5/thumbnails/3.jpg)
Behavior
Bidirectional Algorithm (Arabic/Hebrew)
Linebreak, User-Character, Word, …
Normalization
Collation
Regular Expressions
Programming Identifiers
…
![Page 4: Implementation Issues](https://reader036.vdocuments.net/reader036/viewer/2022082711/56813997550346895da12d37/html5/thumbnails/4.jpg)
Scripts, not Languages
a   English, German, Italian
.   English, Russian, Armenian
।   Hindi, Gujarati, Marathi
¨   English, Russian, Greek
![Page 5: Implementation Issues](https://reader036.vdocuments.net/reader036/viewer/2022082711/56813997550346895da12d37/html5/thumbnails/5.jpg)
Size Doesn’t Matter
Text storage size is approximately the same for all languages
In real data, other data dominates
Compression available if needed: ZIP, SCSU, BOCU
![Page 6: Implementation Issues](https://reader036.vdocuments.net/reader036/viewer/2022082711/56813997550346895da12d37/html5/thumbnails/6.jpg)
Normalization
Produces Unique Form
Comparison, Matching, Counting
Used in:
  Collation
  Internationalized Domain Names
  W3C Character Model (Web)
  Network File System
  …
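To illustrate why a unique form matters for comparison, matching, and counting, here is a small sketch using Python's standard `unicodedata.normalize`. The helper name `same_text` is an assumption for illustration only.

```python
import unicodedata

composed = "\u00E0"      # à as one code point
decomposed = "a\u0300"   # a followed by COMBINING GRAVE ACCENT


def same_text(a: str, b: str, form: str = "NFC") -> bool:
    """Compare two strings after bringing both to the same normalization form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# The raw strings differ code point by code point...
print(composed == decomposed)
# ...but are equal once both are normalized.
print(same_text(composed, decomposed))
# Counting is also affected: the same text has 1 or 2 code points.
print(len(unicodedata.normalize("NFC", decomposed)),
      len(unicodedata.normalize("NFD", composed)))
```

Which form to use (NFC, NFD, or the compatibility forms) depends on the application; the sketch only shows that both sides must use the same one.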
![Page 7: Implementation Issues](https://reader036.vdocuments.net/reader036/viewer/2022082711/56813997550346895da12d37/html5/thumbnails/7.jpg)
Transcoding: ISCII - Unicode
ISCII              Unicode
Halant + Halant    Halant + ZWNJ
Halant + Nukta     Halant + ZWJ
INV                SPACE
halant RA          virama RA
ATR                Not in plain text
EXT                Not required
![Page 8: Implementation Issues](https://reader036.vdocuments.net/reader036/viewer/2022082711/56813997550346895da12d37/html5/thumbnails/8.jpg)
Unicode = Lingua Franca
Transcoding = converting from one character encoding to another
Many standards / systems defined in terms of Unicode: C#, Java, XML, …
[Diagram: Unicode at the center, linked to cp1252, SJIS, GB18030, ISCII]
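The "lingua franca" idea can be sketched as transcoding through Unicode as a pivot: decode from the source encoding into Unicode text, then encode into the target. The function name `transcode` is an assumption; the codec names are those of Python's standard codecs (ISCII is not among them, so it is not shown).

```python
def transcode(data: bytes, source: str, target: str) -> bytes:
    """Convert bytes between two legacy encodings by pivoting through Unicode."""
    text = data.decode(source)   # source encoding -> Unicode
    return text.encode(target)   # Unicode -> target encoding


# cp1252 -> GB18030 -> cp1252 round trip
cp1252_bytes = "caf\u00E9 \u20AC5".encode("cp1252")
gb_bytes = transcode(cp1252_bytes, "cp1252", "gb18030")
print(transcode(gb_bytes, "gb18030", "cp1252") == cp1252_bytes)

# Shift_JIS -> GB18030
sjis_bytes = "\u65E5\u672C".encode("shift_jis")
print(transcode(sjis_bytes, "shift_jis", "gb18030").decode("gb18030"))
```

With N encodings, this needs one mapping per encoding to and from Unicode rather than one per pair of encodings. A character missing from the target encoding raises `UnicodeEncodeError` under the default strict error handling.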
![Page 9: Implementation Issues](https://reader036.vdocuments.net/reader036/viewer/2022082711/56813997550346895da12d37/html5/thumbnails/9.jpg)
Transliteration
Round-trip Transliterations: श ↔ śa
  Ideal published form
  Unique source sequence → unique target
Best-Fit Transliterations: श → sa
  For limited environments
Keyboard Transliterations: श ← ssa
  Limited to QWERTY keys
Indic-Indic not simple mapping; “holes”
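The difference between round-trip and best-fit transliteration can be sketched with two toy tables. Only the pairs श ↔ śa and श → sa come from the slide; the other entries, the table names, and the helper functions are assumptions for illustration, and a real transliterator needs full script rules rather than a per-letter lookup.

```python
# Toy tables keyed by Devanagari letter (with its inherent vowel).
ROUND_TRIP = {
    "\u0936": "\u015Ba",   # श -> śa (from the slide)
    "\u0937": "\u1E63a",   # ष -> ṣa (assumed entry)
    "\u0938": "sa",        # स -> sa (assumed entry)
}
BEST_FIT = {
    "\u0936": "sa",        # श -> sa (from the slide)
    "\u0937": "sa",        # ष -> sa (assumed entry)
    "\u0938": "sa",        # स -> sa (assumed entry)
}


def is_reversible(table: dict) -> bool:
    """A table round-trips only if every source maps to a distinct target."""
    return len(set(table.values())) == len(table)


def invert(table: dict) -> dict:
    """Build the reverse mapping, refusing tables that lose information."""
    if not is_reversible(table):
        raise ValueError("table is not reversible")
    return {target: source for source, target in table.items()}


print(is_reversible(ROUND_TRIP), is_reversible(BEST_FIT))
print(invert(ROUND_TRIP)["\u015Ba"])
```

The best-fit table collapses several letters onto the same ASCII string, which is why it suits limited environments but cannot be reversed.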
![Page 10: Implementation Issues](https://reader036.vdocuments.net/reader036/viewer/2022082711/56813997550346895da12d37/html5/thumbnails/10.jpg)
Keyboards
One key → many characters
  one key → क (U+0915) + ् (U+094D) + ष (U+0937)
Many keys → one character
  a + ` → à (U+00E0)
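Both cases on this slide can be sketched with a toy key-mapping layer. Everything here is hypothetical: the key identifier `KSSA`, the table names, and the choice to treat the grave key as a dead key pressed before the base letter are assumptions for illustration, not a description of any real keyboard driver.

```python
# One key -> many characters: a single key emitting KA + VIRAMA + SSA.
KEY_TO_TEXT = {"KSSA": "\u0915\u094D\u0937"}

# Many keys -> one character: (dead key, base key) -> single code point.
DEAD_KEY_PAIRS = {("`", "a"): "\u00E0"}


def type_keys(keys):
    """Turn a sequence of key identifiers into the text they produce."""
    dead_keys = {dead for dead, _ in DEAD_KEY_PAIRS}
    out = []
    pending = None  # dead key waiting for its base key
    for key in keys:
        if pending is not None:
            combined = DEAD_KEY_PAIRS.get((pending, key))
            if combined is None:
                # No combination defined: emit both keys unchanged.
                combined = pending + KEY_TO_TEXT.get(key, key)
            out.append(combined)
            pending = None
        elif key in dead_keys:
            pending = key
        else:
            out.append(KEY_TO_TEXT.get(key, key))
    if pending is not None:
        out.append(pending)
    return "".join(out)


print(len(type_keys(["KSSA"])))          # code points produced by one key
print(len(type_keys(["`", "a"])))        # code points produced by two keys
```

The point of the sketch is that keystrokes and stored characters are not in one-to-one correspondence, so text handling cannot assume they are.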
![Page 11: Implementation Issues](https://reader036.vdocuments.net/reader036/viewer/2022082711/56813997550346895da12d37/html5/thumbnails/11.jpg)
Supporting Sequences
Keyboards
Fonts
Selection
![Page 12: Implementation Issues](https://reader036.vdocuments.net/reader036/viewer/2022082711/56813997550346895da12d37/html5/thumbnails/12.jpg)
Fonts
Required Glyphs, Positioning
Sequences Necessary to produce them
Context (e.g. in OpenType)
क (U+0915) + ् (U+094D) + ष (U+0937)
![Page 13: Implementation Issues](https://reader036.vdocuments.net/reader036/viewer/2022082711/56813997550346895da12d37/html5/thumbnails/13.jpg)
Selection
Use appropriate boundaries for user-characters
Arrow keys, mouse selection, etc.
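As a rough sketch of user-character boundaries, the following groups each base character with the combining marks that follow it, so that an arrow key or selection never stops between them. This is a deliberate simplification based on General_Category only, not the full Unicode text segmentation rules; the function name `user_characters` is an assumption.

```python
import unicodedata

MARK_CATEGORIES = ("Mn", "Mc", "Me")  # nonspacing, spacing, enclosing marks


def user_characters(text: str) -> list:
    """Split text into approximate user-characters (base + following marks)."""
    clusters = []
    for ch in text:
        if clusters and unicodedata.category(ch) in MARK_CATEGORIES:
            clusters[-1] += ch      # a mark attaches to the preceding cluster
        else:
            clusters.append(ch)     # anything else starts a new cluster
    return clusters


# "e" + COMBINING ACUTE ACCENT, then "x": two user-characters, three code points
print(user_characters("e\u0301x"))
# DEVANAGARI KA + VOWEL SIGN I: one user-character, two code points
print(user_characters("\u0915\u093F"))
```

A production implementation would rely on a library that implements the complete segmentation rules, since many cases (Hangul syllables, joiners, and so on) are not covered by this approximation.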
![Page 14: Implementation Issues](https://reader036.vdocuments.net/reader036/viewer/2022082711/56813997550346895da12d37/html5/thumbnails/14.jpg)
Unicode Stability
Encoding. Once a character is encoded, it will not be moved or removed.
Name. Once a character is encoded, its character name will not be changed.
Normalization. Once a character is encoded, its canonical combining class and decomposition mapping will not be changed in a way that will destabilize normalization.
Identity. Once a character is encoded, its properties may still be changed, but not in such a way as to change the fundamental identity of the character.
Property Value. The structure of certain property values in the Unicode Character Database will not be changed.
![Page 15: Implementation Issues](https://reader036.vdocuments.net/reader036/viewer/2022082711/56813997550346895da12d37/html5/thumbnails/15.jpg)
Locale Data
(examples)
![Page 16: Implementation Issues](https://reader036.vdocuments.net/reader036/viewer/2022082711/56813997550346895da12d37/html5/thumbnails/16.jpg)
Q & A