
Standardization of Internationalized Domain Name at IETF

24 Jan 2002

Yoshiro YONEYA <[email protected]>

JPNIC


24 Jan 2002 APAN2002 Conference 2

What is IDN?

• Internationalized Domain Name.
  – Current domain names are represented with ASCII alphanumeric and hyphen characters.
  – IDN is the technical challenge of representing domain names with non-ASCII characters as well as ASCII.
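As a small illustration of this distinction, the following sketch checks whether a label stays within the traditional ASCII letter, digit and hyphen repertoire. The function name `is_ldh_label` is an assumption made for this example and does not come from the slides.

```python
import string

# Traditional repertoire: ASCII letters, digits and hyphen.
LDH = set(string.ascii_letters + string.digits + "-")

def is_ldh_label(label: str) -> bool:
    """Return True if the label is non-empty and uses only ASCII letters, digits or hyphen."""
    return len(label) > 0 and all(ch in LDH for ch in label)

print(is_ldh_label("jpnic"))    # True: an ordinary ASCII label
print(is_ldh_label("高島屋"))    # False: this label needs IDN
```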


What is Internationalization?

• A framework to extend the character repertoire for domain names.
• It needs to be a global standard so that global communication is not lost.
• The IETF IDN (Internationalized Domain Name) WG is doing the work.
• The word ‘Multilingualization’ causes some confusion.
  – Characters are just one component of a language.
  – Multilingual domain names are a service-level aspect.


Internationalized Domain Names

华人.公司.cn
華人.商業.tw
高島屋.会社.jp
삼성.회사.kr
三星.회사.kr
م. االهرام
viagénie.qc.ca
קום.ישראל
ทีเอชนิค.พาณิชย์.ไทย
現代.com
ヤフー.com

http://www.jdna.jp/activities/event/jdn-tutorial/IDNSDK.pdf


Why IDN?

• The number of Internet users who are not familiar with English is increasing.
  – IDNs are easier to memorize, type in, etc.
• The usage of domain names has changed drastically.
  – A domain name is now used not only as a host name but also as a signboard.
• Creates new business opportunities.
  – Many ventures have begun services.


Drawbacks of IDN

• Loses global acceptability at the end-user interface.
  – It is hard to type in or display non-ASCII characters without appropriate I/O devices and/or software.
• Impacts operation.
  – Requires software updates and/or additional processing.
  – Deployment issues.


History of IDN WG

• Established in Jan 2000.
  – Discussion takes place mainly on the mailing list.
• Held its first meeting at the 47th IETF in Adelaide.
  – Has met at every IETF since then.
• Decided on the WG’s solution at the last (52nd) IETF.
  – IDNA, NAMEPREP and Punycode (formerly known as AMC-ACE-Z).
  – Waiting for WG last call.


Scope and priority of IDN WG

• Provide a standard.
  – Do not divide the global connectivity and communication of the Internet.
• Backward compatibility.
  – Stay compatible with the current DNS and application protocols, so that IDN works with the current Internet infrastructure.
• No localization.
  – Independent of particular regions, countries and/or languages.
  – Refer to existing universal standards.
  – A common framework essential to internationalization.


IDNA (Internationalizing Domain Names In Applications)

draft-ietf-idn-idna-06.txt

• An architecture that describes how to process IDNs.
  – Use Unicode, which is upward compatible with ASCII, as the character code set.
  – Normalize characters that have multiple code point representations (upper/lower case, full-width/half-width, composing characters) into a single internal representation so that matching does not fail.
  – Represent non-ASCII characters that are input or displayed at the user interface as an ASCII Compatible Encoding (ACE) string on the network.
  – Perform these processes in application software.
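The steps above can be sketched with Python's built-in `idna` codec. That codec implements the later RFC 3490 form of IDNA with the `xn--` ACE prefix, so it only approximates the draft named on this slide; the domain name is an arbitrary example.

```python
# Name as the user types or sees it (Unicode, upper case on purpose).
user_name = "BÜCHER.example"

# To ACE: the codec normalizes each non-ASCII label and converts it
# to an ASCII Compatible Encoding string for use on the network.
ace_name = user_name.encode("idna")
print(ace_name)                   # b'xn--bcher-kva.example'

# From ACE: convert back to Unicode for display at the user interface.
display_name = ace_name.decode("idna")
print(display_name)               # bücher.example
```

The application performs both conversions itself; the DNS only ever sees the ASCII form.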


Important point of IDNA

• The representation at the user interface layer differs from the one at the network layer.
  – For ASCII domain names the two are the same.
• An application-level solution.
  – Least impact on the Internet infrastructure.


Image of the IDNA

[Figure: block diagram of an end system. Labels: User, UI, Application (Internal Representation; Local / Int’l), API, Resolver, DNS servers, Application servers. Conversion steps (shown twice): To/From Unicode, NAMEPREP, To/From ACE.]


NAMEPREP (Stringprep Profile for Internationalized Host Names)

draft-ietf-idn-nameprep-07.txt

• A profile of STRINGPREP (Preparation of Internationalized Strings).
  – draft-hoffman-stringprep-00.txt
• Some scripts, such as alphabets, have multiple representations for a character.
  – Domain names are case insensitive.
• A normalization process that unifies strings with the same meaning or appearance into a single representation.
  – Case (upper / lower)
  – Compatibility characters (full / half width)
  – Composing characters


Important point of NAMEPREP

• Normalize the representation of an internationalized domain name string so that it matches correctly.
  – ‘a’ vs ‘A’
  – ‘u’+‘¨’ vs ‘ü’
  – ‘ア’ vs ‘ｱ’
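The effect on matching can be observed with the `nameprep` helper in Python's `encodings.idna` module. That helper follows the later RFC 3491 profile rather than the draft cited in this deck, and the combining diaeresis U+0308 is assumed to be what the ‘¨’ in the second pair stands for.

```python
from encodings.idna import nameprep

pairs = [
    ("a", "A"),               # lower vs upper case
    ("u\u0308", "\u00fc"),    # 'u' + combining diaeresis vs precomposed 'ü'
    ("\u30a2", "\uff71"),     # full-width 'ア' vs half-width 'ｱ'
]

for left, right in pairs:
    raw_equal = left == right
    prepared_equal = nameprep(left) == nameprep(right)
    print(raw_equal, prepared_equal)   # False True for every pair
```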


Processes in NAMEPREP

1. map
   • Case folding of upper/lower case characters (UTR#21).
2. normalize
   • Normalize the representation of the string (UAX#15 NFKC).
3. prohibit
   • Reject characters that are inappropriate in a domain name.
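The three steps can be sketched with Python's `stringprep` and `unicodedata` modules. This is a minimal illustration, not the full profile: the name `nameprep_sketch` is hypothetical, and only three of the prohibition tables are checked.

```python
import stringprep
import unicodedata

def nameprep_sketch(label: str) -> str:
    # 1. map: drop characters that are "mapped to nothing", case-fold the rest.
    mapped = "".join(
        stringprep.map_table_b2(ch)
        for ch in label
        if not stringprep.in_table_b1(ch)
    )
    # 2. normalize: Unicode Normalization Form KC (UAX#15).
    normalized = unicodedata.normalize("NFKC", mapped)
    # 3. prohibit: reject characters unsuitable for a domain name.
    for ch in normalized:
        if (stringprep.in_table_c12(ch)          # non-ASCII spaces
                or stringprep.in_table_c22(ch)   # non-ASCII control characters
                or stringprep.in_table_c3(ch)):  # private use
            raise ValueError(f"prohibited character U+{ord(ch):04X}")
    return normalized

print(nameprep_sketch("JPNIC"))    # jpnic
print(nameprep_sketch("ＪＰ"))      # jp (full-width letters folded and normalized)
```

A label containing a private-use character such as U+E000 raises `ValueError` in step 3.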


ACE (ASCII Compatible Encoding)

• Represents non-ASCII characters with ASCII characters.
  – Easy to apply to the current DNS.
  – Least impact on current applications.
• Decreases the maximum number of characters in each label.
  – This is the penalty of using only 5 bits to represent 8-bit data.
  – Requires some sort of compression algorithm.
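The length penalty can be estimated with simple arithmetic. The sketch below assumes the 63-character DNS label limit, a 4-character ACE identifier, 5 payload bits per ASCII character and 16 bits per uncompressed character; apart from the label limit, these figures are assumptions chosen for illustration.

```python
MAX_LABEL_LENGTH = 63      # DNS limit on the length of one label
ACE_ID_LENGTH = 4          # assumed identifier length, e.g. "bq--"
BITS_PER_ASCII_CHAR = 5    # base32-style encoding: 5 payload bits per character

def max_characters(bits_per_character: int = 16) -> int:
    """Number of original characters that fit into one encoded label."""
    payload_bits = (MAX_LABEL_LENGTH - ACE_ID_LENGTH) * BITS_PER_ASCII_CHAR
    return payload_bits // bits_per_character

print(max_characters())     # 18 characters fit without compression
print(max_characters(8))    # 36 if every character could be squeezed into 8 bits
```

The gap between the two results is why a compression step matters for scripts whose characters share a common high byte.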


ACE Identifier

• An explicit ACE identifier (ACE-ID) is required.
  – For reverse conversion.
  – The choice of ACE-ID is a political issue.
• The ACE-ID itself is an ASCII string, so whenever an ACE-ID is proposed, it gets registered as an ASCII domain name.
• This actually happened in gTLDs.
• IANA will assign the ACE-ID.
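Why reverse conversion needs the identifier can be shown with a small sketch. The value `zq--` is borrowed from the Punycode sample elsewhere in this deck and is only an assumption here; the function name `split_ace_label` is hypothetical.

```python
from typing import Optional

ACE_ID = "zq--"   # assumed identifier; the final value is assigned by IANA

def split_ace_label(label: str, ace_id: str = ACE_ID) -> Optional[str]:
    """Return the encoded part if the label carries the ACE-ID, else None."""
    # Domain names are case insensitive, so compare in lower case.
    if label.lower().startswith(ace_id):
        return label[len(ace_id):]
    return None

print(split_ace_label("ZQ--HCKQZ9BZB1CYRB"))   # HCKQZ9BZB1CYRB (to be decoded)
print(split_ace_label("JPNIC"))                # None (ordinary ASCII label)
```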


Criteria of ACE selection

• Simple algorithm.
  – For ease of implementation.
  – Interoperability.
• Effective compression for practical IDNs.
  – To accommodate as many characters as possible.
• One-to-one correspondence between encoding and decoding.
  – To avoid alternative encoded representations of one IDN.
  – Security consideration.
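The one-to-one requirement can be checked mechanically. The sketch below uses Python's built-in `punycode` codec, which implements the later RFC 3492 form of the algorithm (an assumption relative to the drafts discussed here); the helper name `round_trips` is hypothetical.

```python
def round_trips(label: str) -> bool:
    """Encode a label and check that decoding restores it exactly."""
    encoded = label.encode("punycode")    # bytes containing only ASCII
    return encoded.isascii() and encoded.decode("punycode") == label

for label in ["日本語ドメイン名試験", "高島屋", "viagénie"]:
    print(label, round_trips(label))      # True for every label
```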


Comparison of ACE proposals

Encoding sample of ‘日本語ドメイン名試験.JP’

RACE      BQ--3BS6KZZMRKPDBSJQ4EYKIMHTKQGYUZU2CM.JP
Punycode  ZQ--ECKWD4C7C777U7MWO4BOV4JIOAU09J.JP

[Figure: evaluation result from existing Japanese JP domain names]


Punycode

draft-ietf-idn-punycode-00.txt

• The ACE selected by the IDN WG.
• Compression algorithm.
  – Extract characters in ascending order of code point.
  – Encode the difference from the previously processed character’s code point, together with the position, into an integer.
  – Extract letters, digits and hyphen as bootstring.
• ASCII conversion algorithm.
  – Introduces a new concept named ‘Generalized variable-length integers’.
  – BASE36 (A-Z, 0-9).
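A quick way to see the shape of the output is Python's built-in `punycode` codec. It implements the later RFC 3492 form, which may differ in detail from draft -00, and the label is an arbitrary example.

```python
label = "münchen"

encoded = label.encode("punycode")
print(encoded)                  # b'mnchen-3ya'

# Everything before the last hyphen is the ASCII part copied as-is;
# the rest encodes which non-ASCII character goes where.
basic, _, deltas = encoded.decode("ascii").rpartition("-")
print(basic, deltas)            # mnchen 3ya
```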


Compression process of Punycode (simplified for understanding)

• “文字列例”
• Compression.
  1. Code points by position: 1:U+6587 2:U+5B57 3:U+5217 4:U+4F8B
  2. Sort, diff: 4:0x4F8B 3:0x28C 2:0x940 1:0xA30
  3. To integer (diff*chars+position): 0x13E30 0xA33 0x2502 0x28C1

Generalized variable-length integers of Punycode

• 12345 in decimal is represented as 1*10^4+2*10^3+3*10^2+4*10^1+5*10^0.
• The digit in every place ranges over 0-9, so within the digit sequence 12345 one cannot tell whether it holds 123 and 45, or 1234 and 5.
• Furthermore, 012345 and 12345 are different representations of the same value.
• GVLI (Generalized variable-length integers) is an idea to solve this problem.
• Define a threshold for each place, and treat a digit below the threshold as the delimiter that ends the number.
• The threshold is an appropriate number smaller than the base.

Encoding process of Punycode (simplified for understanding)

• Assign the characters 0-9 and A-Z to GVLI digits (0-9 = 0..9, A-Z = 10..35), least significant digit first.
  – Assume 36 for the base and 10, 18, 25, 25 for the thresholds.
  1. 0x13E30 0xA33 0x2502 0x28C1
  2. 0x13E30 => OIUD: 24*1+18*26(=1*(36-10))+30*468(=26*(36-18))+13*5148(=468*(36-25))
  3. 0xA33 => BS4: 11*1+28*26+4*468
  4. 0x2502 => AMJ: 10*1+22*26+19*468
  5. 0x28C1 => XML: 33*1+22*26+21*468
• “文字列例” => “OIUDBS4AMJXML”.
  – Real Punycode generates “FSQW5D78MBSK”.
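The simplified compression and encoding steps can be reproduced in a short sketch. It recomputes every value from the code points of the sample string, takes the digit values from the arithmetic above (0-9 = 0..9, A-Z = 10..35), and assumes that places beyond the listed thresholds reuse the last one. It illustrates the simplified scheme only, not real Punycode; all function names are hypothetical.

```python
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"   # digit values 0..35
BASE = 36
THRESHOLDS = (10, 18, 25, 25)                     # one threshold per place

def compress(text):
    """Sort by code point, take differences, fold in the 1-based position."""
    ordered = sorted((ord(ch), pos) for pos, ch in enumerate(text, start=1))
    integers, previous = [], 0
    for code_point, pos in ordered:
        integers.append((code_point - previous) * len(text) + pos)
        previous = code_point
    return integers

def threshold(place):
    # Assumption: places beyond the listed thresholds reuse the last one.
    return THRESHOLDS[min(place, len(THRESHOLDS) - 1)]

def encode_gvli(value):
    """Emit digits least significant first; a digit below the threshold ends the number."""
    out, place = [], 0
    while True:
        t = threshold(place)
        if value < t:
            out.append(DIGITS[value])
            return "".join(out)
        out.append(DIGITS[t + (value - t) % (BASE - t)])
        value = (value - t) // (BASE - t)
        place += 1

def decode_gvli(text):
    """Split a concatenated digit string back into integers."""
    values, value, weight, place = [], 0, 1, 0
    for ch in text:
        digit, t = DIGITS.index(ch), threshold(place)
        value += digit * weight
        if digit < t:                    # delimiter: this number is complete
            values.append(value)
            value, weight, place = 0, 1, 0
        else:
            weight *= BASE - t
            place += 1
    return values

integers = compress("文字列例")
print([hex(n) for n in integers])        # ['0x13e30', '0xa33', '0x2502', '0x28c1']
encoded = "".join(encode_gvli(n) for n in integers)
print(encoded)                           # OIUDBS4AMJXML
print(decode_gvli(encoded) == integers)  # True
```

Because every non-final digit is at or above its threshold and every final digit is below it, the decoder can find the boundaries between numbers without any separator character.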


Standardization of IDN is just the starting point of utilization

• End users use IDNs through application software.
  – Web, Mail, etc.
• IDNA requires support from applications.
• How to handle IDNs in application protocols must be defined.

Standardization of IDN does not mean that IDN is ready to use. It is just a starting point for applications to incorporate the new features.


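HTTP Request (DNS resolve only)

[Figure: the user gives http://ジェーピーニック.JP/ to the Web browser; the browser resolves ZQ--HCKQZ9BZB1CYRB.JP through DNS and obtains the Web server’s IP address; it then sends the request below and receives an error.]

GET http://ジェーピーニック.JP/ HTTP/1.1
Host: ジェーピーニック.JP
Referer: http://ジェーピーニック.JP/

Error!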


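HTTP Request (ACE in HTTP header)

[Figure: the user gives http://ジェーピーニック.JP/ to the Web browser; the browser resolves ZQ--HCKQZ9BZB1CYRB.JP through DNS and obtains the Web server’s IP address; it then sends the request below and receives the contents.]

GET http://ZQ--HCKQZ9BZB1CYRB.JP/ HTTP/1.1
Host: ZQ--HCKQZ9BZB1CYRB.JP
Referer: http://ZQ--HCKQZ9BZB1CYRB.JP/

Contents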
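A client that places the ACE form in the HTTP headers could be sketched as follows. Python's `idna` codec uses the later `xn--` prefix and RFC 3492 Punycode, so the encoded host differs from the ZQ-- sample above; the function name `build_request` is hypothetical.

```python
def build_request(unicode_host: str, path: str = "/") -> str:
    # Convert the host name to its ASCII Compatible Encoding before it
    # goes into any protocol field.
    ace_host = unicode_host.encode("idna").decode("ascii")
    return (
        f"GET http://{ace_host}{path} HTTP/1.1\r\n"
        f"Host: {ace_host}\r\n"
        f"Referer: http://{ace_host}/\r\n"
        "\r\n"
    )

request = build_request("ジェーピーニック.JP")
print(request)
print(request.isascii())   # True: no raw non-ASCII characters reach the server
```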


References

• IETF IDN WG Web page
  – http://www.i-d-n.net/
• Unicode Consortium
  – http://www.unicode.org/


Acknowledgement

• Telecommunications Advancement Organization of Japan (TAO).
  – JPNIC’s security investigation of IDN is conducted as part of TAO’s research.
  – http://www.shiba.tao.go.jp/


IDN-compliant clients & implementations

• Mozilla: http://playground.i-dns.net/mozilla/index.html
  – Plug-in for Mozilla; resolution using RACE.
• Opera: http://www.opera.com/
  – Native; resolution using RACE.
• Internet Explorer 5 or higher: http://www.microsoft.com/windows/ie/default.asp
  – Uses a keyword search engine as the RACE converter.
• mDNkit: http://www.nic.ad.jp/jp/research/idn/mdnkit/download/
  – Open-source toolkit for developing IDN-compliant software.