
Proposal for a Malayalam Script Root Zone Label Generation Ruleset (LGR)

LGR Version: 3.0
Date: 2019-04-22
Document version: 2.1
Authors: Neo-Brahmi Generation Panel [NBGP]

1. General Information
The purpose of this document is to give an overview of the proposed Malayalam LGR in the XML format and the rationale behind the design decisions taken. It includes a discussion of relevant features of the script, the communities or languages using it, the process and methodology used, the repertoire of code points included, variant code point(s), whole label evaluation rules and information on the contributors. The formal specification of the LGR can be found in the accompanying XML document: proposal-malayalam-lgr-22apr19-en.xml. Labels for testing can be found in the accompanying text document: malayalam-test-labels-22apr19-en.txt.

2. Script for Which the LGR Is Proposed
ISO 15924 Code: Mlym
ISO 15924 Key N°: 347
ISO 15924 English Name: Malayalam
Latin transliteration of native script name: malayāḷaṁ
Native name of the script: മലയാളം
Maximal Starting Repertoire (MSR) version: MSR-4

3. Background on Script and Principal Languages Using It
Malayalam is a Dravidian language with about 38 million speakers, spoken mainly in the southwest of India, particularly in Kerala, the Lakshadweep Islands and neighbouring states, and also in Bahrain, Fiji, Israel, Malaysia, Qatar, Singapore, the UAE and the UK.

Malayalam was first written with the Vatteluttu alphabet (വട്ടെഴുത്ത്, Vaṭṭeḻuttŭ), which means 'round writing' and developed from the Brahmi script. The oldest known written text in Malayalam, the Vazhappalli or Vazhappally inscription, is in the Vatteluttu alphabet and dates from about 830 AD.

A version of the Grantha alphabet originally used in the Chola kingdom was brought to the southwest of India in the 8th or 9th century and was adapted to write the Malayalam and Tulu languages. By the early 13th century, it is thought, a systematized Malayalam alphabet had emerged. Some changes were made to the alphabet over the following centuries, and by the middle of the 19th century the Malayalam alphabet had attained its current form.


As a result of the difficulties of printing Malayalam, a simplified or reformed version of the script was introduced during the 1970s and 1980s. The main change involved writing consonants and diacritics separately rather than as complex characters. These changes are not applied consistently, so the modern script is often a mixture of traditional and simplified letters.

The script has the following notable features:

● Malayalam script is written left to right in horizontal lines using a syllabic alphabet in which all consonants have an inherent vowel. Diacritics, which can appear above, below, before or after a consonant, are used to change the inherent vowel.

● When they appear at the beginning of a syllable, vowels are written as independent letters.

● Chillaksharam is another feature of Malayalam. A chillu is a pure consonant, written without the virama that would otherwise kill the inherent vowel of a consonant.

● When certain consonants occur together, special conjunct symbols are used which combine the essential parts of each letter.

3.1 The Evolution of Malayalam Script
Malayalam was first written in the Vatteluttu alphabet, an ancient script of Tamil. However, the modern Malayalam script evolved from the Grantha alphabet, which was originally used to write Sanskrit. Both Vatteluttu and Grantha evolved from the Brahmi script, but independently.

3.2 Vatteluttu alphabet
Vatteluttu (Malayalam: വട്ടെഴുത്ത്, Vaṭṭeḻuttŭ, "round writing") is a script that evolved from Tamil-Brahmi and was once used extensively in the southern part of present-day Tamil Nadu and in Kerala.

Malayalam was first written in Vatteluttu. The Vazhappally inscription issued by Rajashekhara Varman is the earliest example, dating from about 830 CE. In the Tamil country, the modern Tamil script had supplanted Vatteluttu by the 15th century, but in the Malabar region, Vatteluttu remained in general use up to the 17th or 18th century. A variant form of this script, Kolezhuthu, was used until about the 19th century, mainly in the Kochi and Malabar areas. Another variant form, Malayanma, was used in the south of Thiruvananthapuram.

3.3 Grantha, Tigalari and Malayalam scripts
According to Arthur Coke Burnell, one form of the Grantha alphabet, originally used in the Chola dynasty, was imported into the southwest coast of India in the 8th or 9th century, and was then modified in the course of time in this secluded area, where communication with the east coast was very limited. It later evolved into the Tigalari-Malayalam script used by the Malayali, Havyaka Brahmin and Tulu Brahmin people, but was originally only applied to write Sanskrit. This script split into two scripts: Tigalari and Malayalam. While the Malayalam script was extended and modified to write the vernacular Malayalam language, Tigalari was used for Sanskrit only. In Malabar, this writing system was termed Arya-eluttu (ആര്യ എഴുത്ത്, Ārya eḻuttŭ), meaning "Arya writing". (Sanskrit is an Indo-Aryan language, while Malayalam is a Dravidian language.)

Vatteluttu was in general use, but was not suitable for literature in which many Sanskrit words were used. Like Tamil-Brahmi, it was originally used to write Tamil and, as such, did not have letters for the voiced or aspirated consonants used in Sanskrit but not in Tamil. For this reason, Vatteluttu and the Grantha alphabet were sometimes mixed, as in the Manipravalam literature (a literary style used in medieval liturgical texts in South India). One of the oldest examples of this, Vaishikatantram (വൈശികതന്ത്രം, Vaiśikatantram), dates back to the 12th century and uses the earliest form of the Malayalam script, which seems to have been systematized to some extent by the first half of the 13th century.

Thunchaththu Ezhuthachan, a poet from around the 17th century, used Arya-eluttu to write his Malayalam poems based on Classical Sanskrit literature. For a few letters missing in Arya-eluttu (ḷa, ḻa, ṟa), he used Vatteluttu. His works became so popular that the Malayali people eventually started to call him the father of the Malayalam language, which also popularized Arya-eluttu as a script to write Malayalam. However, Grantha did not distinguish between e and ē, or between o and ō, as it was only used to write the Sanskrit language. The Malayalam script as it is today was modified in the middle of the 19th century, when Hermann Gundert invented the new vowel signs to distinguish them.

By the 19th century, old scripts like Kolezhuthu had been supplanted by Arya-eluttu, which is the current Malayalam script. Nowadays, it is widely used in the press of the Malayali population in Kerala.

Malayalam and Tigalari are sister scripts descended from the Grantha alphabet. Both share similar glyphic and orthographic characteristics.

3.4 Orthography reform
In 1971, the Government of Kerala reformed the orthography of Malayalam by passing a government order to the education department. The objective was to simplify the use of the print and typewriting technology of that time by reducing the number of glyphs required. In 1967, the government appointed a committee headed by Sooranad Kunjan Pillai, the editor of the Malayalam Lexicon project. It reduced the number of glyphs required for Malayalam printing from around 1000 to around 250. The committee's recommendations were further modified by another committee in 1969 [105].

None of the major newspapers implemented it completely, but every newspaper took its own subset from the proposal. The reformed script came into effect on 15 April 1971 (the Kerala New Year), by a government order released on 23 March 1971.

3.5 Languages using the Malayalam script
The script is also used to write several other languages, such as Paniya, Betta Kurumba and Ravula (all at EGIDS 5). The Malayalam language itself was historically written in several different scripts.

NBGP considered languages with EGIDS scale 1 to 4 for inclusion. Malayalam is one of the two languages written in the Malayalam script (viz. Malayalam and Sanskrit) that meet this criterion.


Malayalam is among the 22 scheduled languages of India. Sanskrit, although it falls under EGIDS 4, is not considered in the Malayalam script LGR because the Malayalam script is rarely used to write Sanskrit.

3.6 ZWJ/ZWNJ
Apart from the existing Unicode character code points in Malayalam [110], Zero Width Joiner (ZWJ, U+200D) and Zero Width Non-Joiner (ZWNJ, U+200C) are widely used to control how ligatures are formed. Being invisible characters, they are often removed during normalization, particularly before string comparison or collation. ICANN's Maximal Starting Repertoire (MSR) for IDN LGRs does not include ZWJ and ZWNJ. [101]

Impact of excluding them from the domain name system: although IDNA2008 allows the use of ZWJ and ZWNJ in domain names (subject to contextual rules), they are not allowed in root zone labels, due to their exclusion from the MSR.

Hence it is not possible to register Malayalam gTLDs with words that contain ZWJ/ZWNJ.

There are three cases:

● Missing ZWNJ is considered a spelling mistake. Example: Tamil Nadu (tamiɭnadu) is written as [0D24 0D2E 0D3F 0D34 0D4D 200C 0D28 0D3E 0D1F 0D4D] (correct), not [0D24 0D2E 0D3F 0D34 0D4D 0D28 0D3E 0D1F 0D4D] (incorrect).

However, there are no identified cases where a missing ZWNJ forms another valid word with a different meaning.

● Missing ZWJ means the word is a different word with a different meaning. This is very rare: the pair vaNyavanika (meaning: large curtain) and വന്യവനിക, vanyaVanika (meaning: wild garden), is often cited as an example, but many people argue this is not a valid case. [102][103]

● Missing ZWJ never means a spelling mistake, just a writing style. There are many examples of this; the word meaning "goodness" is one obvious one.
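To make the first case concrete, the following Python sketch (an illustration only, not part of the LGR) builds both Tamil Nadu spellings from the code points listed above and flags labels that contain a joiner. The helper `contains_joiner` is hypothetical; it only encodes the fact that ZWJ and ZWNJ are absent from the MSR.

```python
# Code point sequences from the Tamil Nadu example (section 3.6).
CORRECT = [0x0D24, 0x0D2E, 0x0D3F, 0x0D34, 0x0D4D, 0x200C,
           0x0D28, 0x0D3E, 0x0D1F, 0x0D4D]
INCORRECT = [0x0D24, 0x0D2E, 0x0D3F, 0x0D34, 0x0D4D,
             0x0D28, 0x0D3E, 0x0D1F, 0x0D4D]

ZWNJ, ZWJ = "\u200C", "\u200D"


def to_text(code_points):
    """Build a string from a list of integer code points."""
    return "".join(chr(cp) for cp in code_points)


def contains_joiner(label):
    """Hypothetical helper: True if the label contains ZWNJ or ZWJ.
    Neither is in the MSR, so such a label cannot be a root zone label."""
    return ZWNJ in label or ZWJ in label


correct = to_text(CORRECT)
incorrect = to_text(INCORRECT)

# The two spellings differ only by the ZWNJ after the virama.
print(correct.replace(ZWNJ, "") == incorrect)  # True
print(contains_joiner(correct))    # True: correct spelling, not allowed in the root zone
print(contains_joiner(incorrect))  # False: root-zone code points, but misspelled
```

The sketch shows the dilemma directly: the correctly spelled form is the one the root zone cannot accept.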

Historically, ZWJ was used to render chillu in certain fonts, but Unicode later included chillu characters as standalone code points, and MSR-4 also includes these standalone chillu characters.

Before Unicode 5.1, which introduced the atomic chillu letters, chillu letters were encoded as a sequence using joiners. The older encoding is still prevalent in data such as corpora, and may even be in current use.

However, this legacy representation of chillu using virama and ZWJ is ruled out because the root zone does not allow joiners, so there is no issue with the duplicate encoding of chillu. It should therefore be noted that, although the atomic encoding of chillu letters is not universally used, the Root Zone only allows the atomic encoding.


Figure 1: Atomic Encoding of Malayalam Chillus [107]
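Because only the atomic form is admissible in the root zone, data in the older joiner-based encoding would have to be converted before it could be checked against the LGR. The Python sketch below assumes such a conversion step; the function `atomize_chillus` is hypothetical, and the LGR itself does not define one. It also shows that Unicode normalization does not perform this conversion, because the atomic chillu letters have no decomposition mappings.

```python
import unicodedata

VIRAMA, ZWJ = "\u0D4D", "\u200D"

# Base consonant -> atomic chillu letter (Table 4, U+0D7A..U+0D7F).
ATOMIC_CHILLU = {
    "\u0D23": "\u0D7A",  # NNA -> CHILLU NN
    "\u0D28": "\u0D7B",  # NA  -> CHILLU N
    "\u0D30": "\u0D7C",  # RA  -> CHILLU RR
    "\u0D32": "\u0D7D",  # LA  -> CHILLU L
    "\u0D33": "\u0D7E",  # LLA -> CHILLU LL
    "\u0D15": "\u0D7F",  # KA  -> CHILLU K
}


def atomize_chillus(text):
    """Hypothetical conversion step: replace each legacy
    consonant + virama + ZWJ sequence with the atomic chillu letter."""
    for base, chillu in ATOMIC_CHILLU.items():
        text = text.replace(base + VIRAMA + ZWJ, chillu)
    return text


legacy = "\u0D05\u0D35\u0D28" + VIRAMA + ZWJ  # "avan" in the joiner-based encoding
atomic = atomize_chillus(legacy)

print([f"U+{ord(c):04X}" for c in atomic])  # ['U+0D05', 'U+0D35', 'U+0D7B']
# NFC leaves the legacy sequence untouched: atomic chillus have no decomposition.
print(unicodedata.normalize("NFC", legacy) == legacy)  # True
print(unicodedata.decomposition("\u0D7B") == "")        # True
```

A consonant + virama sequence that is not followed by ZWJ is left alone, so ordinary conjuncts such as ക്ക are not affected.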

ZWNJ is used to prevent the formation of conjunct ligatures, and it is required to avoid spelling mistakes and unnecessary conjuncts. For example, in a two-word label, a first word ending in a virama can form a conjunct with a second word starting with a consonant. This causes a spelling mistake.

3.7 The Structure of Malayalam Script
The Malayalam aksharam, or grapheme cluster, is based on the Malayalam phonological system, with the following basic phonological template.

Phonology
Vowels: Malayalam has five short and five long vowels. Vowels occur in all positions in a word, except for o, which is not permitted at the end of a word. It also has two diphthongs, ai and au.

Figure 2: Malayalam Vowel Phonology [109]

Consonants: Besides a Dravidian consonantal inventory, Malayalam has aspirated stops and supplementary sibilants borrowed from Indo-Aryan. [f] occurs mostly in European borrowings. Voiceless unaspirated stops, nasals and the laterals [l] and [ɭ] can be geminated. The distinction between single and geminated consonants is phonemic. Only six consonants, [m], [n], [ɳ], [r], [l] and [ɭ], can occur word-finally.


Figure 3: Malayalam Consonant Phonology [109]

Sandhi: Internal and external sandhi are commonplace. They result in vowel and consonant deletion, assimilation of consonants, and fusion.

Stress: Stress always falls on the first syllable of a word.

Script and Orthography
Malayalam is written in an abugida script derived ultimately from Brāhmī, in which every consonant carries an inherent a. The alphabetic order is based on phonological principles: it begins with the simple vowels and diphthongs, followed by 25 stops and nasals arranged in five groups according to their place of articulation. It continues with semivowels (liquids and glides) and fricatives, and ends with two retroflex liquids which do not exist in Sanskrit and thus were not represented in Brāhmī. Geminated consonants and other consonant clusters are written side by side or one above the other. In the tables below, each Malayalam sign is followed by its standard transliteration in the Latin alphabet and by its equivalent in the International Phonetic Alphabet between slashes.

The following sections provide details of the Malayalam sounds and how these are written in Malayalam.

Monophthongs

Vowel | Short: independent | Short: vowel sign | Short: example | Long: independent | Long: vowel sign | Long: example
a | അ a /a/ | (none) | പ pa /pa/ | ആ ā /aː/ | ◌ാ | പാ pā /paː/
i | ഇ i /i/ | ◌ി | പി pi /pi/ | ഈ ī /iː/ | ◌ീ | പീ pī /piː/
u | ഉ u /u/ | ◌ു | പു pu /pu/ | ഊ ū /uː/ | ◌ൂ | പൂ pū /puː/
r̥ | ഋ r̥ /rɨ/ | ◌ൃ | പൃ pr̥ /prɨ/ | | |
e | എ e /e/ | ◌െ | പെ pe /pe/ | ഏ ē /eː/ | ◌േ | പേ pē /peː/
o | ഒ o /o/ | ◌ൊ | പൊ po /po/ | ഓ ō /oː/ | ◌ോ | പോ pō /poː/

Diphthongs

Diphthong | Independent | Vowel sign | Example
ai | ഐ ai /ai̯/ | ◌ൈ | പൈ pai /pai̯/
au | ഔ au /au̯/ | ◌ൌ (archaic) | പൌ pau /pau̯/
au | ഔ au /au̯/ | ◌ൗ (modern) | പൗ pau /pau̯/

Anusvaram

aṁ | അം aṁ /am/ | ◌ം ṁ /m/ | പം paṁ /pam/

Visargam

aḥ | അഃ aḥ /ah/ | ◌ഃ ḥ /h/ | പഃ paḥ /pah/

Consonants

Place | Voiceless unaspirated | Voiceless aspirated | Voiced unaspirated | Voiced aspirated | Nasal
Velar | ക ka /ka/ KA | ഖ kha /kʰa/ KHA | ഗ ga /ɡa/ GA | ഘ gha /ɡʱa/ GHA | ങ ṅa /ŋa/ NGA
Palatal or postalveolar | ച ca /t͡ʃa/ CA (cha) | ഛ cha /t͡ʃʰa/ CHA (chha) | ജ ja /ɟa/ JA (jha) | ഝ jha /ɟʱa/ JHA (jhha) | ഞ ña /ɲa/ NYA (nha, nja)
Retroflex | ട ṭa /ʈa/ TTA (ta, hard ta) | ഠ ṭha /ʈʰa/ TTHA (tta, hard tha) | ഡ ḍa /ɖa/ DDA (da, hard da) | ഢ ḍha /ɖʱa/ DDHA (dda, hard dha) | ണ ṇa /ɳa/ NNA (hard na)
Dental | ത ta /t̪a/ TA (tha, soft ta) | ഥ tha /t̪ʰa/ THA (ttha, soft tha) | ദ da /d̪a/ DA (dha, soft da) | ധ dha /d̪ʱa/ DHA (ddha, soft dha) | ന na /n̪a, na/ NA (soft na): dental nasal or alveolar nasal, depending on the word
Labial | പ pa /pa/ PA | ഫ pha /pʰa/ PHA | ബ ba /ba/ BA | ഭ bha /bʱa/ BHA | മ ma /ma/ MA

Other consonants

യ ya /ja/ YA
ര ra /ɾa/ RA: alveolar tap
ല la /la/ LA: the tip of the tongue almost touches the teeth ([l̪]), further forward than the English l
വ va /ʋa/ VA
ശ śa /ʃa/ SHA (soft sha, sha)
ഷ ṣa /ʂa/ SSA (sha, hard sha): voiceless apico-palatal approximant
സ sa /sa/ SA: dental sibilant fricative
ഹ ha /ɦa/ HA
ള ḷa /ɭa/ LLA (hard la)
ഴ ḻa /ɻa/ LLLA (ṛ, /ɽ/; zha, retroflexed ra): apico-palatal; voiced apico-palatal approximant [ʐ̠̺˕]. This consonant is usually described as /ɻ/, but can also be approximated by /ɹ/.
റ ṟa, ṯa /ra, ta/ RRA (hard ra): alveolar trill (apical)

[f] is found mostly in Urdu and English loanwords and does not have a specific sign; it is represented with ഫ (pha), which also serves for [pʰ].

Vowels

Vowels are written in this form when they are used independently.

അ U+0D05 A

ആ U+0D06 AA

ഇ U+0D07 I

ഈ U+0D08 II

ഉ U+0D09 U

ഊ U+0D0A UU

ഋ U+0D0B R

എ U+0D0E E

ഏ U+0D0F EE

ഐ U+0D10 AI

ഒ U+0D12 O

ഓ U+0D13 OO

ഔ U+0D14 AU

Table 1: Malayalam Vowels
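The short names in Table 1 abbreviate the formal Unicode character names. A small Python sketch using the standard `unicodedata` module can print the full names and show which code points in the range U+0D05..U+0D14 the table leaves out: U+0D0C MALAYALAM LETTER VOCALIC L (treated as obsolete in 4.1.2.2) and two positions that are unassigned in Unicode. The variable names are illustrative only.

```python
import unicodedata

# Independent vowels from Table 1, as integer code points.
TABLE_1 = [0x0D05, 0x0D06, 0x0D07, 0x0D08, 0x0D09, 0x0D0A, 0x0D0B,
           0x0D0E, 0x0D0F, 0x0D10, 0x0D12, 0x0D13, 0x0D14]

for cp in TABLE_1:
    print(f"{chr(cp)}  U+{cp:04X}  {unicodedata.name(chr(cp))}")

# Positions in U+0D05..U+0D14 that Table 1 does not list.
skipped = [cp for cp in range(0x0D05, 0x0D15) if cp not in TABLE_1]
for cp in skipped:
    # unicodedata.name() raises ValueError for unassigned code points
    # unless a default is supplied.
    print(f"U+{cp:04X}  {unicodedata.name(chr(cp), '<unassigned>')}")
```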


Vowel diacritics

Vowels can also be written as diacritics, referred to as matras, when they follow consonants. Their forms are given below, illustrated with the letter ക (U+0D15) MALAYALAM LETTER KA.

ക U+0D15 KA

കാ U+0D15 U+0D3E KAA

കി U+0D15 U+0D3F KI

കീ U+0D15 U+0D40 KII

കു U+0D15 U+0D41 KU

കൂ U+0D15 U+0D42 KUU

കൃ U+0D15 U+0D43 KR

കെ U+0D15 U+0D46 KE

കേ U+0D15 U+0D47 KEE

കൈ U+0D15 U+0D48 KAI

കൊ U+0D15 U+0D4A KO

കോ U+0D15 U+0D4B KOO

കൗ U+0D15 U+0D57 KAU

Table 2: Malayalam Vowel Diacritics

Consonants

Malayalam has the following consonants, generally arranged by manner and place of articulation.

ക U+0D15 KA

ഖ U+0D16 KHA

ഗ U+0D17 GA

ഘ U+0D18 GHA

ങ U+0D19 NGA

ച U+0D1A CA

ഛ U+0D1B CHA

ജ U+0D1C JA

ഝ U+0D1D JHA

ഞ U+0D1E NYA

ട U+0D1F TTA

ഠ U+0D20 TTHA

ഡ U+0D21 DDA

ഢ U+0D22 DDHA

ണ U+0D23 NNA

ത U+0D24 TA

ഥ U+0D25 THA

ദ U+0D26 DA

ധ U+0D27 DHA

ന U+0D28 NA

പ U+0D2A PA

ഫ U+0D2B PHA

ബ U+0D2C BA

ഭ U+0D2D BHA

മ U+0D2E MA

യ U+0D2F YA

ര U+0D30 RA

റ U+0D31 RRA

ല U+0D32 LA

ള U+0D33 LLA


ഴ U+0D34 LLLA

വ U+0D35 VA

ശ U+0D36 SHA

ഷ U+0D37 SSA

സ U+0D38 SA

ഹ U+0D39 HA

Table 3: Malayalam Consonants

Anusvaram and Visargam

Anusvaram: An anusvaram (അനുസ്വാരം, anusvāram), or anusvara, originally denoted nasalization, where the preceding vowel was changed into a nasalized vowel, and hence is traditionally treated as a kind of vowel sign. In Malayalam, however, the anusvara, represented as ◌ം (0D02), simply represents a consonant /m/ after a vowel, though this /m/ may be assimilated to another nasal consonant. It is a special consonant letter, different from a "normal" consonant letter in that it is never followed by an inherent vowel or another vowel. In general, an anusvara at the end of a word in an Indian language is transliterated as ṁ in ISO 15919, but a Malayalam anusvara at the end of a word is transliterated as m without a dot.

Visargam: A visargam (വിസർഗം, visargam), or visarga, represents a consonant /h/ after a vowel, and is transliterated as ḥ. Like the anusvara, it is a special symbol and is never followed by an inherent vowel or another vowel. In Malayalam, ◌ഃ (0D03) is the visarga symbol.

Chillu letters (Chillaksharam) and Samvruthokarams

In Indo-European languages like Sanskrit, a large number of words end in consonants, but in Dravidian languages like Malayalam the majority of words end in vowels. The chillaksharams of Malayalam are exceptions to this general feature: chillaksharams are pure consonants, without any vowel sound. [111]

Chillaksharam is an original feature of Malayalam, used with only 6 consonants at present. The consonants are ന (na), ണ (ṇa), ര (ra), ല (la), ള (ḷa) and ക (ka), and their corresponding chillus are ൻ (ṉ), ൺ (ṇ), ർ (r), ൽ (l), ൾ (ḷ) and ൿ (ḳ), which occur in certain contexts at the end of a word without the implicit vowel. The chillu 0D7F, even though rare, is still in use, predominantly in religious literature and in proper nouns such as personal names and place names. Hence it is included in the LGR to treat chillu characters consistently.

ൺ U+0D7A NN

ൻ U+0D7B N

ർ U+0D7C RR

ൽ U+0D7D L

ൾ U+0D7E LL

ൿ U+0D7F K

Table 4: Malayalam Chillu letters

Samvruthokaram is a soft ending virama (chandrakkala). Any consonant can be followed by ◌ു (0D41) + ◌് (0D4D), creating the samvruthokaram form of that consonant. In southern Kerala, the U matra ◌ു (0D41) and the chandrakkala (virama) ◌് (0D4D) together form the grapheme for samvruthokaram. However, in northern Kerala, just the chandrakkala (visible virama) standing alone is used. In that case, a chandrakkala alone at the end of a word is treated as samvruthokaram.

A chandrakkala occurring within a word (followed by other characters of the word) denotes a conjunct letter formed by the characters preceding and following the chandrakkala. Traditional orthography fonts are used below, since the examples show display forms such as samvruthokaram, which do not exist in modern orthography.

Examples of samvruthokaram:

ഏതു് (ethu, meaning "which"), code points: U+0D0F U+0D24 U+0D41 U+0D4D

അതു് (athu, meaning "that"), code points: U+0D05 U+0D24 U+0D41 U+0D4D
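Because the southern and northern spellings differ only in their final code points, the distinction can be made mechanically. The following Python sketch is a hypothetical helper for illustration, not a rule taken from the LGR; the northern-style example അത് is constructed here by dropping the U sign from അതു്.

```python
U_SIGN, VIRAMA = "\u0D41", "\u0D4D"


def samvruthokaram_style(word):
    """Hypothetical classifier for a word-final samvruthokaram:
    'southern' = U sign + chandrakkala, 'northern' = bare chandrakkala,
    None = the word ends in something else. A chandrakkala inside the
    word is not examined, since it marks a conjunct."""
    if word.endswith(U_SIGN + VIRAMA):
        return "southern"
    if word.endswith(VIRAMA):
        return "northern"
    return None


print(samvruthokaram_style("\u0D0F\u0D24\u0D41\u0D4D"))  # ഏതു് -> southern
print(samvruthokaram_style("\u0D05\u0D24\u0D4D"))        # അത് -> northern
print(samvruthokaram_style("\u0D05\u0D35\u0D7B"))        # അവൻ -> None
```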

For words that end in a chillu, samvruthokaram is used to make the pronunciation clearer. Either samvruthokaram is added directly to the word-ending chillaksharam, or the word-ending chillaksharam is geminated and samvruthokaram is added to it.

The following are the main phonological transformations of chillaksharam. [113]

1. The word-ending consonant written as chillaksharam is geminated and a samvruthokaram is attached.

2. A samvruthokaram is attached to the word-ending consonant written as chillaksharam.

3. The chillaksharam undergoes the same phonological changes (in progressive/regressive assimilation, gemination, etc.) as other consonants in the context of combination of syllables.

4. In sandhi, when a vowel follows a chillaksharam, they join in the same way as when vowels follow other consonants.

Even though samvruthokaram may be seen as derived from the vowels അ (a) or ഉ (u), it in fact has an independent identity as a vowel. This feature is seen only in Malayalam. [111]

A selection of conjunct consonants

A consonant can be combined with another consonant or conjunct using the virama. Conjuncts with more than four consonants are rare, although a conjunct formed from five consonants does occur.

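kka: ക്ക (0D15 0D4D 0D15)
ṅka: ങ്ക (0D19 0D4D 0D15)
ṅṅa: ങ്ങ (0D19 0D4D 0D19)
cca: ച്ച (0D1A 0D4D 0D1A)
ñca: ഞ്ച (0D1E 0D4D 0D1A)
ñña: ഞ്ഞ (0D1E 0D4D 0D1E)
ṭṭa: ട്ട (0D1F 0D4D 0D1F)
ṇṭa: ണ്ട (0D23 0D4D 0D1F)
ṇṇa: ണ്ണ (0D23 0D4D 0D23)
tta: ത്ത (0D24 0D4D 0D24)
nta: ന്ത (0D28 0D4D 0D24)
nna: ന്ന (0D28 0D4D 0D28)

Table 5: Malayalam Conjunct Consonants

Each conjunct can appear in two display forms, depending on the font:
NLF (non-ligated form): has a visible virama (chandrakkala).
LF (ligated form): the consonants are conjoined fully or partially (as rendered by fonts).

Conjuncts with diacritics using യ (U+0D2F), ര (U+0D30), ല (U+0D32), വ (U+0D35)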

Conjunct consonants formed with യ (0D2F), ര (0D30), ല (0D32) and വ (0D35) are rendered with diacritic marks/signs in the glyph. Examples of these in combination with ക (0D15) and പ (0D2A) are given below. Other consonants can be combined in a similar fashion.

Consonant + യ: ക്യ (0D15 0D4D 0D2F), പ്യ (0D2A 0D4D 0D2F)

Consonant + ര: ക്ര (0D15 0D4D 0D30), പ്ര (0D2A 0D4D 0D30)

Consonant + ല: ക്ല (0D15 0D4D 0D32), പ്ല (0D2A 0D4D 0D32)

Consonant + വ: ക്വ (0D15 0D4D 0D35), പ്വ (0D2A 0D4D 0D35)

Table 6: Malayalam Conjuncts with diacritics using യ (U+0D2F), ര (U+0D30), ല (U+0D32), വ (U+0D35)
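The conjuncts in Tables 5 and 6 share one underlying shape: consonants joined by the virama U+0D4D. This short Python sketch, an illustration only, rebuilds the Table 6 sequences for ക and പ and prints their code points so they can be compared with the table. The helper names `conjunct` and `code_points` are hypothetical.

```python
VIRAMA = "\u0D4D"


def conjunct(*consonants):
    """Join consonants with the virama (U+0D4D). Whether the result is
    displayed ligated (LF) or with a visible chandrakkala (NLF) is up
    to the font; the stored sequence is the same."""
    return VIRAMA.join(consonants)


def code_points(text):
    """Format a string as space-separated hex code points."""
    return " ".join(f"{ord(ch):04X}" for ch in text)


KA, PA = "\u0D15", "\u0D2A"
for second in ("\u0D2F", "\u0D30", "\u0D32", "\u0D35"):  # YA, RA, LA, VA
    for base in (KA, PA):
        seq = conjunct(base, second)
        print(seq, code_points(seq))
```

Passing more than two consonants, for example `conjunct(NA, TA, RA)`, produces the longer virama-joined sequences that underlie three- and four-consonant conjuncts.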

4. Overall Development Process and Methodology
The Neo-Brahmi Generation Panel (NBGP) has been formed from members having experience in linguistics and computational linguistics. Under the Neo-Brahmi Generation Panel, there are nine scripts belonging to separate Unicode blocks. Each of these scripts is assigned a separate LGR; however, the NBGP ensures that the fundamental philosophy behind building those LGRs is in sync across all Brahmi-derived scripts.

The Malayalam script LGR proposal was published for public comment to allow those who had not participated in the NBGP to make their views known. The NBGP analyzed all comments received to finalize the proposal. The analysis of public comments can be accessed online [114].

4.1 Guiding Principles

The NBGP adopts the following broad principles for the selection of code points in the code point repertoire, across the board for all the scripts within its ambit.

4.1.1 Inclusion principles:

4.1.1.1 Modern usage: Every character proposed should be in the everyday usage of a particular linguistic community. Characters which have been encoded in Unicode only for transcription or archival purposes will not be considered for inclusion in the code point repertoire.

4.1.1.2 Unambiguous use: Every character proposed should have an unambiguous understanding among the linguistic community about its usage in the language.

4.1.2 Exclusion principles: The main exclusion principle is that of External Limits on Scope. These comprise protocols or standards which are prerequisites to the Label Generation Rulesets. All further principles are in fact subsumed under these limitations, but have been spelt out separately for the sake of clarity.

4.1.2.1 External Limits on Scope: The code point repertoire for the root zone is a very special case, at the top of the protocol hierarchies, so the range of characters available for selection as part of the Root Zone code point repertoire is already constrained by various protocol layers beneath it. The following three main protocols/standards act as successive filters:

i. The Unicode Chart: Of all the characters needed by the given script, if a character is not encoded in Unicode, it cannot be incorporated in the code point repertoire. Such cases are quite rare, given the elaborate and exhaustive character inclusion efforts made by the Unicode Consortium.

ii. IDNA Protocol: Unicode, being the character encoding standard that aims at the maximum possible representation of a given script/language, has encoded as far as possible all the characters needed by the script. However, domain names are a specialized case, governed by an additional protocol known as IDNA (Internationalized Domain Names in Applications). The IDNA protocol excludes some characters of the Unicode repertoire from being part of domain names.

iii. Maximal Starting Repertoire: The Root Zone LGR is a repertoire of the characters which are going to be used for the creation of root zone TLDs, which in turn are an even more specialized case of domain names, so the Root Zone LGR procedure introduces additional exclusions on the IDNA-allowed set of characters. Example: MALAYALAM SIGN AVAGRAHA "ഽ" (U+0D3D), even though allowed by the IDNA protocol, is not permitted in the Root Zone repertoire as per the [MSR].

To sum up, the restrictions start off by admitting only such characters as are part of the code block of the given script/language. This is further narrowed down by the IDNA2008 protocol, and finally an additional filter in the form of the Maximal Starting Repertoire restricts the character set associated with the given language even more.
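The three filters can be pictured as a pipeline in which each stage can only remove code points. The Python sketch below is a toy model under explicit assumptions: the IDNA2008 stage is approximated by the general-category test at the core of RFC 5892 (its exceptions and contextual rules are ignored), and the MSR stage knows only the AVAGRAHA example given above. U+0D70 MALAYALAM NUMBER TEN (general category No) is used as an extra illustration of a code point that the category test rejects. The function name is hypothetical.

```python
import unicodedata

# Simplified stand-in for IDNA2008 PVALID (RFC 5892): only the
# general-category test ("LetterDigits") is modelled; exceptions,
# contextual rules and the other RFC 5892 categories are ignored.
LETTER_DIGITS = {"Ll", "Lu", "Lo", "Nd", "Lm", "Mn", "Mc"}

# Toy MSR exclusion list holding only the example cited in the text.
# The real MSR-4 tables are far larger and must be consulted directly.
NOT_IN_MSR = {0x0D3D}  # MALAYALAM SIGN AVAGRAHA


def first_failing_filter(cp):
    """Return the name of the first filter that rejects a code point,
    or None if it survives all three stages of this toy model."""
    category = unicodedata.category(chr(cp))
    if category == "Cn":               # i. not encoded in Unicode
        return "Unicode"
    if category not in LETTER_DIGITS:  # ii. not PVALID (approximation)
        return "IDNA2008"
    if cp in NOT_IN_MSR:               # iii. excluded from the MSR
        return "MSR"
    return None


for cp in (0x0D0D, 0x0D70, 0x0D3D, 0x0D15):
    print(f"U+{cp:04X}", first_failing_filter(cp))
```

Surviving all three filters does not by itself put a code point in the LGR; the NBGP's own principles, such as the exclusion of rare and obsolete characters, still apply on top of them.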

4.1.2.2 No Rare and Obsolete Characters: Some characters have been added to Unicode to accommodate rare forms, like MALAYALAM LETTER VOCALIC L "ഌ" (U+0D0C), which is an obsolete vowel used to write Sanskrit words and is not considered part of the modern Malayalam orthography. No such characters will be included. This is in consonance with the Conservatism principle as laid down in the Root Zone LGR procedure.


5. Repertoire
Based on the LGR Procedure for the Root Zone and the MSR, NBGP conducted the code point analysis of the Malayalam script. The analysis is presented in this section, including the list of code points recommended for inclusion in and exclusion from the repertoire.


5.1 Malayalam section of Maximal Starting Repertoire [MSR] Version 4

Figure 4: Malayalam Code Page from [MSR]

Color convention¹:
● All characters that are included in the [MSR]: yellow background
● PVALID in IDNA2008 but excluded from the [MSR]: pinkish background
● Not PVALID in IDNA2008: white background

¹ This document needs to be printed in color for this to be read correctly.

5.2 Unicode Code Points Inclusion
The following code points are included in the repertoire.

Columns: Sr. No., Unicode Code Point, Glyph, Character Name, Category, Refs.


1 0D02 ◌ം MALAYALAM SIGN ANUSVARA Anusvaram [106]

2 0D03 ◌ഃ MALAYALAM SIGN VISARGA Visargam [106]

3 0D05 അ MALAYALAM LETTER A Vowel [106]

4 0D06 ആ MALAYALAM LETTER AA Vowel [106]

5 0D07 ഇ MALAYALAM LETTER I Vowel [106]

6 0D08 ഈ MALAYALAM LETTER II Vowel [106]

7 0D09 ഉ MALAYALAM LETTER U Vowel [106]

8 0D0A ഊ MALAYALAM LETTER UU Vowel [106]

9 0D0B ഋ MALAYALAM LETTER VOCALIC R Vowel [106]

10 0D0E എ MALAYALAM LETTER E Vowel [106]

11 0D0F ഏ MALAYALAM LETTER EE Vowel [106]

12 0D10 ഐ MALAYALAM LETTER AI Vowel [106]

13 0D12 ഒ MALAYALAM LETTER O Vowel [106]

14 0D13 ഓ MALAYALAM LETTER OO Vowel [106]

15 0D14 ഔ MALAYALAM LETTER AU Vowel [106]

16 0D15 ക MALAYALAM LETTER KA Consonant [106]

17 0D16 ഖ MALAYALAM LETTER KHA Consonant [106]

18 0D17 ഗ MALAYALAM LETTER GA Consonant [106]

19 0D18 ഘ MALAYALAM LETTER GHA Consonant [106]

20 0D19 ങ MALAYALAM LETTER NGA Consonant [106]

21 0D1A ച MALAYALAM LETTER CA Consonant [106]

22 0D1B ഛ MALAYALAM LETTER CHA Consonant [106]

23 0D1C ജ MALAYALAM LETTER JA Consonant [106]

24 0D1D ഝ MALAYALAM LETTER JHA Consonant [106]

25 0D1E ഞ MALAYALAM LETTER NYA Consonant [106]

26 0D1F ട MALAYALAM LETTER TTA Consonant [106]


27 0D20 ഠ MALAYALAM LETTER TTHA Consonant [106]

28 0D21 ഡ MALAYALAM LETTER DDA Consonant [106]

29 0D22 ഢ MALAYALAM LETTER DDHA Consonant [106]

30 0D23 ണ MALAYALAM LETTER NNA Consonant [106]

31 0D24 ത MALAYALAM LETTER TA Consonant [106]

32 0D25 ഥ MALAYALAM LETTER THA Consonant [106]

33 0D26 ദ MALAYALAM LETTER DA Consonant [106]

34 0D27 ധ MALAYALAM LETTER DHA Consonant [106]

35 0D28 ന MALAYALAM LETTER NA Consonant [106]

36 0D2A പ MALAYALAM LETTER PA Consonant [106]

37 0D2B ഫ MALAYALAM LETTER PHA Consonant [106]

38 0D2C ബ MALAYALAM LETTER BA Consonant [106]

39 0D2D ഭ MALAYALAM LETTER BHA Consonant [106]

40 0D2E മ MALAYALAM LETTER MA Consonant [106]

41 0D2F യ MALAYALAM LETTER YA Consonant [106]

42 0D30 ര MALAYALAM LETTER RA Consonant [106]

43 0D31 റ MALAYALAM LETTER RRA Consonant [106]

44 0D32 ല MALAYALAM LETTER LA Consonant [106]

45 0D33 ള MALAYALAM LETTER LLA Consonant [106]

46 0D34 ഴ MALAYALAM LETTER LLLA Consonant [106]

47 0D35 വ MALAYALAM LETTER VA Consonant [106]

48 0D36 ശ MALAYALAM LETTER SHA Consonant [106]

49 0D37 ഷ MALAYALAM LETTER SSA Consonant [106]

50 0D38 സ MALAYALAM LETTER SA Consonant [106]

51 0D39 ഹ MALAYALAM LETTER HA Consonant [106]

52 0D3E ◌ാ MALAYALAM VOWEL SIGN AA Matra [106]


53 0D3F ◌ി MALAYALAM VOWEL SIGN I Matra [106]

54 0D40 ◌ീ MALAYALAM VOWEL SIGN II Matra [106]

55 0D41 ◌ു MALAYALAM VOWEL SIGN U Matra [106]

56 0D42 ◌ൂ MALAYALAM VOWEL SIGN UU Matra [106]

57 0D43 ◌ൃ MALAYALAM VOWEL SIGN VOCALIC R Matra [106]

58 0D46 െ◌ MALAYALAM VOWEL SIGN E Matra [106]

59 0D47 േ◌ MALAYALAM VOWEL SIGN EE Matra [106]

60 0D48 ൈ◌ MALAYALAM VOWEL SIGN AI Matra [106]

61 0D4A െ◌ാ MALAYALAM VOWEL SIGN O Matra [106]

62 0D4B േ◌ാ MALAYALAM VOWEL SIGN OO Matra [106]

63 0D4D ◌് MALAYALAM SIGN VIRAMA Chandrakkala / Virama [106]

64 0D57 ◌ൗ MALAYALAM AU LENGTH MARK Matra [106]

65 0D7A ൺ MALAYALAM LETTER CHILLU NN Chillu Letters [106]

66 0D7B ൻ MALAYALAM LETTER CHILLU N Chillu Letters [106]

67 0D7C ർ MALAYALAM LETTER CHILLU RR Chillu Letters [106]

68 0D7D ൽ MALAYALAM LETTER CHILLU L Chillu Letters [106]

69 0D7E ൾ MALAYALAM LETTER CHILLU LL Chillu Letters [106]

70 0D7F ൿ MALAYALAM LETTER CHILLU K Chillu Letters [106]

Table 7: Malayalam Code Point Repertoire
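As a quick consistency check, Table 7 can be restated as inclusive code point ranges and compared against the character names in the Unicode Character Database. The sketch below uses Python's standard `unicodedata` module; the range list and the `EXCLUDED` set (which restates Table 8 in Section 5.4) are our own compact restatement for illustration, not an extract of the accompanying XML LGR.

```python
import unicodedata

# Table 7 restated as inclusive code point ranges (70 code points in total).
REPERTOIRE_RANGES = [
    (0x0D02, 0x0D03),  # anusvara, visarga
    (0x0D05, 0x0D0B),  # vowels A .. VOCALIC R
    (0x0D0E, 0x0D10),  # vowels E .. AI
    (0x0D12, 0x0D14),  # vowels O .. AU
    (0x0D15, 0x0D28),  # consonants KA .. NA
    (0x0D2A, 0x0D39),  # consonants PA .. HA (0D29 NNNA is excluded)
    (0x0D3E, 0x0D43),  # vowel signs AA .. VOCALIC R
    (0x0D46, 0x0D48),  # vowel signs E .. AI
    (0x0D4A, 0x0D4B),  # vowel signs O, OO
    (0x0D4D, 0x0D4D),  # virama (chandrakkala)
    (0x0D57, 0x0D57),  # AU length mark
    (0x0D7A, 0x0D7F),  # chillu letters
]
REPERTOIRE = [cp for lo, hi in REPERTOIRE_RANGES for cp in range(lo, hi + 1)]

# Table 8: code points deliberately left out of the repertoire.
EXCLUDED = {0x0D0C, 0x0D29, 0x0D44}

# Print the table rows with the same numbering as Table 7.
for number, cp in enumerate(REPERTOIRE, start=1):
    print(f"{number:2d} {cp:04X} {unicodedata.name(chr(cp))}")

assert len(REPERTOIRE) == 70 and not EXCLUDED & set(REPERTOIRE)
```

Row 63 of the printout is the virama and rows 65 to 70 are the chillu letters, matching the numbering used above.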

5.3 Code Point Sequences

The following sequences have been defined for the purpose of variant definitions and WLE rules (see Section 6.1 and Section 7).

1 U+0D28 U+0D4D U+0D31 ന ◌് റ [ന്റ]

MALAYALAM LETTER NA, MALAYALAM SIGN VIRAMA, MALAYALAM LETTER RRA

2 U+0D31 U+0D31 റ റ [ററ]

MALAYALAM LETTER RRA, MALAYALAM LETTER RRA

3 U+0D31 U+0D4D U+0D31 റ ◌് റ [റ്റ]

MALAYALAM LETTER RRA, MALAYALAM SIGN VIRAMA, MALAYALAM LETTER RRA

4 U+0D31 U+0D31 U+0D4D U+0D31 റ റ ◌് റ [ററ്റ]

MALAYALAM LETTER RRA, MALAYALAM LETTER RRA, MALAYALAM SIGN VIRAMA, MALAYALAM LETTER RRA

5 U+0D31 U+0D4D U+0D31 U+0D31 റ ◌് റ റ [റ്ററ]

MALAYALAM LETTER RRA, MALAYALAM SIGN VIRAMA, MALAYALAM LETTER RRA, MALAYALAM LETTER RRA

6 U+0D33 U+0D33 ള ള [ളള]

MALAYALAM LETTER LLA, MALAYALAM LETTER LLA

7 U+0D33 U+0D4D U+0D33 ള ◌് ള [ള്ള]

MALAYALAM LETTER LLA, MALAYALAM SIGN VIRAMA, MALAYALAM LETTER LLA

8 U+0D33 U+0D33 U+0D4D U+0D33 ള ള ◌് ള [ളള്ള]

MALAYALAM LETTER LLA, MALAYALAM LETTER LLA, MALAYALAM SIGN VIRAMA, MALAYALAM LETTER LLA

9 U+0D33 U+0D4D U+0D33 U+0D33 ള ◌് ള ള [ള്ളള]

MALAYALAM LETTER LLA, MALAYALAM SIGN VIRAMA, MALAYALAM LETTER LLA, MALAYALAM LETTER LLA

10 U+0D7B U+0D31 ൻ റ [ൻറ]

MALAYALAM LETTER CHILLU N, MALAYALAM LETTER RRA

Table 7a: Malayalam Code Point Sequences

5.4 Unicode Code Point Exclusion

The following code points are excluded because they are archaic or obsolete in current Malayalam orthography.

Sr. No. Unicode Code Point Glyph Character Name Category Reason

1. 0D0C ഌ MALAYALAM LETTER VOCALIC L Vowel ഌ (0D0C) is an obsolete vowel used to write Sanskrit words. The letter ഌ is very rare and is not considered part of modern Malayalam orthography.

2. 0D44 ◌ൄ MALAYALAM VOWEL SIGN VOCALIC RR Matra ◌ൄ (0D44) is the matra sign of the obsolete vowel VOCALIC RR ൠ (0D60), which is not among the approved code points in MSR-4. It is no longer used in Malayalam orthography.

3. 0D29 ഩ MALAYALAM LETTER NNNA Consonant ഩ (0D29) corresponds to Tamil ṉa ன. It is used rarely in scholarly texts to represent the alveolar nasal, as opposed to the dental nasal [108]. In ordinary texts both are represented by na ന (0D28).

Table 8: Malayalam Excluded Code Points

6. Variants

This section discusses the variant code points found in Malayalam, both within the script and with other related scripts.

6.1 In-script variants

This section lists sequences that should be considered variants of one another.

Set # Characters Code Points Glyph

1. a) ന + ◌് + റ 0D28 + 0D4D + 0D31 ന്റ (rendering varies by font)

b) ൻ + ◌് + റ 0D7B + 0D4D + 0D31

c) ൻ + റ 0D7B + 0D31 ൻറ

2. a) ള + ◌് + ള 0D33 + 0D4D + 0D33 ള്ള

b) ള + ള 0D33 + 0D33 ളള

3. a) റ + റ 0D31 + 0D31 ററ

b) റ + ◌് + റ 0D31 + 0D4D + 0D31 റ്റ (stacked or horizontal, depending on the font)

Table 9: In-script Variant Analysis

Set 1: These are various ways to write the conjunct "nta" in Malayalam. 1a) Here nta is encoded as the combination 0D28 + 0D4D + 0D31. Most Malayalam Unicode fonts render it with one conjunct shape, while a few of the Microsoft fonts render it with a different shape.

1b) is how some Microsoft fonts have encoded nta: 0D7B + 0D4D + 0D31. It renders one way in those fonts and another way in other fonts. However, as per Unicode (Standard Version 11.0.0, §12.9, page 506, Table 12-38), <chillu-n, virama, rra> is the prescribed sequence for the form {chillu-n base, rra below-base}. Because of this conflict with Unicode, the sequence 1b) should be disallowed. Although 1c) has also been used historically to write nta, and this sequential style of writing is still in use, the same combination is also used to write "nra" in words like ഹെൻറി (Henry) or എൻറിക്ക (Enrica) [112]. Hence the sequence 1c) is allowed. The variants in Set 1 therefore contain the remaining two variant sequences, 1a) and 1c), with disposition "blocked".

Set 2: The consonant ള (0D33) rarely follows another ള in Malayalam, except in the case of some place names. The double conjunct of ള (0D33), formed by the code points 0D33 + 0D4D + 0D33, is rendered as the glyph ള്ള, which looks visually very similar to a ള following another ള. This can result in spoofed labels. For example, in Malayalam we write "vellam" (meaning: water) as "വെള്ളം" - 0D35 0D46 0D33 0D4D 0D33 0D02; a spoofed label could write it as "വെളളം" - 0D35 0D46 0D33 0D33 0D02. This should be blocked.
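The effect of the blocked variants in Sets 1 and 2 can be illustrated by listing, for a given label, every label reachable by swapping a variant sequence for another member of its set; with a "blocked" disposition, allocating one of these labels makes the others unavailable. This is a minimal sketch under stated assumptions: `VARIANT_SETS` and `variant_labels` are hypothetical names, only Sets 1 and 2 are modelled, the label is scanned greedily from left to right, and the context restrictions and segmentation issues discussed in the rest of this section are deliberately ignored.

```python
from itertools import product

# Simplified variant sets from Section 6.1 (contexts V1/V2 are NOT applied).
VARIANT_SETS = [
    ("\u0D28\u0D4D\u0D31", "\u0D7B\u0D31"),  # Set 1: NA+VIRAMA+RRA / CHILLU N+RRA
    ("\u0D33\u0D4D\u0D33", "\u0D33\u0D33"),  # Set 2: LLA+VIRAMA+LLA / LLA+LLA
]

def variant_labels(label: str) -> set:
    """Return the label together with all labels reachable by variant substitution."""
    choices = []  # one tuple of alternatives per segment of the label
    i = 0
    while i < len(label):
        for group in VARIANT_SETS:
            # Within a set, the longer member is listed (and therefore tried) first.
            match = next((s for s in group if label.startswith(s, i)), None)
            if match is not None:
                choices.append(group)
                i += len(match)
                break
        else:
            choices.append((label[i],))  # code point outside any variant sequence
            i += 1
    return {"".join(parts) for parts in product(*choices)}

# "vellam" (water) and its spoofed spelling from Set 2.
for variant in sorted(variant_labels("\u0D35\u0D46\u0D33\u0D4D\u0D33\u0D02")):
    print(variant, " ".join(f"{ord(c):04X}" for c in variant))
```

For വെള്ളം this yields exactly the two labels of the Set 2 example; for എൻറി it yields എൻറി and എന്റി, the Set 1 pair.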

However, this pattern gives rise to some complications, because it effectively makes the Halant (0D4D) a variant of a "null position", in this case whenever it occurs between two instances of 0D33 ള LLA. Variant definitions of that nature can lead to unexpected results, because the label 0D33 0D4D 0D33 0D4D 0D33 can be analyzed in two ways:

{0D33 0D4D 0D33}{0D4D}{0D33} and {0D33}{0D4D}{0D33 0D4D 0D33}
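The ambiguity can be reproduced by enumerating every way of splitting a label into single code points and the ള sequences defined in Section 5.3 (items 6 to 9). The recursive sketch below only illustrates the problem; it is not how an LGR tool evaluates labels, and the helper names `segmentations` and `show` are hypothetical. It prints the segmentations of 0D33 0D4D 0D33 0D4D 0D33 that use at least one multi-code-point sequence.

```python
LLA, VIRAMA = "\u0D33", "\u0D4D"

# Table 7a, sequences 6-9 (the LLA sequences).
LLA_SEQUENCES = (
    LLA + LLA,
    LLA + VIRAMA + LLA,
    LLA + LLA + VIRAMA + LLA,
    LLA + VIRAMA + LLA + LLA,
)

def segmentations(label: str) -> list:
    """Every split of `label` into single code points and defined sequences."""
    if not label:
        return [[]]
    results = []
    # A single code point is always a possible segment.
    for rest in segmentations(label[1:]):
        results.append([label[0]] + rest)
    # So is any defined sequence that starts at this position.
    for seq in LLA_SEQUENCES:
        if label.startswith(seq):
            for rest in segmentations(label[len(seq):]):
                results.append([seq] + rest)
    return results

def show(segments: list) -> str:
    """Format a segmentation as {XXXX XXXX}{XXXX}..."""
    return "".join("{" + " ".join(f"{ord(c):04X}" for c in seg) + "}" for seg in segments)

label = LLA + VIRAMA + LLA + VIRAMA + LLA
for segs in segmentations(label):
    if any(len(seg) > 1 for seg in segs):
        print(show(segs))
# {0D33}{0D4D}{0D33 0D4D 0D33}
# {0D33 0D4D 0D33}{0D4D}{0D33}
```

Exactly the two analyses given above come out, and the two can never be combined, because the defined sequences would overlap on the middle ള.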

NBGP takes into account the data provided by the IP on the occurrence of these sequences in labels where a consonant ള (0D33) follows another ള: the IP found that the frequency is small.

However, community feedback shows increasing usage due to words borrowed from foreign languages. The detailed analysis and supporting data can be found in Appendix C.


Therefore, NBGP has decided to define a rule (Rule 7 in Section 7). The sequences U+0D33 U+0D33 (ളള) / U+0D33 U+0D4D U+0D33 (ള്ള) and U+0D33 U+0D33 U+0D4D U+0D33 (ളള്ള) / U+0D33 U+0D4D U+0D33 U+0D33 (ള്ളള) have been defined as variant pairs. However, these sequences and variants are further constrained by context rules on both sequences and variants. To make the "null" variant well-behaved, none of the sequences, nor U+0D33 (ള) itself, may be followed by a further U+0D33. That limits all occurrences of U+0D33 to singletons or explicitly enumerated sequences. At the same time, the variant mappings are not defined if a sequence follows U+0D33 U+0D4D or is followed by U+0D4D U+0D33; in other words, if it is part of a longer sequence of 0D33 (ള) joined by Halant.

If a reordrant matra follows a sequence, it would graphically intervene, so the sequences would no longer be variants. The reordrant matras are U+0D46 (െ), U+0D47 (േ), U+0D48 (ൈ), U+0D4A (ൊ) and U+0D4B (ോ), together with the sequence U+0D4D (്) U+0D30 (ര). Therefore, the variants are also not defined if a sequence is followed by a reordrant matra. These two context rules are combined into a single context on the variant mapping:

V1: A variant preceded by 0D33 + Halant, or followed by 0D33, R, or Halant + 0D33, is not defined.

The sequence U+0D4D (◌്) U+0D30 (ര) is not required in the normative part of the proposal, as it does not create any confusing label; restricting it would amount only to a spelling rule.

Set 3: The case of റ്റ is similar to ള്ള. A font that does not stack റ + ◌് + റ can render it in horizontal format, so a word like മീററർ can be spoofed by applying a virama between the last two റ. It is rare to see a font that does not stack റ്റ, but instead of depending on that weak assumption, sequences and variants have been defined in an entirely analogous manner to U+0D33, with the variant context:

V2: A variant preceded by 0D31 + Halant, or followed by 0D31, R, or Halant + 0D31, is not defined. (This is also mentioned in the Appendix as community feedback.)
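The contexts V1 and V2 differ only in the base letter, so both can be expressed as one check on the code points around a matched sequence. The sketch below is illustrative only: `variant_defined` is a hypothetical helper, positions are Python string indices, and the set R is assumed to hold just the five reordrant vowel signs, following the statement above that U+0D4D U+0D30 is not needed in the normative part.

```python
VIRAMA = "\u0D4D"
LLA, RRA = "\u0D33", "\u0D31"

# The set R, restricted to the five reordrant vowel signs (assumption:
# U+0D4D U+0D30 is left out because it is not needed normatively).
REORDRANT = {"\u0D46", "\u0D47", "\u0D48", "\u0D4A", "\u0D4B"}

def variant_defined(label: str, start: int, end: int, base: str) -> bool:
    """Apply the V1 (base=LLA) or V2 (base=RRA) context to label[start:end]."""
    before, after = label[:start], label[end:]
    if before.endswith(base + VIRAMA):
        return False  # preceded by base + Halant
    if after.startswith(base) or after.startswith(VIRAMA + base):
        return False  # followed by base, or by Halant + base
    if after[:1] in REORDRANT:
        return False  # followed by a reordrant matra (R)
    return True

vellam = "\u0D35\u0D46\u0D33\u0D4D\u0D33\u0D02"  # വെള്ളം
print(variant_defined(vellam, 2, 5, LLA))         # True: ള്ള in an ordinary word
lle = "\u0D33\u0D33\u0D46"                        # ളളെ
print(variant_defined(lle, 0, 2, LLA))            # False: ളള followed by െ
```

Passing the base letter as a parameter keeps V1 and V2 "entirely analogous", as the text requires, instead of duplicating the logic for ള and റ.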

6.2 Cross-Script Variants

The Malayalam characters in the tables below are considered variant code points of some characters in Tamil, Oriya and Myanmar, as users could perceive them as visually the same. See Appendix B for additional code points from other scripts which are visually similar but are not considered variant code points, for the reasons listed there.


6.2.1 Cross-script variants for Tamil and Malayalam

Variant Set Tamil Malayalam

CP Glyph CP Glyph

1. 0B9C ஜ 0D1C ജ

2. 0BB5 வ 0D16 ഖ

3. 0BAE ம 0D25 ഥ

4. 0BBF ◌ி 0D3F ◌ി

5. 0BC6 ெ◌ 0D46 െ◌

6. 0BC7 ே◌ 0D47 േ◌

Table 10: Tamil–Malayalam Cross Script Variants

6.2.2 Cross-script variants for Oriya and Malayalam

Case of the Malayalam and Odia (Oriya) TTHA consonant:

This is the case of the consonant TTHA, which happened to retain the same shape despite being part of different scripts, i.e., Malayalam and Odia. These characters are:

ഠ - MALAYALAM LETTER TTHA (U+0D20)

ଠ - ORIYA LETTER TTHA (U+0B20)

Both characters look exactly alike and belong to the "Consonant" category. As they are consonants, each of them, even in the simplest form, i.e., the character on its own, is a valid label. As per the NBGP cross-script variant inclusion policy (Appendix D), this is a valid case for inclusion.

Also, even though they are single characters, repeating the same character can theoretically form an infinite² number of cross-script variant labels between the scripts involved. Here are samples of some of those labels:

Malayalam Oriya

ഠഠഠ U+0D20 U+0D20 U+0D20

ଠଠଠ U+0B20 U+0B20 U+0B20

ഠഠഠഠ U+0D20 U+0D20 U+0D20 U+0D20

ଠଠଠଠ U+0B20 U+0B20 U+0B20 U+0B20

ഠഠഠഠഠ U+0D20 U+0D20 U+0D20 U+0D20 U+0D20

ଠଠଠଠଠ U+0B20 U+0B20 U+0B20 U+0B20 U+0B20

² Though theoretically infinite, this number is limited to the labels whose equivalent Punycode string does not exceed 63 characters, including the ACE prefix "xn--".
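The bound mentioned in footnote 2 can be computed directly. The sketch below relies on Python's built-in `punycode` codec (RFC 3492) and adds the "xn--" prefix itself. It assumes the input is already a valid, NFC-normalized label, and it does not perform the full IDNA2008 validity checks. The helper names `to_a_label` and `max_repetitions` are hypothetical.

```python
def to_a_label(u_label: str) -> str:
    """Prefix the RFC 3492 Punycode form with the ACE prefix "xn--".

    Assumption: the input is already a valid, NFC-normalized U-label;
    full IDNA2008 validity checks are not performed here.
    """
    return "xn--" + u_label.encode("punycode").decode("ascii")

def max_repetitions(char: str, limit: int = 63) -> int:
    """Largest n such that the A-label of char * n fits in `limit` characters."""
    n = 0
    while len(to_a_label(char * (n + 1))) <= limit:
        n += 1
    return n

for tth in ("\u0D20", "\u0B20"):  # MALAYALAM / ORIYA LETTER TTHA
    n = max_repetitions(tth)
    print(f"U+{ord(tth):04X}: up to {n} repetitions -> {to_a_label(tth * n)}")
```

The loop terminates because each additional repetition adds at least one character to the Punycode string, so the A-label length grows strictly with the number of repetitions.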

Since having such labels is a realistic possibility and the corresponding labels look almost exactly alike, NBGP has proposed them (together with similar combining marks) as blocked variants.

Variant Set Oriya Malayalam

CP Glyph CP Glyph

1. 0B20 ଠ 0D20 ഠ

Table 11: Oriya–Malayalam Cross Script Variants
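A whole-label cross-script variant exists only when every code point of a label has a variant in the other script, which is why repeated TTHA labels pair up indefinitely while most Malayalam labels have no Tamil or Oriya counterpart. The sketch below is a simplification: it treats Tables 10 and 11 as plain dictionaries and ignores dispositions and the WLE rules of the other script. The names `TO_TAMIL`, `TO_ORIYA` and `cross_script_variant` are hypothetical.

```python
# Cross-script variant pairs, Malayalam -> other script (Tables 10 and 11).
TO_TAMIL = {
    "\u0D1C": "\u0B9C",  # ജ -> ஜ
    "\u0D16": "\u0BB5",  # ഖ -> வ
    "\u0D25": "\u0BAE",  # ഥ -> ம
    "\u0D3F": "\u0BBF",  # ◌ി -> ◌ி
    "\u0D46": "\u0BC6",  # െ◌ -> ெ◌
    "\u0D47": "\u0BC7",  # േ◌ -> ே◌
}
TO_ORIYA = {"\u0D20": "\u0B20"}  # ഠ -> ଠ

def cross_script_variant(label: str, table: dict):
    """Map every code point through `table`; None if some code point has no variant."""
    mapped = [table.get(ch) for ch in label]
    if None in mapped:
        return None
    return "".join(mapped)

print(cross_script_variant("\u0D20" * 3, TO_ORIYA))   # ഠഠഠ -> ଠଠଠ
print(cross_script_variant("\u0D1C\u0D3F", TO_TAMIL))  # ജി -> ஜி
print(cross_script_variant("\u0D2E\u0D32", TO_TAMIL))  # മല -> None
```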

6.2.3 Cross-script variants for Myanmar and Malayalam

Variant Set Myanmar Malayalam

CP Glyph CP Glyph

1. 1002 ဂ 0D31 റ

2. 101D ဝ 0D20 ഠ

7. Whole Label Evaluation (WLE) Rules

This section provides the WLE rules that are required by all the languages mentioned in Section 4 when written in Malayalam script. The rules have been drafted in such a way that they can be easily translated into the LGR specification. Below are the symbols used in the WLE rules for each "Indic Syllabic Category", as listed in the code point repertoire table in Section 5.

7.1.1 Variables or definitions

V → Vowel

M → Matra (Vowel Sign)

C → Consonant

L → Chillu

H → Chandrakkala/Halant/Virama (◌് U+0D4D)

B → Anusvaram (◌ം U+0D02)

X → Visargam (◌ഃ U+0D03)

R → Reordrant Matra. "R" is used in variant contexts; see Section 6.1 for details.

7.1.2 Rules for Forming Aksharam

Rule 1: H must be preceded by C or by the M ◌ു (0D41)

Rule 2: M must be preceded by C

Rule 3: B must be preceded by C, V or M

Rule 4: X must be preceded by C, V or M

Rule 5: L cannot be preceded by B, X or H

Rule 6: Label does not begin with L

Rule 7: The character ള (0D33) cannot immediately follow ള (0D33), except as part of a defined sequence

Rule 8: The character റ (0D31) cannot immediately follow റ (0D31), except as part of a defined sequence
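Rules 1 to 8 are simple enough to check with a short script. The sketch below is an illustration only and the normative form remains the accompanying XML LGR. It assumes the category assignments of Section 7.1.1 and Table 7. Rules 7 and 8 are implemented under our own reading of Section 6.1: a doubled ള or റ is accepted only if the label can be split into single code points and defined sequences so that no split point falls between the two letters. The variant contexts V1 and V2 are not evaluated. All function names are hypothetical.

```python
LLA, RRA, VIRAMA, SIGN_U = "\u0D33", "\u0D31", "\u0D4D", "\u0D41"

def _chars(*spans):
    """Set of characters for inclusive code point ranges."""
    return {chr(cp) for lo, hi in spans for cp in range(lo, hi + 1)}

# Categories of Section 7.1.1, built from the Table 7 repertoire.
CATEGORIES = {
    "V": _chars((0x0D05, 0x0D0B), (0x0D0E, 0x0D10), (0x0D12, 0x0D14)),
    "C": _chars((0x0D15, 0x0D28), (0x0D2A, 0x0D39)),
    "M": _chars((0x0D3E, 0x0D43), (0x0D46, 0x0D48), (0x0D4A, 0x0D4B), (0x0D57, 0x0D57)),
    "H": {VIRAMA},
    "B": {"\u0D02"},
    "X": {"\u0D03"},
    "L": _chars((0x0D7A, 0x0D7F)),
}
CATEGORY_OF = {ch: cat for cat, chars in CATEGORIES.items() for ch in chars}

# Table 7a: sequences 2-5 (RRA) and 6-9 (LLA).
RRA_SEQUENCES = (RRA + RRA, RRA + VIRAMA + RRA, RRA + RRA + VIRAMA + RRA, RRA + VIRAMA + RRA + RRA)
LLA_SEQUENCES = (LLA + LLA, LLA + VIRAMA + LLA, LLA + LLA + VIRAMA + LLA, LLA + VIRAMA + LLA + LLA)

def _doubles_covered(label: str, base: str, sequences) -> bool:
    """Rules 7/8 (our reading): some split of the label into single code points and
    defined sequences has no boundary between two adjacent `base` letters."""
    n = len(label)
    reachable = [True] + [False] * n  # reachable[i]: a valid split ends at index i
    for i in range(n):
        if not reachable[i]:
            continue
        for piece in [label[i]] + [s for s in sequences if label.startswith(s, i)]:
            j = i + len(piece)
            if j < n and label[j - 1] == base and label[j] == base:
                continue  # this boundary would separate two `base` letters
            reachable[j] = True
    return reachable[n]

def wle_violations(label: str) -> list:
    """Return human-readable violations of Rules 1-8 (an empty list means the label passes)."""
    problems = []
    if label and CATEGORY_OF.get(label[0]) == "L":
        problems.append("Rule 6: label begins with a chillu")
    for i, ch in enumerate(label):
        cat = CATEGORY_OF.get(ch)
        prev = label[i - 1] if i else ""
        prev_cat = CATEGORY_OF.get(prev)
        if cat is None:
            problems.append(f"U+{ord(ch):04X} is not in the repertoire")
        elif cat == "H" and prev_cat != "C" and prev != SIGN_U:
            problems.append(f"Rule 1 at position {i}")
        elif cat == "M" and prev_cat != "C":
            problems.append(f"Rule 2 at position {i}")
        elif cat == "B" and prev_cat not in ("C", "V", "M"):
            problems.append(f"Rule 3 at position {i}")
        elif cat == "X" and prev_cat not in ("C", "V", "M"):
            problems.append(f"Rule 4 at position {i}")
        elif cat == "L" and prev_cat in ("B", "X", "H"):
            problems.append(f"Rule 5 at position {i}")
    if not _doubles_covered(label, LLA, LLA_SEQUENCES):
        problems.append("Rule 7: LLA follows LLA outside a defined sequence")
    if not _doubles_covered(label, RRA, RRA_SEQUENCES):
        problems.append("Rule 8: RRA follows RRA outside a defined sequence")
    return problems

print(wle_violations("\u0D2E\u0D32\u0D2F\u0D3E\u0D33\u0D02"))  # മലയാളം -> []
print(wle_violations("\u0D33\u0D33\u0D33"))                    # ളളള -> Rule 7
```

Using one dynamic-programming pass for Rules 7 and 8 avoids enumerating all segmentations explicitly, which the "null position" discussion in Section 6.1 shows can be ambiguous.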

8. Contributors

Neo-Brahmi Generation Panel (NBGP)

Veena Solomon

Prasad Pattarumadom Kesava Kurup

Santhosh Thottingal

Anivar Aravind

Jijo Pappachan

9. References

[MSR] Integration Panel, "Maximal Starting Repertoire — MSR-4 Overview and Rationale", 7 February 2019, https://www.icann.org/en/system/files/files/msr-4-overview-25jan19-en.pdf (Accessed on 18th February, 2019)

[EGIDS] Expanded Graded Intergenerational Disruption Scale, https://www.ethnologue.com/about/language-status (Accessed on 5th July, 2018)

[101] Unicode® Standard Annex #31, Mark Davis, "Unicode Identifier and Pattern Syntax": 2.3 Layout and Format Control Characters, http://unicode.org/reports/tr31/#Layout_and_Format_Control_Characters (Accessed on 5th July, 2018)

[102] "Report on Malayalam Unicode Issues" (2012), prepared by Santhosh Thottingal (also part of NBGP) and submitted to Unicode via the Wikimedia Foundation. It discusses both chillu and nta issues: http://thottingal.in/documents/ReportonMalayalamUnicodeIssues.pdf (Accessed on 5th July, 2018)

[103] ഓളം Dictionary, https://olam.in/ (Accessed on 5th July, 2018)

[104] Roozbeh Pournader and Cibu Johny, "Old and New Chillus in Malayalam and implications for Sinhala", http://www.unicode.org/L2/L2013/13036-chillus-uptake.pdf (Accessed on 5th July, 2018)

[105] Wikipedia, "Malayalam script", https://en.wikipedia.org/wiki/Malayalam_script (Accessed on 5th July, 2018)

[106] Omniglot, "Malayalam (മലയാളം)", https://www.omniglot.com/writing/malayalam.htm (Accessed on 5th July, 2018)

[107] The Unicode Standard, Version 10.0, Chapter 12, "South and Central Asia-I: Official Scripts of India", https://www.unicode.org/versions/Unicode10.0.0/ch12.pdf#page=65 (Accessed on 5th July, 2018)

[108] Everson, Michael (2007). "Proposal to add two characters for Malayalam to the BMP of the UCS" (PDF). ISO/IEC JTC1/SC2/WG2 N3494. Retrieved 2009-09-09: http://std.dkuug.dk/jtc1/sc2/wg2/docs/n3494.pdf (Accessed on 5th July, 2018)

[109] Alejandro Gutman and Beatriz Avanzati, "Malayalam, The Language Gulper", http://www.languagesgulper.com/eng/Malayalam.html (Accessed on 5th July, 2018)

[110] Malayalam Range: 0D00–0D7F, The Unicode Standard, Version 11.0, https://unicode.org/charts/PDF/U0D00.pdf (Accessed on 5th July, 2018)

[111] R. Chitrajakumar, N. Gangadharan, Rachana Akshara Vedi, "Samvruthokaram and Chandrakkala", https://www.unicode.org/L2/L2005/05213-samvruktokaram.pdf (Accessed on 2nd August, 2018)

[112] Santhosh Thottingal, "ന്റ - ഭാഷ, യുണിക്കോഡ്, ചിത്രീകരണം", https://blog.smc.org.in/nta-rendering-rules/ (Accessed on 2nd August, 2018)

[113] R. Chitrajakumar, N. Gangadharan, Rachana Akshara Vedi, "Chillaksharam of Malayalam Language", https://unicode.org/L2/L2005/05214-chillu.pdf (Accessed on 27th August, 2018)

[114] Public comment feedback for Malayalam, Tamil Script LGR Proposals, https://docs.google.com/document/d/1Am1qJXSYPpuUifcfUWT01uwCV-LCAe3XgBsnJvM5tHs/edit#heading=h.1k12tx1767k9 (Accessed on 18th February, 2019)


10. Appendix A: Excluded In-Script Variants

As the following formations are not valid as per the Aksharam formation rules, these cases are not proposed as variants.

1. ഈ 0D08 ഈ

ഇ + ◌ൗ 0D07 + 0D57 ഇ◌ൗ

2. ഊ 0D0A ഊ

ഉ + ◌ൗ 0D09 + 0D57 ഉ◌ൗ

3. ഔ 0D14 ഔ

ഒ + ◌ൗ 0D12 + 0D57 ഒ◌ൗ

4. ഓ 0D13 ഓ

ഒ + ◌ാ 0D12 + 0D3E ഒ◌ാ

5. ഐ 0D10 ഐ

എ + െ◌ 0D0E + 0D46 എെ◌

Table A-1: Excluded In-Script Variants Due to Invalid Combination

In Table A-2, Column 1: These vowel signs have glyph pieces which stand on both sides of the consonant; they follow the consonant in logical order and should be handled as a unit for most processing. Column 2: Although Unicode defines this canonical decomposition, the Standard recommends not using the sequence ([107], p. 501). Therefore, it is not advisable to use these sequences in IDN labels; they are blocked here by the akshara formation rules.

Code Point 1 + Glyph 1 Code Point 2 + Glyph 2

െ◌ാ (0D4A) െ◌ (0D46) + ◌ാ (0D3E)

േ◌ാ(0D4B) േ◌ (0D47) + ◌ാ (0D3E)

◌ൗ (0D57) െ◌ (0D46) + ◌ൗ (0D57)

Table A-2: Split Vowel Case
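The canonical decompositions in Table A-2 can be inspected with Python's standard `unicodedata` module. The sketch below illustrates one consequence: NFC recomposes the two-part spelling. IDNA2008 (RFC 5891) requires U-labels to be in NFC, so a label typed with the decomposed sequence would not be a valid U-label as typed. The example word is chosen for illustration only.

```python
import unicodedata

# Table A-2: two-part vowel signs and their canonical decompositions.
for sign in ("\u0D4A", "\u0D4B"):
    decomposed = unicodedata.normalize("NFD", sign)
    print(f"{ord(sign):04X} -> {' + '.join(f'{ord(c):04X}' for c in decomposed)}")
# 0D4A -> 0D46 + 0D3E
# 0D4B -> 0D47 + 0D3E

# Under NFC the decomposed spelling is recomposed, so KA + E + AA typed as
# three code points does not survive normalization unchanged.
typed = "\u0D15\u0D46\u0D3E"  # ക + െ + ാ
print(unicodedata.normalize("NFC", typed) == "\u0D15\u0D4A")  # True
```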

11. Appendix B: Confusable Code Points

The code points below are visually confusing only in smaller fonts and can be excluded from consideration as variant code points.

Tamil Malayalam

ஸ (0BB8) സ (0D38)

Table B-1: Tamil-Malayalam Confusable Code Points

Oriya Malayalam

◌ଂ (0B02) ◌ം (0D02)

◌ଃ (0B03) ◌ഃ (0D03)

Table B-2: Oriya-Malayalam Confusable Code Points

At the Sri Lanka face-to-face meeting, it was decided to exclude the code points below from the variant list, as these do not look alike due to round/square structural differences.

Kannada Malayalam

ಲ (0CB2) ല (0D32)

Table B-3: Kannada-Malayalam Confusable Code Points

Telugu Malayalam

ల (0C32) ല (0D32)

Table B-4: Telugu-Malayalam Confusable Code Points

As per the comment received from the Myanmar GP, and on close examination, the following code points are considered confusable with Malayalam.

Myanmar Malayalam

က (1000) ന (0D28)

ယ (101A) ധ (0D27)

ကာ (1000 + 102C) ന്ന (0D28 + 0D4D + 0D28)

Table B-5: Myanmar-Malayalam Confusable Code Points

Code points in Tables B-6, B-7 and B-8 would qualify as cross-script code point variants, but there are not enough of them to form variant labels; therefore these cases can be excluded. (If only combining marks are variants for a given script, no label can be formed without using at least one non-variant code point.) In the case of Sinhala, the relevant base character is distinct.

Kannada Malayalam

◌ಂ (0C82) ◌ം (0D02)

◌ಃ (0C83) ◌ഃ (0D03)

Table B-6: Kannada-Malayalam Too Few Identical Code Points

Telugu Malayalam

◌ం (0C02) ◌ം (0D02)

◌ః (0C03) ◌ഃ (0D03)

Table B-7: Telugu-Malayalam Too Few Identical Code Points

Sinhala Malayalam

◌ං (0D82) ◌ം (0D02)

◌ඃ (0D83) ◌ഃ (0D03)

Table B-8: Sinhala-Malayalam Too Few Identical Code Points

NBGP also considers that 0D1F (ട) MALAYALAM LETTER TTA is similar to 0073 (s) LATIN SMALL LETTER S and 0455 (ѕ) CYRILLIC SMALL LETTER DZE. However, the Latin and Cyrillic scripts are not derived from the Brahmi script, so this case is outside the scope of the NBGP cross-script variant analysis.

12. Appendix C: Case of ള (0D33) + ള (0D33)

This appendix contains copies of all input related to the case of ള (0D33) + ള (0D33). For the adopted solution, see Section 6.1.

The consonant ള (0D33) rarely follows another ള in Malayalam, except in the case of some place names. The double conjunct of ള (0D33) formed by the code points 0D33 + 0D4D + 0D33 is rendered as the glyph ള്ള, which looks visually very similar to a ള following another ള. This can result in spoofed labels. For example, in Malayalam we write "vellam" as "വെള്ളം" - 0D35 0D46 0D33 0D4D 0D33 0D02 (meaning: water); a spoofed label can write it as "വെളളം" - 0D35 0D46 0D33 0D33 0D02.

Combination Code points Glyph

ള + ◌് + ള 0D33 + 0D4D + 0D33 ള്ള

ള + ള 0D33 + 0D33 ളള

Table C-1: Case of ള (0D33) + ള (0D33)

This has been restricted by WLE Rule 7. It allows the combination "ള്ളള" (0D33 0D4D 0D33 0D33), which is present in words like "ഉള്ളളവ്" (meaning: inner dimension, viz. volume), and blocks the combination "ളള്ള" (0D33 0D33 0D4D 0D33), which is rarely found in usage. The existence of "ളള" (0D33 0D33) in a considerable percentage of web text can be attributed to misspelling due to the extreme visual similarity.

===================================================================

Proposed recommendation from the Integration Panel

===================================================================

Proposed recommendation for Malayalam, DATE: 2018-06-12

Overview

The IP recently discovered a technical issue with the proposed variants for Malayalam.

Issue Statement

The Malayalam LGR defines the following variant:

0D33 0D33 <-> 0D33 0D4D 0D33 (i.e.: ളള <--> ള്ള)

This pattern gives rise to some complications, because it effectively makes the Halant (0D4D) a variant of a "null position", in this case whenever it occurs between two instances of 0D33 ള LLA. Variant definitions of that nature can lead to unexpected results, because the label 0D33 0D4D 0D33 0D4D 0D33 can be analyzed in two ways:

{0D33 0D4D 0D33}{0D4D}{0D33} and

{0D33}{0D4D}{0D33 0D4D 0D33}

As a result, variant definitions of this nature, although seemingly well-defined at the code point level, can lead to unexpected variant relations among labels.

Therefore, such variant sequence definitions cannot be used without some further restriction. Below, the IP suggests two possible approaches and requests that the GP consider them in light of its knowledge of how the script is used.

Background:

Looking at the Malayalam sample file, the IP notes:

0D33 0D33 ളള exists once (1) in a sample of 60K labels (it is part of the longer pattern 0D33 0D4D 0D33 0D33, or ള്ളള)

0D33 0D33 0D33 (ളളള) exists (0) times

0D33 0D4D 0D33 (ള്ള) exists 523 times, or 0.9% of the total; of these:

● 1/10, or 52, are followed by 0D4D (Halant): 0D33 0D4D 0D33 0D4D (ള്ള്)

● none (0) are of the pattern 0D33 0D4D 0D33 0D4D 0D33 (or longer)

From this one can conclude:

● ള്ള is quite frequent and can be spoofed by ളള (which doesn't occur normally, or at least not frequently)

● ള്ള് also occurs with some frequency and could be spoofed by ളള് (the latter again not seen in the sample)

● ള്ളള does occur, if rarely, and can be spoofed by ളള്ള or ളളള, but not by ള്ള്ള (where the code points are: 0D33 0D4D 0D33 0D33, 0D33 0D33 0D4D 0D33 and 0D33 0D4D 0D33 0D4D 0D33)

Under the definition in the proposed LGR, ള്ളള and ളള്ള are not actually variant labels of each other, while ള്ള്ള is a variant of ള്ളള even though it shouldn't be. (The last label shouldn't be a variant label because the second halant would be rendered visibly, making it distinct.)

Longer patterns are either rare or do not occur in the standard sample; at least some of them seem quite likely to be nonsensical. Therefore, the cases seen so far would appear to be the total set of cases where there is a practical need for some variants or other restriction.

Options

The IP identified two suggested options to resolve the issue.

Option One

Restricting the variant so it cannot occur following an 0D33 ള or Halant.

If the variant can be limited to the beginning of a cluster, that is, if a requirement is added that it only applies when not following an 0D33 or 0D4D, then we can still take care of the most frequent and second most frequent cases, and these cases produce variant labels that are related in expected ways: longer strings of alternating 0D33 and 0D4D pose no problems, as any alternate grouping of code points into sequences does not lead to any additional variants. Only the leading {0D33 0D33} or {0D33 0D4D 0D33} would cause variants. In particular, ള്ള്ള (with a visible Halant) would not become a variant of ള്ളള, etc. However, cases like ള്ളള / ളള്ള / ളളള would still not fully work as intended, as the first and second labels would not be variants of each other, and only the first would be a variant of the last.

Option Two

Restricting valid labels to exclude ളള

Restricting labels from containing two 0D33 ള that are not joined by a Halant would robustly prevent any spoofing. However, it would also disallow a small number of potentially meaningful labels (about 0.0015% of the words in the test file are affected, or 1 in 60K). No variant definition would be needed.

Recommendation

The IP requests the NBGP to study these options and to consider them in determining a proposed approach to fixing the issue with the kind of variant mapping mentioned at the head of the document.

We realize that these represent a trade-off. For the Root Zone we feel comfortable that restricting the allowed labels to avoid some problem cases is definitely appropriate, even if the process contains a String Review phase that would allow the manual weeding out of specific bad cases.

However, we feel that an option that leaves some, if rare, opportunities for spoofing may well be inappropriate for the second and other levels as well: for those levels, human oversight of the process is going to be even less available.

The IP suggests that the GP also weigh the extent to which decisions for the Root Zone affect other zones (by example).

===================================================================

Feedback from community

===================================================================

നീളള്ളമുടി, neelallamudi, is how people say നീളമുള്ളമുടി, neelamullamudi [meaning: long hair, lit. hair with length], locally in the Valluvanad area of North Kerala. Similarly,

നല്ലതാളള്ളപാട്ട്, nalla thaalalla paattu, is the same as നല്ലതാളമുള്ളപാട്ട്, nalla thaalamulla paattu [meaning: (a) song with good rhythm]

വെള്ളള്ളകിണറ്, vellalla kinaru, is വെള്ളമുള്ളകിണറ്, vellamulla kinaru [meaning: (a) well with water]. This label is not blocked because ള്ളള is allowed.

I don't think these need to be considered, as the ളള്ള part in these labels is a spoken contraction of ഉള്ള, ulla [meaning: having, with].

In other parts of Kerala, the spoken dialect changes the contraction to "ളൊള്ള" or "ളോള്ള", which are allowed as per the rule.

Then there are some place names like മാളള്ള. On doing a Google search, I got only a single result [google.co.in].


Feedback from the community:

I wouldn't recommend adding such rules based on the existence of current (and popular) vocabulary of 2018. Malayalam has an active practice of borrowing words from other languages (mainly from English nowadays) rather than inventing native words. Because of this, anything that is a valid conjunct can come into the language. Here is an example: as you may know, I am a typeface designer too. When some of our initial fonts did not have the OpenType rules to handle സ് + ബ and സ് + ബു, it was because nobody could find a word that could have such a combination. Later, around 2010, Facebook became a thing, and people started writing it in Malayalam. Our fonts could not handle the rendering gracefully, so we added the required ligatures and rules and released a new version. While I was working on another typeface, another conjunct, ജ് + മ, was not supported, on the thinking that there is no Malayalam word with ജ്മ. But later a friend came and complained that he wanted an error-free rendering for അജ്മീർ. So much for the 'reasoning of rare occurrence in Malayalam'. By the way, there are people and places with the name മാളള്ള (Malalla); try a Google search. We people from the Valluvanad area often use forms like നല്ല നീളള്ളമുടി, നല്ല താളള്ളപാട്ട്, വെള്ളള്ളകിണറ്...

A Google search for വെള്ളള്ള shows me that it is a place name in Idukki.

About the visual similarity: again, as type designers, we consciously make them visually different while designing. ള് + ള -> ള്ള appears very joined, with the tails fused together, while ളള appears with enough spacing between the letters and no fusing of tails.

Also, ററ is a similar case, where people write two Ra together to get /tta/. Almost all fonts nowadays stack them if it is for /tta/, but this is not guaranteed, so similar arguments can be made for that as well.

Misspellings like മീററർ, ലാററൈററ് etc. come to mind.

In all these cases, exclusion rules would be the least preferred choice.

രണ്ട് ള അടുപ്പിച്ചു വരുമ്പോൾ അത് ള്ള യുടെ വേരിയന്റായി

കണക്കാക്കാമെന്നായിരുന്നു പറഞ്ഞിരുന്നത്. തിരിച്ചും.

പക്ഷേ രണ്ട് ളകൾക്ക് ശേഷം ഒരു െ ചിഹ്നം വന്നാൽ അത് ളളെ എന്നാവും. അത്

ള്ളയുമായി ഒരു തരത്തിലും സാദൃശ്യമില്ലാത്തതുമാണ്. ളളെ എന്ന സീക്വൻസിനെ

ള്ളെ എന്നെ സ്വീക്വൻസിന്റെ വേരിയന്റായി കണക്കാക്കുന്നതായിരുന്നു


നേരെത്തെയുള്ള പ്രൊപ്പോസൽ. അത് അനാവശ്യമായ നിയന്ത്രണമാണെന്നാണ്

കാണുന്നത്. അതിനാണ് പുതിയ ഒരു തിരുത്തൽ.

പ്രധാനമായും ള്ള , രണ്ട് ളയുടെ വാരിയന്റാവണമെങ്കിൽ അതിനു ശേഷം െ ചിഹ്നം

പാടില്ല, എന്ന ഒരു constraint കൂടി വെച്ച് ളളെ എന്ന സീക്വൻസ്

പ്രശ്നമൊന്നുമില്ലാതെ ലേബലിൽ അനുവദിക്കാനാണ് പുതിയറൂളുകൾ

വഴിയൊരുക്കുന്നത്. പ്രശ്നമൊന്നും കാണുന്നില്ല.

െ യ്ക്കു പുറമേ, േ, ോ, ൊ, എന്നിവയ്ക്കും ഇതേ സ്വഭാവമുണ്ട് - reordering.

ളയുടെ അതേ നിയമങ്ങൾ റ്റ യുടെ കേസിലും വരും.

ള + ള്ര എന്ന ഒരു സീക്വൻസ് പക്ഷേ ഈ ഡോഖ്യുമെന്റിൽ പരമാർശിച്ചിട്ടില്ല.

റീഓർഡറിങ്ങ് വരുന്ന ഒരു കേസാണത് - സ്വരചിഹ്നമല്ലാതെ. ള്ര = ള + ് + ര

ള്ള്ര <-> ളള്ര എന്ന ഒരു വാരിയന്റ് ഡെഫനിഷൻ എഫക്ടീവ് ആയി വരുന്നുണ്ട്

ഇപ്പോൾ - പുതിയ പ്രൊപ്പോസലിലും. കാരണം R എന്ന സെറ്റിൽ റീ ഓർഡർ

ചെയ്യുന്ന സ്വരചിഹ്നങ്ങൾ മാത്രമേ ഉള്ളൂ.

ള്ള്ര <-> ളള്ര visually similar അല്ലാത്തതുകൊണ്ട്

സ്വരചിഹ്നങ്ങളെപ്പോലെത്തന്നെ അനാവശ്യമായ constraint ആവുന്നുണ്ട്. അതേ

സമയം വളരെ വളരെ അപൂർവമാണ് ഈ സീക്വൻസ് എന്നത് വാസ്തവവുമാണ്.

ട്രാൻസിലിറ്ററേഷനിൽ ചിലപ്പോൾ വന്നേക്കാം.

അതുകൂടി R എന്ന സെറ്റിൽ ചേർക്കുന്നോ? അതായത് "Halant-followed-By-Ra" ?

Translation:

It was said that when two 0D33 (ള) come in sequence (ളള), this may be considered a variant of 0D33 Halant 0D33 (ള്ള), and vice versa. But the problem is that if a vowel sign (Matra) such as െ comes after two 0D33s, it is reordered in rendering so that it appears between them (0D33 Matra 0D33; for example, ളളെ), which is not at all visually similar to ള്ള. According to the previous proposal, the sequence ളളെ (0D33 0D33 Matra) was considered a variant of ള്ളെ (0D33 Halant 0D33 Matra). That is an unnecessary restriction, hence this correction.

Mainly, in order for 0D33 Halant 0D33 (ള്ള) to be a variant of two 0D33 in sequence (ളള), there must not be a vowel sign (Matra) such as െ after it. This constraint allows ളളെ in labels without any issues whatsoever.

The same applies to the other reordering matras as well, such as േ, ോ and ൊ.

The same rule applies to റ (0D31) and റ്റ (0D31 Halant 0D31).

Another similar case not mentioned in the document is the sequence ള + ള്ര = ളള്ര. Reordering applies to this one as well, even though it is not a Matra sign:

ള്ര = ള + ് + ര (0D33 0D4D 0D30)

ളള്ര is 0D33 0D33 0D4D 0D30

This makes a ള്ള്ര <-> ളള്ര variant definition effective, even in the new proposal, because the R set contains only the reordering vowel signs (Matras). But ള്ള്ര and ളള്ര are not visually similar, so this is an unnecessary constraint, just like the one for the vowel signs. On the other hand, it is true that this sequence is very rare, although it may occasionally appear in transliteration. Should this be added to the R set as well, that is, Halant followed by Ra (0D4D 0D30)?


13. Appendix D: NBGP Cross-script Variant Inclusion Policy

If, in any two given scripts, all the potential cross-script variants consist ONLY of dependent characters (e.g., Vowel Signs, Anusvara, Visarga, Chandrabindu, etc.), then that entire set can be ignored and no cross-script variants need be proposed between those two scripts.

If, in any two given scripts, there is AT LEAST ONE non-dependent (e.g., Consonant, Vowel, etc.) cross-script variant character/sequence present, all the potential cross-script variants are considered and proposed between the two scripts.

This cross-script analysis has been restricted to the scripts that have descended from Brahmi, as most of them share similar usage patterns. By and large, all of these scripts have a common set of characters that existed in the Brahmi script and bear the same identities. However, as the scripts branched out from Brahmi, depending on various factors, the shapes of the characters changed. This change in shape was not uniform across all the characters and scripts: some characters' shapes changed significantly, whereas others still retained their similarity. The cross-script similarity analysis also aims to identify cases where the same character retained almost the same shape despite being part of different scripts. These sets of characters are variants of each other in the true sense, rather than merely by coincidental visual similarity.
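The first two paragraphs of this policy amount to a simple decision procedure. The sketch below assumes that "dependent" can be approximated by the Unicode general category of combining marks (Mn, Mc, Me), read via Python's `unicodedata`. That is our approximation, not the NBGP's definition. The function names are hypothetical. The examples restate Table B-6 (Kannada) and Table 10 (Tamil).

```python
import unicodedata

def is_dependent(seq: str) -> bool:
    """Assumption: 'dependent' means every code point is a combining mark (gc M*)."""
    return all(unicodedata.category(ch).startswith("M") for ch in seq)

def propose_cross_script_variants(pairs) -> bool:
    """Propose the whole set only if at least one pair is non-dependent."""
    return any(not (is_dependent(a) and is_dependent(b)) for a, b in pairs)

kannada = [("\u0C82", "\u0D02"), ("\u0C83", "\u0D03")]  # Table B-6: marks only
tamil = [("\u0B9C", "\u0D1C"), ("\u0BB5", "\u0D16"), ("\u0BAE", "\u0D25"),
         ("\u0BBF", "\u0D3F"), ("\u0BC6", "\u0D46"), ("\u0BC7", "\u0D47")]  # Table 10
print(propose_cross_script_variants(kannada))  # False: only anusvara/visarga
print(propose_cross_script_variants(tamil))    # True: consonant pairs are present
```

This reproduces the outcome recorded in Appendix B: the Kannada, Telugu and Sinhala sets contain only combining marks and are not proposed, while Tamil and Oriya, which include consonants, are.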

Since having such labels is a realistic possibility and the corresponding labels look almost exactly alike, NBGP has proposed them as blocked variants.

NBGP acknowledges the concern that this shape is quite generic and may have parallels in other scripts not under its ambit. However, as NBGP does not have any exposure to the actual usage of those characters in those particular scripts, NBGP desisted from including them in the analysis. As NBGP has already considered all the related scripts in the cross-script variant analysis, the similarity of characters belonging to NBGP scripts with characters of scripts not under the NBGP ambit may be of a merely coincidental visual nature.

Additionally, this concern is not limited to these two characters but applies to all the characters in all the scripts within the scope of the Root LGR procedure. Carrying out this analysis is practically possible only with the Generation Panels that exist while the NBGP is active. This still leaves out of scope those scripts which may not yet have a Generation Panel established. Hence, carrying out this exercise in its entirety is quite impracticable. This conundrum can be resolved if all such cases are handled by ICANN's "String Similarity Assessment Panel".