
Computing and the Qurʾān

Some caveats

Thomas Milo

1. Introduction - Arabic Input and Output Assessed

In an essay entitled The Study of Tafsīr in the 21st Century: E-Texts And Their Scholarly Use, Andrew Rippin discusses the value of electronic editions of Qurʾān and Tafsīr texts as published on the internet.i It contains strong praise for computer technology:

“I suspect that some of these texts have been transformed into their electronic versions through Optical Character Recognition processes (rather than being inputted through simple keying). This, of course, speaks highly of the abilities of the technology and how much it has progressed over the last 10 years - the very fact that this can be done with Arabic strikes me as astounding.”

Perhaps this view is a bit too optimistic: in reality Arabic OCR programs often have difficulty even reading perfectly clear, simple, unvowelled, horizontal computer script without kerning and without traditional ligatures.ii As a tool for digitizing more fluid Arabic script, such OCR is unreliable, let alone for recognizing sophisticated, kerned typesetting or any kind of handwritten Arabic text – including key historic source material.

Similar uncritical faith in Digital Omnipotence is encountered in the article Computers and the Qurʾān, by Herbert Berg in the Encyclopaedia of the Qurʾān:iii

“Producing electronic versions of the Qurʾān presents no more of a technological difficulty than any other text, though the Arabic alphabet has several major encoding standards: ASMO449, ISO8859-6 and Unicode.”

This statement does not take into consideration the fact that not one of the mentioned encoding standards handles the grapheme inventory of the contemporary Qurʾān.iv This inventory is larger and typographically far more complicated than the basic grapheme inventory of newspaper Arabic, the only variety covered by the first two, now technologically obsolete, code tables. Their successor, the Unicode standard, still misses some qurʾānic graphemes, while existing ones lack coherence and guidelines for use. But while Unicode has at least the potential to be developed into an encoding standard reliable enough for scholarship, that is only true as far as contemporary Qurʾān orthography is concerned. Early Arabic orthography is not covered at all, nor will it ever be unless concerned scholars take the initiative. It is important to realize that industry standards emerge from commercial and political concerns, not out of an awareness of the needs of the Qurʾān or, for that matter, Classical Literature.

It is an enormous challenge to cover Qurʾānic Arabic graphemes unambiguously and exhaustively with Unicode as a non-linguistic compromise out of a plethora of legacy encodings on the one hand and the anarchy of substandard fonts to render it on the other. After all, the Unicode standard deals with scripts and not with typography. It is thus essentially a business initiative, and only active and concerted intervention on the part of scholars can bring it up to academic standards. For instance, all yāʾ variants should be covered by just yāʾ without dots (U+0649) and separate diacritical graphemes like ‘two Arabic dots below’ (along with all other dots still missing from Unicode and fiercely opposed by a few technocrats) and ‘combining hamzä above’ (U+0654) and ‘combining hamzä below’ (U+0655). This last character, along with both variants of combining hamzä, superscript alif and many others, is absent from the other two encoding agreements (or rather disagreements, since their coexistence caused much confusion).
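The code points named above can be inspected with Python's standard unicodedata module. The following is a minimal sketch, not part of the original article: it prints the standard's names for U+0649, U+0654 and U+0655, and compares the composition argued for here (dotless yāʾ plus a separate hamzä) with the canonical decomposition that the standard assigns to its precomposed yāʾ with hamzä (U+0626, a code point chosen here for illustration). The variable names are hypothetical.

```python
import unicodedata

# Code points named in the text above.
DOTLESS_YEH = "\u0649"   # yāʾ without dots
HAMZA_ABOVE = "\u0654"   # combining hamzä above
HAMZA_BELOW = "\u0655"   # combining hamzä below

for ch in (DOTLESS_YEH, HAMZA_ABOVE, HAMZA_BELOW):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

# The composition argued for in the text: dotless base plus a separate mark.
proposed = DOTLESS_YEH + HAMZA_ABOVE

# The canonical decomposition the standard gives its precomposed
# "yeh with hamza above" (U+0626, an illustrative choice).
standard = unicodedata.normalize("NFD", "\u0626")

print([f"U+{ord(c):04X}" for c in proposed])
print([f"U+{ord(c):04X}" for c in standard])
print(proposed == standard)
```

The two sequences differ in their base letter, so software that relies on canonical equivalence alone will not treat them as the same text.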

Arabic font behaviour is still inconsistent and unreliable as regards Unicode characters. Bad font compatibility disorients researchers when mapping characters – what looks correct may actually be mis-encoded and vice versa. For the user this means that what may appear on screen as identical words are in fact digitally different words, with all that that entails for text analysis, sorting, indexing, etc.v
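A small sketch of what "digitally different" means in practice. The example word and the choice of U+0649 versus U+06CC are illustrative assumptions, not taken from the article: in final position many fonts draw both letters without dots, so the two strings can look alike on screen while comparing as unequal.

```python
import unicodedata

# Two encodings of what a reader would take to be the same word
# (illustrative choice): one ends in U+0649 (yāʾ without dots),
# the other in U+06CC (Farsi yeh).
word_a = "\u0639\u0644\u0649"
word_b = "\u0639\u0644\u06CC"

def same_after_normalization(a: str, b: str) -> bool:
    """Return True if any Unicode normalization form makes a and b equal."""
    forms = ("NFC", "NFD", "NFKC", "NFKD")
    return any(
        unicodedata.normalize(f, a) == unicodedata.normalize(f, b)
        for f in forms
    )

print(word_a == word_b)                          # plain comparison
print(same_after_normalization(word_a, word_b))  # normalization does not help
print(sorted([word_b, word_a]) == [word_a, word_b])  # they also sort apart
```

Search, indexing and collation all operate on the code points, not on the rendered image, which is why such pairs matter for text analysis.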

Unicode uses a model resulting from earlier conferences about Arabic computing. There was much in-fighting about the status of lām-alif, whether to encode it as a single glyph (graphic encoding) or as two separate letters (graphemic encoding). Eventually it was agreed that all contextual shapes of one and the same letter should be covered by a single text code – this is the graphemic model. Therefore, in the encoded representation of Arabic script there would be no ligature lām-alif, but separate codes lām and alif for each constituting grapheme. The importance of this decision can hardly be overestimated: it was a choice for encoding content of script, not form.
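The difference between graphic and graphemic encoding can be seen in a short sketch. It assumes the legacy presentation-form code point U+FEFB as an example of a lām-alif ligature encoded as a single code; that code point is not mentioned in the article and is used here only for illustration.

```python
import unicodedata

LAM = "\u0644"
ALIF = "\u0627"
# Legacy presentation form: lām-alif as one code (graphic encoding).
LAM_ALIF_LIGATURE = "\uFEFB"

# Graphemic encoding: two codes, one per constituting grapheme.
graphemic = LAM + ALIF

# Compatibility normalization folds the ligature back into its graphemes.
folded = unicodedata.normalize("NFKC", LAM_ALIF_LIGATURE)

print(unicodedata.name(LAM_ALIF_LIGATURE))
print(len(LAM_ALIF_LIGATURE), len(graphemic))
print(folded == graphemic)
```

In the graphemic model the ligature is left to the font, which is why the encoded text contains only the two letter codes.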

Early qur’anic orthography, with its unmarked letters and scarce, separately added disambiguation markers (little stripes for consonants and dots for vowels), is fully archigraphemic and not structurally supported by the Unicode graphemic model.vi By analogy with the archiphoneme in phonology, in the Arabic writing system an archigrapheme is the common element between two or more graphemes minus their distinctive features. Therefore archigraphemes represent another type of script content than graphemes, though they relate to graphemes in a systematic way. Encoding Arabic exclusively as graphemes, without identifying the real basic elements, is analogous to encoding lām-alif as a single code. This graphemic method places Arabic in a synchronic world without a history or a sense of continuity. It disconnects computerized Arabic from the diachronic aspects of writing, but also from less-than-bureaucratic orthography and, last but not least, from the culturally closely related non-Arabic use of the script – Persian, Urdu, Ottoman, etc. This conceptual flaw is curiously matched by the absence of a scholarly critical text edition of the Qur’ān documenting the transmission through the ages of this key historic text.

2. Type Design versus Script Analysis

In the preface to the 2007 re-edition of Arthur Jeffery’s The Foreign Vocabulary of the Qur’ān, Gerhard Böwering and Jane Dammen McAuliffe quote Jeffery as saying:

“The ideal would be to print on one page a bare consonantal text in the Kufic script, based on the oldest MSS available to us, with a critically edited Ḥafṣ text facing it on the opposite page, and with a complete collection of all known variant readings given at the foot of the page.”

Obviously a historical text can only be studied in the context of the writing system in which it is attested. However, in the case of the early Arabic script in which the oldest and most relevant Qur’ān MSS are written, this is not standard practice. And to introduce it, there are some real hurdles to negotiate. For starters, the art of writing and indeed the knowledge of authentic “kūfi” is extinct.vii Contemporary calligraphers can only guess how the ancient calligraphers constructed the letters.viii Moreover, on the scholarly side, no complete description of the early Arabic writing system or systems exists.ix Not one publication mentions, let alone describes, the dissimilation rules that are typical of all Arabic manuscript styles – including “kūfi”.

Dissimilation is a graphic technique that appears to have been designed to improve the legibility of Arabic letter fusions. A fusion is the linking and assimilating of a letter block into a single script unit. In a fusion, the abstract graphemes that make up a letter block are visualised by allographs. In those cases where assimilation leads to ambiguities, dissimilation is applied, which implies the use of context-determined allographs. This pattern is just as regular as the basic breakdown into initial, middle, final and unconnected forms, and apparently without exception. Consequently it can be called the dissimilation rule.x

Dissimilation as a distinctive feature is critical for disambiguating groups of 4 or more B stubs and for identifying S archigraphemes:

GSBH

The archigrapheme S is characterized by stub triplets, over whose tops a virtual straight line can be drawn (added explicitly by the author, here and in the first example BSA below).

-SB-

distinctive feature: raised middle form of B

In certain styles S takes the form of descending triplets:

BSA

The lowest stub in this example cannot be mistaken for B, since, like in the example GSBH above, a B in that position would be raised.

BS-

distinctive feature: raised initial form of B

BSA

A comparison with the flat style shows that, regardless of the angle of the S triplets, the same rules apply.

BS-

distinctive feature: raised initial form of B


Unlike in modern Arabic calligraphy and typography, in the pre-mansūb scripts letters can be classified in vertical, horizontal, round and cascading categories. Each of these categories has a distinctly different fusion behaviour. For instance, a repetition of two identical vertical letters (archigraphemes) leads to a descending dissimilation – sloping twins:

BB

B is a vertical archigrapheme; two consecutive identical vertical letters, like B in this example, dissimilate by applying descending height.

BB-

distinctive feature: first element of twin vertical letters is raised

Here is an example of the implications of the “sloping twin” pattern:

BY

Single B connecting to Y, a horizontal letter that only occurs in final position. By default, it connects with a curve.

B-

BBY

The B sloping twins overrule final Y causing it to lose the curve. This is the regular assimilation of Y to a vertical, non-initial letter. See also the letter block SBBLY below.

BB-

distinctive feature: sloping twins

Strings of uneven numbers of B archigraphemes are broken up into sloping twins, starting with a single, lower B allograph, as can be seen in this completely regular, yet revealing letter block:

LBBBBBH

la nubayyitannahu

-BBBBB-

distinctive feature: normal initial form, repeated sloping twins
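The sloping-twin pattern described so far can be stated as a small function. This is only a sketch of one reading of the rule: the function name and the labels "normal", "raised" and "lowered" are hypothetical, the first element of each twin is taken to be the raised one, and real manuscripts involve further context (neighbouring letters, position in the block) that is ignored here.

```python
def stub_heights(n: int) -> list[str]:
    """Relative heights for a run of n consecutive B archigraphemes.

    Assumed reading of the rule in the text:
    - an even run is written entirely as sloping twins;
    - an odd run starts with a single, lower B, followed by sloping twins;
    - within a twin, the first element is raised and the second is lower.
    """
    if n < 1:
        raise ValueError("a run needs at least one stub")
    heights = []
    if n % 2 == 1:
        heights.append("normal")          # single, lower B opens an odd run
    heights.extend(["raised", "lowered"] * (n // 2))
    return heights

# The five B archigraphemes of the letter block LBBBBBH:
print(stub_heights(5))

# Three B stubs versus the three level teeth of S (hypothetical encoding):
S_TEETH = ["level", "level", "level"]
print(stub_heights(3) != S_TEETH)
```

Under this reading a run of three B stubs never produces three level teeth, which is the contrast with S that the rule is meant to preserve.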


It is the dissimilation rule that disambiguates S and BBB:

SA

S with three identical ‘teeth’. S-

BBBA

opposition BBB:S by sloping twin BBB-

distinctive feature: initial form followed by sloping twins

It is the sloping twins that disambiguate BBS and SBB:

BBS

dissimilation from S by sloping twin in initial position

BBS

LBBS

dissimilation from L and S by sloping twin in middle position

-BBS

distinctive feature: sloping twins

SBBLY

Sloping twin, in middle position. Note that the Y has no bridge, instead it assimilates vertically to vertical letters like L.

SBB-

GSBBM

Another example of dissimilation from S by sloping twin in middle position

-SBB-

distinctive feature: sloping twins

In this sloping twin dissimilation system, the letter block LLH receives completely regular treatment. The rules apply independently of the word which is written with it:


BBH LLH

LLD LLH

The second-generation mansūb or proportionate scripts inherit the dissimilation system, but the rules are different. The method of sloping twin dissimilation is discontinued:

BBH LLH

LLD LLH

Only when the letter blocks LLH and FLLH are used to denote Allah do they retain the first-generation feature of sloping twin dissimilation, along with the more compact shape of lām typical of the early scripts. The resulting specialized ‘theograph’ adds to the second generation of scripts a novel functional contrast in the writing system that can be classified as a ‘kūfism’:

theograph LLH


F + theograph FLLH

fa li l-lāhi qallalahu

In early script grammar, of which this dissimilation system is an aspect, a variation factor can also be isolated: mašq or elongation. A variation factor is a rule subsystem that governs additional shaping that is not obligatory and that does not influence the semantics of the (archi-)graphemes. Vertical letters (A, B, S, L, E, F, Q, N) and cascading letters (G) can stretch their base line connectors; horizontal letters (D, C, K, T, Y) stretch their body. Round letters or bumpers (R, W, M, H) remain passive: they accept connections, but do not participate in any stretching themselves. For example, the archigrapheme D is a horizontal letter because the letter is horizontally ‘elastic’:

LLD BBD
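The letter categories and their elongation behaviour listed above lend themselves to a simple lookup table. The sketch below only restates that classification as data; the names CATEGORIES, category_of and elongation, and the return labels "connector" and "body", are hypothetical.

```python
from typing import Optional

# Archigrapheme categories as listed in the text.
CATEGORIES = {
    "vertical": "ABSLEFQN",
    "cascading": "G",
    "horizontal": "DCKTY",
    "round": "RWMH",
}

def category_of(archigrapheme: str) -> str:
    """Return the category name of a single archigrapheme code."""
    for name, letters in CATEGORIES.items():
        if archigrapheme in letters:
            return name
    raise KeyError(f"unknown archigrapheme: {archigrapheme!r}")

def elongation(archigrapheme: str) -> Optional[str]:
    """What, if anything, stretches under mašq for this archigrapheme."""
    category = category_of(archigrapheme)
    if category in ("vertical", "cascading"):
        return "connector"   # the base line connector stretches
    if category == "horizontal":
        return "body"        # the letter body itself stretches
    return None              # round letters remain passive

for code in "LDGM":
    print(code, category_of(code), elongation(code))
```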

Therefore, in early script, the letter block LLH receives completely regular treatment independent of the word which is written with it:xi

L_LH LL_H LLH


However, in the second-generation scripts, L, including the lām of God, cannot generate an elongated connection; therefore shape variation by means of elongation is ruled out:

theograph

3. Approaching Arabic Script with Linguistic Concepts

In his seminal Cours de Linguistique Générale, Ferdinand de Saussure writes:

Si nous pouvions embrasser la somme des images verbales emmagasinées chez tous les individus, nous toucherions le lien social qui constitue la langue. C’est un trésor déposé par la pratique de la parole dans les sujets appartenant à une même communauté, un système grammatical existant virtuellement dans chaque cerveau, ou plus exactement, dans les cerveaux d’un ensemble d’individus; car la langue n’est complète dans aucun, elle n’existe parfaitement que dans la masse. … En séparant la langue de la parole, on sépare du même coup: 1° ce qui est social et ce qui est individuel; 2° ce qui est essentiel et ce qui est accessoire et plus ou moins accidentel.xii

Scripts or writing systems are generally not perceived as part of a language, let alone of grammar.xiii The grammar of Arabic, however, is incomplete without covering its sophisticated writing system.xiv By substituting script images for “images verbales”, the distinction between competence and performance becomes applicable to the Arabic script systems as well: script and writing. From manuscript evidence of writing in a perceived style a script grammar (un système grammatical) can be inferred. By analogy with linguistics, such a grammar can serve both as a theoretical model and as a yardstick for understanding and systematically describing variations in performance. Seen in this light, François Déroche’s classification using the metaphor of circles is in fact a case of mapping variations in performance without the explicit concept of shared competence underlying all instances of performance:


As a working hypothesis we have decided to consider each cluster of scripts as a circle whose centre is occupied by the manuscripts showing the greatest care, the greatest skill and the greatest regularity. The further one goes from the centre, the more examples one finds in which the scribe has only loosely reproduced the letter shapes that distinguish the ‘ideal’ form of the script.xv

With the ‘ideal form of the script’ Déroche may imply a concept of grammar, with the ‘circle’ mapping various degrees of sophistication in its performance. However, from a Saussurian point of view, the centre of such a circle has no more relevance than the other positions in the circle. To establish the competence behind a certain script, any performance can yield clues: car la langue [substitute: l’écriture] n’est complète dans aucun, elle n’existe parfaitement que dans la masse. For it is the method of fusing and dissimilating that evidences the grammar, whereas the scribe’s perfect execution of letter block fusions or stylized swashing of final forms is more relevant from an art historical perspective. ‘Regularity’, at least in terms of script grammar, remains problematic as it can only be assessed after all the rules have been identified – which is presently not yet the case. In other words, the complete contents of the ‘circle’ must be used to reconstruct the script grammar. Then, the ultimate test is to turn this grammar into a computer model. To get esthetically pleasing results, the material near the centre of the ‘circle’ should best be used to model the computer glyphs.

The fact that it is not just theoretically but also practically possible to make accurate computer models of Arabic script systems creates a new scholarly obligation. Turning an analysis into a computer model is an unforgiving method for exposing inconsistencies and shortcomings.xvi In terms of structural linguistics, a computer model of a script must be based on competence rather than performance, and by nature requires nothing less than complete, exhaustive analysis. The resulting computer-synthesized script images enable visual verification of the model’s accuracy: reversible analysis.

The present body of publications about early Arabic does not contain sufficient information to make a computer model of a script in this sense.xvii Apart from the issue of dissimilation mentioned above, the description of allographic behavior is incomplete. For instance, no publication explicitly specifies that, though dāl and non-final kāf appear to have the exact same shape, this only happens in complementary distribution.xviii This shared shape occurs as kāf only in non-final position and as dāl, of course, exclusively in final position:xix


BKSBW

K

BBD

D

distinctive feature: position

Only before a space, final kāf (connected and unconnected) is disambiguated from dāl by a vertical bar:

BK

K

BBD

D

distinctive feature: shape
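The complementary distribution of the shape shared by kāf and dāl can be written down as a decision rule. This is a minimal sketch of the two observations above; the function name and its parameters are hypothetical.

```python
def read_shared_shape(is_final: bool, has_vertical_bar: bool = False) -> str:
    """Resolve the shape shared by dāl and kāf to an archigrapheme code.

    Assumed reading of the text:
    - in non-final position the shape can only be kāf (K);
    - in final position the plain shape is dāl (D);
    - final kāf is marked by an added vertical bar.
    """
    if not is_final:
        return "K"
    return "K" if has_vertical_bar else "D"

print(read_shared_shape(is_final=False))                        # as in BKSBW
print(read_shared_shape(is_final=True))                         # as in BBD
print(read_shared_shape(is_final=True, has_vertical_bar=True))  # as in BK
```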

This is just one example where early Arabic script differs in structure from contemporary Arabic script. Computer fonts that are marketed with the name or trade mark “Kufic” are not based on the characteristically different script grammar of early Arabic script. Such fonts are artists’ impressions rather than reliable computer models of the scripts that they are named after. Therefore to print “bare consonantal text in the Kufic script” as proposed by Jeffery is not as trivial as it seems, because:

1. the encoding standard that is now at the heart of all software lacks the concept of archigraphemes, the basic unit of early Arabic orthography;

2. for building a computer model of ‘kūfi’ script that is scientifically correct, all knowledge of early Arabic has to be built from scratch.xx For authentic script synthesis nothing less than an exhaustive, reversible analysis will work.xxi


4. Final Remarks

The Encyclopaedia of the Qurʾān article Computers and the Qurʾān continues:

“The pages of the Qurʾān need only be scanned and preserved as images or, alternatively, scanned and then encoded according to one of these standards using Optical Character Reader (OCR) software.”

Given the incomplete code set coverage of the Qurʾān, this paragraph unintentionally proposes what amounts to a breach of scholarly integrity – be it an innocent one as long as there exists no OCR that can do the job anyway. While straightforward image digitization – scanning – is primitive, it is at least not corrupt. However, given the fact that ASMO449 and ISO8859 (and dozens of alternatives) are not designed to cover the Qurʾān and that its coverage by Unicode is still incomplete and ambivalent, it is impossible to encode the Qurʾān without tampering with the text. Doing so regardless invalidates research before it is even started, as it will be based on unreliable text. Still quoting Computers and the Qurʾān:

“Many such electronic versions of the Qurʾān already exist … Nor does digitizing the Qurʾān present any significant theological difficulty.”

Apart from the theological implications, the situation sketched above should alarm any academic researcher. And, in fact, it does. In the quoted text Andrew Rippin remarks about e-texts and their scholarly use:

“The basic inaccuracy of the available texts is certainly problematic. This manifests itself in a number of ways: simple textual errors, unexplained textual changes, and lack of clarification in text-comprehension matters and in text-critical matters.”

But there is more: even if every single grapheme attested in a particular type of Qurʾān recension, or any early manuscript for that matter, were covered by the Unicode Arabic character set, serious problems remain. For instance, the latest version of the Unicode standard (5.1, 2008) defines yāʾ without dots (U+0649) as a continuous letter with four-fold assimilation. Yet some fonts still program the alif maqṣūrä to disconnect in non-final position: their designers are unaware of the qurʾānic occurrence of yāʾ without dots in initial or middle position (OCR programs work on the same assumption and will therefore fail). Others provide four-fold assimilation, but with erroneous dots in the non-final position. And, of course, a few fonts actually comply with the standard.


Overlooking the crucial role of typography in textual computing is another aspect of the article Computers and the Qurʾān:

“The importance of both Qurʾānic recitation and calligraphy demonstrates that Muslims accept the presentation of the Qurʾān in various media and even recitational requirements such as the taʿawwuḏ can be incorporated digitally”

Clearly the author refers to recorded human recitation, i.e., sound digitization, not to computer-synthesized voices. However, mentioning calligraphy in the same sentence as recitation leaves the reader with the impression that computers actually render digital Arabic text with calligraphic quality. But this is absolutely untrue: computerized Arabic scripts, i.e., fonts, are to calligraphy what computer squeaks are to real recitation. When dealing with Arabic historical orthography and Islamic calligraphy and text manufacture, present state-of-the-art font technology, and Arabic computing in general, is still more of an obstacle than a tool, thus leaving a vacuum that urgently needs to be filled. To summarize, for reproducing any recension of the Qurʾān, tools can and must be made:

1. The grapheme inventory of early Arabic needs to be analyzed and added to the Unicode standard in the form of additional code points and the protocol for using these characters must be defined more precisely;

2. The script grammar of early Arabic writing needs to be reconstructed meticulously in order to create the required script images.

Such a project creates not a ‘font’, but a computer model of an Islamic writing system.


SELECTED LITERATURE

Abbott, N., (1939), The Rise of the North Arabic Script and its Ḳurʾānic Development, Chicago
Blair, S., (2006), Islamic Calligraphy, Edinburgh
Dammen McAuliffe, J., General Editor, (2001), Encyclopaedia of the Qurʾān, Volume One A-D, Leiden
Déroche, F., (1992), The Abbasid Tradition: Qur'ans of the 8th to the 10th Centuries AD, Oxford
Déroche, F., (2005), Islamic Codicology, an Introduction to the Study of Manuscripts in Arabic Script, Oxford
Endress, G., (1982), Herkunft und Entwicklung der arabischen Schrift, in: Grundriss der arabischen Philologie, Band I – Sprachwissenschaft
Fendall, R., (2003), Islamic Calligraphy, Sam Fogg Catalogue 27
Flury, S., (1920), Islamische Schriftbänder Amida-Diarbekr
Fraser, M. and Kwiatkowsky, W., (2006), Ink and Gold, Islamic Calligraphy, Sam Fogg Catalogue, London
Fuʾād Sayyid, A., (1997), al-Kitāb al-ʿArabī l-Maḫṭūṭ wa ʿIlm al-Maḫṭūṭāt
Grohmann, A., (1967-1971), Arabische Paläographie, Band I/II
Gruendler, B., (1993), The Development of the Arabic Scripts
Ǧumʿä, I., (1969), Dirāsä fī Taṭawwur al-Kitābāt al-Kūfiyyä, ʿalà l-Aḥǧār fī Miṣr fī l-Qurūn al-Ḫamsä l-Ūlà li l-Hiǧrä, Cairo
Jeffery, A., (1938), The foreign Vocabulary of the Qurʾān, republished Leiden, 2007
Lions, J., (1968), Introduction to General Linguistics, Cambridge
Lüling, G., (1974), Über den Ur-Qurʾān, Erlangen
Milo, T., (1989), Fragments from the Koran, in: Design into Art – Drawings for Architecture and Ornament – The Lodewijk Houthakker Collection, Volume II, London: Philip Wilson Publishers, republished in: Mela Notes, No 62, 15–34, as The Koran Fragments of the Lodewijk Houthakker Collection (1995)
Milo, T., (2002), Authentic Arabic: a Case Study. Right-to-Left Font Structure, Font Design, and Typography, in: Manuscripta Orientalia, 8, No. 1, 49–61
Mitchell, T.F., (1951), Writing Arabic, a Practical Introduction to Ruqʿah Script, Oxford
Rezvan, E.A., (2004), The Qurʾān of ʿUthmān, St. Petersburg
Safwat, N., (1977), The Harmony of Letters, Islamic Calligraphy from the Tareq Rajab Museum, Kuwait
Saussure, F. de, (1916), Cours de Linguistique Générale. Eds. Charles Bally and Albert Sechehaye, édition critique préparée par Tullio de Mauro, Paris
Schimmel, A., (1990), Calligraphy and Islamic Culture, London
Stanley, T., (1996), Introductory Studies to: The Qurʾān and Calligraphy, a Selection of Fine Manuscript Material, Bernard Quaritch Catalogue 1213
Thanoun, Y., (1986), Old and New in the Origin of Arabic Script and its Development in Various Ages, in: Al-Mawrid, a Quarterly Journal of Culture And Heritage, Vol 15, nr 4, Ministry of Culture and Information, Baghdad
The Unicode Consortium, (2007), The Unicode Standard, Boston


ANNEX I Archigraphemic transliteration scheme for Arabic

archigraphemic   Arabic
A   ء ا آ أ إ ٱ
B   بـ ب تـ ت ثـ ث نـ يـ ئـ
N   ن
Y   ي ئ ى
G   جـ ج حـ ح خـ خ
D   د ذ
R   ر ز
S   سـ س شـ ش
C   صـ ص ضـ ض
T   طـ ط ظـ ظ
E   عـ ع غـ غ
F   فـ ف
Q   ق ـق
K   كـ ك
L   لـ ل
M   مـ م
H   هـ ه ة
W   و ؤ
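The scheme of Annex I can be approximated in a few lines of Python. This is a sketch under stated assumptions, not the author's own reduction procedure: the names archigraphemes, FIXED and POSITIONAL are hypothetical; nūn, yāʾ (with or without dots or hamzä) and qāf are treated as position-dependent, with non-final qāf mapped to F on the evidence of the transliterations in Annex II (e.g. ebfh); and the output is written in capitals without breaks between letter blocks, unlike the lower-case, block-separated lines of Annex II.

```python
import unicodedata

# Position-independent letters, following the Annex I table.
_TABLE = {
    "A": "\u0621\u0627\u0622\u0623\u0625\u0671",  # hamzä, alif and variants
    "B": "\u0628\u062A\u062B",                    # bāʾ, tāʾ, ṯāʾ
    "G": "\u062C\u062D\u062E",                    # ǧīm, ḥāʾ, ḫāʾ
    "D": "\u062F\u0630",                          # dāl, ḏāl
    "R": "\u0631\u0632",                          # rāʾ, zāy
    "S": "\u0633\u0634",                          # sīn, šīn
    "C": "\u0635\u0636",                          # ṣād, ḍād
    "T": "\u0637\u0638",                          # ṭāʾ, ẓāʾ
    "E": "\u0639\u063A",                          # ʿayn, ġayn
    "F": "\u0641",                                # fāʾ
    "K": "\u0643",                                # kāf
    "L": "\u0644",                                # lām
    "M": "\u0645",                                # mīm
    "H": "\u0647\u0629",                          # hāʾ, tāʾ marbūṭä
    "W": "\u0648\u0624",                          # wāw, wāw with hamzä
}
FIXED = {ch: code for code, letters in _TABLE.items() for ch in letters}

# Position-dependent letters: (non-final code, final code). Assumption.
POSITIONAL = {
    "\u0646": ("B", "N"),  # nūn
    "\u064A": ("B", "Y"),  # yāʾ
    "\u0626": ("B", "Y"),  # yāʾ with hamzä
    "\u0649": ("B", "Y"),  # yāʾ without dots
    "\u0642": ("F", "Q"),  # qāf
}

def archigraphemes(text: str) -> str:
    """Reduce Arabic text to archigraphemic codes (rough sketch)."""
    # Drop combining marks (vowels, šaddä, superscript alif) and tatweel.
    skeleton = [c for c in text
                if unicodedata.category(c) != "Mn" and c != "\u0640"]
    out = []
    for i, ch in enumerate(skeleton):
        if ch in FIXED:
            out.append(FIXED[ch])
        elif ch in POSITIONAL:
            nxt = skeleton[i + 1] if i + 1 < len(skeleton) else ""
            non_final = nxt in FIXED or nxt in POSITIONAL
            out.append(POSITIONAL[ch][0 if non_final else 1])
        else:
            out.append(ch)  # spaces, digits and punctuation pass through
    return "".join(out)

print(archigraphemes("\u0645\u0650\u0646 \u0631\u0628\u0643\u0645"))
print(archigraphemes("\u0639\u0646\u0642\u0647"))
```

Because the reduction discards the distinctive features, it is one-way: several different graphemic spellings collapse onto the same archigraphemic skeleton.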


ANNEX II The archigraphemic transliteration scheme in practice

Above left: Close-up of fragment from Ṣanʿāʾ manuscript A2-15-15. Most of the script grammatical examples are taken from this manuscript.

Below: Text with full paedagogical taǧwīd markings, in the recension of Ḥafṣ ʿan ʿĀṣim as published in the 1924 Cairo Qurʾān (page 366, lines 1-5, Q17:12-14). The text appearing on the fragment above is marked in black. This version is typeset with the computer model of nasḫ competence in classic Ottoman performance made by the author as member of the DecoType team.

Above right: Archigraphemic transliteration with second-generation plēnē spelling with alif marked as a dimmed letter a.

Below: Fragment of the author’s archigraphemic reduction of the complete Cairo 1924 Qurʾān, in the Ḥafṣ ʿan ʿĀṣim recension. It results in “pre-hamzä” spelling throughout, i.e., with alif still in its original role of representing glottal stop, in addition to the function of marking tanwīn and plural forms (“otiose alef”).

[Typeset Arabic text of Q17:12-14 not recoverable from this transcript; the accompanying archigraphemic transliteration lines follow.]

a lbl w gelba a bh a lbhr mbcr h lbbbew a fcla mn r bkm

w lbelmw a ed d a lsbbn w a lgsb w kl sy fclbh

bfcbla w kl a bsn a lr mbh tbr h fy ebfh

w bgr g lh bw m a lfbmh kbba blfbh mbsw r a a fr a

kbbk kfy bbfsk a lbw m elbk gsbba mn a hbd y

Note in the 1924 Cairo edition the seemingly random return to Urtext by replacing alif ṭawīlä with superscript alif. Ottoman Qurʾāns have alif ṭawīlä in most cases, whereas the inferred Urtext has none. Comparison with the Urtext reference model proves that this manuscript fragment is younger than the austere, archaic script suggests: its spelling uses alif ṭawīlä even in places where the editors of the 1924 Cairo Qurʾān removed it. In fact, in this manuscript fragment only the letter block blfbh is spelled in the archaic, oldest attested orthography. This text skeleton appears to be identical to that of Ottoman Qurʾāns.


NOTES

i Rippin 2000, internet search argument: rippin +e-texts

ii Kerning is a technique to allow a letter block to extend into the white areas above and below an adjacent letter block.

iii Encyclopaedia of the Qurʾān, Brill, Leiden 2001

iv A grapheme is the smallest unambiguous unit in a writing system. Ideally graphemes correspond to the plain text units of Unicode. In Arabic most of the graphemes correspond with a phoneme.

v With typography, and certainly computer typography, font defects have influenced Arabic orthography. A case in point is the now widely seen practice of writing fatḥatān over alef instead of over the preceding letter that governs the fatḥatān. Another example is the disappearance of hamzä without chair, a frequently used letter in the Cairo and Medina Qurʾān editions that is not available in computer typography. The quick succession of different font technologies and changing encoding concepts has the unintended result that different fonts may require different spellings to obtain the same printed image. Notably, most fonts have problems with al-lāhu, ‘God’. While all contemporary Arab Qurʾān editions spell this word with a superscript fatḥä over šaddä, almost all fonts assume a superscript alef over šaddä for the theograph:

ALEF-FATHA-LAM-LAM-SHADDA-FATHA-HEH-DAMMA: correct data structure, wrong image

ALEF-FATHA-LAM-LAM-HEH-DAMMA: wrong data structure, wrong vowel image

[Font rendering samples: image not preserved]

For comparison, the correct image representing the above data structures: complete vowels (first structure) and incomplete vowels (second structure).
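The two data structures can be spelled out as Unicode code points. The following is a minimal sketch, assuming that the short names used in the note correspond to the standard Unicode character names listed in `UNICODE_NAMES`; the helper name `build` is introduced here for illustration only. It shows which marks the second sequence drops. It says nothing about how a particular font renders either string.

```python
import unicodedata
from collections import Counter

# Assumed mapping from the short names used in the note to the full
# Unicode character names.
UNICODE_NAMES = {
    "ALEF": "ARABIC LETTER ALEF",
    "LAM": "ARABIC LETTER LAM",
    "HEH": "ARABIC LETTER HEH",
    "FATHA": "ARABIC FATHA",
    "DAMMA": "ARABIC DAMMA",
    "SHADDA": "ARABIC SHADDA",
}


def build(sequence: str) -> str:
    """Turn a hyphen-separated sequence of short names into a Unicode string."""
    return "".join(
        unicodedata.lookup(UNICODE_NAMES[name]) for name in sequence.split("-")
    )


full = build("ALEF-FATHA-LAM-LAM-SHADDA-FATHA-HEH-DAMMA")
reduced = build("ALEF-FATHA-LAM-LAM-HEH-DAMMA")

# Count which characters are present in the full spelling but missing
# from the reduced one.
dropped = Counter(unicodedata.name(ch) for ch in full)
dropped.subtract(unicodedata.name(ch) for ch in reduced)
dropped = {name: n for name, n in dropped.items() if n > 0}

for label, text in (("full", full), ("reduced", reduced)):
    print(label, [f"U+{ord(ch):04X}" for ch in text])
print("dropped:", dropped)
```

The reduced sequence leaves out the šaddä and the fatḥä it carries, so the text data no longer records the gemination that the printed editions spell out.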


A related phenomenon occurs where font technology does not handle the combination of ligatures and vowels, forcing users to systematically misspell even key words such as al-islāmu ‘Islam’ and lā ‘no’ (which is also part of the word ‘Islam’):

[Font rendering samples: image not preserved] Captions: correct data structure, wrong image; wrong data structure, approximate image.

For comparison, the correct image representing the above data structures: complete vowels; incomplete and misplaced vowels.

vi Without diacritic markers, early Arabic orthography becomes multi-interpretable. In this kind of spelling the skeletons are not “defective” graphemes, but valid archigraphemes. The majority of historic texts are written with archigraphemes. Unicode does not yet have the data structure to deal with archigraphemes and discrete markers as meaningful text elements.
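The idea of an archigrapheme can be sketched in code. The example below is an illustration under stated assumptions, not the author's model: the table `SKELETON_GROUPS`, the labels Y and Q, and the function name `letter_blocks` are hypothetical. The class labels imitate the letter-block notation cited in these notes (R BK, BSBKBR W N, EBA D BH, BCLLH). The sketch strips the vowel marks, collapses pointed letters that share a skeleton into one class, and starts a new block after every letter that does not connect to its successor.

```python
import unicodedata

# Hypothetical skeleton classes. The labels follow the style of the letter
# blocks cited in these notes; the table itself is an assumption.
SKELETON_GROUPS = {
    "A": "ا", "B": "بتثني", "G": "جحخ", "D": "دذ", "R": "رز",
    "S": "سش", "C": "صض", "T": "طظ", "E": "عغ", "F": "فق",
    "K": "ك", "L": "ل", "M": "م", "H": "هة", "W": "و",
}
SKELETON = {ch: label for label, chars in SKELETON_GROUPS.items() for ch in chars}

# Letters whose block-final shape differs from their shape inside a block.
# Only N is attested in the notes; Y and Q are assumed labels.
BLOCK_FINAL = {"ن": "N", "ي": "Y", "ق": "Q"}

# Letters that never connect to the following letter and so close a block.
NON_CONNECTING = set("ادذرزو")
TATWEEL = "\u0640"


def letter_blocks(word: str) -> str:
    """Return the skeleton classes of a word, one group per letter block."""
    # Drop combining marks (vowels, šaddä) and the elongation character.
    letters = [ch for ch in word
               if not unicodedata.combining(ch) and ch != TATWEEL]

    # Split into letter blocks at every non-connecting letter.
    blocks, current = [], []
    for ch in letters:
        current.append(ch)
        if ch in NON_CONNECTING:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)

    # Label each letter; unknown characters are marked with "?".
    labelled = []
    for block in blocks:
        last = len(block) - 1
        labelled.append("".join(
            BLOCK_FINAL.get(ch, SKELETON.get(ch, "?")) if i == last
            else SKELETON.get(ch, "?")
            for i, ch in enumerate(block)
        ))
    return " ".join(labelled)


for word in ["ربك", "يستكبرون", "عبادته", "يضلله"]:
    print(word, "->", letter_blocks(word))
```

The plain string holds only pointed letters. The skeleton classes and block boundaries have to be derived by a separate step like this one, because the encoding has no element that represents them directly.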

vii Even the name is problematic. See Nabil Safwat: “… these early scripts were not known as Kufic, and indeed were not called Kufic. The city of Kufa had almost nothing to do with the formation of these scripts and Thanoon (quoting Yousuf Thanoun, Old and New in the Origin of Arabic Script and its Development in Various Ages in Al-Mawrid, a Quarterly Journal of Culture And Heritage, Vol 15, nr 4, Ministry of Culture and Information, Baghdad 1986) argued that the term Kufic betrayed dated knowledge (italics by TM) of Islamic calligraphy.” (Nabil Safwat, The Harmony of Letters, Islamic Calligraphy from the Tareq Rajab Museum, Kuwait 1977).

viii Private communication of Gerd-Rüdiger Puin.

ix As for Arab sources, Schimmel 1990, on page 3, writes “The incoherent statements found in Arabic and Persian sources are difficult to entangle”.

x Günter Lüling 1974, page 381, bases a key argument on the unfounded claim that the unpointed and therefore archigraphemic letter blocks (rasm) underlying the Arabic words tisʿä “nine” and sabʿä “seven” are exactly identical. This routine assumption has never been proven and contradicts the findings of this essay. It is therefore interesting that the opponents of the resulting radically different reading of Q74:30 (ʿalayhā sabʿäta aʿšuri-n “on it seven gates [of hell]” instead of ʿalayhā tisʿäta ʿašara “over it [are] nineteen [guardian angels]”) never pointed out that no manuscript evidence exists in support of this theory. All manuscripts meticulously execute Arabic script grammar to disambiguate such text skeletons. On the other hand, proponents of this approach overlook the implication of Lüling’s argument: that the ambiguity must have existed in a much earlier, as yet unattested phase of the emerging text, when this aspect of Arabic script grammar did not yet exist.
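The difference between the two positions can be made concrete with a deliberately crude sketch. Everything in it is an assumption made for illustration: the stroke inventory in `STROKES` treats every tooth as an identical stroke, and the class labels in `CLASSES` are hypothetical. Under the first model the two unpointed words collapse into one sequence, which is the routine assumption. Under a model that keeps letters as units they stay distinct. How the manuscripts actually mark the difference is not modelled here.

```python
# Crude stroke model (assumption): one tooth for bāʾ/tāʾ, three for sīn,
# with all teeth treated as identical strokes.
STROKES = {
    "ت": ["tooth"],
    "ب": ["tooth"],
    "س": ["tooth", "tooth", "tooth"],
    "ع": ["ayn"],
    "ة": ["heh"],
}

# Hypothetical skeleton classes that keep each letter as a unit.
CLASSES = {"ت": "B", "ب": "B", "س": "S", "ع": "E", "ة": "H"}


def stroke_sequence(word: str) -> list[str]:
    """Flatten a word into undifferentiated strokes."""
    return [stroke for ch in word for stroke in STROKES[ch]]


def class_sequence(word: str) -> str:
    """Reduce a word to its sequence of skeleton classes."""
    return "".join(CLASSES[ch] for ch in word)


tis = "تسعة"  # tisʿä, unvowelled
sab = "سبعة"  # sabʿä, unvowelled

print(stroke_sequence(tis) == stroke_sequence(sab))  # same flat stroke run
print(class_sequence(tis), class_sequence(sab))      # letter units differ
```

The comparison only shows that the claim of identity depends on the level of description chosen. It is not evidence for either reading.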

xi Abbott 1939, page 26, discusses the letter block LLH in general terms of the underlying script grammar, but calls it the treatment of the word Allāh. Admittedly, in the Qurʾān the unconnected letter block LLH occurs exclusively as part of the spelling of al-lāhu. However, the letter block LLH does not behave differently from other, enclosed groups of LLH occurring in this text, e.g., LCLLH /ḍ-ḍalāläta/ and BCLLH /yuḍlilhu/.

xii De Saussure 1916, page 30.

xiii See for instance John Lyons 1968: “Although a particular alphabet or a particular syllabary may be more suitable for certain languages than for others, there is no correlation between the general structure of different spoken languages and the type of writing-system used to represent them.”

xiv Mitchell 1951, page 2: “It is a curious fact that students of Arabic have in the past strangely neglected those elements of grammar without which there would be no grammar, viz. the letters. The infrequency with which one encounters European scholars having a knowledge of the Arabic script has often been observed, but we may go further and say that the number of those who write Arabic in an acceptable manner is remarkably small.”

xv Déroche 1992, page 16.

xvi Nabia Abbott 1939, page 26, mentions multitudinous and complex rules regarding mašq, the stretching of the connecting stroke. This observation is underscored with sample rules that are a valuable contribution to our knowledge. But to get a complete picture, the rules provided by this publication need to be supplemented by all the other rules that make up the script grammar in question.

xvii Déroche 1992, page 16, writes: “the descriptions should merely draw the reader’s attention to the salient features of the script.”

xviii The script tables in Déroche 1992, e.g., on page 38, structurally omit kāf and dāl.

xix Modern calligraphers tackling kūfi can also overlook this positional distinctive feature. In the analysis below, the kāf is correctly encircled in R BK (rabbika) and BSBKBR W N (yustakbirūna); in the last example, however, a dāl is erroneously identified as a kāf in EBA D BH (ʿibādatihi):


(Taken from Arabic Calligraphy Instruction, The letter Kaf in Kufi scripts. http://www.sakkal.com/instrctn/Kaf01.gif http://www.sakkal.com/instrctn/Kufi_Kaf.html)

A possible cause for this confusion is that in the second generation scripts, besides the modern shortened kāf, a stretched kāf is available as a calligraphic alternative. This surviving ‘kūfi’ kāf is based on the shape shared by non-final kāf and dāl in early Arabic, but since it no longer needs a contrastive opposition with dāl, the use of the vertical bar is not known. The apparent reluctance among later calligraphers to use the stretched kāf in final position may be related to this. Like the theograph, stretched kāf, too, can be considered a “kufism”.

xx An Arabic font is an industrial product designed to enable handling Arabic with technology that is not designed for Arabic. In the design process, the structure and appearance of Arabic script and orthography can be changed for technical and esthetic reasons. The resulting font is a cultural innovation:

بتثبیتین بتـــثبیتینـــــــ

xxi Arabic script synthesis is a scientific method to analyze and synthesize traditional calligraphic styles and time-proven typesetting systems. In this approach the integrity of Arabic script needs to be preserved when it is reproduced in digital form in order to verify the accuracy of the analysis. The result is not a font but a computer model of a script: