Devanagari character tables

This document lists the per-character information needed to shape Devanagari text.

Devanagari character table

Devanagari codepoints should be classified as in the following table. Codepoints in the Devanagari block with no assigned meaning are designated as unassigned in the Unicode category column.

Assigned codepoints with a null in the Shaping class column evoke no special behavior from the shaping engine. Note that this includes some valid codepoints, such as currency marks, punctuation, and other symbols.

Note: the NUMBER and SYMBOL Shaping classes are important during syllable identification, but generally evoke no further special behavior during the rest of the shaping process.

The Mark-placement subclass column indicates mark-placement positioning for codepoints in the Mark category. Assigned, non-mark codepoints have a null in this column and evoke no special mark-placement behavior. Marks tagged with [Mn] in the Unicode category column are categorized as non-spacing; marks tagged with [Mc] are categorized as spacing-combining.

Some codepoints in the following table use a Shaping class that differs from the codepoint's Unicode General Category. The Shaping class takes precedence during OpenType shaping, as it captures more specific, script-aware behavior.

Codepoint Unicode category Shaping class Mark-placement subclass Glyph
U+0900 Mark [Mn] BINDU TOP_POSITION ऀ Inverted Candrabindu
U+0901 Mark [Mn] BINDU TOP_POSITION ँ Candrabindu
U+0902 Mark [Mn] BINDU TOP_POSITION ं Anusvara
U+0903 Mark [Mc] VISARGA RIGHT_POSITION ः Visarga
U+0904 Letter VOWEL_INDEPENDENT null ऄ Short A
U+0905 Letter VOWEL_INDEPENDENT null अ A
U+0906 Letter VOWEL_INDEPENDENT null आ Aa
U+0907 Letter VOWEL_INDEPENDENT null इ I
U+0908 Letter VOWEL_INDEPENDENT null ई Ii
U+0909 Letter VOWEL_INDEPENDENT null उ U
U+090A Letter VOWEL_INDEPENDENT null ऊ Uu
U+090B Letter VOWEL_INDEPENDENT null ऋ Vocalic R
U+090C Letter VOWEL_INDEPENDENT null ऌ Vocalic L
U+090D Letter VOWEL_INDEPENDENT null ऍ Candra E
U+090E Letter VOWEL_INDEPENDENT null ऎ Short E
U+090F Letter VOWEL_INDEPENDENT null ए E
U+0910 Letter VOWEL_INDEPENDENT null ऐ Ai
U+0911 Letter VOWEL_INDEPENDENT null ऑ Candra O
U+0912 Letter VOWEL_INDEPENDENT null ऒ Short O
U+0913 Letter VOWEL_INDEPENDENT null ओ O
U+0914 Letter VOWEL_INDEPENDENT null औ Au
U+0915 Letter CONSONANT null क Ka
U+0916 Letter CONSONANT null ख Kha
U+0917 Letter CONSONANT null ग Ga
U+0918 Letter CONSONANT null घ Gha
U+0919 Letter CONSONANT null ङ Nga
U+091A Letter CONSONANT null च Ca
U+091B Letter CONSONANT null छ Cha
U+091C Letter CONSONANT null ज Ja
U+091D Letter CONSONANT null झ Jha
U+091E Letter CONSONANT null ञ Nya
U+091F Letter CONSONANT null ट Tta
U+0920 Letter CONSONANT null ठ Ttha
U+0921 Letter CONSONANT null ड Dda
U+0922 Letter CONSONANT null ढ Ddha
U+0923 Letter CONSONANT null ण Nna
U+0924 Letter CONSONANT null त Ta
U+0925 Letter CONSONANT null थ Tha
U+0926 Letter CONSONANT null द Da
U+0927 Letter CONSONANT null ध Dha
U+0928 Letter CONSONANT null न Na
U+0929 Letter CONSONANT null ऩ Nnna
U+092A Letter CONSONANT null प Pa
U+092B Letter CONSONANT null फ Pha
U+092C Letter CONSONANT null ब Ba
U+092D Letter CONSONANT null भ Bha
U+092E Letter CONSONANT null म Ma
U+092F Letter CONSONANT null य Ya
U+0930 Letter CONSONANT null र Ra
U+0931 Letter CONSONANT null ऱ Rra
U+0932 Letter CONSONANT null ल La
U+0933 Letter CONSONANT null ळ Lla
U+0934 Letter CONSONANT null ऴ Llla
U+0935 Letter CONSONANT null व Va
U+0936 Letter CONSONANT null श Sha
U+0937 Letter CONSONANT null ष Ssa
U+0938 Letter CONSONANT null स Sa
U+0939 Letter CONSONANT null ह Ha
U+093A Mark [Mn] VOWEL_DEPENDENT TOP_POSITION ऺ Sign Oe
U+093B Mark [Mc] VOWEL_DEPENDENT RIGHT_POSITION ऻ Sign Ooe
U+093C Mark [Mn] NUKTA BOTTOM_POSITION ़ Nukta
U+093D Letter AVAGRAHA null ऽ Avagraha
U+093E Mark [Mc] VOWEL_DEPENDENT RIGHT_POSITION ा Sign Aa
U+093F Mark [Mc] VOWEL_DEPENDENT LEFT_POSITION ि Sign I
U+0940 Mark [Mc] VOWEL_DEPENDENT RIGHT_POSITION ी Sign Ii
U+0941 Mark [Mn] VOWEL_DEPENDENT BOTTOM_POSITION ु Sign U
U+0942 Mark [Mn] VOWEL_DEPENDENT BOTTOM_POSITION ू Sign Uu
U+0943 Mark [Mn] VOWEL_DEPENDENT BOTTOM_POSITION ृ Sign Vocalic R
U+0944 Mark [Mn] VOWEL_DEPENDENT BOTTOM_POSITION ॄ Sign Vocalic Rr
U+0945 Mark [Mn] VOWEL_DEPENDENT TOP_POSITION ॅ Sign Candra E
U+0946 Mark [Mn] VOWEL_DEPENDENT TOP_POSITION ॆ Sign Short E
U+0947 Mark [Mn] VOWEL_DEPENDENT TOP_POSITION े Sign E
U+0948 Mark [Mn] VOWEL_DEPENDENT TOP_POSITION ै Sign Ai
U+0949 Mark [Mc] VOWEL_DEPENDENT RIGHT_POSITION ॉ Sign Candra O
U+094A Mark [Mc] VOWEL_DEPENDENT RIGHT_POSITION ॊ Sign Short O
U+094B Mark [Mc] VOWEL_DEPENDENT RIGHT_POSITION ो Sign O
U+094C Mark [Mc] VOWEL_DEPENDENT RIGHT_POSITION ौ Sign Au
U+094D Mark [Mn] VIRAMA BOTTOM_POSITION ् Virama
U+094E Mark [Mc] VOWEL_DEPENDENT LEFT_POSITION ॎ Sign Prishthamatra E
U+094F Mark [Mc] VOWEL_DEPENDENT RIGHT_POSITION ॏ Sign Aw
U+0950 Letter null null ॐ Om
U+0951 Mark [Mn] CANTILLATION TOP_POSITION ॑ Udatta
U+0952 Mark [Mn] CANTILLATION BOTTOM_POSITION ॒ Anudatta
U+0953 Mark [Mn] SYLLABLE_MODIFIER TOP_POSITION ॓ Grave accent
U+0954 Mark [Mn] SYLLABLE_MODIFIER TOP_POSITION ॔ Acute accent
U+0955 Mark [Mn] VOWEL_DEPENDENT TOP_POSITION ॕ Sign Candra Long E
U+0956 Mark [Mn] VOWEL_DEPENDENT BOTTOM_POSITION ॖ Sign Ue
U+0957 Mark [Mn] VOWEL_DEPENDENT BOTTOM_POSITION ॗ Sign Uue
U+0958 Letter CONSONANT null क़ Qa
U+0959 Letter CONSONANT null ख़ Khha
U+095A Letter CONSONANT null ग़ Ghha
U+095B Letter CONSONANT null ज़ Za
U+095C Letter CONSONANT null ड़ Dddha
U+095D Letter CONSONANT null ढ़ Rha
U+095E Letter CONSONANT null फ़ Fa
U+095F Letter CONSONANT null य़ Yya
U+0960 Letter VOWEL_INDEPENDENT null ॠ Vocalic Rr
U+0961 Letter VOWEL_INDEPENDENT null ॡ Vocalic Ll
U+0962 Mark [Mn] VOWEL_DEPENDENT BOTTOM_POSITION ॢ Sign Vocalic L
U+0963 Mark [Mn] VOWEL_DEPENDENT BOTTOM_POSITION ॣ Sign Vocalic Ll
U+0964 Punctuation null null । Danda
U+0965 Punctuation null null ॥ Double Danda
U+0966 Number NUMBER null ० Digit Zero
U+0967 Number NUMBER null १ Digit One
U+0968 Number NUMBER null २ Digit Two
U+0969 Number NUMBER null ३ Digit Three
U+096A Number NUMBER null ४ Digit Four
U+096B Number NUMBER null ५ Digit Five
U+096C Number NUMBER null ६ Digit Six
U+096D Number NUMBER null ७ Digit Seven
U+096E Number NUMBER null ८ Digit Eight
U+096F Number NUMBER null ९ Digit Nine
U+0970 Punctuation null null ॰ Abbreviation Sign
U+0971 Punctuation null null ॱ Sign High Spacing Dot
U+0972 Letter VOWEL_INDEPENDENT null ॲ Candra Aa
U+0973 Letter VOWEL_INDEPENDENT null ॳ Oe
U+0974 Letter VOWEL_INDEPENDENT null ॴ Ooe
U+0975 Letter VOWEL_INDEPENDENT null ॵ Aw
U+0976 Letter VOWEL_INDEPENDENT null ॶ Ue
U+0977 Letter VOWEL_INDEPENDENT null ॷ Uue
U+0978 Letter CONSONANT null ॸ Marwari Dda
U+0979 Letter CONSONANT null ॹ Zha
U+097A Letter CONSONANT null ॺ Heavy Ya
U+097B Letter CONSONANT null ॻ Gga
U+097C Letter CONSONANT null ॼ Jja
U+097D Letter CONSONANT null ॽ Glottal Stop
U+097E Letter CONSONANT null ॾ Ddda
U+097F Letter CONSONANT null ॿ Bba
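
As a minimal sketch of how the Shaping class column above might be consumed, the following Python lookup transcribes the table into contiguous ranges. The names `DEVANAGARI_SHAPING_RANGES` and `shaping_class` are hypothetical, chosen for illustration, and do not come from any particular shaping engine. In this sketch `None` stands in both for a null Shaping class and for a codepoint outside the block.

```python
from typing import Optional

# (first, last, shaping class) ranges transcribed from the table above.
# Codepoints with a null Shaping class (U+0950, U+0964..U+0965,
# U+0970..U+0971) are deliberately left out so that they map to None.
DEVANAGARI_SHAPING_RANGES = [
    (0x0900, 0x0902, "BINDU"),
    (0x0903, 0x0903, "VISARGA"),
    (0x0904, 0x0914, "VOWEL_INDEPENDENT"),
    (0x0915, 0x0939, "CONSONANT"),
    (0x093A, 0x093B, "VOWEL_DEPENDENT"),
    (0x093C, 0x093C, "NUKTA"),
    (0x093D, 0x093D, "AVAGRAHA"),
    (0x093E, 0x094C, "VOWEL_DEPENDENT"),
    (0x094D, 0x094D, "VIRAMA"),
    (0x094E, 0x094F, "VOWEL_DEPENDENT"),
    (0x0951, 0x0952, "CANTILLATION"),
    (0x0953, 0x0954, "SYLLABLE_MODIFIER"),
    (0x0955, 0x0957, "VOWEL_DEPENDENT"),
    (0x0958, 0x095F, "CONSONANT"),
    (0x0960, 0x0961, "VOWEL_INDEPENDENT"),
    (0x0962, 0x0963, "VOWEL_DEPENDENT"),
    (0x0966, 0x096F, "NUMBER"),
    (0x0972, 0x0977, "VOWEL_INDEPENDENT"),
    (0x0978, 0x097F, "CONSONANT"),
]


def shaping_class(cp: int) -> Optional[str]:
    """Return the Shaping class for a codepoint, or None if it has none."""
    for first, last, cls in DEVANAGARI_SHAPING_RANGES:
        if first <= cp <= last:
            return cls
    return None
```

For example, `shaping_class(ord("क"))` yields `"CONSONANT"`, while the danda (U+0964) yields `None`. A production engine would more likely use a precomputed array or trie than a linear scan; the range list is used here only because it mirrors the table directly.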

Devanagari Extended character table

Note: the cantillation marks of the "combining consonant" variety in the Devanagari Extended block are not considered consonants for shaping purposes (including syllable identification, the determination of the base consonant, or positioning "Reph").

Codepoint Unicode category Shaping class Mark-placement subclass Glyph
U+A8E0 Mark [Mn] CANTILLATION TOP_POSITION ꣠ Combining Zero
U+A8E1 Mark [Mn] CANTILLATION TOP_POSITION ꣡ Combining One
U+A8E2 Mark [Mn] CANTILLATION TOP_POSITION ꣢ Combining Two
U+A8E3 Mark [Mn] CANTILLATION TOP_POSITION ꣣ Combining Three
U+A8E4 Mark [Mn] CANTILLATION TOP_POSITION ꣤ Combining Four
U+A8E5 Mark [Mn] CANTILLATION TOP_POSITION ꣥ Combining Five
U+A8E6 Mark [Mn] CANTILLATION TOP_POSITION ꣦ Combining Six
U+A8E7 Mark [Mn] CANTILLATION TOP_POSITION ꣧ Combining Seven
U+A8E8 Mark [Mn] CANTILLATION TOP_POSITION ꣨ Combining Eight
U+A8E9 Mark [Mn] CANTILLATION TOP_POSITION ꣩ Combining Nine
U+A8EA Mark [Mn] CANTILLATION TOP_POSITION ꣪ Combining A
U+A8EB Mark [Mn] CANTILLATION TOP_POSITION ꣫ Combining U
U+A8EC Mark [Mn] CANTILLATION TOP_POSITION ꣬ Combining Ka
U+A8ED Mark [Mn] CANTILLATION TOP_POSITION ꣭ Combining Na
U+A8EE Mark [Mn] CANTILLATION TOP_POSITION ꣮ Combining Pa
U+A8EF Mark [Mn] CANTILLATION TOP_POSITION ꣯ Combining Ra
U+A8F0 Mark [Mn] CANTILLATION TOP_POSITION ꣰ Combining Vi
U+A8F1 Mark [Mn] CANTILLATION TOP_POSITION ꣱ Combining Avagraha
U+A8F2 Letter SYMBOL null ꣲ Spacing Candrabindu
U+A8F3 Letter BINDU null ꣳ Candrabindu Virama
U+A8F4 Letter null null ꣴ Double Candrabindu Virama
U+A8F5 Letter null null ꣵ Candrabindu Two
U+A8F6 Letter null null ꣶ Candrabindu Three
U+A8F7 Letter SYMBOL null ꣷ Candrabindu Avagraha
U+A8F8 Punctuation null null ꣸ Pushpika
U+A8F9 Punctuation null null ꣹ Gap Filler
U+A8FA Punctuation null null ꣺ Caret
U+A8FB Letter null null ꣻ Headstroke
U+A8FC Punctuation null null ꣼ Siddham
U+A8FD Letter null null ꣽ Jain Om
U+A8FE Letter VOWEL_INDEPENDENT null ꣾ Ay
U+A8FF Mark [Mn] VOWEL_DEPENDENT TOP_POSITION ꣿ Sign Ay
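
The note at the top of this section can be illustrated with a small Python sketch: a combining mark such as U+A8EC ("Combining Ka") is named after a consonant but is classified as CANTILLATION, so a consonant test driven by the Shaping class rejects it. The function names `extended_shaping_class` and `counts_as_consonant` are hypothetical and exist only for this illustration.

```python
from typing import Optional


def extended_shaping_class(cp: int) -> Optional[str]:
    """Shaping class for the Devanagari Extended block, per the table above."""
    if 0xA8E0 <= cp <= 0xA8F1:
        return "CANTILLATION"
    if cp in (0xA8F2, 0xA8F7):
        return "SYMBOL"
    if cp == 0xA8F3:
        return "BINDU"
    if cp == 0xA8FE:
        return "VOWEL_INDEPENDENT"
    if cp == 0xA8FF:
        return "VOWEL_DEPENDENT"
    # Null Shaping class, or a codepoint outside the block.
    return None


def counts_as_consonant(cp: int) -> bool:
    """True only when the Shaping class is CONSONANT.

    No codepoint in the Devanagari Extended table carries that class,
    so the "combining consonant" marks never take part in base-consonant
    selection or "Reph" positioning in this sketch.
    """
    return extended_shaping_class(cp) == "CONSONANT"
```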

Devanagari Extended-A character table

Codepoint Unicode category Shaping class Mark-placement subclass Glyph
U+11B00 Punctuation null null 𑬀 Head Mark
U+11B01 Punctuation null null 𑬁 Head Mark With Headstroke
U+11B02 Punctuation null null 𑬂 Sign Bhale
U+11B03 Punctuation null null 𑬃 Sign Bhale With Hook
U+11B04 Punctuation null null 𑬄 Sign Extended Bhale
U+11B05 Punctuation null null 𑬅 Sign Extended Bhale With Hook
U+11B06 Punctuation null null 𑬆 Sign Western Five-like Bhale
U+11B07 Punctuation null null 𑬇 Sign Western Nine-like Bhale
U+11B08 Punctuation null null 𑬈 Sign Reversed Nine-like Bhale
U+11B09 Punctuation null null 𑬉 Sign Mindu
U+11B0A unassigned
U+11B0B unassigned
U+11B0C unassigned
U+11B0D unassigned
U+11B0E unassigned
U+11B0F unassigned
U+11B10 unassigned
U+11B11 unassigned
U+11B12 unassigned
U+11B13 unassigned
U+11B14 unassigned
U+11B15 unassigned
U+11B16 unassigned
U+11B17 unassigned
U+11B18 unassigned
U+11B19 unassigned
U+11B1A unassigned
U+11B1B unassigned
U+11B1C unassigned
U+11B1D unassigned
U+11B1E unassigned
U+11B1F unassigned
U+11B20 unassigned
U+11B21 unassigned
U+11B22 unassigned
U+11B23 unassigned
U+11B24 unassigned
U+11B25 unassigned
U+11B26 unassigned
U+11B27 unassigned
U+11B28 unassigned
U+11B29 unassigned
U+11B2A unassigned
U+11B2B unassigned
U+11B2C unassigned
U+11B2D unassigned
U+11B2E unassigned
U+11B2F unassigned
U+11B30 unassigned
U+11B31 unassigned
U+11B32 unassigned
U+11B33 unassigned
U+11B34 unassigned
U+11B35 unassigned
U+11B36 unassigned
U+11B37 unassigned
U+11B38 unassigned
U+11B39 unassigned
U+11B3A unassigned
U+11B3B unassigned
U+11B3C unassigned
U+11B3D unassigned
U+11B3E unassigned
U+11B3F unassigned
U+11B40 unassigned
U+11B41 unassigned
U+11B42 unassigned
U+11B43 unassigned
U+11B44 unassigned
U+11B45 unassigned
U+11B46 unassigned
U+11B47 unassigned
U+11B48 unassigned
U+11B49 unassigned
U+11B4A unassigned
U+11B4B unassigned
U+11B4C unassigned
U+11B4D unassigned
U+11B4E unassigned
U+11B4F unassigned
U+11B50 unassigned
U+11B51 unassigned
U+11B52 unassigned
U+11B53 unassigned
U+11B54 unassigned
U+11B55 unassigned
U+11B56 unassigned
U+11B57 unassigned
U+11B58 unassigned
U+11B59 unassigned
U+11B5A unassigned
U+11B5B unassigned
U+11B5C unassigned
U+11B5D unassigned
U+11B5E unassigned
U+11B5F unassigned

Vedic Extensions character table

Sanskrit runs written in the Devanagari script may also include characters from the Vedic Extensions block. These characters should be classified as follows.

Note: See the Vedic Extensions document for additional information.

Codepoint Unicode category Shaping class Mark-placement subclass Glyph
U+1CD0 Mark [Mn] CANTILLATION TOP_POSITION ᳐ Tone Karshana
U+1CD1 Mark [Mn] CANTILLATION TOP_POSITION ᳑ Tone Shara
U+1CD2 Mark [Mn] CANTILLATION TOP_POSITION ᳒ Tone Prenkha
U+1CD3 Punctuation null null ᳓ Sign Nihshvasa
U+1CD4 Mark [Mn] CANTILLATION OVERSTRUCK ᳔ Tone Midline Svarita
U+1CD5 Mark [Mn] CANTILLATION BOTTOM_POSITION ᳕ Tone Aggravated Independent Svarita
U+1CD6 Mark [Mn] CANTILLATION BOTTOM_POSITION ᳖ Tone Independent Svarita
U+1CD7 Mark [Mn] CANTILLATION BOTTOM_POSITION ᳗ Tone Kathaka Independent Svarita
U+1CD8 Mark [Mn] CANTILLATION BOTTOM_POSITION ᳘ Tone Candra Below
U+1CD9 Mark [Mn] CANTILLATION BOTTOM_POSITION ᳙ Tone Kathaka Independent Svarita Schroeder
U+1CDA Mark [Mn] CANTILLATION TOP_POSITION ᳚ Tone Double Svarita
U+1CDB Mark [Mn] CANTILLATION TOP_POSITION ᳛ Tone Triple Svarita
U+1CDC Mark [Mn] CANTILLATION BOTTOM_POSITION ᳜ Tone Kathaka Anudatta
U+1CDD Mark [Mn] CANTILLATION BOTTOM_POSITION ᳝ Tone Dot Below
U+1CDE Mark [Mn] CANTILLATION BOTTOM_POSITION ᳞ Tone Two Dots Below
U+1CDF Mark [Mn] CANTILLATION BOTTOM_POSITION ᳟ Tone Three Dots Below
U+1CE0 Mark [Mn] CANTILLATION TOP_POSITION ᳠ Tone Rigvedic Kashmiri Independent Svarita
U+1CE1 Mark [Mc] CANTILLATION RIGHT_POSITION ᳡ Tone Atharavedic Independent Svarita
U+1CE2 Mark [Mn] AVAGRAHA OVERSTRUCK ᳢ Sign Visarga Svarita
U+1CE3 Mark [Mn] null OVERSTRUCK ᳣ Sign Visarga Udatta
U+1CE4 Mark [Mn] null OVERSTRUCK ᳤ Sign Reversed Visarga Udatta
U+1CE5 Mark [Mn] null OVERSTRUCK ᳥ Sign Visarga Anudatta
U+1CE6 Mark [Mn] null OVERSTRUCK ᳦ Sign Reversed Visarga Anudatta
U+1CE7 Mark [Mn] null OVERSTRUCK ᳧ Sign Visarga Udatta With Tail
U+1CE8 Mark [Mn] AVAGRAHA OVERSTRUCK ᳨ Sign Visarga Anudatta With Tail
U+1CE9 Letter SYMBOL null ᳩ Sign Anusvara Antargomukha
U+1CEA Letter null null ᳪ Sign Anusvara Bahirgomukha
U+1CEB Letter null null ᳫ Sign Anusvara Vamagomukha
U+1CEC Letter SYMBOL null ᳬ Sign Anusvara Vamagomukha With Tail
U+1CED Mark [Mn] AVAGRAHA BOTTOM_POSITION ᳭ Sign Tiryak
U+1CEE Letter SYMBOL null ᳮ Sign Hexiform Long Anusvara
U+1CEF Letter null null ᳯ Sign Long Anusvara
U+1CF0 Letter null null ᳰ Sign Rthang Long Anusvara
U+1CF2 Letter CONSONANT_DEAD null ᳲ Sign Ardhavisarga
U+1CF3 Letter CONSONANT_DEAD null ᳳ Sign Rotated Ardhavisarga
U+1CF3 Mark [Mc] VISARGA null ᳳ Sign Rotated Ardhavisarga
U+1CF4 Mark [Mn] CANTILLATION TOP_POSITION ᳴ Tone Candra Above
U+1CF5 Letter CONSONANT_WITH_STACKER null ᳵ Sign Jihvamuliya
U+1CF6 Letter CONSONANT_WITH_STACKER null ᳶ Sign Upadhmaniya
U+1CF7 Mark [Mc] null null ᳷ Sign Atikrama
U+1CF8 Mark [Mn] CANTILLATION null ᳸ Tone Ring Above
U+1CF9 Mark [Mn] CANTILLATION null ᳹ Tone Double Ring Above
U+1CFA Letter PLACEHOLDER null ᳺ Sign Double Anusvara Antargomukha
U+1CFB unassigned
U+1CFC unassigned
U+1CFD unassigned
U+1CFE unassigned
U+1CFF unassigned

Miscellaneous character table

Other important characters that may be encountered when shaping runs of Devanagari text include the dotted-circle placeholder (U+25CC), the zero-width joiner (U+200D) and zero-width non-joiner (U+200C), and the no-break space (U+00A0).

The dotted-circle placeholder is frequently used when displaying a dependent vowel (matra) or a combining mark in isolation. Real-world text syllables may also use other characters, such as hyphens or dashes, in a similar placeholder fashion; shaping engines should cope with this situation gracefully.

Codepoint Unicode category Shaping class Mark-placement subclass Glyph
U+00A0 Separator PLACEHOLDER null   No-break space
U+200C Other NON_JOINER null ‌ Zero-width non-joiner
U+200D Other JOINER null ‍ Zero-width joiner
U+2010 Punctuation PLACEHOLDER null ‐ Hyphen
U+2011 Punctuation PLACEHOLDER null ‑ No-break hyphen
U+2012 Punctuation PLACEHOLDER null ‒ Figure dash
U+2013 Punctuation PLACEHOLDER null – En dash
U+2014 Punctuation PLACEHOLDER null — Em dash
U+25CC Symbol DOTTED_CIRCLE null ◌ Dotted circle
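
One possible way to encode the table above is a small set of placeholder codepoints plus the dotted circle, as in the Python sketch below. The names `PLACEHOLDER_CODEPOINTS`, `DOTTED_CIRCLE`, and `can_carry_isolated_mark` are assumptions made for this example rather than names taken from an existing engine.

```python
# Codepoints tagged PLACEHOLDER in the table above: the no-break space
# and the hyphen/dash characters U+2010 through U+2014.
PLACEHOLDER_CODEPOINTS = frozenset(
    {0x00A0, 0x2010, 0x2011, 0x2012, 0x2013, 0x2014}
)

# The dotted circle has its own Shaping class (DOTTED_CIRCLE).
DOTTED_CIRCLE = 0x25CC


def can_carry_isolated_mark(cp: int) -> bool:
    """True if cp may stand in for a base when a mark is shown in isolation."""
    return cp == DOTTED_CIRCLE or cp in PLACEHOLDER_CODEPOINTS
```

Only the codepoints listed in the table are accepted here; the ASCII hyphen-minus (U+002D), for instance, is not in the table and is therefore rejected by this sketch.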

The zero-width joiner (ZWJ) is primarily used to prevent the formation of a conjunct from a "Consonant,Halant,Consonant" sequence. The sequence "Consonant,Halant,ZWJ,Consonant" blocks the formation of a conjunct between the two consonants.

Note, however, that the "Consonant,Halant" subsequence in the above example may still trigger a half-forms feature. To prevent the application of the half-forms feature in addition to preventing the conjunct, the zero-width non-joiner (ZWNJ) must be used instead. The sequence "Consonant,Halant,ZWNJ,Consonant" should produce the first consonant in its standard form, followed by an explicit "Halant".
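
The distinction between the two joiners can be summarized in a short Python sketch. It assumes the input already begins with a "Consonant,Halant" pair and only inspects the codepoint that follows; the function name `joiner_effect` and its three return labels are hypothetical.

```python
HALANT = 0x094D  # Devanagari virama
ZWNJ = 0x200C    # zero-width non-joiner
ZWJ = 0x200D     # zero-width joiner


def joiner_effect(text: str) -> str:
    """Classify how a "Consonant,Halant,..." sequence should be treated.

    Returns one of three illustrative labels:
      "conjunct"        - no joiner follows; a conjunct is not blocked
      "half-form"       - ZWJ follows; the conjunct is blocked, but the
                          half-forms feature may still apply
      "explicit-halant" - ZWNJ follows; standard consonant form followed
                          by a visible Halant
    """
    cps = [ord(ch) for ch in text]
    if len(cps) < 2 or cps[1] != HALANT:
        raise ValueError("expected a sequence starting with Consonant,Halant")
    following = cps[2] if len(cps) > 2 else None
    if following == ZWNJ:
        return "explicit-halant"
    if following == ZWJ:
        return "half-form"
    return "conjunct"
```

Whether a conjunct or half form is actually rendered still depends on what the active font provides; the labels describe only what the joiner permits or blocks.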

A secondary usage of the zero-width joiner is to prevent the formation of "Reph". An initial "Ra,Halant,ZWJ" sequence should not produce a "Reph", where an initial "Ra,Halant" sequence without the zero-width joiner otherwise would.
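
A minimal Python sketch of that "Reph" rule follows. The helper name `is_reph_candidate` is hypothetical, and the check covers only the initial-sequence condition described above; whether "Reph" is finally produced is assumed to depend on the font and on the rest of the syllable.

```python
RA = 0x0930      # Devanagari letter Ra
HALANT = 0x094D  # Devanagari virama
ZWJ = 0x200D     # zero-width joiner


def is_reph_candidate(text: str) -> bool:
    """True if text starts with "Ra,Halant" not followed by ZWJ."""
    cps = [ord(ch) for ch in text]
    if len(cps) < 2 or cps[0] != RA or cps[1] != HALANT:
        return False
    # An initial "Ra,Halant,ZWJ" sequence suppresses "Reph".
    return not (len(cps) > 2 and cps[2] == ZWJ)
```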

The no-break space (NBSP) is primarily used to display those codepoints that are defined as non-spacing (marks, dependent vowels (matras), below-base consonant forms, and post-base consonant forms) in an isolated context, as an alternative to displaying them superimposed on the dotted-circle placeholder. These sequences will match "NBSP,ZWJ,Halant,Consonant", "NBSP,mark", or "NBSP,matra".
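
The three NBSP patterns can be checked with a short Python sketch. It makes two simplifying assumptions that are not part of this document: consonants are recognized using only the CONSONANT ranges of the main Devanagari table, and "mark" and "matra" are both detected through the Unicode General Category (`Mn` or `Mc`) reported by the standard `unicodedata` module. The function name `matches_nbsp_isolated_form` is hypothetical.

```python
import unicodedata

NBSP = "\u00A0"    # no-break space
ZWJ = "\u200D"     # zero-width joiner
HALANT = "\u094D"  # Devanagari virama


def _is_devanagari_consonant(ch: str) -> bool:
    """CONSONANT ranges from the main Devanagari table only (assumption)."""
    cp = ord(ch)
    return (0x0915 <= cp <= 0x0939
            or 0x0958 <= cp <= 0x095F
            or 0x0978 <= cp <= 0x097F)


def matches_nbsp_isolated_form(text: str) -> bool:
    """True for "NBSP,ZWJ,Halant,Consonant", "NBSP,mark", or "NBSP,matra"."""
    if not text.startswith(NBSP):
        return False
    rest = text[1:]
    if len(rest) == 3:
        return (rest[0] == ZWJ
                and rest[1] == HALANT
                and _is_devanagari_consonant(rest[2]))
    # Marks and matras both carry a Mark general category (Mn or Mc).
    return len(rest) == 1 and unicodedata.category(rest) in ("Mn", "Mc")
```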