Gurmukhi character tables

This document lists the per-character shaping information needed to shape Gurmukhi text.

Gurmukhi character table

Gurmukhi codepoints should be classified as in the following table. Codepoints in the Gurmukhi block with no assigned meaning are designated as unassigned in the Unicode category column.

Assigned codepoints with a null in the Shaping class column evoke no special behavior from the shaping engine. Note that this does include some valid codepoints, such as currency marks, punctuation, and other symbols.

Note: the NUMBER and SYMBOL Shaping classes are important during syllable identification, but generally evoke no further special behavior during the rest of the shaping process.

The Mark-placement subclass column indicates mark-placement positioning for codepoints in the Mark category. Assigned, non-mark codepoints have a null in this column and evoke no special mark-placement behavior. Marks tagged with [Mn] in the Unicode category column are categorized as non-spacing; marks tagged with [Mc] are categorized as spacing-combining.
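
As a rough illustration of the [Mn] and [Mc] distinction, the Unicode general category can be read from Python's standard `unicodedata` module. The helper name `mark_kind` is hypothetical, and the result depends on the Unicode version bundled with the interpreter.

```python
import unicodedata
from typing import Optional


def mark_kind(ch: str) -> Optional[str]:
    """Return the mark type for a single character, or None for non-marks."""
    category = unicodedata.category(ch)
    if category == "Mn":
        return "non-spacing"
    if category == "Mc":
        return "spacing-combining"
    return None


# U+0A02 Bindi is [Mn], U+0A03 Visarga is [Mc], U+0A15 Ka is a letter.
for ch in ("\u0A02", "\u0A03", "\u0A15"):
    print(f"U+{ord(ch):04X}", mark_kind(ch))
```

This only reports the Unicode general category; the Shaping class and Mark-placement subclass still have to come from the tables in this document.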

Some codepoints in the following table use a Shaping class that differs from the codepoint's Unicode General Category. The Shaping class takes precedence during OpenType shaping, as it captures more specific, script-aware behavior.

| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|---|---|---|---|---|
| U+0A00 | unassigned | | | |
| U+0A01 | Mark [Mn] | BINDU | TOP_POSITION | ਁ Adak Bindi |
| U+0A02 | Mark [Mn] | BINDU | TOP_POSITION | ਂ Bindi |
| U+0A03 | Mark [Mc] | VISARGA | RIGHT_POSITION | ਃ Visarga |
| U+0A04 | unassigned | | | |
| U+0A05 | Letter | VOWEL_INDEPENDENT | null | ਅ A |
| U+0A06 | Letter | VOWEL_INDEPENDENT | null | ਆ Aa |
| U+0A07 | Letter | VOWEL_INDEPENDENT | null | ਇ I |
| U+0A08 | Letter | VOWEL_INDEPENDENT | null | ਈ Ii |
| U+0A09 | Letter | VOWEL_INDEPENDENT | null | ਉ U |
| U+0A0A | Letter | VOWEL_INDEPENDENT | null | ਊ Uu |
| U+0A0B | unassigned | | | |
| U+0A0C | unassigned | | | |
| U+0A0D | unassigned | | | |
| U+0A0E | unassigned | | | |
| U+0A0F | Letter | VOWEL_INDEPENDENT | null | ਏ Ee |
| U+0A10 | Letter | VOWEL_INDEPENDENT | null | ਐ Ai |
| U+0A11 | unassigned | | | |
| U+0A12 | unassigned | | | |
| U+0A13 | Letter | VOWEL_INDEPENDENT | null | ਓ Oo |
| U+0A14 | Letter | VOWEL_INDEPENDENT | null | ਔ Au |
| U+0A15 | Letter | CONSONANT | null | ਕ Ka |
| U+0A16 | Letter | CONSONANT | null | ਖ Kha |
| U+0A17 | Letter | CONSONANT | null | ਗ Ga |
| U+0A18 | Letter | CONSONANT | null | ਘ Gha |
| U+0A19 | Letter | CONSONANT | null | ਙ Nga |
| U+0A1A | Letter | CONSONANT | null | ਚ Ca |
| U+0A1B | Letter | CONSONANT | null | ਛ Cha |
| U+0A1C | Letter | CONSONANT | null | ਜ Ja |
| U+0A1D | Letter | CONSONANT | null | ਝ Jha |
| U+0A1E | Letter | CONSONANT | null | ਞ Nya |
| U+0A1F | Letter | CONSONANT | null | ਟ Tta |
| U+0A20 | Letter | CONSONANT | null | ਠ Ttha |
| U+0A21 | Letter | CONSONANT | null | ਡ Dda |
| U+0A22 | Letter | CONSONANT | null | ਢ Ddha |
| U+0A23 | Letter | CONSONANT | null | ਣ Nna |
| U+0A24 | Letter | CONSONANT | null | ਤ Ta |
| U+0A25 | Letter | CONSONANT | null | ਥ Tha |
| U+0A26 | Letter | CONSONANT | null | ਦ Da |
| U+0A27 | Letter | CONSONANT | null | ਧ Dha |
| U+0A28 | Letter | CONSONANT | null | ਨ Na |
| U+0A29 | unassigned | | | |
| U+0A2A | Letter | CONSONANT | null | ਪ Pa |
| U+0A2B | Letter | CONSONANT | null | ਫ Pha |
| U+0A2C | Letter | CONSONANT | null | ਬ Ba |
| U+0A2D | Letter | CONSONANT | null | ਭ Bha |
| U+0A2E | Letter | CONSONANT | null | ਮ Ma |
| U+0A2F | Letter | CONSONANT | null | ਯ Ya |
| U+0A30 | Letter | CONSONANT | null | ਰ Ra |
| U+0A31 | unassigned | | | |
| U+0A32 | Letter | CONSONANT | null | ਲ La |
| U+0A33 | Letter | CONSONANT | null | ਲ਼ Lla |
| U+0A34 | unassigned | | | |
| U+0A35 | Letter | CONSONANT | null | ਵ Va |
| U+0A36 | Letter | CONSONANT | null | ਸ਼ Sha |
| U+0A37 | unassigned | | | |
| U+0A38 | Letter | CONSONANT | null | ਸ Sa |
| U+0A39 | Letter | CONSONANT | null | ਹ Ha |
| U+0A3A | unassigned | | | |
| U+0A3B | unassigned | | | |
| U+0A3C | Mark [Mn] | NUKTA | BOTTOM_POSITION | ਼ Nukta |
| U+0A3D | unassigned | | | |
| U+0A3E | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ਾ Sign Aa |
| U+0A3F | Mark [Mc] | VOWEL_DEPENDENT | LEFT_POSITION | ਿ Sign I |
| U+0A40 | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ੀ Sign Ii |
| U+0A41 | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | ੁ Sign U |
| U+0A42 | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | ੂ Sign Uu |
| U+0A43 | unassigned | | | |
| U+0A44 | unassigned | | | |
| U+0A45 | unassigned | | | |
| U+0A46 | unassigned | | | |
| U+0A47 | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ੇ Sign Ee |
| U+0A48 | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ੈ Sign Ai |
| U+0A49 | unassigned | | | |
| U+0A4A | unassigned | | | |
| U+0A4B | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ੋ Sign Oo |
| U+0A4C | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ੌ Sign Au |
| U+0A4D | Mark [Mn] | VIRAMA | BOTTOM_POSITION | ੍ Virama |
| U+0A4E | unassigned | | | |
| U+0A4F | unassigned | | | |
| U+0A50 | unassigned | | | |
| U+0A51 | Mark [Mn] | CANTILLATION | null | ੑ Udaat |
| U+0A52 | unassigned | | | |
| U+0A53 | unassigned | | | |
| U+0A54 | unassigned | | | |
| U+0A55 | unassigned | | | |
| U+0A56 | unassigned | | | |
| U+0A57 | unassigned | | | |
| U+0A58 | unassigned | | | |
| U+0A59 | Letter | CONSONANT | null | ਖ਼ Khha |
| U+0A5A | Letter | CONSONANT | null | ਗ਼ Ghha |
| U+0A5B | Letter | CONSONANT | null | ਜ਼ Za |
| U+0A5C | Letter | CONSONANT | null | ੜ Rra |
| U+0A5D | unassigned | | | |
| U+0A5E | Letter | CONSONANT | null | ਫ਼ Fa |
| U+0A5F | unassigned | | | |
| U+0A60 | unassigned | | | |
| U+0A61 | unassigned | | | |
| U+0A62 | unassigned | | | |
| U+0A63 | unassigned | | | |
| U+0A64 | unassigned | | | |
| U+0A65 | unassigned | | | |
| U+0A66 | Number | NUMBER | null | ੦ Digit Zero |
| U+0A67 | Number | NUMBER | null | ੧ Digit One |
| U+0A68 | Number | NUMBER | null | ੨ Digit Two |
| U+0A69 | Number | NUMBER | null | ੩ Digit Three |
| U+0A6A | Number | NUMBER | null | ੪ Digit Four |
| U+0A6B | Number | NUMBER | null | ੫ Digit Five |
| U+0A6C | Number | NUMBER | null | ੬ Digit Six |
| U+0A6D | Number | NUMBER | null | ੭ Digit Seven |
| U+0A6E | Number | NUMBER | null | ੮ Digit Eight |
| U+0A6F | Number | NUMBER | null | ੯ Digit Nine |
| U+0A70 | Mark [Mn] | BINDU | TOP_POSITION | ੰ Tippi |
| U+0A71 | Mark [Mn] | GEMINATION_MARK | TOP_POSITION | ੱ Addak |
| U+0A72 | Letter | CONSONANT | null | ੲ Iri |
| U+0A73 | Letter | CONSONANT | null | ੳ Ura |
| U+0A74 | Letter | null | null | ੴ Ek Onkar |
| U+0A75 | Mark [Mn] | CONSONANT_MEDIAL | BOTTOM_POSITION | ੵ Yakash |
| U+0A76 | Punctuation | null | null | ੶ Abbreviation Sign |
| U+0A77 | unassigned | | | |
| U+0A78 | unassigned | | | |
| U+0A79 | unassigned | | | |
| U+0A7A | unassigned | | | |
| U+0A7B | unassigned | | | |
| U+0A7C | unassigned | | | |
| U+0A7D | unassigned | | | |
| U+0A7E | unassigned | | | |
| U+0A7F | unassigned | | | |
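
The classifications in the table above lend themselves to a lookup keyed by codepoint. The following is a minimal sketch in Python, assuming a hand-copied subset of rows; the names `CharInfo`, `GURMUKHI_SAMPLE`, and `classify` are hypothetical and not part of any shaping engine's API.

```python
from typing import NamedTuple, Optional


class CharInfo(NamedTuple):
    shaping_class: Optional[str]   # None stands for "null" in the table
    mark_placement: Optional[str]  # None stands for "null" in the table


# A few rows copied from the table above; a real table would list every row.
GURMUKHI_SAMPLE = {
    0x0A02: CharInfo("BINDU", "TOP_POSITION"),
    0x0A15: CharInfo("CONSONANT", None),
    0x0A3C: CharInfo("NUKTA", "BOTTOM_POSITION"),
    0x0A3F: CharInfo("VOWEL_DEPENDENT", "LEFT_POSITION"),
    0x0A4D: CharInfo("VIRAMA", "BOTTOM_POSITION"),
    0x0A66: CharInfo("NUMBER", None),
    0x0A74: CharInfo(None, None),  # assigned, but no special shaping behavior
}


def classify(ch: str) -> Optional[CharInfo]:
    """Return the table entry for ch, or None if it is not in the sample."""
    return GURMUKHI_SAMPLE.get(ord(ch))


print(classify("\u0A3F"))  # dependent vowel Sign I
print(classify("\u0A00"))  # unassigned, so absent from the sample
```

Keeping "null" (an assigned codepoint with no special behavior) distinct from "absent" (unassigned or simply not listed) mirrors the distinction the table draws between those two cases.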

Vedic Extensions character table

Sanskrit runs written in the Gurmukhi script may also include characters from the Vedic Extensions block. These characters should be classified as follows.

Note: See the Vedic Extensions document for additional information.

| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|---|---|---|---|---|
| U+1CD0 | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳐ Tone Karshana |
| U+1CD1 | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳑ Tone Shara |
| U+1CD2 | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳒ Tone Prenkha |
| U+1CD3 | Punctuation | null | null | ᳓ Sign Nihshvasa |
| U+1CD4 | Mark [Mn] | CANTILLATION | OVERSTRUCK | ᳔ Tone Midline Svarita |
| U+1CD5 | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳕ Tone Aggravated Independent Svarita |
| U+1CD6 | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳖ Tone Independent Svarita |
| U+1CD7 | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳗ Tone Kathaka Independent Svarita |
| U+1CD8 | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳘ Tone Candra Below |
| U+1CD9 | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳙ Tone Kathaka Independent Svarita Schroeder |
| U+1CDA | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳚ Tone Double Svarita |
| U+1CDB | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳛ Tone Triple Svarita |
| U+1CDC | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳜ Tone Kathaka Anudatta |
| U+1CDD | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳝ Tone Dot Below |
| U+1CDE | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳞ Tone Two Dots Below |
| U+1CDF | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳟ Tone Three Dots Below |
| U+1CE0 | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳠ Tone Rigvedic Kashmiri Independent Svarita |
| U+1CE1 | Mark [Mc] | CANTILLATION | RIGHT_POSITION | ᳡ Tone Atharvavedic Independent Svarita |
| U+1CE2 | Mark [Mn] | AVAGRAHA | OVERSTRUCK | ᳢ Sign Visarga Svarita |
| U+1CE3 | Mark [Mn] | null | OVERSTRUCK | ᳣ Sign Visarga Udatta |
| U+1CE4 | Mark [Mn] | null | OVERSTRUCK | ᳤ Sign Reversed Visarga Udatta |
| U+1CE5 | Mark [Mn] | null | OVERSTRUCK | ᳥ Sign Visarga Anudatta |
| U+1CE6 | Mark [Mn] | null | OVERSTRUCK | ᳦ Sign Reversed Visarga Anudatta |
| U+1CE7 | Mark [Mn] | null | OVERSTRUCK | ᳧ Sign Visarga Udatta With Tail |
| U+1CE8 | Mark [Mn] | AVAGRAHA | OVERSTRUCK | ᳨ Sign Visarga Anudatta With Tail |
| U+1CE9 | Letter | SYMBOL | null | ᳩ Sign Anusvara Antargomukha |
| U+1CEA | Letter | null | null | ᳪ Sign Anusvara Bahirgomukha |
| U+1CEB | Letter | null | null | ᳫ Sign Anusvara Vamagomukha |
| U+1CEC | Letter | SYMBOL | null | ᳬ Sign Anusvara Vamagomukha With Tail |
| U+1CED | Mark [Mn] | AVAGRAHA | BOTTOM_POSITION | ᳭ Sign Tiryak |
| U+1CEE | Letter | SYMBOL | null | ᳮ Sign Hexiform Long Anusvara |
| U+1CEF | Letter | null | null | ᳯ Sign Long Anusvara |
| U+1CF0 | Letter | null | null | ᳰ Sign Rthang Long Anusvara |
| U+1CF2 | Letter | CONSONANT_DEAD | null | ᳲ Sign Ardhavisarga |
| U+1CF3 | Letter | CONSONANT_DEAD | null | ᳳ Sign Rotated Ardhavisarga |
| U+1CF4 | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳴ Tone Candra Above |
| U+1CF5 | Letter | CONSONANT_WITH_STACKER | null | ᳵ Sign Jihvamuliya |
| U+1CF6 | Letter | CONSONANT_WITH_STACKER | null | ᳶ Sign Upadhmaniya |
| U+1CF7 | Mark [Mc] | null | null | ᳷ Sign Atikrama |
| U+1CF8 | Mark [Mn] | CANTILLATION | null | ᳸ Tone Ring Above |
| U+1CF9 | Mark [Mn] | CANTILLATION | null | ᳹ Tone Double Ring Above |
| U+1CFA | Letter | PLACEHOLDER | null | ᳺ Sign Double Anusvara Antargomukha |
| U+1CFB | unassigned | | | |
| U+1CFC | unassigned | | | |
| U+1CFD | unassigned | | | |
| U+1CFE | unassigned | | | |
| U+1CFF | unassigned | | | |

Miscellaneous character table

In addition to general punctuation, runs of Gurmukhi text often use the danda (U+0964) and double danda (U+0965) punctuation marks from the Devanagari block. Gurmukhi text can also incorporate the udatta (U+0951) and anudatta (U+0952) signs from the Devanagari block.

| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|---|---|---|---|---|
| U+0951 | Mark [Mn] | CANTILLATION | TOP_POSITION | ॑ Udatta |
| U+0952 | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ॒ Anudatta |
| U+0964 | Punctuation | null | null | । Danda |
| U+0965 | Punctuation | null | null | ॥ Double Danda |

Other important characters that may be encountered when shaping runs of Gurmukhi text include the dotted-circle placeholder (U+25CC), the zero-width joiner (U+200D) and zero-width non-joiner (U+200C), and the no-break space (U+00A0).

The dotted-circle placeholder is frequently used when displaying a dependent vowel (matra) or a combining mark in isolation. Syllables in real-world text may also use other characters, such as hyphens or dashes, as placeholders in the same way; shaping engines should cope with this situation gracefully.

| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|---|---|---|---|---|
| U+00A0 | Separator | PLACEHOLDER | null |   No-break space |
| U+200C | Other | NON_JOINER | null | ‌ Zero-width non-joiner |
| U+200D | Other | JOINER | null | ‍ Zero-width joiner |
| U+2010 | Punctuation | PLACEHOLDER | null | ‐ Hyphen |
| U+2011 | Punctuation | PLACEHOLDER | null | ‑ No-break hyphen |
| U+2012 | Punctuation | PLACEHOLDER | null | ‒ Figure dash |
| U+2013 | Punctuation | PLACEHOLDER | null | – En dash |
| U+2014 | Punctuation | PLACEHOLDER | null | — Em dash |
| U+25CC | Symbol | DOTTED_CIRCLE | null | ◌ Dotted circle |

The zero-width joiner (ZWJ) is primarily used to prevent the formation of a conjunct from a "Consonant,Halant,Consonant" sequence. The sequence "Consonant,Halant,ZWJ,Consonant" blocks the formation of a conjunct between the two consonants.

Note, however, that the "Consonant,Halant" subsequence in the above example may still trigger a half-forms feature. To prevent the application of the half-forms feature in addition to preventing the conjunct, the zero-width non-joiner (ZWNJ) must be used instead. The sequence "Consonant,Halant,ZWNJ,Consonant" should produce the first consonant in its standard form, followed by an explicit "Halant".
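
The three cases described above (no joiner, ZWJ, and ZWNJ) can be summarized in a small Python sketch. The function name `halant_behavior`, the returned strings, and the simplified `CONSONANTS` set are all assumptions made for illustration; a real shaping engine applies these rules during syllable analysis rather than on raw strings.

```python
HALANT = "\u0A4D"
ZWNJ = "\u200C"
ZWJ = "\u200D"

# Simplified assumption: the assigned CONSONANT codepoints in U+0A15..U+0A39
# plus U+0A59..U+0A5C and U+0A5E. U+0A72 and U+0A73 are left out here.
CONSONANTS = (
    set(range(0x0A15, 0x0A3A)) - {0x0A29, 0x0A31, 0x0A34, 0x0A37}
) | {0x0A59, 0x0A5A, 0x0A5B, 0x0A5C, 0x0A5E}


def is_consonant(ch: str) -> bool:
    return ord(ch) in CONSONANTS


def halant_behavior(text: str) -> str:
    """Describe what a Consonant,Halant,[joiner],Consonant sequence permits."""
    if len(text) not in (3, 4):
        return "not a consonant-halant-consonant sequence"
    if not (is_consonant(text[0]) and text[1] == HALANT and is_consonant(text[-1])):
        return "not a consonant-halant-consonant sequence"
    if len(text) == 3:
        return "conjunct permitted"
    if text[2] == ZWJ:
        return "conjunct blocked, half form permitted"
    if text[2] == ZWNJ:
        return "conjunct and half form blocked, explicit halant"
    return "not a consonant-halant-consonant sequence"


print(halant_behavior("\u0A38\u0A4D\u0A35"))        # Sa, Halant, Va
print(halant_behavior("\u0A38\u0A4D\u200D\u0A35"))  # Sa, Halant, ZWJ, Va
print(halant_behavior("\u0A38\u0A4D\u200C\u0A35"))  # Sa, Halant, ZWNJ, Va
```

Whether a conjunct or half form actually appears still depends on what the active font provides; the sketch only reports what the sequence allows.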

A secondary usage of the zero-width joiner is to prevent the formation of "Reph". An initial "Ra,Halant,ZWJ" sequence should not produce a "Reph", where an initial "Ra,Halant" sequence without the zero-width joiner otherwise would.
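
A minimal Python sketch of the "Reph" rule above, assuming the syllable is available as a plain string; the function name `reph_candidate` is hypothetical. It only tests the leading sequence and says nothing about whether the font supplies a "Reph" glyph.

```python
RA = "\u0A30"
HALANT = "\u0A4D"
ZWJ = "\u200D"


def reph_candidate(syllable: str) -> bool:
    """True if the syllable starts with Ra,Halant and no ZWJ follows them."""
    if not syllable.startswith(RA + HALANT):
        return False
    # An initial Ra,Halant,ZWJ explicitly suppresses the Reph.
    return not syllable.startswith(RA + HALANT + ZWJ)


print(reph_candidate("\u0A30\u0A4D\u0A15"))        # Ra, Halant, Ka
print(reph_candidate("\u0A30\u0A4D\u200D\u0A15"))  # Ra, Halant, ZWJ, Ka
```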

The no-break space (NBSP) is primarily used to display those codepoints that are defined as non-spacing (marks, dependent vowels (matras), below-base consonant forms, and post-base consonant forms) in an isolated context, as an alternative to displaying them superimposed on the dotted-circle placeholder. These sequences will match "NBSP,ZWJ,Halant,Consonant", "NBSP,mark", or "NBSP,matra".
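
The three NBSP patterns listed above can be expressed as a regular expression. This is a sketch in Python under simplifying assumptions: the character classes below cover only a subset of the Gurmukhi block taken from the table earlier in this document, and the names `ISOLATED` and `is_isolated_display` are hypothetical.

```python
import re

NBSP = "\u00A0"
ZWJ = "\u200D"
HALANT = "\u0A4D"

# Simplified character classes built from the Gurmukhi character table.
CONSONANT = "[\u0A15-\u0A28\u0A2A-\u0A30\u0A32\u0A33\u0A35\u0A36\u0A38\u0A39]"
MATRA = "[\u0A3E-\u0A42\u0A47\u0A48\u0A4B\u0A4C]"
MARK = "[\u0A01-\u0A03\u0A3C\u0A70\u0A71]"

# NBSP,ZWJ,Halant,Consonant  or  NBSP,mark  or  NBSP,matra
ISOLATED = re.compile(f"{NBSP}(?:{ZWJ}{HALANT}{CONSONANT}|{MARK}|{MATRA})")


def is_isolated_display(text: str) -> bool:
    """True if text is exactly one of the NBSP-based isolated-display forms."""
    return ISOLATED.fullmatch(text) is not None


print(is_isolated_display("\u00A0\u0A3F"))              # NBSP + Sign I
print(is_isolated_display("\u00A0\u200D\u0A4D\u0A30"))  # NBSP, ZWJ, Halant, Ra
print(is_isolated_display("\u00A0\u0A15"))              # NBSP + Ka, not a match
```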