This document lists the per-character shaping information needed to shape Tamil text.
Table of Contents
- Tamil character table
- Tamil Supplement character table
- Grantha marks character table
- Vedic Extensions character table
- Miscellaneous character table
Tamil glyphs should be classified as in the following table. Codepoints in the Tamil block with no assigned meaning are designated as unassigned in the Unicode category column.
Assigned codepoints with a null in the Shaping class column evoke no special behavior from the shaping engine. Note that this includes some valid codepoints, such as currency marks, punctuation, and other symbols.
Note: the NUMBER and SYMBOL Shaping classes are important during syllable identification, but generally evoke no further special behavior during the rest of the shaping process.
The Mark-placement subclass column indicates mark-placement positioning for codepoints in the Mark category. Assigned, non-mark codepoints have a null in this column and evoke no special mark-placement behavior. Marks tagged with [Mn] in the Unicode category column are categorized as non-spacing; marks tagged with [Mc] are categorized as spacing-combining.
Some codepoints in the following table use a Shaping class that differs from the codepoint's Unicode General Category. The Shaping class takes precedence during OpenType shaping, as it captures more specific, script-aware behavior.
Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
---|---|---|---|---|
U+0B80 | unassigned | | | |
U+0B81 | unassigned | | | |
U+0B82 | Mark [Mn] | BINDU | TOP_POSITION | ஂ Anusvara |
U+0B83 | Letter | MODIFYING_LETTER | null | ஃ Visarga |
U+0B84 | unassigned | | | |
U+0B85 | Letter | VOWEL_INDEPENDENT | null | அ A |
U+0B86 | Letter | VOWEL_INDEPENDENT | null | ஆ Aa |
U+0B87 | Letter | VOWEL_INDEPENDENT | null | இ I |
U+0B88 | Letter | VOWEL_INDEPENDENT | null | ஈ Ii |
U+0B89 | Letter | VOWEL_INDEPENDENT | null | உ U |
U+0B8A | Letter | VOWEL_INDEPENDENT | null | ஊ Uu |
U+0B8B | unassigned | | | |
U+0B8C | unassigned | | | |
U+0B8D | unassigned | | | |
U+0B8E | Letter | VOWEL_INDEPENDENT | null | எ E |
U+0B8F | Letter | VOWEL_INDEPENDENT | null | ஏ Ee |
U+0B90 | Letter | VOWEL_INDEPENDENT | null | ஐ Ai |
U+0B91 | unassigned | | | |
U+0B92 | Letter | VOWEL_INDEPENDENT | null | ஒ O |
U+0B93 | Letter | VOWEL_INDEPENDENT | null | ஓ Oo |
U+0B94 | Letter | VOWEL_INDEPENDENT | null | ஔ Au |
U+0B95 | Letter | CONSONANT | null | க Ka |
U+0B96 | unassigned | | | |
U+0B97 | unassigned | | | |
U+0B98 | unassigned | | | |
U+0B99 | Letter | CONSONANT | null | ங Nga |
U+0B9A | Letter | CONSONANT | null | ச Ca |
U+0B9B | unassigned | | | |
U+0B9C | Letter | CONSONANT | null | ஜ Ja |
U+0B9D | unassigned | | | |
U+0B9E | Letter | CONSONANT | null | ஞ Nya |
U+0B9F | Letter | CONSONANT | null | ட Tta |
U+0BA0 | unassigned | | | |
U+0BA1 | unassigned | | | |
U+0BA2 | unassigned | | | |
U+0BA3 | Letter | CONSONANT | null | ண Nna |
U+0BA4 | Letter | CONSONANT | null | த Ta |
U+0BA5 | unassigned | | | |
U+0BA6 | unassigned | | | |
U+0BA7 | unassigned | | | |
U+0BA8 | Letter | CONSONANT | null | ந Na |
U+0BA9 | Letter | CONSONANT | null | ன Nnna |
U+0BAA | Letter | CONSONANT | null | ப Pa |
U+0BAB | unassigned | | | |
U+0BAC | unassigned | | | |
U+0BAD | unassigned | | | |
U+0BAE | Letter | CONSONANT | null | ம Ma |
U+0BAF | Letter | CONSONANT | null | ய Ya |
U+0BB0 | Letter | CONSONANT | null | ர Ra |
U+0BB1 | Letter | CONSONANT | null | ற Rra |
U+0BB2 | Letter | CONSONANT | null | ல La |
U+0BB3 | Letter | CONSONANT | null | ள Lla |
U+0BB4 | Letter | CONSONANT | null | ழ Llla |
U+0BB5 | Letter | CONSONANT | null | வ Va |
U+0BB6 | Letter | CONSONANT | null | ஶ Sha |
U+0BB7 | Letter | CONSONANT | null | ஷ Ssa |
U+0BB8 | Letter | CONSONANT | null | ஸ Sa |
U+0BB9 | Letter | CONSONANT | null | ஹ Ha |
U+0BBA | unassigned | | | |
U+0BBB | unassigned | | | |
U+0BBC | unassigned | | | |
U+0BBD | unassigned | | | |
U+0BBE | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ா Sign Aa |
U+0BBF | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ி Sign I |
U+0BC0 | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ீ Sign Ii |
U+0BC1 | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ு Sign U |
U+0BC2 | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ூ Sign Uu |
U+0BC3 | unassigned | | | |
U+0BC4 | unassigned | | | |
U+0BC5 | unassigned | | | |
U+0BC6 | Mark [Mc] | VOWEL_DEPENDENT | LEFT_POSITION | ெ Sign E |
U+0BC7 | Mark [Mc] | VOWEL_DEPENDENT | LEFT_POSITION | ே Sign Ee |
U+0BC8 | Mark [Mc] | VOWEL_DEPENDENT | LEFT_POSITION | ை Sign Ai |
U+0BC9 | unassigned | | | |
U+0BCA | Mark [Mc] | VOWEL_DEPENDENT | LEFT_AND_RIGHT_POSITION | ொ Sign O |
U+0BCB | Mark [Mc] | VOWEL_DEPENDENT | LEFT_AND_RIGHT_POSITION | ோ Sign Oo |
U+0BCC | Mark [Mc] | VOWEL_DEPENDENT | LEFT_AND_RIGHT_POSITION | ௌ Sign Au |
U+0BCD | Mark [Mn] | VIRAMA | TOP_POSITION | ் Virama |
U+0BCE | unassigned | | | |
U+0BCF | unassigned | | | |
U+0BD0 | Letter | null | null | ௐ Om |
U+0BD1 | unassigned | | | |
U+0BD2 | unassigned | | | |
U+0BD3 | unassigned | | | |
U+0BD4 | unassigned | | | |
U+0BD5 | unassigned | | | |
U+0BD6 | unassigned | | | |
U+0BD7 | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ௗ Au Length Mark |
U+0BD8 | unassigned | | | |
U+0BD9 | unassigned | | | |
U+0BDA | unassigned | | | |
U+0BDB | unassigned | | | |
U+0BDC | unassigned | | | |
U+0BDD | unassigned | | | |
U+0BDE | unassigned | | | |
U+0BDF | unassigned | | | |
U+0BE0 | unassigned | | | |
U+0BE1 | unassigned | | | |
U+0BE2 | unassigned | | | |
U+0BE3 | unassigned | | | |
U+0BE4 | unassigned | | | |
U+0BE5 | unassigned | | | |
U+0BE6 | Number | NUMBER | null | ௦ Digit Zero |
U+0BE7 | Number | NUMBER | null | ௧ Digit One |
U+0BE8 | Number | NUMBER | null | ௨ Digit Two |
U+0BE9 | Number | NUMBER | null | ௩ Digit Three |
U+0BEA | Number | NUMBER | null | ௪ Digit Four |
U+0BEB | Number | NUMBER | null | ௫ Digit Five |
U+0BEC | Number | NUMBER | null | ௬ Digit Six |
U+0BED | Number | NUMBER | null | ௭ Digit Seven |
U+0BEE | Number | NUMBER | null | ௮ Digit Eight |
U+0BEF | Number | NUMBER | null | ௯ Digit Nine |
U+0BF0 | Number | NUMBER | null | ௰ Number Ten |
U+0BF1 | Number | NUMBER | null | ௱ Number One Hundred |
U+0BF2 | Number | NUMBER | null | ௲ Number One Thousand |
U+0BF3 | Symbol | SYMBOL | null | ௳ Day Sign |
U+0BF4 | Symbol | SYMBOL | null | ௴ Month Sign |
U+0BF5 | Symbol | SYMBOL | null | ௵ Year Sign |
U+0BF6 | Symbol | SYMBOL | null | ௶ Debit Sign |
U+0BF7 | Symbol | SYMBOL | null | ௷ Credit Sign |
U+0BF8 | Symbol | SYMBOL | null | ௸ As Above Sign |
U+0BF9 | Symbol | SYMBOL | null | ௹ Tamil Rupee Sign |
U+0BFA | Symbol | SYMBOL | null | ௺ Number Sign |
U+0BFB | unassigned | | | |
U+0BFC | unassigned | | | |
U+0BFD | unassigned | | | |
U+0BFE | unassigned | | | |
U+0BFF | unassigned | | | |
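As a rough illustration of how a shaping engine might consume this table, the following Python sketch stores a handful of rows in a dictionary and looks a codepoint up. The names `TAMIL_CLASSES` and `classify` are hypothetical and not part of any shaping engine. The sketch assumes that Python's `unicodedata` module is an acceptable way to detect unassigned codepoints (the result depends on the Unicode version bundled with the interpreter), and the dictionary is deliberately partial: a real implementation would carry every row of the table.

```python
import unicodedata

# A small excerpt of the table: codepoint -> (shaping class, mark-placement subclass).
# None stands for the "null" entries. This excerpt is illustrative, not complete.
TAMIL_CLASSES = {
    0x0B82: ("BINDU", "TOP_POSITION"),
    0x0B83: ("MODIFYING_LETTER", None),
    0x0B85: ("VOWEL_INDEPENDENT", None),
    0x0B95: ("CONSONANT", None),
    0x0BBE: ("VOWEL_DEPENDENT", "RIGHT_POSITION"),
    0x0BC6: ("VOWEL_DEPENDENT", "LEFT_POSITION"),
    0x0BCA: ("VOWEL_DEPENDENT", "LEFT_AND_RIGHT_POSITION"),
    0x0BCD: ("VIRAMA", "TOP_POSITION"),
    0x0BD0: (None, None),
    0x0BE6: ("NUMBER", None),
    0x0BF9: ("SYMBOL", None),
}


def classify(codepoint):
    """Return (Unicode category, shaping class, mark-placement subclass)."""
    category = unicodedata.category(chr(codepoint))
    if category == "Cn":
        # "Cn" is the General Category that Unicode gives to unassigned codepoints.
        return ("unassigned", None, None)
    # Codepoints missing from this partial excerpt fall back to null/null.
    shaping_class, placement = TAMIL_CLASSES.get(codepoint, (None, None))
    return (category, shaping_class, placement)
```

Because the shaping class takes precedence over the General Category during shaping, a caller would normally branch on the second element of the returned tuple and consult the first only for diagnostics.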
Tamil text runs may also include historical symbols and fractions from the Tamil Supplement block. These characters should be classified as follows.
Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
---|---|---|---|---|
U+11FC0 | Number | NUMBER | null | 𑿀 Fraction One Three-Hundred-And-Twentieth |
U+11FC1 | Number | NUMBER | null | 𑿁 Fraction One One-Hundred-And-Sixtieth |
U+11FC2 | Number | NUMBER | null | 𑿂 Fraction One Eightieth |
U+11FC3 | Number | NUMBER | null | 𑿃 Fraction One Sixty-Fourth |
U+11FC4 | Number | NUMBER | null | 𑿄 Fraction One Fortieth |
U+11FC5 | Number | NUMBER | null | 𑿅 Fraction One Thirty-Second |
U+11FC6 | Number | NUMBER | null | 𑿆 Fraction Three Eightieths |
U+11FC7 | Number | NUMBER | null | 𑿇 Fraction Three Sixty-Fourths |
U+11FC8 | Number | NUMBER | null | 𑿈 Fraction One Twentieth |
U+11FC9 | Number | NUMBER | null | 𑿉 Fraction One Sixteenth-1 |
U+11FCA | Number | NUMBER | null | 𑿊 Fraction One Sixteenth-2 |
U+11FCB | Number | NUMBER | null | 𑿋 Fraction One Tenth |
U+11FCC | Number | NUMBER | null | 𑿌 Fraction One Eighth |
U+11FCD | Number | NUMBER | null | 𑿍 Fraction Three Twentieths |
U+11FCE | Number | NUMBER | null | 𑿎 Fraction Three Sixteenths |
U+11FCF | Number | NUMBER | null | 𑿏 Fraction One Fifth |
U+11FD0 | Number | NUMBER | null | 𑿐 Fraction One Quarter |
U+11FD1 | Number | NUMBER | null | 𑿑 Fraction One Half-1 |
U+11FD2 | Number | NUMBER | null | 𑿒 Fraction One Half-2 |
U+11FD3 | Number | NUMBER | null | 𑿓 Fraction Three Quarters |
U+11FD4 | Number | NUMBER | null | 𑿔 Fraction Downscaling Factor Kiizh |
U+11FD5 | Symbol | SYMBOL | null | 𑿕 Sign Nel |
U+11FD6 | Symbol | SYMBOL | null | 𑿖 Sign Cevitu |
U+11FD7 | Symbol | SYMBOL | null | 𑿗 Sign Aazhaakku |
U+11FD8 | Symbol | SYMBOL | null | 𑿘 Sign Uzhakku |
U+11FD9 | Symbol | SYMBOL | null | 𑿙 Sign Muuvuzhakku |
U+11FDA | Symbol | SYMBOL | null | 𑿚 Sign Kuruni |
U+11FDB | Symbol | SYMBOL | null | 𑿛 Sign Pathakku |
U+11FDC | Symbol | SYMBOL | null | 𑿜 Sign Mukkuruni |
U+11FDD | Symbol | SYMBOL | null | 𑿝 Sign Kaacu |
U+11FDE | Symbol | SYMBOL | null | 𑿞 Sign Panam |
U+11FDF | Symbol | SYMBOL | null | 𑿟 Sign Pon |
U+11FE0 | Symbol | SYMBOL | null | 𑿠 Sign Varaakan |
U+11FE1 | Symbol | SYMBOL | null | 𑿡 Sign Paaram |
U+11FE2 | Symbol | SYMBOL | null | 𑿢 Sign Kuzhi |
U+11FE3 | Symbol | SYMBOL | null | 𑿣 Sign Veli |
U+11FE4 | Symbol | SYMBOL | null | 𑿤 Wet Cultivation Sign |
U+11FE5 | Symbol | SYMBOL | null | 𑿥 Dry Cultivation Sign |
U+11FE6 | Symbol | SYMBOL | null | 𑿦 Land Sign |
U+11FE7 | Symbol | SYMBOL | null | 𑿧 Salt Pan Sign |
U+11FE8 | Symbol | SYMBOL | null | 𑿨 Traditional Credit Sign |
U+11FE9 | Symbol | SYMBOL | null | 𑿩 Traditional Number Sign |
U+11FEA | Symbol | SYMBOL | null | 𑿪 Current Sign |
U+11FEB | Symbol | SYMBOL | null | 𑿫 And Odd Sign |
U+11FEC | Symbol | SYMBOL | null | 𑿬 Spent Sign |
U+11FED | Symbol | SYMBOL | null | 𑿭 Total Sign |
U+11FEE | Symbol | SYMBOL | null | 𑿮 In Possession Sign |
U+11FEF | Symbol | SYMBOL | null | 𑿯 Starting From Sign |
U+11FF0 | Symbol | SYMBOL | null | 𑿰 Sign Muthaliya |
U+11FF1 | Symbol | SYMBOL | null | 𑿱 Sign Vakaiyaraa |
U+11FF2 | unassigned | | | |
U+11FF3 | unassigned | | | |
U+11FF4 | unassigned | | | |
U+11FF5 | unassigned | | | |
U+11FF6 | unassigned | | | |
U+11FF7 | unassigned | | | |
U+11FF8 | unassigned | | | |
U+11FF9 | unassigned | | | |
U+11FFA | unassigned | | | |
U+11FFB | unassigned | | | |
U+11FFC | unassigned | | | |
U+11FFD | unassigned | | | |
U+11FFE | unassigned | | | |
U+11FFF | Punctuation | null | null | 𑿿 End Of Text |
Tamil text runs may also include diacritical and syllable-modifier marks from the Grantha block. These characters should be classified as follows.
Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
---|---|---|---|---|
U+11301 | Mark [Mn] | BINDU | TOP_POSITION | 𑌁 Grantha Candrabindu |
U+11303 | Mark [Mc] | VISARGA | RIGHT_POSITION | 𑌃 Grantha Visarga |
U+1133B | Mark [Mn] | NUKTA | BOTTOM_POSITION | 𑌻 Combining Bindu Below |
U+1133C | Mark [Mn] | NUKTA | BOTTOM_POSITION | 𑌼 Grantha Nukta |
Sanskrit runs written in the Tamil script may also include characters from the Vedic Extensions block. These characters should be classified as follows.
Note: See the Vedic Extensions document for additional information.
Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
---|---|---|---|---|
U+1CD0 | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳐ Tone Karshana |
U+1CD1 | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳑ Tone Shara |
U+1CD2 | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳒ Tone Prenkha |
U+1CD3 | Punctuation | null | null | ᳓ Sign Nihshvasa |
U+1CD4 | Mark [Mn] | CANTILLATION | OVERSTRUCK | ᳔ Tone Midline Svarita |
U+1CD5 | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳕ Tone Aggravated Independent Svarita |
U+1CD6 | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳖ Tone Independent Svarita |
U+1CD7 | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳗ Tone Kathaka Independent Svarita |
U+1CD8 | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳘ Tone Candra Below |
U+1CD9 | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳙ Tone Kathaka Independent Svarita Schroeder |
U+1CDA | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳚ Tone Double Svarita |
U+1CDB | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳛ Tone Triple Svarita |
U+1CDC | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳜ Tone Kathaka Anudatta |
U+1CDD | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳝ Tone Dot Below |
U+1CDE | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳞ Tone Two Dots Below |
U+1CDF | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ᳟ Tone Three Dots Below |
U+1CE0 | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳠ Tone Rigvedic Kashmiri Independent Svarita |
U+1CE1 | Mark [Mc] | CANTILLATION | RIGHT_POSITION | ᳡ Tone Atharvavedic Independent Svarita |
U+1CE2 | Mark [Mn] | AVAGRAHA | OVERSTRUCK | ᳢ Sign Visarga Svarita |
U+1CE3 | Mark [Mn] | null | OVERSTRUCK | ᳣ Sign Visarga Udatta |
U+1CE4 | Mark [Mn] | null | OVERSTRUCK | ᳤ Sign Reversed Visarga Udatta |
U+1CE5 | Mark [Mn] | null | OVERSTRUCK | ᳥ Sign Visarga Anudatta |
U+1CE6 | Mark [Mn] | null | OVERSTRUCK | ᳦ Sign Reversed Visarga Anudatta |
U+1CE7 | Mark [Mn] | null | OVERSTRUCK | ᳧ Sign Visarga Udatta With Tail |
U+1CE8 | Mark [Mn] | AVAGRAHA | OVERSTRUCK | ᳨ Sign Visarga Anudatta With Tail |
U+1CE9 | Letter | SYMBOL | null | ᳩ Sign Anusvara Antargomukha |
U+1CEA | Letter | null | null | ᳪ Sign Anusvara Bahirgomukha |
U+1CEB | Letter | null | null | ᳫ Sign Anusvara Vamagomukha |
U+1CEC | Letter | SYMBOL | null | ᳬ Sign Anusvara Vamagomukha With Tail |
U+1CED | Mark [Mn] | AVAGRAHA | BOTTOM_POSITION | ᳭ Sign Tiryak |
U+1CEE | Letter | SYMBOL | null | ᳮ Sign Hexiform Long Anusvara |
U+1CEF | Letter | null | null | ᳯ Sign Long Anusvara |
U+1CF0 | Letter | null | null | ᳰ Sign Rthang Long Anusvara |
U+1CF2 | Letter | CONSONANT_DEAD | null | ᳲ Sign Ardhavisarga |
U+1CF3 | Letter | CONSONANT_DEAD | null | ᳳ Sign Rotated Ardhavisarga |
U+1CF4 | Mark [Mn] | CANTILLATION | TOP_POSITION | ᳴ Tone Candra Above |
U+1CF5 | Letter | CONSONANT_WITH_STACKER | null | ᳵ Sign Jihvamuliya |
U+1CF6 | Letter | CONSONANT_WITH_STACKER | null | ᳶ Sign Upadhmaniya |
U+1CF7 | Mark [Mc] | null | null | ᳷ Sign Atikrama |
U+1CF8 | Mark [Mn] | CANTILLATION | null | ᳸ Tone Ring Above |
U+1CF9 | Mark [Mn] | CANTILLATION | null | ᳹ Tone Double Ring Above |
U+1CFA | Letter | PLACEHOLDER | null | ᳺ Sign Double Anusvara Antargomukha |
U+1CFB | unassigned | | | |
U+1CFC | unassigned | | | |
U+1CFD | unassigned | | | |
U+1CFE | unassigned | | | |
U+1CFF | unassigned | | | |
In addition to general punctuation, runs of Tamil text often use the danda (U+0964) and double danda (U+0965) punctuation marks from the Devanagari block. Tamil text can also incorporate the udatta (U+0951) and anudatta (U+0952) signs from the Devanagari block.
Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
---|---|---|---|---|
U+0951 | Mark [Mn] | CANTILLATION | TOP_POSITION | ॑ Udatta |
U+0952 | Mark [Mn] | CANTILLATION | BOTTOM_POSITION | ॒ Anudatta |
U+0964 | Punctuation | null | null | । Danda |
U+0965 | Punctuation | null | null | ॥ Double Danda |
Other important characters that may be encountered when shaping runs of Tamil text include the dotted-circle placeholder (U+25CC), the zero-width joiner (U+200D) and zero-width non-joiner (U+200C), and the no-break space (U+00A0).
The dotted-circle placeholder is frequently used when displaying a dependent vowel (matra) or a combining mark in isolation. Real-world text syllables may also use other characters, such as hyphens or dashes, in a similar placeholder fashion; shaping engines should cope with this situation gracefully.
Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
---|---|---|---|---|
U+00A0 | Separator | PLACEHOLDER | null | No-break space |
U+00B2 | Number | SYLLABLE_MODIFIER | TOP_POSITION | ² Superscript Two |
U+00B3 | Number | SYLLABLE_MODIFIER | TOP_POSITION | ³ Superscript Three |
U+200C | Other | NON_JOINER | null | Zero-width non-joiner |
U+200D | Other | JOINER | null | Zero-width joiner |
U+2010 | Punctuation | PLACEHOLDER | null | ‐ Hyphen |
U+2011 | Punctuation | PLACEHOLDER | null | ‑ No-break hyphen |
U+2012 | Punctuation | PLACEHOLDER | null | ‒ Figure dash |
U+2013 | Punctuation | PLACEHOLDER | null | – En dash |
U+2014 | Punctuation | PLACEHOLDER | null | — Em dash |
U+2074 | Number | SYLLABLE_MODIFIER | TOP_POSITION | ⁴ Superscript Four |
U+2082 | Number | SYLLABLE_MODIFIER | TOP_POSITION | ₂ Subscript Two |
U+2083 | Number | SYLLABLE_MODIFIER | TOP_POSITION | ₃ Subscript Three |
U+2084 | Number | SYLLABLE_MODIFIER | TOP_POSITION | ₄ Subscript Four |
U+25CC | Symbol | DOTTED_CIRCLE | null | ◌ Dotted circle |
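The shaping classes in this table can be captured in a small lookup, sketched below in Python. The names `MISC_CLASSES` and `can_carry_isolated_mark` are hypothetical. The helper assumes that a shaping engine treats both PLACEHOLDER and DOTTED_CIRCLE codepoints as acceptable bases for a mark or matra shown in isolation, which is one reading of the placeholder behavior described above rather than a statement about any particular engine.

```python
# Shaping classes copied from the Miscellaneous table; the dictionary and
# helper names are illustrative only.
MISC_CLASSES = {
    0x00A0: "PLACEHOLDER",    # no-break space
    0x200C: "NON_JOINER",     # zero-width non-joiner
    0x200D: "JOINER",         # zero-width joiner
    0x25CC: "DOTTED_CIRCLE",  # dotted circle
}
# Hyphen, no-break hyphen, figure dash, en dash, em dash.
MISC_CLASSES.update({cp: "PLACEHOLDER" for cp in range(0x2010, 0x2015)})
# Superscript two/three/four and subscript two/three/four.
MISC_CLASSES.update({
    cp: "SYLLABLE_MODIFIER"
    for cp in (0x00B2, 0x00B3, 0x2074, 0x2082, 0x2083, 0x2084)
})


def can_carry_isolated_mark(codepoint):
    """True if the codepoint may act as a placeholder base (assumed rule)."""
    return MISC_CLASSES.get(codepoint) in ("PLACEHOLDER", "DOTTED_CIRCLE")
```

For example, `can_carry_isolated_mark(0x2013)` is true for the en dash, while the zero-width joiner is rejected because its class is JOINER.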
The zero-width joiner (ZWJ) is primarily used to prevent the formation of a conjunct from a "Consonant,Halant,Consonant" sequence. The sequence "Consonant,Halant,ZWJ,Consonant" blocks the formation of a conjunct between the two consonants.
Note, however, that the "Consonant,Halant" subsequence in the above example may still trigger a half-forms feature. To prevent the application of the half-forms feature in addition to preventing the conjunct, the zero-width non-joiner (ZWNJ) must be used instead. The sequence "Consonant,Halant,ZWNJ,Consonant" should produce the first consonant in its standard form, followed by an explicit "Halant".
A secondary usage of the zero-width joiner is to prevent the formation of "Reph". An initial "Ra,Halant,ZWJ" sequence should not produce a "Reph", where an initial "Ra,Halant" sequence without the zero-width joiner otherwise would.
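The joiner rules above can be summarized in a short Python sketch. It models only the text-level decisions described in the preceding paragraphs; whether a font actually supplies conjunct, half-form, or Reph glyphs is outside its scope. The names `HalantBehavior`, `halant_behavior`, and `reph_allowed` are hypothetical, and the use of U+0BCD as the Halant and U+0BB0 as Ra is an assumption taken from the Tamil table.

```python
from typing import NamedTuple, Sequence

HALANT = 0x0BCD  # Tamil virama, per the Tamil character table
RA = 0x0BB0      # Tamil letter Ra, per the Tamil character table
ZWJ = 0x200D
ZWNJ = 0x200C


class HalantBehavior(NamedTuple):
    conjunct_allowed: bool
    half_form_allowed: bool


def halant_behavior(cps: Sequence[int], halant_index: int) -> HalantBehavior:
    """Decide what the codepoint after a Halant permits."""
    nxt = cps[halant_index + 1] if halant_index + 1 < len(cps) else None
    if nxt == ZWNJ:
        # Consonant,Halant,ZWNJ,Consonant: standard form plus explicit Halant.
        return HalantBehavior(conjunct_allowed=False, half_form_allowed=False)
    if nxt == ZWJ:
        # Consonant,Halant,ZWJ,Consonant: no conjunct, half form still possible.
        return HalantBehavior(conjunct_allowed=False, half_form_allowed=True)
    return HalantBehavior(conjunct_allowed=True, half_form_allowed=True)


def reph_allowed(cps: Sequence[int]) -> bool:
    """An initial Ra,Halant may form Reph unless a ZWJ follows immediately."""
    if len(cps) < 2 or cps[0] != RA or cps[1] != HALANT:
        return False
    return not (len(cps) > 2 and cps[2] == ZWJ)
```

Returning two independent flags keeps the ZWJ case (conjunct blocked, half form still possible) distinct from the ZWNJ case (both blocked), which is the distinction the text draws.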
The no-break space (NBSP) is primarily used to display those codepoints that are defined as non-spacing (marks, dependent vowels (matras), below-base consonant forms, and post-base consonant forms) in an isolated context, as an alternative to displaying them superimposed on the dotted-circle placeholder. These sequences will match "NBSP,ZWJ,Halant,Consonant", "NBSP,mark", or "NBSP,matra".
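The three NBSP patterns named above could be recognized with a matcher along the following lines. This is a minimal Python sketch: `CLASS_OF` holds only a few illustrative codepoints from the tables, `match_nbsp_sequence` is a hypothetical helper, and the set of shaping classes counted as a "mark" (`MARK_CLASSES`) is an assumption made for the example.

```python
NBSP = 0x00A0
ZWJ = 0x200D

# A few shaping classes from the character tables, enough for the example.
CLASS_OF = {
    0x0B82: "BINDU",
    0x0B95: "CONSONANT",
    0x0BBE: "VOWEL_DEPENDENT",
    0x0BCD: "VIRAMA",
}

# Assumption: these shaping classes are the ones treated as "mark" here.
MARK_CLASSES = {"BINDU", "VISARGA", "NUKTA", "CANTILLATION"}


def match_nbsp_sequence(cps):
    """Return the name of the matching NBSP pattern, or None."""
    if not cps or cps[0] != NBSP:
        return None
    rest = cps[1:]
    if (len(rest) == 3 and rest[0] == ZWJ
            and CLASS_OF.get(rest[1]) == "VIRAMA"
            and CLASS_OF.get(rest[2]) == "CONSONANT"):
        return "NBSP,ZWJ,Halant,Consonant"
    if len(rest) == 1:
        cls = CLASS_OF.get(rest[0])
        if cls == "VOWEL_DEPENDENT":
            return "NBSP,matra"
        if cls in MARK_CLASSES:
            return "NBSP,mark"
    return None
```

A sequence that does not begin with the no-break space, such as a consonant followed by a matra, falls through to `None` and would be handled by ordinary syllable identification.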
Tamil text sometimes uses the Latin numerals 2, 3, and 4 in superscript or subscript positions to annotate Sanskrit. When used in this fashion, the superscripts and subscripts are treated as SYLLABLE_MODIFIER signs for shaping purposes.