This document lists the per-character shaping information needed to shape Syriac text.
Syriac glyphs should be classified as in the following table. Codepoints in the Syriac block with no assigned meaning are designated as unassigned in the Unicode category column.
The Joining type column indicates whether each codepoint is defined as joining with adjacent characters on the left side, right side, left and right sides ("DUAL"), or neither side ("NON_JOINING"). Codepoints designated TRANSPARENT in the Joining type column do not join with adjacent characters and, in addition, do not affect the joining behavior of surrounding characters. Non-spacing marks are of type TRANSPARENT. Codepoints designated JOIN_CAUSING force adjacent characters to join.
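As an illustration of how the joining types interact, the following is a minimal sketch rather than a definitive implementation. It assumes text in logical order, so that in right-to-left Syriac the preceding character sits on the right side and the following character on the left. The names `JoiningType`, `JOINING_TYPES`, and `joins_forward` are hypothetical, and the lookup table holds only a small excerpt of the data listed in this document.

```python
from enum import Enum


class JoiningType(Enum):
    LEFT = "LEFT"
    RIGHT = "RIGHT"
    DUAL = "DUAL"
    NON_JOINING = "NON_JOINING"
    TRANSPARENT = "TRANSPARENT"
    JOIN_CAUSING = "JOIN_CAUSING"


# Hypothetical excerpt of the tables in this document; a real shaping
# engine would carry the complete data set.
JOINING_TYPES = {
    0x0710: JoiningType.RIGHT,         # Alaph
    0x0712: JoiningType.DUAL,          # Beth
    0x0715: JoiningType.RIGHT,         # Dalath
    0x0720: JoiningType.DUAL,          # Lamadh
    0x0730: JoiningType.TRANSPARENT,   # Pthaha Above
    0x0640: JoiningType.JOIN_CAUSING,  # Arabic Tatweel
    0x200C: JoiningType.NON_JOINING,   # Zero-width non-joiner
    0x200D: JoiningType.JOIN_CAUSING,  # Zero-width joiner
}

# Types able to connect toward the following (left-hand) character.
JOINS_TO_FOLLOWING = {JoiningType.DUAL, JoiningType.LEFT, JoiningType.JOIN_CAUSING}
# Types able to connect toward the preceding (right-hand) character.
JOINS_TO_PRECEDING = {JoiningType.DUAL, JoiningType.RIGHT, JoiningType.JOIN_CAUSING}


def joining_type(ch: str) -> JoiningType:
    """Look up a character, treating anything not in the excerpt as non-joining."""
    return JOINING_TYPES.get(ord(ch), JoiningType.NON_JOINING)


def joins_forward(text: str, index: int) -> bool:
    """Return True if text[index] connects to the next non-transparent character."""
    if joining_type(text[index]) not in JOINS_TO_FOLLOWING:
        return False
    for ch in text[index + 1:]:
        following = joining_type(ch)
        if following is JoiningType.TRANSPARENT:
            continue  # transparent marks do not affect joining
        return following in JOINS_TO_PRECEDING
    return False  # nothing left to join with
```

For instance, `joins_forward("\u0712\u0730\u0710", 0)` skips the transparent Pthaha and reports that Beth connects to the following Alaph, whereas Alaph, being RIGHT-joining, never connects to the character after it.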
The Joining group column lists the fundamental letter that the listed codepoint behaves like for joining purposes.
Assigned codepoints with a null in the Joining group column evoke no special behavior from the shaping engine during the join-computation stage.
The Mark class column indicates the Canonical Combining Class for the codepoint. Marks are assigned non-zero combining classes so that sequences of adjacent marks can be reordered as required by the orthography.
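The reordering that the combining classes make possible can be sketched as a stable sort of each run of marks. This is a rough illustration only: it uses Python's standard `unicodedata.combining()` to read the Canonical Combining Class, the function name `reorder_marks` is hypothetical, and it covers just the canonical ordering step, not any script-specific adjustment a shaping engine may apply afterwards.

```python
import unicodedata


def reorder_marks(text: str) -> str:
    """Stable-sort every run of combining marks by Canonical Combining Class."""
    out = []
    run = []  # marks collected since the last base character
    for ch in text:
        if unicodedata.combining(ch) != 0:
            run.append(ch)
        else:
            # sorted() is stable, so marks of equal class keep their order
            out.extend(sorted(run, key=unicodedata.combining))
            run = []
            out.append(ch)
    out.extend(sorted(run, key=unicodedata.combining))
    return "".join(out)


# Beth followed by Pthaha Above (class 230) and Pthaha Below (class 220)
sample = "\u0712\u0730\u0731"
reordered = reorder_marks(sample)
```

In this sample the class-220 mark moves ahead of the class-230 mark, which matches what `unicodedata.normalize("NFD", sample)` produces for the same input.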
For Syriac, a subset of marks in the 220 and 230 classes are also designated Modifier Combining Marks (MCM). These are denoted with 220_MCM and 230_MCM in the Mark class column. The MCM marks are treated differently during the mark-reordering stage.
Codepoint | Unicode category | Joining type | Joining group | Mark class | Glyph |
---|---|---|---|---|---|
U+0700 | Punctuation | NON_JOINING | null | 0 | ܀ End of Paragraph |
U+0701 | Punctuation | NON_JOINING | null | 0 | ܁ Supralinear Full Stop |
U+0702 | Punctuation | NON_JOINING | null | 0 | ܂ Sublinear Full Stop |
U+0703 | Punctuation | NON_JOINING | null | 0 | ܃ Supralinear Colon |
U+0704 | Punctuation | NON_JOINING | null | 0 | ܄ Sublinear Colon |
U+0705 | Punctuation | NON_JOINING | null | 0 | ܅ Horizontal Colon |
U+0706 | Punctuation | NON_JOINING | null | 0 | ܆ Colon Skewed Left |
U+0707 | Punctuation | NON_JOINING | null | 0 | ܇ Colon Skewed Right |
U+0708 | Punctuation | NON_JOINING | null | 0 | ܈ Supralinear Colon Skewed Left |
U+0709 | Punctuation | NON_JOINING | null | 0 | ܉ Sublinear Colon Skewed Right |
U+070A | Punctuation | NON_JOINING | null | 0 | ܊ Contraction |
U+070B | Punctuation | NON_JOINING | null | 0 | ܋ Harklean Obelus |
U+070C | Punctuation | NON_JOINING | null | 0 | ܌ Harklean Metobelus |
U+070D | Punctuation | NON_JOINING | null | 0 | ܍ Harklean Asteriscus |
U+070E | unassigned | | | | |
U+070F | Other | TRANSPARENT | null | 0 | Syriac Abbreviation Mark |
U+0710 | Letter | RIGHT | ALAPH | 0 | ܐ Alaph |
U+0711 | Mark [Mn] | TRANSPARENT | null | 36 | ܑ Superscript Alaph |
U+0712 | Letter | DUAL | BETH | 0 | ܒ Beth |
U+0713 | Letter | DUAL | GAMAL | 0 | ܓ Gamal |
U+0714 | Letter | DUAL | GAMAL | 0 | ܔ Gamal Garshuni |
U+0715 | Letter | RIGHT | DALATH_RISH | 0 | ܕ Dalath |
U+0716 | Letter | RIGHT | DALATH_RISH | 0 | ܖ Dotless Dalath Rish |
U+0717 | Letter | RIGHT | HE | 0 | ܗ He |
U+0718 | Letter | RIGHT | SYRIAC_WAW | 0 | ܘ Waw |
U+0719 | Letter | RIGHT | ZAIN | 0 | ܙ Zain |
U+071A | Letter | DUAL | HETH | 0 | ܚ Heth |
U+071B | Letter | DUAL | TETH | 0 | ܛ Teth |
U+071C | Letter | DUAL | TETH | 0 | ܜ Teth Garshuni |
U+071D | Letter | DUAL | YUDH | 0 | ܝ Yudh |
U+071E | Letter | RIGHT | YUDH_HE | 0 | ܞ Yudh He |
U+071F | Letter | DUAL | KAPH | 0 | ܟ Kaph |
U+0720 | Letter | DUAL | LAMADH | 0 | ܠ Lamadh |
U+0721 | Letter | DUAL | MIM | 0 | ܡ Mim |
U+0722 | Letter | DUAL | NUN | 0 | ܢ Nun |
U+0723 | Letter | DUAL | SEMKATH | 0 | ܣ Semkath |
U+0724 | Letter | DUAL | FINAL_SEMKATH | 0 | ܤ Final Semkath |
U+0725 | Letter | DUAL | E | 0 | ܥ E |
U+0726 | Letter | DUAL | PE | 0 | ܦ Pe |
U+0727 | Letter | DUAL | REVERSED_PE | 0 | ܧ Reversed Pe |
U+0728 | Letter | RIGHT | SADHE | 0 | ܨ Sadhe |
U+0729 | Letter | DUAL | QAPH | 0 | ܩ Qaph |
U+072A | Letter | RIGHT | DALATH_RISH | 0 | ܪ Rish |
U+072B | Letter | DUAL | SHIN | 0 | ܫ Shin |
U+072C | Letter | RIGHT | TAW | 0 | ܬ Taw |
U+072D | Letter | DUAL | BETH | 0 | ܭ Persian Bheth |
U+072E | Letter | DUAL | GAMAL | 0 | ܮ Persian Ghamal |
U+072F | Letter | RIGHT | DALATH_RISH | 0 | ܯ Persian Dhalath |
U+0730 | Mark [Mn] | TRANSPARENT | null | 230 | ܰ Pthaha Above |
U+0731 | Mark [Mn] | TRANSPARENT | null | 220 | ܱ Pthaha Below |
U+0732 | Mark [Mn] | TRANSPARENT | null | 230 | ܲ Pthaha Dotted |
U+0733 | Mark [Mn] | TRANSPARENT | null | 230 | ܳ Zqapha Above |
U+0734 | Mark [Mn] | TRANSPARENT | null | 220 | ܴ Zqapha Below |
U+0735 | Mark [Mn] | TRANSPARENT | null | 230 | ܵ Zqapha Dotted |
U+0736 | Mark [Mn] | TRANSPARENT | null | 230 | ܶ Rbasa Above |
U+0737 | Mark [Mn] | TRANSPARENT | null | 220 | ܷ Rbasa Below |
U+0738 | Mark [Mn] | TRANSPARENT | null | 220 | ܸ Dotted Zlama Horizontal |
U+0739 | Mark [Mn] | TRANSPARENT | null | 220 | ܹ Dotted Zlama Angular |
U+073A | Mark [Mn] | TRANSPARENT | null | 230 | ܺ Hbasa Above |
U+073B | Mark [Mn] | TRANSPARENT | null | 220 | ܻ Hbasa Below |
U+073C | Mark [Mn] | TRANSPARENT | null | 220 | ܼ Hbasa-Esasa Dotted |
U+073D | Mark [Mn] | TRANSPARENT | null | 230 | ܽ Esasa Above |
U+073E | Mark [Mn] | TRANSPARENT | null | 220 | ܾ Esasa Below |
U+073F | Mark [Mn] | TRANSPARENT | null | 230 | ܿ Rwaha |
U+0740 | Mark [Mn] | TRANSPARENT | null | 230 | ݀ Feminine Dot |
U+0741 | Mark [Mn] | TRANSPARENT | null | 230 | ݁ Qushshaya |
U+0742 | Mark [Mn] | TRANSPARENT | null | 220 | ݂ Rukkakha |
U+0743 | Mark [Mn] | TRANSPARENT | null | 230 | ݃ Two Vertical Dots Above |
U+0744 | Mark [Mn] | TRANSPARENT | null | 220 | ݄ Two Vertical Dots Below |
U+0745 | Mark [Mn] | TRANSPARENT | null | 230 | ݅ Three Dots Above |
U+0746 | Mark [Mn] | TRANSPARENT | null | 220 | ݆ Three Dots Below |
U+0747 | Mark [Mn] | TRANSPARENT | null | 230 | ݇ Oblique Line Above |
U+0748 | Mark [Mn] | TRANSPARENT | null | 220 | ݈ Oblique Line Below |
U+0749 | Mark [Mn] | TRANSPARENT | null | 230 | ݉ Music |
U+074A | Mark [Mn] | TRANSPARENT | null | 230 | ݊ Barrekh |
U+074B | unassigned | | | | |
U+074C | unassigned | | | | |
U+074D | Letter | RIGHT | ZHAIN | 0 | ݍ Sogdian Zhain |
U+074E | Letter | DUAL | KHAPH | 0 | ݎ Sogdian Khaph |
U+074F | Letter | DUAL | FE | 0 | ݏ Sogdian Fe |
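The unassigned rows in the table above can be cross-checked against the Unicode Character Database bundled with Python. The sketch below is only a convenience check, assuming the interpreter's Unicode data is reasonably current; the helper name `unassigned_in_range` is hypothetical.

```python
import unicodedata


def unassigned_in_range(first: int, last: int) -> list[int]:
    """Return the codepoints in [first, last] whose general category is Cn."""
    return [
        cp
        for cp in range(first, last + 1)
        if unicodedata.category(chr(cp)) == "Cn"  # Cn = unassigned
    ]


# The Syriac block spans U+0700 through U+074F.
syriac_gaps = [f"U+{cp:04X}" for cp in unassigned_in_range(0x0700, 0x074F)]
```

The result depends on the Unicode version compiled into the interpreter, so a mismatch with the table may simply indicate older or newer character data.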
The Syriac Supplement block includes letters needed to write Suriyani Malayalam, also known as Garshuni or Syriac Malayalam.
Codepoint | Unicode category | Joining type | Joining group | Mark class | Glyph |
---|---|---|---|---|---|
U+0860 | Letter | DUAL | MALAYALAM_NGA | 0 | ࡠ Malayalam Nga |
U+0861 | Letter | NON_JOINING | MALAYALAM_JA | 0 | ࡡ Malayalam Ja |
U+0862 | Letter | DUAL | MALAYALAM_NYA | 0 | ࡢ Malayalam Nya |
U+0863 | Letter | DUAL | MALAYALAM_TTA | 0 | ࡣ Malayalam Tta |
U+0864 | Letter | DUAL | MALAYALAM_NNA | 0 | ࡤ Malayalam Nna |
U+0865 | Letter | DUAL | MALAYALAM_NNNA | 0 | ࡥ Malayalam Nnna |
U+0866 | Letter | NON_JOINING | MALAYALAM_BHA | 0 | ࡦ Malayalam Bha |
U+0867 | Letter | RIGHT | MALAYALAM_RA | 0 | ࡧ Malayalam Ra |
U+0868 | Letter | DUAL | MALAYALAM_LLA | 0 | ࡨ Malayalam Lla |
U+0869 | Letter | RIGHT | MALAYALAM_LLLA | 0 | ࡩ Malayalam Llla |
U+086A | Letter | RIGHT | MALAYALAM_SSA | 0 | ࡪ Malayalam Ssa |
U+086B | unassigned | | | | |
U+086C | unassigned | | | | |
U+086D | unassigned | | | | |
U+086E | unassigned | | | | |
U+086F | unassigned | | | | |
Other important characters that may be encountered when shaping runs of Syriac text include the dotted-circle placeholder (U+25CC), the combining grapheme joiner (U+034F), the zero-width joiner (U+200D) and zero-width non-joiner (U+200C), the left-to-right text marker (U+200E) and right-to-left text marker (U+200F), and the no-break space (U+00A0).
The dotted-circle placeholder is frequently used when displaying a combining mark in isolation. Real-world text may also use other characters, such as hyphens or dashes, as placeholders in the same way; shaping engines should cope with this situation gracefully.
In addition, Syriac text runs may include the "Tatweel" or kashida codepoint (U+0640) from the Arabic block, because the Syriac block does not encode a separate kashida character.
Codepoint | Unicode category | Joining type | Joining group | Mark class | Glyph |
---|---|---|---|---|---|
U+00A0 | Separator | NON_JOINING | null | 0 | No-break space |
U+034F | Other | NON_JOINING | null | 0 | ͏ Combining grapheme joiner |
U+0640 | Letter modifier | JOIN_CAUSING | null | 0 | ـ Arabic Tatweel |
U+200C | Other | NON_JOINING | null | 0 | Zero-width non-joiner |
U+200D | Other | JOIN_CAUSING | null | 0 | Zero-width joiner |
U+200E | Other | NON_JOINING | null | 0 | Left-to-Right marker |
U+200F | Other | NON_JOINING | null | 0 | Right-to-Left marker |
U+2010 | Punctuation | NON_JOINING | null | 0 | ‐ Hyphen |
U+2011 | Punctuation | NON_JOINING | null | 0 | ‑ No-break hyphen |
U+2012 | Punctuation | NON_JOINING | null | 0 | ‒ Figure dash |
U+2013 | Punctuation | NON_JOINING | null | 0 | – En dash |
U+2014 | Punctuation | NON_JOINING | null | 0 | — Em dash |
U+25CC | Symbol | NON_JOINING | null | 0 | ◌ Dotted circle |
The combining grapheme joiner (CGJ) is primarily used to alter the order in which adjacent marks are positioned during the mark-reordering stage, in order to adhere to the needs of a non-default language orthography.
The zero-width joiner (ZWJ) is primarily used to force the usage of the cursive connecting form of a letter even when the context of the adjoining letters would not trigger the connecting form.
For example, to show the initial form of a letter in isolation (such as for displaying it in a table of forms), the sequence "Letter,ZWJ" would be used. To show the medial form of a letter in isolation, the sequence "ZWJ,Letter,ZWJ" would be used.
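The sequences described above can be built with a small helper. This is a sketch only: the function name `form_sequence` and its `form` argument are hypothetical, and the "final" case ("ZWJ,Letter") is an assumption made by symmetry with the two sequences given in the text.

```python
ZWJ = "\u200D"  # zero-width joiner


def form_sequence(letter: str, form: str) -> str:
    """Build a sequence that requests one connecting form of a single letter."""
    if form == "initial":
        return letter + ZWJ        # "Letter,ZWJ"
    if form == "medial":
        return ZWJ + letter + ZWJ  # "ZWJ,Letter,ZWJ"
    if form == "final":
        return ZWJ + letter        # assumed by symmetry, not stated in the text
    raise ValueError(f"unknown form: {form!r}")


BETH = "\u0712"
forms = {name: form_sequence(BETH, name) for name in ("initial", "medial", "final")}
```

Whether a given form is actually displayed still depends on the letter's joining type and on the font. A RIGHT-joining letter such as Alaph has no initial or medial form to request.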
The right-to-left mark (RLM) and left-to-right mark (LRM) are used by the Unicode bidirectionality algorithm (BiDi) to indicate the points in a text run at which the writing direction changes.
The no-break space is primarily used to display those codepoints that are defined as non-spacing (such as vowel or diacritical marks and "Hamza") in an isolated context, as an alternative to displaying them superimposed on the dotted-circle placeholder.
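A minimal sketch of preparing a mark for isolated display on either base follows. The helper name `isolated_mark` is hypothetical, and the check that the argument is a combining mark relies on Python's standard `unicodedata.combining()`.

```python
import unicodedata

NBSP = "\u00A0"           # no-break space
DOTTED_CIRCLE = "\u25CC"  # dotted-circle placeholder


def isolated_mark(mark: str, base: str = NBSP) -> str:
    """Attach a single combining mark to a placeholder base for display."""
    if len(mark) != 1 or unicodedata.combining(mark) == 0:
        raise ValueError("expected a single combining mark")
    return base + mark


PTHAHA_ABOVE = "\u0730"
on_space = isolated_mark(PTHAHA_ABOVE)
on_circle = isolated_mark(PTHAHA_ABOVE, base=DOTTED_CIRCLE)
```

Defaulting to the no-break space shows the mark alone, while passing the dotted circle makes the attachment point visible. Which one suits a given display is a presentation choice.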