Syriac character tables

This document lists the per-character shaping information needed to shape Syriac text.

Syriac character table

Syriac codepoints should be classified as shown in the following table. Codepoints in the Syriac block with no assigned meaning are designated as unassigned in the Unicode category column.

The Joining type column indicates whether each codepoint is defined as joining with adjacent characters on the left side, right side, left and right sides ("DUAL"), or neither side ("NON_JOINING"). Codepoints designated TRANSPARENT in the Joining type column do not join with adjacent characters and, in addition, do not affect the joining behavior of surrounding characters. Non-spacing marks are of type TRANSPARENT. Codepoints designated JOIN_CAUSING force adjacent characters to join.
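As a rough illustration of how these joining types interact, the following Python sketch derives a positional form for each letter in a run. The `JOINING_TYPES` excerpt, the `positional_forms` function, and the form names are illustrative assumptions rather than the API of any particular shaping engine; a real implementation would load every row of the tables in this document.

```python
from enum import Enum


class JoiningType(Enum):
    RIGHT = "RIGHT"
    LEFT = "LEFT"
    DUAL = "DUAL"
    NON_JOINING = "NON_JOINING"
    TRANSPARENT = "TRANSPARENT"
    JOIN_CAUSING = "JOIN_CAUSING"


# Hypothetical excerpt of the character tables; a real shaper needs every row.
JOINING_TYPES = {
    0x0710: JoiningType.RIGHT,         # Alaph
    0x0712: JoiningType.DUAL,          # Beth
    0x0720: JoiningType.DUAL,          # Lamadh
    0x0730: JoiningType.TRANSPARENT,   # Pthaha Above
    0x0640: JoiningType.JOIN_CAUSING,  # Arabic Tatweel
    0x200D: JoiningType.JOIN_CAUSING,  # Zero-width joiner
}

# In logical order, RIGHT-joining letters connect to the preceding character
# and LEFT-joining letters connect to the following one.
JOINS_TO_PREVIOUS = {JoiningType.RIGHT, JoiningType.DUAL}
JOINS_TO_NEXT = {JoiningType.LEFT, JoiningType.DUAL}

FORM_NAMES = {
    (False, False): "isolated",
    (False, True): "initial",
    (True, True): "medial",
    (True, False): "final",
}


def joining_type(ch):
    """Look up a character; anything outside the excerpt counts as NON_JOINING."""
    return JOINING_TYPES.get(ord(ch), JoiningType.NON_JOINING)


def _neighbor(types, index, step):
    """Return the nearest non-TRANSPARENT type in the given direction, or None."""
    i = index + step
    while 0 <= i < len(types):
        if types[i] is not JoiningType.TRANSPARENT:
            return types[i]
        i += step
    return None


def positional_forms(text):
    """Return one form name per character (None for characters that take no form)."""
    types = [joining_type(ch) for ch in text]
    forms = []
    for i, t in enumerate(types):
        if t not in JOINS_TO_PREVIOUS and t not in JOINS_TO_NEXT:
            forms.append(None)
            continue
        prev, nxt = _neighbor(types, i, -1), _neighbor(types, i, +1)
        to_prev = t in JOINS_TO_PREVIOUS and (
            prev in JOINS_TO_NEXT or prev is JoiningType.JOIN_CAUSING)
        to_next = t in JOINS_TO_NEXT and (
            nxt in JOINS_TO_PREVIOUS or nxt is JoiningType.JOIN_CAUSING)
        forms.append(FORM_NAMES[(to_prev, to_next)])
    return forms
```

In this sketch, the Pthaha mark in the sequence Beth, Pthaha Above, Alaph is skipped when looking for neighbors, so Beth still takes its initial form and Alaph its final form.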

The Joining group column lists the fundamental letter that the listed codepoint behaves like for joining purposes.

Assigned codepoints with a null in the Joining group column evoke no special behavior from the shaping engine during the join-computation stage.
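One way to picture the Joining group column is as a lookup from codepoint to the fundamental letter it imitates. The sketch below uses a small, hypothetical excerpt of the table (`JOINING_GROUPS`) and two illustrative helpers; the names are assumptions made for this example, and `None` stands in for a null entry.

```python
# Hypothetical excerpt of the Joining group column; None stands for "null".
JOINING_GROUPS = {
    0x0712: "BETH",         # Beth
    0x072D: "BETH",         # Persian Bheth
    0x0713: "GAMAL",        # Gamal
    0x0714: "GAMAL",        # Gamal Garshuni
    0x072E: "GAMAL",        # Persian Ghamal
    0x0715: "DALATH_RISH",  # Dalath
    0x0716: "DALATH_RISH",  # Dotless Dalath Rish
    0x072A: "DALATH_RISH",  # Rish
    0x072F: "DALATH_RISH",  # Persian Dhalath
    0x0730: None,           # Pthaha Above (a mark, so no joining group)
}


def joining_group(ch):
    """Return the joining group for a character, or None if it has none."""
    return JOINING_GROUPS.get(ord(ch))


def behaves_alike(a, b):
    """True when both characters share a non-null joining group."""
    group_a, group_b = joining_group(a), joining_group(b)
    return group_a is not None and group_a == group_b


def members(group):
    """List the codepoints in the excerpt that belong to the given group."""
    return sorted(cp for cp, g in JOINING_GROUPS.items() if g == group)
```

With this excerpt, Dalath and Rish compare as alike because both map to `DALATH_RISH`, while a mark with a null group never matches anything.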

The Mark class column indicates the Canonical Combining Class for the codepoint. Marks are assigned non-zero combining classes so that sequences of adjacent marks can be reordered as required by the orthography.

For Syriac, a subset of marks in the 220 and 230 classes are also designated Modifier Combining Marks (MCM). These are denoted with 220_MCM and 230_MCM in the Mark class column. The MCM marks are treated differently during the mark-reordering stage.
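To make the role of the combining class concrete, the following Python sketch reads each class with the standard library's `unicodedata.combining` and sorts every run of adjacent marks by that value. The `reorder_marks` function is a hypothetical helper written for this example; it shows only the basic class-based ordering and does not attempt the special handling of MCM marks mentioned above.

```python
import unicodedata


def mark_class(ch):
    """Return the Canonical Combining Class of a character (0 for non-marks)."""
    return unicodedata.combining(ch)


def reorder_marks(text):
    """Sort each run of adjacent marks by combining class.

    Python's sort is stable, so marks that share a class keep their
    original relative order.
    """
    out, run = [], []
    for ch in text:
        if mark_class(ch) != 0:
            run.append(ch)
            continue
        out.extend(sorted(run, key=mark_class))
        run = []
        out.append(ch)
    out.extend(sorted(run, key=mark_class))
    return "".join(out)


# Beth followed by Pthaha Above (class 230) and Pthaha Below (class 220).
sample = "\u0712\u0730\u0731"
reordered = reorder_marks(sample)
```

Here the below-base mark moves ahead of the above-base mark because 220 sorts before 230, while two marks of the same class would stay in their typed order.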

Codepoint Unicode category Joining type Joining group Mark class Glyph
U+0700 Punctuation NON_JOINING null 0 ܀ End of Paragraph
U+0701 Punctuation NON_JOINING null 0 ܁ Supralinear Full Stop
U+0702 Punctuation NON_JOINING null 0 ܂ Sublinear Full Stop
U+0703 Punctuation NON_JOINING null 0 ܃ Supralinear Colon
U+0704 Punctuation NON_JOINING null 0 ܄ Sublinear Colon
U+0705 Punctuation NON_JOINING null 0 ܅ Horizontal Colon
U+0706 Punctuation NON_JOINING null 0 ܆ Colon Skewed Left
U+0707 Punctuation NON_JOINING null 0 ܇ Colon Skewed Right
U+0708 Punctuation NON_JOINING null 0 ܈ Supralinear Colon Skewed Left
U+0709 Punctuation NON_JOINING null 0 ܉ Sublinear Colon Skewed Right
U+070A Punctuation NON_JOINING null 0 ܊ Contraction
U+070B Punctuation NON_JOINING null 0 ܋ Harklean Obelus
U+070C Punctuation NON_JOINING null 0 ܌ Harklean Metobelus
U+070D Punctuation NON_JOINING null 0 ܍ Harklean Asteriscus
U+070E unassigned
U+070F Other TRANSPARENT null 0 ܏ Syriac Abbreviation Mark
U+0710 Letter RIGHT ALAPH 0 ܐ Alaph
U+0711 Mark [Mn] TRANSPARENT null 36 ܑ Superscript Alaph
U+0712 Letter DUAL BETH 0 ܒ Beth
U+0713 Letter DUAL GAMAL 0 ܓ Gamal
U+0714 Letter DUAL GAMAL 0 ܔ Gamal Garshuni
U+0715 Letter RIGHT DALATH_RISH 0 ܕ Dalath
U+0716 Letter RIGHT DALATH_RISH 0 ܖ Dotless Dalath Rish
U+0717 Letter RIGHT HE 0 ܗ He
U+0718 Letter RIGHT SYRIAC_WAW 0 ܘ Waw
U+0719 Letter RIGHT ZAIN 0 ܙ Zain
U+071A Letter DUAL HETH 0 ܚ Heth
U+071B Letter DUAL TETH 0 ܛ Teth
U+071C Letter DUAL TETH 0 ܜ Teth Garshuni
U+071D Letter DUAL YUDH 0 ܝ Yudh
U+071E Letter RIGHT YUDH_HE 0 ܞ Yudh He
U+071F Letter DUAL KAPH 0 ܟ Kaph
U+0720 Letter DUAL LAMADH 0 ܠ Lamadh
U+0721 Letter DUAL MIM 0 ܡ Mim
U+0722 Letter DUAL NUN 0 ܢ Nun
U+0723 Letter DUAL SEMKATH 0 ܣ Semkath
U+0724 Letter DUAL SEMKATH 0 ܤ Final Semkath
U+0725 Letter DUAL E 0 ܥ E
U+0726 Letter DUAL PE 0 ܦ Pe
U+0727 Letter DUAL REVERSED_PE 0 ܧ Reversed Pe
U+0728 Letter RIGHT SADHE 0 ܨ Sadhe
U+0729 Letter DUAL QAPH 0 ܩ Qaph
U+072A Letter RIGHT DALATH_RISH 0 ܪ Rish
U+072B Letter DUAL SHIN 0 ܫ Shin
U+072C Letter RIGHT TAW 0 ܬ Taw
U+072D Letter DUAL BETH 0 ܭ Persian Bheth
U+072E Letter DUAL GAMAL 0 ܮ Persian Ghamal
U+072F Letter RIGHT DALATH_RISH 0 ܯ Persian Dhalath
U+0730 Mark [Mn] TRANSPARENT null 230 ܰ Pthaha Above
U+0731 Mark [Mn] TRANSPARENT null 220 ܱ Pthaha Below
U+0732 Mark [Mn] TRANSPARENT null 230 ܲ Pthaha Dotted
U+0733 Mark [Mn] TRANSPARENT null 230 ܳ Zqapha Above
U+0734 Mark [Mn] TRANSPARENT null 220 ܴ Zqapha Below
U+0735 Mark [Mn] TRANSPARENT null 230 ܵ Zqapha Dotted
U+0736 Mark [Mn] TRANSPARENT null 230 ܶ Rbasa Above
U+0737 Mark [Mn] TRANSPARENT null 220 ܷ Rbasa Below
U+0738 Mark [Mn] TRANSPARENT null 220 ܸ Dotted Zlama Horizontal
U+0739 Mark [Mn] TRANSPARENT null 220 ܹ Dotted Zlama Angular
U+073A Mark [Mn] TRANSPARENT null 230 ܺ Hbasa Above
U+073B Mark [Mn] TRANSPARENT null 220 ܻ Hbasa Below
U+073C Mark [Mn] TRANSPARENT null 220 ܼ Hbasa-Esasa Dotted
U+073D Mark [Mn] TRANSPARENT null 230 ܽ Esasa Above
U+073E Mark [Mn] TRANSPARENT null 220 ܾ Esasa Below
U+073F Mark [Mn] TRANSPARENT null 230 ܿ Rwaha
U+0740 Mark [Mn] TRANSPARENT null 230 ݀ Feminine Dot
U+0741 Mark [Mn] TRANSPARENT null 230 ݁ Qushshaya
U+0742 Mark [Mn] TRANSPARENT null 220 ݂ Rukkakha
U+0743 Mark [Mn] TRANSPARENT null 230 ݃ Two Vertical Dots Above
U+0744 Mark [Mn] TRANSPARENT null 220 ݄ Two Vertical Dots Below
U+0745 Mark [Mn] TRANSPARENT null 230 ݅ Three Dots Above
U+0746 Mark [Mn] TRANSPARENT null 220 ݆ Three Dots Below
U+0747 Mark [Mn] TRANSPARENT null 230 ݇ Oblique Line Above
U+0748 Mark [Mn] TRANSPARENT null 220 ݈ Oblique Line Below
U+0749 Mark [Mn] TRANSPARENT null 230 ݉ Music
U+074A Mark [Mn] TRANSPARENT null 230 ݊ Barrekh
U+074B unassigned
U+074C unassigned
U+074D Letter RIGHT ZHAIN 0 ݍ Sogdian Zhain
U+074E Letter DUAL KHAPH 0 ݎ Sogdian Khaph
U+074F Letter DUAL FE 0 ݏ Sogdian Fe

Syriac Supplement character table

The Syriac Supplement block includes letters needed to write Suriyani Malayalam, also known as Garshuni or Syriac Malayalam.

Codepoint Unicode category Joining type Joining group Mark class Glyph
U+0860 Letter DUAL MALAYALAM_NGA 0 ࡠ Malayalam Nga
U+0861 Letter NON_JOINING MALAYALAM_JA 0 ࡡ Malayalam Ja
U+0862 Letter DUAL MALAYALAM_NYA 0 ࡢ Malayalam Nya
U+0863 Letter DUAL MALAYALAM_TTA 0 ࡣ Malayalam Tta
U+0864 Letter DUAL MALAYALAM_NNA 0 ࡤ Malayalam Nna
U+0865 Letter DUAL MALAYALAM_NNNA 0 ࡥ Malayalam Nnna
U+0866 Letter NON_JOINING MALAYALAM_BHA 0 ࡦ Malayalam Bha
U+0867 Letter RIGHT MALAYALAM_RA 0 ࡧ Malayalam Ra
U+0868 Letter DUAL MALAYALAM_LLA 0 ࡨ Malayalam Lla
U+0869 Letter RIGHT MALAYALAM_LLLA 0 ࡩ Malayalam Llla
U+086A Letter RIGHT MALAYALAM_SSA 0 ࡪ Malayalam Ssa
U+086B unassigned
U+086C unassigned
U+086D unassigned
U+086E unassigned
U+086F unassigned

Miscellaneous character table

Other important characters that may be encountered when shaping runs of Syriac text include the dotted-circle placeholder (U+25CC), the combining grapheme joiner (U+034F), the zero-width joiner (U+200D) and zero-width non-joiner (U+200C), the left-to-right mark (U+200E) and right-to-left mark (U+200F), and the no-break space (U+00A0).

The dotted-circle placeholder is frequently used when displaying a combining mark in isolation. Real-world text may also use other characters, such as hyphens or dashes, as placeholders in the same way; shaping engines should cope with this situation gracefully.

In addition, Syriac text runs may include the "Tatweel" or kashida codepoint (U+0640) from the Arabic block, because the Syriac block does not encode a separate kashida character.

Codepoint Unicode category Joining type Joining group Mark class Glyph
U+00A0 Separator NON_JOINING null 0   No-break space
U+034F Other NON_JOINING null 0 ͏ Combining grapheme joiner
U+0640 Letter modifier JOIN_CAUSING null 0 ـ Arabic Tatweel
U+200C Other NON_JOINING null 0 ‌ Zero-width non-joiner
U+200D Other JOIN_CAUSING null 0 ‍ Zero-width joiner
U+200E Other NON_JOINING null 0 ‎ Left-to-Right marker
U+200F Other NON_JOINING null 0 ‏ Right-to-Left marker
U+2010 Punctuation NON_JOINING null 0 ‐ Hyphen
U+2011 Punctuation NON_JOINING null 0 ‑ No-break hyphen
U+2012 Punctuation NON_JOINING null 0 ‒ Figure dash
U+2013 Punctuation NON_JOINING null 0 – En dash
U+2014 Punctuation NON_JOINING null 0 — Em dash
U+25CC Symbol NON_JOINING null 0 ◌ Dotted circle

The combining grapheme joiner (CGJ) is primarily used to alter the order in which adjacent marks are positioned during the mark-reordering stage, in order to adhere to the needs of a non-default language orthography.
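The blocking effect of CGJ can be observed with Unicode canonical ordering, which Python exposes through `unicodedata.normalize`. This is only a stand-in: normalization is not the same thing as a shaping engine's mark-reordering stage, but both rely on the fact that CGJ has combining class 0 and therefore splits a run of marks in two. The variable and function names below are illustrative.

```python
import unicodedata

BETH = "\u0712"
PTHAHA_ABOVE = "\u0730"  # combining class 230
PTHAHA_BELOW = "\u0731"  # combining class 220
CGJ = "\u034F"           # combining class 0


def codepoints(text):
    """Render a string as a list of U+XXXX labels for easy inspection."""
    return [f"U+{ord(ch):04X}" for ch in text]


# Without CGJ, canonical ordering moves the class-220 mark before the class-230 mark.
plain = BETH + PTHAHA_ABOVE + PTHAHA_BELOW
reordered = unicodedata.normalize("NFD", plain)

# With CGJ between the marks, each mark sits in its own run and nothing moves.
guarded = BETH + PTHAHA_ABOVE + CGJ + PTHAHA_BELOW
kept = unicodedata.normalize("NFD", guarded)

print(codepoints(reordered))
print(codepoints(kept))
```

Inserting CGJ is therefore a way for an author to pin the typed order of two marks when the default ordering does not suit the orthography.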

The zero-width joiner (ZWJ) is primarily used to force the usage of the cursive connecting form of a letter even when the context of the adjoining letters would not trigger the connecting form.

For example, to show the initial form of a letter in isolation (such as for displaying it in a table of forms), the sequence "Letter,ZWJ" would be used. To show the medial form of a letter in isolation, the sequence "ZWJ,Letter,ZWJ" would be used.
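The sequences described above are simple to build in code. The following Python sketch assembles them for a single letter; the `isolated_display_forms` name is hypothetical, and the "final" entry ("ZWJ,Letter") is an extrapolation from the two sequences given in the text rather than something this document states.

```python
ZWJ = "\u200D"  # Zero-width joiner


def isolated_display_forms(letter):
    """Build ZWJ sequences that request a connecting form of a lone letter."""
    return {
        "initial": letter + ZWJ,       # "Letter,ZWJ"
        "medial": ZWJ + letter + ZWJ,  # "ZWJ,Letter,ZWJ"
        "final": ZWJ + letter,         # assumed by symmetry: "ZWJ,Letter"
    }


# Example: sequences for Beth (U+0712), a DUAL-joining letter.
beth_forms = isolated_display_forms("\u0712")
```

Whether a given sequence actually renders a connecting form still depends on the letter's joining type: a RIGHT-joining letter such as Alaph has no initial or medial form to show.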

The right-to-left mark (RLM) and left-to-right mark (LRM) are used by the Unicode bidirectionality algorithm (BiDi) to indicate the points in a text run at which the writing direction changes.
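The directional properties that the bidirectional algorithm works from can be inspected with Python's `unicodedata.bidirectional`. The sketch below simply reports the bidirectional class of the two marks alongside a Syriac letter and a Syriac vowel mark; the `SAMPLES` dictionary and `bidi_classes` helper are illustrative names, not part of any shaping API.

```python
import unicodedata

SAMPLES = {
    "LRM": "\u200E",           # Left-to-right mark
    "RLM": "\u200F",           # Right-to-left mark
    "Alaph": "\u0710",         # Syriac letter
    "Pthaha Above": "\u0730",  # Syriac non-spacing mark
}


def bidi_classes(samples):
    """Map each sample name to its Unicode bidirectional class."""
    return {name: unicodedata.bidirectional(ch) for name, ch in samples.items()}


classes = bidi_classes(SAMPLES)
for name, bidi_class in classes.items():
    print(f"{name}: {bidi_class}")
```

LRM and RLM report the strong classes `L` and `R`, Syriac letters report `AL` (Arabic letter), and the vowel mark reports `NSM` (non-spacing mark), which takes its direction from the character it follows.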

The no-break space is primarily used to display those codepoints that are defined as non-spacing (such as vowel or diacritical marks and "Hamza") in an isolated context, as an alternative to displaying them superimposed on the dotted-circle placeholder.
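A small Python sketch of both display conventions follows: it places a non-spacing mark on either a no-break space or the dotted-circle placeholder. The `show_mark_in_isolation` helper is a hypothetical name used for illustration, and the check against the `Mn` general category is an assumption about what a caller would want to validate.

```python
import unicodedata

NO_BREAK_SPACE = "\u00A0"
DOTTED_CIRCLE = "\u25CC"


def show_mark_in_isolation(mark, base=NO_BREAK_SPACE):
    """Return a base character followed by a non-spacing mark for standalone display."""
    if unicodedata.category(mark) != "Mn":
        raise ValueError(f"U+{ord(mark):04X} is not a non-spacing mark")
    return base + mark


# Pthaha Above shown on an invisible base and on the dotted circle.
on_space = show_mark_in_isolation("\u0730")
on_circle = show_mark_in_isolation("\u0730", base=DOTTED_CIRCLE)
```

The no-break space variant shows the mark with no visible base, while the dotted-circle variant makes the attachment point explicit.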