
Thai character tables

This document lists the per-character shaping information needed to shape Thai text.


Thai character table

Thai characters should be classified as in the following table. Codepoints in the Thai block with no assigned meaning are designated as unassigned in the Unicode category column.

Assigned codepoints with a null in the Shaping class column evoke no special behavior from the shaping engine. Note that this includes some valid codepoints, such as punctuation (U+0E4F, U+0E5A, U+0E5B) and the repetition mark Maiyamok (U+0E46).

Note: the NUMBER and SYMBOL Shaping classes are important during syllable identification, but generally evoke no further special behavior during the rest of the shaping process.

The Mark-placement subclass column indicates mark-placement positioning for codepoints in the Mark category. Assigned, non-mark codepoints have a null in this column and evoke no special mark-placement behavior. Marks tagged with [Mn] in the Unicode category column are categorized as non-spacing; marks tagged with [Mc] are categorized as spacing-combining.
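The [Mn] and [Mc] tags are Unicode General Category values, so they can be read directly from Python's standard `unicodedata` module. The sketch below uses a helper, `mark_spacing`, whose name is illustrative only (it is not part of any shaping API), and confirms that every mark in the Thai block is non-spacing:

```python
import unicodedata
from typing import Optional


def mark_spacing(ch: str) -> Optional[str]:
    """Map a character's Unicode General Category to the [Mn]/[Mc] distinction."""
    gc = unicodedata.category(ch)
    if gc == "Mn":
        return "non-spacing"
    if gc == "Mc":
        return "spacing-combining"
    return None  # not a mark, so no mark-placement behavior


# Collect every assigned mark in the Thai block (U+0E00..U+0E7F).
thai_marks = [chr(cp) for cp in range(0x0E00, 0x0E80)
              if unicodedata.category(chr(cp)).startswith("M")]

print(len(thai_marks))                                # 16
print(sorted({mark_spacing(m) for m in thai_marks}))  # ['non-spacing']
```

Thai has no spacing-combining marks, so every Mark-category row in the table below carries the [Mn] tag.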

Some codepoints in the following table use a Shaping class that differs from the codepoint's Unicode General Category. The Shaping class takes precedence during OpenType shaping, as it captures more specific, script-aware behavior.
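A minimal sketch of that precedence, assuming a lookup table keyed by codepoint. `SHAPING_CLASS` holds only a hypothetical excerpt of the rows below, and `classify` is an illustrative helper name. Sara A (a Letter) and Mai Han-akat (a Mark) have different General Categories but the same Shaping class, and the shaper acts on the latter:

```python
import unicodedata
from typing import Optional, Tuple

# Hypothetical excerpt of the Thai character table (codepoint -> Shaping class).
# Codepoints absent from the dict have a null Shaping class.
SHAPING_CLASS = {
    0x0E01: "CONSONANT",        # Ko Kai
    0x0E30: "VOWEL_DEPENDENT",  # Sara A; General Category is Letter (Lo)
    0x0E31: "VOWEL_DEPENDENT",  # Mai Han-akat; General Category is Mark (Mn)
    0x0E3A: "PURE_KILLER",      # Phinthu
    0x0E48: "TONE_MARKER",      # Mai Ek
    0x0E50: "NUMBER",           # Digit zero
}


def classify(ch: str) -> Tuple[str, Optional[str]]:
    """Return (General Category, Shaping class) for one character.

    The shaper acts on the second element; the first is shown only for
    comparison. None means the character evokes no special behavior.
    """
    return unicodedata.category(ch), SHAPING_CLASS.get(ord(ch))


print(classify("\u0e30"))  # ('Lo', 'VOWEL_DEPENDENT')
print(classify("\u0e31"))  # ('Mn', 'VOWEL_DEPENDENT')
print(classify("\u0e4f"))  # ('Po', None)
```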

| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Combining class | PUA | Glyph |
|---|---|---|---|---|---|---|
| U+0E00 | unassigned | | | | | |
| U+0E01 | Letter | CONSONANT | null | 0 | NC | ก Ko Kai |
| U+0E02 | Letter | CONSONANT | null | 0 | NC | ข Kho Khai |
| U+0E03 | Letter | CONSONANT | null | 0 | NC | ฃ Kho Khuat |
| U+0E04 | Letter | CONSONANT | null | 0 | NC | ค Kho Khwai |
| U+0E05 | Letter | CONSONANT | null | 0 | NC | ฅ Kho Khon |
| U+0E06 | Letter | CONSONANT | null | 0 | NC | ฆ Kho Rakhang |
| U+0E07 | Letter | CONSONANT | null | 0 | NC | ง Ngo Ngu |
| U+0E08 | Letter | CONSONANT | null | 0 | NC | จ Cho Chan |
| U+0E09 | Letter | CONSONANT | null | 0 | NC | ฉ Cho Ching |
| U+0E0A | Letter | CONSONANT | null | 0 | NC | ช Cho Chang |
| U+0E0B | Letter | CONSONANT | null | 0 | NC | ซ So So |
| U+0E0C | Letter | CONSONANT | null | 0 | NC | ฌ Cho Choe |
| U+0E0D | Letter | CONSONANT | null | 0 | RC | ญ Yo Ying |
| U+0E0E | Letter | CONSONANT | null | 0 | DC | ฎ Do Chada |
| U+0E0F | Letter | CONSONANT | null | 0 | DC | ฏ To Patak |
| U+0E10 | Letter | CONSONANT | null | 0 | RC | ฐ Tho Than |
| U+0E11 | Letter | CONSONANT | null | 0 | NC | ฑ Tho Nangmontho |
| U+0E12 | Letter | CONSONANT | null | 0 | NC | ฒ Tho Phuthao |
| U+0E13 | Letter | CONSONANT | null | 0 | NC | ณ No Nen |
| U+0E14 | Letter | CONSONANT | null | 0 | NC | ด Do Dek |
| U+0E15 | Letter | CONSONANT | null | 0 | NC | ต To Tao |
| U+0E16 | Letter | CONSONANT | null | 0 | NC | ถ Tho Thung |
| U+0E17 | Letter | CONSONANT | null | 0 | NC | ท Tho Thahan |
| U+0E18 | Letter | CONSONANT | null | 0 | NC | ธ Tho Thong |
| U+0E19 | Letter | CONSONANT | null | 0 | NC | น No Nu |
| U+0E1A | Letter | CONSONANT | null | 0 | NC | บ Bo Baimai |
| U+0E1B | Letter | CONSONANT | null | 0 | AC | ป Po Pla |
| U+0E1C | Letter | CONSONANT | null | 0 | NC | ผ Pho Phung |
| U+0E1D | Letter | CONSONANT | null | 0 | AC | ฝ Fo Fa |
| U+0E1E | Letter | CONSONANT | null | 0 | NC | พ Pho Phan |
| U+0E1F | Letter | CONSONANT | null | 0 | AC | ฟ Fo Fan |
| U+0E20 | Letter | CONSONANT | null | 0 | NC | ภ Pho Samphao |
| U+0E21 | Letter | CONSONANT | null | 0 | NC | ม Mo Ma |
| U+0E22 | Letter | CONSONANT | null | 0 | NC | ย Yo Yak |
| U+0E23 | Letter | CONSONANT | null | 0 | NC | ร Ro Rua |
| U+0E24 | Letter | CONSONANT | null | 0 | NC | ฤ Ru |
| U+0E25 | Letter | CONSONANT | null | 0 | NC | ล Lo Ling |
| U+0E26 | Letter | CONSONANT | null | 0 | NC | ฦ Lu |
| U+0E27 | Letter | CONSONANT | null | 0 | NC | ว Wo Waen |
| U+0E28 | Letter | CONSONANT | null | 0 | NC | ศ So Sala |
| U+0E29 | Letter | CONSONANT | null | 0 | NC | ษ So Rusi |
| U+0E2A | Letter | CONSONANT | null | 0 | NC | ส So Sua |
| U+0E2B | Letter | CONSONANT | null | 0 | NC | ห Ho Hip |
| U+0E2C | Letter | CONSONANT | null | 0 | NC | ฬ Lo Chula |
| U+0E2D | Letter | CONSONANT | null | 0 | NC | อ O Ang |
| U+0E2E | Letter | CONSONANT | null | 0 | NC | ฮ Ho Nokhuk |
| U+0E2F | Letter | CONSONANT | null | 0 | null | ฯ Paiyannoi |
| U+0E30 | Letter | VOWEL_DEPENDENT | RIGHT_POSITION | 0 | CV | ะ Sara A |
| U+0E31 | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | 0 | AV | ั Mai Han-akat |
| U+0E32 | Letter | VOWEL_DEPENDENT | RIGHT_POSITION | 0 | CV | า Sara Aa |
| U+0E33 | Letter | VOWEL_DEPENDENT | RIGHT_POSITION | 0 | null | ำ Sara Am |
| U+0E34 | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | 0 | AV | ิ Sara I |
| U+0E35 | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | 0 | AV | ี Sara Ii |
| U+0E36 | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | 0 | AV | ึ Sara Ue |
| U+0E37 | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | 0 | AV | ื Sara Uee |
| U+0E38 | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | 103 | BV | ุ Sara U |
| U+0E39 | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | 103 | BV | ู Sara Uu |
| U+0E3A | Mark [Mn] | PURE_KILLER | BOTTOM_POSITION | 9 | BV | ฺ Phinthu |
| U+0E3B | unassigned | | | | | |
| U+0E3C | unassigned | | | | | |
| U+0E3D | unassigned | | | | | |
| U+0E3E | unassigned | | | | | |
| U+0E3F | Symbol | SYMBOL | null | 0 | null | ฿ Currency symbol Baht |
| U+0E40 | Letter | VOWEL_DEPENDENT | VISUAL_ORDER_LEFT | 0 | CV | เ Sara E |
| U+0E41 | Letter | VOWEL_DEPENDENT | VISUAL_ORDER_LEFT | 0 | CV | แ Sara Ae |
| U+0E42 | Letter | VOWEL_DEPENDENT | VISUAL_ORDER_LEFT | 0 | CV | โ Sara O |
| U+0E43 | Letter | VOWEL_DEPENDENT | VISUAL_ORDER_LEFT | 0 | CV | ใ Sara Ai Maimuan |
| U+0E44 | Letter | VOWEL_DEPENDENT | VISUAL_ORDER_LEFT | 0 | CV | ไ Sara Ai Maimalai |
| U+0E45 | Letter | VOWEL_DEPENDENT | RIGHT_POSITION | 0 | CV | ๅ Lakkhangyao |
| U+0E46 | Letter Modifier | null | null | 0 | null | ๆ Maiyamok |
| U+0E47 | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | 0 | AV | ็ Maitaikhu |
| U+0E48 | Mark [Mn] | TONE_MARKER | TOP_POSITION | 107 | TV | ่ Mai Ek |
| U+0E49 | Mark [Mn] | TONE_MARKER | TOP_POSITION | 107 | TV | ้ Mai Tho |
| U+0E4A | Mark [Mn] | TONE_MARKER | TOP_POSITION | 107 | TV | ๊ Mai Tri |
| U+0E4B | Mark [Mn] | TONE_MARKER | TOP_POSITION | 107 | TV | ๋ Mai Chattawa |
| U+0E4C | Mark [Mn] | CONSONANT_KILLER | TOP_POSITION | 0 | TV | ์ Thanthakhat |
| U+0E4D | Mark [Mn] | BINDU | TOP_POSITION | 0 | AV | ํ Nikhahit |
| U+0E4E | Mark [Mn] | PURE_KILLER | TOP_POSITION | 0 | AV | ๎ Yamakkan |
| U+0E4F | Punctuation | null | null | 0 | null | ๏ Fongman |
| U+0E50 | Number | NUMBER | null | 0 | null | ๐ Digit zero |
| U+0E51 | Number | NUMBER | null | 0 | null | ๑ Digit one |
| U+0E52 | Number | NUMBER | null | 0 | null | ๒ Digit two |
| U+0E53 | Number | NUMBER | null | 0 | null | ๓ Digit three |
| U+0E54 | Number | NUMBER | null | 0 | null | ๔ Digit four |
| U+0E55 | Number | NUMBER | null | 0 | null | ๕ Digit five |
| U+0E56 | Number | NUMBER | null | 0 | null | ๖ Digit six |
| U+0E57 | Number | NUMBER | null | 0 | null | ๗ Digit seven |
| U+0E58 | Number | NUMBER | null | 0 | null | ๘ Digit eight |
| U+0E59 | Number | NUMBER | null | 0 | null | ๙ Digit nine |
| U+0E5A | Punctuation | null | null | 0 | null | ๚ Angkhankhu |
| U+0E5B | Punctuation | null | null | 0 | null | ๛ Khomut |
| U+0E5C | unassigned | | | | | |
| U+0E5D | unassigned | | | | | |
| U+0E5E | unassigned | | | | | |
| U+0E5F | unassigned | | | | | |
| U+0E60 | unassigned | | | | | |
| U+0E61 | unassigned | | | | | |
| U+0E62 | unassigned | | | | | |
| U+0E63 | unassigned | | | | | |
| U+0E64 | unassigned | | | | | |
| U+0E65 | unassigned | | | | | |
| U+0E66 | unassigned | | | | | |
| U+0E67 | unassigned | | | | | |
| U+0E68 | unassigned | | | | | |
| U+0E69 | unassigned | | | | | |
| U+0E6A | unassigned | | | | | |
| U+0E6B | unassigned | | | | | |
| U+0E6C | unassigned | | | | | |
| U+0E6D | unassigned | | | | | |
| U+0E6E | unassigned | | | | | |
| U+0E6F | unassigned | | | | | |
| U+0E70 | unassigned | | | | | |
| U+0E71 | unassigned | | | | | |
| U+0E72 | unassigned | | | | | |
| U+0E73 | unassigned | | | | | |
| U+0E74 | unassigned | | | | | |
| U+0E75 | unassigned | | | | | |
| U+0E76 | unassigned | | | | | |
| U+0E77 | unassigned | | | | | |
| U+0E78 | unassigned | | | | | |
| U+0E79 | unassigned | | | | | |
| U+0E7A | unassigned | | | | | |
| U+0E7B | unassigned | | | | | |
| U+0E7C | unassigned | | | | | |
| U+0E7D | unassigned | | | | | |
| U+0E7E | unassigned | | | | | |
| U+0E7F | unassigned | | | | | |
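The Combining class column is the Unicode Canonical Combining Class, so its non-zero entries can be cross-checked against Python's standard `unicodedata` module. The sketch below (the names `TABLE_CCC` and `ccc_mismatches` are illustrative) performs that check and then shows one practical effect of the values: canonical ordering sorts adjacent marks by combining class.

```python
import unicodedata

# Non-zero Combining class values from the Thai character table;
# every other codepoint in U+0E00..U+0E7F is listed with 0 (or is unassigned).
TABLE_CCC = {
    0x0E38: 103, 0x0E39: 103,                            # Sara U, Sara Uu
    0x0E3A: 9,                                           # Phinthu
    0x0E48: 107, 0x0E49: 107, 0x0E4A: 107, 0x0E4B: 107,  # tone marks
}


def ccc_mismatches():
    """List codepoints whose table value disagrees with Python's Unicode data."""
    return [hex(cp) for cp in range(0x0E00, 0x0E80)
            if unicodedata.combining(chr(cp)) != TABLE_CCC.get(cp, 0)]


print(ccc_mismatches())  # []

# A tone mark (107) stored before Sara U (103) is moved after it by normalization.
text = "\u0e01\u0e48\u0e38"  # Ko Kai + Mai Ek + Sara U
print([hex(ord(c)) for c in unicodedata.normalize("NFC", text)])
# ['0xe01', '0xe38', '0xe48']
```

Because of this reordering, input that has passed through normalization will present below-base vowels before tone marks, whatever order they were typed in.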

Miscellaneous character table

In addition to general punctuation, runs of Thai text often use the combining macron below (U+0331), the combining tilde (U+0303), the modifier letter apostrophe (U+02BC), and the modifier letter minus sign (U+02D7), particularly when used to write minority languages. The first two come from the Combining Diacritical Marks block; the last two come from the Spacing Modifier Letters block.

In addition, Thai text typically does not use spaces between words. Consequently, the zero-width space (U+200B) is often inserted as an invisible marker of a point where a line break is permitted.
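As a sketch of how a line-layout step (rather than the shaper itself) might locate these invisible break points, assume that ZWSP is the only break signal. A real line breaker following UAX #14 applies many more rules, and Thai usually also needs dictionary-based word segmentation. The function names here are hypothetical:

```python
from typing import List

ZWSP = "\u200b"


def break_opportunities(text: str) -> List[int]:
    """Offsets just after each ZWSP, where a line break is permitted."""
    return [i + 1 for i, ch in enumerate(text) if ch == ZWSP]


def segments(text: str) -> List[str]:
    """Pieces between ZWSPs; the invisible ZWSPs themselves are dropped."""
    return [piece for piece in text.split(ZWSP) if piece]


sample = "ภาษา\u200bไทย"  # "phasa" + ZWSP + "thai" ("Thai language")
print(break_opportunities(sample))  # [5]
print(segments(sample))             # ['ภาษา', 'ไทย']
```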

| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|---|---|---|---|---|
| U+02BC | Letter | TONE_MARKER | TOP_POSITION | ʼ Modifier apostrophe |
| U+02D7 | Symbol | TONE_MARKER | BOTTOM_POSITION | ˗ Modifier minus sign |
| U+0303 | Mark [Mn] | TONE_MARKER | TOP_POSITION | ̃ Combining tilde |
| U+0331 | Mark [Mn] | TONE_MARKER | BOTTOM_POSITION | ̱ Combining macron below |
| U+200B | Other | PLACEHOLDER | null | Zero-width space |

Other important characters that may be encountered when shaping runs of Thai text include the dotted-circle placeholder (U+25CC), the zero-width joiner (U+200D) and zero-width non-joiner (U+200C), and the no-break space (U+00A0).

The dotted-circle placeholder is frequently used when displaying a dependent vowel or a combining mark in isolation. Real-world text syllables may also use other characters, such as hyphens or dashes, in a similar placeholder fashion; shaping engines should cope with this situation gracefully.
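A minimal sketch of the isolated-mark case, assuming a display helper that only checks whether the first character is a Unicode mark. The names `for_display` and `is_placeholder_base` are hypothetical, and the placeholder set simply copies the PLACEHOLDER and DOTTED_CIRCLE rows from the tables in this section:

```python
import unicodedata

DOTTED_CIRCLE = "\u25cc"

# Codepoints given the PLACEHOLDER or DOTTED_CIRCLE Shaping class in this
# document's tables; any of them can act as a stand-in base for a mark.
PLACEHOLDERS = {0x00A0, 0x200B, 0x2010, 0x2011, 0x2012, 0x2013, 0x2014, 0x25CC}


def is_placeholder_base(ch: str) -> bool:
    """True if ch is one of the placeholder characters listed in the tables."""
    return ord(ch) in PLACEHOLDERS


def for_display(text: str) -> str:
    """Give a leading combining mark a dotted-circle base for display.

    Simplification: only Mark-category characters are handled; text that
    already starts with a base or a placeholder is returned unchanged.
    """
    if text and unicodedata.category(text[0]).startswith("M"):
        return DOTTED_CIRCLE + text
    return text


print(for_display("\u0e48") == "\u25cc\u0e48")  # True: Mai Ek gets a base
print(for_display("\u2010\u0e48"))             # hyphen already serves as base
```

Because a hyphen is not a mark, `for_display` leaves hyphen-plus-mark sequences alone, and `is_placeholder_base` lets a shaper recognize the hyphen as the mark's base instead of inserting a second dotted circle.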

| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|---|---|---|---|---|
| U+00A0 | Separator | PLACEHOLDER | null | No-break space |
| U+200C | Other | NON_JOINER | null | Zero-width non-joiner |
| U+200D | Other | JOINER | null | Zero-width joiner |
| U+2010 | Punctuation | PLACEHOLDER | null | ‐ Hyphen |
| U+2011 | Punctuation | PLACEHOLDER | null | ‑ No-break hyphen |
| U+2012 | Punctuation | PLACEHOLDER | null | ‒ Figure dash |
| U+2013 | Punctuation | PLACEHOLDER | null | – En dash |
| U+2014 | Punctuation | PLACEHOLDER | null | — Em dash |
| U+25CC | Symbol | DOTTED_CIRCLE | null | ◌ Dotted circle |