This document lists the per-character shaping information needed to shape Thai text.
Thai glyphs should be classified as in the following table. Codepoints in the Thai block with no assigned meaning are designated as unassigned in the Unicode category column.
Assigned codepoints with a null in the Shaping class column evoke no special behavior from the shaping engine. Note that this does include some valid codepoints, such as currency marks, punctuation, and other symbols.
Note: the NUMBER and SYMBOL Shaping classes are important during syllable identification, but generally evoke no further special behavior during the rest of the shaping process.
The Mark-placement subclass column indicates mark-placement positioning for codepoints in the Mark category. Assigned, non-mark codepoints have a null in this column and evoke no special mark-placement behavior. Marks tagged with [Mn] in the Unicode category column are categorized as non-spacing; marks tagged with [Mc] are categorized as spacing-combining.
Some codepoints in the following table use a Shaping class that differs from the codepoint's Unicode General Category. The Shaping class takes precedence during OpenType shaping, as it captures more specific, script-aware behavior.
Codepoint | Unicode category | Shaping class | Mark-placement subclass | Combining class | PUA | Glyph |
---|---|---|---|---|---|---|
U+0E00 | unassigned | | | | | |
U+0E01 | Letter | CONSONANT | null | 0 | NC | ก Ko Kai |
U+0E02 | Letter | CONSONANT | null | 0 | NC | ข Kho Khai |
U+0E03 | Letter | CONSONANT | null | 0 | NC | ฃ Kho Khuat |
U+0E04 | Letter | CONSONANT | null | 0 | NC | ค Kho Khwai |
U+0E05 | Letter | CONSONANT | null | 0 | NC | ฅ Kho Khon |
U+0E06 | Letter | CONSONANT | null | 0 | NC | ฆ Kho Rakhang |
U+0E07 | Letter | CONSONANT | null | 0 | NC | ง Ngo Ngu |
U+0E08 | Letter | CONSONANT | null | 0 | NC | จ Cho Chan |
U+0E09 | Letter | CONSONANT | null | 0 | NC | ฉ Cho Ching |
U+0E0A | Letter | CONSONANT | null | 0 | NC | ช Cho Chang |
U+0E0B | Letter | CONSONANT | null | 0 | NC | ซ So So |
U+0E0C | Letter | CONSONANT | null | 0 | NC | ฌ Cho Choe |
U+0E0D | Letter | CONSONANT | null | 0 | RC | ญ Yo Ying |
U+0E0E | Letter | CONSONANT | null | 0 | DC | ฎ Do Chada |
U+0E0F | Letter | CONSONANT | null | 0 | DC | ฏ To Patak |
U+0E10 | Letter | CONSONANT | null | 0 | RC | ฐ Tho Than |
U+0E11 | Letter | CONSONANT | null | 0 | NC | ฑ Tho Nangmontho |
U+0E12 | Letter | CONSONANT | null | 0 | NC | ฒ Tho Phuthao |
U+0E13 | Letter | CONSONANT | null | 0 | NC | ณ No Nen |
U+0E14 | Letter | CONSONANT | null | 0 | NC | ด Do Dek |
U+0E15 | Letter | CONSONANT | null | 0 | NC | ต To Tao |
U+0E16 | Letter | CONSONANT | null | 0 | NC | ถ Tho Thung |
U+0E17 | Letter | CONSONANT | null | 0 | NC | ท Tho Thahan |
U+0E18 | Letter | CONSONANT | null | 0 | NC | ธ Tho Thong |
U+0E19 | Letter | CONSONANT | null | 0 | NC | น No Nu |
U+0E1A | Letter | CONSONANT | null | 0 | NC | บ Bo Baimai |
U+0E1B | Letter | CONSONANT | null | 0 | AC | ป Po Pla |
U+0E1C | Letter | CONSONANT | null | 0 | NC | ผ Pho Phung |
U+0E1D | Letter | CONSONANT | null | 0 | AC | ฝ Fo Fa |
U+0E1E | Letter | CONSONANT | null | 0 | NC | พ Pho Phan |
U+0E1F | Letter | CONSONANT | null | 0 | AC | ฟ Fo Fan |
U+0E20 | Letter | CONSONANT | null | 0 | NC | ภ Pho Samphao |
U+0E21 | Letter | CONSONANT | null | 0 | NC | ม Mo Ma |
U+0E22 | Letter | CONSONANT | null | 0 | NC | ย Yo Yak |
U+0E23 | Letter | CONSONANT | null | 0 | NC | ร Ro Rua |
U+0E24 | Letter | CONSONANT | null | 0 | NC | ฤ Ru |
U+0E25 | Letter | CONSONANT | null | 0 | NC | ล Lo Ling |
U+0E26 | Letter | CONSONANT | null | 0 | NC | ฦ Lu |
U+0E27 | Letter | CONSONANT | null | 0 | NC | ว Wo Waen |
U+0E28 | Letter | CONSONANT | null | 0 | NC | ศ So Sala |
U+0E29 | Letter | CONSONANT | null | 0 | NC | ษ So Rusi |
U+0E2A | Letter | CONSONANT | null | 0 | NC | ส So Sua |
U+0E2B | Letter | CONSONANT | null | 0 | NC | ห Ho Hip |
U+0E2C | Letter | CONSONANT | null | 0 | NC | ฬ Lo Chula |
U+0E2D | Letter | CONSONANT | null | 0 | NC | อ O Ang |
U+0E2E | Letter | CONSONANT | null | 0 | NC | ฮ Ho Nokhuk |
U+0E2F | Letter | CONSONANT | null | 0 | null | ฯ Paiyannoi |
U+0E30 | Letter | VOWEL_DEPENDENT | RIGHT_POSITION | 0 | CV | ะ Sara A |
U+0E31 | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | 0 | AV | ั Mai Han-akat |
U+0E32 | Letter | VOWEL_DEPENDENT | RIGHT_POSITION | 0 | CV | า Sara Aa |
U+0E33 | Letter | VOWEL_DEPENDENT | RIGHT_POSITION | 0 | null | ำ Sara Am |
U+0E34 | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | 0 | AV | ิ Sara I |
U+0E35 | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | 0 | AV | ี Sara Ii |
U+0E36 | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | 0 | AV | ึ Sara Ue |
U+0E37 | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | 0 | AV | ื Sara Uee |
U+0E38 | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | 103 | BV | ุ Sara U |
U+0E39 | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | 103 | BV | ู Sara Uu |
U+0E3A | Mark [Mn] | PURE_KILLER | BOTTOM_POSITION | 9 | BV | ฺ Phinthu |
U+0E3B | unassigned | | | | | |
U+0E3C | unassigned | | | | | |
U+0E3D | unassigned | | | | | |
U+0E3E | unassigned | | | | | |
U+0E3F | Symbol | SYMBOL | null | 0 | null | ฿ Currency symbol Baht |
U+0E40 | Letter | VOWEL_DEPENDENT | VISUAL_ORDER_LEFT | 0 | CV | เ Sara E |
U+0E41 | Letter | VOWEL_DEPENDENT | VISUAL_ORDER_LEFT | 0 | CV | แ Sara Ae |
U+0E42 | Letter | VOWEL_DEPENDENT | VISUAL_ORDER_LEFT | 0 | CV | โ Sara O |
U+0E43 | Letter | VOWEL_DEPENDENT | VISUAL_ORDER_LEFT | 0 | CV | ใ Sara Ai Maimuan |
U+0E44 | Letter | VOWEL_DEPENDENT | VISUAL_ORDER_LEFT | 0 | CV | ไ Sara Ai Maimalai |
U+0E45 | Letter | VOWEL_DEPENDENT | RIGHT_POSITION | 0 | CV | ๅ Lakkhangyao |
U+0E46 | Letter Modifier | null | null | 0 | null | ๆ Maiyamok |
U+0E47 | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | 0 | AV | ็ Maitaikhu |
U+0E48 | Mark [Mn] | TONE_MARKER | TOP_POSITION | 107 | TV | ่ Mai Ek |
U+0E49 | Mark [Mn] | TONE_MARKER | TOP_POSITION | 107 | TV | ้ Mai Tho |
U+0E4A | Mark [Mn] | TONE_MARKER | TOP_POSITION | 107 | TV | ๊ Mai Tri |
U+0E4B | Mark [Mn] | TONE_MARKER | TOP_POSITION | 107 | TV | ๋ Mai Chattawa |
U+0E4C | Mark [Mn] | CONSONANT_KILLER | TOP_POSITION | 0 | TV | ์ Thanthakhat |
U+0E4D | Mark [Mn] | BINDU | TOP_POSITION | 0 | AV | ํ Nikhahit |
U+0E4E | Mark [Mn] | PURE_KILLER | TOP_POSITION | 0 | AV | ๎ Yamakkan |
U+0E4F | Punctuation | null | null | 0 | null | ๏ Fongman |
U+0E50 | Number | NUMBER | null | 0 | null | ๐ Digit zero |
U+0E51 | Number | NUMBER | null | 0 | null | ๑ Digit one |
U+0E52 | Number | NUMBER | null | 0 | null | ๒ Digit two |
U+0E53 | Number | NUMBER | null | 0 | null | ๓ Digit three |
U+0E54 | Number | NUMBER | null | 0 | null | ๔ Digit four |
U+0E55 | Number | NUMBER | null | 0 | null | ๕ Digit five |
U+0E56 | Number | NUMBER | null | 0 | null | ๖ Digit six |
U+0E57 | Number | NUMBER | null | 0 | null | ๗ Digit seven |
U+0E58 | Number | NUMBER | null | 0 | null | ๘ Digit eight |
U+0E59 | Number | NUMBER | null | 0 | null | ๙ Digit nine |
U+0E5A | Punctuation | null | null | 0 | null | ๚ Angkhankhu |
U+0E5B | Punctuation | null | null | 0 | null | ๛ Khomut |
U+0E5C | unassigned | | | | | |
U+0E5D | unassigned | | | | | |
U+0E5E | unassigned | | | | | |
U+0E5F | unassigned | | | | | |
U+0E60 | unassigned | | | | | |
U+0E61 | unassigned | | | | | |
U+0E62 | unassigned | | | | | |
U+0E63 | unassigned | | | | | |
U+0E64 | unassigned | | | | | |
U+0E65 | unassigned | | | | | |
U+0E66 | unassigned | | | | | |
U+0E67 | unassigned | | | | | |
U+0E68 | unassigned | | | | | |
U+0E69 | unassigned | | | | | |
U+0E6A | unassigned | | | | | |
U+0E6B | unassigned | | | | | |
U+0E6C | unassigned | | | | | |
U+0E6D | unassigned | | | | | |
U+0E6E | unassigned | | | | | |
U+0E6F | unassigned | | | | | |
U+0E70 | unassigned | | | | | |
U+0E71 | unassigned | | | | | |
U+0E72 | unassigned | | | | | |
U+0E73 | unassigned | | | | | |
U+0E74 | unassigned | | | | | |
U+0E75 | unassigned | | | | | |
U+0E76 | unassigned | | | | | |
U+0E77 | unassigned | | | | | |
U+0E78 | unassigned | | | | | |
U+0E79 | unassigned | | | | | |
U+0E7A | unassigned | | | | | |
U+0E7B | unassigned | | | | | |
U+0E7C | unassigned | | | | | |
U+0E7D | unassigned | | | | | |
U+0E7E | unassigned | | | | | |
U+0E7F | unassigned | | | | | |
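
As an illustration of how the Shaping class column might be consumed, the following is a minimal Python sketch of a range-based lookup. The name `thai_shaping_class` and the range list are hypothetical, transcribed by hand from the table above; the function returns `None` both for codepoints with a null Shaping class and for unassigned codepoints, which a real shaping engine would likely want to distinguish.

```python
from typing import Optional

# (first, last, shaping class) ranges transcribed from the Thai block table.
# Codepoints with a null Shaping class, and unassigned codepoints, are omitted.
_THAI_SHAPING_RANGES = [
    (0x0E01, 0x0E2F, "CONSONANT"),
    (0x0E30, 0x0E39, "VOWEL_DEPENDENT"),
    (0x0E3A, 0x0E3A, "PURE_KILLER"),
    (0x0E3F, 0x0E3F, "SYMBOL"),
    (0x0E40, 0x0E45, "VOWEL_DEPENDENT"),
    (0x0E47, 0x0E47, "VOWEL_DEPENDENT"),
    (0x0E48, 0x0E4B, "TONE_MARKER"),
    (0x0E4C, 0x0E4C, "CONSONANT_KILLER"),
    (0x0E4D, 0x0E4D, "BINDU"),
    (0x0E4E, 0x0E4E, "PURE_KILLER"),
    (0x0E50, 0x0E59, "NUMBER"),
]


def thai_shaping_class(ch: str) -> Optional[str]:
    """Return the Shaping class for a single character, or None if it has none."""
    cp = ord(ch)
    for first, last, shaping_class in _THAI_SHAPING_RANGES:
        if first <= cp <= last:
            return shaping_class
    return None
```

For example, `thai_shaping_class("\u0e01")` yields `"CONSONANT"`, while `thai_shaping_class("\u0e46")` (Maiyamok) yields `None`.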
In addition to general punctuation, runs of Thai text often use the combining macron below (U+0331), combining tilde (U+0303), modifier letter apostrophe (U+02BC), and modifier letter minus sign (U+02D7), particularly when used to write minority languages. The first two are from the Combining Diacritical Marks block; the last two are from the Spacing Modifier Letters block.
In addition, Thai text typically does not insert spaces between words. Consequently, the Zero-Width Space (U+200B) character is often used to insert invisible break points that may be converted to line breaks.
Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
---|---|---|---|---|
U+02BC | Mark [Mn] | TONE_MARKER | TOP_POSITION | ʼ Modifier apostrophe |
U+02D7 | Mark [Mn] | TONE_MARKER | BOTTOM_POSITION | ˗ Modifier minus sign |
U+0303 | Mark [Mn] | TONE_MARKER | TOP_POSITION | ̃ Combining tilde |
U+0331 | Mark [Mn] | TONE_MARKER | TOP_POSITION | ̱ Combining macron below |
U+200B | Separator | PLACEHOLDER | null | Zero-width space |
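
The use of the Zero-Width Space as a break point can be sketched as follows. This is only an illustration, not part of the shaping process itself: the function name `wrap_at_zwsp` is hypothetical, and counting code points is a rough stand-in for measuring rendered width (combining marks, for instance, occupy no horizontal space of their own).

```python
from typing import List

ZWSP = "\u200b"  # Zero-Width Space


def wrap_at_zwsp(text: str, max_chars: int) -> List[str]:
    """Greedily pack ZWSP-delimited segments into lines of at most max_chars.

    A single segment longer than max_chars is placed on its own line
    rather than being split.
    """
    lines: List[str] = []
    current = ""
    for segment in text.split(ZWSP):
        if current and len(current) + len(segment) > max_chars:
            # The next segment does not fit: break the line at this ZWSP.
            lines.append(current)
            current = segment
        else:
            # The segment fits: the ZWSP stays invisible and is dropped.
            current += segment
    if current:
        lines.append(current)
    return lines
```

A layout engine would make the same decision using measured glyph advances after shaping, rather than code-point counts.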
Other important characters that may be encountered when shaping runs of Thai text include the dotted-circle placeholder (U+25CC), the zero-width joiner (U+200D) and zero-width non-joiner (U+200C), and the no-break space (U+00A0).
The dotted-circle placeholder is frequently used when displaying a dependent vowel or a combining mark in isolation. Real-world text syllables may also use other characters, such as hyphens or dashes, in a similar placeholder fashion; shaping engines should cope with this situation gracefully.
Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
---|---|---|---|---|
U+00A0 | Separator | PLACEHOLDER | null | No-break space |
U+200C | Other | NON_JOINER | null | Zero-width non-joiner |
U+200D | Other | JOINER | null | Zero-width joiner |
U+2010 | Punctuation | PLACEHOLDER | null | ‐ Hyphen |
U+2011 | Punctuation | PLACEHOLDER | null | ‑ No-break hyphen |
U+2012 | Punctuation | PLACEHOLDER | null | ‒ Figure dash |
U+2013 | Punctuation | PLACEHOLDER | null | – En dash |
U+2014 | Punctuation | PLACEHOLDER | null | — Em dash |
U+25CC | Symbol | DOTTED_CIRCLE | null | ◌ Dotted circle |
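
To illustrate the placeholder usage described above, here is a minimal Python sketch that prepends a dotted circle to a string beginning with a combining mark, so that the mark has a base to attach to when shown in isolation. The function name `with_placeholder_base` is hypothetical, and the check relies only on the Unicode General Category (Mn or Mc) from Python's `unicodedata` module. It therefore does not cover dependent vowels that Unicode classifies as letters, such as U+0E30, which a shaping engine would identify through the Shaping class instead.

```python
import unicodedata

DOTTED_CIRCLE = "\u25cc"


def with_placeholder_base(text: str) -> str:
    """Prepend U+25CC if text starts with a combining mark (Mn or Mc)."""
    if text and unicodedata.category(text[0]) in ("Mn", "Mc"):
        return DOTTED_CIRCLE + text
    return text
```

For instance, an isolated Sara I (U+0E34) becomes U+25CC followed by U+0E34, while a string that already begins with a consonant is returned unchanged.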