Unicode.md

Unicode, formally The Unicode Standard, is a text encoding standard maintained by the Unicode Consortium and designed to support the use of text written in all of the world's major writing systems. Version 15.1 of the standard defines 149,813 characters and 161 scripts used in various ordinary, literary, academic, and technical contexts.

Many common characters, including numerals, punctuation, and other symbols, are unified within the standard and are not treated as specific to any given writing system.

Unicode encodes thousands of emoji, and the Consortium continues to develop them as part of the standard. Moreover, the widespread adoption of Unicode was in large part responsible for the initial popularization of emoji outside of Japan. Unicode is ultimately capable of encoding more than 1.1 million characters: the code space runs from U+0000 to U+10FFFF, which is 1,114,112 code points (2^20 + 2^16), of which 2,048 are reserved as surrogates and cannot be assigned to characters.
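The "more than 1.1 million" figure can be checked with a little arithmetic. The sketch below assumes the usual layout of the code space (17 planes of 65,536 code points each, with the range U+D800 to U+DFFF set aside as surrogates); the constant names are made up for this illustration.

```python
# Size of the Unicode code space, assuming 17 planes of 0x10000 code points.
PLANES = 17
CODE_POINTS_PER_PLANE = 0x10000  # 65,536

total_code_points = PLANES * CODE_POINTS_PER_PLANE

# Surrogates (U+D800..U+DFFF) are reserved for UTF-16 and never assigned.
surrogate_count = 0xDFFF - 0xD800 + 1
scalar_values = total_code_points - surrogate_count

print(total_code_points)                       # 1114112
print(total_code_points == 2**20 + 2**16)      # True
print(scalar_values)                           # 1112064

# Python's chr() accepts exactly this range: U+10FFFF is the last code point.
last = chr(0x10FFFF)
try:
    chr(0x110000)
    beyond_range_ok = True
except ValueError:
    beyond_range_ok = False
```

This is why "ca. 2^20" is only approximately right: the first plane adds another 2^16 code points on top of the sixteen supplementary planes.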

Unicode has largely supplanted the previous environment of myriad incompatible character sets, each used within different locales and on different computer architectures. Unicode is used to encode the vast majority of text on the Internet, including most web pages, and Unicode support has become a common consideration in contemporary software development.

The Unicode character repertoire is synchronized with ISO/IEC 10646; the two are code-for-code identical.

However, The Unicode Standard is more than just a repertoire within which characters are assigned. To aid developers and designers, the standard also provides charts and reference data, as well as annexes explaining concepts like scripts, providing guidance for their implementation.

Topics covered by these annexes include character normalization, character composition and decomposition, collation, and directionality.
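As a small illustration of normalization, composition, and directionality, the following sketch uses Python's standard `unicodedata` module. The sample strings are chosen for this example only, and the behavior shown depends on the Unicode data bundled with the Python build.

```python
import unicodedata

# "é" can be one precomposed code point or "e" plus a combining acute accent.
composed = "\u00e9"        # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"     # "e" + COMBINING ACUTE ACCENT

# The two look the same on screen but compare unequal as raw strings.
raw_equal = composed == decomposed

# NFC composes where possible; NFD decomposes.
nfc = unicodedata.normalize("NFC", decomposed)
nfd = unicodedata.normalize("NFD", composed)

print(raw_equal)               # False
print(nfc == composed)         # True
print(nfd == decomposed)       # True
print(len(nfc), len(nfd))      # 1 2

# Directionality: each character carries a bidirectional class.
latin_bidi = unicodedata.bidirectional("a")        # "L" (left-to-right)
hebrew_bidi = unicodedata.bidirectional("\u05d0")  # "R" (right-to-left)
```

Normalizing both sides to the same form before comparing is a common way to make visually identical strings compare equal.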

Unicode text is processed and stored as binary data using one of several encodings, which define how to translate the standard's abstracted codes for characters into sequences of bytes.

The Unicode Standard itself defines three encodings: UTF-8, UTF-16, and UTF-32, though several others exist. Of these, UTF-8 is the most widely used by a large margin, due to its backwards compatibility with ASCII and its compact representation of [[Language/Lang~Family/LangFamily-Indo-European/LangFamily-Germanic/Lang-en|English]] text and [[Base64]]. Before that, [[Latin1]] was the most common [[Encoding]].
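To make the size trade-off concrete, here is a minimal sketch that encodes a few sample characters in all three encodings and counts the bytes. The helper `encoded_sizes` is a name invented for this example; the little-endian codec variants are used so that no byte order mark is added to the counts.

```python
def encoded_sizes(text):
    """Return the number of bytes `text` occupies in each encoding."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),  # -le: no byte order mark
        "utf-32": len(text.encode("utf-32-le")),
    }

samples = {
    "A": "A",                  # ASCII letter
    "e-acute": "\u00e9",       # Latin-1 range
    "euro": "\u20ac",          # Basic Multilingual Plane
    "emoji": "\U0001F600",     # supplementary plane
}

for label, char in samples.items():
    print(label, encoded_sizes(char))

# ASCII text is byte-for-byte identical in UTF-8.
ascii_compatible = "Hello".encode("utf-8") == "Hello".encode("ascii")
```

ASCII characters take one byte in UTF-8 but two in UTF-16 and four in UTF-32, which is why English text and Base64 output (both pure ASCII) are so compact in UTF-8.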

Wikipedia

[[Unicode]]

Finding a Unicode Character

https://shapecatcher.com/ is very useful: draw the character and it finds similar ones.

Alternatively, each Unicode character has a textual description (its name) that can be searched.
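One way to search those descriptions locally is Python's standard `unicodedata` module. The sketch below is only an illustration: `search_by_name` is a hypothetical helper written for this note, and the results depend on the Unicode version bundled with the Python build.

```python
import sys
import unicodedata

def search_by_name(keyword, limit=10):
    """Return up to `limit` (code point, character, name) tuples whose
    Unicode name contains `keyword` (case-insensitive)."""
    keyword = keyword.upper()
    matches = []
    for cp in range(sys.maxunicode + 1):
        # Unnamed code points (controls, surrogates, unassigned) yield "".
        name = unicodedata.name(chr(cp), "")
        if keyword in name:
            matches.append((f"U+{cp:04X}", chr(cp), name))
            if len(matches) >= limit:
                break
    return matches

# Exact lookups work in both directions.
snowman = unicodedata.lookup("SNOWMAN")
snowman_name = unicodedata.name(snowman)

for code, char, name in search_by_name("snowman"):
    print(code, char, name)
```

`unicodedata.lookup` needs the exact name, while the substring scan helps when only part of the description is known.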
