Full basic multilingual plane unifont #2535
There is an external project which did this job: https://github.com/stgiga/UnifontEX/blob/main/UnifontExMonoU8G2.c In general you can create u8g2 font files by yourself with "bdfconv.exe": Lines 255 to 291 in 4b17158
|
That solved my problem. But in my opinion there should be a pre-made full basic multilingual plane font. I guess I'll change my issue to that. |
I guess most embedded systems don't have the memory for this |
ESP32 development boards typically have 4MB of flash. The whole basic multilingual plane takes up 2034754 bytes. |
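To put those two numbers side by side, here is a rough sketch of the arithmetic. It assumes "4MB" means 4 MiB of raw flash; the application partition that is actually usable depends on the board's partition table, so treat the result as an upper bound rather than a guarantee.

```javascript
// Rough flash budget check.
// Assumption: "4MB" is 4 MiB of raw flash. The usable app partition is
// normally smaller and depends on the partition scheme in use.
const FLASH_BYTES = 4 * 1024 * 1024;
const FONT_BYTES = 2034754; // full-BMP font size quoted in the thread

// Fraction of the flash taken up by the font data alone.
function flashShare(fontBytes, flashBytes) {
  return fontBytes / flashBytes;
}

const share = flashShare(FONT_BYTES, FLASH_BYTES);
const remaining = FLASH_BYTES - FONT_BYTES;

console.log(`font uses ${(share * 100).toFixed(1)}% of raw flash`);
console.log(`bytes left for code and other data: ${remaining}`);
```

On these assumptions the font takes a little under half of the raw flash, which is why the partition layout matters more than the chip size.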
Pruning combining characters and other non-characters seems like a good way to save a little bit of space, since they can't be rendered by U8g2 anyway. The original character will still be visible without diacritics or whatever. Here's what I've pruned so far:
These may also be pruned (in whole or parts)
|
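If the combining characters are pruned from the font, the text can be prepared to match before it is drawn. The following is a minimal sketch of that idea, not something U8g2 provides: `stripCombiningMarks` is a hypothetical helper name, and the choice to drop only non-spacing and enclosing marks (while keeping spacing marks that sit beside the base character) is an assumption.

```javascript
// Hypothetical helper (not part of U8g2): prepare text for a font that has
// no combining marks.
function stripCombiningMarks(text) {
  // NFC folds base + mark pairs into a single precomposed code point
  // wherever Unicode defines one, e.g. "e" + U+0301 becomes U+00E9.
  const composed = text.normalize('NFC');
  // Remove the marks that are still separate. Only non-spacing (Mn) and
  // enclosing (Me) marks are dropped; spacing marks (Mc) are kept.
  return composed.replace(/[\p{Mn}\p{Me}]/gu, '');
}

console.log(stripCombiningMarks('Cafe\u0301')); // precomposed form survives
console.log(stripCombiningMarks('x\u20D7'));    // no precomposed form, mark dropped
```

Normalizing first means common accented Latin letters keep their diacritics, because the precomposed glyphs live in the BMP; only sequences without a precomposed form lose the mark.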
So, what was the remaining size of the Unifont then? |
Hello, UnifontEX developer here: There ARE Arduinos with 8MiB of RAM, like the Portenta H7. Also I had to use a specific version of bdfconv. Converting everything was done so that even song titles with emoji (which DO exist) can be displayed. You're welcome to compile it yourself as you need, I just did everything for the sake of completeness. Also, I targeted MANY more formats than regular Unifont does, in fact even this library, as well as its siblings. I really do want Unicode dot-matrix LCDs/VFDs/OLEDs. I just figured I'd chime in here. |
I tried to preserve combining characters that are large and to the side of characters. I wrote a little JS to generate the ranges.
const exclusions = [
[0x0300, 0x036F], // Combining Diacritical Marks
[0x1AB0, 0x1AFF], // Combining Diacritical Marks Extended
[0x1CD0, 0x1CD2], // Vedic Extensions
[0x1CD4, 0x1CE8], // Vedic Extensions
[0x1CED, 0x1CED], // Vedic Extensions
[0x1CF4, 0x1CF4], // Vedic Extensions
[0x1CF8, 0x1CF9], // Vedic Extensions
[0x1CFB, 0x1CFF], // Vedic Extensions
[0x1DC0, 0x1DFF], // Combining Diacritical Marks Supplement
[0x20D0, 0x20FF], // Combining Diacritical Marks for Symbols
[0x2DE0, 0x2DFF], // Cyrillic Extended-A
[0xA802, 0xA802], // Syloti Nagri
[0xA806, 0xA806], // Syloti Nagri
[0xA80B, 0xA80B], // Syloti Nagri
[0xA825, 0xA826], // Syloti Nagri
[0xA82C, 0xA82F], // Syloti Nagri
[0xA8B6, 0xA8B6], // Saurashtra
[0xA8C4, 0xA8CD], // Saurashtra
[0xA8DA, 0xA8DF], // Saurashtra
[0xA8E0, 0xA8F1], // Devanagari Extended
[0xA8FF, 0xA8FF], // Devanagari Extended
[0xA926, 0xA92D], // Kayah Li
[0xA947, 0xA95E], // Rejang
[0xA980, 0xA982], // Javanese
[0xA9B3, 0xA9B3], // Javanese
[0xA9B6, 0xA9B9], // Javanese
[0xA9BC, 0xA9BD], // Javanese
[0xA9CE, 0xA9CE], // Javanese
[0xA9DA, 0xA9DD], // Javanese
[0xA9E5, 0xA9E5], // Myanmar Extended-B
[0xA9FF, 0xA9FF], // Myanmar Extended-B
[0xAA28, 0xAA2E], // Cham
[0xAA31, 0xAA32], // Cham
[0xAA35, 0xAA3F], // Cham
[0xAA43, 0xAA43], // Cham
[0xAA4C, 0xAA4C], // Cham
[0xAA4E, 0xAA4F], // Cham
[0xAA5A, 0xAA5B], // Cham
[0xAA7C, 0xAA7C], // Myanmar Extended-A
[0xAAB0, 0xAAB0], // Tai Viet
[0xAAB2, 0xAAB4], // Tai Viet
[0xAAB7, 0xAAB8], // Tai Viet
[0xAABE, 0xAABF], // Tai Viet
[0xAAC1, 0xAAC1], // Tai Viet
[0xAAC3, 0xAADA], // Tai Viet
[0xAAEC, 0xAAED], // Meetei Mayek Extensions
[0xAAF6, 0xAAFF], // Meetei Mayek Extensions
[0xABE5, 0xABE5], // Meetei Mayek
[0xABE8, 0xABEA], // Meetei Mayek
[0xABED, 0xABEF], // Meetei Mayek
[0xABFA, 0xABFF], // Meetei Mayek
[0xD800, 0xDB7F], // High Surrogate
[0xDB80, 0xDBFF], // High Private Use Surrogates
[0xDC00, 0xDFFF], // Low Surrogates
[0xE000, 0xF8FF], // Private Use Area
[0xFE20, 0xFE2F] // Combining Half Marks
];
exclusions.sort((a, b) => a[0] - b[0]);
console.log('Sorted exclusion ranges:');
console.log(exclusions);
const merged = [exclusions[0]];
for(let i = 1; i < exclusions.length; i++){
const lastRange = merged[merged.length - 1];
const currentRange = exclusions[i];
if(currentRange[0] <= lastRange[1] + 1){
lastRange[1] = Math.max(lastRange[1], currentRange[1]);
}else{
merged.push(currentRange);
}
}
console.log('Merged exclusion ranges:');
console.log(merged);
let range = '0-';
for(let exclusion of merged){
range += String(exclusion[0] - 1) + ',' + String(exclusion[1] + 1) + '-';
}
range += '65535';
console.log('bdfconv ranges:');
console.log(range);
This got it down to 2023516 bytes. Honestly, removing combining characters is worth less for the space it saves than as a way to fix font rendering by ignoring them. |
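The range-building step above can also be written as one reusable function. This is a sketch under stated assumptions rather than the poster's script: `buildKeepRanges` is a hypothetical name, it folds the merge pass into a single loop, and it additionally handles exclusions that start at 0 or end at the last code point, which the original loop does not need for this particular list.

```javascript
// Turn inclusive [start, end] exclusion ranges into a "keep" list such as
// "0-767,880-65535". Overlapping and adjacent exclusions are handled by
// tracking the first code point that has not been dealt with yet.
function buildKeepRanges(exclusions, last = 0xFFFF) {
  const sorted = exclusions.map(r => [r[0], r[1]]).sort((a, b) => a[0] - b[0]);
  const parts = [];
  let next = 0; // first code point not yet kept or excluded
  for (const [start, end] of sorted) {
    if (start > next) parts.push(`${next}-${start - 1}`);
    next = Math.max(next, end + 1);
  }
  if (next <= last) parts.push(`${next}-${last}`);
  return parts.join(',');
}

console.log(buildKeepRanges([[0x0300, 0x036F], [0x0370, 0x0372]]));

// Space saved by pruning, using the two sizes quoted in the thread.
const savedBytes = 2034754 - 2023516;
console.log(`${savedBytes} bytes, ${(savedBytes / 2034754 * 100).toFixed(2)}% of the font`);
```

The resulting string is intended for the map option (`-m`) of bdfconv. The saving works out to roughly half a percent of the font, which matches the poster's conclusion that the space alone does not justify the pruning.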
UnifontEX has the SMP in it, and the way it fits it under 65535 characters (the base versions used are a factor too) is by removing ALL black hex box placeholders, which allows Plane 1 to fit. |
Also the LVGL version of UnifontEX is 2MiB. |
Plane 1. Oh and UnifontEX also has some Plane 2 and Plane 3 Han characters (what Westerners would call Chinese characters, and what Japanese users would call Kanji.) Most emoji live in Plane 1. Most "Fancy Text" (as the West calls it) lives in Plane 1. Musical notation lives in Plane 1. |
Can UnifontEX be used in u8g2? |
Yes, and I've made a version for it, though it's 6MiB, so it effectively requires an Arduino Portenta H7. But I had converted the whole font. It's the C file that's 6MiB, so the compiled version should be easier: |
I'd love to see what the finished music player looks like. |
This is just a proof of concept UI. poc.mp4 |
UnifontEX actually supports Also I'm loving what you have, it looks so cool! It reminds me of a car music display. What you've made so far is very beautiful. |
Honestly, this is exactly one of the intended use cases. |
This is probably known, but one limitation of u8g2 is that only the base plane (plane 0) is supported as of now. |
@stgiga How big would UnifontEX be if only the BMP is included? |
It generated fine lol. Also, most characters are in the BMP so the savings just ain't there. |
I specifically used this converter:
This specific build did NOT give assert errors when trying to do the entire font, yet RLE still worked. It seems that the other versions of bdfconv have trouble during the RLE step when dealing with the whole font unabridged. If you open the UnifontEX U8G2 C file I provide in my UnifontEX repo, it says that it converted ALL 65414 characters in the BDF, the Plane 1, 2, 3, and 14 stuff included. So U8G2's format supports stuff above Plane 0, but if olikraus is correct, the actual library won't display any of it.
The Arduino Portenta H7 is an Arduino with 8MiB of RAM and 16MiB of flash memory, but it's still an Arduino. Now, the ONLY thing that has a chance at running the Adafruit_GFX version is the Portenta X8, which is more-or-less a Raspberry Pi and Arduino fusion (it can run Linux). Is that even an Arduino anymore? At least the Portenta H7 is a more-conventional Arduino, but it just has a LOT more memory. And yes, I checked to make sure the display libraries I target support it.
Basically, U8G2 UnifontEX can work if olikraus enables stuff above Plane 0, and if the Arduino you use is a Portenta H7 or Portenta X8, assuming you don't do anything over a Raspberry Pi GPIO. Also bdfconv outputs UCGLIB, and UnifontEX exported as THAT will also fit in a Portenta H7's RAM, but with a lot less breathing-room than the U8G2 version. The LVGL version may run on a non-Pro (Portentas are Arduino's pro line) Arduino since it's only 2MiB when compiled. The Adafruit_GFX version is in the C
Out of the four display libraries I support (U8G2, UCGLIB, LVGL, and Adafruit_GFX), the most ideal one is the LVGL version. Keep in mind that different libraries support different displays.
For the people who think even an Arduino with 2MiB of RAM (LVGL) is too much to put in your project, there is a fifth way of LCD usage (other than the BDF), and that is using the TTF2PNG version in a character generator IC.
You know those ER3301 font ICs you can buy, well, UnifontEX flashed to a 1MiB SPI flash chip like you can buy from Microchip Technology would be the same package but have MANY more characters available, and I'd outright just buy and flash a bunch and then make them available somewhere as a new font IC that supports pretty much the vast majority of Unicode. The circuitry to display its contents would be up to you, but would likely involve a DEFLATE decoder IC too. Nothing too wild though. Basically, if you don't like the overhead of using an Arduino but you want a dot-matrix Unicode LCD/VFD/OLED, then there are options. If you want a VFD, I should mention that the VFD (and to a lesser extent other technologies) company Noritake makes VFDs (getting fancy with their other technologies takes a bit more convincing) that you can bake in a 16x16 font of your choice into the firmware of, AND you can customize the driver circuitry AND there is no minimum order quantity, so for 3 years I've wanted to order a VFD from Noritake that has UnifontEX as the display font, no extra hardware required, and I'd get it in that beautiful green glow. Unfortunately finding a non-VFD analogue to this was not successful because all the character LCD and character OLED people are still obsessed with 5x7, which just ain't enough. Let's just say that I'm all for more-or-less legally obsoleting said 5x7 text-only displays in favor of ones that use UnifontEX for the purposes of better language support. And yes, Noritake provides Arduino stuff for their displays. The best base display of theirs you could use would be this one https://www.noritake-elec.com/products/model?part=GU256X128D-D903M which even has touch support. I wish I had the funds to actually do any of this. |
U8g2 only supports 16 bit unsigned integers as glyph encoding, so the u8g2 font format is limited to 65536 glyphs. |
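For the music-player use case, the 16-bit encoding limit means that any character above U+FFFF (most emoji, for example) has to be dealt with before the text reaches the display code. A minimal sketch of such a pre-filter follows; `toBmpOnly` and its placeholder argument are hypothetical names for illustration, not part of U8g2.

```javascript
// Hypothetical pre-filter: replace every code point above U+FFFF, which a
// 16-bit glyph encoding cannot address, with a placeholder character.
function toBmpOnly(text, placeholder = '?') {
  let out = '';
  // for...of walks the string by code point, so a surrogate pair arrives
  // as one two-unit string instead of two separate halves.
  for (const ch of text) {
    out += ch.codePointAt(0) > 0xFFFF ? placeholder : ch;
  }
  return out;
}

console.log(toBmpOnly('Song \u{1F3B5} title'));
console.log(toBmpOnly('\u30BF\u30A4\u30C8\u30EB')); // BMP katakana passes through
```

Iterating by code point matters here: indexing the string by UTF-16 unit would see the two surrogate halves as separate 16-bit values and pass them through to the renderer.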
But there are only 65417 glyphs. Also according to the files generated by the program,
The reason why UnifontEX's Plane 1+14 is Unifont 11.0.01 Upper when Plane 0 is a modern version is because of the 65535 limit of pre-2022-HarfBuzz TrueType. You can merge Unifont 11.0.01 with Unifont 11.0.01 Upper fine, but Unifont 11.0.02 Upper or higher won't work with even Unifont 11.0.01 as base, and then Unifont kept adding to Plane 1 long after 2018. Meanwhile Plane 0 additions were more-gradual, to the point where using the final TrueType build for Plane 0 (Unifont-JP 15.0.06) was possible.
If I had been able to compile 15.1.01-JP as TrueType that year, that would have worked and it would have been the final, but it has the side effect of making the Hangul worse (15.0.05 and 15.0.06 both use Galmuri Gothic Hangul, which is inspired by DS Korean fonts. I thus like it better than the 15.1+ Hangul, which looks a bit derpy.) So using Unifont-JP 15.1.01 as a base would have broken any text art with certain symbols, which is bad because UnifontEX is in part intended for Unicode art.
Adding the 5 new Ideographic Description Characters (Yes, only 5 glyphs between those versions) without re-basing ended up being just barely possible because 15.1.01-JP as a TrueType made FontForge unstable and persistently so. The resulting font, after VDMX addition and WOFF1 plus Zopfli, had an unfortunate side effect of breaking an Easter egg in that version, something easy to do.
Basically, UnifontEX's basis of Unifont 15.0.06-JP+11.0.01 Upper just works, especially for text art. Meanwhile Unifont upstream has been narrowing quite a few characters as of 16.0.02. Unstable widths may explain why Unifont is less-used for text art. But I never changed any glyph shapes, AND the only bump possible (15.1.01) specifically breaks text art that actually exists in the wild by changing the widths of certain symbols. I'd be part of the problem if I went that path. I opt for stability here.
Anyways, UnifontEX at 65417 glyphs fits in the max glyphs of pre-2022-HarfBuzz TrueType. HarfBuzz got fed up with the 65535 limit and via clever tricks got TrueType to support many more glyphs as well as cubic outlines in the glyf table. Via these, I can graft upstream Unifont onto UnifontEX wherever something isn't present, without preventing older apps from seeing the font. UnifontEX2 is this idea, but the software needed to realize it doesn't exist. Since WOFF and this may be dicey, we aren't considering webfonts and the Easter egg thus isn't a factor. Anything not using 2022+ HarfBuzz sees UnifontEX's 65417 glyphs plus the 5 extra characters. Anything supporting those sees every character in upstream Unifont plus the CSUR and UCSUR glyphs in Unifont CSUR.
Also the reason why 15.1 usage cannot go above 15.1.01 is because 15.1.02 drastically increased Plane 2 and Plane 3 from Unifont-JP's 303 Plane 2 Kanji and by late Unifont 15, the Biang and Taito Han characters (the Biang characters got redrawn in Unifont 16, not that I can implement those given the problem with overwriting. Heck, overwriting Plane 1 would break U+1F72C horribly). I only have 122 slots left at max (not counting the 3 slots mentioned earlier, or the 5 Ideographic Description Characters in 15.1.01), and 15.1.02 added way more Han characters than 122. So 15.1.01 is the mathematical limit, but 15.0.06 is A: easier to utilize due to not needing to manually compile, B: safe on text art, C: the version with Hangul that Korean gamers would be nostalgic for.
I feel like Unifont 11.0.02 Upper is where Plane 1 started looking bad (the late additions to Unifont 11 were where 16x16 bit worst), and 15.1.01 is when Plane 0 started looking bad, due to the symbol width change and going for an entirely-new and less-iconic Hangul set. 15.0.06-JP + 11.0.01 Upper looks the best and still covers everything.
Since 15.1 only added 5 characters that are more-or-less control characters, and that I didn't need them for making an Ideographic Description Sequence for a 533-stroke Han character, and that Unicode 16 was released in September 2024, UnifontEX is 2024 Plane 0 and 2018 Plane 1, AKA an amazing upgrade for older devices, and an excellent LCD font. My 2013 Mac is a lot happier, as is a 2015 phone from an acquaintance. Compiling Unifont barely works at all. UnifontEX2 would also be offered in BDF and SVG because they don't have a max glyph count. Basically, UnifontEX is under 65,535 glyphs and locked at its glyph count of 65417 to not break a WOFF1 webfont feature, but its successor won't be. Oh and it will have a special vendor ID of Also, saying "UnifontEX is based on Unifont-JP 15.0.06 and 11.0.01 Upper and some 15.1.01 glyphs" just doesn't sound right, compared to "UnifontEX is based on Unifont-JP 15.0.06 and 11.0.01 Upper", which honestly sounds more polished/professional, not like I had pulled glyphs willy-nilly. Not to mention that 15.1.01 changed the characters neighboring the new characters to match, so it would look wrong and out-of-place anyways. Not something I'm eager to do. |
I spent the better part of today trying to replicate the TrueType and WOFF1 alignment (the latter important for the Easter egg) for the 15.1.01 additions, and everything just went badly as it did when I originally attempted this. Adding just enough characters to make the TrueType hit a multiple of 16 didn't work. Vendor ID changes didn't do anything. Doing both didn't do anything. Whatever I did on February 2nd, 2024 was somehow extremely lucky, and extremely sensitive to changes. I mean, 16byte alignment in general has uses, but for WOFF1 it's essential so I can fit my Easter egg in a way that can be extracted. Whatever I did, I got extremely lucky. The Easter egg is important because it utilizes a rare feature of WOFF, and utilizing rare features is one of the ways in which I go more all-in than upstream Unifont. Heck, even the display ports fit the bill. And then I got the I don't know what to tell you. Your format is better than you think it is. |
OK, I tried to understand the above statements, but I guess my knowledge of Unifont is limited. Anyhow, I understand your request is to add the full plane 0 to the u8g2 distribution. Is there any other request I should consider? |
And that's what I did, I just merged Unifont and Unifont Upper prior to converting to your format, and the result fits into your program's maximum of 65535 characters. @olikraus |
Yeah just that, minus combining characters since they aren't supported. |
Oh and just so you know, UnifontEX now supports Unicode 15.1. |
My project involves displaying song titles and artists, and I'm looking for a font that has the most coverage. The most important characters for me are characters with diacritics, Japanese, and Cyrillic. The closest ones I've found are:
Unifont, which doesn't have one font with all the characters,
Efont, which doesn't have diacritics, and
Boutique, which is too small (should be 15/16px tall).
Would it be possible to combine all the Unifont fonts into one big font? A 500kB monstrosity wouldn't really be a problem for me.