There are even code points like U+FDFD (﷽) which are often rendered multiple columns wide. In fact, in my monospace font in my text editor, that character is rendered almost 12 columns wide. Yes, “almost”: subsequent characters get offset a tiny bit. I don’t know why.
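To make the point concrete, here is a minimal Python sketch that inspects U+FDFD using only the standard library. The helper name `describe` is made up for illustration. Nothing here measures rendered width, which depends on the font and terminal; it only shows that the string-level counts say nothing about how many columns the glyph takes.

```python
import unicodedata

BISMILLAH = "\uFDFD"  # the single code point discussed above

def describe(ch: str) -> dict:
    """Collect string-level facts about one code point (hypothetical helper)."""
    return {
        "code_points": len(ch),                 # what len() counts in Python 3
        "utf8_bytes": len(ch.encode("utf-8")),  # storage size in UTF-8
        "name": unicodedata.name(ch),           # official Unicode character name
        # East Asian Width class from the Unicode database; this is a property
        # lookup, not a measurement of what the font actually draws.
        "east_asian_width": unicodedata.east_asian_width(ch),
    }

info = describe(BISMILLAH)
print(info)
```

None of those numbers predict the "almost 12 columns" seen in the editor, so any column-counting logic based on them is a guess at best.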
I do have this list of scripts I mentally check against whenever I'm reasoning about Unicode:
Arabic or Hebrew for RTL and beginning/medial/end forms (Arabic also has "isolated" forms)
Arabic for ligatureyness/glyph complexity
Some Indic script for ligatureyness/glyph complexity, and massive use of combining characters, including the double-ended virama combiner. Infinite length combining sequences.
Korean (Hangul) for the combining jamo system. Infinite length combining sequences (though these are never displayed beyond standard Korean syllable blocks, so it's less important)
Han scripts for variation selectors, halfwidth/fullwidth, and language disambiguation troubles. Also omg so many glyphs.
If dealing with displaying text, think of a Han script and Mongolian, which are written in different directions (vertical, sideways, etc)
Thai or other scripts from that peninsula (not counting Vietnamese scripts), because they don't use spaces to break words.
Emoji because despite the immense complexity of human language, Emoji still managed to get a bunch of special casing in various parts of the Unicode spec. Infinite length combining sequences.
Latin for locale-dependent case operations (Turkish i, German ß)
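A few items on that list can be poked at from a Python prompt. This is only a sketch with the standard library; the variable names are mine, and Python's `str.upper()`/`str.lower()` are locale-blind, so the Turkish case in particular shows the default mapping rather than what a Turkish user expects.

```python
import unicodedata

# Latin: case mapping can change the number of code points.
sharp_s_upper = "\u00DF".upper()    # German sharp s uppercases to two letters
dotted_i_lower = "\u0130".lower()   # Turkish capital dotted I, default (non-Turkish) mapping

# Hangul: three conjoining jamo compose into one syllable block under NFC.
jamo = "\u1112\u1161\u11AB"
syllable = unicodedata.normalize("NFC", jamo)

# Indic (Devanagari): consonant + virama + consonant is three code points,
# even though it is normally drawn as a single conjunct.
conjunct = "\u0915\u094D\u0937"
virama_category = unicodedata.category("\u094D")  # a nonspacing mark

# Emoji: a flag is a pair of regional indicator code points.
flag = "\U0001F1FA\U0001F1F8"

print(sharp_s_upper, len(dotted_i_lower), syllable, len(conjunct), len(flag))
```

In every case the code point count and what a reader would call "one character" disagree, which is the reason for keeping a checklist like this in the first place.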
Very nice article that shows clearly that the problem is Unicode itself.
There shouldn't be any of this. No code points, no grapheme clusters, no extended grapheme clusters, no combining of characters etc.
All that should exist is an array of different symbols. Each symbol should be on its own. So you want, for example, 'ό'? Different character from 'o', not 'o' combined with '.
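For reference, Unicode as it stands has both forms for many accented letters: a single precomposed code point and a base letter followed by a combining mark. A small sketch, using U+03CC (Greek omicron with tonos) as an assumed example and a hypothetical helper `same_text`, shows how normalization reconciles the two:

```python
import unicodedata

precomposed = "\u03CC"        # one code point: omicron with tonos
decomposed = "\u03BF\u0301"   # two code points: omicron + combining acute accent

def same_text(a: str, b: str) -> bool:
    """Compare two strings after NFC normalization (hypothetical helper)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

print(precomposed == decomposed)           # raw comparison sees different code points
print(same_text(precomposed, decomposed))  # normalized comparison treats them as equal
print(unicodedata.normalize("NFD", precomposed) == decomposed)
```

So "one symbol, one slot" already exists for this letter; the catch is that the combined spelling exists too, and code has to normalize before comparing.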
๐บ๐ธ๐บ๐ธ๐บ๐ธ๐บ๐ธ๐บ๐ธ๐บ๐ธ๐บ๐ธ๐บ๐ธ๐บ๐ธ๐บ๐ธ๐บ๐ธ๐บ๐ธ๐บ๐ธ๐บ๐ธ๐บ๐ธ๐บ๐ธ๐บ๐ธ๐บ๐ธ๐บ๐ธ๐บ๐ธ๐บ๐ธ๐บ๐ธ๐บ๐ธ๐บ๐ธ๐บ๐ธ๐บ๐ธ๐บ๐ธ๐บ๐ธ๐บ๐ธ๐บ๐ธ๐บ๐ธ๐บ๐ธ๐บ๐ธ๐บ๐ธ๐บ๐ธ๐บ๐ธ๐บ๐ธ๐บ๐ธ๐บ๐ธ I claim these dubz in the name of the United States of America ๐บ๐ธ๐บ๐ธ๐บ๐ธ๐บ๐ธ๐บ๐ธ๐บ๐ธ๐บ๐ธ๐บ๐ธ๐บ๐ธ๐บ๐ธ๐บ๐ธ๐บ๐ธ๐บ๐ธ๐บ๐ธ๐บ๐ธ๐บ๐ธ๐บ๐ธ๐บ๐ธ๐บ๐ธ๐บ๐ธ๐บ๐ธ๐บ๐ธ๐บ๐ธ๐บ๐ธ๐บ๐ธ๐บ๐ธ๐บ๐ธ๐บ๐ธ๐บ๐ธ๐บ๐ธ๐บ๐ธ๐บ๐ธ๐บ๐ธ๐บ๐ธ๐บ๐ธ๐บ๐ธ๐บ๐ธ๐บ๐ธ๐บ๐ธ
1. Overly specific. Is the proposed character overly specific? For example, 🍣 SUSHI represents sushi in general, although images frequently show a specific type, such as Maguro. Adding SABA, HAMACHI, SAKE, AMAEBI and others would be overly specific.
2. Open-ended. Is it just one of many, with no special reason to favor it over others of that type?
3. Already Representable. Can the concept be represented by another emoji or sequence?
ALL MENSES ARE PROGRAM WITH COMPUTER. SPECIFICALLY HASKELL. 21.
Name: Anonymous 2017-02-03 21:15
>>539 The first three factors for exclusion say the same thing, isn't that overly specific? And don't get me started on the rest, because they forbid emoji altogether.