UTF-8LE - a more efficient, saner version of UTF-8

Name: Cudder !MhMRSATORI 2014-08-29 12:03

I just realised that UTF-8 is stupidly defined as big-endian!

U+20AC = 0010 000010 101100
In UTF-8 -> 11100010 10000010 10101100

...Meaning that to convert a codepoint into a series of bytes you have to shift the value before anding/adding the offset, viz.

b0 = (n >> 12) + 0xe0;
b1 = ((n >> 6) & 63) + 0x80;
b2 = (n & 63) + 0x80;


Just looking at the expression it doesn't seem so bad, but shifting right first means throwing away perfectly good bits in a register! Worse, the bits thrown away are exactly the ones needed in the upcoming computations, so you have to waste storage preserving the entire codepoint throughout the computation. Observe:

push eax
shr eax, 12
add al, 224
stosb
pop eax
push eax
shr eax, 6
and al, 63
add al, 128
stosb
pop eax
and al, 63
add al, 128
stosb


14 instructions, 23 bytes. Not so bad, but what if we stored the pieces the other way around, i.e. "UTF-8LE"?

U+20AC = 001000 001010 1100
In UTF-8LE -> 11101100 10001010 10001000

b0 = (n&15) + 224;
b1 = ((n>>4)&63) + 128;
b2 = (n>>10) + 128;


Observe that each time bits are picked off n, the next step's shift removes them, so there is no need to keep around another copy of n (including bits that wouldn't be used anymore).

shl eax, 4
shr al, 4
add al, 224
stosb
mov al, ah
and al, 63
add al, 128
stosb
shr eax, 14
add al, 128
stosb


11 instructions, 22 bytes. Clearly superior.
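
For anyone who would rather poke at this from C than from an assembler, here is a minimal sketch of the same consume-as-you-go idea. It follows the bit layout shown for U+20AC and the assembly above: 4 low bits in the lead byte, then two groups of 6. The function name utf8le_encode3 is made up for illustration, and the sketch assumes the 3-byte range (U+0800..U+FFFF) only; it does not reject surrogates or handle other lengths.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (name is an assumption, not from any spec):
   encode a codepoint in the 3-byte range as "UTF-8LE", low bits first. */
void utf8le_encode3(uint32_t n, uint8_t out[3])
{
    assert(n >= 0x800 && n <= 0xFFFF);   /* 3-byte range only */

    out[0] = (uint8_t)((n & 15) + 0xE0); /* low 4 bits + 3-byte lead marker */
    n >>= 4;                             /* those bits are used up, drop them */
    out[1] = (uint8_t)((n & 63) + 0x80); /* next 6 bits + continuation marker */
    n >>= 6;                             /* drop those too */
    out[2] = (uint8_t)(n + 0x80);        /* the remaining 6 bits */
}
```

Note that n is only ever narrowed; no second copy of the codepoint is kept. Calling it with 0x20AC should give EC 8A 88, matching the bit pattern above.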

The UTF-8 BOM is EF BB BF; the UTF-8LE BOM similarly will be EF AF BF.
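
The BOM bytes can be checked by going the other way. Below is a rough sketch of a 3-byte UTF-8LE decoder, assuming the same layout (lead byte holds the low 4 bits of the codepoint, the two continuation bytes hold the next 6 and the top 6). The names utf8le_decode3 and utf8le_decode3_check are hypothetical, and the validation only looks at the marker bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: decode one 3-byte "UTF-8LE" sequence.
   Returns the codepoint, or -1 if the marker bits are wrong. */
int32_t utf8le_decode3(const uint8_t in[3])
{
    if ((in[0] & 0xF0) != 0xE0) return -1;   /* lead byte must be 1110xxxx */
    if ((in[1] & 0xC0) != 0x80) return -1;   /* continuation: 10xxxxxx */
    if ((in[2] & 0xC0) != 0x80) return -1;

    return (int32_t)((uint32_t)(in[0] & 15)          /* bits 0..3   */
                   | ((uint32_t)(in[1] & 63) << 4)   /* bits 4..9   */
                   | ((uint32_t)(in[2] & 63) << 10));/* bits 10..15 */
}

/* Check against the byte sequences quoted in this post. */
void utf8le_decode3_check(void)
{
    const uint8_t bom[3]  = { 0xEF, 0xAF, 0xBF };
    const uint8_t euro[3] = { 0xEC, 0x8A, 0x88 };
    assert(utf8le_decode3(bom)  == 0xFEFF);
    assert(utf8le_decode3(euro) == 0x20AC);
}
```

Each field lands at its final position with one mask and one shift, and the three pieces don't depend on each other.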

Name: Cudder !MhMRSATORI 2014-08-31 3:55

>>25
Idle loops need optimisation too! Putting hlt in a loop is acceptable, but for lowest idle power consumption you'd ideally want to lower the clock frequencies and voltages too (and put them back up at the right time). This is extremely important for mobile devices, but the savings are significant even on a desktop, where an idle core draws a few watts but one spinning in a tight loop draws more than 10x that. I believe this is automatic (or at least accomplished via SMI) on x86, but on ARM SoCs it must be done manually:

http://processors.wiki.ti.com/index.php/AM335x_Linux_Power_Management_User_Guide

https://blogs.oracle.com/bholler/entry/the_most_executed_code_in

And saying that UTF-8 is "fast enough" misses the whole point: every little bit of time saved running that code is potentially more time the CPU can spend in low-power mode. You're like those idiots who thought petrol was "cheap enough" - it is until it isn't, and then you have a huge problem because by then the old inefficient standard has already become so widely adopted.
