Return Styles: Pseud0ch, Terminal, Valhalla, NES, Geocities, Blue Moon. Entire thread

Here's a problem that came up recently

Name: Anonymous 2015-03-03 14:03

I have a list of 330 items, each of which is in one of three states (S0, S1, and S2). I want to communicate these states over a URL query string that's as short as possible. Most lists will have a lot of one or two states, and few of the rest. Also, the maximum length of these lists will grow in the future, and I don't want to invalidate old URLs when that happens (all missing items at the end of a given list are assumed to be S0).
I've taken three approaches to this problem at this point:

Initially I just used two bits per item and packed them three items to a base64 character (64 = 26). This gave me query strings of length 110, which was too long.

Then I noticed URL query strings allow 81 different characters (RFC3986; technically 82, but the percent sign is only legal for percent encoding), which is 34, so I could pack four items to each character. ⌈330 / 4⌉ = 83.

For my third attempt I used Huffman coding and run-length encoding. I assign codes 0, 10, and 11 based on frequency, encode my item list, and prepend a two-bit digit 0-2 to indicate which state is the most frequent and a 10-bit digit indicating the list length (ten bits so the two things together would take up two base64 characters; need at least 9 because ⌈log2(330)⌉ = 9). Then I base64-encode that and apply RLE to the resulting string.
(The list length is needed because I can't just zero-pad, as 0 might not correspond to S0, and if it's not, padding with the code for S0 won't ever get to a number of bits divisible by 6 in 60% of cases.)
The maximum length for this scheme is ⌈(2 + 10 + 110 × 1 + 110 × 2 + 110 × 2) / 6⌉ = 93, which is worse than the second scheme, but the average length is a lot lower, with most of them being below 50.

In every case I also trimmed S0 items from the end of the list, but because of how these lists are generated that ended up not making much of a difference.

Does anyone have any better suggestions? The fact that a lot of these lists have roughly equal amounts of two states is fucking with Huffman, and I feel I could do better there. I don't really want to use two different schemes depending on whether there's a lot of one state or two, but I guess I could.

For reference, here are a few representative lists: http://sprunge.us/HjZj

Name: Anonymous 2015-03-03 15:03

Someone doesn't know about URL shortening services.

Newer Posts
Don't change these.
Name: Email:
Entire Thread Thread List