Return Styles: Pseud0ch, Terminal, Valhalla, NES, Geocities, Blue Moon. Entire thread

Here's a problem that came up recently

Name: Anonymous 2015-03-03 14:03

I have a list of 330 items, each of which is in one of three states (S0, S1, and S2). I want to communicate these states over a URL query string that's as short as possible. Most lists will have a lot of one or two states, and few of the rest. Also, the maximum length of these lists will grow in the future, and I don't want to invalidate old URLs when that happens (all missing items at the end of a given list are assumed to be S0).
I've taken three approaches to this problem at this point:

Initially I just used two bits per item and packed them three items to a base64 character (64 = 26). This gave me query strings of length 110, which was too long.

Then I noticed URL query strings allow 81 different characters (RFC3986; technically 82, but the percent sign is only legal for percent encoding), which is 34, so I could pack four items to each character. ⌈330 / 4⌉ = 83.

For my third attempt I used Huffman coding and run-length encoding. I assign codes 0, 10, and 11 based on frequency, encode my item list, and prepend a two-bit digit 0-2 to indicate which state is the most frequent and a 10-bit digit indicating the list length (ten bits so the two things together would take up two base64 characters; need at least 9 because ⌈log2(330)⌉ = 9). Then I base64-encode that and apply RLE to the resulting string.
(The list length is needed because I can't just zero-pad, as 0 might not correspond to S0, and if it's not, padding with the code for S0 won't ever get to a number of bits divisible by 6 in 60% of cases.)
The maximum length for this scheme is ⌈(2 + 10 + 110 × 1 + 110 × 2 + 110 × 2) / 6⌉ = 93, which is worse than the second scheme, but the average length is a lot lower, with most of them being below 50.

In every case I also trimmed S0 items from the end of the list, but because of how these lists are generated that ended up not making much of a difference.

Does anyone have any better suggestions? The fact that a lot of these lists have roughly equal amounts of two states is fucking with Huffman, and I feel I could do better there. I don't really want to use two different schemes depending on whether there's a lot of one state or two, but I guess I could.

For reference, here are a few representative lists: http://sprunge.us/HjZj

Name: Anonymous 2015-03-03 14:23

>>2
The idea is that these are permalinks.
These lists are lists of collectable things that can be collected in two ways, and this script is meant to let people keep track of their collection and to show it off by passing around the URL.

Newer Posts
Don't change these.
Name: Email:
Entire Thread Thread List