Here's a problem that came up recently

Name: Anonymous 2015-03-03 14:03

I have a list of 330 items, each of which is in one of three states (S0, S1, and S2). I want to communicate these states over a URL query string that's as short as possible. Most lists will have a lot of one or two states, and few of the rest. Also, the maximum length of these lists will grow in the future, and I don't want to invalidate old URLs when that happens (all missing items at the end of a given list are assumed to be S0).
I've taken three approaches to this problem at this point:

Initially I just used two bits per item and packed three items into each base64 character (2^6 = 64). This gave me query strings of length 110, which was too long.
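
Roughly like this (simplified sketch, not the actual code; states is a list of 0/1/2):

    # Pack 2 bits per item, 3 items per character of a 64-character alphabet.
    B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"

    def encode_v1(states):
        out = []
        for i in range(0, len(states), 3):
            chunk = states[i:i+3] + [0] * (3 - len(states[i:i+3]))  # pad the tail with S0
            out.append(B64[chunk[0] * 16 + chunk[1] * 4 + chunk[2]])
        return "".join(out)     # 330 items -> 110 characters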

Then I noticed URL query strings allow 81 different characters (RFC 3986; technically 82, but the percent sign is only legal as part of a percent-encoding), which is 3^4, so I could pack four items into each character. ⌈330 / 4⌉ = 83.
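
Something like this (again just a sketch; the exact alphabet order is arbitrary as long as encoder and decoder agree):

    # 4 items per character, since 3**4 = 81 fits the query-legal alphabet.
    # (Note: & and = are query-legal per RFC 3986 but many parsers treat them
    # as separators, so a real implementation might avoid them.)
    import string

    ALPHABET = (string.ascii_uppercase + string.ascii_lowercase + string.digits
                + "-._~" + "!$&'()*+,;=" + ":@/?")      # 81 characters

    def encode_v2(states):
        out = []
        for i in range(0, len(states), 4):
            chunk = states[i:i+4] + [0] * (4 - len(states[i:i+4]))
            n = ((chunk[0] * 3 + chunk[1]) * 3 + chunk[2]) * 3 + chunk[3]
            out.append(ALPHABET[n])                     # n is in 0..80
        return "".join(out)                             # ceil(330 / 4) = 83 characters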

For my third attempt I used Huffman coding and run-length encoding. I assign codes 0, 10, and 11 based on frequency, encode my item list, and prepend a two-bit field (0-2) indicating which state is the most frequent plus a 10-bit field for the list length (ten bits so the two together take up exactly two base64 characters; at least 9 are needed because ⌈log2(330)⌉ = 9). Then I base64-encode that and apply RLE to the resulting string.
(The list length is needed because I can't just zero-pad: 0 might not be the code for S0, and when it isn't, in 60% of cases padding with S0's two-bit code can never reach a bit count divisible by 6.)
The maximum length for this scheme is ⌈(2 + 10 + 110 × 1 + 110 × 2 + 110 × 2) / 6⌉ = 94, which is worse than the second scheme, but the average length is a lot lower, with most of them coming in below 50.
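
In code the bit-level part looks roughly like this (simplified sketch; the RLE pass over the resulting string is left out):

    from collections import Counter

    B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"

    def encode_v3(states):
        order = [s for s, _ in Counter(states).most_common()]  # most frequent first
        order += [s for s in (0, 1, 2) if s not in order]      # states that never occur
        codes = dict(zip(order, ("0", "10", "11")))
        bits = (format(order[0], "02b")                 # which state got code 0
                + format(len(states), "010b")           # 10-bit list length
                + "".join(codes[s] for s in states))
        bits += "0" * (-len(bits) % 6)                  # pad; the length field disambiguates
        return "".join(B64[int(bits[i:i+6], 2)] for i in range(0, len(bits), 6))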

In every case I also trimmed S0 items from the end of the list, but because of how these lists are generated that ended up not making much of a difference.

Does anyone have any better suggestions? The fact that a lot of these lists have roughly equal amounts of two states is fucking with Huffman, and I feel I could do better there. I don't really want to use two different schemes depending on whether there's a lot of one state or two, but I guess I could.

For reference, here are a few representative lists: http://sprunge.us/HjZj

Name: Anonymous 2015-03-03 14:18

>URL query string
Have you considered POST requests?

Name: Anonymous 2015-03-03 14:23

>>2
The idea is that these are permalinks.
These lists track collectable things that can be collected in two ways, and this script is meant to let people keep track of their collection and show it off by passing around the URL.

Name: Anonymous 2015-03-03 14:28

>>3
if the state space isn't random:
1. generate a unique (hash) key from the states
2. store hash + states in a db
3. when the hash is requested, select the states from #2 (sketch below)
330x3 states = 1000 bits
http://stackoverflow.com/questions/417142/what-is-the-maximum-length-of-a-url-in-different-browsers
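
Something like this, with an in-memory dict standing in for the db (the truncated hash is just for illustration; handle collisions properly with a real db):

    import hashlib

    DB = {}

    def save(states):                      # states is a list of 0/1/2
        key = hashlib.sha1(bytes(states)).hexdigest()[:8]
        DB[key] = list(states)             # in reality: INSERT hash + states
        return key                         # this short key goes in the URL

    def load(key):
        return DB.get(key)                 # in reality: SELECT the states by hash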

Name: Anonymous 2015-03-03 14:43

>>4
A list of 330 items, each of which has three possible states, has 3^330 possible states, not 330*3. Even if it were 330*3 and even if that somehow translated to 1000 bits of information (as opposed to log2(990) = 9.95), 1000 bits is 167 base64 characters, which is worse than any of the approaches in >>1.
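
Quick check of the numbers, plus the actual lower bound for 330 ternary items:

    import math

    print(math.log2(990))                      # ~9.95 bits, if it really were 330*3
    print(math.ceil(1000 / 6))                 # 1000 bits -> 167 base64 characters
    print(330 * math.log2(3))                  # ~523 bits of information in 3^330 states
    print(math.ceil(330 * math.log2(3) / 6))   # -> at least 88 base64 characters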

Name: Anonymous 2015-03-03 14:48

>>5
what i mean is you have 2048 chars for a url and you want to save bits.
that's 16k bits

Name: Anonymous 2015-03-03 14:51

You could even encode 2K states as "1", "2", "3" in ASCII, you autist.

Name: Anonymous 2015-03-03 15:01

>>6-7
These URLs are meant to be passed around in a character-limited chat system. It's important that they're as short as possible, and how long URLs are technically allowed to be isn't relevant.
I posted this because I thought it'd be an interesting problem for people who enjoy solving problems. Go back to scrubbing toilets if that's more your thing.

Name: Anonymous 2015-03-03 15:01

I think OP is under the delusion that everyone would type out the URL from memory and that shorter URLs save something.
In fact any script processing of Huffman/base64 over a plain URL would add time; a giant long URL + a simple 3-line client-side script to set the states will always win over this autistic bit-saving bullshit.

Name: Anonymous 2015-03-03 15:03

Someone doesn't know about URL shortening services.

Name: Anonymous 2015-03-03 15:04

>>8
Back to stackoverflow autist.

Name: Anonymous 2015-03-03 15:13

>>10
If the URLs are sent as frequently as they're accessed, and they change all the time, this is a good way to get banned from a shortening service. However, OP could set up his own "shortening service" that caches short URL names for previously encountered states. That, or just not pass around URLs to share state updates. I mean what the hell is he thinking?

Name: Anonymous 2015-03-03 15:15

>>12
Isn't this what was suggested in >>4 ?

Name: Anonymous 2015-03-03 15:17

>>11
If he had a good reputation on SO, he would not be here seeking tech support.

Name: Anonymous 2015-03-03 16:14

>>1
Store two lists of the item numbers that are in state S1 and S2 instead. Obviously you should sort each list first and store the successive increments, which you can expect to be mostly small. Use 0 to separate the lists. For actually encoding the increments use Huffman of course, either with some sane precomputed dictionary or by storing the dictionary as well.

Furthermore you can use special increments (negative or offset by 1000, whatever, you're passing them through Huffman anyway) to store filled ranges of items instead, to accommodate people with mostly full collections.

Another angle would be to move to arithmetic coding; you can do that both with your scheme and with mine.

Instead of storing the full dictionary you can make a bunch of profiles (Huffman or arithmetic weights) and select between them, yes.
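
The transform part, roughly (the entropy-coding pass is not shown; you'd feed these small integers into Huffman or arithmetic coding with precomputed weights):

    def to_gaps(states):                     # states is a list of 0/1/2
        symbols = []
        for target in (1, 2):                # one gap list per non-S0 state
            prev = 0
            for i, s in enumerate(states, start=1):
                if s == target:
                    symbols.append(i - prev) # successive increments, mostly small
                    prev = i
            symbols.append(0)                # 0 separates (and terminates) the lists
        return symbols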

Name: >>1 2015-03-03 17:45

>>10
There are a few other people who provide the same service, and that's actually what one of them does: it generates a permalink like I do (a longer one than mine, on average) and then gives people links to URL-shortening services. That's just giving up, though.

>>15
There's a thought. I'll give that a shot and see if it actually outperforms what I have, but it sounds promising.
I did consider adaptive arithmetic coding instead of Huffman for my third scheme, but it's significantly more effort to implement than Huffman, and good libraries are hard to find.

Name: Anonymous 2015-03-03 23:43

Where are the users getting these URLs from? Are you running a server? Why not provide your own equivalent of a URL shortening service? Store the state that's encoded in these URLs on the server and give the user a URL that contains an id for that state. Then it doesn't matter how large the state is, except for server storage.
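
A minimal sketch of that indirection, with an in-memory dict in place of real storage (the URL shape is just an example):

    import itertools

    STORE = {}
    _ids = itertools.count(1)

    def put_state(states):                 # states is a list of 0/1/2
        key = next(_ids)
        STORE[key] = list(states)
        return key                         # this id goes in the URL, e.g. ?c=<id>

    def get_state(key):                    # look the state up again when the URL is hit
        return STORE.get(key)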
