Wow, the old /prog/ is completely dead right now. All threads on the front page are spam and capped at 1000 posts.
Name:
Anonymous 2013-09-03 12:00
>>11 The trick he used was replying to a post already at 1k, with Lain, and manually writing the captcha. Whatever.
Name:
Anonymous 2013-09-04 3:39
I made a new thread (`123456789') and it was deleted, so the mods definitely know. Question is, how did they fuck that up for new threads, but not for replies, when logically they would use the same variable? Also, reporting works again, whereas before it was giving a MySQL error for a while. moot must be trying to rewrite Shiitchan or something.
>>17 RedCream and the /lounge/ crew have tablecat. /vip/ probably won't be bothered too much. I don't know what the deal with /lang/ is, as I haven't even bothered to check up on them in years. /abbc/ is ('-`)b and doesn't seem concerned. /newnew/ and /newpol/ were filled with transients anyway I think, so they don't matter. /comp/ and /tech/ always belonged to /g/, so fuck 'em. /sjis/ might have a problem. The rest are unimportant, but I'm sure the SEO shitheads will miss the easy backlinks from 4chan.
By the way, we should probably get a scrape of the rest of the boards, in case worst comes to worst. I don't have very much bandwidth, so can I ask this as a favor please?
Name:
Anonymous 2013-09-04 4:11
>>20 Yeah, I'm on it now. It could take a very long time. Hopefully I get it all before it is taken down.
>>23 I doubt it, but you never know. moot may decide that if his incompetent code monkey can't add captcha to the text boards without breaking the thread submission box, they aren't worth the trouble.
Name:
Anonymous 2013-09-04 5:19
>>25 I'll get the shitty small boards tomorrow, just in case they have anything interesting: /food/, /book/, /sports/, /anime/, /sci/, /music/, /carcom/, /img/, and /tele/. I doubt anyone else will want them, but I'll archive and upload them for anyone who does.
Name:
FrozenVoid 2013-09-04 5:23
>>26 With a diverse userbase some might want to post on topics which aren't programming (/sci/, /music/ and /anime/, for example).
Name:
Anonymous 2013-09-04 5:26
>>27 Fuck off, retard. We aren't trying to build the next reddit here.
Name:
FrozenVoid 2013-09-04 5:29
>>28 So you suggest that your /prog/ will be filled with off-topic posts? If you want a good signal-to-noise ratio you have to segregate the content.
frozenvoid, you're being really annoying again. I'm happy to have you back, but go smoke some weed or something and come back later.
Name:
Anonymous 2013-09-04 5:49
Isn't it possible to write a user script to find the new thread form and correct the action URL so it's not %FORUMURL% ? I would look into this, but I can't be bothered when the site is full of spammers and necrobumpers anyway...
>>30 I'm fine with adding a lounge board, but I won't add single-interest boards. The site has just started and it's best not to fragment it right from the beginning.
>>33 That's cool. I'm still going to archive all of world4ch, compress it, and upload it somewhere so those left there can view something of their past.
>>37 No, it's a program/file format that implements the LZMA2 compression algorithm. Comes standard on most Linux distributions, and is far better than 7zip.
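For anyone who wants to pack a scraped board the same way, here is a rough sketch. It assumes the tool described above is xz, that tar and xz are both installed, and that the board was saved into a directory named after it; `archive_board` is a made-up helper name, not part of any script posted in this thread.

```shell
#!/bin/sh
# Hypothetical helper: pack one scraped board directory into BOARD.tar.xz.
# Assumes the LZMA2 tool meant above is xz; -9 is its highest preset level.
archive_board() {
    board_dir=$1
    tar -cf - "${board_dir}" | xz -9 > "${board_dir}.tar.xz"
}
```

Something like `archive_board lounge` followed by `xz -t lounge.tar.xz` would build the archive and check its integrity before uploading.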
>>26 /sports/ has some very autistic dedicated users.
Name:
Anonymous 2013-09-06 3:42
I'm done scraping lounge. That took a while. I don't know why though, it was only 591MB of data uncompressed. Maybe they added a throttle or something.
>>44 I got that too while scraping /sports/; it wrecked the whole scrape. I upped the delay to five seconds and everything goes through. I think it was Cloudflare shenanigans.
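If a fixed delay still gets a scrape throttled, one option is to retry a failed download with a growing pause instead of letting it wreck the run. This is only a sketch: `retry_with_delay` and its arguments are hypothetical names, and doubling the pause is my own choice, not something the site is known to require.

```shell
#!/bin/sh
# Hypothetical helper: run a command, retrying with a doubling pause.
# Usage: retry_with_delay MAX_TRIES START_DELAY command [args...]
# The rwd_ prefix avoids clobbering the caller's variables (POSIX sh
# has no local variables).
retry_with_delay() {
    rwd_max=$1
    rwd_pause=$2
    shift 2
    rwd_try=1
    while [ "${rwd_try}" -le "${rwd_max}" ]
    do
        if "$@"
        then
            return 0
        fi
        echo "Attempt ${rwd_try} failed" >&2
        # Only pause if another attempt is coming.
        if [ "${rwd_try}" -lt "${rwd_max}" ]
        then
            sleep "${rwd_pause}"
            rwd_pause=$((rwd_pause * 2))
        fi
        rwd_try=$((rwd_try + 1))
    done
    return 1
}
```

For example, `retry_with_delay 3 5 wget "https://dis.4chan.org/read/${board}/${thread}" -O "${board}/${thread}.html"` would try up to three times, pausing 5 and then 10 seconds between attempts.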
Name:
Anonymous 2013-09-06 4:48
Just curious, are you guys storing the boards as unmodified html? Or are you doing any non-destructive processing before archiving?
Name:
Anonymous 2013-09-06 5:02
Reposting this script, because it was posted in a deleted thread. It just saves all threads under a board as files in a directory.
if [ -z "${num_threads}" ]
then
    sed 's/[^<]*<>[^<]*<>[^<]*<>\([^<]*\)<>.*/\1/' < subject.txt > threads.tt
else
    # head gets the top num_threads threads
    # sed extracts the thread number
    head -n ${num_threads} subject.txt \
        | sed 's/[^<]*<>[^<]*<>[^<]*<>\([^<]*\)<>.*/\1/' \
        > threads.tt
fi

# check the validity of the thread number. It should be all digits.
# this also protects against shell injection.
grep '^[0-9]\+$' threads.tt > wellformed-threads.tt
grep -v '^[0-9]\+$' threads.tt > non-wellformed-threads.tt

for thread in `cat wellformed-threads.tt`
do
    echo "Downloading thread ${board}/${thread}:"
    wget https://dis.4chan.org/read/${board}/${thread} -O ${board}/${thread}.html
    if [ $? -ne 0 ]
    then
        echo "Error downloading thread ${board}/${thread}" | tee -a errors.tt
    fi
    sleep ${delay}
done

if [ -s non-wellformed-threads.tt ]
then
    echo "These threads could not be downloaded because the subject was messed up."
    cat non-wellformed-threads.tt
    echo
    status=1
fi

if [ -s errors.tt ]
then
    echo "These threads could not be downloaded."
    cat errors.tt
    echo
    status=1
fi
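To see what the sed expression and the digit check in the script above actually do, here is a small demo on two made-up subject.txt lines. The sample lines and the function names are invented for illustration; all the expression relies on is that the thread number is the fourth `<>`-separated field. The grep pattern is spelled `[0-9][0-9]*` here because `\+` is a GNU extension.

```shell
#!/bin/sh
# Pull the fourth <>-separated field out of each line (same sed as the script).
extract_thread_ids() {
    sed 's/[^<]*<>[^<]*<>[^<]*<>\([^<]*\)<>.*/\1/'
}

# Keep only lines that are all digits (portable form of '^[0-9]\+$').
keep_wellformed() {
    grep '^[0-9][0-9]*$'
}

# Invented sample data: the second subject contains a stray '<', so the
# sed match starts late and leaves junk in front of the number.
printf '%s\n' \
    'Some subject<>Anonymous<><>1234567890<>12<>Anonymous<>1378267200' \
    'Broken <b> subject<>Anonymous<><>1234567891<>3<>Anonymous<>1378267300' \
    | extract_thread_ids | keep_wellformed
# prints: 1234567890
```

The second line comes out of sed as `Broken <1234567891`, which is why the script filters on all-digit lines before handing anything to wget.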
>>51 Why? It's not like we hold a grudge against our own past. It could also backfire: you could become a non-ironic shitposter, always post there and never come back, which is something I (we?) wouldn't want to happen.
>>67 world4ch is pretty scary right now. Someone (or maybe a couple of people) is seriously just shitposting and manually spamming the board, constantly, and has been doing so for weeks now.
What kind of insanity do you have to fall into where that is what becomes of your life? Terrifying.