A project

Name: Anonymous 2014-04-19 21:23

Get the URLs of all 4chan.org textboards (dis.) and scrap and archive them. Use archive.org

Name: Anonymous 2014-04-19 21:44

I have a fair amount of data that was collected last August. We can split up the boards and collect them using a well-tested script, preferably using the JSON API to reduce the burden on their servers. Also, do it slowly.
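A minimal sketch of the "do it slowly" part, assuming nothing about the actual script or the JSON API's endpoints. The names `polite_fetch` and `http_get` and the 5-second default delay are made up for illustration; the URL list has to come from wherever the thread index is read.

```python
import time
import urllib.request
from typing import Callable, Iterable, Iterator, Tuple


def http_get(url: str) -> bytes:
    """Fetch one URL and return the raw response body."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


def polite_fetch(
    urls: Iterable[str],
    fetch: Callable[[str], bytes] = http_get,
    delay_seconds: float = 5.0,  # assumed value, tune to taste
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[Tuple[str, bytes]]:
    """Yield (url, body) pairs, pausing between consecutive requests."""
    first = True
    for url in urls:
        if not first:
            # Wait before every request except the first one,
            # so the server never sees back-to-back hits.
            sleep(delay_seconds)
        first = False
        yield url, fetch(url)
```

Passing `fetch` and `sleep` in as parameters means the loop can be dry-run against fake data before it is pointed at the real site.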

I have copies of threads from newpol, newnew, lang, lounge, and sjis in html

Where is the json sync bot again?

Name: Anonymous 2014-04-19 23:30

Name: Anonymous 2014-04-20 1:28

>>2
Why are you concerned about straining joot's servers? A single popular /b/ thread probably exceeds the bandwidth of all of w4c a hundred times over.

Name: Anonymous 2014-04-20 3:25

>>4
Downloading all of /prog/ raw is 1.5 GB, so it's kind of a lot. I just want to be polite.

Name: Anonymous 2014-04-20 7:26

/lounge/ has been archived. Scraping went much more smoothly. There was only one instance of malformed html. /prog/ was infested with it. Either someone pwned world4ch and was inserting raw data into the database, or someone with access to it was messing around. There are poster dates from 1969. Stuff like that is hard to explain otherwise.

https://archive.org/details/lounge-20140420.db.xz

Name: Anonymous 2014-04-20 13:05

>>6
Those dates are from the \n in the name field bug, could be other poasting bugs too.

Name: Anonymous 2014-04-21 4:33

>>7
How strange

does anyone remember the screens that would appear after making a post? We should compile a list of them before we forget what they were.

Name: Anonymous 2014-04-21 4:43

>>8
Weren't there only two? The VIP poem, and the `here comes the chopper' one.

Name: Anonymous 2014-04-21 9:49

>>9
That's what I remember.

/sjis/, /lang/, and /newnew/ are now on archive.org

https://archive.org/details/sjis-20140420.db.xz
https://archive.org/details/lang-20140420.db.xz
https://archive.org/details/newnew-20140420.db.xz

The remaining boards will take much longer. I'm partially through /newpol/ now.

If you'd like to change any of the metadata for any of these items, make a request here.

Name: Anonymous 2014-04-21 10:02

>>10
Don't forget /vip/

Name: Anonymous 2014-04-21 10:09

>>11
that'll be next.

Name: Anonymous 2014-04-21 10:56

>>12
thank you!

Name: Anonymous 2014-04-21 11:19

>>10
thanks

Name: Anonymous 2014-04-22 11:56

Name: Anonymous 2014-04-22 14:35

Are you archiving /anime/ and /food/?

Name: Anonymous 2014-04-22 16:43

After you are done, feel free to post a list of all archives in this thread and I will include them in https://progrider.org/files/archives/

Name: Anonymous 2014-04-22 19:45

>>16
yes, all of them eventually. It will take some time.

Name: Anonymous 2014-04-22 20:33

Name: Anonymous 2014-04-23 8:08

/anime/ has been uploaded.

https://archive.org/details/anime-20140420.db.xz

/food/ will be next.

>>17
Ok. Everything uploaded so far has been listed here.

Which one of you spammed /anime/?

https://dis.4chan.org/read/anime/1359200161

Name: Anonymous 2014-04-23 15:38

>>20
That doesn't look like /prog/lodyte spam, that looks like actual vengeful spam. Maybe the guy who thought Admin-kin was FN-kin and spammed here.

Name: Anonymous 2014-04-23 18:51

By the way, if you guys find any really good old threads, it might be a fun idea to replay them in /lounge/ (with proper introductions, so that people don't get confused). These are like scripts of old plays, and it would be nice to put on a performance once in a while.

Name: Anonymous 2014-04-23 19:54

I farted and no-one liked the smell

Name: Anonymous 2014-04-24 1:12

>>22
What is the time complexity of a PCRE?

Name: Anonymous 2014-04-24 5:30

>>24
exponential in the worst case. What expression are you considering?

Name: Anonymous 2014-04-24 7:12

scrape, not scrap! One letter makes a huge difference...

Name: L33tUK 2014-04-24 8:56

I hope it was a /prog/rider who hacked 4chan recently as retaliation against this.

Name: Anonymous 2014-04-24 10:57

>>27
From what I've gathered, an imageredditor got obsessed with some random slut and hacked 4chan for that reason. That doesn't seem like something a true /frog/anus would do.

You're a milkribs!

Name: Anonymous 2014-04-24 11:00

>>24
O(anger)

Name: Anonymous 2014-04-24 13:21

>>28
What if that slut was Leah?

Name: Anonymous 2014-04-25 3:58

>>27
If I had access to 4chan I would have done something more interesting than brag and leak ip addresses. Still, it was fun to see moot likes cocks from the account he uses to admin post on 4chan attention whore from a mass of moot worshippers. Inserting obfuscated code to world4ch to continue working but appear frozen from ips used by staff would have been nice. But again, this just continues reliance on a site operated by people who have no respect for us.

Name: Anonymous 2014-04-25 20:20

>>31
The one who gained access (or one pretending to be him) has stated that his access was limited to a few actions, and that he did as much as he figured out how to do. He didn't even have access to board-creation.

Name: Anonymous 2014-04-26 0:22

>>32
He should have been more patient. If you get stuck you don't just stop and blow everything. You wait until you observe more helpful information.

Name: Anonymous 2014-05-04 21:20

Name: Anonymous 2014-05-04 22:15

>>34
thank you so much, kind anon

Name: Anonymous 2014-05-04 22:40

>>35
No problem. I'm glad it's over. It took a long time. I stopped posting updates after the subject.txt page went down temporarily. I was afraid a fagshit working for moot for free might have found this thread and was fucking with us, so I thought it was better to provide the illusion that the project was abandoned until it was completed. Now if world4ch is taken down out of spite we still have the data, so it doesn't matter. We (someone) can even host a replacement and enable posting. Are there any cool free web hosts that allow a db that's circa 2-3 GB?

Name: Anonymous 2014-05-04 23:05

>>36
Sadly, Heliohost doesn't, and any paid SOLUTION will probably cost you more than $10/mo.

Is anyone willing to continue world3ch on their VPS or seedbox? Keep in mind you're hosting fresh spammer bait.

Name: Anonymous 2014-05-04 23:07

>>37
heliohost could work for each board with multiple fake accounts. Though I would feel guilty for exploiting them like that.

Name: Anonymous 2014-05-04 23:09

>>38
It's pretty hard to create accounts on Heliohost. They seem to have seen previous attempts at this and limited the maximum number of new accounts per day to 2x10^-5.

Name: Anonymous 2014-05-04 23:12

>>39
Just create a new account right after midnight pacific time a couple days in a row.
