DistBB [PART I] got deleted (probably because my Tor used the same exit node as the spammer). I bear no resentment towards Admin for I understand how utterly shitty this board's software is.
To Admin: could you please not use the ``delete all posts by IP'' button or at least hack up some way to protect posts against such deletion? If you don't mind, I understand you're quite busy.
Here is an update on the moderation system:
SPECIFICATION
A moderation post MUST have the string "!!mod-v1" as its `email' field, and it MUST be digitally signed (see the `pk-data' field). The `body' MUST match the following grammar:
`post-id' MUST be the base64-encoded H160 of the post that is being tagged. Each `tag-index' is the zero-based index of the `tag-name' in `tag-line' that should be used to tag `post-id'.
The actual process for deciding whether to keep or delete a post is left up to the implementation.
EXAMPLE
Post "AAAA" is being tagged as "spam" and "worthless", post "BBBB" is being tagged as "spam" and "good", and post "CCCC" is being tagged as "worthless".
--begin post body
spam worthless good
AAAA 0 1
BBBB 0 2
CCCC 1
--end post body
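A rough Python sketch of how a client might read such a body. The layout is inferred from the example only (first non-empty line is the tag-line, every later line is a post-id followed by tag indexes); the function name `parse_moderation_body` is made up for illustration and is not part of the spec.

```python
def parse_moderation_body(body):
    """Parse a moderation post body into {post_id: [tag_name, ...]}.

    Assumed layout (inferred from the example, not from the grammar): the
    first non-empty line is the tag-line, and each following line is a
    post-id followed by one or more zero-based tag indexes.
    """
    lines = [ln.split() for ln in body.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty moderation body")
    tag_line, entries = lines[0], lines[1:]
    tags = {}
    for post_id, *indexes in entries:
        if not indexes:
            raise ValueError("no tag-index for post " + post_id)
        names = []
        for raw in indexes:
            i = int(raw)  # raises ValueError on non-numeric indexes
            # Reject negatives too; Python would count them from the end.
            if not 0 <= i < len(tag_line):
                raise ValueError("tag-index out of range: " + raw)
            names.append(tag_line[i])
        tags[post_id] = names
    return tags


example = """spam worthless good
AAAA 0 1
BBBB 0 2
CCCC 1"""
print(parse_moderation_body(example))
```

Signature checking and the "!!mod-v1" email check are left out here; they would have to happen before the body is trusted.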
SUGGESTED TAG NAMES
The purpose of the tags is to classify bad posts and annoyances to help people filter and control the content on their nodes. Dividing "A" and "A+" (good post) into "funny", "insightful" and so on is counterproductive since there is no reason why someone would want to configure their client to treat "funny" posts differently from "insightful" ones. The point of tags is also NOT to classify posts into topics.
Here are some suggested tag names along with their meanings:
"A+"    : Post is very good, and should be kept at all costs.
"A"     : Post is good.
"off"   : Post is offtopic (in the case of strict thematic communities) or simply does not fit in, socially. This is very vague.
"spam"  : The post is part of a disruptive flood where the contents of posts are computer-generated (e.g. the content is always the same or is chosen from a list).
"mspam" : The post is part of a disruptive flood where the contents of every post seem to be crafted manually by a human.
"prng"  : While the post is not part of a computer or human driven flood, its contents are unreadable and it seems to have come out of "/dev/random".
IMPLEMENTATION SUGGESTIONS
Here follows an example of a moderation system that can be used to decide posts' fates.
Using a system of rules, each post is assigned an integer rating. The decision of whether to keep or delete a post can be done as follows:
if the post's rating is nonnegative:
    keep it
else:
    if max(ratings of posts that reference it) >= abs(rating):
        keep it
    else:
        delete it
In table 1 we can see an example of this policy.

                  highest rating of referencing posts
                \    3   2   1   0  -1  -2  -3 n/a
                 \---------------------------------
               3 |   k   k   k   k   k   k   k   k |
               2 |   k   k   k   k   k   k   k   k |
               1 |   k   k   k   k   k   k   k   k |
post rating    0 |   k   k   k   k   k   k   k   k |
              -1 |   k   k   k   D   D   D   D   D |
              -2 |   k   k   D   D   D   D   D   D |
              -3 |   k   D   D   D   D   D   D   D |
Table 1. Deciding whether to keep or delete a post according to its rating and the ratings of the posts that reference it.
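The same policy as a small Python function, as a sketch. The name `should_keep` is made up for illustration, and an empty list of referencing ratings is assumed to correspond to the "n/a" column of the table.

```python
def should_keep(rating, referencing_ratings):
    """Return True to keep a post, False to delete it.

    rating: the post's own integer rating.
    referencing_ratings: ratings of the posts that reference it; an empty
    list is treated as the "n/a" column of Table 1.
    """
    if rating >= 0:
        return True
    if not referencing_ratings:
        return False  # nothing vouches for a negatively rated post
    return max(referencing_ratings) >= abs(rating)


# Spot checks against Table 1.
print(should_keep(-1, [1]))   # kept: a referencing post is rated 1
print(should_keep(-2, [1]))   # deleted: best referencing rating is below 2
print(should_keep(-3, []))    # deleted: no referencing posts at all
```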
An example of a system of rules used to determine posts' ratings could be the following. First see if any rule with the "force" modifier matches; if one does, then apply it directly. Otherwise, if any rule with a positive rating matches, then the maximum matching rule wins; otherwise, the minimum matching rule wins. For example, if the rules that match a post have ratings {-3,-5,-6}, then the winning rating is -6. However, if the rules that match have ratings {1,2,-3,-5,-6}, then the winning rating is 2.
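A sketch of that resolution step in Python. Two things here are assumptions the text does not settle: a post that matches no rule gets rating 0, and if several "force" rules match, the first one listed wins. The name `winning_rating` is made up for illustration.

```python
def winning_rating(matches):
    """Pick the rating for a post from the rules that matched it.

    matches: list of (rating, forced) pairs, in rule order.
    """
    if not matches:
        return 0  # assumption: an unmatched post is neutral
    for rating, forced in matches:
        if forced:
            return rating  # assumption: the first "force" rule wins
    ratings = [r for r, _ in matches]
    best = max(ratings)
    # A positive match beats everything; otherwise the worst rating wins.
    return best if best > 0 else min(ratings)


print(winning_rating([(-3, False), (-5, False), (-6, False)]))
print(winning_rating([(1, False), (2, False), (-3, False), (-5, False), (-6, False)]))
```

The two calls reproduce the {-3,-5,-6} and {1,2,-3,-5,-6} examples from the text.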
By default, posts marked for deletion are only deleted after two weeks, in order to give the user a chance to review the deletion queue; "immediate" indicates that the posts should be deleted immediately.
--begin example
%trusted self bob ken
%acquaintances john uriel
%all %trusted-users %acquaintances
%annoying off mspam
%delete spam prng
%kill-list bertrand winston
%A A A+

pk self 10
tag self %A 10
tag self %delete -11,immediate,force
tag self %annoying -5,hide

pk %kill-list -5,immediate

pk %trusted 10

tag %trusted %A 5
tag %trusted %delete -5,immediate
tag %trusted %annoying -2,hide

tag %acquaintances %A 3
tag %acquaintances %delete -3
tag %acquaintances %annoying -1,hide

tag john,ken particular-annoyance -4,hide,immediate
tag %all some-annoyance-which-I-find-amusing 10
--end example
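A sketch of how the rule lines of such a file might be split up, in Python. The syntax is not specified anywhere, so this is only one reading of the example: "pk <who> <action>" and "tag <who> <tags> <action>", with <action> being an integer rating followed by optional comma-separated modifiers. Group definitions (the % lines) are not handled, and `parse_rule` is a made-up name.

```python
def parse_rule(line):
    """Split one "pk" or "tag" rule line into its parts.

    Assumed shapes (read off the example, not a spec):
      pk  <who> <action>
      tag <who> <tags> <action>
    <who> and <tags> are comma-separated names or %groups; <action> is an
    integer rating optionally followed by comma-separated modifiers.
    """
    fields = line.split()
    if not fields:
        raise ValueError("empty rule line")
    kind = fields[0]
    if kind == "pk" and len(fields) == 3:
        who, tags, action = fields[1], None, fields[2]
    elif kind == "tag" and len(fields) == 4:
        who, tags, action = fields[1], fields[2].split(","), fields[3]
    else:
        raise ValueError("unrecognized rule: " + line)
    rating, *modifiers = action.split(",")
    return {
        "kind": kind,
        "who": who.split(","),
        "tags": tags,
        "rating": int(rating),
        "modifiers": set(modifiers),
    }


print(parse_rule("tag self %delete -11,immediate,force"))
```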
And by that I mean "decouple the reader and the postserver and keep a single post database".
Name: polite bump 2013-09-11 20:36
>>58 The point of decoupling the reader and the postserver is to allow multiple readers with different customizations and configurations (e.g. PK petnames, bookmarks, etc) efficient and direct access to the local postserver. Otherwise you can just use a web interface or an IRC gateway or a NNTP gateway or whatever one of you might come up with.
shame on you >>56-kun, you might want to read dadistbb standard
Oops, I'm >>56 and I think I made the most retarded post in history. Sorry guys.
Name: Anonymous 2013-09-12 22:11
How's the project doing, DistBB-san?
Name: Anonymous 2013-09-13 5:39
>>61 Slow because my professors decided to give us assignments all at the same time. I'll be able to do more work on the weekend.
Also I'm not sure I've settled the reader/postserver decoupling dilemma, though I'm leaning towards the simpler ``just chuck it all in the same database'' which would indeed cover the most common case in which the administrator is also the sole ``power'' user.
Also, according to server logs, nobody has checked out (in both senses of the word) my repo to review the protocol and the various specs.
What's the point of the DistBB again? Don't get me wrong, the implementation looks fun, but other than the distributed moderation (which is kind of like ``customized hellbanning''), I don't see why we can't conform with a centralized textboard in Scheme.
Please convince me to regain interest in this project again.
Name: polite bump 2013-09-14 19:14
>>69
- Uniform representation of posts. No more imprecise scraping. All clients can also act as servers.
- Tripcodes are replaced by public keys (which have very little overhead thanks to the magic of DJB's Ed25519).
- Centralized textboard means centralized moderation. The moderator has to act sometimes, to remove things like spam or questionably legal content (esp. if they live in a shitty country). It is also psychologically difficult for a moderator not to turn into a gigantic ``faggot''. (The Admin of this board seems to be doing fine so far.) If and when that happens, the community is torn between staying and moving elsewhere.
- Don't put all your eggs in the same basket. If the administrator/moderator gets hit by a truck (or by less lethal real life circumstances) and can no longer host the board (or the post db gets corrupted/deleted and the admin was too dumb to make regular backups), the community must relocate in a rush, at which point they are once again at risk of ````faggot'' moderator syndrome''.
- Some communities (not /prog/) rely on active moderation, and they are most at risk of ````faggot'' moderator syndrome''. This should help them as well.
>>69 The ``customized hellbanning'' is also very flexible; while one person may want to not see FV's or nikita's or mentishit's posts, they may happen to find tdavis' posts very entertaining.
>>73 Just go there yourself and commit, you can get all the updates you need, and actually work on the documentation, which needs to be polished.
Name: the distbb guy 2013-09-18 7:01
>>73 University work just kicked in, yo. I haven't worked much on it, but (interestingly) things have settled in my mind as to the design. Maybe my subconscious mind was working on it during sleep.
There is one component I still need help with. I need a simple CAPTCHA system for one-shot posting over Tor or anonymization services over the public HTTP gateway. This is critically important because it's the way you can announce your node so that other people pull posts from you (by default, the policy is to not pull posts during sync if you can't identify the peer you are communicating with).
Name: Anonymous 2013-09-18 14:28
>>75 So like in OTR or FISH (cipher), where a common key/password is made for the room or DHT, and if they don't have it (from solving the CAPTCHA), all they see is garbage?
>>76 What? No, like I said, the CAPTCHA thing is just to prevent people posting over anonymization networks from massively spamming. Instead, other people's postservers connect to you and pull the posts from you; this means that if you send them lots of spam, they'll be able to filter and delete your posts by source.
The profs have ganged up on me with assignments again. I should be able to work on DistBB towards the middle of the week, in that short gap between submitting an assignment and getting a new one.
Sorry, but it turns out that the week I was planning to work on it got filled with three superlong assignments and preparation for four exams next week. FUCK. I am so sorry guys :(.
Name: Anonymous 2013-11-11 17:00
bampu pantsu for those who want to read the specs of DistBB
Name: the distbb guy 2014-01-07 17:35
I haven't forgotten about you. University work actually never stopped and the only non-university-work-related thing I did during my winter break is sleep (much overdue).
Also nobody contributed anything new to the related DistBB protection-against-DOS-by-very-high-volume-posting, come on guys.
>>83 Didn't you post the first draft a few months ago? Or was that someone else?
> University work actually never stopped
Do you live in one of those fancy first world countries where you start each year after summer is over? Because I'm sure as hell I don't have anything university-related to do in winter break, but that's because we start our years in February or so.
Name: Anonymous 2014-01-07 18:28
>>84 And by draft I meant first alpha version of the protocol. Sorry I didn't make myself clear, though you probably knew what I meant.
Name: Anonymous 2014-01-07 19:49
>>83 I barely remember making the obvious suggestion of using a proof-of-work system and other people commenting on it.
>>87 That one. Are the alternatives proposed there not good enough? Isn't scrypt precisely a memory-hard non-parallelizable (not completely immune to parallelization but kinda) function with a difficulty parameter?
>>88 One issue is that an autistic spammer with a cluster (or botnet army of infected Windows machines) could easily have more computing power than most of us, forcing the rest of us to increase the difficulty parameter beyond an acceptable delay. Or maybe I'm overthinking this. I don't know.
> scrypt
scrypt is a PBKDF, which means that the computation time is equal to the verification time; you want the proof-of-work verification to be blazing fast or else an attacker could easily perform a DoS. I'm actually not aware of any CS problem that is both unparallelizable and whose solution is easy-to-verify by the poser of the problem (there might actually not be any for theoretical reasons). Huh, I should probably go ask one of my professors about this.
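To make the "verification must be cheap" point concrete, here is a hashcash-style sketch in Python: finding a nonce costs about 2**difficulty hashes on average, while checking one costs a single hash. This is a swapped-in illustration, not the DistBB scheme, and it is trivially parallelizable, so it does nothing for the unparallelizable requirement. The names `solve`, `verify` and `leading_zero_bits` are made up.

```python
import hashlib
import itertools


def leading_zero_bits(digest):
    """Count the leading zero bits of a bytes digest."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()


def verify(challenge, nonce, difficulty):
    """Check a nonce with a single hash, whatever the difficulty."""
    digest = hashlib.sha256(challenge + str(nonce).encode()).digest()
    return leading_zero_bits(digest) >= difficulty


def solve(challenge, difficulty):
    """Brute-force a nonce; about 2**difficulty hashes on average."""
    for nonce in itertools.count():
        if verify(challenge, nonce, difficulty):
            return nonce


nonce = solve(b"distbb", 12)          # a few thousand hashes on average
print(verify(b"distbb", nonce, 12))   # one hash
```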
>>89 Unequal computing resources is a problem for that, yeah. A large gap in computing power can exist between well intentioned users and spammers, both in parallel and sequential speed.
>>94 You can parallelize factoring arbitrary composite numbers, but if there are only two factors, you don't gain anything from it.
Name: Anonymous 2014-01-14 21:26
>>95 Divides?(prime, number) can be evaluated in parallel over a large set of primes in the naive algorithm. I assume this generalizes to the fancier algorithms in one way or another.
Name: the distbb guy 2014-01-16 2:15
I think I've figured out some things (in particular DoS-related and on taking the risk out of accepting new nodes to the network).
The latter is a risk since a misbehaving node can pollute the distributed post database nastily and the cleanup would be manual and tedious.
Name: Anonymous 2014-01-16 2:31
>>97 anonymity might be harmed, but the source node that data came from can be kept with the posts. So if it looks like a lot of spam is coming in, the operators can look at the source node, and blacklist that one as malicious. Then they can use the node source labels to group the spam posts and remove all of them if needed.
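A tiny Python sketch of that idea, assuming each stored post carries the id of the node it was pulled from. The (post_id, source_node) pair layout and the function names are made up for illustration.

```python
from collections import defaultdict


def group_by_source(posts):
    """Group post ids by the node they were pulled from.

    posts: iterable of (post_id, source_node) pairs.
    """
    groups = defaultdict(list)
    for post_id, source in posts:
        groups[source].append(post_id)
    return groups


def purge(posts, blacklist):
    """Return only the posts whose source node is not blacklisted."""
    return [(pid, src) for pid, src in posts if src not in blacklist]


posts = [("p1", "nodeA"), ("p2", "nodeB"), ("p3", "nodeB"), ("p4", "nodeB")]
print(dict(group_by_source(posts)))   # nodeB stands out by volume
print(purge(posts, {"nodeB"}))        # drop everything that came via nodeB
```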
It's not memory-hard though and someone could definitely make an FPGA/ASIC to speed it up massively, or use individual cores in their GPU. But at least it's something.
Based on the SEMATECH National Technology Roadmap for Semiconductors (1997 edition), we can expect internal chip speeds to increase by a factor of approximately 13 overall up to 2012, when the clock rates reach about 10GHz.
( ≖‿≖)
Name: the distbb guy 2014-01-16 15:52
>>102 It could also be run on a GPU; squaring integers modulo a 2048-bit semiprime does not take that much memory.
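The scheme under discussion appears to be repeated squaring modulo a semiprime, as in time-lock puzzles; assuming that reading, here is a toy Python sketch. The solver has to do t squarings one after another, while whoever picked the primes can shortcut by reducing the exponent modulo phi(n). The two Mersenne primes are far too small for real use (the thread talks about a 2048-bit modulus) and the function names are made up.

```python
def solve_slow(x, t, n):
    """Solver: t sequential squarings of x modulo n."""
    for _ in range(t):
        x = x * x % n
    return x


def solve_fast(x, t, p, q):
    """Puzzle poser: knows p and q, so can reduce 2**t modulo phi(n).

    Relies on Euler's theorem, so x must be coprime to n = p * q.
    """
    n, phi = p * q, (p - 1) * (q - 1)
    return pow(x, pow(2, t, phi), n)


p, q = 2**31 - 1, 2**61 - 1   # toy primes, for illustration only
t = 100000                    # number of sequential squarings
print(solve_slow(3, t, p * q) == solve_fast(3, t, p, q))
```

As noted in the thread, each squaring needs little memory, so nothing here stops a GPU or ASIC from running many instances side by side.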
Also the board software itself might become fairly popular, thus increasing the likelihood/benefit of some shithead doing this.
>>104 Please don't leave the third world computer users behind ;_;
Name: the distbb guy 2014-01-16 16:47
I think I came up with something pretty good that can use any memory-hard hash function as a primitive (hell, I might even publish a paper on this). The downside is that it's kind of interactive (client, server, client, server, client). Verification time is O(log(n)) primitive operations, and the probability of managing to 'cheat' it is very small. There is also an O(log^2(n)) transfer from the client to the server for the verification. 'n' is the time hardness setting.
>>105 The entire point of this is to make all computers about equally bad at solving the challenge.
You are welcome (and quite encouraged) to help make a CAPTCHA system as an alternative to the cryptopuzzle thing.
Okay now all I need is a sequential memory-hard function that can take between 4 and 16 MiB of RAM and runs in less than 0.01 seconds. I don't think scrypt is appropriate.
PS. In fact I don't think one even exists. MD5'ing a 1 MiB block takes 0.005 seconds here. However it seems that my construction itself is memory-hard (weakly; I'll explain more about this later). I'll investigate further.
Name: Anonymous 2014-01-19 3:53
re-reverse necr-ro
Name: the distbb guy 2014-01-20 21:34
Not sure what the ratio of CPU time to memory accesses should be. Anyway, I've made up a construction which, for a memory parameter of 64 MiB, has the following approximate characteristics:
- requires 10^6 sequential hash invocations, each requiring 10 unpredictable (sequential-enforcing) memory accesses, each of the same size as the hash digest size
- takes about 25 seconds to execute
- the challenge size is negligible
- the challenge response size is 10 KiB
- challenge response verification takes 20 hashes
- the probability of success in cheating is negligible
The current hash is SHA256, but I'm afraid it's too slow and makes the ratio of hashing time to memory bandwidth too large, which could enable parallel attacks (i.e. got a GPU and 4 GiB of spare RAM? why not run 64 of them at the same time?). I'm not sure about this though. I could increase the memory bandwidth requirements at the cost of making challenge responses larger (for example, for a total of memory transfers of 4 GiB, challenge response size would be increased to about 40 KiB).
Name: the distbb guy 2014-01-21 0:57
I think I'll replace SHA256 by Tiger for extra speed. I still don't think the memory bandwidth utilization is sufficient though.
Name: the distbb guy 2014-01-21 19:20
I have some work to do now. Could someone please review the scrypt paper in more detail and tell me what amount of memory transfers it performs as a function of the CPU and memory hardness parameters?
Name: Anonymous 2014-02-08 0:46
Just did a quick search for distributed bulletin board, for the hell of it
This looks like it has most of the basic features we were planning for distbb. We could just use it and hack in the new stuff for moderation. Looks like hashcash is in there as well. The only problem is it's written in java. Should we use this or write an alternative?
forgive me if I'm being naive, but couldn't you just make each peer check if the attacker's ips are the same and ignore them if so?
Name: Anonymous 2014-02-09 10:28
>>123 One of the design goals is to allow anonymous networks, in which case there is no id to identify the source.
Name: Anonymous 2014-02-09 10:42
>>124 Oh, I see. Then couldn't you make a new id, a random string, created and saved locally and send it with every request as an identifier? It could be re-created every couple of days
Name: Anonymous 2014-02-09 10:47
>>125 A spammer could use modified local software to generate many ids to try multiple simultaneous posts. There needs to be some proof of work for an id. For instance the server could send a captcha to solve, the client can respond with a solution, and if it matches, the server could respond with an id created for that client. Another way is for the id to be computationally expensive to create, which gives the spammer a cost associated with excessive posting.
Name: Anonymous 2014-02-09 11:06
>>126 how computationally expensive? even if it is 1 minute, the attacker could simply open all the programs at once and wait
if you mean memory expensive, it could schedule to use all ram to id one program, then when it's done it flushes and starts the other and so on
>>127 That's a decent point. Proof of work systems are more effective against the scenario where a single spammer sends a spam email to millions of addresses. A forum can still be spammed pretty hard by a slow posting bot.
Name: Anonymous 2014-02-10 1:48
Ok, I've read the thread, how about this:
a hash function that takes 512MB of RAM and 10 seconds to compute
then each guy generates his id easily, a random string, then computes this string's hash (and saves it locally, whatever)
then all he needs to do to authenticate is select a peer at random, get his id and compute his id's hash, then send it to everybody (who'd already know it because the peer sent it to them)
then the ids of everybody could reset every couple of days so nobody makes a dictionary of them
Name: Anonymous 2014-02-10 2:09
nvm, I think I'm wrong, those numbers still seem small
What'd you guys use the hash function on?
>>130 I haven't yet published the details of the memory-hard sequential proof-of-work system I came up with for DistBB (because I've barely had the time to sleep, let alone work on it).