Name: Anonymous 2014-01-17 21:15
From /lounge/
Sounds like a /prog/ challenge. First thoughts: build word frequencies for every post. Then loop over every post and every other post. For each pair of posts construct a link strength based on the word frequencies, then cull under a threshold, then the remaining networks represent identities (i.e. a human poster). You'd need some way of ensuring mutual exclusion too idk.
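A rough sketch of the above in Python, not a serious stylometry tool. Everything named here is made up for illustration: `word_freq`, `cosine`, `group_posts`, and the 0.5 threshold are assumptions, and cosine similarity is just one possible "link strength". For mutual exclusion it takes the connected components of the culled graph using union-find, so every post lands in exactly one group.

```python
import math
import re
from collections import Counter
from itertools import combinations


def word_freq(post):
    """Lowercased word counts for one post."""
    return Counter(re.findall(r"[a-z']+", post.lower()))


def cosine(a, b):
    """Cosine similarity of two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def group_posts(posts, threshold=0.5):
    """Return lists of post indices, one list per guessed identity."""
    freqs = [word_freq(p) for p in posts]
    parent = list(range(len(posts)))  # union-find forest, one node per post

    def find(i):
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Every post against every other post: keep links at or above the threshold.
    for i, j in combinations(range(len(posts)), 2):
        if cosine(freqs[i], freqs[j]) >= threshold:
            parent[find(i)] = find(j)

    # Each surviving connected component is one pseudo identity.
    groups = {}
    for i in range(len(posts)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Calling `group_posts(list_of_post_strings)` gives back disjoint lists of post indices. Two caveats: the pairwise loop is quadratic in the number of posts, and connected components chain, so if A links to B and B links to C then A and C end up in the same group even when they look nothing alike.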
A small userbase makes it easier to group posts together into a pseudo-identity, which may then eventually be deanonymized. But we are much harder targets than redditors and twitterers.