

Syncing unknown number of files between multiple servers

Name: Anonymous 2018-05-22 19:59

Imagine we have servers A, B, C. Each server has a directory with multiple CSV files where each file may or may not update between syncs by appending rows. Admins of servers A, B, C want to share these files with each other but do not necessarily trust one another. No CSV file will ever exceed 1MB in size and all servers have good internet speeds.

For an added challenge, there may be a variable number of subdirectories containing a variable number of files.

I'm thinking of a number of ways to go about syncing:
(1) vsftpd + curlftpfs: mount the remote server's directory and use simple UNIX tools to check for new and updated files. Anonymous users may only download files.
(2) Git: the directory with the public files is a repo that gets committed automatically on changes, and remote servers can pull the latest updates automatically.
(3) Atom feeds: each server announces added records within files, or new files, to a server-wide changelog; remote servers use existing Atom libraries to sync changes.
(4) RESTful API: a very basic changelog records which files were updated and when, and they can be pulled over HTTP. If-Modified-Since and similar HTTP caching patterns are employed.
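A rough client-side sketch of (4). The changelog format here (one `ISO-timestamp,relative/path` line per update) is made up for illustration; the actual fetch over HTTP is left out:

```python
from datetime import datetime, timezone

def parse_changelog(text):
    """Parse 'ISO-timestamp,relative/path' lines into (datetime, path) pairs."""
    entries = []
    for line in text.strip().splitlines():
        ts, path = line.split(",", 1)
        entries.append((datetime.fromisoformat(ts), path))
    return entries

def files_to_fetch(changelog_text, last_sync):
    """Return paths updated strictly after our last successful sync."""
    return [path for ts, path in parse_changelog(changelog_text)
            if ts > last_sync]

# Example: two updates, only one newer than our last sync.
log = ("2018-05-22T19:00:00+00:00,a/foo.csv\n"
       "2018-05-23T06:00:00+00:00,bar.csv")
last = datetime(2018, 5, 23, 0, 0, tzinfo=timezone.utc)
print(files_to_fetch(log, last))  # ['bar.csv']
```

Each file named by `files_to_fetch` would then be requested with If-Modified-Since as a second guard.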

How would /prog/ go about this? It's not for an assignment and not for work -- just for a personal project.

Name: Anonymous 2018-05-22 20:01

use nntp

Name: Anonymous 2018-05-22 22:45

torrentfs

Name: Anonymous 2018-05-23 0:07

>>3
pretty cool

Name: Anonymous 2018-05-23 0:55

do it in hasn'tkell

Name: Anonymous 2018-05-23 3:02

>>4
pretty slow

Name: Anonymous 2018-05-23 6:28

https://en.wikipedia.org/wiki/Versioning_file_system

this is what you're looking for, OP

Name: Anonymous 2018-05-23 18:54

rsync

Name: Anonymous 2018-05-31 4:22

SQL database with a FUSE driver that presents tables as CSV files. Translating read / write to select / insert is fun; keeping the data store consistent is not. Fool around with the former and get the latter for free.
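Minus the FUSE plumbing, the read/write to select/insert translation can be sketched like this (sqlite3 in memory and the `users` table are just stand-ins):

```python
import csv
import io
import sqlite3

# In-memory stand-in for the backing store the FUSE driver would talk to.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (name TEXT, age INTEGER)")

def write_csv_row(table, line):
    """What the driver's write() might do: parse one appended CSV line, INSERT it."""
    row = next(csv.reader(io.StringIO(line)))
    placeholders = ",".join("?" * len(row))
    # Interpolating the table name is fine for a sketch; a real driver
    # would validate it against the schema first.
    db.execute(f"INSERT INTO {table} VALUES ({placeholders})", row)

def read_csv(table):
    """What read() might do: SELECT everything and serialize it back to CSV."""
    out = io.StringIO()
    csv.writer(out).writerows(db.execute(f"SELECT * FROM {table}"))
    return out.getvalue()

write_csv_row("users", "alice,30")
print(read_csv("users"))  # alice,30
```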

Name: Anonymous 2018-05-31 18:31

>>9
but sql is cancer

Name: Anonymous 2018-05-31 18:48

Tired: DROP TABLE users;

Wired: SELECT * FROM users WHERE 1=1;

stay woke famalam

Name: Anonymous 2018-06-01 3:18

Double-hash each file, and store the double-hashes with filenames in a directory file.
Then double-hash the directory file.
If a client/server wants to sync, send the directory double-hash first.
If it doesn't match, the client requests the directory file.
The client/server can then compare the hashes in the synced directory file against its own directory file (all precomputed, purely equality testing).
For each mismatch, the client requests the new data file.

Double-hash is defined as hash(hash(file)+file) as a collision-prevention mechanism.
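The scheme above, sketched in Python with SHA-256 standing in for the unspecified hash:

```python
import hashlib

def double_hash(data: bytes) -> str:
    """hash(hash(file) + file), per the scheme above; SHA-256 is an assumption."""
    inner = hashlib.sha256(data).digest()
    return hashlib.sha256(inner + data).hexdigest()

def directory_file(files: dict) -> bytes:
    """Map of filename -> contents, rendered as sorted 'hash filename' lines."""
    lines = sorted(f"{double_hash(data)} {name}" for name, data in files.items())
    return "\n".join(lines).encode()

def needs_sync(mine: dict, theirs: dict) -> list:
    """Compare directory double-hashes first; on mismatch, list files to request."""
    if double_hash(directory_file(mine)) == double_hash(directory_file(theirs)):
        return []
    my_hashes = {name: double_hash(data) for name, data in mine.items()}
    return sorted(name for name, data in theirs.items()
                  if my_hashes.get(name) != double_hash(data))

a = {"x.csv": b"1,2\n"}
b = {"x.csv": b"1,2\n3,4\n", "y.csv": b"5,6\n"}
print(needs_sync(a, b))  # ['x.csv', 'y.csv']
print(needs_sync(a, a))  # []
```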

Name: Anonymous 2018-06-01 3:19

>>12
all this talk of hash makes me wanna smoke

Name: Anonymous 2018-06-01 5:13

>>10
No, it's object stores that are cancer. How many basic CRUD applications have perished because some morons thought they should make design compromises of the sort that are only necessary to serve millions of users?

The real reason NoSQL became popular is because it gave idiots who didn't understand SQL in the first place cover for their ignorance. If you have a problem getting whatever informal inconsistent OOP thing your application does mapped to relational calculus, relational calculus is not the fucking problem. OOP is shit, fix that instead.

In OP's case the data to be synced is already in tabular form anyway. It's not much extra work to do a quick and dirty DB design based on that, make whatever application is reading the CSV files now use a database connector instead and dispense with the filesystem entirely. If something really needs to consume CSV just select and dump it out; data that occupies 1MB when serialized to text is fucking nothing man.
