Home → Repo ^yEzqv → Branch master Files → projects → file-sharing.mdwn

Mirror of the Rel4tion website/wiki source, view at <http://rel4tion.org>

[[ 🗃 ^yEzqv rel4tion-wiki ]] :: [📥 Inbox] [📤 Outbox] [🐤 Followers] [🤝 Collaborators] [🛠 Commits]

Clone

HTTPS: git clone https://vervis.peers.community/repos/yEzqv

SSH: git clone USERNAME@vervis.peers.community:yEzqv

Branches

master

file-sharing.mdwn

Background

This semester (my fourth) I have a course managed using a centralized proprietary web service called “P1azza”. When I tried to download some of the course material for the first time, I noticed a problem: Most of my attempts to download a file result with an invalid file which I cannot open.

I tried downloading several PDFs and PPTXs on various occasions, both when logged in and when not, and the result was similar: Some files are downloaded successfully, but others require several attempts and some don’t work even after many attempts.

I decided I have to download them all, because in “real time” when I need them for homework, I can’t afford to waste time doings dozens of failed downloads. So I started trying them all one by one and managed to do successful downloads of most of them. Then, while sitting in a boring lecture, I had one of those sharing oriented ideas of mine: Run a system which offers these materials in a distributed manner. Not just these specific files, but any course material which students don’t have time to convert to rephrased reorganized Wikipedia articles (and their lecturers refuse to release the sources or under Creative Commons).

The old-style obvious solution is to run an FTP server. I don’t know much about these, but I do know I can access them using a fil browser like Nautilus and copy to my local machine as much as I want. Software projects use them to store release archives and univesities use them to store learning materials and submitted homework. It sounds good and is probably quite easy to setup, assuming it can be served in a darknet, but it has one major crucial flaw: It is centralized.

When I say centralized here, I mean there is no built-in way to get files from a secondary source if the primary source goes down. For example, assume you’re downloading a file from an FTP server and suddenly the server goes down. What happens? The downloaded file is corrupt in the worse case and is partial in the better case, and you don’t know where to get the file from. You have two options:

Manually do a web search and find the same file somewhere else
Wait for the FTP server to go up again

Especially for large files, but also for small text files with the same importance at least, this kind of approach is clearly not a good solution. In order to protect freedom of expression and knowledge access it must be better than this. In particular, it must allow transparent usage of several data sources which can be switched whenever some of them are down.

Network Layer

The following technologies come to my mind, which may solve the problem:

Freenet
OurFileSystem
Syndie (through I2P, Freenet, Tor)
BitTorrent
I2P BitTorrent
Gnutella
I2P Gnutella (I2Phex)
eMule
I2P eMule (iMule)

First, in order to protect the rights and the safety of the users, a clearnet (regular internet) solution must not be used, at least not as the only means of access. So let’s remove these and make an updated list:

Freenet
OurFileSystem
Syndie (through I2P, Freenet, Tor)
I2P BitTorrent
I2P Gnutella (I2Phex)
I2P eMule (iMule)

Good. Now, since Syndie is meant for discussion and not for persistent large file storage, I prefer to not count on it for this kind of requirements. I’ll need content to be organized by meaning and not by time of submission, etc.

In addition, since eMule, Gnutella and BitTorrent provide the same functionality, and BitTorrent seems to be the most popular and technologically advanced in general, I’ll make my life easier for now and save time of deep research by dropping Gnutella and eMule for now.

Now, there’s something I haven’t considered: The usage of distributed hash tables (DHT). With this technology, torrents don’t need trackers which know all the peers of each torrent they track. I still need to understand how it works, how well it works and whether it is really enough to just run a DHT-supporting client to use it and discover other peers without a tracker.

OurFileSystem probably works in a way similar to Syndie, but I didn’t check exactly how: It may be good for my purpose. I tried it one, but it used its own I2P router instead of my running one, so I couldn’t get it to run properly. I will try again.

Final list to start working with:

Freenet
OurFileSystem
I2P BitTorrent with trackers
I2P BitTorrent with DHT

Content Layer

The content layer in this system has three functions:

Locate specific content
Query the network for content with given parameters
List the available content

I don’t know how searching works in Freenet and in DHT, but Transmission supports DHT and there is no option to search. I need to run Freenet first to see how it works, but a far as I know, there is no flexible semantic search like what I want to have. So let’s assume the search is provided by a torrent index. In the usual fashion, this search will work using a distributed semantic database.

Semantic: It contains information about the files shared, in exactly the same way information about regular local file content is expressed using Kiwi ontologies
Distributed: There is no single server holding this database, and all servers communicate transparently, so the client can connect to any of them and transparently search through all of them as one

Since the torrent index itself is supposed to change all the time, BitTorrent is not a good way to share it. Something else must be used, for example store the information as a single huge Kort file placed in a git repository, and synchronize it between the servers all the time. Anyway, I can start with a single index server, and extend later to server federation. It’s quite simple, actually: I just need a semantic datastore. It can even save the torrent files directly inside the database (or as files, whichever is faster).

Then I can have a GUI which displays results. It can be either a website which runs a server script and produces static HTML (e.g. using CGI) or a desktop app which queries the network-accessible database. The second option sounds like much better design because anyone can do anything with the database, i.e. flexibility and openness, but I need to make sure it’s safe first, i.e. no hole in the server will allow someone to crack it.

Hmmm… actually, a desktop app just for one website? Sounds unusual, doesn’t it? Fine, here’s an idea: Supply both a server interface for remote queries and a web interface for users.

User Management

How is content added to the index? As long as it’s just me, no user needs to have write permissions and things are simple. But I do want to allow people to contribute, which means UPDATE commands sent to the database. How do I allow users to prove their identity so I can trust the content they want to add?

In I2P there is no way to block people, because of the anonymity it provides. The client tunnel address can change easily, so there’s no way to continuously block someone. However, I can use SSH keys to authorize people to make changes, and revoke a key if the owner adds spam to the index. It also means the process of becoming a contributor is a bit longer: You need to send me your SSH key first. It can be automated of course.

Any user can be accepted, and then the community can protect the database by:

Marking content as reliable
Marking content as spam

The usage of SSH keys makes things safer than plain passwords, because it’s way harder to impersonate someone by guessing their details.

As to the files themselves, it seems there’s a chance to use magnet links themselves with the trackerless torrents, thus saving database disk space. I need to try sharing a trackerless torrent as a magnet link to verify it works.

User Interface

I want to see what I2P torrent tracker interfaces look like. This time I won’t be pasting screenshots and doing much written observation, because it’s very early for that. But I want to see. Here are the trackers which have a web interface for downloading torrent/magnet files:

diftracker
postman

That’s all I could find. The others seem to be just trackers, without an index.

[See repo JSON]