A Chord-based distributed file system that maintains state through a replicated log. Built using a bunch of other junk that I built. Explanation with graphics at the bottom!
Uses:
- Chord-ish as a membership and failure detection layer.
- Leeky Raft as a consensus layer.
Demos:
- Starting the Raft cluster.
- Electing a new leader on leader failure.
- Setup the network.
  - `docker network create dfs-net`
- Start 1 + `worker_count` nodes.
  - `docker-compose build && docker-compose up --remove-orphans --scale worker=<worker_count>`
  - Recommended `worker_count` ~= 5. CPU utilization is high across all three components, so expect some sluggishness.
- Build & run client with
  - `docker build --tag client . -f ./dockerfiles/client/Dockerfile; docker run --rm -it --network="dfs-net" client /bin/sh -c ./dfs`
  - `> put go.mod remote`
  - `> get remote local`
- Available client commands are listed below in Client Commands.
`config` files for each component can be found inside `/config`. Mappings are as follows:
- `config.dfs.json`: Distributed File System Layer
- `config.fd.json`: Membership/Failure Detection Layer
- `config.raft.json`: Consensus Layer
Client Commands:
- `put localfilename sdfsfilename` (from local dir): `put` both inserts and updates a file.
- `get sdfsfilename localfilename` (fetches to local dir)
- `delete sdfsfilename`
- `ls filename` (list all machines where this data is stored)
- `store` (list all files stored on this machine)
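
For example, a client session exercising each command might look like the following (file names here are made up for illustration):

```
> put notes.txt remote-notes
> ls remote-notes
> get remote-notes notes-copy.txt
> delete remote-notes
> store
```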
Chord-ish DeFiSh works by combining three separate layers, each of which I built from scratch and each of which is coated in an alarming amount of my own blood, creaking from the rust that accumulated as my tears and sweat got all over them. They are listed below in order of their role in placing a user's file onto the distributed filesystem.
- Chord-ish, the membership layer. The membership layer lays the foundation for everything by assigning nodes/servers in a network onto some "virtual ring", giving each a distinct ID number as a function of its IP address. Each node then begins heartbeating to some number of nodes around it, setting up a system that allows them to gossip membership information and become aware of failures. (A sketch of the ring ID assignment follows this list.)
- Leeky Raft, the consensus layer. A client sends commands, or entries, to the consensus layer. These commands are similar to HTTP verbs. For example, the command to put the file `test.txt` onto our distributed filesystem with the name `remote.txt` would be expressed as `"PUT test.txt remote.txt"`. The consensus layer then replicates this entry to all other nodes in the network. On confirmation that the replication was (mostly) successful, it sends the command to the filesystem layer. (A sketch of a replicated log entry follows this list.)
- Chordish DeFiSh, the filesystem layer. The filesystem layer receives the command from the consensus layer and begins executing it. It assigns the file a distinct ID number as a function of its filename, using the same method as the membership layer. It then stores the file at the first node with an ID greater than or equal to the file's. If no node's ID is greater, it wraps around the ring and tries to find a node there. Files are replicated to the 2 nodes directly "ahead" of that node. Files are stored as actual files in each node's filesystem, and as `filename:sha1(file data)` maps in the runtime memory of each Chordish DeFiSh process, as a fast way to check for file ownership and to save time by ignoring write requests for a file it already has. (A sketch of the placement rule follows this list.)

From there, users can upload, delete, or download files from the file system. The visuals below will explain how this all works, sort of.
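
To make the ring placement concrete, here is a minimal Go sketch of the kind of hashing described above: a key (a node's IP address, or a filename in the filesystem layer) is run through SHA-1 and reduced onto a fixed-size ring. The ring size, the names, and the choice of using the first 8 bytes of the digest are illustrative assumptions, not Chord-ish's actual implementation.

```go
package main

import (
	"crypto/sha1"
	"encoding/binary"
	"fmt"
)

// ringSize is an assumed ring modulus; the real layer may size its ring differently.
const ringSize = 1024

// ringID hashes an arbitrary key (an IP address for nodes, a filename for files)
// onto a position in [0, ringSize).
func ringID(key string) int {
	sum := sha1.Sum([]byte(key))
	// Interpret the first 8 bytes of the digest as an unsigned integer,
	// then wrap it onto the ring.
	return int(binary.BigEndian.Uint64(sum[:8]) % ringSize)
}

func main() {
	fmt.Println(ringID("10.0.0.3"))   // a node's position on the ring
	fmt.Println(ringID("remote.txt")) // a file's position on the ring
}
```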
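
Here is a minimal sketch of what a replicated log entry carrying one of those client commands could look like. The `Entry` struct and its fields are assumptions for illustration and are not taken from Leeky Raft's source.

```go
package main

import "fmt"

// Entry is one record in the replicated log (fields are illustrative).
type Entry struct {
	Term    int    // leader term in which the entry was created
	Index   int    // position of the entry in the log
	Command string // e.g. "PUT test.txt remote.txt"
}

func main() {
	entries := []Entry{
		{Term: 1, Index: 1, Command: "PUT test.txt remote.txt"},
		{Term: 1, Index: 2, Command: "DELETE remote.txt"},
	}
	// Once a majority of the cluster has acknowledged an entry, the leader
	// treats it as committed and hands the command to the filesystem layer.
	for _, e := range entries {
		fmt.Printf("committed #%d (term %d): %s\n", e.Index, e.Term, e.Command)
	}
}
```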
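
Finally, a small sketch of the placement rule: find the first node whose ID is greater than or equal to the file's ID, wrapping around the ring if necessary, then add the next 2 nodes ahead of it as replicas. The `placement` helper and the example IDs are made up for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// placement returns the primary node for a file plus its 2 successors,
// given the ring IDs of all live nodes.
func placement(fileID int, nodeIDs []int) []int {
	sort.Ints(nodeIDs)
	// First node with an ID >= fileID; wrap to the start of the ring if none.
	i := sort.SearchInts(nodeIDs, fileID)
	if i == len(nodeIDs) {
		i = 0
	}
	out := make([]int, 0, 3)
	for k := 0; k < 3 && k < len(nodeIDs); k++ {
		out = append(out, nodeIDs[(i+k)%len(nodeIDs)])
	}
	return out
}

func main() {
	nodes := []int{100, 250, 400, 700, 900}
	fmt.Println(placement(420, nodes)) // [700 900 100]
	fmt.Println(placement(950, nodes)) // wraps around: [100 250 400]
}
```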