Masstuffy is an object-storage server that utilizes WARC files.
At the moment, Masstuffy IS NOT FUNCTIONAL and is far from being complete. I am developing it to learn Rust and because I couldn't find any self-hosted object storage solutions that meet my criteria:
- Standalone
- Doesn't waste inodes with millions of files
- Optimized to run on a single machine
- Compresses small objects efficiently
After exploring WebArchive, I realized that the WARC file format perfectly suits my needs. It is seekable, compressed, can contain metadata, and is easily extensible.
- Repository: This is where collections are stored.
- Collections: A collection is a set of records (or objects), which are essentially WARC files.
Before storing records, you need to create a collection. Depending on what you plan to store, you can enable compression (and use a dictionary if necessary) and then insert all the objects you want.
Additionally, there will be an option to generate the dictionary after the collection has been created. In this case, objects will be stored without compression in a temporary folder. Once a certain threshold is reached, the dictionary will be generated.
Masstuffy is licensed under the Affero General Public License (AGPL).
This means you are free to use, modify, and distribute the software,
provided that any modifications are also released under the same license
and that any network services built using Masstuffy also make their source code available.
For more details, please refer to the LICENSE file.
- database
- read and load cdx files to db
- mark for deletion
- search
- by id
- by url
- collections
- create
- load
- generate cdx files
- regenerate cdx files
- read
- append
- fs atomicity
- compression
- compress
- dictionnary
- regenerate cdx files
- dictionnary generation
- make async
- cli
- setup file layout
- create collection
- create
- custom dictionnary
- add records
- get record
- search records
- detect when the server runs and send commands to it
- server
- link to source code (AGPL requirement)
- create collection
- add records
- search records
- report records
- offload decompression (client-side decompression)
- get record(s)
- by id
- by url