Chunk-deduplicated backup system with differential push/pull functionality, encryption and compression.

山 (yama): deduplicated heap repository

Yama is a system for storing files and directory trees in 'piles'. The data stored is deduplicated (by using content-defined chunking) and can be compressed and encrypted, too.

Not yet implemented: Yama will also permit storing to piles on remote computers, using SSH.

Yama is intended for use as a storage mechanism for backups. Datman is a tool to make it easier to use Yama for backups.

The documentation is currently the best source of information about Yama; see the docs directory.

Yama can be used as a library for your own programs; further information about this is yet to be provided but the API documentation (Rustdocs) may be useful.

Other, unpolished, notes

Training a Zstd Dictionary

zstd --train FILEs -o zstd.dict

  • Candidate size: find ~/Programming -size -4k -size +64c -type f -exec grep -Iq . {} \; -printf "%s\n" | jq -s 'add'
  • Want to sample:
    • find ~/Programming -size -4k -size +64c -type f -exec grep -Iq . {} \; -exec cp {} -t /tmp/d/ \;
    • du -sh
    • find > file.list
    • wc -l < file.list → gives the number of lines
    • shuf -n 4242 file.list | xargs -x zstd --train -o zstd.dict trains on a random sample of 4242 files. This chokes if it receives a filename containing a space; just re-run until you get a working set.
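
The sampling steps above could also be written as one pipeline that tolerates spaces in filenames. This is only a rough sketch, assuming GNU find, shuf and xargs (for -print0, -z and -0); the function name sample_small_files and its two arguments are made up for illustration and are not part of Yama.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Yama: print a NUL-delimited random sample
# of up to $2 files found under directory $1.
# Assumes GNU find and GNU shuf.
sample_small_files() {
    local dir=$1 count=$2
    # -size -4k -size +64c : the same size filter as in the notes above.
    # grep -Iq .           : keep only files that grep treats as non-binary
    #                        and that contain at least one character.
    # -print0 / shuf -z    : NUL-delimited names, so spaces are harmless.
    find "$dir" -type f -size -4k -size +64c \
        -exec grep -Iq . {} \; -print0 \
        | shuf -z -n "$count"
}

# Usage (assumes zstd is installed); xargs -0 splits its input on NUL bytes:
#   sample_small_files ~/Programming 4242 | xargs -0 -x zstd --train -o zstd.dict
```

If fewer matching files exist than requested, shuf simply emits all of them, so the count is an upper bound rather than a guarantee.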