bup: git for backups - ccc
bup: Git for backups
#bup #28c3
Zoran Zaric
- @zoranzaric
- Computer Science student at TU Darmstadt
- bup since April 2010
toc
1. Motivation
2. Git backgrounds
3. bup
   3.1 Features
   3.2 Algorithms & data structures
Motivation
- Space efficiency of backups
- Convenient access to backups
- Safety against bit rot, filesystem errors, and media errors
- Safety against history changes
Git
- Distributed version control system
- Content addressed
- Immutable objects
- Snapshot-based instead of diff-based
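
To make "content addressed" concrete, here is a minimal sketch of how Git derives the ID of a blob: it hashes a short header plus the raw bytes with SHA-1. The function name `git_blob_id` is made up for this illustration and is not part of Git or bup.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git assigns to a blob with this content."""
    # Git hashes a short header ("blob <size>\0") followed by the raw bytes.
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


# Identical content always maps to the same ID, so it is stored only once.
# The empty blob is the object whose ID starts with e69de29.
print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Because the ID depends only on the content, an object cannot be changed without its ID changing as well, which is what makes the objects immutable.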
Git: A Repository
- BLOBs: e69de29
    Hello World
- Trees: 82e3a75
    100644 blob 5e1c309dae7f45e0f39b1bf3ac3cd9db12e7d689 README
    100644 blob 39c8418e04721b9a30232ce754cac8d9ee78340a DESIGN
    040000 tree 482fa65ae85c1e5bca8c091b479de60b714a4b6a src
- Commits: 3dfe461f
    tree a3d703e579dc9baae20456eb63fa49f5e4e7c9b4
    author Zoran Zaric <[email protected]> 1314498536 +0200
    committer Zoran Zaric <[email protected]> 1314498536 +0200

    Example commit
- Tags & Branches: v0.1, master
    63866463d511a245a55a57ca48efe8e67b955dec

[Figure: object graph of 3dfe461f, 82e3a75, e69de29 and 25b2be3, 78af04f, 41c28e8, with the refs master and v1.0]
Git: A Repository
- Packfiles: e69de29, 82e3a75, 3dfe461f, 41c28e8, 78af04f, 25b2be3
Git: Problems
- Slow and memory-hungry with large files
- No metadata (permissions, owners, ACLs)
bup
- Avery Pennarun (git subtree, sshuttle, redo)
- https://github.com/apenwarr/bup
- http://groups.google.com/group/bup-list
bup: Installation
$ sudo apt-get install python2.6-dev python-fuse
$ sudo apt-get install python-pyxattr python-pylibacl
$ mkdir ~/src && cd ~/src
$ git clone https://github.com/apenwarr/bup.git
$ cd bup
$ make
$ make test
$ sudo make install
bup: Examples
$ bup index -ux /home/zz
$ bup save -n laptop /home/zz
$ bup save -r myserver -n laptop /home/zz
$ bup on myserver index -ux /home/zz
$ bup on myserver save -n server /home/zz
$ bup ls laptop/latest/home/zz
bup: Features
Deduplication (http://goo.gl/aBpny)
- Benchmark with two servers and a pseudo VM image on them, with few changes
    rsnapshot: 4.97G
    bup: 2.18G
- Import of rsnapshot backups to bup
    rsnapshot: 12.6G
    bup: 4.6G
bup: Features
Metadata (almost done)
- Owner
- Exact times
- Permissions
- Extended ACLs
- SELinux
bup: Features
FUSE module
You can mount your backups and browse them with your favorite file manager
bup: Features
Web interface
bup: Features
Runs on dd-wrt
bup: Features
Import script for rsnapshot backups
More will follow (Duplicity)
bup: Features
Full compatibility with Git
Git tools like gitk or tig can be used with bup repositories
bup: Features
Uses par2 to be safe against bit rot, filesystem errors, and media errors
bup: Algorithms & Data Structures
- Hashsplitting
- Midx
- Bloom filters
Hashsplitting
- Rolling checksum
- rsync's algorithm
- Big files are split into chunks of 8 kB on average
- A new chunk starts when the 13 least significant bits of the checksum are all 1 (2^13 = 8192, hence the 8 kB average)
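
The idea can be sketched in a few lines of Python. This is a simplified illustration, not bup's actual implementation: the 64-byte window, the 16-bit sums and the names `chunk_ends` and `split` are assumptions made for this example. The number of tested bits is a parameter; 13 bits correspond to the 8 kB average, since 2^13 = 8192.

```python
import random

WINDOW = 64                      # bytes covered by the rolling checksum (assumed)
SPLIT_BITS = 13                  # 2**13 = 8192, i.e. about 8 kB per chunk
MASK = (1 << SPLIT_BITS) - 1


def chunk_ends(data: bytes, window: int = WINDOW, mask: int = MASK) -> list:
    """Return the end offset of every chunk in data."""
    s1 = s2 = 0
    ends = []
    for i, byte in enumerate(data):
        # Byte that slides out of the window (0 while the window fills up).
        dropped = data[i - window] if i >= window else 0
        # rsync-style update: s1 is the sum of the window bytes,
        # s2 a position-weighted sum of the same bytes.
        s1 = (s1 + byte - dropped) & 0xFFFF
        s2 = (s2 + s1 - window * dropped) & 0xFFFF
        if (s2 & mask) == mask:  # lowest SPLIT_BITS bits are all 1
            ends.append(i + 1)
    if not ends or ends[-1] != len(data):
        ends.append(len(data))   # the last chunk ends with the data
    return ends


def split(data: bytes) -> list:
    """Cut data into chunks at the offsets found by chunk_ends."""
    ends = chunk_ends(data)
    starts = [0] + ends[:-1]
    return [data[a:b] for a, b in zip(starts, ends)]


# Demo data: pseudo-random bytes, and the same bytes with one byte prepended.
rng = random.Random(0)
original = bytes(rng.getrandbits(8) for _ in range(200_000))
shifted = b"X" + original
```

Because the checksum depends only on the last `WINDOW` bytes, inserting a byte at the front of a file moves the later chunk boundaries by one position but leaves the chunk contents unchanged. Those chunks hash to the same IDs as before, which is what makes deduplication work on large files with small changes.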
Midx
- idx: indexes for packfiles
  - 1 idx per packfile
  - An object is found with 3-4 lookups per packfile
- Midx: one index for several packfiles
  - An object is found with 2 lookups
  - Problem: a midx has to be recreated for every change
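
The difference between the two index types can be illustrated with sorted lists of object IDs. This is only a rough sketch under simplifying assumptions: the real idx and midx files are binary formats, and the helper names `make_idx`, `find_in_idxs` and `make_midx` are made up for this example.

```python
import bisect
import hashlib
import heapq


def make_idx(objects):
    """One idx per packfile: the sorted SHA-1 IDs of its objects."""
    return sorted(hashlib.sha1(obj).digest() for obj in objects)


def contains(index, sha):
    """Binary search in a sorted list of IDs."""
    pos = bisect.bisect_left(index, sha)
    return pos < len(index) and index[pos] == sha


def find_in_idxs(idxs, sha):
    """Without a midx, every packfile's idx is searched in turn."""
    return any(contains(idx, sha) for idx in idxs)


def make_midx(idxs):
    """A midx merges the sorted idx lists into one sorted list."""
    return list(heapq.merge(*idxs))


idxs = [make_idx([b"a", b"b"]), make_idx([b"c"])]
midx = make_midx(idxs)
wanted = hashlib.sha1(b"c").digest()
print(find_in_idxs(idxs, wanted), contains(midx, wanted))  # True True
```

The merged list answers a query with a single search regardless of how many packfiles exist, but it has to be rebuilt whenever a packfile is added.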
Bloom Filters
- Probabilistic data structure
- Check if a datum is known
- Appending is possible
- False positives
  - Rate grows with added data
  - When the rate exceeds 1%, the bloom filter is expanded and rewritten
- Hash function optimized for few 1s in result
- Bloom filter is a bit array; the result is added with bitwise OR
- When a hit is found, a midx lookup is done
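
A minimal Bloom filter sketch shows the mechanics. It is not bup's implementation: the class name, the filter size, the number of hash positions and the way positions are taken from a SHA-1 digest are assumptions chosen for brevity, and the `exact_index` set merely stands in for the midx lookup.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits          # must be a multiple of 8
        self.num_hashes = num_hashes      # at most 5 with a 20-byte digest
        self.bits = bytearray(num_bits // 8)

    def _positions(self, item: bytes):
        # Derive a few bit positions from slices of one SHA-1 digest.
        digest = hashlib.sha1(item).digest()
        for k in range(self.num_hashes):
            part = digest[4 * k:4 * k + 4]
            yield int.from_bytes(part, "big") % self.num_bits

    def add(self, item: bytes):
        # Adding only ever sets bits (bitwise OR), so appending is cheap.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: bytes) -> bool:
        # False means "definitely unknown"; True may be a false positive.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def is_known(bloom, exact_index, item):
    """Ask the filter first; only a hit triggers the exact lookup."""
    return bloom.might_contain(item) and item in exact_index


bloom = BloomFilter()
exact_index = set()
for obj in (b"chunk-1", b"chunk-2"):
    bloom.add(obj)
    exact_index.add(obj)
print(is_known(bloom, exact_index, b"chunk-1"))  # True
```

The filter never reports a stored item as unknown, so a negative answer lets the more expensive exact lookup be skipped entirely.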
Recent
- Metadata support about to be finished (patchset available, testing needed)
- Repack patches pending (deleting old backups)
- inotify-based daemon is being discussed
You & bup?
- Python & a bit of C
- Native Windows support?
- OSX / Windows metadata support?
- OSX "inotify"-like port?
- GUI?
- Diff
Thank You
- @zoranzaric
- zorzar on freenode & hackint
- [email protected] (Email & Jabber)
- zoranzaric.de
- github.com/zoranzaric
- gplus.zoranzaric.de
- Slides: zoranzaric.de/bup-28c3.pdf