Ask HN: How do you backup your linux system?
92 points by Dowwie on July 7, 2017 | 82 comments
Describe your backup workflow


It .... depends.

Given I worry about this sort of thing for a living and am a partner in the firm: I think in terms of backup, DR (disaster recovery), BC (business continuity), availability and more. I have access to rather a lot of gear, but the same approach will work for anyone willing to sit down and have a think and perhaps spend a few quid, or at least think laterally.

For starters you need to consider what could happen to your systems and your data. Scribble a few scenarios down and think about "what would happen if ...". Then decide what is an acceptable outage or loss for each scenario. For example:

* You delete a file - can you recover it - how long

* You delete a file four months ago - can ...

* You drop your laptop - can you use another device to function

* Your partner deletes their entire accounts (my wife did this tonight - 5 sec outage)

* House burns down whilst on holiday

You get the idea - there is rather more to backups than simply "backups". Now look at appropriate technologies and strategies. E.g. for wifey, I used the recycle bin (KDE in this case) and bit my tongue when told I must have done it. I have put all her files into our family Nextcloud instance that I run at home. Nextcloud/ownCloud also have a salvage-bin feature, and the server VM I have is also backed up and off-sited (to my office) with 35 days of online restore points and a GFS scheme - all with Veeam. I have access to rather a lot more stuff as well, and that is only part of my data availability plan, but the point remains: I've considered the whole thing.

So to answer your question, I use a lot of different technologies and strategies. I use replication via Nextcloud to make my data highly available. I use waste/recycle bins for quick "restores". I use Veeam for back-in-time restores of centrally held, managed file stores. I off-site via VPN links to another location.

If your question was simply to find out what people use then that's me done. However if you would like some ideas that are rather more realistic for a generic home user that will cover all bases for a reasonable outlay in time, effort and a few quid (but not much) then I am all ears.


Put all personal data on a ZFS RAID-Z2 system (FreeNAS). Take regular snapshots.

Someday I'm going to get a second offsite system to do ZFS backups to, but so far the above has served well. Then again I've been lucky enough to never have a hard drive fail, so the fact that I can lose 2 without losing data is pretty good. I'm vulnerable to fire and theft, but the most likely data loss scenarios are covered.
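For anyone rolling this by hand rather than through the FreeNAS GUI, the scheduled snapshots boil down to a one-liner on cron; a sketch, with an illustrative pool name "tank":

    # daily recursive snapshot of every dataset in the pool
    zfs snapshot -r tank@auto-$(date +%Y-%m-%d)
    # expire an old snapshot across all datasets once it leaves the retention window
    zfs destroy -r tank@auto-2017-06-01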


Even in 2017, sometimes you still can't beat sneakernet. I have a ZFS file server similar to yours, and it acts as the backup destination for all my other devices. Then I use two USB HDDs to back up the server and bring one to work. If either is attached, a nightly script brings it up to date. I just keep rotating the two between home & work.
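The nightly script for a rotating pair like that can be very small; a sketch, assuming the drives carry filesystem labels and a plain rsync of the pool (the original doesn't say which tool it actually uses):

    #!/bin/sh
    # update whichever rotation drive happens to be plugged in tonight
    for label in backup-a backup-b; do
        dev="/dev/disk/by-label/$label"
        if [ -e "$dev" ]; then
            mount "$dev" /mnt/rotate
            rsync -aH --delete /tank/ /mnt/rotate/tank/
            umount /mnt/rotate
        fi
    done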


I do something very similar: my oldest build gets synced once a year and is in cold storage. I hope I never need it. The hard drives from my previous build hold cold copies of the most valuable volumes (monthly sync), and the current system has plenty of redundancy and snapshots (insurance, not really backups). I use ZeroTier to stay in sync when away from home.


This.

I've got incremental snapshots sent to a box in a family member's house (and vice versa).


I use rdiff-backup.[1]

"rdiff-backup backs up one directory to another, possibly over a network. The target directory ends up a copy of the source directory, but extra reverse diffs are stored in a special subdirectory of that target directory, so you can still recover files lost some time ago. The idea is to combine the best features of a mirror and an incremental backup. rdiff-backup also preserves subdirectories, hard links, dev files, permissions, uid/gid ownership (if it is running as root), modification times, acls, eas, resource forks, etc. Finally, rdiff-backup can operate in a bandwidth efficient manner over a pipe, like rsync. Thus you can use rdiff-backup and ssh to securely back a hard drive up to a remote location, and only the differences will be transmitted."

[1] - https://github.com/sol1/rdiff-backup
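Typical invocations look like this (host and paths are illustrative):

    # mirror /home to a remote machine over ssh, keeping reverse diffs
    rdiff-backup /home user@backuphost::/backups/home
    # restore a file as it was 10 days ago
    rdiff-backup -r 10D user@backuphost::/backups/home/me/notes.txt /tmp/notes.txt
    # expire increments older than one year
    rdiff-backup --remove-older-than 1Y user@backuphost::/backups/home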


Me too and I'm pretty happy with it. My retention period is 1Y for most systems.


With Borg (https://borgbackup.readthedocs.io/) and a custom script (https://raw.githubusercontent.com/sanpii/deploy/master/src/b...) to test the PostgreSQL backups and sync them to another server.
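For anyone who hasn't used borg, the core of such a script is usually just a create followed by a prune; a sketch with an illustrative repository path and retention:

    # deduplicated, compressed archive of /home in a remote repository over ssh
    borg create --compression lz4 backuphost:backups/borg::'{hostname}-{now}' /home
    # thin out old archives according to a retention policy
    borg prune --keep-daily 7 --keep-weekly 4 --keep-monthly 6 backuphost:backups/borg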


I use `btrbk` as a systemd service to snapshot my `/home` subvolume hourly & any other important subvolumes daily. `btrbk` manages the retention policy, which is roughly something like:

- daily snapshots for 1 week

- the first snapshot of every week for 4 weeks

- the first snapshot of every month for 2 years

- the first snapshot of every year for 5 years

Since I use entirely SSD storage I also have a script that mails me a usage report on those snapshots, and I manually prune ones that accidentally captured something huge. (Like a large coredump, download, etc. I do incremental sends, so I can never remove the most recent snapshot.)

Since snapshots are not backups I use `btrfs send/receive` to replicate the daily snapshots to a different btrfs filesystem on spinning rust, w/ the same retention policy. I do an `rsync` of the latest monthlies (once a month) to a rotating set of drives to cover the "datacenter burned down" scenario.

My restore process is very manual but it is essentially: `btrfs send` the desired subvolume(s) to a clean filesystem, re-snapshot them as read/write to enable writes again, and then install a bootloader, update /etc/fstab to use the new subvolid, etc.
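For reference, a minimal btrbk.conf approximating that policy could look roughly like this (pool paths are illustrative, and the preserve syntax should be checked against the btrbk docs before copying):

    snapshot_preserve_min   2d
    snapshot_preserve       7d 4w 24m 5y
    target_preserve_min     no
    target_preserve         7d 4w 24m 5y

    volume /mnt/btr_pool
      snapshot_dir   btrbk_snapshots
      target         send-receive /mnt/backup/btrbk
      subvolume      home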

---

Some advantages to this setup:

* incremental sends are super fast

* the data is protected against bitrot

* both the live array & backup array can tolerate one disk failure respectively

Some disadvantages:

* no parity "RAID" (yet)

* defrag on btrfs unshares extents, which in conjunction with snapshots balloons the storage required.

* as with any CoW/snapshotting filesystem: figuring out disk usage becomes a non-trivial problem


This comes up here quite a bit, lots of great answers in the past 2 discussions:

https://news.ycombinator.com/item?id=12999934

https://news.ycombinator.com/item?id=13694079


I've long used rsnapshot for automated incremental backups, and manually run a script to do a full rsync backup with excludes for /tmp, /sys and the like to an external drive.

http://rsnapshot.org/
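The full-backup part is essentially a one-liner; a sketch with an illustrative mount point and exclude list:

    rsync -aAXH --delete \
        --exclude={"/dev/*","/proc/*","/sys/*","/tmp/*","/run/*","/mnt/*","/media/*","/lost+found"} \
        / /mnt/external-backup/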


I use rsnapshot from cron to an external USB drive encrypted with LUKS. I swap the drive with a similar one weekly and store it offsite.
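A sketch of what such a job can look like, assuming the drive is addressed by UUID and rsnapshot is already configured (all names here are illustrative):

    #!/bin/sh
    # unlock the LUKS drive, run rsnapshot against it, lock it again
    cryptsetup open /dev/disk/by-uuid/XXXX-XXXX backupdrive
    mount /dev/mapper/backupdrive /mnt/backup
    rsnapshot daily
    umount /mnt/backup
    cryptsetup close backupdrive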


I don't back up the system. Drive failures are so rare nowadays that I will reinstall more often because of hardware changes.

The important stuff (projects, dotfiles) I keep on Tarsnap. I also rsync my entire home directory to an external drive every other week or so.

Similar for servers but I do back up /etc as well.
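For reference, the Tarsnap side of a setup like this is only a couple of commands (archive names and paths are illustrative):

    # create a dated archive of the important directories
    tarsnap -c -f "projects-$(date +%Y%m%d)" ~/projects ~/.dotfiles
    # list archives, or extract one into a scratch directory
    tarsnap --list-archives
    tarsnap -x -f projects-20170707 -C /tmp/restore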


Drive failures seem really rare - until they happen - when your data is the personal experience of a few devices. When you worry about a data centre or two, or rather a lot of other systems with storage, then you realise that it happens with monotonous regularity.

Now, keeping a few short-term copies is also fine, provided you don't make mistakes. Have you ever wanted to recover from a cock-up you made six months ago, or three years ago?

You do offsite (Tarsnap), so you have covered off local failures - cool.

Everyone's needs are different and the value they place on their data is different but I would respectfully suggest that you think really hard about how important some bits of your data are and protect them appropriately.

Fuck ups are hard to recover from 8)


My Tarsnap backups are versioned so I can recover dotfiles and projects from virtually any week several years back. And the projects themselves are usually under git.


>I also rsync my entire home directory to an external drive every other week or so.

This is what I do as well, just not quite as often. Sometimes I wonder if I should switch to something like rdiff-backup to get snapshots, but that would only really be useful if I accidentally deleted a file and didn't notice for a while, for instance, which in practice is not a serious problem.

If I had more time and inclination, I might set up a small 2-drive RAIDed NAS box, and do automated regular backups to that. But for my laptop PC, just doing regular syncs to an external HD seems to be fine for now.


Why would you reinstall due to hardware changes? My Debian installation has survived the move from two laptops already without any glitches.


My Arch and Gentoo installs have gone somewhat further than that 8)

Admittedly I'm not sure that you can consider one of them that went from i386 to amd64 as the same install simply because /var/lib/portage/world and /etc/portage/ (plus a few other bits) are the same. I still use it and it's on its fifth or sixth incarnation as my current laptop.

Trigger's broom?

(OK https://en.wikipedia.org/wiki/List_of_Ship_of_Theseus_exampl...)


My installation of Ubuntu is so standard that I'm one apt install and rsync (of the home directory) away from the same result and don't have to manually configure LVM, encryption, etc.


>Drive failures are so rare nowadays that I will reinstall more often because of hardware changes.

2 HD failures in the last 4 years. Not rare for me :-(


http://www.snebu.com -- something I wrote because I wanted snapshot-style backups without the link farm that rsync-based snapshot backups produce. Snebu does file-level deduplication, compression and multi-host support; files are stored as regular lzo-compressed files, and metadata is kept in a catalog in a SQLite database.

The only real things missing are encryption support (working on that) and backing up KVM virtual machines from the host (working on that too).


For our Windows/Linux/macOS boxen we use Burp. It replaced a proprietary program we'd used for over 15 years. http://burp.grke.org/ I highly recommend it.


Mostly local network storage, which is backed up multiple times automatically. For the laptop I do manual btrfs send/receives, mainly to get things restored exactly the way they were.

#helps to see the fstab first

    UUID=<rootfsuuid>   /          btrfs   subvol=root 0 0
    UUID=<espuuid>      /boot/efi  vfat    umask=0077,shortname=winnt,x-systemd.automount,noauto 0 0

    cd /boot
    tar -acf boot-efi.tar efi/
    mount <rootfsdev> /mnt
    cd /mnt
    btrfs sub snap -r root root.20170707
    btrfs sub snap -r home home.20170707
    btrfs send -p root.20170706 root.20170707 | btrfs receive /run/media/c/backup/
    btrfs send -p home.20170706 home.20170707 | btrfs receive /run/media/c/backup/
    cd
    umount /mnt

So basically: make read-only snapshots of the current root and home - since they're separate subvolumes they can be done on separate schedules - and then send the incremental changes to the backup volume. While only the increment is sent, the receive side has each prior backup, so the new subvolume points to all of those extents and is just updated with this backup's changes. That means I do not have to replay increments on restore; I just restore the most recent subvolume on the backup. I only have to keep one dated read-only snapshot on each volume, and there is no "initial" backup because each received subvolume is complete.

Anyway, restores are easy and fast. I can also optionally just send/receive home and do a clean install of the OS.

Related, I've been meaning to look into this project in more detail which leverages btrfs snapshots and send/receive. https://github.com/digint/btrbk


Main home "server" is an Ubuntu system with a couple 2TB HDDs. It runs various services for IoT type stuff, has a few samba shares, houses my private git repositories, backups from Raspberry Pi security cameras, etc. It is backed up to the cloud using a headless Crashplan install. I use git to store dotfiles, /etc/ config files, scripts, and such, in addition to normal programming projects.

We back up photos from our iOS devices to this server using an app called PhotoSync. I also have an instance of the Google Photos Desktop Uploader running in a docker container using x11vnc / wine to mirror the photos to Google Photos (c'mon Google, why isn't there an official Linux client???). I'm really paranoid about losing family photos. I even update an offsite backup every few weeks using a portable HDD I keep at the office.


"I'm really paranoid about losing family photos"

No you aren't paranoid. Sensible. However, you've only just started. Try and do a restore every now and then from the HDDs.

There is no such thing as paranoia when it comes to protecting your data.


>Try and do a restore every now and then from the HDDs.

Not only do I do that, but I also occasionally pull down some files from Crashplan just to make sure everything is backing up and working as expected.


Cover off longterm restores and you are pretty much good.

Don't forget that families/friends can also effectively offsite each other's backups without invoking third parties. That does assume connectivity, storage etc. Also it does need managing 8)


CrashPlan for /home

Plain old btrfs snapshot + rsync to local usb drive and offsite host for /etc, /var, /root


Just keeping my home folder safe; everything else is up and running in less than one hour if I have to start over. Daily backups with Borg backup, plus a couple of git repos for the dotfiles. All the important stuff is small enough for this - my backup is like 25 GB (a lot of random crap included) - and all the photos and videos we used to worry about a few years ago are up in some unlimited-size Google cloud for free. Times are pretty good :)


Excellent rsync-time-backup for local machine, backing up etc and home to external disk: https://github.com/laurent22/rsync-time-backup

Duply for servers, keeping backups on S3: http://duply.net/

Cron does daily DB dumps so Duply stores everything needed to restore servers.


- Dirvish[0] for local backups (nightly)

- Crashplan[1] for cloud backups (also nightly; crashplan can backup continuously but I don't do that)

Pretty happy with it, though dirvish takes a little bit of manual setup. Never had to resort to the cloud backups yet.

[0] http://www.dirvish.org/

[1] https://www.crashplan.com/


Backup to an external hard drive with btrfs. Rsync is used to copy the full source file system, with short exclusion lists for stuff I don't want backed up. After the sync, a btrfs snapshot is taken to get history. These snapshots are removed with an exponential strategy (many snapshots are recent, few are old, the oldest is always kept), keeping ~30 snapshots a year.
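In outline, the rsync-then-snapshot cycle looks like this (device, exclude file and snapshot naming are illustrative; current/ must itself be a btrfs subvolume for the snapshot to work):

    # mirror the source into the backup filesystem
    rsync -aAXH --delete --exclude-from=/etc/backup.exclude / /mnt/backupdisk/current/
    # freeze that state as a read-only, dated snapshot
    btrfs subvolume snapshot -r /mnt/backupdisk/current /mnt/backupdisk/snapshots/$(date +%Y-%m-%d)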

The backup takes ~10 min to scan 1 TB of disk space. The daily diff is typically 6-15 GB, mostly due to a braindead mail storage format...

I want to keep it simple but still have full history and diff backups: no dedicated backup tool, just rsync + btrfs. A file-by-file copy is easy to check and access (and the history also looks that way).

If the source had btrfs, I would use btrfs send/receive to speed it up and make it atomic.

I have two such backup disks in different places. One uses an automatic backup trigger during my lunch break, the other is triggered manually (and thus not often enough).

The sources are diverse (servers, laptops, ...). The most valued one uses 2 x 1 TB SSDs in RAID1 for robustness.

All disks are fully encrypted.


> If the source had btrfs, I would use btrfs send/receive to speed it up and make it atomic.

btrfs subvolume snapshot / send are not atomic, for various definitions of atomic.

Unlike zfs, subvolume snapshots are not atomic recursively. That is, if you have subvol/subsubvol, there's no way to take an atomic snapshot of both. At least this one is obvious, since there's no command for taking recursive snapshots, so it tips you off that this is the case. Not having an easy way to take recursive snapshots, atomic or not, is a different pain point...

What's more insidious is that after taking the snapshot, you must sync(1)[0] before sending said snapshot, otherwise the stream would be incomplete! I'm invoking Cunningham's Law here and saying for the record that this is fucking retarded. I have lost files due to this...design choice.
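Concretely, the sequence the wiki page recommends is snapshot, sync, then send; something like this (snapshot names are illustrative):

    btrfs subvolume snapshot -r /home /home/.snapshots/home.new
    sync   # without this, the snapshot's data may not all be on disk yet and the send stream can be incomplete
    btrfs send -p /home/.snapshots/home.old /home/.snapshots/home.new | btrfs receive /mnt/backup/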

Moreover, though it is probably a fixed bug by now, I used to have issues where subvolumes got wedged when I ran multiple subvolume snapshot / send operations in quick succession. I'd get a random unreadable file, and it's not corruption (btrfs scrub doesn't flag it). Usually re-mounting the subvolume would fix it, and at worst re-mounting the whole filesystem has fixed it so far. I haven't had it happen for a while, but that's either due to my workaround -- good old sleep(60) mutex -- or because I'm running a newer kernel.

I can't wait until xfs reflink support is more mature: that'll get me 90% what I use btrfs for.

tl;dr: btrfs: here be dragons!

[0] https://btrfs.wiki.kernel.org/index.php/Incremental_Backup


No workflow, we just use BackupPC (http://backuppc.sourceforge.net/) - can't recommend it enough. Restores are easy, and monitoring and automation on different schedules are all built in. It's really great.


Love BackupPC, especially 4.x, where they replaced the somewhat ugly hack of using hard links with checksums. Recommended if you have a collection of systems; if it's just a single system I'd use rdiff-backup (if you have remote storage) or duplicity if you want to pay for remote storage (Amazon, Rackspace, or similar).


BackupPC has helped me restore my RabbitMQ queues several times. Love BackupPC.

It has an ugly looking interface, but the core of the product is super reliable.


With rsync, using the --link-dest option to build a complete file tree on a remote server, with hard links back to the previous backup for unchanged files.

Cron runs it on an @reboot schedule. If the backup is successful, some (but not all) old backups are deleted. I delete some of the oldest preserved backups manually if disk space runs low.
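The pattern, roughly (date, host and paths are illustrative):

    # hard-link unchanged files against the previous tree, only transfer what changed
    rsync -a --delete --link-dest=../latest /home/ backuphost:/backups/2017-07-07/
    # point "latest" at the tree that was just written
    ssh backuphost 'ln -sfn 2017-07-07 /backups/latest'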


Most of my projects fit on my laptop's hard disk. I have a Dropbox subscription which syncs everything interesting on my laptop to Dropbox's servers. This setup saved my work once when my old laptop died - I just bought a new laptop and synced everything back from Dropbox.


I've gotten so used to Arq Backup (macOS/Windows) and Time Machine (macOS), neither of which is convenient to run against Linux, that I've actually come to avoid using Linux as something to be backed up wholesale, except as a container or VM.


Perhaps someone could offer a related suggestion. At the moment, I use rsync to backup files to external hard drives. One of the difficulties that I run into is that some folders I want to mirror exactly between my computer and the backup. Other folders I want to only add files, but not delete, to the backup. Still others, like git repositories, I'd like to bundle rather than backup the directory itself. Finally, I make redundant backups to multiple external hard drives, so it would be nice to replay the same backup procedure to multiple devices. Does anyone have a workflow that accomplishes this or something similar?


No, that is not backup or DR or BC (Disaster Recovery and Business Continuity). What you are describing is personal data management mixed up with backup.


Interesting. I've not heard that term before. Can you suggest a personal data management tool that works well with one of the suggested backup solutions in this thread?


I've used clonezilla (http://clonezilla.org/). You create a bootable media (usb stick) then use that to clone to another drive/partition.


After being partially dissatisfied with various existing solutions, I have written my own backup software: file4life (http://www.file4life.org). I've been using it for the last 3 years and am currently working on a new release.

Its main advantages with respect to other approaches, at least for my use cases, are:

- metadata is stored in a single append-only file: no extra software (DB etc.) is needed;

- partial backups can be performed to separate storages. In fact, source and backup directories are not conceptually different, so a duplicate of a directory counts as a backup.


I use a cloud based service to back up hand selected directories.

I run a weekly script to rsync one HD to another. The backup HD is exactly the same size and partitioned identically. I had a HD crash some years ago and it was fairly trivial to swap out the drives (probably needed to make some changes to the MBR). Unfortunately, I had an HD crash some months ago and it was not as easy this time round. Apparently my rsync would fail in the middle and so a lot of files were stale. Unbootable. Fortunately, all the critical data was copied.

I should have a smarter backup script that will alert me on failure to rsync.


I don't back up my systems. Everything important is in some cloud storage somewhere, and I have network boot into installers set up on my network. There's actually one machine where I ran out of space on the local disk and didn't feel like getting up to replace it, so I copied it into a bigger volume on one of my servers and it's been booting off of that ever since.

My customers and employers have all had these Rube Goldberg enterprisey backup systems, usually Symantec or Veritas talking to HP MSAs.


"Everything important is in some cloud storage somewhere"

Delete one of them and then get it restored by your supplier - I assume you've done that already. I did.


For my backups, I use rclone with Backblaze B2 and Google Cloud Storage (Nearline) using Crypt (with the same password and password2 for both B2 and GCS). This gives me the benefit of file level encryption, with filename encryption too. In my case, I'd rather not use encrypted archives in case a bit got flipped and rendered my archive useless.

I have a systemd timer to run (incremental) backups every 3 hours, and I plan on setting up a mechanism to automatically verify all of my data that has been uploaded.
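The rclone side of this is just two crypt remotes (configured once with `rclone config`) and a sync per remote from the timer; a sketch with illustrative remote names:

    # "b2-crypt" and "gcs-crypt" are crypt remotes wrapping the real B2 and GCS remotes
    rclone sync /home/user b2-crypt:backup/home
    rclone sync /home/user gcs-crypt:backup/home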


Home - I back up the data areas. Dump the databases then run Back In Time which covers the home folder and my web folder. From that I can get from 0 to fully running in a few hours if necessary. And with Back In Time I have a usable portable copy of my stuff to take along when I need to.

At work we do regular tar backups to external drives and rsync data over SSH to our sister office via VPN nightly. The backups are good for a system restore; then I rsync back from the remote to get the most recent data.


Something similar to [0] but using restic. For machines that have enough uplink, restic does its thing to S3 as well.

Sidenote: I was using Time Machine on macOS, but since I upgraded to 10.13, APFS disks are mandatorily excluded by the OS (apparently as a workaround to some bug), so restic it is too.

[0]: (warning: jwz) https://www.jwz.org/doc/backups.html


If you want something GUI-driven, deja dup or Back in Time might work for you. You can take a look at the articles I wrote on them here: https://github.com/kororaproject/kp-documentation/blob/maste...


Not Linux, but same answer as it would be.

`rsync -av --files-from=".backup_directories"` on a daily cron job.

iMac and work hackintosh rsynced to local and remote backup machines daily. The Windows machine (where all my music and pictures live) is both rsynced to the local and remote backup machines and Cobian'd to a second drive daily. I also run the same Cobian backup to a cold external drive every month or so.

Deathly afraid of data loss.


Bacula manages our network backups to an Overland Storage tape library. While involved to set up, I found it worthwhile given the capabilities it offers and its reliability. Bacula can also back up to filesystem volumes if you don't have tape libraries but would still like the other features (e.g. encryption, network-wide backup, file catalogues, retention and recycling rules, etc.).


Crashplan for docs/media. As for the system, I've been meaning to set up proper automatic btrfs snapshots/rsync but haven't gotten around to it yet. Worst case scenario all the docs/media are on their own RAIDZ array, so if some weird system corruption ever happens I can just reformat/reinstall.


I sync data using git-annex to a home server (atom with a couple of drives) and a hosted dedicated server. Technically not "backup" since I don't store old versions.

My work files are kept in git repos which I push to the same servers.

I use Ansible to configure my machine, so that I don't have to backup system files, just the playbooks.


1. Place desired data to back up on a drive (which may be network attached).

2. Clone said drive to an external drive. Detach and lock it in a water/fireproof box when not in use.

3. Swap external drive with another that is stored off site every week or two.

4. Swap with yet another off site external drive less often (a few months).


No test to ensure proper copy?


I have:

- A home NAS (4x5T + 7x4T = 48T, btrfs raid1)

- A 2U server sitting in a local datacenter (8x8T + 2x4T = 72T, btrfs raid1)

- An unlimited Google Drive plan

I run periodic snapshots on both servers and use simple rsync to sync from the home NAS to the colo. Irreplaceable personal stuff is sent from the colo to Google Drive using rclone.
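In outline (pool paths and remote names are illustrative):

    # home NAS -> colo mirror
    rsync -aH --delete /tank/ colo:/tank-mirror/
    # run on the colo: irreplaceable data -> Google Drive via rclone
    rclone sync /tank-mirror/personal gdrive:backup/personal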


Hourly incremental backups with Deja Dup saved to a Synology NAS. My wife's MacBook also backs up to the NAS via Time Machine.

Synology NAS backed up daily to Backblaze B2

Dead simple to set up and maintain, and in the event that I need to restore a file or files, it's relatively fast.


I have a script that checks which external drives are connected, and when it finds the correct ones it backs up to them using rsync. (Only /home is backed up, but it varies depending on which external drive is connected.)


rsync /home, /etc, and a text file of the currently installed package list to my homebuilt ZFS RAIDZ2 NAS running Arch Linux. This has been enough to recover when my laptop drive fails.

Each device gets its own ZFS filesystem and is snapshotted after rsync.

FolderSync on Android does this automatically when I'm on home wifi, and AcroSync does it for Windows. Both FolderSync and AcroSync are worth the small purchase price. Cron jobs handle the *nix machines. The iPad syncs to a Mac, which has a cron job.

Stuff I really don't want to lose (photos, music, other art) is on multiple machines + cloud.
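A sketch of the rsync-then-snapshot sequence described above, for one machine (hostnames, dataset names and the package-manager command are assumptions; on Arch the list comes from pacman, elsewhere from dpkg or rpm):

    pacman -Qqe > ~/pkglist.txt                 # list of explicitly installed packages
    rsync -aAX ~/pkglist.txt /home /etc nas:/tank/backups/laptop/
    ssh nas zfs snapshot tank/backups/laptop@$(date +%F)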


At home, I keep my files in a local Git repo and back it up once in a while to another drive. If I accidentally delete a file, I can get it out of git. Plus I get the benefit of version control.


My laptop: rsnapshot to local USB disk, duplicity to remote server, git push for code.

My (two) servers: dump of db, rsync of dumps and files to another server.

It's ok only because I've got little data.


At my last job I used BackupPC to automate backups. http://backuppc.sourceforge.net.


I use Blu-rays to store "archived" files that I don't actively edit: music, pictures, book reports from high school, etc.

Then the usual btrfs send/receive tricks.


Do you off-site them?


Veeam Agent for Linux is a surprisingly nice and easy-to-use image-based backup, for free. I back up my two home servers to my NAS, which syncs up to Backblaze B2.


Ubuntu comes with Deja Dup which backs up files. It works fine for me. For the installed packages I use aptoncd for backing them up as an iso image.


I use Borg to incrementally backup /home to a Kimsufi server with a 2TB hard drive which costs around 20€/month.


- Crashplan on NAS

- 6 month archival image of NAS to external HDD, rotated every 2 years

- 3 month differential rsync to nearline storage, kept for 5 years


    # apt install backupninja
    # ninjahelper


I just use rsync and a weekly cronjob.


rdup + rdedup + rclone + backblaze b2


Tarsnap for documents (rotated with tarsnapper), S3 reduced redundancy for Pictures (JPG + RAW).


Tarsnap


ecryptfs-mount-private

Symlink ~/.private to Dropbox/private

Per file encryption, and I do not care if Dropbox will get hacked again


I use rsync with external disks


Spideroak One


zfs snapshots + borg


I use rsnapshot [1] for /home data and afio [2] for other stuff (offline databases, system, photos, ...)

With rsnapshot, I have hourly, daily, weekly and monthly backups that use hard links to save disk space. These backup dirs can be mounted read-only.

With afio, files above some defined size are compressed and then added to the archive, so that if some compression goes wrong only that file may be lost; the archive is not corrupted. It can also do incremental backups.

From the afio webpage: Afio makes cpio-format archives. It deals somewhat gracefully with input data corruption, supports multi-volume archives during interactive operation, and can make compressed archives that are much safer than compressed tar or cpio archives. Afio is best used as an `archive engine' in a backup script.

[1] http://rsnapshot.org/ [2] http://members.chello.nl/k.holtman/afio.html


I use rsnapshot to create hourly/daily/weekly/monthly snapshots of the configured systems. Backups are written to a dedicated drive (which just started showing SMART errors, so I'll replace it ASAP). I regularly create PGP-encrypted archives of selected sets, which get stored off-site, spread around the 'net on cloud hosting services. I'm also thinking about making an arrangement with some friends and family members who have a) good network connections, b) personal servers and c) a need to back up, so that we can swap off-site backup copies - I'll store yours if you store mine. I currently have about 100 GB of 'critical' data stored off-site.

I also keep current copies of most configuration data for all systems, mainly by backing up their /etc directories. This is also done for network equipment and remote network configuration data (zone files, etc).


rsync -aAXv --exclude-some-stuff / /media/usb



