Huge Backup on BD-R Planned – Go ahead and laugh at me. I’ll laugh back!

"Thank you!" for @Alexander1970 and @Blauhasenpopo for their patience in previous profile messages about this.
Pinging @IC_ … because you gave some "👍" in profiles and finally
Pinging @Nikokaro … because I won't give up trying to convince you to create a proper backup (or just any backup for starters).

In my search for backup methods resistant to as many threats as possible, I decided to systematically copy my data to BD-R. Previously I used CD/DVD/BD only in an unsystematic way, to additionally secure a few selected files: the most important stuff.
Now I want to create a backup of 2 to 4TB, all on Blu-ray discs.
After already having been lectured in my status that my method – to condense it into a blunt single word – sucks, I’ll explain what I did and why in more detail (for nobody wanting to read this).

For today's amounts of data, BD-R, especially the cost-effective single-layer ones, are small media.
Automatically splitting the data becomes necessary. A helpful script can be found in the genisoimage package. This nice script – dirsplit – divides a given folder into chunks/volumes of a predefined size. The resulting volumes get filled in a smart manner so that only a small amount of space gets wasted.
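To give an idea of what "filling volumes in a smart manner" can mean, here is a minimal sketch of a first-fit-decreasing packer. It is not dirsplit's actual algorithm, just a simple stand-in for illustration. The function name `pack_volumes` is made up, and the input is assumed to be lines of "size name" (for example derived from `du -sk`) with names that contain no whitespace.

```shell
# Simplified stand-in for volume splitting (NOT how dirsplit really works).
# stdin:  lines of "<size> <name>", all sizes in the same unit
# $1:     usable capacity of one volume, in that same unit
# stdout: lines of "<volume number> <size> <name>"
pack_volumes() {
  # Largest items first, then put each item into the first volume with room.
  sort -k1,1nr | awk -v cap="$1" '
    {
      size = $1 + 0; name = $2
      if (size > cap + 0) {
        print "skipped (larger than one volume): " name > "/dev/stderr"
        next
      }
      for (v = 1; v <= n; v++)      # look for an existing volume that fits
        if (size <= free[v]) break
      if (v > n) {                  # nothing fits: open a new volume
        n = v
        free[v] = cap + 0
      }
      free[v] -= size
      print v, size, name
    }'
}

# Example: five folders (sizes in GB) packed into 20GB volumes
printf '%s\n' '7 d' '12 a' '4 e' '9 b' '8 c' | pack_volumes 20
```

In this example the five folders end up in two completely filled volumes (12+8 and 9+7+4). Real data will rarely fit that neatly, and dirsplit itself works on files and directory trees rather than on a prepared list.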

Optical discs are prone to scratches. Parts might become unreadable through a moment of carelessness.
Additional error correction (ECC) should be applied to guard against this and against the infamous “rot” associated with optical discs.

The standard UDF file system has limitations compared to common Linux file systems.
A file functioning as a container for a Linux file system is desirable. And I want the data securely encrypted. VeraCrypt should suffice. The people behind this awesome piece of software seem to be even more paranoid than me.

There is no integrity test in user space.
Drives do their EDC/ECC stuff on their own, but ultimately we have to trust them to reproduce the correct data (which isn’t always the case due to the physics of the analog parts/signal processing → off-topic). Well. Other media don’t offer integrity verification either, so this is not really a particular downside of discs.
I wanted to include MD5 checksum verification. MD5 is fast, easy and good enough (before anybody lectures me again). No need for any super secure hash algo here. Random corruption that does not lead to a checksum mismatch is HIGHLY unlikely.
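A minimal sketch of how such a checksum list can be created and checked later, assuming GNU coreutils (`md5sum`) and `find` are available. The function names `make_manifest` and `verify_manifest` and the paths in the usage comment are made up for illustration; this is not necessarily how my own checksum file was produced.

```shell
# Print an MD5 list for every file below directory $1, with relative paths,
# sorted by path so that two lists of the same tree are easy to compare.
make_manifest() (
  cd "$1" || exit 1
  find . -type f -exec md5sum {} + | sort -k 2
)

# Check the files below directory $1 against an MD5 list read from stdin.
# Prints only mismatches; the exit status is non-zero if anything fails.
verify_manifest() (
  cd "$1" || exit 1
  md5sum --quiet -c -
)

# Usage (hypothetical paths, keep the list outside the tree it describes):
#   make_manifest /home/Daten/audio > /home/Daten/audio.md5
#   verify_manifest /path/to/restored/audio < /home/Daten/audio.md5
```

Because the paths in the list are relative, the same list works for the original tree and for a restored copy in a different location.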


For the first test I used only a small subset of my data: a collection of audio books/audio dramas totalling 346GB. Allowing for file system overhead and some spare space (and some other data on the last disc), this meant using 18 BD-R, each one carrying a 20GB VeraCrypt container augmented with ECC data by dvdisaster.
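The disc count is just a ceiling division. A small sketch in shell arithmetic (the function name `discs_needed` is made up; file system overhead and spare space are ignored here, so real numbers can come out slightly higher):

```shell
# Number of containers/discs needed: data size divided by container size,
# rounded up. Both arguments must use the same unit (here: GB).
discs_needed() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

discs_needed 346 20    # prints 18
```

With the same 20GB containers, the planned 2 to 4TB would come to roughly `discs_needed 2000 20` = 100 up to `discs_needed 4000 20` = 200 discs.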

It was a lot of work:
  • Preparing a text file with MD5 checksums
  • dirsplit
  • Creating 20 different containers (believe me, this is tedious with VeraCrypt)
  • A little for loop (see spoiler below) distributing the volume folders into the mounted containers
  • Creating ISO images with ImgBurn, each one containing only one container file
  • Inflating the ISO images with dvdisaster to single-layer BD-R size
    • Creating an additional, file-based level of ECC on 2 extra BD-R from a different manufacturer than the main backup
  • Burning each image to a BD-R with ImgBurn, verify enabled
  • Printing a nice label on each disc
for i in {1..18}; do rsync -av "/home/Daten/vol_$i/" "/mnt/veracrypt$i"; done
1-jpg.404304

That was the backup task. But a backup is only useful when it can be restored. Any sensible backup must be followed by testing the restore procedure.
  • Dump each BD-R on a second computer
  • Check each image with dvdisaster (good image, good ECC)
  • Mount all images (boy, this sucks, see spoiler below)
  • Merge all data into an empty folder with a similar for loop
  • Verify checksums
Mounted.png
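The merge step can be sketched like this. Note that it uses `cp -a` instead of rsync, only to keep the example free of extra dependencies; the function name `merge_volumes` and the restore path in the usage comment are made up, while the mount point prefix follows the /mnt/veracrypt numbering from the backup loop.

```shell
# Copy the contents of numbered mount points <prefix>1 .. <prefix>N
# into one target folder.
#   $1 = number of volumes, $2 = mount point prefix, $3 = target folder
# The body runs in a subshell, so "exit" only ends this function.
merge_volumes() (
  count=$1 prefix=$2 target=$3
  mkdir -p "$target" || exit 1
  i=1
  while [ "$i" -le "$count" ]; do
    # "dir/." copies the directory's contents, hidden files included
    cp -a "$prefix$i/." "$target/" || exit 1
    i=$((i + 1))
  done
)

# Usage (hypothetical target folder):
#   merge_volumes 18 /mnt/veracrypt /home/Daten/restore
```

The function stops at the first volume that cannot be copied, so a forgotten mount shows up immediately instead of as a pile of checksum errors later.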

Conclusion:
A f…cking load of work compared to just rsync -av from one HDD/SSD to another. But this backup of static data to non-erasable, non-rewritable media was overdue.
Make fun of me if you want. One thing is clear: This additional copy will provide a resilient fallback in case of complete loss of all easily accessible copies. The off-site problem is currently still unsolved because I have no friends and would not trust anybody anyway… but:

Seeing the following after all this work was very satisfying:
CheckResult.png

Equally satisfying:
Dvdisaster.png
(Sorry: This application just took German language from system settings. The green text says: "Good error correction data" and "Good image")

A nice side effect is how easy it now is to back up the created containers to online storage (once I decide which one to pay for). Uploading a few giant files (on the order of 500GB or even more) is not feasible with a mediocre connection; they are hard to handle and a hassle if data has to be replaced. Uploading unencrypted data is not an option, and file-by-file encryption like EncFS is said to have known weaknesses when data changes and multiple versions of a file end up in the storage (which could happen accidentally before uploading new files).

Using 3-layer (≈10¹¹ bytes) or 4-layer (≈1.28*10¹¹ bytes) BDXL would greatly decrease the effort, but the media are outrageously expensive: ≈190€ per TB for M-DISC.

Comments

May I ask why Blu-ray? I've never used it for anything other than video games, and I've never owned a BD player.
Because DVDs, even dual layer, are far too small (and more expensive per GB).
I don't know of any other WORM medium available for end users. As far as I know, LTO might be able to enforce WORM against software going apeshit and user error (despite tapes being rewritable at the hardware level). But the hurdle of buying a tape machine is a bit too high.

Regarding standalone BD video players (never had one): They are DRM infected garbage. Just dump the movies to HDD and play them with laptop or Raspi. BD movies have ridiculously high data rate and look better than most other sources. Not yet tried 4K BDs… eyes don't get younger so they won't make much of a difference.

The purpose of this additional copy is to provide a last line of defense against certain incidents which can undermine even sophisticated backup concepts. The most notable example here is ransomware. Granted, only a targeted attack by someone who knows the victim, actively/manually corrupts existing copies and waits until all generations are tainted is able to destroy data secured with a good backup concept. Of course the chances of such an attack are more or less zero for private persons.
There are some other possible incidents – EMP – which electronics (HDD/SSD) won't survive, but an optical disc archive will. Sadly not as unlikely as one would hope.

Then comes the process of trying and learning. Going different ways.

Finally the fact already mentioned in the blog entry: Much of the work will be helpful for online backup. I would have had to create all the containers and include the checksum function in any case.
Now there are just two steps left for online backup: Paying some provider and uploading.
 
Thanks for the good explanation. Tbh, I had totally forgotten about optical discs at this point in life, but you're right. It's important to keep backups of important info at least. As for an EMP or electromagnetic radiation caused by a nuclear detonation, I had never considered it a real threat, but it is one, and it's good that you considered it.
 
It's nice to see this from you even if it doesn't make much sense to do on a larger scale. With outdated generations of LTO you'd probably have to do some splitting too (I've only done it with the native tape-splitting support in tar and that was good enough for me). If you'd like to have some fun with tape too, I think it's actually pretty easy and not expensive to get the hardware and media for generation 1, 2, or 3 which I have (and they seem pretty reliable even after all these years).
 
The biggest advantage of BDs over LTO is that I already own drives.
Once I can concentrate on this, I will look into tapes. After all, having yet another medium will only improve the strategy.

If you leave the money aspect aside, BDXL backup is possible on a large scale. Quad-layer discs from Sony are available. Two of these spindles [Amazon] would be enough for a full backup in my case.
Admittedly, LTO offers much higher capacities.
 
My NAS does all the backups...every three hours, keeping it in week, month and year intervals. I think...hahaha. But those backups are not...backedup! :ohnoes: It's more like "version control" than true backups I guess.

I once almost used Click! discs for backups! Tiny 40MB floppies from Iomega that could only really work on PCMCIA laptops. There are USB drives...but they are extremely hard to find! I must have more than 100 discs! Hahahahahah

MD5 is fast, easy and good enough

I like this statement! "good enough" I say that a lot! If it kinda works...it's good enough! :rofl2:(remember batterycheck collision detection ;))

While still a kind of harddrive...I have "Settled" on using Iomega REV for my backups. But...they are at least 3 years old now! Time to refresh them! Yay...work I don't like to do!

Thanks for the fun story and reminder to take backups more seriously.;)
 
My NAS does all the backups...every three hours, keeping it in week, month and year intervals. I think...hahaha. But those backups are not...backedup! :ohnoes: It's more like "version control" than true backups I guess.
That is more backup than most people have.

Just ask yourself the questions: "What threats do I want protection against?" and "Does my concept provide this protection?" and "Does all of my data need the same amount of security or can I safeguard a small subset quickly with additional methods?"

For the second question this comes to my mind for autobackup on NAS only:
  • Single HDD/SSD failure ✅
  • Bad user error (formatting wrong partition on PC) ✅
  • Software error leading to corruption ✅
  • Accidental file deletion ✅
  • Accidental change in source code or any other file ✅
  • Automatic ransomware attack :unsure: Depends on the implementation of your backup and access rights
  • Catastrophic software error :unsure: Depends on the implementation of your backup and access rights
  • HDDs from the same series failing within a short time, or ironically during a RAID rebuild after the loss of one drive, because of the long period of high load on old HDDs :unsure: Depends on luck
Off-site and off-line backup problem:
  • Lightning strike ❌ No guarantee for any connected device to survive, even with some protection
  • Targeted ransomware attack against you ❌
  • Natural disaster destroying your house ❌
  • Burglars stealing all your electronics ❌
  • Authorities confiscating your stuff for no reason ❌
  • Nuclear war and EMP ❌ (You might have different problems and thoughts than data recovery when this happens and you survive)
 
That is more backup than most people have.
Thanks. While that might be true, I still need to have my devices setup to actually do automatic backups/syncs TO the NAS! :blush: Some data I use directly from the network share so it's stored on the NAS. In theory...I could use that on more than one device and always have the same version of that file. Others are on iSCSI volumes on the NAS...like the batterycheck sourcecode ;)...and those have snapshots taken. I forgot the interval but it's more than enough. At least daily.

I really should look into some sort of process like yours. With the encryption maybe...but most definitely the parity stuff to maintain integrity of the data. And kinda clean up the mess on my NAS! Because...usually I just make a full copy of a drive, move it into some archives (tar,zip,sfs) and then forget about it. I keep these as a "just in case" situation to fall back on when I miss some files. But...in a couple of cases I have multiple copies of the exact same files. So I need to save only the important data and remove the rest. That might save me at least 25% (probably more) of the total amount of data to keep!

And you're right, this does not even protect against all natural disasters that might happen. Mostly...my backups only protect against user error. Accidental delete and such.


PS: Sorry for this insanely late reply. :blush::shy:
 

Blog entry information

Author
KleinesSinchen