Scanning all these magazines means I also have to back them up. If the whole point is to preserve them, it’s kinda dumb just to palm the files off to the Internet Archive, delete my copies and hope for the best.
Whilst the Internet Archive is great and all, there’s a non-zero chance that one day they’ll have funding cut or donations dry up and they’ll need to rationalise their storage. That could mean nothing but PDFs stick around, certain types of content disappear or some other situation where the high quality, large file size originals are lost.
I thought I’d take a good look at my off-site backup options based around storing 5TB for 10 years. I don’t know exactly how much data I’ll be hoarding over that time-span, but 5TB for a decade feels like a good starting point for comparison purposes.
The Cloud
Amazon's Deep Glacier is the absolute cheapest way to store data in the cloud. The catch is that you can't get the data back for something between a few hours and a few days and it's expensive to suck the data out.
Deep Glacier costs US$0.00099 per GB for data storage and if you want to restore your files, it’s US$0.0025 per GB for data retrieval plus US$0.09 per GB for data transfer.
To store 5TB would only cost US$5.07/m, but paying that for 10 years would add up to US$608, or A$957 at current exchange rates. If I want to restore all 5TB of data at some point, I’m looking at US$474 as a one-off cost.
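If you want to plug in your own numbers, here's a rough sketch of that arithmetic in Python. It assumes 5TB means 5,120GB and uses the per-GB rates quoted above; the function and variable names are made up for illustration, and the A$ conversion is left out because the exchange rate moves around.

```python
# Rates quoted above, in US$ per GB (check current AWS pricing before relying on them).
STORAGE_PER_GB_MONTH = 0.00099
RETRIEVAL_PER_GB = 0.0025
TRANSFER_PER_GB = 0.09


def glacier_costs(terabytes: float, years: int) -> dict:
    """Return monthly, total storage and full-restore costs in US$."""
    gigabytes = terabytes * 1024  # assumption: 1TB = 1024GB
    monthly = gigabytes * STORAGE_PER_GB_MONTH
    return {
        "monthly": monthly,
        "total_storage": monthly * 12 * years,
        # A full restore pays both the retrieval fee and the transfer fee.
        "full_restore": gigabytes * (RETRIEVAL_PER_GB + TRANSFER_PER_GB),
    }


costs = glacier_costs(5, 10)
for name, value in costs.items():
    print(f"{name}: US${value:.2f}")
```

For 5TB over 10 years this lands on roughly US$5.07 a month, US$608 for the decade and US$474 for a full restore, which lines up with the figures above.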
Using Deep Glacier is also the easiest (aws s3 cp mag.tar.bz2 s3://decryptionscans/ --storage-class DEEP_ARCHIVE, that’s it) and hasn’t got the large up-front costs like buying new drives or media (to store my current 1.4TB is just A$2.50/m). Just pray you never need to restore data out of it! I’m also not a fan of paying a monthly fee for the rest of my life.
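Once there's more than one archive to push, that one-liner is easy to wrap in a loop. The sketch below is only an illustration: it assumes the AWS CLI is installed and configured, that archives are `.tar.bz2` files sitting in one folder, and the function names and `dry_run` switch are my own inventions. The `aws s3 cp` arguments are the same ones used above.

```python
import subprocess
from pathlib import Path


def deep_archive_upload_cmd(archive: Path, bucket: str) -> list[str]:
    """Build the same `aws s3 cp` command used above for one archive."""
    return [
        "aws", "s3", "cp", str(archive), f"s3://{bucket}/",
        "--storage-class", "DEEP_ARCHIVE",
    ]


def upload_all(folder: Path, bucket: str, dry_run: bool = True) -> list[list[str]]:
    """Upload every .tar.bz2 in folder; with dry_run=True only return the commands."""
    commands = [
        deep_archive_upload_cmd(path, bucket)
        for path in sorted(folder.glob("*.tar.bz2"))
    ]
    if not dry_run:
        for command in commands:
            # check=True stops the loop if the CLI reports a failed upload.
            subprocess.run(command, check=True)
    return commands


if __name__ == "__main__":
    for command in upload_all(Path("."), "decryptionscans"):
        print(" ".join(command))
```

Leaving `dry_run` on by default means nothing gets uploaded (or billed) until you've eyeballed the list of commands.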
Remote NAS
Get two 6TB HDDs, chuck ’em in a cheap 2-bay Synology NAS, configure it as RAID-1, plug it in to my parents’ router and use old mate rsync to upload files overnight. A Synology DS218j is $260 and two 6TB HDDs go for around $580 for a total setup cost of $840.
My parents wouldn’t make me pay for electricity, but if you’re running it at home or somewhere else where you do need to pay, it’ll certainly add up over time. Assuming a 30W average consumption (idle power is under 10W, max load around 50W) and 20c/kWh for electricity, a basic 2-bay NAS will cost about 14c/day to run. Over 10 years that’s $511.
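The running cost is easy to redo for a different wattage or tariff. This is a minimal sketch using the 30W and 20c/kWh assumptions above; the function name is hypothetical and it ignores leap days and any price rises over the decade.

```python
def running_cost(avg_watts: float, cents_per_kwh: float, years: int) -> tuple[float, float]:
    """Return (cents per day, dollars over the whole period) for an always-on box."""
    kwh_per_day = avg_watts * 24 / 1000
    cents_per_day = kwh_per_day * cents_per_kwh
    dollars_total = cents_per_day * 365 * years / 100
    return cents_per_day, dollars_total


cents_per_day, total = running_cost(avg_watts=30, cents_per_kwh=20, years=10)
print(f"{cents_per_day:.1f}c/day, ${total:.0f} over 10 years")
```

Without rounding, 30W works out to 14.4c/day and about $526 over 10 years; the $511 figure comes from rounding down to 14c/day first.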
I like this option as it’s all automatic, but I don’t know if I’d trust an HDD to last 10 years, or even the NAS to last that long without some sort of problem. If I had to replace a drive (the data will still be good thanks to RAID-1), that’ll put the overall TCO past the cloud. At least the data retrieval costs are low!
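To see where the NAS tips over the cloud, here's a small sketch comparing the two. It assumes a replacement 6TB drive costs half of the $580 two-drive price, that electricity is free (thanks Mum and Dad) and that the cloud figure is the A$957 for ten years of storage with no restore. The names are illustrative only.

```python
NAS_SETUP = 840               # A$: DS218j plus two 6TB drives
DRIVE_REPLACEMENT = 580 / 2   # A$: assumption, one drive at today's price
GLACIER_TEN_YEARS = 957       # A$: ten years of Deep Glacier storage, no restore


def nas_tco(replaced_drives: int, electricity: float = 0.0) -> float:
    """Ten-year NAS cost in A$ given a number of drive replacements."""
    return NAS_SETUP + replaced_drives * DRIVE_REPLACEMENT + electricity


for replaced in range(3):
    total = nas_tco(replaced)
    verdict = "cheaper than" if total < GLACIER_TEN_YEARS else "dearer than"
    print(f"{replaced} replaced drive(s): ${total:.0f}, {verdict} the cloud")
```

With no failures the NAS is ahead at $840, but a single replacement drive pushes it to $1,130.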
Optical media seems quaint but it’s very much still a thing. The discs have the benefit of being write-once so you can’t delete them accidentally and the drives are cheap (~$100). But those of us who stored data on DVD-Rs back in the day only to find them rotting away 15 years later might not trust optical media for long term archival purposes. I was worried about being able to find a Blu-Ray drive in a decade’s time, but there’s fuckloads of DVD-R drives still around and they’re over 20 years old now.
Blu-Ray burnable discs come in single (25GB), double (50GB), triple (100GB) or quad layer (128GB) options. Here’s pricing for the absolute cheapest discs I could find.
- 78c for 1x 25GB - 3.12c/GB
- $2.44 for 1x 50GB - 4.88c/GB
- $15.10 for 1x 100GB - 15.1c/GB
- $20.80 for 1x 128GB - 16.25c/GB
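The per-GB figures in that list, plus how many discs 5TB actually needs, can be worked out like so. This is a sketch that assumes 5TB is counted as 5,000GB (the way disc capacities are marketed) and that every disc gets filled to the brim; the function name is made up.

```python
import math

# (capacity in GB, price per disc in A$), taken from the list above
DISCS = [(25, 0.78), (50, 2.44), (100, 15.10), (128, 20.80)]
TARGET_GB = 5000  # assumption: 5TB counted as 5,000GB


def disc_plan(capacity_gb: int, price: float, target_gb: int = TARGET_GB) -> dict:
    """Cost per GB, disc count and total media cost for one disc type."""
    count = math.ceil(target_gb / capacity_gb)  # a part-filled disc is still a whole disc
    return {
        "cents_per_gb": price / capacity_gb * 100,
        "discs": count,
        "media_cost": count * price,
    }


for capacity, price in DISCS:
    plan = disc_plan(capacity, price)
    print(f"{capacity}GB: {plan['cents_per_gb']:.2f}c/GB, "
          f"{plan['discs']} discs, ${plan['media_cost']:.2f} for 5TB")
```

The single layer discs win comfortably on price, at the cost of having 200 of them to burn and label.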
200x 25GB Ritek discs (5TB all up) is $160 plus an LG BH16NS55 burner for $90 comes to just $250. But I don’t really trust Ritek BD-Rs to last a few years, let alone 10 or 20. M-DISC media designed for archival would alleviate my concerns but these cost so much more at $3.60/disc or $720 to store 5TB.
I’ve always wanted a tape drive at home and here’s a chance to own one! The physical media is designed to last for 30+ years, there will always be LTO drives around and the LTO & LTFS spec shouldn’t be inaccessible any time soon.
The latest generation LTO-8 tapes store up to 12TB on a single tape and cost around $250 each, not bad for 12TB, but a drive that’ll read those tapes costs over $4,500 new. On eBay however, there’s an LTO-3 drive that’ll slide into a 5.25" bay in my HP ML10v2 server for $120 and a SAS expansion card (HP P212/256MB) to go with it is $30. The right cable (SFF-8482 to SFF-8087) will set me back $16 off AliExpress.
LTO-3 tapes hold 400GB each and can be found relatively cheap - about $20 each when purchased in bulk. I’d need 13 tapes to store 5TB - $260. All up I should be able to get 5TB of tape storage for under $450.
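Here's the tape maths as a quick sketch. It assumes 5TB is 5,000GB, that each tape holds its native 400GB with no compression gains, and it uses the rounded $20/tape price, so treat the total as a ballpark. The names are hypothetical.

```python
import math

TAPE_GB = 400    # native LTO-3 capacity
TAPE_PRICE = 20  # A$: rounded bulk price per tape
HARDWARE = {
    "LTO-3 drive": 120,
    "HP P212 SAS card": 30,
    "SFF-8482 to SFF-8087 cable": 16,
}


def tape_setup_cost(target_gb: int) -> tuple[int, int]:
    """Return (number of tapes, total A$ for tapes plus drive, card and cable)."""
    tapes = math.ceil(target_gb / TAPE_GB)
    return tapes, tapes * TAPE_PRICE + sum(HARDWARE.values())


tapes, total = tape_setup_cost(5000)
print(f"{tapes} tapes, ${total} all up")
```

That comes to 13 tapes and about $426 with these rounded prices, comfortably under the $450 mark.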
The downside of any physical media is having to physically move them. At 400GB/tape I’d be able to fill them up relatively quickly but there’s still a period of time when there won’t be an off-site backup until I get ~400GB of scans. I could rotate the tapes around instead of waiting until they’re full, which is good for checking the validity of the backups, but means I’d have to visit the off-site storage location (my parents) more often.
Optical Disc Archive & RDX
The one true constant in our mixed up world is Sony making wacky proprietary data storage and in 2020 that format is Optical Disc Archive. You can store up to 5.5TB on a single cartridge and they’re designed for long term archival.
Sounds awesome and the cartridges aren’t even that expensive (US$120 for a 3.3TB cartridge), but the drives, oof. A brand new Sony ODS-D77U external drive is US$6,500. There’s a used one from Japan (with a 110V PSU) for A$1205 delivered which is more realistic, but still crazy pricing compared to a tape drive or Blu-Ray burner. Never change Sony, never change.
I also discovered RDX, a technology HP seems to have inherited via its acquisition of Tandberg. It looks like a tape, smells like a tape, works like a tape, but inside the cartridge is a 2.5" HDD.
You’d have to have some serious HP Stockholm Syndrome to spend money on RDX!
What's It Gonna Be?
Here’s a pricing summary:
- Amazon Deep Glacier: $950 + $750 if I ever need to restore the entire 5TB
- Off-site NAS: 2x 6TB HDDs & Synology NAS - $840
- M-DISC Blu-Ray: 200x Verbatim M-DISC 25GB BD-R & LG BH16NS55 BD-R drive - $810
- Cheap Blu-Ray: 200x 25GB discs & LG BH16NS55 BD-R drive - $237
- LTO Tape: 13x LTO-3 tapes, HP Ultrium 920 SAS LTO-3 drive, HP P212 SAS controller & cable - $413
I’m tempted to go with the LTO tapes because they’re half the price of the M-DISC Blu-Rays and should last a long time, but damn Amazon Deep Glacier is good despite the cost because I don’t need to shuffle tapes around. If I can figure out a tape rotation system that doesn’t mean I wait for 400GB of data to accumulate before sending a tape off-site, I think I’ll indulge my inner nerd and have an LTO tape drive at home.