When I started this scanning nonsense 2 years ago I intended to keep a local copy of what I digitised (see this post from 2020). A few months and multiple terabytes later, storing the growing pile of data had become an expensive burden, so I stopped keeping a local copy. Then this email arrived:
The Internet Archive’s automatic “three strikes and you’re out” policy kicked in after the copyright holders of some early 2000s gaming magazines I uploaded complained. Luckily my account was reinstated a few minutes later once a human verified I’m not using the IA as a torrent site replacement, but I felt sick knowing that the hard work and time I put into scanning all those books and magazines could vanish so easily.
It got me thinking about keeping a local copy again, so that if my account is disabled and isn’t re-activated, at least I can upload all my stuff again (without the flagged content) to a new account. The bad news is that there’s over 10TB of content, growing at about 100GB a week, with no end in sight as more and more people give me cool stuff to scan.
Regardless of what option I choose there’s gonna be a hefty cost to store and maintain it, so the purpose of this post is to figure out an affordable and sustainable way to maintain a copy of what I scan so that if need be, I can re-upload it.
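To put those growth numbers in perspective, here is a rough projection. It is only a sketch: it assumes the growth stays linear at 100GB a week and uses decimal units (1TB = 1000GB), and the function and constant names are made up for illustration.

```python
# Rough projection of how much data needs backing up over time.
# Assumptions (not measured): linear growth, 52 weeks per year, 1TB = 1000GB.
START_GB = 10_000          # "over 10TB" today
GROWTH_GB_PER_WEEK = 100   # "about 100GB a week"


def projected_size_gb(weeks, start_gb=START_GB, growth_gb=GROWTH_GB_PER_WEEK):
    """Return the estimated archive size in GB after the given number of weeks."""
    return start_gb + growth_gb * weeks


for years in (1, 2, 5):
    size_tb = projected_size_gb(52 * years) / 1000
    print(f"after {years} year(s): about {size_tb:.1f}TB")
```

At that rate the pile passes 15TB in about a year, so whatever I pick has to scale without the cost getting silly.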
The aim is to have everything I scan stored and backed up according to the 3-2-1 rule:
- 3 copies of data
- 2 of those copies on different media
- 1 of them off-site
The “original” is the Internet Archive - that’s one copy. Now I need to figure out where to keep the other two copies! They don’t need to be active or easy to access, as I should never need to touch them again. I just need to be able to access them should the copy on the Internet Archive disappear. If that were to happen, it’s OK if it takes me a few days or so to get the data back and begin the upload process.
I looked into a few different options, each with its own article because otherwise this would be a very long-winded post.
LTO-5 tape (1.5TB tapes)
- Setup cost (15TB): $1,036 for drive, cable, controller & 10x tapes
- Ongoing cost: 1.87c/GB ($28 per 1.5TB tape)
- Retrieval cost: $0
- Pros: Designed for long term storage
- Cons: Bit of a pain in the arse to use
Hard drives kept offline (18TB new HDDs)
- Setup cost (15TB): $553 for 1x 18TB HDD
- Ongoing cost: 3.07c/GB ($553 per 18TB HDD)
- Retrieval cost: $0
- Pros: Fast to copy, easy to retrieve data
- Cons: Needs regular maintenance/access, relatively fragile, can crap out at any moment, lots of data to lose at once
Hard drives kept offline (2TB used HDDs)
- Setup cost (15TB): $200 for 8x 2TB HDD (discount for buying in bulk)
- Ongoing cost: 1.5c/GB ($30 per 2TB HDD)
- Retrieval cost: $0
- Pros: Fast to copy, easy to retrieve data, used HDDs are probably past the initial danger part of the bathtub curve
- Cons: Needs regular maintenance/access, relatively fragile, can crap out at any moment, have to store/manage/maintain multiple HDDs
Blu-ray discs (25GB BD-R)
- Setup cost (15TB): $475 for BD-R drive & 600x 25GB discs
- Ongoing cost: 2.96c/GB ($74 for 100x 25GB discs)
- Retrieval cost: $0
- Pros: Write-once, can’t accidentally overwrite data
- Cons: High chance of disc rot, 600 discs is a lot to store/manage
Cloud storage
- Setup cost (15TB): $0
- Ongoing cost: 0.34c/GB per month forever ($51/m for 15TB - $612/yr)
- Retrieval cost: free up to 10TB, then 0.85c/GB ($42.50 for 15TB)
- Pros: Piece of piss to use, totally hands off, can be automated
- Cons: Bloody expensive
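The per-gigabyte figures above are just price divided by capacity, and the cloud numbers follow from the quoted rates. Here is a small sketch that reproduces them, assuming decimal units (1TB = 1000GB). The dictionary and function names are my own, and the cloud rates are simply the ones listed above rather than any particular provider's published pricing.

```python
# Reproduce the cost figures from the option lists.
# Assumption: decimal units throughout (1TB = 1000GB).
MEDIA = {
    # name: (price in dollars, capacity in GB)
    "LTO-5 tape": (28.00, 1_500),
    "18TB new HDD": (553.00, 18_000),
    "2TB used HDD": (30.00, 2_000),
    "100x 25GB BD-R": (74.00, 100 * 25),
}


def cents_per_gb(price_dollars, capacity_gb):
    """Cost of one gigabyte in cents."""
    return price_dollars / capacity_gb * 100


def cloud_monthly_cost(size_gb, rate_cents_per_gb=0.34):
    """Monthly cloud storage bill in dollars at the quoted rate."""
    return size_gb * rate_cents_per_gb / 100


def cloud_retrieval_cost(size_gb, free_gb=10_000, rate_cents_per_gb=0.85):
    """Retrieval bill in dollars: the first free_gb cost nothing."""
    billable_gb = max(0, size_gb - free_gb)
    return billable_gb * rate_cents_per_gb / 100


for name, (price, capacity) in MEDIA.items():
    print(f"{name}: {cents_per_gb(price, capacity):.2f}c/GB")

print(f"Cloud, 15TB stored: ${cloud_monthly_cost(15_000):.2f}/month")
print(f"Cloud, 15TB retrieved: ${cloud_retrieval_cost(15_000):.2f}")
```

Note that the physical media costs are one-off per unit of capacity, while the cloud figure recurs every month, so the two can't be compared directly without picking a time span.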
They’re all good options and it was hard to choose. Ultimately, I picked the LTO-5 tape option, but mostly because I have a friend who can hook me up with more free tapes than I’ll ever realistically use, which brings the ongoing cost to $0. Doesn’t get much cheaper than that!
I’ll make two copies of each file on separate tapes and keep one tape at home and the other tape at my parents’ place. This meets the 3-2-1 rule:
- 3 copies: Internet Archive, LTO-5 tape at my house, LTO-5 tape at my parents’ place
- 2 copies on different media: Internet Archive (cloud/HDD), LTO-5 tape
- 1 copy off-site: my parents’ place is off-site
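The 3-2-1 check is simple enough to write down as code. This is a minimal sketch, not a tool I actually run: the `Copy` class and `meets_3_2_1` function are hypothetical names, and treating the Internet Archive as off-site is my own assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    name: str      # where the copy lives
    medium: str    # what kind of media it is stored on
    offsite: bool  # True if it is not kept at my house


def meets_3_2_1(copies):
    """Return True if the copies satisfy the 3-2-1 rule."""
    enough_copies = len(copies) >= 3
    distinct_media = len({c.medium for c in copies}) >= 2
    has_offsite = any(c.offsite for c in copies)
    return enough_copies and distinct_media and has_offsite


plan = [
    # Assumption: the Internet Archive counts as off-site.
    Copy("Internet Archive", "cloud/HDD", offsite=True),
    Copy("LTO-5 tape at my house", "LTO-5 tape", offsite=False),
    Copy("LTO-5 tape at my parents' place", "LTO-5 tape", offsite=True),
]

print(meets_3_2_1(plan))
```

Dropping either tape copy from `plan` makes the check fail, because only two copies would remain.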
If I wasn’t getting free tapes, I’d probably go for LTO-5 tapes and used 2TB HDDs, as they have the cheapest per-gigabyte costs without the hassle of hundreds of Blu-ray discs and the risk of disc rot, or the never-ending monthly fees of cloud storage.