Addressing my (video) content

“How do you store your video files?” I was asked recently. Should be simple to answer: “I use …!” After all it’s such a common task, this should be a solved problem. Then why did I end up writing my own set of tools for this?

Well, it turns out that storing files, even large ones, is not a problem. It’s really easy. Drag and drop. Drag and drop again to have a backup. Just in case. Easy.

But deleting files? That is the hard part. The crushing backup anxiety. “I need space, right now. Can I delete this file? Do I have it somewhere else? Do I know how to find the other copy? How can I tell a good copy from a corrupt one? Or from another random file that happens to have the same name?”

Well, just in case. Make another backup. Better safe than sorry. I’ll clean this up later. Storage is cheap. Unless it needs to be fast, that is.

Time… time is expensive.

So how to even approach this? Maybe I’m biased. Having worked on source control infrastructure for over a decade, I see opportunities to use content addressing almost everywhere.

So here’s what I want:

Be sure to never lose files
Always know where to look for a given file
Not waste disk space
Not waste time doing repetitive tasks by hand
Not waste time waiting for slow machines
Not waste money on expensive cloud subscriptions

Can I just let AI handle it? 😂

Obviously: No.

Everything in its place

Whenever I put something somewhere I need to decide where to put it. It’s a known strategy for anyone who tries to find things in their house: “If I had to put this away again, where would I put it?” If we come to the same conclusion every time for a particular item, we would never lose anything. Every item has a designated place based only on its properties. Not on what I use it for, the time of day or what mood I’m in. Just the item itself determines where it belongs. Even better if the rule for finding a place for any item is so simple and obvious that everyone following it comes to the same conclusion. Then I could tell anyone to clean my house and all things would be in exactly the place I expect them to be: the places they belong.

Now what if two items belong in the same place? That would suck. We can put only one thing into any given place. That’s physics. So we must make sure we have enough places to put things. And with a bit of luck we will never be tempted to put two things into the same place. Ever. No matter how many items we hoard.

A storage scheme like that, with a quasi-infinite number of places to put things, is not really possible in the real world. But fortunately in the world of computers it’s rather easy, through the magic of hash functions.

So in the computer “place to put things” translates to a path in the filesystem:

/Users/christian/videos/last_week/clip_1.mov

Now that is a pretty terrible path to use. The meaning of last_week changes over time. And if I used more than one camera that week, clip_1 could be the name assigned by both of them.

Really large numbers

So what is a hash function? It’s a computational process that yields a fixed-length number for every possible input presented to it, no matter how large that input is.

The important part here is that a good hash function will generate a number that looks completely random, even though the same input always produces the same number. Every possible value appears with the same probability, no matter how similar the inputs might be. When we choose a hash size large enough, the number of possible hash values is so enormous that the chances of getting the same value for two different files are practically zero.
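To make the two properties concrete, here is a minimal Python sketch using the standard library's hashlib. The helper names sha1_hex and differing_bits are made up for this illustration. It shows that the digest has the same length for a one byte input and a one megabyte input, and that two inputs differing in a single character produce digests that differ in many bits.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    """Return the SHA-1 digest of data as 40 hexadecimal characters."""
    return hashlib.sha1(data).hexdigest()


def differing_bits(hex_a: str, hex_b: str) -> int:
    """Count the bit positions in which two hex digests differ."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


small = sha1_hex(b"a")
large = sha1_hex(b"a" * 1_000_000)
neighbour = sha1_hex(b"b")

# The digest is always 20 bytes (40 hex characters), whatever the input size.
print(len(small), len(large))

# Inputs that differ by one character still give unrelated-looking digests.
print(small)
print(neighbour)
print(differing_bits(small, neighbour), "of 160 bits differ")
```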

A 20 byte (160 bit) hash for instance has 1,461,501,637,330,902,918,203,684,832,716,283,019,655,932,542,976 possible values (2^160). That is more than the number of atoms in all of earth’s oceans combined. Picking the same one twice by accident will practically never happen.
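How unlikely is "practically never"? A rough sketch using the standard birthday-bound approximation p ≈ n(n-1)/(2N), where n is the number of files and N the number of possible hash values. The function name collision_probability is made up for this example, and the estimate only covers accidental collisions. Deliberately constructed SHA-1 collisions have been demonstrated, which matters for security but not for telling camera files apart.

```python
HASH_BITS = 160                   # 20 bytes
POSSIBLE_VALUES = 2 ** HASH_BITS  # the large number quoted above


def collision_probability(file_count: int) -> float:
    """Approximate chance that any two of file_count random hashes collide.

    Uses the birthday-bound approximation p ≈ n(n-1) / (2N), which is
    accurate as long as the result is much smaller than 1.
    """
    pairs = file_count * (file_count - 1) // 2
    return pairs / POSSIBLE_VALUES


print(f"{POSSIBLE_VALUES:,} possible values")
for count in (1_000, 1_000_000, 1_000_000_000):
    print(f"{count:>13,} files: {collision_probability(count):.1e}")
```

Even with a billion clips the estimated probability stays far below anything worth worrying about.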

The hash is therefore something like a unique fingerprint of a file. If two files have the same hash we know they are the same file. If they have a different hash we know they can’t be the same.

A typical hash (I chose to use SHA1) looks like this:

25dfeb26e9d19fd12c97fd14b28d130c1adfed0d

So that makes

/Users/christian/videos/25dfeb26e9d19fd12c97fd14b28d130c1adfed0d.mov

A really good filename already. Whenever I have a new file, I can add it with

cp /Volumes/SDCARD/filename1.mov "/Users/christian/videos/$(shasum -a 1 /Volumes/SDCARD/filename1.mov | cut -d ' ' -f 1).mov"

This way it will always get a unique name. Wherever I copy this file from now on, I know exactly what file it is. If I forget to format the SD card afterwards and wonder at some point whether I copied it already, I can just use the same command again.

cp /Volumes/SDCARD/filename1.mov "/Users/christian/videos/$(shasum -a 1 /Volumes/SDCARD/filename1.mov | cut -d ' ' -f 1).mov"

It will end up with exactly the same name, so I won’t waste any space on duplicate files.
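The same idea can be sketched in Python. This is not the actual import script, just a minimal illustration, and the names sha1_of_file and add_to_store are hypothetical. It hashes the file in chunks so large clips don't have to fit in memory, and it skips the copy when a file with that hash is already in the store.

```python
import hashlib
import shutil
from pathlib import Path


def sha1_of_file(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so even very large clips use little memory."""
    digest = hashlib.sha1()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def add_to_store(source: Path, store: Path) -> Path:
    """Copy source into store under a name derived from its content.

    Returns the path inside the store. If a file with the same hash is
    already there, nothing is copied.
    """
    target = store / f"{sha1_of_file(source)}{source.suffix.lower()}"
    if not target.exists():
        store.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target)  # copy2 also keeps timestamps
    return target
```

Calling add_to_store(Path("/Volumes/SDCARD/filename1.mov"), Path("/Users/christian/videos")) twice returns the same path both times and leaves exactly one copy in the store.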

This is already great. But in practice it’s also useful to have the files organised a bit to find them quickly. So I settled on a folder structure like:

Videos/2026/2026-03/2026-03-31/CAMERA_NAME/12341234-25-25dfeb26e9d19fd12c97fd14b28d130c1adfed0d.mov

This organises the files by the date they were recorded and the camera they were recorded on. In addition to the hash, the filename also contains the timecode (the time of day) and the frame rate.

Adding the timecode is useful when I want to sort the files in the order they were recorded, to add them to a timeline for example. And the frame rate just happens to be something I want to be able to see at first glance, so I added it to the naming scheme.
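As a sketch of how such a path could be assembled: the function archive_path and its parameters are hypothetical, and the timecode is passed in as a ready-made string because its exact format is not spelled out here. The frame rate is treated as a whole number, which is an assumption that would need adjusting for rates like 29.97.

```python
from datetime import date
from pathlib import PurePosixPath


def archive_path(
    root: PurePosixPath,
    recorded_on: date,
    camera: str,
    timecode: str,
    fps: int,
    sha1_hex: str,
    suffix: str = ".mov",
) -> PurePosixPath:
    """Build root/YYYY/YYYY-MM/YYYY-MM-DD/CAMERA/TIMECODE-FPS-HASH.ext."""
    day = recorded_on.isoformat()  # e.g. "2026-03-31"
    year, month = day[:4], day[:7]
    filename = f"{timecode}-{fps}-{sha1_hex}{suffix}"
    return root / year / month / day / camera / filename


print(
    archive_path(
        PurePosixPath("Videos"),
        date(2026, 3, 31),
        "CAMERA_NAME",
        "12341234",
        25,
        "25dfeb26e9d19fd12c97fd14b28d130c1adfed0d",
    )
)
```

Because every path component is derived from the clip itself, running this for the same clip always gives the same path.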

Of course I don’t create all these directories by hand. For this I wrote a little Python script around the awesome ExifTool.
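A rough idea of what wrapping ExifTool from Python can look like. This is a sketch, not the script mentioned above. It assumes exiftool is on the PATH and uses its -j option for JSON output. Which tags a camera actually writes varies, so CreateDate, Model and VideoFrameRate are assumptions here, and read_metadata and parse_record are made-up names.

```python
import json
import subprocess
from datetime import datetime

# Tags requested from exiftool. Assumed to be present; cameras differ.
TAGS = ("CreateDate", "Model", "VideoFrameRate")


def read_metadata(path: str) -> dict:
    """Run exiftool on one file and return its JSON record as a dict."""
    command = ["exiftool", "-j", *(f"-{tag}" for tag in TAGS), path]
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)[0]  # -j prints a list, one entry per file


def parse_record(record: dict) -> tuple:
    """Turn an exiftool record into (recorded_at, camera, fps)."""
    # exiftool prints dates as "YYYY:MM:DD HH:MM:SS", sometimes with extras
    # such as a time zone after it, so only the first 19 characters are used.
    recorded_at = datetime.strptime(record["CreateDate"][:19], "%Y:%m:%d %H:%M:%S")
    camera = str(record.get("Model", "UNKNOWN")).replace(" ", "_")
    fps = round(float(record["VideoFrameRate"]))  # 29.97 becomes 30
    return recorded_at, camera, fps


sample = {
    "SourceFile": "clip_1.mov",
    "CreateDate": "2026:03:31 12:34:12",
    "Model": "CAMERA NAME",
    "VideoFrameRate": 25,
}
print(parse_record(sample))
```

Keeping the parsing separate from the subprocess call makes it possible to check the date and frame rate handling without a camera file at hand.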

Actual storage media

So with a naming scheme figured out, where to actually put the files? I don’t have enough space on my MacBook. Also it might get stolen or break. External SSDs are awesome, but expensive. Especially if I have to store everything twice to have a backup in case one breaks.

The solution I settled on is therefore to use the cheapest (still reasonably fast) storage medium I could find: Good old 2.5” spinning hard drives. I got a 5 bay dock from Sabrent which works really well and allows swapping drives without needing any tools. Since HDDs can, like any other medium, fail sometimes, I also needed a cloud backup solution. I looked into a few of those, and the best I found is Backblaze. The pricing is cheap and they offer unlimited storage. The client is simple and does the job. It’s not good for file sharing or actually working with the stored files. It’s really meant to be just a backup. But for that it works great.

Of course editing 4K video files off spinning disks is no fun at all. Also I sometimes like to edit on the go, and those disks along with the dock are not exactly travel friendly. So to solve that I just carry around a fast SSD with proxy files for all the clips. Proxy files are lower resolution versions of the original clips that are only about 1/10 of the original size. The quality is still plenty good for editing, so I only need the originals on the HDDs when rendering a clip. This works pretty well with DaVinci Resolve and the accompanying proxy generator.

So the whole flow ends up as:

Use cardcopy.py to import new clips after a day of shooting. It stores them on a dedicated “ingest” SSD and creates the date based folder structure automatically.
Transcode all the clips to proxy files. Also on the ingest drive so it’s fast.
Use another script, mediabackup.py, to copy all files that already have a proxy to the spinning disk “archive” drive and also copy the proxies to the “proxy” drive.
Sometime later, when all files on the archive drive are backed up safely on Backblaze, run mediabackup.py --rm. This reads the files back from the archive drive, calculates their hash once more and checks that it matches the filename. This makes sure the file did not get corrupted while being copied around. Once that check is ok, and only then, the script deletes the file from the ingest drive.
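The last step, verify first and delete second, is the one that makes deleting safe. Here is a minimal sketch of that check, not the real mediabackup.py. The function names are hypothetical, and it assumes the hash is the last dash-separated part of the filename, as in the naming scheme above. Whether the archive copy has reached the cloud backup is not checked here.

```python
import hashlib
from pathlib import Path


def sha1_of_file(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so large clips use little memory."""
    digest = hashlib.sha1()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_from_name(path: Path) -> str:
    """Extract the hash from a name like TIMECODE-FPS-HASH.mov."""
    return path.stem.rsplit("-", 1)[-1]


def remove_if_archived(ingest_file: Path, archive_file: Path) -> bool:
    """Delete ingest_file only if the archive copy reads back intact.

    Returns True if the ingest file was deleted, False if it was kept.
    """
    if not archive_file.is_file():
        return False  # never archived, keep the ingest copy
    if sha1_of_file(archive_file) != hash_from_name(archive_file):
        return False  # archive copy is corrupt, keep the ingest copy
    ingest_file.unlink()
    return True
```

The filename doubles as the checksum, so no separate database of hashes is needed to tell a good copy from a corrupt one.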

Peace of mind

This accomplishes all my initial goals. As long as I trust Backblaze enough that it won’t go away at the same time as one of my HDDs fails, I won’t lose any files. The content based path scheme is very effective at preventing unintended duplication of files, so I know I’m using only as much storage as I actually need. Yes, I’m paying for one cloud subscription, but given how reasonably it is priced and how much additional safety it provides, I would not call that a waste at all.

Everything is done by the scripts. No decisions to be made by me. No surprises. After a shoot I can just insert the SD card, type a command and take a walk. Or go to sleep - perfect.
