add post: extract multiple tgz google takeout archives

Ariejan de Vroom 2024-03-20 10:39:01 +01:00
parent 8af820358b
commit 4a40fb34ce
Signed by: ariejan
GPG Key ID: AD739154F713697B


@ -0,0 +1,59 @@
+++
date = 2024-03-20
title = "How to extract multiple .tgz Google Takeout archives"
tags = ["protip", "tgz", "tar"]
+++
I love Google Photos for its ease of use and features. But, it's _Google_. As you may know, I like to self-host all the things, but for the longest time I was not able to find a good self-hosted alternative to Google Photos.
## Immich
[Immich](https://immich.app/) is a self-hosted photo and video management system. Sounds fancy, and it is. Aside from having a good mobile app for iOS that will do background backups, it sports facial recognition, hardware transcoding of videos, reverse geocoding and lots more. There's a big disclaimer though: Immich is still under active development. I see that more as an encouragement to use it than a warning :-)
So, naturally, I want to move all my photos and videos (going back to 2006) from Google Photos to Immich.
## Google Takeout
Thanks to great EU legislation, Google Takeout exists. It allows you to easily create an archive of your data for export. Now, how useful that data export is going to be is another matter.
Because I have _a lot_ of data, Google offers to split the export into multiple `.tgz` files of 50GB each. I now have six Google Takeout archives:
```
-rwxrwxr-x 1 ariejan ariejan 50G Mar 5 11:16 takeout-001.tgz
-rwxrwxr-x 1 ariejan ariejan 50G Mar 5 11:16 takeout-002.tgz
-rwxrwxr-x 1 ariejan ariejan 50G Mar 5 11:16 takeout-003.tgz
-rwxrwxr-x 1 ariejan ariejan 50G Mar 5 11:16 takeout-004.tgz
-rwxrwxr-x 1 ariejan ariejan 50G Mar 5 11:16 takeout-005.tgz
-rwxrwxr-x 1 ariejan ariejan 39G Mar 5 11:16 takeout-006.tgz
```
So, I was wondering, how do I unpack these files? There are several ways, and I want to document them here.
## cat | tar
With some Linux-fu I could easily pipe all these files into `tar` to extract them:
```
cat takeout-{001..006}.tgz | tar xzivf -
```
Or, I could just glob all the files. The shell expands the glob in sorted order, so the archives arrive in sequence automatically. How nice is that!
```
cat takeout-*.tgz | tar xzivf -
```
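Both methods pipe the data from the archives, in order, into a single `tar` process, which extracts whatever it is fed. The `i` flag (`--ignore-zeros`) is what makes this work: every archive ends with blocks of zeros marking the end of the archive, and without `i` tar would stop reading after the first file.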
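To convince myself that concatenating archives like this really works, a tiny experiment helps. This is just a sketch, assuming bash and GNU tar; the scratch directory and the `part-001.tgz` / `part-002.tgz` names are made up for the demo and stand in for the real takeout files.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical scratch directory and file names, for illustration only.
work="$(mktemp -d)"
cd "$work"
mkdir src out

echo "first" > src/a.txt
echo "second" > src/b.txt

# Two independent archives, standing in for takeout-001.tgz and takeout-002.tgz.
tar czf part-001.tgz -C src a.txt
tar czf part-002.tgz -C src b.txt

# -i (--ignore-zeros) lets tar read past the end-of-archive marker
# of the first archive and carry on with the second one.
cat part-*.tgz | tar xzif - -C out

# List what ended up in the output directory.
ls out
```

If you drop the `i` from the tar flags, you should see only the contents of the first archive show up in `out`.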
## pv | tar
If you want to go fancy and have `pv` installed, you can use that as well. `pv` is a utility that monitors progress of data through a pipe.
```
pv takeout-*.tgz | tar xzif -
88.9GiB 0:19:19 [89.2MiB/s] [==========> ] 30% ETA 0:43:04
```
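If you are not sure `pv` is installed on every machine, a small fallback to `cat` is one option. Again a sketch under assumptions: bash, GNU tar, and made-up names (`demo`, `target`, `part-*.tgz`) so it can run on its own. I leave out `v` here so tar's file listing does not clutter the progress bar.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical scratch setup so the sketch is runnable on its own.
demo="$(mktemp -d)"
mkdir "$demo/src" "$demo/target"
echo "photo one" > "$demo/src/one.txt"
echo "photo two" > "$demo/src/two.txt"
tar czf "$demo/part-001.tgz" -C "$demo/src" one.txt
tar czf "$demo/part-002.tgz" -C "$demo/src" two.txt

# Use pv for a progress bar when it is available, otherwise fall back to cat.
if command -v pv >/dev/null 2>&1; then
  reader=pv
else
  reader=cat
fi

# Same pipeline as above, extracting into a dedicated target directory.
"$reader" "$demo"/part-*.tgz | tar xzif - -C "$demo/target"

# Count the extracted files as a quick sanity check.
extracted="$(find "$demo/target" -type f | wc -l | tr -d ' ')"
echo "extracted $extracted files"
```

Extracting into a separate directory with `-C` keeps the unpacked files away from the archives themselves, which is handy when you are juggling a few hundred gigabytes.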
## Importing into Immich
Now, with the data extracted, how am I going to import it into Immich? There are tools for that and I'll post again on how that process went.