Sunday, April 17, 2022

Show HN: Toolkit of software to backup Google Takeout at 6GB/s+ to Azure https://ift.tt/Gc2qyKk

After seeing all those posts about Google accounts being banned for frivolous and automated reasons, I started using Google Takeout more and more to prepare for the worst. If you aren't aware of Google Takeout, it is a Google service that lets you download archives of all your data from Google. I understand that this may be kind of niche, but if your Google Takeout is large and prohibitive to transfer and back up, this toolkit I made may be right for you.

The problem is that my Takeout jobs are 1.25TB, since they also include the videos I've uploaded to my YouTube account. Without them it's 300GB, which is still a very large amount to me. Transferring 1.25TB by hand got old quickly. It's a pain even on a gigabit connection, and it's also a pain on a VPS. At most I got 300MB/s inside a VPS, but every session took one to three hours to complete and was rather high-touch. The Google Takeout interface is hostile to automation: download links obtained from it are only valid for 15 minutes before you must re-enter your credentials, and you can't queue up downloads. On top of that, you need temporary storage on whatever computer you're using before you can send the data off to final archival storage. What a pain!

In HN-overkill fashion, I came up with a toolkit to make this whole process much, much faster. Along the way I noticed a few things (each piece is sketched at the end of this post):

- Each connection downloading a Google Takeout archive seemed to be limited to about 30MB/s, but multiple connections scaled well: 5 connections got me 150MB/s.
- Azure has "server-to-server" functionality for pulling data from public URLs with different byte ranges. It seems to be meant for built-in transfers from external object storage services such as S3 or GCS.
- You can send Azure as many parallel commands as you want, running as many transfers in parallel as possible. This being Google on the other end, I'm sure their infrastructure can handle it.
- There are extensions for Chromium browsers that can intercept downloads and capture their "final download link".

So I glued all this together. Unfortunately, bugs in Azure prevented direct downloading of the Google links, and Azure only exposes its endpoints over HTTP/1.1, which greatly limits the number of parallel downloads. Cloudflare Workers can overcome both limitations by base64-encoding the Google URLs and fronting the Azure endpoint with HTTP/3. Another great thing is that Cloudflare Workers does not charge for ingress or egress bandwidth. Also, like Google, Cloudflare has an absurd amount of bandwidth and peering.

With all this combined, I am able to get 6GB/s+ transfers of my 50GB archives from Google Takeout to Azure Storage, and I can back them up periodically without having to set up a VPS, find storage, find bandwidth, or really have any "large" computing or networking resources. I use this toolkit a lot myself, and it may be useful for you too if you're in the same situation as me!

https://ift.tt/AoT2sw6
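To make the download-scaling observation concrete, here is a minimal sketch of pulling one archive over several parallel ranged connections. This is not the toolkit's actual code: the part count matches the measurements above, but the in-memory buffering is my own simplification (a real 50GB part would be streamed to disk, or handed straight to Azure as in the next sketch).

```typescript
// Minimal sketch: fetch one archive over N parallel HTTP range requests.
// Runs on Node 18+ (global fetch). Buffering whole parts in memory is an
// illustration-only simplification.
const PARTS = 5; // 5 connections at ~30MB/s each gave ~150MB/s in practice

async function fetchRange(url: string, start: number, end: number): Promise<Buffer> {
  const res = await fetch(url, { headers: { Range: `bytes=${start}-${end}` } });
  if (res.status !== 206) throw new Error(`expected 206 Partial Content, got ${res.status}`);
  return Buffer.from(await res.arrayBuffer());
}

async function parallelDownload(url: string): Promise<Buffer> {
  const head = await fetch(url, { method: "HEAD" }); // probe total size
  const size = Number(head.headers.get("content-length"));
  const chunk = Math.ceil(size / PARTS);
  const parts = await Promise.all(
    Array.from({ length: PARTS }, (_, i) =>
      fetchRange(url, i * chunk, Math.min((i + 1) * chunk, size) - 1),
    ),
  );
  return Buffer.concat(parts);
}
```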
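The post doesn't name the exact Azure API, but the documented "Put Block From URL" operation matches the description of server-to-server transfers from public URLs with byte ranges. A hedged sketch, assuming a pre-generated SAS token and placeholder account/container/blob names:

```typescript
// Hedged sketch of Azure's "Put Block From URL" + "Put Block List" REST
// operations. The account, container, blob, and SAS token are placeholders.
const BLOB = "https://myaccount.blob.core.windows.net/takeout/archive-001.tgz";
const SAS = "sv=...";          // assumed: pre-generated SAS query string
const VERSION = "2021-08-06";

// Block IDs must be base64 and all the same length, so zero-pad them.
const blockId = (i: number) => Buffer.from(String(i).padStart(6, "0")).toString("base64");

// Ask Azure itself to pull one byte range of a public source URL into a block.
async function putBlockFromUrl(sourceUrl: string, i: number, start: number, end: number) {
  const res = await fetch(`${BLOB}?${SAS}&comp=block&blockid=${encodeURIComponent(blockId(i))}`, {
    method: "PUT",
    headers: {
      "x-ms-version": VERSION,
      "x-ms-copy-source": sourceUrl,             // Azure fetches this server-side
      "x-ms-source-range": `bytes=${start}-${end}`,
    },
  });
  if (!res.ok) throw new Error(`Put Block From URL failed: ${res.status}`);
}

// Once every block has landed, commit them in order to form the final blob.
async function commitBlocks(count: number) {
  const xml =
    `<?xml version="1.0" encoding="utf-8"?><BlockList>` +
    Array.from({ length: count }, (_, i) => `<Latest>${blockId(i)}</Latest>`).join("") +
    `</BlockList>`;
  const res = await fetch(`${BLOB}?${SAS}&comp=blocklist`, {
    method: "PUT",
    headers: { "x-ms-version": VERSION, "Content-Type": "application/xml" },
    body: xml,
  });
  if (!res.ok) throw new Error(`Put Block List failed: ${res.status}`);
}
```

Firing many putBlockFromUrl calls concurrently is what produces the parallel fan-out described above: the bytes flow from Google to Azure directly, never through your own machine.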
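For the "final download link" capture, the post only says it relies on an existing Chromium extension. As an illustration of the mechanism, the chrome.downloads API exposes the resolved finalUrl on each download; this sketch assumes a Manifest V3 background script with the "downloads" permission, and the filter and hand-off are my own guesses:

```typescript
// Illustration only: capture the resolved URL of a Takeout download and stop
// the local download, so the short-lived link can be used by the toolkit.
// Requires an extension context (and @types/chrome when compiling).
chrome.downloads.onCreated.addListener((item) => {
  if (!item.finalUrl.includes("takeout")) return; // assumed filter
  chrome.downloads.cancel(item.id);               // don't save it locally
  console.log("captured Takeout link:", item.finalUrl);
  // ...hand finalUrl to the transfer tooling within its 15-minute lifetime
});
```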
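Finally, a minimal sketch of the Cloudflare Worker shim, assuming the target URL is passed base64-encoded in the path (my guess at the exact wire format). Decoding server-side sidesteps the Azure bug with Google's long signed URLs, and since clients reach the Worker over Cloudflare's HTTP/3 edge, the same proxy also works as the HTTP/3 front for the HTTP/1.1-only Azure endpoint:

```typescript
// Minimal Cloudflare Worker (module syntax): decode a base64 target URL from
// the path and proxy the request to it, preserving Range and other headers so
// ranged transfers keep working end to end.
export default {
  async fetch(request: Request): Promise<Response> {
    const encoded = new URL(request.url).pathname.slice(1); // "/<base64(url)>"
    let target: string;
    try {
      target = atob(encoded);
    } catch {
      return new Response("bad base64 URL", { status: 400 });
    }
    return fetch(target, { method: request.method, headers: request.headers });
  },
};
```

Azure's x-ms-copy-source can then point at the Worker URL instead of the raw Google link.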
