
There are a few out there ready to go. Here's one

https://github.com/hartator/wayback-machine-downloader

You just dump the archived site, sync it to S3, and use Terraform to provision the Route 53 records and bucket setup.
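A rough sketch of the dump and sync step, assuming the `wayback_machine_downloader` gem and the AWS CLI are installed and that the bucket already exists. The `build_commands` helper, the `websites` base directory, and the bucket name are made up for illustration. It only builds the two command lines so you can review them before running anything.

```python
from typing import List


def build_commands(domain: str, bucket: str, base_dir: str = "websites") -> List[List[str]]:
    """Return the two commands for one domain: dump from the Wayback Machine, then sync to S3.

    Hypothetical helper; the directory layout and bucket naming are assumptions.
    """
    site_dir = f"{base_dir}/{domain}"
    return [
        # Download the archived copy of the site into site_dir.
        ["wayback_machine_downloader", f"http://{domain}", "--directory", site_dir],
        # Upload the dumped files to the bucket that will serve the site.
        ["aws", "s3", "sync", site_dir, f"s3://{bucket}"],
    ]


if __name__ == "__main__":
    for cmd in build_commands("example.com", "example.com"):
        print(" ".join(cmd))
```

Pass each list to `subprocess.run(cmd, check=True)` when you actually want to execute it. The Route 53 and bucket provisioning would still live in Terraform, separately from this.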

Yes, they are mostly content sites. The hardest part is filtering out adult domains, assuming you don't want them. A staggering number of adult domains expire every year, and they get huge traffic.
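For the filtering, a very naive first pass could be a keyword check on the domain name. This is only a sketch: the keyword list and function names are hypothetical, and substring matching will produce both false positives (for example a name containing "essex") and misses, so it would not replace a real review of each site's archived content.

```python
from typing import Iterable, List

# Hypothetical keyword list for illustration; a real one would need to be much longer.
ADULT_KEYWORDS = ("porn", "xxx", "adult", "sex")


def looks_adult(domain: str) -> bool:
    """Return True if the domain name contains any flagged keyword."""
    name = domain.lower()
    return any(keyword in name for keyword in ADULT_KEYWORDS)


def filter_domains(domains: Iterable[str]) -> List[str]:
    """Keep only the domains that do not trip the keyword check, preserving order."""
    return [d for d in domains if not looks_adult(d)]
```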



So you're purely dumping static HTML pages on those domains?

Can you grab assets like images or even video from archive.org?



