I run a data warehouse for sports data, with a web frontend and lots of integrations with third-party APIs to pull in data.
Monster server (128 GB RAM, 18 cores) on Hetzner ($60/mo). Cloudflare fronts everything, mostly for a little extra security: I only allow my home IP and Cloudflare's IP ranges to connect directly to the app/db/web server.
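The allow-list described above comes down to a membership check on the client address. Below is a minimal Python sketch using the standard `ipaddress` module. It assumes a hypothetical home IP of `203.0.113.7` (a documentation-only address) and uses two of Cloudflare's published IPv4 ranges as a stand-in for the full list Cloudflare publishes at https://www.cloudflare.com/ips/. In a real setup this check would more likely live in firewall rules than in application code.

```python
import ipaddress

# Hypothetical home address (203.0.113.0/24 is reserved for documentation).
HOME_IP = "203.0.113.7/32"

# Two of Cloudflare's published IPv4 ranges; a real setup would load the full,
# current list instead of hard-coding a subset.
CLOUDFLARE_RANGES = ["173.245.48.0/20", "103.21.244.0/22"]

ALLOWED_NETWORKS = [
    ipaddress.ip_network(cidr) for cidr in [HOME_IP, *CLOUDFLARE_RANGES]
]


def is_allowed(client_ip: str) -> bool:
    """Return True if client_ip falls inside any allow-listed network."""
    addr = ipaddress.ip_address(client_ip)
    # An IPv6 address is never "in" an IPv4 network, so it is rejected here.
    return any(addr in net for net in ALLOWED_NETWORKS)
```

For example, `is_allowed("173.245.48.1")` is `True`, while `is_allowed("8.8.8.8")` is `False`.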
* MySQL, because one key source has a library that ingests data easily into MySQL
* Postgres for the main warehouse and app data (an FDW, i.e. foreign data wrapper, pulls in the MySQL data)
* Django for the web app, with Celery and Redis for async jobs
* Amazon SES for email
* Prefect (rather than Airflow) for running data ingestion/transform jobs
* dbt (triggered by Prefect) for transform jobs
* Netlify for the static site frontend, with Jekyll for static site generation
* Google Analytics
How has your experience been with Prefect? I've been looking at it for a few of my own projects, but haven't fully committed to it yet. Prefect has a lot of features and my projects only need a fraction of them, so I'm opting for "lighter-weight" solutions instead.
It's OK. Minimal overhead: it's basically just a remote scheduling and job metadata tool. Their v2 upgrade is nice but not quite stable yet. I just wanted something better than a cron job, and I'd heard bad things about running your own Airflow (and Astronomer is too expensive for this project).
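For the Prefect-triggers-dbt setup in the stack list, here is a minimal sketch of a flow that runs an ingestion task and then shells out to the dbt CLI. It assumes Prefect 2's `flow`/`task` decorators and a dbt project in the working directory. `ingest_source`, `nightly_refresh`, the `scores_api` source name, and the `staging+` selector are all hypothetical. The fallback decorators exist only so the sketch can be imported without Prefect installed.

```python
import subprocess

try:
    from prefect import flow, task
except ImportError:
    # Fallback so the sketch imports without Prefect installed; these no-op
    # decorators accept both the bare @task and the @task(retries=...) forms.
    def _passthrough(fn=None, **_options):
        if fn is None:
            return lambda f: f
        return fn

    flow = task = _passthrough


def build_dbt_command(select: str) -> list[str]:
    """Build the dbt CLI invocation for the models matching `select`."""
    return ["dbt", "run", "--select", select]


@task(retries=2)
def ingest_source(source: str) -> int:
    """Hypothetical ingestion step: pull from a third-party API, load into the DB."""
    # Placeholder: a real task would call the API and write rows to MySQL/Postgres.
    print(f"ingesting {source}")
    return 0


@task
def run_dbt(select: str) -> None:
    # check=True makes a failing dbt run raise, so the task is marked failed.
    subprocess.run(build_dbt_command(select), check=True)


@flow
def nightly_refresh(source: str = "scores_api", select: str = "staging+") -> None:
    # Transform only after ingestion finishes, mirroring "dbt triggered by Prefect".
    ingest_source(source)
    run_dbt(select)
```

Calling `nightly_refresh()` runs the flow once locally. Scheduling it (the part that replaces cron) goes through a Prefect deployment, whose setup differs between Prefect versions. Shelling out keeps dbt's own logging and exit codes; dbt-core 1.5+ also offers a programmatic `dbtRunner`, but the subprocess route works regardless of the dbt version.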