Hi HN,
I’ve been a data engineer for over a decade and have slowly been working on a way for one person to run a data platform (mostly to automate my own job).
When I was starting out, I found the wide range of tools required to build data products overwhelming: setting up infrastructure, deploying apps, developing pipelines, creating metrics and dashboards, and finally integrating ML into the product. No wonder companies have hundreds of people working on this. So last year I quit the best job in the world to figure this out.
I’d like to introduce Phidata: Building Blocks for Data Engineering
It works like this:
1. You start with a codebase that has common data tools like Airflow, Superset and Jupyter pre-configured. Infrastructure and apps are defined as Python objects.
2. Build data products (tables, metrics, models) in Python or SQL. Test locally using Docker and run in production on AWS.
3. Infrastructure, apps and data products live in the same codebase. Teams working together share code and dependencies in a pythonic way.
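To give a flavor of what "infrastructure and apps as Python objects" can look like, here's a minimal, self-contained sketch using plain dataclasses. The names (`App`, `DockerConfig`) are hypothetical illustrations of the pattern, not phidata's actual API:

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical sketch: App and DockerConfig are illustrative names,
# not phidata's real classes.

@dataclass
class App:
    """An app (e.g. Airflow, Superset) declared as a Python object."""
    name: str
    image: str
    port: int = 8080

@dataclass
class DockerConfig:
    """A local dev environment: a named set of apps run via Docker."""
    env: str
    apps: List[App] = field(default_factory=list)

# The whole dev environment is just data in the codebase,
# so it can be versioned, reviewed, and shared like any other code.
dev = DockerConfig(
    env="dev",
    apps=[
        App(name="airflow", image="apache/airflow:2.5.0"),
        App(name="superset", image="apache/superset:latest", port=8088),
        App(name="jupyter", image="jupyter/minimal-notebook:latest", port=8888),
    ],
)

print([app.name for app in dev.apps])  # → ['airflow', 'superset', 'jupyter']
```

Because the environment is ordinary Python, a production config (say, an ECS or EKS variant) can reuse the same `App` definitions with different settings.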
Using phidata, I’ve been running multiple data platforms and have automated most of my boilerplate code using a specially trained GPT-3 data assistant.
If you work with data and are looking for a better development experience, check out [phidata.com](https://www.phidata.com/) or message me, I’d love your feedback.
Ashpreet