Zero-ETL, What's that?
1 Tech Idea, 1 Action, 1 Look back
In the mail 📧
One Tech idea 💥: Zero-ETL
One Small Action💡: 2023 Resolutions? - Don’t give up just yet
DevRetro2022 🌍 : Delivering hope via my 2022 reflection
Read time: 2 minutes
Data warehousing combines data from multiple sources and stores it in one centralized location.
Conventionally, this is achieved with ETL systems: a combination of tools and code (Python/Spark/SQL) that retrieves and transforms the data. For large systems with TBs of data, this might not be a great solution. The data is duplicated from its source and pushed into the warehouse, so the two copies must be kept in constant sync, and data quality must be ensured at all times.
Zero-ETL enables you to integrate data from different sources and run federated queries on top of them, without explicit ETL pipelines. This means the data can remain in its source but can still be accessed through a centralized warehouse, removing the storage and duplication problems. The data is also available in near real time, which sidesteps data freshness and sync issues.
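To make the idea concrete, here is a minimal sketch of a federated query. It is only an analogy, not how Aurora or Redshift work internally: two SQLite files stand in for separate source systems, and an in-memory connection stands in for the central query layer. The table names, sample rows, and the helpers create_source, spend_per_customer, and demo are all made up for illustration.

```python
import os
import sqlite3
import tempfile


def create_source(path, ddl, insert_sql, rows):
    """Create one standalone 'source' database file and load sample rows."""
    conn = sqlite3.connect(path)
    try:
        conn.execute(ddl)
        conn.executemany(insert_sql, rows)
        conn.commit()
    finally:
        conn.close()


def spend_per_customer(orders_path, customers_path):
    """Join across both sources in place; no rows are copied into the hub."""
    hub = sqlite3.connect(":memory:")  # stands in for the central query layer
    try:
        hub.execute("ATTACH DATABASE ? AS orders_src", (orders_path,))
        hub.execute("ATTACH DATABASE ? AS customers_src", (customers_path,))
        return hub.execute(
            """
            SELECT c.name, SUM(o.amount)
            FROM orders_src.orders AS o
            JOIN customers_src.customers AS c ON c.id = o.customer_id
            GROUP BY c.name
            ORDER BY c.name
            """
        ).fetchall()
    finally:
        hub.close()


def demo():
    """Query the sources, write a new order at the source, then query again."""
    with tempfile.TemporaryDirectory() as tmp:
        orders_path = os.path.join(tmp, "orders.db")
        customers_path = os.path.join(tmp, "customers.db")
        create_source(
            orders_path,
            "CREATE TABLE orders (customer_id INTEGER, amount REAL)",
            "INSERT INTO orders VALUES (?, ?)",
            [(1, 20.0), (2, 35.5), (1, 4.5)],
        )
        create_source(
            customers_path,
            "CREATE TABLE customers (id INTEGER, name TEXT)",
            "INSERT INTO customers VALUES (?, ?)",
            [(1, "Ada"), (2, "Linus")],
        )
        before = spend_per_customer(orders_path, customers_path)

        # A new order lands in the source system; there is no pipeline to rerun.
        src = sqlite3.connect(orders_path)
        try:
            src.execute("INSERT INTO orders VALUES (2, 10.0)")
            src.commit()
        finally:
            src.close()

        after = spend_per_customer(orders_path, customers_path)
        return before, after


if __name__ == "__main__":
    print(demo())
```

The second query picks up the new order without any copy or sync step, which is the freshness benefit described above. A real zero-ETL integration handles this at a very different scale and with managed replication, so treat this purely as a mental model.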
Amazon recently announced ETL-free integrations between:
Amazon Aurora and Amazon Redshift
Amazon Redshift and Apache Spark
One downside of Zero-ETL is that the current implementations offer little support for transformation and compliance. But this is just the beginning of the Zero-ETL era. We will keep watching for more updates.
One Small Action 💡
It’s common to fall off our New Year goals after the first week of January. After all, we are all human, and our willpower is limited. To keep you going, here is a slight nudge. Fill in the following on a sticky note and put it where you can see it.
I will [action] every [frequency] for the next ____ weeks
My 2022 Reflection💡
2022 was a mixture of good and bad: uncertain times met with specific actions.