Data Pipelines Pocket Reference: Moving and Processing Data for Data Science
by James Densmore
A concise guide to designing and implementing data pipelines, crucial for understanding workflow automation.
Airflow in Action
by Denny Lee and James Densmore
An in-depth exploration of Apache Airflow's capabilities, perfect for mastering orchestration techniques.
Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems
by Martin Kleppmann
Covers foundational principles of data architecture, essential for building robust data pipelines.
Streaming Systems: The What, Where, When, and How of Large-Scale Data Processing
by Tyler Akidau, Slava Chernyak, and Reuven Lax
Explores modern data processing paradigms, offering insights into real-time data workflows.
Building Data Streaming Applications with Apache Kafka
by Manish Kumar
A practical guide to integrating Kafka with data pipelines, enhancing data processing capabilities.
Data Engineering with Apache Spark, Delta Lake, and Lakehouse
by Manoj Kukreja
Focuses on building scalable data pipelines using Spark, complementing Airflow's orchestration.
The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling
by Ralph Kimball and Margy Ross
Essential for understanding data warehousing concepts, vital for effective data pipeline design.
Fundamentals of Data Engineering: Plan and Build Robust Data Systems
by Joe Reis and Matt Housley
A comprehensive resource for mastering data engineering, from architecture to implementation.