Designing Data-Intensive Applications
by Martin Kleppmann
A comprehensive guide to building systems that handle large volumes of data, focusing on data models, storage, and processing.
Kafka: The Definitive Guide
by Neha Narkhede, Gwen Shapira, and Todd Palino
An essential resource for mastering Apache Kafka, covering its architecture, use cases, and best practices for real-time data processing.
Streaming Systems: The What, Where, When, and How of Large-Scale Data Processing
by Tyler Akidau, Slava Chernyak, and Reuven Lax
Offers a deep dive into the principles of streaming data systems, exploring architectures and frameworks for real-time processing.
Data Quality: The Accuracy Dimension
by Jack E. Olson
Focuses on ensuring data quality in pipelines, providing techniques to maintain integrity and accuracy throughout data processing.
Real-Time Analytics: Techniques to Analyze and Visualize Streaming Data
by Byron Ellis
Explores techniques for analyzing and visualizing real-time data streams, with the aim of deriving insights from live data.
Building Data Streaming Applications with Apache Kafka
by Manish Kumar and Chanchal Singh
A practical guide to developing streaming applications, emphasizing the integration of Kafka with various data sources.
Streaming Data: Understanding the Real-Time Pipeline
by Andrew Psaltis
Presents a comprehensive overview of streaming data architectures, focusing on the challenges and solutions in real-time processing.
Data Engineering with Apache Spark, Delta Lake, and Lakehouse
by Manoj Kukreja and Danil Zburivsky
Covers data engineering principles using Spark and Delta Lake, relevant for understanding data processing in modern pipelines.