JAX London, 3–6 October 2022
The Conference for Java & Software Innovation

“No one at Google uses MapReduce anymore” – Cloud Dataflow explained

This talk originates from the archive.
Tuesday, October 11 2016
15:15 - 16:05

The MapReduce paper, published by Google more than 10 years ago (2004!), sparked the parallel processing revolution and gave birth to countless open source and research projects. The MapReduce model is now officially obsolete, and the new data processing models we use are called Flume (for the processing pipeline definition) and MillWheel (for the real-time dataflow orchestration). They are known externally as Cloud Dataflow / Apache Beam. They allow you to specify both batch and real-time data processing pipelines in Java and have them deployed and maintained automatically. And yes, Dataflow can deploy lots of machines to handle Google-scale problems.

What is the magic behind the scenes? What is the post-MapReduce dataflow model? What is a streaming-first model? What are the flow optimization algorithms? Read the papers or come for a walk-through of the algorithms with me.
