Jeff Meyerson talks with Frances Perry about Apache Beam, a unified model for batch and stream processing. They trace the history of batch and stream processing, from MapReduce to the Lambda Architecture to the more recent Dataflow model, which was originally described in a Google paper. The Dataflow model handles event-time skew (the gap between when an event occurs and when the system processes it) using watermarks and other techniques that Jeff and Frances discuss. Apache Beam lets users define pipelines independently of the underlying execution engine, much as SQL provides a common language across databases. The goal is to reduce the churn and duplicated work caused by the rapidly evolving stream processing ecosystem.
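To make the watermark idea concrete, here is a toy Python sketch (not Beam's or Dataflow's API) that sums values in fixed event-time windows and fires each window once a watermark passes its end. The function name `windowed_sums`, the window size, and the watermark rule (largest event time seen minus an assumed maximum delay `max_delay`) are illustrative assumptions; in Beam and Dataflow, watermarks are typically supplied by the data sources rather than computed by a fixed rule like this one.

```python
from collections import defaultdict


def windowed_sums(events, window_size, max_delay):
    """Sum values per fixed event-time window [start, start + window_size).

    events: (event_time, value) pairs in *arrival* (processing-time) order.
    max_delay: assumed bound on how far behind the newest event an event can
        arrive. The watermark is a toy heuristic: max event time seen minus
        max_delay. It asserts "no more events older than this are expected".

    Returns (fired, late): fired lists (window_start, sum) in firing order;
    late lists events that arrived after the watermark passed their window.
    """
    open_windows = defaultdict(int)  # window start -> running sum
    fired = []
    late = []
    watermark = float("-inf")

    for event_time, value in events:
        start = event_time - event_time % window_size
        if start + window_size <= watermark:
            # The watermark already passed this window's end, so it has fired.
            late.append((event_time, value))
            continue
        open_windows[start] += value
        # Watermarks only move forward.
        watermark = max(watermark, event_time - max_delay)
        # Fire every open window whose end is at or before the watermark.
        for s in sorted(open_windows):
            if s + window_size <= watermark:
                fired.append((s, open_windows.pop(s)))

    # End of input: treat the watermark as +infinity and flush what remains.
    for s in sorted(open_windows):
        fired.append((s, open_windows.pop(s)))
    return fired, late


if __name__ == "__main__":
    # (event_time, value) pairs in arrival order; the event at time 8 arrives late.
    arrivals = [(1, 1), (12, 1), (3, 1), (17, 1), (25, 1), (8, 1), (31, 1)]
    print(windowed_sums(arrivals, window_size=10, max_delay=5))
    # ([(0, 2), (10, 2), (20, 1), (30, 1)], [(8, 1)])
```

With `max_delay=5`, the out-of-order event at time 8 arrives after the watermark has already passed the end of window [0, 10), so it is reported as late. With `max_delay=20` it is counted, but every window fires later. This balance between completeness and latency is one of the trade-offs the Dataflow paper discusses.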
- Frances Perry on Twitter, https://twitter.com/francesjperry
- Apache Beam, http://beam.incubator.apache.org/
- Dataflow paper, http://www.vldb.org/pvldb/vol8/p1792-Akidau.pdf
- Stream Processing 101, https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-101
- Stream Processing 102, https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-102
- Dataflow YouTube Video, https://www.youtube.com/watch?v=3UfZN59Nsk8