Tag: Spark

SE-Radio Episode 272: Frances Perry on Apache Beam

Filed in Episodes by on October 25, 2016 2 Comments
SE-Radio Episode 272: Frances Perry on Apache Beam

Jeff Meyerson talks with Frances Perry about Apache Beam, a unified batch and stream processing model. Topics include a history of batch and stream processing, from MapReduce to the Lambda Architecture to the more recent Dataflow model, originally defined in a Google paper. Dataflow overcomes the problem of event time skew by using watermarks and […]

Continue Reading »

SE-Radio Episode 260: Haoyuan Li on Alluxio

Filed in Episodes by on June 14, 2016 0 Comments
SE-Radio Episode 260: Haoyuan Li on Alluxio

Jeff Meyerson talks to Haoyuan Li about Alluxio, a memory-centric distributed storage system. The cost of memory and disk capacity are both decreasing every year–but only the throughput of memory is increasing exponentially. This trend is driving opportunity in the space of big data processing. Alluxio is an open source, memory-centric, distributed, and reliable storage […]

Continue Reading »

SE-Radio Episode 235: Ben Hindman on Apache Mesos

Filed in Episodes by on August 17, 2015 1 Comment
SE-Radio Episode 235: Ben Hindman on Apache Mesos

Ben Hindman talks to Jeff Meyerson about Apache Mesos, a distributed systems kernel. Mesos abstracts away many of the hassles of managing a distributed system. Hindman starts with a high-level explanation of Mesos, explaining the problems he encountered trying to run multiple instances of Hadoop against a single data set. He then discusses how Twitter uses […]

Continue Reading »