Tag: hadoop

SE-Radio Episode 260: Haoyuan Li on Alluxio

Filed in Episodes by on June 14, 2016 0 Comments
SE-Radio Episode 260: Haoyuan Li on Alluxio

Jeff Meyerson talks to Haoyuan Li about Alluxio, a memory-centric distributed storage system. The cost of memory and disk capacity are both decreasing every year–but only the throughput of memory is increasing exponentially. This trend is driving opportunity in the space of big data processing. Alluxio is an open source, memory-centric, distributed, and reliable storage […]

Continue Reading »

SE-Radio Episode 235: Ben Hindman on Apache Mesos

Filed in Episodes by on August 17, 2015 1 Comment
SE-Radio Episode 235: Ben Hindman on Apache Mesos

Ben Hindman talks to Jeff Meyerson about Apache Mesos, a distributed systems kernel. Mesos abstracts away many of the hassles of managing a distributed system. Hindman starts with a high-level explanation of Mesos, explaining the problems he encountered trying to run multiple instances of Hadoop against a single data set. He then discusses how Twitter uses […]

Continue Reading »

Episode 229: Flavio Junqueira on Distributed Coordination with Apache ZooKeeper

Filed in Episodes by on June 17, 2015 8 Comments
Episode 229: Flavio Junqueira on Distributed Coordination with Apache ZooKeeper

Flavio Junqueira is the author of Zookeeper: Distributed Process Coordination. Flavio and Jeff Meyerson begin by defining ZooKeeper and talking about what ZooKeeper is and isn’t. ZooKeeper can be thought of as a patch against certain fallacies of distributed computing: that the network is secure, has zero latency, has infinite bandwidth, and so on. With […]

Continue Reading »

Episode 222: Nathan Marz on Real-Time Processing with Apache Storm

Filed in Episodes by on March 6, 2015 6 Comments
Episode 222: Nathan Marz on Real-Time Processing with Apache Storm

Nathan Marz is the creator of Apache Storm, a real-time streaming application. Storm does for stream processing what Hadoop does for batch processing. The project began when Nathan was working on aggregating Twitter data using a queue-and-worker system he had designed. Many companies use Storm, including Spotify, Yelp, WebMD, and many others. Jeff and Nathan […]

Continue Reading »

Episode 199: Michael Stonebraker on Current Developments in Databases

Filed in Episodes by on December 5, 2013 13 Comments
Episode 199: Michael Stonebraker on Current Developments in Databases

Recording Venue: Skype Guest: Michael Stonebraker Dr. Michael Stonebraker, one of the leading researchers and technology entrepreneurs in the database space, joins Robert for a discussion of database architecture and the emerging NewSQL family of databases. Dr. Stonebraker opens with his take on how the database market is segmented around a small number of use […]

Continue Reading »