Tag: hadoop

Episode 229: Flavio Junqueira on Distributed Coordination with Apache ZooKeeper

Filed in Episodes by on June 17, 2015 1 Comment
Episode 229: Flavio Junqueira on Distributed Coordination with Apache ZooKeeper

Flavio Junqueira is the author of Zookeeper: Distributed Process Coordination. Flavio and Jeff Meyerson begin by defining ZooKeeper and talking about what ZooKeeper is and isn’t. ZooKeeper can be thought of as a patch against certain fallacies of distributed computing: that the network is secure, has zero latency, has infinite bandwidth, and so on. With […]

Continue Reading »

Episode 222: Nathan Marz on Real-Time Processing with Apache Storm

Filed in Episodes by on March 6, 2015 0 Comments
Episode 222: Nathan Marz on Real-Time Processing with Apache Storm

Nathan Marz is the creator of Apache Storm, a real-time streaming application. Storm does for stream processing what Hadoop does for batch processing. The project began when Nathan was working on aggregating Twitter data using a queue-and-worker system he had designed. Many companies use Storm, including Spotify, Yelp, WebMD, and many others. Jeff and Nathan […]

Continue Reading »

Episode 199: Michael Stonebraker on Current Developments in Databases

Filed in Episodes by on December 5, 2013 13 Comments
Episode 199: Michael Stonebraker on Current Developments in Databases

Recording Venue: Skype Guest: Michael Stonebraker Dr. Michael Stonebraker, one of the leading researchers and technology entrepreneurs in the database space, joins Robert for a discussion of database architecture and the emerging NewSQL family of databases. Dr. Stonebraker opens with his take on how the database market is segmented around a small number of use […]

Continue Reading »

Episode 193: Apache Mahout

Filed in Episodes by on April 22, 2013 2 Comments
Episode 193: Apache Mahout

Recording Venue: Skype Guest: Grant Ingersoll Grant Ingersoll, founder of the Mahout project, talks with Robert about machine learning.   The conversation begins with an introduction to machine learning and the forces driving the adoption of this technique. Grant explains the three main use cases, similarity metrics, supervised versus unsupervised learning, and the use of large data […]

Continue Reading »