Tag: big data

Episode 398: Apache Kudu with Adar Lieber-Dembo

Filed in Episodes on February 12, 2020 | 0 Comments

Adar Lieber-Dembo from Cloudera discusses Apache Kudu, which is a columnar data storage system for fast analytics and fast ingestion of large datasets. Kudu takes its inspiration from systems in the Hadoop ecosystem, but it addresses many of their shortcomings. SE Radio’s Akshay Manchale spoke with Adar about motivations behind building Kudu, features available for […]

SE-Radio Episode 358: Probabilistic Data Structure for Big Data Problems

Filed in Episodes on February 27, 2019 | 1 Comment

Andrii Gakhov, author of the book Probabilistic Data Structures and Algorithms for Big Data Applications, talks about probabilistic data structures and their application to the big data domain. Host Robert Blumen spoke with Dr. Gakhov about how probabilistic data structures differ from their exact counterparts; hash functions, both cryptographic and non-cryptographic; space versus accuracy tradeoffs; […]

SE-Radio Episode 346: Stephan Ewen on Streaming Architecture

Filed in Episodes on November 14, 2018 | 0 Comments

Stephan Ewen, one of the original creators of Apache Flink, discusses streaming architecture. Streaming architecture has become more important because it enables real-time computation on big data. Edaena Salinas spoke with Stephan Ewen about how batch processing compares with stream processing. Stephan explained the architecture components and the types of applications that can be […]

SE-Radio Episode 260: Haoyuan Li on Alluxio

Filed in Episodes on June 14, 2016 | 0 Comments

Jeff Meyerson talks to Haoyuan Li about Alluxio, a memory-centric distributed storage system. The costs of memory and disk capacity are both decreasing every year, but only the throughput of memory is increasing exponentially. This trend is driving opportunity in the space of big data processing. Alluxio is an open source, memory-centric, distributed, and reliable storage […]

SE-Radio Episode 235: Ben Hindman on Apache Mesos

Filed in Episodes on August 17, 2015 | 1 Comment

Ben Hindman talks to Jeff Meyerson about Apache Mesos, a distributed systems kernel. Mesos abstracts away many of the hassles of managing a distributed system. Hindman starts with a high-level explanation of Mesos, explaining the problems he encountered trying to run multiple instances of Hadoop against a single data set. He then discusses how Twitter uses […]
