Tag: big data
Episode 424: Sean Knapp on Dataflow Pipeline Automation

Sean Knapp, CEO of Ascend.io, discusses data pipelines and data pipeline automation. Sean spoke with host Robert Blumen about the ubiquity of data pipelines; what data pipelines do; where the data comes from, how it is transformed, and where it goes; and what it is used for (analytics, machine learning, reporting, alerting, business intelligence). Semi-automated and […]
Episode 398: Apache Kudu with Adar Lieber-Dembo

Adar Lieber-Dembo from Cloudera discusses Apache Kudu, a columnar data storage system for fast analytics and fast ingestion of large datasets. Kudu takes its inspiration from systems in the Hadoop ecosystem but addresses many of their shortcomings. SE Radio’s Akshay Manchale spoke with Adar about the motivations behind building Kudu, features available for […]
SE-Radio Episode 358: Probabilistic Data Structure for Big Data Problems

Andrii Gakhov, author of the book Probabilistic Data Structures and Algorithms for Big Data Applications, talks about probabilistic data structures and their application to the big data domain. Host Robert Blumen spoke with Dr. Gakhov about how probabilistic data structures differ from their exact counterparts; hash functions, both cryptographic and non-cryptographic; space versus accuracy tradeoffs; […]
SE-Radio Episode 346: Stephan Ewen on Streaming Architecture

Stephan Ewen, one of the original creators of Apache Flink, discusses streaming architecture, which has become more important because it enables real-time computation on big data. Edaena Salinas spoke with Stephan Ewen about how batch processing compares with stream processing. Stephan explained the architecture components and the types of applications that can be […]
SE-Radio Episode 260: Haoyuan Li on Alluxio

Jeff Meyerson talks to Haoyuan Li about Alluxio, a memory-centric distributed storage system. The costs of memory and of disk capacity are both decreasing every year, but only memory throughput is increasing exponentially. This trend is creating opportunities in big data processing. Alluxio is an open source, memory-centric, distributed, and reliable storage […]