Episode 424: Sean Knapp on Dataflow Pipeline Automation
Sean Knapp, CEO of Ascend.io, discusses data pipelines and data pipeline automation. Sean spoke with host Robert Blumen about the ubiquity of data pipelines; what data pipelines do; where the data comes from, how it is transformed, and where it goes; and what it is used for (analytics, machine learning, reporting, alerting, business intelligence). The conversation also covers semi-automated and ad hoc automation; costly manual recovery from failure modes; partial failures and bulk redo; why to automate pipelines; the orchestration layer, its architecture, and the type of state it keeps; failure modes and optimizing redo; monitoring pipelines; privacy and pipelines; and pipeline automation as a service.
Related Links
- SE Radio 198: Wil van der Aalst on Workflow Management
- SE Radio 351: Bernd Rücker on Orchestrating Microservices with Workflow Management
- SE Radio 289: James Turnbull on Declarative Programming
- Ascend.io
- Sean Knapp on Twitter, LinkedIn
- Rebuilding Reliable Data Pipelines Through Modern Tools by Ted Malaska
- Data Pipelines with Apache Airflow by Bas Harenslak and Julian de Ruiter
- Pipeline Driven by Roy Osherove
SE Radio theme: “Broken Reality” by Kevin MacLeod (incompetech.com — Licensed under Creative Commons: By Attribution 3.0)
Tags: big data, database, devops, distributed systems, IEEE Computer Society, podcast, SE-Radio, workflow