Edge Data Stack: DuckDB, dbt, evidence.dev, and marimo
Exploring a lean, powerful data stack for modern analytics challenges.
Introduction
In the rapidly evolving landscape of data engineering, there's a growing trend towards lean, efficient, and developer-friendly tools. This "Edge Data Stack" philosophy emphasizes leveraging modern, often open-source components that are simple to set up, focused on code-driven workflows, and highly performant. This article explores a compelling edge stack combination: DuckDB, dbt, Evidence.dev, and Marimo.
Why these specific tools?
- DuckDB: Serves as the core analytical database. It's an in-process powerhouse known for exceptional performance and ease of use, especially for local or single-node projects. Its SQL-centric design eliminates the need for a separate server.
- dbt: Brings software engineering best practices (version control, testing, modularity) to data transformation, running directly against DuckDB via the `dbt-duckdb` plugin. It enables building reliable models using SQL.
- Evidence.dev & Marimo: Represent the "BI-as-Code" layer for visualization and exploration.
- Evidence.dev: Builds data apps and dashboards using SQL and Markdown, outputting static sites that integrate well with version control.
- Marimo: Offers a reactive Python notebook environment for interactive exploration and building simple data apps, complementing Evidence or serving as a dynamic alternative.
Together, these tools form a powerful yet manageable stack, reducing infrastructure overhead and prioritizing developer experience for modern data challenges.
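To make the glue concrete, here is a minimal sketch of the stack's shared storage layer using the `duckdb` Python package. The file and table names are hypothetical; in practice, dbt-duckdb would populate the same local file that Evidence.dev or a marimo notebook later reads.

```python
import duckdb

# One local file acts as the "warehouse": dbt-duckdb writes models
# into it, and Evidence.dev or marimo query it afterwards.
# (File and table names here are hypothetical.)
con = duckdb.connect("warehouse.duckdb")

# Stage some raw data that a dbt model would later transform.
con.execute("""
    CREATE TABLE IF NOT EXISTS raw_orders AS
    SELECT * FROM read_csv_auto('orders.csv')
""")

# Any downstream tool sees the same tables with plain SQL.
print(con.execute("SELECT count(*) FROM raw_orders").fetchone())
con.close()
```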
Deeper Dive into DuckDB
DuckDB stands out as the analytical engine in this edge stack due to several key characteristics:
- In-Process Database: Unlike traditional client-server databases (like PostgreSQL or MySQL), DuckDB runs inside the application that's using it (e.g., your Python script, R session, or even a BI tool). There's no separate database server to install, manage, or connect to. Data is typically stored in a single file (like `my_database.db`), which simplifies deployment and local development significantly.
- Optimized for Analytics (OLAP): While tools like SQLite are great for transactional tasks (OLTP), DuckDB is designed specifically for Online Analytical Processing (OLAP). It uses vectorized query execution and columnar storage internally, allowing it to process large amounts of data and complex analytical SQL queries much faster than row-oriented databases or generic tools like Pandas for many analytical workloads.
- Rich SQL Support: DuckDB offers a comprehensive SQL dialect, supporting standard SQL features like window functions, common table expressions (CTEs), complex joins, and aggregations needed for analytical tasks. It aims for high compatibility with PostgreSQL syntax.
- Direct Data Querying: A major advantage is DuckDB's ability to query data in various formats directly, without importing it first. You can run SQL queries straight against Parquet files, CSVs, JSON files, and more, which makes it powerful for exploring data stored in local files or even data lakes (see the first sketch after this list).
- Extensibility: DuckDB has a growing ecosystem of extensions for specialized tasks, such as spatial data analysis (`spatial`), full-text search (`fts`), interacting with other databases (`postgres_scanner`), and more (see the second sketch after this list).
- Ideal Use Cases:
- Local Data Analysis & Exploration: Quickly analyze datasets that fit on your machine but might be slow or cumbersome with tools like Pandas.
- Powering Interactive Dashboards: Acts as a fast backend for BI-as-code tools like Evidence or Streamlit.
- Data Transformation: Serves as an efficient engine for dbt transformation pipelines.
- Teaching & Learning SQL: Its ease of setup makes it great for learning analytical SQL.
- Embedded Analytics: Can be embedded directly into applications to provide analytical capabilities.
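First, a short sketch of direct querying combined with analytical SQL (a window function inside a CTE). This assumes the `duckdb` Python package is installed; the Parquet file name and its columns (`user_id`, `ts`, `amount`) are hypothetical.

```python
import duckdb

# No import/load step: DuckDB scans the Parquet file in place.
result = duckdb.sql("""
    WITH ranked AS (
        SELECT
            user_id,
            amount,
            row_number() OVER (
                PARTITION BY user_id ORDER BY ts DESC
            ) AS rn
        FROM read_parquet('events.parquet')  -- hypothetical file
    )
    SELECT user_id, amount  -- each user's most recent event
    FROM ranked
    WHERE rn = 1
""")
print(result)
```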
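Second, a sketch of installing and loading an extension from Python. This assumes network access the first time the extension is fetched.

```python
import duckdb

con = duckdb.connect()          # in-memory database
con.execute("INSTALL spatial")  # downloaded once, then cached locally
con.execute("LOAD spatial")

# The extension registers new SQL functions, e.g. ST_Point/ST_AsText.
print(con.execute("SELECT ST_AsText(ST_Point(1, 2))").fetchone())
```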
Essentially, DuckDB provides the speed and SQL power of a dedicated analytical database without the operational overhead, making it a perfect fit for lean, efficient data workflows.