Datumology · Data Stack · 3 min read

Edge Data Stack: DuckDB, dbt, evidence.dev, and marimo

Exploring a lean, powerful data stack for modern analytics challenges.

Introduction

Data engineering is seeing a growing trend towards lean, efficient, and developer-friendly tools. This “Edge Data Stack” philosophy favors modern, often open-source components that are simple to set up, built around code-driven workflows, and fast. This article explores one compelling edge stack combination: DuckDB, dbt, Evidence.dev, and Marimo.

Why these specific tools?

  1. DuckDB: Serves as the core analytical database. It’s an in-process engine known for strong performance and ease of use, especially for local or single-node projects. Because it runs inside the host process, there is no separate server to install or manage, and all work is done in SQL.
  2. dbt: Brings software engineering best practices (version control, testing, modularity) to data transformation, running directly against DuckDB using the dbt-duckdb plugin. It enables building reliable models using SQL.
  3. Evidence.dev & Marimo: Represent the “BI-as-Code” layer for visualization and exploration.
    • Evidence.dev: Builds data apps and dashboards using SQL and Markdown, outputting static sites that integrate well with version control.
    • Marimo: Offers a reactive Python notebook environment for interactive exploration and building simple data apps, complementing Evidence or serving as a dynamic alternative.

Together, these tools form a powerful yet manageable stack, reducing infrastructure overhead and prioritizing developer experience for modern data challenges.

Deeper Dive into DuckDB

DuckDB stands out as the analytical engine in this edge stack due to several key characteristics:

  • In-Process Database: Unlike traditional client-server databases (like PostgreSQL or MySQL), DuckDB runs inside the application that’s using it (e.g., your Python script, R session, or even a BI tool). There’s no separate database server to install, manage, or connect to. Data is typically stored in a single file (like my_database.db), simplifying deployment and local development significantly.
  • Optimized for Analytics (OLAP): While tools like SQLite are great for transactional tasks (OLTP), DuckDB is specifically designed for Online Analytical Processing (OLAP). It uses columnar storage and vectorized query execution, so scans and aggregations over large tables touch only the columns they need and process values in batches. For many analytical workloads this makes it much faster than row-oriented databases or general-purpose tools like Pandas.
  • Rich SQL Support: DuckDB offers a comprehensive SQL dialect, supporting standard SQL features like window functions, common table expressions (CTEs), complex joins, and aggregations needed for analytical tasks. It aims for high compatibility with PostgreSQL syntax.
  • Direct Data Querying: A major advantage is DuckDB’s ability to directly query data in various formats without needing to import it first. You can run SQL queries directly on Parquet files, CSVs, JSON files, and more. This makes it incredibly powerful for exploring data stored in local files or even data lakes.
  • Extensibility: DuckDB has a growing ecosystem of extensions for specialized tasks, such as spatial data analysis (spatial), full-text search (fts), interacting with other databases (postgres, formerly named postgres_scanner), and more.
  • Ideal Use Cases:
    • Local Data Analysis & Exploration: Quickly analyze datasets that fit on your machine but might be slow or cumbersome with tools like Pandas.
    • Powering Interactive Dashboards: Acts as a fast backend for BI-as-code tools like Evidence or Streamlit.
    • Data Transformation: Serves as an efficient engine for dbt transformation pipelines.
    • Teaching & Learning SQL: Its ease of setup makes it great for learning analytical SQL.
    • Embedded Analytics: Can be embedded directly into applications to provide analytical capabilities.

Essentially, DuckDB provides the speed and SQL power of a dedicated analytical database without the operational overhead, making it a perfect fit for lean, efficient data workflows.
