dbt python model + duckdb spark api #526

rchui · 2025-03-27T00:01:18Z

Is it possible to use the Spark API with dbt Python models to construct queries programmatically? The DuckDB docs only cover starting from a DataFrame or referencing a table by name. Somehow you'd need to pair it with the dbt ref mechanism and pass the query plan back to DuckDB without pulling down any data.

rchui · 2025-03-29T01:33:14Z

Author

This works as expected, but I'm pretty sure toArrow() just materializes the table locally and then re-uploads it to DuckDB. Is there a way to avoid this? Would it be possible to change the Python model so that it can pass a DuckDB Spark DataFrame back through?

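from typing import Any

from duckdb.experimental.spark.sql import SparkSession
from duckdb.experimental.spark.sql import functions as F

spark = SparkSession.builder.getOrCreate()


def model(dbt: Any, session: Any) -> Any:
    # Call ref() only to register the dependency on the upstream model.
    dbt.ref("table")

    # Point the Spark session at the DuckDB connection that dbt passes in.
    spark.conn = session
    return (
        spark.table("table")
        .select(F.col("item_id"), F.col("location_id"), F.col("company_id"))
        .toArrow()  # materializes the result as an Arrow table
    )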
