Replies: 1 comment
This works as expected, but I'm pretty sure toArrow() just dumps the table locally and then re-uploads it to DuckDB. Is there a way to avoid this? Would it be possible to change the Python model so that it can pass a DuckDB Spark DataFrame straight back?

    from typing import Any

    import duckdb
    from duckdb.experimental.spark.sql import SparkSession
    from duckdb.experimental.spark.sql import functions as F

    spark = SparkSession.builder.getOrCreate()

    def model(dbt: Any, session: Any) -> Any:
        dbt.ref("table")  # register the dependency so dbt builds "table" first
        spark.conn = session  # reuse dbt's DuckDB connection for the Spark session
        return (
            spark.table("table")
            .select(F.col("item_id"), F.col("location_id"), F.col("company_id"))
            .toArrow()  # converts the result to an Arrow table in Python
        )
Is it possible to use the Spark API with dbt Python models to programmatically construct queries? The DuckDB docs only cover starting from a dataframe or referencing a table by name. Somehow you'd need to pair it with the dbt ref mechanism and pass the query plan back up to DuckDB without pulling down any data.