datafusion-go is a Go database/sql driver and Arrow bridge backed by Apache DataFusion's in-process query engine.
Use it when you want analytic SQL from Go without running a database server, while still having access to Arrow record batches for callers that need exact Arrow schemas or complex values.
- Current module version:
v0.530100.1. - Bundled Apache DataFusion version:
53.1.0. - Minimum Go version:
1.24. - Normal execution requires cgo and a C toolchain.
- Transactions are not supported.
- The DSN opens an in-process DataFusion session, not a remote host or file-backed embedded database.
- Release automation publishes native libraries for
darwin-arm64,darwin-amd64,linux-amd64,linux-arm64, andwindows-amd64. Windows arm64 is not currently bundled.
Release metadata is maintained in versions.toml. Release tags are derived as v<major>.<encoded-datafusion-version>.<patch>, where DataFusion 53.1.0 encodes as 530100.
Apache DataFusion is a Rust query engine built around Arrow. datafusion-go wraps it for Go applications that already use database/sql, but it does not force every result through scalar row conversion.
The package has two main paths:
database/sqlfor ordinary SQL execution, prepared statements, connection pooling, and scalar row scanning.- Arrow-native APIs for streaming
arrow.RecordBatchvalues and registering Go Arrow readers as DataFusion tables.
The driver is intentionally in-process. It is useful for local analytic execution, tests, tools, embedded query features, and services that want DataFusion inside the Go process. It is not a network database client, and it does not make DataFusion behave like a persistent file database.
go get github.com/datafusion-contrib/datafusion-goInstall from a tagged release for normal consumer use. Pseudo-versions from @main are development snapshots and may not have matching GitHub Release assets for the native runtime downloader.
Normal builds require:
- Go with
CGO_ENABLED=1. - A C toolchain for the target platform.
- A compatible
libdatafusion_gonative library at runtime.
Default builds use cgo but do not link DataFusion at Go link time. At runtime, the driver looks for a shared native library in this order:
DATAFUSION_GO_LIBRARY, if set.internal/native/lib/<goos>-<goarch>/in a source checkout.- A checksum-verified library downloaded from the matching GitHub Release into the user cache.
Set DATAFUSION_GO_NO_DOWNLOAD=1 to disable automatic release-asset downloads. Set DATAFUSION_GO_DOWNLOAD_BASE to override the release download base URL.
CGO_ENABLED=0 is not supported for normal use; the package returns a clear datafusion-go requires cgo error in that mode.
Import the driver for registration and open the datafusion driver with an empty DSN:
package main
import (
"context"
"database/sql"
"fmt"
"log"
_ "github.com/datafusion-contrib/datafusion-go"
)
func main() {
db, err := sql.Open("datafusion", "")
if err != nil {
log.Fatal(err)
}
defer db.Close()
var one int64
if err := db.QueryRowContext(context.Background(), "select 1").Scan(&one); err != nil {
log.Fatal(err)
}
fmt.Println(one)
}Runnable examples are available in:
The package registers itself with database/sql as datafusion.
db, err := sql.Open("datafusion", "")A sql.DB opened by one connector shares a DataFusion SessionContext by default. This matches typical database/sql expectations: catalog changes such as CREATE VIEW are visible across pooled connections from the same sql.DB.
Use isolated sessions when you want each physical connection to have its own DataFusion session state:
db, err := sql.Open("datafusion", "?datafusion.go.shared_session=false")Or configure it directly on a connector:
connector, err := datafusion.NewConnectorWithInitContext(
"",
nil,
datafusion.WithSharedSession(false),
)Supported DSNs are:
""?<options>datafusion://datafusion://?<options>
Query parameters are passed to DataFusion as session configuration options:
?datafusion.execution.batch_size=8192
Driver-owned options use the datafusion.go. prefix and are stripped before the remaining options are passed to DataFusion:
?datafusion.go.shared_session=false
File paths, hosts, and other URL forms are rejected. The URL form exists only to carry session options.
Use ordinary database/sql calls when rows contain scalar Arrow values that can be represented as database/sql/driver.Value values:
rows, err := db.QueryContext(ctx, "select 1 as value")Use QueryArrowContext when you need exact Arrow schemas, record-batch streaming, or complex Arrow values that database/sql cannot scan:
conn, err := db.Conn(ctx)
if err != nil {
return err
}
defer conn.Close()
reader, err := datafusion.QueryArrowContext(ctx, conn, "select 1 as value")
if err != nil {
return err
}
defer reader.Close()Records returned by Read must be released by the caller.
Use NewConnector or NewConnectorWithInitContext when a pooled database needs setup SQL before use.
connector, err := datafusion.NewConnectorWithInitContext(
"",
func(ctx context.Context, exec driver.ExecerContext) error {
_, err := exec.ExecContext(ctx, "create view nums as select 1 as n", nil)
return err
},
)
if err != nil {
return err
}
defer connector.Close()
db := sql.OpenDB(connector)
defer db.Close()In shared-session mode, the initialization callback runs once per connector. In isolated-session mode, it runs for each connection and reset.
DataFusion prepares one SQL statement at a time. Split migration or setup scripts before calling the driver, then execute the statements in order:
err := datafusion.ExecStatements(ctx, db, []string{
"create view one as select 1 as n",
"create view two as select 2 as n",
})The helper skips blank statements and wraps errors with the statement index.
Query contexts are bridged to native cancellation. Cancellation is checked during planning, stream creation, and record-batch reads.
ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
defer cancel()
rows, err := db.QueryContext(ctx, "select * from some_large_table")Native cancellation errors can be matched with errors.Is(err, datafusion.ErrNativeCancelled).
The driver supports DataFusion SQL parameters through database/sql:
row := db.QueryRowContext(ctx, "select ? + 1, ?", int64(41), "x")Supported ordinary parameter values are:
nilbool- signed integer types that fit
int64 - unsigned integer types as DataFusion
UInt64 - floating-point values as
float64 string[]bytetime.TimeasTimestamp[ns]time.DurationasDuration[ns]
time.Time preserves loadable IANA locations such as America/New_York. Fixed-offset, local, or otherwise non-loadable locations bind as UTC unless TimestampWithTimeZone is used. float32 values are promoted to DataFusion Float64 through database/sql conversion.
Other values are rejected by CheckNamedValue before native execution.
Use typed wrappers when inference would be ambiguous or when exact Arrow/DataFusion types matter:
row := db.QueryRowContext(
ctx,
"select $1, $2, $3, $4, $5",
datafusion.DateFromTime(day),
datafusion.TimeFromTime(clock),
datafusion.DurationFromTime(2*time.Second),
datafusion.DecimalString("123.45", 10, 2),
datafusion.NullOf(datafusion.ParameterInt64),
)Available wrappers cover UInt64, Date, Time, Timestamp, Duration, Decimal, and typed nulls through NullOf, NullDecimal, and NullTimestamp.
Bare nil binds as DataFusion's untyped null. Use NullOf, NullDecimal, or NullTimestamp when DataFusion needs a concrete type, for example select $1 + 1 with NullOf(ParameterInt64).
Prepared statements report Stmt.NumInput for ? placeholders, $1/$2 positional parameters, and distinct $name parameters. Repeated named parameters count once; each ? occurrence counts as a separate positional parameter. Mixed question-mark, dollar-numbered, and named parameter styles are rejected during prepare.
Named statements must be executed with matching sql.Named arguments. Positional arguments for named statements, named arguments for positional statements, missing names, extra names, and duplicate supplied names are rejected before query execution is handed to DataFusion.
QueryArrowContext executes SQL on a *sql.Conn and returns a closeable Arrow reader:
reader, err := datafusion.QueryArrowContext(ctx, conn, "select $1", int64(42))
if err != nil {
return err
}
defer reader.Close()
for {
record, err := reader.Read()
if err == io.EOF {
break
}
if err != nil {
return err
}
// Use record.
record.Release()
}The Arrow reader owns native stream resources and must be closed. A finalizer is installed as a leak safety net, but finalizers are not prompt cleanup and should not be relied on for normal resource management.
Arrow record readers can be registered as in-memory DataFusion tables:
rdr, err := array.NewRecordReader(schema, []arrow.RecordBatch{batch})
if err != nil {
return err
}
defer rdr.Release()
if err := datafusion.RegisterArrowReader(ctx, conn, "events", rdr); err != nil {
return err
}RegisterArrowReader consumes the remaining batches from the reader, serializes them as an Arrow IPC stream, and registers decoded Rust-owned batches. The copy is intentional: a registered table can outlive the cgo call, while ordinary Go Arrow arrays can contain Go-owned buffers that native code must not retain.
RegisterArrowReaderZeroCopy skips the IPC copy by exporting the reader through the Arrow C Stream Interface. Use it only when every exported Arrow buffer is valid for native code to retain until the table is dropped or the owning session/connector closes, for example data built with Arrow Go's memory/mallocator package or another C/foreign allocator.
Do not use the zero-copy API with ordinary Go-allocated Arrow buffers unless you fully own that lifetime and cgo pointer-safety tradeoff.
Important exported APIs:
Driver: thedatabase/sql/driver.Driverregistered asdatafusion.Connector: owns the native DataFusion database handle used to open pooled connections.NewConnector: creates a connector with an optional initialization callback.NewConnectorWithInitContext: creates a connector whose initialization callback receives a context.WithSharedSession: controls whether connections from one connector share a DataFusionSessionContext.ExecStatements: executes an already-split slice of SQL statements.QueryArrowContext: streams Arrow record batches from a*sql.Conn.RegisterArrowReader: safely registers a Go Arrow reader as a DataFusion table through IPC copy.RegisterArrowReaderZeroCopy: registers Arrow data without IPC copy when native buffer lifetime is guaranteed.Error: structured driver errors with operation type and native error kind.DataFusionVersionandDataFusionGoVersion: generated version constants.
See the Go package documentation for the complete API reference.
database/sql row conversion currently supports:
| Arrow type family | Go value |
|---|---|
| Null | nil |
| Bool | bool |
| Signed integers | int64 |
| Unsigned integers | int64 when in range |
| Float16/Float32/Float64 | float64 |
| Utf8/LargeUtf8/StringView | string |
| Binary/LargeBinary/FixedSizeBinary/BinaryView | []byte |
| Date/Time/Timestamp | time.Time |
| Duration | int64 nanoseconds |
| Decimal | string |
| Intervals | string |
Column metadata is exposed through database/sql where Arrow schema information is precise: nullable columns use typed sql.Null* scan types where practical, fixed-size binary columns report length, decimal columns report precision and scale, and temporal/interval database type names include their Arrow unit or interval subtype.
Variable-width string and binary columns do not report declared lengths because DataFusion's Arrow result schema does not preserve SQL declarations such as VARCHAR(32).
Time-only values are returned as UTC time.Time values on the Unix epoch date. Duration values are returned as int64 nanoseconds because time.Duration is not a legal database/sql/driver.Value. Interval values are returned as strings that preserve their month, day, millisecond, and nanosecond components.
Nested and complex Arrow values such as lists, structs, maps, unions, dictionaries, extensions, and run-end encoded values are rejected from database/sql row conversion when schema information is available. Use the Arrow-native reader for exact batch data.
PrepareContextvalidates SQL syntax with DataFusion's parser.- A prepared query must contain exactly one SQL statement. Multiple result sets are not supported;
Rows.NextResultSetreports no additional result sets. Stmt.NumInputreports parser-derived parameter counts for question-mark, dollar-numbered, or named parameters.- Connections from one connector share a DataFusion
SessionContextby default. - Shared sessions intentionally share session-scoped catalog/config mutations, including
CREATE VIEW,DROP VIEW, andSET, across all connections from the same connector. - Non-query statements are serialized at the connector level across
ExecContext,QueryContext, andQueryArrowContext; queries can still observe normal ordering effects if they run concurrently with DDL. - Pooled reuse implements
driver.SessionResetter. Shared sessions validate the connection and preserve shared session state; isolated sessions recreate the underlyingSessionContextand rerun the connector initialization callback when one was provided. - Connections implement
driver.Validator; closed connections are reported invalid before they return to the pool. - Native errors carry machine-readable error kinds across the C ABI and are exposed on
*datafusion.Error.NativeKind. errors.Iscan match native sentinel errors:ErrNativeCancelled,ErrNativeInvalidArgument,ErrNativeFailure, andErrNativePanic.RowsAffectedreturns0by default and reports the sum of a single integercount,rows_affected, orrowsaffectedoutput column when DataFusion emits one.LastInsertIdreturns0, nil; DataFusion does not expose insert IDs through this driver.- Prepared statement handles are reused when callers use
db.Prepareorconn.PrepareContext, but DataFusion still plans and executes each run; the driver does not cache DataFusion physical plans. Closeis idempotent for connectors, connections, statements, rows, and Arrow readers.- Transactions are unsupported and return explicit unsupported errors.
BeginTxstill honors an already-canceled context before returning the unsupported transaction error.
Default builds dynamically load a shared native library at runtime. This keeps ordinary Go builds from linking the DataFusion native library directly.
Source checkouts should run make bundle or make test before invoking Go tests that open the driver directly:
make testmake test builds the Rust shim, copies the generated native archive and shared library to internal/native/lib/<goos>-<goarch>/, and runs the dynamic and bundled Go test paths.
Development link modes:
make rust.build
go test -tags=datafusion_use_bundled ./...
go test -tags=datafusion_use_source ./...
go test -tags=datafusion_use_static_lib ./...datafusion_use_bundledlinks the static archive frominternal/native/lib/<goos>-<goarch>.datafusion_use_sourceanddatafusion_use_static_liblink fromrust/target/release, so runmake rust.buildfirst.- On
windows-amd64,make rust.buildtargets Rust'sx86_64-pc-windows-gnuABI so Go cgo and Rust produce compatible static objects.
Shared-library link mode links with -ldatafusion_go and requires libdatafusion_go to be on the system linker path:
CGO_LDFLAGS="-L/path/to/lib" go test -tags=datafusion_use_lib ./...versions.toml is the only human-maintained release/version source.
After changing it, regenerate derived Go/Rust version files and the Rust lockfile:
make generate
make generate.checkDo not hand-edit generated version constants in:
- version.go
- internal/native/version_generated.go
- rust/src/generated.rs
- generated version/dependency fields in rust/Cargo.toml
Generated Go/Rust files and rust/Cargo.lock updates should be committed with the versions.toml change.
Enable cgo for normal execution:
CGO_ENABLED=1 go test ./...If automatic download is disabled or the release does not publish an asset for your platform, either set DATAFUSION_GO_LIBRARY or build a local library:
make bundle
DATAFUSION_GO_LIBRARY="$PWD/internal/native/lib/$(go env GOOS)-$(go env GOARCH)/libdatafusion_go.so" go test ./...On macOS, use libdatafusion_go.dylib. On Windows, use datafusion_go.dll.
Run the project build/test target so internal/native/lib/<goos>-<goarch>/ is populated:
make testUse an empty DSN or a session-options DSN. Paths, hosts, and file URLs are intentionally unsupported.
sql.Open("datafusion", "")
sql.Open("datafusion", "?datafusion.execution.batch_size=8192")
sql.Open("datafusion", "datafusion://?datafusion.go.shared_session=false")The database/sql path rejects complex Arrow values. Use QueryArrowContext to read exact Arrow record batches.
Use a MinGW/GNU C toolchain. The bundled Windows build targets Rust's x86_64-pc-windows-gnu ABI.
Run linting:
make lintRun the default Go test path:
make testRun the source link-mode tests (datafusion_use_static_lib links identically,
so it has no separate make target):
make test.sourceRun Rust tests:
make rust.testFor release-sensitive changes, prefer the full release verification targets documented in CONTRIBUTING.md.
Native libraries are generated by the build and intentionally kept out of Git because the static archives exceed GitHub's normal file-size limits and Go module zips have a 500 MiB source limit.
Release automation runs only in GitHub Actions after CI completes successfully on main. The CI workflow builds and tests static archives and shared libraries for darwin-arm64, darwin-amd64, linux-amd64, linux-arm64, and windows-amd64 on the pinned release runners and uploads them with build-time checksums; the Release workflow downloads those exact artifacts, verifies the checksums, smoke-tests the downloaded libraries without rebuilding anything, writes the release-asset SHA256SUMS manifest into the tagged tree, and uploads the native files plus checksums to the GitHub release. If the tag derived from versions.toml already exists, the release workflow exits without publishing.
Before a public release, update versions.toml, run make generate, update CHANGELOG.md, and merge the release PR into main after CI passes.
See CONTRIBUTING.md for setup, testing, version-bump, and release instructions.
This project is licensed under the terms in LICENSE.