
AlloyDB Haystack Integration



AlloyDB is a fully managed, PostgreSQL-compatible database service on Google Cloud, optimised for demanding transactional and analytical workloads.

This package provides a Haystack DocumentStore backed by AlloyDB with the pgvector extension, enabling both dense vector similarity search and full-text keyword search.

Connections are established through the AlloyDB Python Connector, which handles IAM-based authentication and TLS encryption without requiring manual firewall rules or IP allowlisting.

Installation

pip install alloydb-haystack

Usage

from haystack_integrations.document_stores.alloydb import AlloyDBDocumentStore
from haystack_integrations.components.retrievers.alloydb import (
    AlloyDBEmbeddingRetriever,
    AlloyDBKeywordRetriever,
)

Environment Variables

| Variable | Description |
| --- | --- |
| ALLOYDB_INSTANCE_URI | AlloyDB instance URI, in the form projects/P/locations/R/clusters/C/instances/I |
| ALLOYDB_USER | Database user (or IAM principal when using IAM auth) |
| ALLOYDB_PASSWORD | Database password (not required when enable_iam_auth=True) |

Basic Example

from haystack import Document
from haystack_integrations.document_stores.alloydb import AlloyDBDocumentStore

# Requires ALLOYDB_INSTANCE_URI, ALLOYDB_USER, and ALLOYDB_PASSWORD env vars
store = AlloyDBDocumentStore(
    db="my-database",
    embedding_dimension=768,
    recreate_table=True,
)

store.write_documents([
    Document(content="Paris is the capital of France", embedding=[0.1] * 768),
    Document(content="Berlin is the capital of Germany", embedding=[0.2] * 768),
])

print(store.count_documents())  # 2
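The store is created with a fixed embedding_dimension (768 above). pgvector columns declared with a dimension reject vectors of any other length, so an embedding from a differently sized model most likely fails at write time. Below is a minimal pre-write check, assuming embeddings are plain lists of floats. `mismatched_embeddings` is a hypothetical helper, not part of this package.

```python
from typing import List, Optional, Sequence

# Assumption: keep this equal to the embedding_dimension passed to the store.
EMBEDDING_DIMENSION = 768


def mismatched_embeddings(
    embeddings: Sequence[Optional[Sequence[float]]],
    dim: int = EMBEDDING_DIMENSION,
) -> List[int]:
    """Return the positions of embeddings whose length differs from `dim`.

    None entries are skipped: this sketch only checks lengths and makes no
    claim about how the store handles documents without an embedding.
    """
    return [
        i
        for i, emb in enumerate(embeddings)
        if emb is not None and len(emb) != dim
    ]


bad = mismatched_embeddings([[0.1] * 768, [0.2] * 512, None])
print(bad)  # [1]
```

With Haystack documents, the input would be `[doc.embedding for doc in documents]`, checked before calling `store.write_documents(documents)`.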

IAM Authentication

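To authenticate as an IAM principal (for example, a service account) instead of with a password, set enable_iam_auth=True. ALLOYDB_PASSWORD is not required in this mode:

from haystack.utils import Secret
from haystack_integrations.document_stores.alloydb import AlloyDBDocumentStore

store = AlloyDBDocumentStore(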
    db="my-database",
    user=Secret.from_env_var("ALLOYDB_IAM_USER"),  # e.g. "my-sa@my-project.iam"
    enable_iam_auth=True,
    embedding_dimension=768,
)
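The comment on user= shows the service-account form of the database user name. Following Google Cloud's convention for IAM database users, a service account uses its email address without the ".gserviceaccount.com" suffix, while a user account uses its full email address. Here is a small sketch for deriving the value to put in ALLOYDB_IAM_USER. `iam_db_user` is a hypothetical helper, not part of this package.

```python
SERVICE_ACCOUNT_SUFFIX = ".gserviceaccount.com"


def iam_db_user(principal: str) -> str:
    """Map an IAM principal's email to the database user name.

    Service accounts drop the ".gserviceaccount.com" suffix; any other
    principal (e.g. a human user account) is returned unchanged.
    """
    principal = principal.strip()
    if principal.endswith(SERVICE_ACCOUNT_SUFFIX):
        return principal[: -len(SERVICE_ACCOUNT_SUFFIX)]
    return principal


print(iam_db_user("my-sa@my-project.iam.gserviceaccount.com"))  # my-sa@my-project.iam
```

The principal also needs a matching IAM database user on the AlloyDB cluster; this helper only computes the name.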

Vector Similarity Search

from haystack_integrations.components.retrievers.alloydb import AlloyDBEmbeddingRetriever

retriever = AlloyDBEmbeddingRetriever(document_store=store, top_k=5)
result = retriever.run(query_embedding=[0.1] * 768)
print(result["documents"])
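This README does not state which similarity function the embedding retriever uses. Assuming cosine similarity, which is a common default for pgvector-backed stores, exact search amounts to the plain-Python sketch below: score every stored embedding against the query and keep the best top_k. This is a conceptual illustration, not the store's SQL.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_cosine(
    query: Sequence[float],
    embeddings: Sequence[Sequence[float]],
    top_k: int,
) -> List[Tuple[int, float]]:
    """Exhaustive search: score every embedding, return the top_k (index, score) pairs."""
    scored = [(i, cosine_similarity(query, emb)) for i, emb in enumerate(embeddings)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


hits = top_k_cosine([1.0, 0.0], [[0.0, 1.0], [1.0, 0.1], [0.7, 0.7]], top_k=2)
print([i for i, _ in hits])  # [1, 2]
```

One consequence worth noting: the toy embeddings in the basic example ([0.1] * 768 and [0.2] * 768) point in the same direction, so under cosine similarity both documents score 1.0 for this query and their order is not meaningful. Use real model embeddings to get a useful ranking.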

Keyword Search

from haystack_integrations.components.retrievers.alloydb import AlloyDBKeywordRetriever

retriever = AlloyDBKeywordRetriever(document_store=store, top_k=5)
result = retriever.run(query="capital France")
print(result["documents"])
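How the keyword retriever builds its PostgreSQL query is not documented here. If it uses plainto_tsquery, which is a common choice, the query's words are combined with AND, so "capital France" only matches documents that contain both terms. The toy sketch below reproduces just that AND behaviour in plain Python. It is not PostgreSQL's implementation: it does no stemming, no stop-word removal, and no ranking.

```python
import re
from typing import List, Sequence, Set

_TOKEN = re.compile(r"\w+")


def tokens(text: str) -> Set[str]:
    """Lower-cased word tokens (no stemming or stop-word removal)."""
    return {t.lower() for t in _TOKEN.findall(text)}


def keyword_match(query: str, contents: Sequence[str]) -> List[int]:
    """Indices of contents that contain every query term (AND semantics)."""
    wanted = tokens(query)
    if not wanted:
        return []  # an empty query matches nothing
    return [i for i, text in enumerate(contents) if wanted <= tokens(text)]


docs = ["Paris is the capital of France", "Berlin is the capital of Germany"]
print(keyword_match("capital France", docs))  # [0]
```

Because PostgreSQL also stems words (for example, "capitals" and "capital" normalise to the same lexeme under the English configuration), real results can include matches this sketch would miss.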

HNSW Index

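For large datasets, an HNSW index provides approximate nearest-neighbour search, which typically delivers much higher query throughput than exact search at the cost of slightly lower recall. The m, ef_construction, and ef_search values shown are pgvector's defaults: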

store = AlloyDBDocumentStore(
    db="my-database",
    embedding_dimension=768,
    search_strategy="hnsw",
    hnsw_index_creation_kwargs={"m": 16, "ef_construction": 64},
    hnsw_ef_search=40,
)
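Because HNSW is approximate, it can miss some of the true nearest neighbours. In pgvector, raising ef_search generally improves recall at the cost of speed. One way to choose a value is to run a sample of queries against both an HNSW-backed store and exact search (presumably the default when search_strategy is not set, though this README does not say so), then measure how many of the exact top-k document IDs the HNSW search also returned. `recall_at_k` is a hypothetical helper for that comparison.

```python
from typing import Sequence


def recall_at_k(approximate_ids: Sequence[str], exact_ids: Sequence[str], k: int) -> float:
    """Fraction of the exact top-k IDs that also appear in the approximate top-k."""
    if k <= 0:
        raise ValueError("k must be positive")
    exact = set(exact_ids[:k])
    if not exact:
        raise ValueError("exact_ids must contain at least one id")
    found = exact.intersection(approximate_ids[:k])
    # Divide by len(exact) so collections smaller than k are not penalised.
    return len(found) / len(exact)


print(recall_at_k(["a", "b", "d"], ["a", "b", "c"], k=3))  # 0.6666666666666666
```

The IDs would come from `[doc.id for doc in result["documents"]]` for each retriever. Averaging recall over many queries gives a more reliable picture than any single query.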

Integration Tests

Integration tests require a running AlloyDB instance. Set the following environment variables before running:

export ALLOYDB_INSTANCE_URI="projects/MY_PROJECT/locations/MY_REGION/clusters/MY_CLUSTER/instances/MY_INSTANCE"
export ALLOYDB_USER="my-db-user"
export ALLOYDB_PASSWORD="my-db-password"

Then run:

cd integrations/alloydb
hatch run test:integration
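Integration tests that start without the variables above tend to fail with connection errors that are hard to read. This stdlib-only sketch reports which variables are unset or empty before you run the suite. `missing_env_vars` and `REQUIRED_VARS` are hypothetical names, not part of the test setup.

```python
import os
from typing import List, Mapping, Optional, Sequence

# The three variables listed above; drop ALLOYDB_PASSWORD if you use IAM auth.
REQUIRED_VARS = ("ALLOYDB_INSTANCE_URI", "ALLOYDB_USER", "ALLOYDB_PASSWORD")


def missing_env_vars(
    required: Sequence[str] = REQUIRED_VARS,
    env: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Return the required variables that are unset or set to an empty string."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Set these before running the integration tests:", ", ".join(missing))
```

Inside a pytest suite, the same list could feed a `pytest.mark.skipif` condition so the tests skip cleanly instead of erroring.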

License

alloydb-haystack is distributed under the terms of the Apache-2.0 license.