This tutorial covers setting up the hybrid data architecture that powers the UKConnect system: a relational database for operational data and a vector database for knowledge retrieval.
- Database schema design for customer support
- Vector database setup for RAG (Retrieval-Augmented Generation)
- Data population and initialization
- Integration testing
```
┌─────────────────┐   ┌─────────────────┐
│  Relational DB  │   │ Vector Database │
│   (SQLite/PG)   │   │  (Embeddings)   │
├─────────────────┤   ├─────────────────┤
│ • Customers     │   │ • Policy Docs   │
│ • Bookings      │   │ • FAQ Content   │
│ • Transactions  │   │ • Knowledge Base│
│ • Train Data    │   │ • Help Articles │
└─────────────────┘   └─────────────────┘
```
The system uses four core tables:

```sql
CREATE TABLE customers (
    customer_id TEXT PRIMARY KEY,
    name TEXT NOT NULL,
    email TEXT UNIQUE NOT NULL,
    address TEXT,
    phone TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE bookings (
    booking_id TEXT PRIMARY KEY,
    customer_id TEXT,
    departure_station TEXT,
    arrival_station TEXT,
    departure_datetime TIMESTAMP,
    ticket_type TEXT,
    price DECIMAL(10,2),
    status TEXT,
    FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);

CREATE TABLE transactions (
    transaction_id TEXT PRIMARY KEY,
    booking_id TEXT,
    amount DECIMAL(10,2),
    transaction_type TEXT,
    payment_method TEXT,
    status TEXT,
    processed_at TIMESTAMP,
    FOREIGN KEY (booking_id) REFERENCES bookings(booking_id)
);

CREATE TABLE fares (
    route_id INTEGER PRIMARY KEY,
    departure_station TEXT,
    arrival_station TEXT,
    distance_miles INTEGER,
    standard_price DECIMAL(10,2),
    flexible_price DECIMAL(10,2)
);
```

```bash
# Initialize schema and populate sample data
cd production-ai-customer-support
python -c "from utils.create_schema import create_database_schema; create_database_schema()"
python -c "from utils.populate_data import populate_data; populate_data()"
```

- Semantic search for company policies and documentation
- Google Embeddings API for vector generation
- Pickle storage for development (production uses vector DBs)
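The development-mode pickle storage mentioned above can be illustrated with a minimal sketch. The `SimpleVectorStore` class below is illustrative only, not the project's actual `VectorDatabase` API: it pickles chunks and their embedding vectors to disk and ranks results by cosine similarity.

```python
# Minimal sketch of a pickle-backed vector store for development.
# SimpleVectorStore is a hypothetical name, not the project's real class.
import math
import pickle

class SimpleVectorStore:
    def __init__(self):
        self.chunks = []   # original text chunks
        self.vectors = []  # one embedding vector per chunk

    def add(self, chunk, vector):
        self.chunks.append(chunk)
        self.vectors.append(vector)

    def save(self, path):
        # Pickle storage: fine for development, replaced by a real
        # vector database in production
        with open(path, "wb") as f:
            pickle.dump({"chunks": self.chunks, "vectors": self.vectors}, f)

    @classmethod
    def load(cls, path):
        with open(path, "rb") as f:
            data = pickle.load(f)
        store = cls()
        store.chunks, store.vectors = data["chunks"], data["vectors"]
        return store

    def search(self, query_vector, top_k=3):
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
            return dot / norm if norm else 0.0
        scored = [(cosine(query_vector, v), c) for v, c in zip(self.vectors, self.chunks)]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [chunk for _, chunk in scored[:top_k]]
```

Cosine similarity over a pickled list keeps development simple; swapping in a production vector database only changes the storage and search internals, not the interface.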
Policy Documents → Text Chunking → Embeddings → Vector Storage
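The "Text Chunking" stage of this pipeline can be sketched as follows. `chunk_text` and its defaults are illustrative; the project's real logic lives in `utils/chunking_data.py`, using the parameters listed below.

```python
# Sketch of the chunking stage: split a document into overlapping chunks.
# chunk_text is a hypothetical helper mirroring the 500-1000 character
# chunk size and 100-character overlap described in this tutorial.

def chunk_text(text, chunk_size=800, overlap=100):
    """Split text into ~chunk_size-character chunks with overlap."""
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        start = end - overlap  # step back so context carries across chunks
    return chunks
```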
- Located in: `database/UKConnect_policy.txt`
- Contains: refund policies, terms & conditions, fare rules

Chunking strategy (from `utils/chunking_data.py`):
- Chunk size: 500-1000 characters
- Overlap: 100 characters
- Preserve context boundaries

```python
# From database/vector_db.py
import google.generativeai as genai

def generate_embeddings(text_chunks):
    """Generate one embedding vector per text chunk via the Google API."""
    embeddings = []
    for chunk in text_chunks:
        result = genai.embed_content(
            model="models/embedding-001",
            content=chunk
        )
        embeddings.append(result['embedding'])
    return embeddings
```

```bash
# Setup vector database for policy documents
python -c "from database.vector_db import setup_vector_database; setup_vector_database()"
```

Create a `.env` file with the required configuration:
```bash
# API Configuration
GOOGLE_API_KEY=your_google_api_key_here

# Database Configuration
DATABASE_URL=sqlite:///database/ukconnect_rail.db

# Vector Database
VECTOR_DB_PATH=database/vector_db.pkl
```

```
production-ai-customer-support/
├── database/
│   ├── ukconnect_rail.db          # SQLite database
│   ├── vector_db.pkl              # Vector embeddings
│   ├── UKConnect_policy.txt       # Policy documents
│   ├── ukconnect_qa_pairs.json    # Q&A pairs
│   └── ukconnect_rag_chunks.json  # Processed chunks
```
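The settings in `.env` need to be loaded at application startup. A stdlib-only sketch is shown below; `load_env_file` is a hypothetical helper (real projects commonly use the python-dotenv package instead), and the parsing is deliberately simple: `KEY=VALUE` lines, with blank lines and `#` comments ignored.

```python
# Sketch: minimal .env loader (illustrative; python-dotenv is the usual choice).
import os

def load_env_file(path=".env"):
    """Parse KEY=VALUE lines from a .env file into os.environ and a dict."""
    if not os.path.exists(path):
        return {}
    settings = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            # Skip blank lines, comments, and lines without '='
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            settings[key.strip()] = value.strip()
            # Don't override values already set in the real environment
            os.environ.setdefault(key.strip(), value.strip())
    return settings
```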
- 15 realistic customers with complete profiles
- 25+ train bookings across different scenarios
- Route data for major UK rail connections
- Transaction history for testing refunds/modifications
```python
# From utils/populate_data.py
customers = [
    {
        "customer_id": "CUS001",
        "name": "James Thompson",
        "email": "james.thompson@email.co.uk",
        "address": "45 Baker Street, London, W1U 7EW",
        "phone": "+44 20 7946 0958"
    },
    # ... 14 more customers
]

bookings = [
    {
        "booking_id": "UKC001",
        "customer_id": "CUS001",
        "departure_station": "London Euston",
        "arrival_station": "Manchester Piccadilly",
        "ticket_type": "Standard",
        "price": 85.50,
        "status": "Active"
    },
    # ... complex scenarios for testing
]
```

```bash
# Test database connectivity
python -c "
from database.database import DatabaseManager
db = DatabaseManager()
print('Customers:', len(db.get_all_customers()))
print('Bookings:', len(db.get_all_bookings()))
"

# Test vector search functionality
python -c "
from database.vector_db import VectorDatabase
vdb = VectorDatabase()
results = vdb.search('refund policy', top_k=3)
print('Search results:', len(results))
"

# Run basic system test
python run_test_scenarios.py --session 1
```

- Query Analysis: Extract semantic meaning from the customer question
- Vector Search: Find relevant policy documents using similarity
- Context Preparation: Combine retrieved chunks with query context
- Generation: LLM generates response using retrieved information
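The four steps above can be sketched end to end. Everything here is a placeholder for the real clients: `embed_fn` stands in for the Google embeddings call, `generate_fn` for the LLM, and `store` for any object exposing a `.search(vector, top_k)` method.

```python
# Sketch of the retrieval flow; all function and parameter names are
# illustrative, not the project's actual API.

def answer_query(query, store, embed_fn, generate_fn, top_k=3):
    # 1. Query analysis: embed the question to capture its semantic meaning
    query_vector = embed_fn(query)
    # 2. Vector search: retrieve the most similar policy chunks
    chunks = store.search(query_vector, top_k=top_k)
    # 3. Context preparation: combine retrieved chunks with the query
    context = "\n\n".join(chunks)
    prompt = f"Context:\n{context}\n\nCustomer question: {query}"
    # 4. Generation: the LLM answers using the retrieved context
    return generate_fn(prompt)
```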
- Chunk Overlap: Ensures context continuity
- Relevance Filtering: Threshold-based result filtering
- Caching: Frequent queries cached for speed
- Batch Processing: Efficient embedding generation
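Two of these optimizations can be sketched directly: threshold-based relevance filtering and a cache for repeated queries. The threshold value and function names below are illustrative, not the project's actual settings.

```python
# Sketch of relevance filtering and query caching; names and the 0.75
# threshold are hypothetical.
from functools import lru_cache

RELEVANCE_THRESHOLD = 0.75  # drop weak matches below this similarity

def filter_by_relevance(scored_results, threshold=RELEVANCE_THRESHOLD):
    """Keep only (score, chunk) pairs at or above the similarity threshold."""
    return [(score, chunk) for score, chunk in scored_results if score >= threshold]

def make_cached_search(search_fn, maxsize=256):
    """Wrap a search function so repeated queries are served from cache."""
    @lru_cache(maxsize=maxsize)
    def cached(query):
        # Tuples, so results are hashable and safely cacheable
        return tuple(search_fn(query))
    return cached
```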
```python
# Database health monitoring
def check_database_health():
    # Verify table integrity
    # Check data consistency
    # Monitor query performance
    pass

# Periodic vector database updates
def update_vector_database():
    # Re-process modified documents
    # Regenerate embeddings
    # Update search indices
    pass
```

With your database infrastructure ready:
- Tutorial 3: Build the Policy Agent for RAG-powered responses
- Tutorial 4: Create the Ticket Agent for operational tasks
- Tutorial 5: Implement the Master Agent orchestrator
- Hybrid approach combines structured data (relational) with unstructured knowledge (vector)
- Proper chunking is critical for effective RAG performance
- Sample data should reflect real-world usage patterns
- Health monitoring ensures system reliability
- Environment configuration enables easy deployment across environments
Your data foundation is now ready to power intelligent customer support interactions!