A Retrieval-Augmented Generation (RAG) system that answers questions about AWS services by searching real AWS documentation and generating responses with Claude.
Live demo: https://d3d0zch3u8ca61.cloudfront.net
Ask a question about AWS → the system searches through indexed AWS documentation → retrieves the most relevant sections → sends them to Claude → returns an answer grounded in real documentation with source links.
Every answer is backed by retrieved documentation.
┌──────────────────────────────────────────────────────────────┐
│ INGESTION (one-time) │
│ │
│ AWS Docs → Scrape & Clean → Chunk → Embed (Titan v2) → │
│ Upload to S3 → Upload to Pinecone │
└──────────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────────┐
│ QUERY (per question) │
│ │
│ User → CloudFront → API Gateway → Lambda │
│ ├─ Embed question │
│ ├─ Search Pinecone │
│ ├─ Build prompt │
│ ├─ Call Claude │
│ └─ Return answer │
└──────────────────────────────────────────────────────────────┘
| Component | Service |
|---|---|
| LLM | Claude Sonnet 4.6 via Amazon Bedrock |
| Embeddings | Amazon Titan Embeddings v2 (1024-dim) |
| Vector DB | Pinecone (free tier) |
| Backend | AWS Lambda + API Gateway (REST) |
| Frontend | Static HTML/JS on S3 + CloudFront |
| Storage | Amazon S3 |
| Monitoring | Amazon CloudWatch |
| Language | Python 3.11 |
aws-docs-rag/
├── scripts/
│ ├── 01_ingest_docs.py # Scrape & clean AWS documentation
│ ├── 02_chunk_docs.py # Split docs into overlapping chunks
│ ├── 03_generate_embeddings.py # Generate vectors via Titan v2
│ ├── 04_upload_to_pinecone.py # Create index & upload vectors
│ ├── 05_test_rag_local.py # Test full RAG pipeline locally
│ ├── 06_deploy_lambda.py # Package & deploy Lambda function
│ ├── 07_deploy_api_gateway.py # Create REST API endpoint
│ └── 08_deploy_frontend.py # Deploy static site to S3 + CloudFront
├── lambda_function/
│ └── lambda_handler.py # Lambda handler (embed → search → generate)
├── frontend/
│ └── index.html # Chat UI (single file, no framework)
├── set_env.sh # Environment variables (Linux/macOS)
├── set_env.ps1 # Environment variables (Windows)
├── requirements.txt # Python dependencies
├── GUIDE.md # Complete step-by-step build guide
└── README.md
- Scrape — Downloads user guide pages for S3, EC2, Lambda, DynamoDB, and VPC from docs.aws.amazon.com
- Clean — Strips HTML chrome (nav, footers, scripts), keeps documentation content
- Chunk — Splits documents into ~1000-character pieces with 200-character overlap using LangChain's RecursiveCharacterTextSplitter
- Embed — Sends each chunk to Amazon Titan Embeddings v2 → 1024-dimensional vector
- Store — Uploads vectors + metadata to Pinecone with cosine similarity indexing
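The chunking step above can be illustrated with a minimal sketch. It is a simplified stand-in, not the project's code: the pipeline uses LangChain's RecursiveCharacterTextSplitter, which also prefers to break on paragraph and sentence separators, while this version only slides a fixed-size window. The function name `chunk_text` is hypothetical.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size windows that overlap by `overlap` characters.

    Simplified illustration only: no separator awareness, unlike
    RecursiveCharacterTextSplitter.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks
```

With the defaults, each chunk repeats the final 200 characters of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk.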
Each pipeline stage writes a small manifest with a run_id, stores output under a versioned S3 prefix, and stops if the previous stage did not finish cleanly. Pinecone uploads also verify the final index state; failed batches are written to local-data/failed_batches.json and pinecone/failed_batches/<run_id>.json.
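The manifest gating described above might look roughly like the following. This is a sketch under assumptions: the manifest file name, its fields (`stage`, `run_id`, `status`), the `"complete"` status value, and the prefix layout are all hypothetical, and the real scripts store their output in S3 rather than a local directory.

```python
import json
from pathlib import Path


def write_manifest(root: Path, stage: str, run_id: str, status: str) -> Path:
    """Record the outcome of a stage (hypothetical manifest layout)."""
    path = root / f"{stage}.manifest.json"
    path.write_text(json.dumps({"stage": stage, "run_id": run_id, "status": status}))
    return path


def require_previous_stage(root: Path, previous_stage: str) -> str:
    """Return the previous stage's run_id, or stop if it did not finish cleanly."""
    path = root / f"{previous_stage}.manifest.json"
    if not path.exists():
        raise RuntimeError(f"{previous_stage} has no manifest; run it first")
    manifest = json.loads(path.read_text())
    if manifest.get("status") != "complete":
        raise RuntimeError(f"{previous_stage} ended with status {manifest.get('status')!r}")
    return manifest["run_id"]


def versioned_prefix(stage: str, run_id: str) -> str:
    """Build a per-run output prefix (assumed layout, for illustration)."""
    return f"{stage}/{run_id}/"
```

Passing the `run_id` forward from one stage to the next keeps each run's outputs under a single prefix, which makes it easier to tell a partial run from a complete one.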
- Embed the user's question using the same Titan v2 model
- Search Pinecone for the 5 most semantically similar document chunks
- Build a prompt with the question + retrieved chunks + anti-hallucination instructions
- Generate an answer using Claude via Bedrock
- Return the answer with source URLs
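The search and prompt-building steps above can be sketched in plain Python. This is an illustration, not the Lambda handler's code: Pinecone performs the similarity search server-side, and the field names (`vector`, `text`, `url`) and the instruction wording are assumptions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], chunks: list[dict], k: int = 5) -> list[dict]:
    """Return the k chunks most similar to the query (local stand-in for Pinecone)."""
    ranked = sorted(
        chunks,
        key=lambda c: cosine_similarity(query_vec, c["vector"]),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question: str, matches: list[dict]) -> tuple[str, list[str]]:
    """Assemble a grounded prompt and the de-duplicated list of source URLs."""
    blocks, sources = [], []
    for i, match in enumerate(matches, start=1):
        blocks.append(f"[{i}] {match['text']}\nSource: {match['url']}")
        if match["url"] not in sources:
            sources.append(match["url"])
    prompt = (
        "Answer the question using only the documentation excerpts below. "
        "If the excerpts do not contain the answer, say so instead of guessing.\n\n"
        + "\n\n".join(blocks)
        + f"\n\nQuestion: {question}"
    )
    return prompt, sources
```

Returning the source list alongside the prompt lets the caller attach links to the answer without parsing the model's output.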
- AWS account with Bedrock model access
- Pinecone account (free tier)
- Python 3.11+
- AWS CLI v2
# Clone the repo
git clone https://github.com/micronwave/aws-docs-rag.git
cd aws-docs-rag
# Set up Python environment
python3 -m venv venv
source venv/bin/activate # Linux/macOS
# venv\Scripts\activate # Windows
pip install -r requirements.txt
# Configure environment variables
cp set_env.sh set_env.local.sh # Edit with your Pinecone key + AWS account ID
source set_env.local.sh
export ALLOWED_ORIGIN=https://your-cloudfront-domain.example.com
# Create S3 bucket
aws s3 mb s3://$S3_BUCKET_NAME --region us-east-2
# Run the pipeline
python scripts/01_ingest_docs.py # ~5 min
python scripts/02_chunk_docs.py # ~30 sec
python scripts/03_generate_embeddings.py # ~10 min
python scripts/04_upload_to_pinecone.py # ~1 min
# Test locally
python scripts/05_test_rag_local.py "How do I create an S3 bucket?"
# Deploy
python scripts/06_deploy_lambda.py
python scripts/07_deploy_api_gateway.py
python scripts/08_deploy_frontend.py
Before running scripts/06_deploy_lambda.py and scripts/07_deploy_api_gateway.py,
set ALLOWED_ORIGIN to the CloudFront URL that will serve the frontend. Both
deploy scripts read the same environment variable.
On a first deploy, use the CloudFront URL you plan to serve from, then rerun
the Lambda and API Gateway scripts if that URL changes. scripts/07_deploy_api_gateway.py
writes api_endpoint.txt; scripts/deploy_config.py creates or reuses
origin_verify_secret.txt; and scripts/08_deploy_frontend.py publishes the
frontend against the same-origin /query path while configuring CloudFront to
forward that path to API Gateway with the private verification header.
| Service | Monthly Cost |
|---|---|
| Pinecone | $0 (free tier) |
| Lambda | $0 (free tier) |
| API Gateway | $0 (free tier, first 12 months) |
| S3 | ~$0.03 |
| CloudFront | ~$0.50 |
| Bedrock (Claude) | ~$2–10 depending on usage |
| Total | ~$3–11/month |
For comparison, using OpenSearch Serverless as the vector database would cost $700+/month.
- Lambda runs with a scoped IAM policy: only bedrock:InvokeModel and CloudWatch log permissions
- Pinecone API key stored as a Lambda environment variable
- API Gateway handles CORS, request routing, and stage-level throttling; Lambda rejects requests that do not carry the private CloudFront origin-verification header
- The frontend never ships a backend credential to the browser; the browser only calls same-origin /query
- S3 frontend bucket served through CloudFront with HTTPS
- No user data is stored — queries are stateless
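The origin-verification check mentioned in the list above could be sketched as follows. The header name `x-origin-verify`, the `ORIGIN_VERIFY_SECRET` environment variable, and the function names are hypothetical; only `ALLOWED_ORIGIN` comes from this README. The actual handler may differ.

```python
import hmac
import json
import os

HEADER_NAME = "x-origin-verify"  # hypothetical header name, for illustration


def is_from_cloudfront(event: dict, expected_secret: str) -> bool:
    """True only if the request carries the shared secret CloudFront adds."""
    headers = {k.lower(): v for k, v in (event.get("headers") or {}).items()}
    supplied = headers.get(HEADER_NAME) or ""
    # An empty configured secret must never match, and the comparison is
    # constant-time to avoid leaking the secret through timing differences.
    return bool(expected_secret) and hmac.compare_digest(
        supplied.encode("utf-8"), expected_secret.encode("utf-8")
    )


def handler(event: dict, context: object) -> dict:
    """Reject direct calls to API Gateway; allow requests forwarded by CloudFront."""
    secret = os.environ.get("ORIGIN_VERIFY_SECRET", "")  # assumed variable name
    if not is_from_cloudfront(event, secret):
        return {"statusCode": 403, "body": json.dumps({"error": "forbidden"})}
    return {
        "statusCode": 200,
        "headers": {"Access-Control-Allow-Origin": os.environ.get("ALLOWED_ORIGIN", "")},
        "body": json.dumps({"ok": True}),
    }
```

Because the secret is added by CloudFront and never sent to the browser, a caller who discovers the API Gateway URL still cannot reach the model without it.
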
| Need | Solution |
|---|---|
| More documentation | Add service URLs to 01_ingest_docs.py, re-run pipeline |
| Faster cold starts | Lambda Provisioned Concurrency or move to ECS Fargate |
| Repeated query caching | Add DynamoDB TTL cache in front of Bedrock |
| Better retrieval | Add reranking step (retrieve top 20, rerank to top 5) |
| Cheaper generation | Swap Claude Sonnet for Claude Haiku on simple queries |
| User authentication | Add Amazon Cognito |