A serverless OCR (Optical Character Recognition) service built with AWS Lambda, API Gateway, Textract, and Rekognition that processes base64-encoded images and returns the text with the highest confidence score.
API Gateway → Caller Lambda → Target Lambda → AWS Textract + Rekognition
- Caller Lambda (`proxy_service`): Receives HTTP requests and invokes the target Lambda
- Target Lambda (`text_recognition_service`): Performs dual OCR using AWS Textract and Rekognition
- API Gateway v2: HTTP API endpoint for external access
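The target Lambda has to turn two differently shaped SDK responses into one comparable result. The sketch below shows one way to do that. It assumes confidence is the mean WORD confidence, rescaled from the services' 0 to 100 range to the 0 to 1 range used in the sample response, and that the text is the LINE entries joined with spaces; the project's actual scoring may differ. `fromTextract`, `fromRekognition`, and `summarize` are hypothetical helper names. `Blocks`, `BlockType`, `TextDetections`, `Type`, `DetectedText`, and `Confidence` are fields of the `DetectDocumentText` and `DetectText` responses.

```javascript
// Hypothetical helpers: normalize Textract and Rekognition output into the
// { success, source, text, confidence, wordCount } shape of the sample response.
// Assumption: confidence = mean WORD confidence, rescaled from 0-100 to 0-1.

function summarize(source, lines, wordConfidences) {
  const confidence = wordConfidences.length
    ? wordConfidences.reduce((sum, c) => sum + c, 0) / wordConfidences.length / 100
    : 0;
  return { success: true, source, text: lines.join(' '), confidence, wordCount: wordConfidences.length };
}

// DetectDocumentText returns Blocks whose BlockType is PAGE, LINE, or WORD.
function fromTextract(response) {
  const blocks = response.Blocks ?? [];
  const lines = blocks.filter((b) => b.BlockType === 'LINE').map((b) => b.Text);
  const words = blocks.filter((b) => b.BlockType === 'WORD').map((b) => b.Confidence);
  return summarize('textract', lines, words);
}

// DetectText returns TextDetections whose Type is LINE or WORD.
function fromRekognition(response) {
  const detections = response.TextDetections ?? [];
  const lines = detections.filter((d) => d.Type === 'LINE').map((d) => d.DetectedText);
  const words = detections.filter((d) => d.Type === 'WORD').map((d) => d.Confidence);
  return summarize('rekognition', lines, words);
}
```

Counting only WORD entries for `wordCount` avoids double-counting text that both services also report at LINE level.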
- Dual OCR Processing: Runs AWS Textract and Rekognition on the same image and compares their results
- Confidence Scoring: Returns the result with the highest confidence score
- Base64 Image Input: Accepts images as base64 strings via HTTP POST
- Error Handling: Falls back to the other service's result if one OCR service fails
- CloudWatch Logging: Full request/response logging for debugging
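Confidence scoring and fallback can be expressed as a filter followed by a maximum over the per-service results. This sketch assumes each entry has the `{ success, source, text, confidence, wordCount }` shape from the sample response, with failed services marked `success: false`; `buildResponseData` is a hypothetical name and not the project's `processOCRResults`.

```javascript
// Hypothetical selection step: drop failed services, then keep the highest
// confidence. If one service failed, the other one wins by default.
function buildResponseData(allResults) {
  const successful = allResults.filter((r) => r.success);
  if (successful.length === 0) {
    return { success: false, error: 'All OCR services failed', allResults };
  }
  const winner = successful.reduce((best, r) => (r.confidence > best.confidence ? r : best));
  // The sample response omits the success flag on bestResult.
  const { success: _success, ...bestResult } = winner;
  return { success: true, data: { bestResult, allResults } };
}

// Example: Rekognition failed, so Textract's result is returned.
const example = buildResponseData([
  { success: true, source: 'textract', text: 'Hello World Document', confidence: 0.95, wordCount: 3 },
  { success: false, source: 'rekognition', error: 'ThrottlingException' },
]);
console.log(example.data.bestResult.source); // "textract"
```

On a tie, `reduce` keeps the earlier entry, so the order of the OCR calls decides which service wins.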
- AWS CLI configured
- Terraform installed
- Node.js 18+ installed
```
project/
├── main.tf
├── proxy_service/
│   └── index.js
└── text_recognition_service/
    ├── index.js
    ├── package.json
    └── node_modules/
```
Comparison of Lambda-to-Lambda communication methods:

| Method | Latency | Coupling | Scalability | Cost | Complexity | Use Case |
|---|---|---|---|---|---|---|
| Direct Invoke | Low | High | Medium | Low | Low | Simple request/response |
| SQS | Medium | Low | High | Low | Medium | Decoupled async processing |
| SNS | Medium | Low | High | Low | Medium | Fan-out notifications |
| EventBridge | Medium | Low | High | Medium | Medium | Event-driven architecture |
| Kinesis | Medium | Low | Very High | Medium | High | Stream processing |
| DynamoDB Streams | Medium | Low | High | Low | Medium | Data change reactions |
| S3 Events | Medium | Low | High | Low | Low | File processing |
| API Gateway | Medium | Medium | High | Medium | Medium | HTTP-based communication |
| Step Functions | Medium | Low | High | High | High | Complex workflows |
| Destinations | Low | Medium | Medium | Low | Low | Success/failure routing |
Best Practices by Use Case:
- Simple sync communication: Direct invoke (RequestResponse)
- Async processing: SQS or SNS
- Event-driven architecture: EventBridge
- High-throughput streaming: Kinesis
- Complex workflows: Step Functions
- File processing: S3 Events
- Web APIs: API Gateway
- Database triggers: DynamoDB Streams
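I used direct invoke (`RequestResponse`) for simplicity: the caller Lambda waits for the target Lambda to finish and returns its result in the HTTP response.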
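A direct `RequestResponse` invoke from the caller Lambda could look like the sketch below. To keep it testable without AWS, the invoke call is injected; in a deployed function it would be backed by the AWS SDK v3 Lambda client as shown in the comment. `createHandler` and the 400/502 error bodies are illustrative assumptions, not the project's actual `proxy_service` code. It assumes the HTTP API payload format 2.0 event, where `body` is a string and `isBase64Encoded` marks encoded bodies.

```javascript
// Hypothetical caller Lambda (proxy_service) using a direct RequestResponse invoke.
// Production wiring with AWS SDK v3 (not executed here):
//   const { LambdaClient, InvokeCommand } = require('@aws-sdk/client-lambda');
//   const client = new LambdaClient({});
//   exports.handler = createHandler((params) => client.send(new InvokeCommand(params)));

function createHandler(invoke, functionName = process.env.TARGET_LAMBDA_FUNCTION_NAME) {
  return async function handler(event) {
    // API Gateway may deliver the body base64-encoded.
    const rawBody = event.isBase64Encoded
      ? Buffer.from(event.body ?? '', 'base64').toString('utf8')
      : event.body ?? '';
    let image;
    try {
      ({ image } = JSON.parse(rawBody));
    } catch {
      return { statusCode: 400, body: JSON.stringify({ message: 'Body must be JSON' }) };
    }
    if (typeof image !== 'string' || image.length === 0) {
      return { statusCode: 400, body: JSON.stringify({ message: 'Missing "image" field' }) };
    }

    const response = await invoke({
      FunctionName: functionName,
      InvocationType: 'RequestResponse', // wait for the target Lambda's result
      Payload: Buffer.from(JSON.stringify({ image })),
    });
    // FunctionError is set when the target Lambda threw; Payload then holds the error details.
    const result = JSON.parse(Buffer.from(response.Payload).toString('utf8'));
    if (response.FunctionError) {
      return { statusCode: 502, body: JSON.stringify({ message: 'OCR failed', error: result }) };
    }
    return {
      statusCode: 200,
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ message: 'Image processed successfully', result }),
    };
  };
}
```

Injecting `invoke` keeps the HTTP handling testable with a stub, while the SDK client is created once per container in the real handler.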
- Install dependencies:
```bash
cd text_recognition_service
npm install
cd ..
```
- Deploy infrastructure:
```bash
terraform init
terraform apply
```
- Get API URL:
```bash
terraform output api_url
```

Request:
```
POST https://your-api-url/serverless_lambda_stage/process
Content-Type: application/json

{
  "image": "base64_encoded_image_string"
}
```
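The same request can be sent from Node.js 18+ with the built-in `fetch`. This is a sketch: the endpoint URL and image path are placeholders, `recognizeText` and `buildRequestBody` are hypothetical names, and the `result.data.bestResult` path follows the sample response in this README.

```javascript
// Hypothetical Node.js 18+ client for the /process endpoint.
const fs = require('node:fs/promises');

// Encode raw image bytes as the { "image": "<base64>" } request body.
function buildRequestBody(imageBuffer) {
  return JSON.stringify({ image: imageBuffer.toString('base64') });
}

async function recognizeText(endpoint, imagePath) {
  const imageBuffer = await fs.readFile(imagePath);
  const response = await fetch(endpoint, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: buildRequestBody(imageBuffer),
  });
  if (!response.ok) {
    throw new Error(`Request failed: ${response.status} ${await response.text()}`);
  }
  const payload = await response.json();
  return payload.result.data.bestResult;
}

// Usage (placeholder URL and file):
// recognizeText('https://your-api-url/serverless_lambda_stage/process', './sample.png')
//   .then((best) => console.log(best.source, best.confidence, best.text))
//   .catch(console.error);
```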
Example with curl:
```bash
curl -X POST https://your-api-url/serverless_lambda_stage/process \
  -H "Content-Type: application/json" \
  -d '{"image": "iVBORw0KGgoAAAANSUhEUgAAAZAAAADI..."}'
```

Response:
```json
{
  "message": "Image processed successfully",
  "result": {
    "success": true,
    "data": {
      "bestResult": {
        "source": "textract",
        "text": "Hello World Document",
        "confidence": 0.95,
        "wordCount": 3
      },
      "allResults": [
        {
          "success": true,
          "source": "textract",
          "text": "Hello World Document",
          "confidence": 0.95,
          "wordCount": 3
        },
        {
          "success": true,
          "source": "rekognition",
          "text": "Hello World Document",
          "confidence": 0.87,
          "wordCount": 3
        }
      ],
      "processingInfo": {
        "imageSize": 15420,
        "imageType": "jpeg",
        "timestamp": "2025-01-11T14:30:00.000Z",
        "servicesUsed": ["textract", "rekognition"]
      }
    }
  }
}
```

Environment variables:
- `TARGET_LAMBDA_FUNCTION_NAME`: Auto-configured by Terraform
- `AWS_REGION`: Set to `eu-central-1`
- 2x Lambda functions (caller + target)
- API Gateway v2 (HTTP API)
- S3 bucket for Lambda code storage
- IAM roles and policies
- CloudWatch log groups
IAM permissions:
- `lambda:InvokeFunction`
- `textract:DetectDocumentText`
- `rekognition:DetectText`
- `logs:*`
Log groups:
- Caller Lambda: `/aws/lambda/caller-lambda`
- Target Lambda: `/aws/lambda/target-lambda`
- API Gateway: `/aws/api_gw/serverless_lambda_gw`
```bash
# Get log URLs
terraform output caller_lambda_logs
terraform output target_lambda_logs
```

Approximate costs:
- Textract: ~$1.50 per 1000 pages
- Rekognition: ~$1.00 per 1000 images
- Lambda: Pay per invocation + execution time
- API Gateway: Pay per request
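Because every request is sent to both Textract and Rekognition, each image is billed by both services. A back-of-the-envelope estimate using the approximate prices above (actual prices vary by region and volume tier) is sketched below; `estimateMonthlyOcrCost` is a hypothetical helper, and Lambda and API Gateway charges are left out.

```javascript
// Approximate list prices from this README, in USD per 1000 units.
const PRICE_PER_1000_USD = { textract: 1.5, rekognition: 1.0 };

function estimateMonthlyOcrCost(imagesPerMonth, prices = PRICE_PER_1000_USD) {
  // Each request is one Textract page plus one Rekognition image.
  return (imagesPerMonth / 1000) * (prices.textract + prices.rekognition);
}

console.log(estimateMonthlyOcrCost(10000).toFixed(2)); // "25.00"
```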
- Internal Server Error
  - Check CloudWatch logs for detailed error messages
  - Verify IAM permissions for Textract/Rekognition
- Code Not Updating
  - Ensure `source_code_hash` is set in Terraform
  - Force update: `terraform apply -target=aws_lambda_function.target_lambda`
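- OCR Not Working
  - Verify the image is valid base64
  - Check the image format: Rekognition accepts only PNG and JPEG (Textract also accepts TIFF and PDF), so send PNG or JPEG; GIF is not supported by either service
  - Keep the decoded image under 5 MB, Rekognition's limit for raw image bytes (Textract's synchronous limit is 10 MB); the base64-encoded request, about a third larger than the image, must also fit within Lambda's 6 MB synchronous invocation payload limit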
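The checks above can be run before calling either service. This sketch decodes the base64 string, recognizes PNG and JPEG by their file signatures, and enforces Rekognition's 5 MB (5,242,880 byte) limit; `validateImage`, `MAX_IMAGE_BYTES`, and the returned object shape are hypothetical and not part of the project.

```javascript
// Hypothetical pre-flight validation for the "OCR Not Working" checks.
const MAX_IMAGE_BYTES = 5 * 1024 * 1024; // Rekognition raw-bytes limit

function validateImage(base64) {
  // Buffer.from(..., 'base64') silently skips invalid characters, so check the alphabet first.
  if (typeof base64 !== 'string' || base64.length % 4 !== 0 || !/^[A-Za-z0-9+/]+={0,2}$/.test(base64)) {
    return { valid: false, reason: 'not valid base64' };
  }
  const bytes = Buffer.from(base64, 'base64');
  if (bytes.length > MAX_IMAGE_BYTES) {
    return { valid: false, reason: `image is ${bytes.length} bytes, limit is ${MAX_IMAGE_BYTES}` };
  }
  // PNG signature: 89 50 4E 47 0D 0A 1A 0A; JPEG starts with FF D8 FF.
  const pngSignature = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);
  const isPng = bytes.length >= 8 && bytes.subarray(0, 8).equals(pngSignature);
  const isJpeg = bytes.length >= 3 && bytes[0] === 0xff && bytes[1] === 0xd8 && bytes[2] === 0xff;
  if (!isPng && !isJpeg) {
    return { valid: false, reason: 'unsupported format (use PNG or JPEG)' };
  }
  return { valid: true, type: isPng ? 'png' : 'jpeg', size: bytes.length };
}
```

A PNG encoded this way always starts with `iVBORw0KGgo`, which is why the curl example's payload begins with that prefix.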
```bash
# Check Terraform plan
terraform plan

# View current Lambda code
aws lambda get-function --function-name target-lambda

# Test Lambda directly (AWS CLI v2 needs --cli-binary-format to accept a raw JSON payload)
aws lambda invoke --function-name target-lambda \
  --cli-binary-format raw-in-base64-out \
  --payload '{"image":"base64string"}' response.json
```

Test individual components:

```bash
# Test image generation
# Use the provided HTML tool to generate base64 test images

# Test Lambda locally (with AWS SAM)
sam local invoke target-lambda -e test-event.json
```

Adding a new OCR service:
- Add a new OCR function in `text_recognition_service/index.js`
- Update the `Promise.allSettled` array
- Add the required IAM permissions
- Update the `processOCRResults` function
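One way to make the "update the `Promise.allSettled` array" step a one-line change is to keep providers in an array, as sketched below. The provider objects, `runAllProviders`, and `exampleProvider` are hypothetical; the existing Textract and Rekognition functions would be wrapped the same way, and a real new provider still needs its own IAM permissions.

```javascript
// Hypothetical provider registry feeding Promise.allSettled.
const exampleProvider = {
  name: 'example',
  // Placeholder: a real provider would call its OCR service and normalize the response.
  detect: async () => ({ text: 'placeholder', confidence: 0.5, wordCount: 1 }),
};

async function runAllProviders(providers, imageBytes) {
  // Promise.resolve().then(...) also turns a synchronous throw into a rejection.
  const settled = await Promise.allSettled(
    providers.map((p) => Promise.resolve().then(() => p.detect(imageBytes)))
  );
  // Pair each outcome with its provider so failures still report their source.
  const allResults = settled.map((outcome, i) =>
    outcome.status === 'fulfilled'
      ? { success: true, source: providers[i].name, ...outcome.value }
      : { success: false, source: providers[i].name, error: outcome.reason?.message ?? String(outcome.reason) }
  );
  return { allResults, servicesUsed: providers.map((p) => p.name) };
}
```

The resulting `allResults` array has the same shape as in the sample response, so the existing selection logic can consume it unchanged.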
MIT License