
Shiva1803/FinsightAI


🚀 FinsightAI - Intelligent Invoice & Purchase Order Verification System


AI-powered document verification system that detects fraud, identifies discrepancies, and automates invoice-PO matching with advanced anomaly detection.


🎯 Overview

FinsightAI is a full-stack application for verifying financial documents. It combines OCR with AI-based data extraction and anomaly detection to automatically compare Purchase Orders (POs) with Invoices, detect anomalies, flag potential fraud, and present actionable insights in an interactive dashboard.

Why FinsightAI?

  • 🔍 Automated Verification: Eliminate manual document comparison
  • 🛡️ Fraud Detection: AI-powered anomaly detection with risk scoring
  • 📊 Visual Analytics: Interactive charts and detailed breakdowns
  • ⚡ Fast Processing: OCR-powered text extraction from PDFs and images
  • 💾 Data Management: Store, retrieve, and export verification records
  • 🎨 Modern UI: Beautiful, responsive interface with dark mode support

✨ Features

Core Functionality

🔐 Document Verification

  • Smart Comparison: Automatically compare PO and Invoice documents
  • Field Matching: Verify vendor names, amounts, quantities, prices, and dates
  • Discrepancy Detection: Identify mismatches with severity levels (Low, Medium, High)
  • Risk Scoring: Calculate risk scores (0-10) based on anomaly patterns
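
To make the comparison concrete, here is a minimal Python sketch of vendor and total matching. It is not FinsightAI's implementation: the names `compare_fields` and `normalize_vendor`, the suffix list, the 1% tolerance, and the rule that maps the number of mismatches to a Low/Medium/High level are all hypothetical.

```python
import re
from dataclasses import dataclass

# Hypothetical list of legal suffixes ignored when comparing vendor names
LEGAL_SUFFIXES = {"ltd", "limited", "inc", "llc", "pvt", "co"}


@dataclass
class FieldComparison:
    vendor_match: bool
    total_match: bool
    amount_difference: float
    discrepancy_level: str  # "low", "medium" or "high"


def normalize_vendor(name: str) -> str:
    """Lower-case, strip punctuation and drop legal suffixes."""
    cleaned = re.sub(r"[^a-z0-9 ]", "", name.lower())
    return " ".join(w for w in cleaned.split() if w not in LEGAL_SUFFIXES)


def compare_fields(po: dict, invoice: dict, tolerance: float = 0.01) -> FieldComparison:
    """Compare vendor name and total amount of a PO and an invoice."""
    vendor_match = normalize_vendor(po["vendor_name"]) == normalize_vendor(invoice["vendor_name"])
    difference = invoice["total_amount"] - po["total_amount"]
    # Totals match when they differ by at most `tolerance` (1% by default) of the PO total
    total_match = abs(difference) <= tolerance * po["total_amount"]
    mismatches = int(not vendor_match) + int(not total_match)
    level = ("low", "medium", "high")[mismatches]
    return FieldComparison(vendor_match, total_match, difference, level)
```

A tolerance on totals avoids flagging rounding differences, while normalizing vendor names keeps "AlphaTech Supplies Ltd." and "Alphatech Supplies" from being reported as a vendor mismatch.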

🤖 AI-Powered Data Extraction

  • Google Gemini AI: Advanced AI-powered document data extraction
  • Automatic Fallback: Falls back to local extraction if AI fails
  • Structured Output: Extracts vendor, amounts, dates, line items, tax details
  • Multi-Provider Support: Gemini, Shivaay, OpenAI, and local extraction
  • Real-time Status: Monitor AI provider status and configuration
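
The fallback behaviour can be sketched as an ordered chain of providers. This is an illustrative pattern, not the code in `app/ai_extractor.py`: the `extract_with_fallback` function and the idea of passing providers as `(name, callable)` pairs are assumptions. The returned dictionary mirrors the `success`/`provider`/`data` keys of the `/extract/ai/` response.

```python
from typing import Callable, Optional

# An extractor takes OCR text and returns structured fields, raising on failure
Extractor = Callable[[str], dict]


def extract_with_fallback(text: str, providers: list[tuple[str, Extractor]]) -> dict:
    """Try each (name, extractor) pair in order and return the first success."""
    last_error: Optional[Exception] = None
    for name, extractor in providers:
        try:
            data = extractor(text)
        except Exception as exc:  # a production client would catch narrower errors
            last_error = exc
            continue
        return {"success": True, "provider": name, "data": data}
    raise RuntimeError("all extraction providers failed") from last_error
```

For example, passing `[("gemini", gemini_extract), ("local", local_extract)]` (both hypothetical callables) returns the Gemini result when it succeeds and the local result otherwise.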

🛡️ Anomaly Detection

  • Overbilling Detection: Identify when invoice amounts exceed PO amounts
  • Price Manipulation: Detect altered unit prices
  • Vendor Fraud: Flag vendor name mismatches
  • Quantity Discrepancies: Spot quantity differences in line items
  • Tax Anomalies: Identify tax calculation errors
  • Historical Patterns: Track vendor anomaly history
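
Several of these checks reduce to comparing matched line items. The sketch below assumes line items are dictionaries with `description`, `quantity` and `unit_price` keys and that items are matched by case-insensitive description; both the schema and the function names are hypothetical, and the real Verification Agent may work differently.

```python
def _key(item: dict) -> str:
    return item["description"].strip().lower()


def detect_line_item_anomalies(po_items: list[dict], inv_items: list[dict]) -> list[str]:
    """Return anomaly labels for invoice line items compared against the PO."""
    anomalies: list[str] = []
    po_by_desc = {_key(item): item for item in po_items}
    for item in inv_items:
        po_item = po_by_desc.get(_key(item))
        if po_item is None:
            anomalies.append(f"unexpected_item:{item['description']}")
            continue
        if item["quantity"] != po_item["quantity"]:
            anomalies.append(f"quantity_mismatch:{item['description']}")
        if item["unit_price"] > po_item["unit_price"]:
            anomalies.append(f"price_increase:{item['description']}")
    # Overbilling: the invoice's computed total exceeds the PO's computed total
    po_total = sum(i["quantity"] * i["unit_price"] for i in po_items)
    inv_total = sum(i["quantity"] * i["unit_price"] for i in inv_items)
    if inv_total > po_total:
        anomalies.append("overbilling")
    return anomalies


def tax_is_consistent(subtotal: float, tax_rate: float, tax_amount: float, tol: float = 0.01) -> bool:
    """True when the stated tax equals subtotal * rate within an absolute tolerance."""
    return abs(subtotal * tax_rate - tax_amount) <= tol
```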

📄 OCR & Document Processing

  • Multi-Format Support: Process PDF, PNG, JPG, JPEG files
  • Tesseract OCR: Extract text from scanned documents
  • Smart Parsing: Automatically extract vendor, amounts, dates, line items
  • Confidence Scoring: Assess extraction quality
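
As an illustration of rule-based parsing with a simple confidence score, the sketch below extracts a document number and total from OCR text using regular expressions. The patterns (including the `PO-`/`INV-` prefixes seen in the API examples) and the "share of fields found" confidence measure are assumptions, not the project's actual extractor.

```python
import re
from typing import Optional

# Hypothetical patterns; real documents need broader rules
DOC_NUMBER_RE = re.compile(r"\b((?:PO|INV)-\d+)\b", re.IGNORECASE)
# \b stops "Subtotal" from matching; commas allow both 613,010 and 6,13,010
TOTAL_RE = re.compile(
    r"\btotal\s*(?:amount)?\s*[:\-]?\s*(?:rs\.?|inr|\$|₹)?\s*(\d[\d,]*(?:\.\d{1,2})?)",
    re.IGNORECASE,
)


def parse_ocr_text(text: str) -> dict:
    """Extract a document number and total amount from raw OCR text."""
    doc = DOC_NUMBER_RE.search(text)
    total = TOTAL_RE.search(text)
    amount: Optional[float] = float(total.group(1).replace(",", "")) if total else None
    found = [doc is not None, amount is not None]
    return {
        "document_number": doc.group(1).upper() if doc else None,
        "total_amount": amount,
        # Naive confidence: fraction of the target fields that were found
        "confidence": sum(found) / len(found),
    }
```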

📧 Email Watcher

  • Automatic Monitoring: Monitor email inbox for new invoices and POs
  • IMAP Support: Connect to Gmail, Outlook, and other IMAP servers
  • Real-time Processing: Automatically extract and save document data
  • Live Statistics: Track emails processed, last check time, and errors
  • Easy Configuration: Simple web interface for setup
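
Once a message has been fetched over IMAP, its PDF and image attachments can be pulled out with Python's standard `email` package. This is a hedged sketch: `extract_attachments` is a hypothetical helper, and the extension filter simply mirrors the formats listed under OCR & Document Processing.

```python
import email
from email import policy
from email.message import EmailMessage

# Same formats the OCR pipeline accepts
ALLOWED_EXTENSIONS = (".pdf", ".png", ".jpg", ".jpeg")


def extract_attachments(raw_message: bytes) -> list[tuple[str, bytes]]:
    """Return (filename, content) pairs for PDF/image attachments of a raw email."""
    # policy.default makes the parser return an EmailMessage with iter_attachments()
    msg: EmailMessage = email.message_from_bytes(raw_message, policy=policy.default)
    found = []
    # Only top-level attachments are inspected; nested (forwarded) messages are skipped
    for part in msg.iter_attachments():
        name = part.get_filename()
        if name and name.lower().endswith(ALLOWED_EXTENSIONS):
            found.append((name, part.get_content()))
    return found
```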

📊 Visual Analytics

  • Bar Charts: Compare PO vs Invoice totals
  • Pie Charts: Discrepancy breakdown by category
  • Line Item Tables: Detailed item-by-item comparison
  • Timeline Views: Track document lifecycle
  • Risk Indicators: Visual risk score displays

💼 Data Management

  • Record Storage: SQLite database for all verifications
  • CSV Export: Export records for external analysis
  • Search & Filter: Find specific verifications quickly
  • Bulk Operations: Process multiple documents

User Interface

  • 🎨 Modern Design: Clean, professional interface with Tailwind CSS
  • 🌓 Dark Mode: Full dark mode support with theme toggle
  • 📱 Responsive: Works on desktop, tablet, and mobile
  • 🎭 Animations: Smooth transitions with Framer Motion
  • 🎯 Intuitive Navigation: Easy-to-use multi-page layout
  • 🎨 Consistent Theme: Amber/yellow accent colors throughout

🎬 Demo

Dashboard Overview

View key metrics, recent verifications, and system health at a glance.

Document Upload

  • Drag-and-drop interface for uploading documents
  • AI-powered extraction with Google Gemini
  • Automatic fallback to local extraction
  • Real-time processing status

Verification Results

Comprehensive verification report with:

  • Risk score and discrepancy level
  • Vendor, amount, and quantity matching
  • Anomaly insights and fraud indicators
  • Interactive charts and visualizations
  • Detailed line item comparison

Email Watcher

  • Configure IMAP email monitoring
  • Automatic document processing from email attachments
  • Live status dashboard with statistics
  • Start/stop controls with one click

Records Management

Browse all verification records, view details, and export to CSV.


🏗️ Architecture

┌─────────────────────────────────────────────────────────────┐
│                      Frontend (React)                       │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐     │
│  │Dashboard │  │  Upload  │  │  Verify  │  │ Records  │     │
│  └──────────┘  └──────────┘  └──────────┘  └──────────┘     │
│                         │                                   │
│                    Axios API Client                         │
└─────────────────────────┬───────────────────────────────────┘
                          │ HTTP/REST
┌─────────────────────────▼───────────────────────────────────┐
│                      Backend (FastAPI)                      │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐       │
│  │ OCR Engine   │  │  Extractor   │  │ Verification │       │
│  │ (Tesseract)  │  │    Parser    │  │    Agent     │       │
│  └──────────────┘  └──────────────┘  └──────────────┘       │
│                         │                                   │
│                    ┌────▼─────┐                             │
│                    │ Database │                             │
│                    │ (SQLite) │                             │
│                    └──────────┘                             │
└─────────────────────────────────────────────────────────────┘

Data Flow

  1. Upload: User uploads PO and Invoice files
  2. OCR: Tesseract extracts text from documents
  3. Parse: Extractor parses structured data (vendor, amounts, items)
  4. Verify: Verification Agent compares documents and detects anomalies
  5. Analyze: Risk scoring and fraud pattern detection
  6. Visualize: Generate charts and comparison tables
  7. Store: Save results to database
  8. Display: Present results in interactive dashboard

🛠️ Tech Stack

Frontend

  • React 18 - UI library
  • TypeScript - Type-safe JavaScript
  • Vite - Fast build tool
  • Tailwind CSS - Utility-first CSS framework
  • Framer Motion - Animation library
  • Recharts - Chart library
  • React Router - Client-side routing
  • Axios - HTTP client
  • Lucide React - Icon library

Backend

  • FastAPI - Modern Python web framework
  • Python 3.11+ - Programming language
  • Google Gemini AI - Advanced AI for document extraction
  • Tesseract OCR - Text extraction engine
  • SQLModel - SQL database ORM
  • SQLite - Embedded database
  • IMAPClient - Email monitoring
  • Pillow - Image processing
  • PyPDF2 - PDF manipulation
  • Pandas - Data analysis
  • Uvicorn - ASGI server

Development Tools

  • ESLint - JavaScript linter
  • TypeScript Compiler - Type checking
  • Autoprefixer - CSS vendor prefixes
  • PostCSS - CSS transformation

📦 Installation

Prerequisites

Before you begin, ensure you have Python 3.11+, Node.js with npm, and Git installed, along with the system dependencies below (Poppler and Tesseract OCR).

Install System Dependencies

macOS:

brew install poppler tesseract

Ubuntu/Linux:

sudo apt-get update
sudo apt-get install poppler-utils tesseract-ocr

Windows:

Configure AI Provider (Optional)

Create a .env file in the root directory:

# Google Gemini AI (Recommended)
GEMINI_API_KEY=your_gemini_api_key_here
AI_PROVIDER=gemini

# Get your free API key from: https://aistudio.google.com/app/apikey

The system will automatically fall back to local extraction if no API key is provided.
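
One way to express that fallback rule is shown below. It assumes the `.env` values have already been loaded into the process environment (for example with python-dotenv's `load_dotenv()`, if the project uses it) and that `local` is the default when `AI_PROVIDER` is unset; `resolve_provider` is a hypothetical name.

```python
import os
from typing import Mapping


def resolve_provider(env: Mapping[str, str] = os.environ) -> str:
    """Return the configured AI provider, falling back to "local" without a Gemini key."""
    provider = env.get("AI_PROVIDER", "local").strip().lower()
    if provider == "gemini" and not env.get("GEMINI_API_KEY"):
        return "local"  # no API key: use local extraction instead
    return provider
```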

Clone Repository

git clone https://github.com/Shiva1803/FinsightAI.git
cd FinsightAI

Backend Setup

  1. Create Virtual Environment
python -m venv venv
  2. Activate Virtual Environment

macOS/Linux:

source venv/bin/activate

Windows:

venv\Scripts\activate
  3. Install Dependencies
pip install -r requirements.txt
  4. Verify Installation (this calls the tesseract binary, so it fails if Tesseract is not on your PATH)
python -c "import pytesseract; print(pytesseract.get_tesseract_version())"

Frontend Setup

  1. Navigate to Frontend
cd frontend
  2. Install Dependencies
npm install
  3. Return to Root
cd ..

🚀 Usage

Quick Start (Recommended)

Use the provided startup script to launch both frontend and backend:

./start_dev.sh

This will:

  • Start the backend server on http://localhost:8000
  • Start the frontend dev server on http://localhost:5173
  • Open your browser automatically

Manual Start

Start Backend

source venv/bin/activate  # On Windows: venv\Scripts\activate
uvicorn app.main:app --reload --port 8000

Start Frontend (in another terminal)

cd frontend
npm run dev

Access Application

Open the frontend at http://localhost:5173. The backend API is served at http://localhost:8000.


📚 API Documentation

Base URL

http://localhost:8000

Endpoints

1. Upload Document

POST /upload/

Upload and extract data from a single document.

Request:

  • Content-Type: multipart/form-data
  • Body: file (PDF/Image)

Response:

{
  "id": 1,
  "parsed": {
    "vendor_name": "AlphaTech Supplies Ltd.",
    "document_number": "PO-2045",
    "total_amount": 613010,
    "line_items": [...]
  }
}
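
A client can call this endpoint with nothing but the Python standard library. The sketch below hand-encodes `multipart/form-data`; the helper names and the `BASE_URL` constant are illustrative. Because `/verify/` takes the same encoding, the same `post_files` helper covers both endpoints.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path

BASE_URL = "http://localhost:8000"  # assumed local backend


def encode_multipart(files: dict[str, tuple[str, bytes]]) -> tuple[bytes, str]:
    """Encode {field: (filename, content)} as multipart/form-data."""
    boundary = uuid.uuid4().hex
    chunks: list[bytes] = []
    for field, (filename, content) in files.items():
        mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
            f"Content-Type: {mime}\r\n\r\n"
        )
        chunks += [header.encode(), content, b"\r\n"]
    chunks.append(f"--{boundary}--\r\n".encode())
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


def post_files(endpoint: str, paths: dict[str, str]) -> dict:
    """POST local files to an endpoint such as /upload/ or /verify/ and decode the JSON reply."""
    files = {field: (Path(p).name, Path(p).read_bytes()) for field, p in paths.items()}
    body, content_type = encode_multipart(files)
    req = urllib.request.Request(
        BASE_URL + endpoint, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

For example, `post_files("/upload/", {"file": "po.pdf"})` uploads one document, and `post_files("/verify/", {"po_file": "po.pdf", "invoice_file": "invoice.pdf"})` runs a verification, assuming the backend is running locally and the files exist. With the `requests` library installed, `requests.post(url, files=...)` would replace the manual encoding.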

2. Verify Documents

POST /verify/

Compare PO and Invoice for discrepancies.

Request:

  • Content-Type: multipart/form-data
  • Body:
    • po_file (PDF/Image)
    • invoice_file (PDF/Image)

Response:

{
  "verification_summary": {
    "vendor_match": false,
    "total_match": true,
    "amount_difference": 0,
    "discrepancy_level": "high",
    "risk_score": 6.5,
    "needs_review": true
  },
  "anomaly_insights": {
    "anomaly_detected": true,
    "anomaly_types": ["vendor_mismatch"],
    "risk_score": 6.5,
    "fraud_indicators": [...]
  },
  "visualization_data": {
    "bar_chart": {...},
    "pie_chart": {...},
    "line_items_comparison": [...]
  }
}
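
A caller typically turns this response into a routing decision. The sketch below combines the documented `needs_review` flag with the listed anomaly types, a total mismatch, and a risk-score threshold; the threshold of 5.0, the reason labels and the `review_decision` name are hypothetical choices, not values defined by FinsightAI.

```python
def review_decision(result: dict, risk_threshold: float = 5.0) -> tuple[bool, list[str]]:
    """Return (needs_review, reasons) for a /verify/ response."""
    summary = result.get("verification_summary", {})
    insights = result.get("anomaly_insights", {})
    reasons = list(insights.get("anomaly_types", []))
    if not summary.get("total_match", True):
        reasons.append(f"total_mismatch:{summary.get('amount_difference')}")
    if summary.get("risk_score", 0) >= risk_threshold:
        reasons.append("high_risk_score")
    # Respect the server's own flag, but also escalate on any local reason
    needs_review = bool(summary.get("needs_review")) or bool(reasons)
    return needs_review, reasons
```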

3. List Records

GET /records/

Retrieve all verification records.

Response:

[
  {
    "id": 1,
    "source_file": "verification_PO-2045_INV-4719",
    "parsed": {...}
  }
]

4. Delete Record

DELETE /records/{record_id}

Delete a specific record.

Response:

{
  "message": "Record deleted successfully"
}
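
The record endpoints can be wrapped in small helpers, sketched below with `urllib`. The `opener` parameter is a hypothetical hook that lets tests substitute a fake transport, and the client-side vendor filter assumes each record's `parsed` object contains a `vendor_name` key, as in the upload example.

```python
import json
import urllib.request
from typing import Callable

BASE_URL = "http://localhost:8000"  # assumed local backend


def find_records(vendor: str, opener: Callable = urllib.request.urlopen) -> list[dict]:
    """GET /records/ and keep records whose parsed vendor_name contains `vendor`."""
    with opener(f"{BASE_URL}/records/") as resp:
        records = json.load(resp)
    needle = vendor.lower()
    return [
        r for r in records
        if needle in ((r.get("parsed") or {}).get("vendor_name") or "").lower()
    ]


def delete_record(record_id: int, opener: Callable = urllib.request.urlopen) -> dict:
    """DELETE /records/{record_id} and return the JSON confirmation."""
    req = urllib.request.Request(f"{BASE_URL}/records/{record_id}", method="DELETE")
    with opener(req) as resp:
        return json.load(resp)
```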

5. Export CSV

GET /export/

Export all records to CSV file.

Response: CSV file download

6. AI Extraction

POST /extract/ai/

Extract data using Google Gemini AI with automatic fallback.

Request:

  • Content-Type: multipart/form-data
  • Body: file (PDF/Image) or ocr_text (string)

Response:

{
  "success": true,
  "provider": "gemini",
  "record_id": 1,
  "data": {
    "vendor_name": "AlphaTech",
    "document_number": "INV-4719",
    "total_amount": 613010,
    ...
  }
}

7. AI Provider Status

GET /ai/status/

Get current AI provider configuration and status.

Response:

{
  "current_provider": "gemini",
  "available_providers": ["gemini", "local"],
  "gemini_configured": true,
  "model": "gemini-2.5-flash"
}

8. Email Watcher - Start

POST /email/watch/start/

Start monitoring email inbox for documents.

Request:

  • Content-Type: application/x-www-form-urlencoded
  • Body:
    • imap_host: IMAP server (e.g., imap.gmail.com)
    • imap_user: Email address
    • imap_pass: App password
    • check_interval: Check interval in seconds (default: 60)

Response:

{
  "status": "started"
}
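
Because this endpoint expects form fields rather than JSON, the request body must be URL-encoded. A minimal sketch using the field names documented above; `build_watch_start_request` and `BASE_URL` are illustrative. Use an app password rather than your account password.

```python
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8000"  # assumed local backend


def build_watch_start_request(host: str, user: str, app_password: str,
                              interval: int = 60) -> urllib.request.Request:
    """Build the form-encoded POST for /email/watch/start/."""
    form = urllib.parse.urlencode({
        "imap_host": host,
        "imap_user": user,
        "imap_pass": app_password,
        "check_interval": interval,
    }).encode()
    return urllib.request.Request(
        f"{BASE_URL}/email/watch/start/",
        data=form,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

Sending it with `urllib.request.urlopen(req)` should return the documented `{"status": "started"}` body when the backend is running.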

9. Email Watcher - Status

GET /email/watch/status/

Get email watcher status and statistics.

Response:

{
  "running": true,
  "config": {
    "imap_host": "imap.gmail.com",
    "imap_user": "user@gmail.com",
    "check_interval": 60
  },
  "stats": {
    "status": "running",
    "emails_processed": 5,
    "last_check": "2025-10-31 14:30:00",
    "last_error": null
  }
}
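
The status payload is enough to spot a watcher that has stopped polling. This sketch assumes `last_check` is a local timestamp in the `YYYY-MM-DD HH:MM:SS` format shown above and treats a watcher as stale when more than two check intervals have passed; the function name and the grace factor are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional


def watcher_is_stale(status: dict, now: Optional[datetime] = None, grace: int = 2) -> bool:
    """True if a running watcher has not checked mail for more than `grace` intervals."""
    if not status.get("running"):
        return False
    last = status.get("stats", {}).get("last_check")
    if last is None:
        return False  # no check recorded yet, e.g. right after starting
    interval = status.get("config", {}).get("check_interval", 60)
    last_check = datetime.strptime(last, "%Y-%m-%d %H:%M:%S")  # assumed local time
    now = now or datetime.now()
    return now - last_check > timedelta(seconds=grace * interval)
```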

10. Email Watcher - Stop

POST /email/watch/stop/

Stop the email watcher.

Response:

{
  "status": "stopped"
}

11. Health Check

GET /health/

Check API health status.

Response:

{
  "status": "ok"
}
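
The health endpoint is handy for waiting until the backend has started, for example in a script that launches the server and then runs tests. A sketch with a hypothetical `wait_until_healthy` helper, a configurable timeout, and an assumed `opener` hook for substituting the transport:

```python
import json
import time
import urllib.error
import urllib.request
from typing import Callable

BASE_URL = "http://localhost:8000"  # assumed local backend


def wait_until_healthy(timeout: float = 30.0, delay: float = 1.0,
                       opener: Callable = urllib.request.urlopen) -> bool:
    """Poll GET /health/ until it reports {"status": "ok"} or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with opener(f"{BASE_URL}/health/") as resp:
                if json.load(resp).get("status") == "ok":
                    return True
        except (urllib.error.URLError, ConnectionError):
            pass  # backend not accepting connections yet
        if time.monotonic() + delay > deadline:
            return False
        time.sleep(delay)
```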

Interactive API Documentation

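FastAPI provides automatic interactive API documentation. With FastAPI's default settings it is served at http://localhost:8000/docs (Swagger UI) and http://localhost:8000/redoc (ReDoc).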


📁 Project Structure

FinsightAI/
├── app/                          # Backend application
│   ├── __init__.py
│   ├── main.py                   # FastAPI app & routes
│   ├── ocr.py                    # OCR processing
│   ├── extractor.py              # Document parsing
│   ├── verification_agent.py     # Verification logic
│   ├── db.py                     # Database operations
│   ├── utils.py                  # Utility functions
│   ├── email_watcher.py          # Email monitoring
│   ├── gemini_client.py          # Google Gemini AI client
│   ├── ai_extractor.py           # AI extraction orchestrator
│   └── shivaay_client.py         # External API client
│
├── frontend/                     # Frontend application
│   ├── public/                   # Static assets
│   ├── src/
│   │   ├── components/           # React components
│   │   │   ├── FileUpload.tsx
│   │   │   ├── Layout.tsx
│   │   │   ├── Preloader.tsx
│   │   │   ├── SplashScreen.tsx
│   │   │   └── ThemeToggle.tsx
│   │   ├── hooks/                # Custom React hooks
│   │   │   └── useDistortionEffect.ts
│   │   ├── pages/                # Page components
│   │   │   ├── Dashboard.tsx
│   │   │   ├── Home.tsx
│   │   │   ├── Records.tsx
│   │   │   ├── UploadDocument.tsx
│   │   │   ├── VerifyDocuments.tsx
│   │   │   └── EmailWatcher.tsx
│   │   ├── services/             # API services
│   │   │   └── api.ts
│   │   ├── types/                # TypeScript types
│   │   │   └── index.ts
│   │   ├── App.tsx               # Main app component
│   │   ├── main.tsx              # Entry point
│   │   └── index.css             # Global styles
│   ├── package.json
│   ├── tsconfig.json
│   ├── vite.config.ts
│   └── tailwind.config.js
│
├── uploads/                      # Uploaded files storage
├── venv/                         # Python virtual environment
├── tests/                        # Test files
│   ├── test_api_verify.py
│   ├── test_verification_agent.py
│   └── ...
│
├── futurix.db                    # SQLite database
├── requirements.txt              # Python dependencies
├── start_dev.sh                  # Startup script
├── restart_backend.sh            # Backend restart script
├── README.md                     # This file
├── SETUP.md                      # Detailed setup guide
└── VERIFICATION_500_FIX.md       # Troubleshooting guide

⚙️ Configuration

Environment Variables

Create a .env file in the root directory:

# ============================================
# AI Provider Configuration
# ============================================

# Google Gemini AI (Recommended)
GEMINI_API_KEY=your_gemini_api_key_here
AI_PROVIDER=gemini

# Get your free API key from: https://aistudio.google.com/app/apikey

# ============================================
# Optional: External API Configuration
# ============================================
SHIVAAY_API_KEY=your-api-key-here
SHIVAAY_API_URL=https://api.shivaay.com/extract

OPENAI_API_KEY=your-openai-key-here
OPENAI_MODEL=gpt-4

# ============================================
# Email Watcher (Optional)
# ============================================
# Configure via web interface at /email-watcher
# Or set these for automatic startup:
# EMAIL_WATCHER_HOST=imap.gmail.com
# EMAIL_WATCHER_USER=your-email@gmail.com
# EMAIL_WATCHER_PASS=your-app-password
# EMAIL_WATCHER_INTERVAL=60

# ============================================
# Database Configuration
# ============================================
DATABASE_URL=sqlite:///./futurix.db

# ============================================
# CORS Origins
# ============================================
CORS_ORIGINS=http://localhost:3000,http://localhost:5173

# ============================================
# Application Settings
# ============================================
DEBUG=True
LOG_LEVEL=INFO
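
Application code can read these variables with a small settings loader. This sketch uses the example values above as defaults; the `Settings` class and `load_settings` function are hypothetical, and this README does not state whether `app/main.py` actually reads `CORS_ORIGINS` (the CORS snippet under Backend Configuration hard-codes the same origins).

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass
class Settings:
    database_url: str
    cors_origins: list[str]
    debug: bool
    log_level: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the .env variables, defaulting to the example values."""
    origins = env.get("CORS_ORIGINS", "http://localhost:3000,http://localhost:5173")
    return Settings(
        database_url=env.get("DATABASE_URL", "sqlite:///./futurix.db"),
        # Comma-separated list; surrounding whitespace and empty entries are dropped
        cors_origins=[o.strip() for o in origins.split(",") if o.strip()],
        debug=env.get("DEBUG", "False").strip().lower() in {"1", "true", "yes"},
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
    )
```

If adopted, `allow_origins=load_settings().cors_origins` would keep the CORS middleware in sync with the `.env` file.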

Frontend Configuration

Edit frontend/vite.config.ts to configure the development server:

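import { defineConfig } from 'vite'

// Only the dev-server settings are shown; other options (e.g. plugins) are omitted
export default defineConfig({
  server: {
    port: 5173,
    proxy: {
      // Requests to /api/* are forwarded to the FastAPI backend with the /api prefix stripped
      '/api': {
        target: 'http://localhost:8000',
        changeOrigin: true,
        rewrite: (path) => path.replace(/^\/api/, '')
      }
    }
  }
})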

Backend Configuration

Edit app/main.py to configure CORS and other settings:

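from fastapi.middleware.cors import CORSMiddleware

# `app` is the FastAPI instance created in app/main.py
app.add_middleware(
    CORSMiddleware,
    allow_origins=[
        "http://localhost:3000",
        "http://localhost:5173"
    ],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)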

🧪 Testing

Run All Tests

source venv/bin/activate
python -m pytest

Run Specific Tests

Verification Tests:

python -m pytest tests/test_api_verify.py

OCR Tests:

python test_pdf_extraction.py

Verification Agent Tests:

python -m pytest tests/test_verification_agent.py

Test Coverage

The project includes comprehensive tests for:

  • ✅ OCR text extraction
  • ✅ Document parsing
  • ✅ AI extraction (Gemini)
  • ✅ Verification logic
  • ✅ Anomaly detection
  • ✅ Email watcher
  • ✅ Database operations
  • ✅ API endpoints

🚢 Deployment

Backend Deployment

Using Docker (Recommended)

Create Dockerfile:

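FROM python:3.11-slim

# OCR (tesseract) and PDF rendering (poppler) system packages, matching the local setup
RUN apt-get update \
    && apt-get install -y --no-install-recommends tesseract-ocr poppler-utils \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]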

Build and run:

docker build -t finsightai-backend .
docker run -p 8000:8000 finsightai-backend

Using Heroku

heroku create finsightai-backend
git push heroku main

Frontend Deployment

Build for Production

cd frontend
npm run build

Deploy to Vercel

npm install -g vercel
vercel --prod

Deploy to Netlify

npm install -g netlify-cli
netlify deploy --prod --dir=dist

🔧 Troubleshooting

Common Issues

1. Tesseract Not Found

Error: TesseractNotFoundError

Solution:

# macOS
brew install tesseract

# Ubuntu
sudo apt-get install tesseract-ocr

# Verify installation
tesseract --version

2. Port Already in Use

Error: Address already in use

Solution:

# Find and kill process on port 8000
lsof -ti:8000 | xargs kill -9

# Or use different port
uvicorn app.main:app --port 8001

3. Module Not Found

Error: ModuleNotFoundError: No module named 'fastapi'

Solution:

source venv/bin/activate
pip install -r requirements.txt

4. CORS Errors

Error: Access-Control-Allow-Origin

Solution: Make sure the CORS allow_origins list in app/main.py includes the URL your frontend is served from (http://localhost:5173 for the Vite dev server).

5. 500 Internal Server Error

Solution: Check backend logs for detailed error messages. See VERIFICATION_500_FIX.md for specific verification errors.

Getting Help

  • 📖 Check SETUP.md for detailed setup instructions
  • 🐛 Check Issues for known problems
  • 💬 Open a new issue if you need help

🤝 Contributing

Contributions are welcome! Here's how you can help:

How to Contribute

  1. Fork the Repository on GitHub, then clone your fork (replace <your-username>)
git clone https://github.com/<your-username>/FinsightAI.git
  2. Create a Branch
git checkout -b feature/amazing-feature
  3. Make Changes
  • Write clean, documented code
  • Follow existing code style
  • Add tests for new features
  4. Commit Changes
git commit -m "Add amazing feature"
  5. Push to Branch
git push origin feature/amazing-feature
  6. Open Pull Request
  • Describe your changes
  • Reference any related issues

Development Guidelines

  • Code Style: Follow PEP 8 for Python, ESLint rules for TypeScript
  • Documentation: Update README and inline comments
  • Testing: Add tests for new features
  • Commits: Use clear, descriptive commit messages

Areas for Contribution

  • 🐛 Bug fixes
  • ✨ New features
  • 📝 Documentation improvements
  • 🎨 UI/UX enhancements
  • 🧪 Additional tests
  • 🌐 Internationalization

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


👤 Contact

Shresth Panigrahi



Made with ❤️ by Team QuantCoders

⭐ Star this repo if you find it helpful!

About

FinSight AI is a next-generation system that helps SMEs and firms check invoices and POs for discrepancies.
