💹 FinBridge

AI-Powered Financial Document Interpreter

From Wall Street jargon to Main Street clarity — instantly.

Built by Kupakwashe T. Mapuranga
B.Tech Artificial Intelligence & Machine Learning
Symbiosis Institute of Technology, Pune · CA3 FinTech Project · 2026


📖 What is FinBridge?

Most people cannot understand financial documents. Annual reports run 200+ pages. Earnings calls are full of jargon. Fund fact sheets use terms that take years to learn. This creates a massive gap — people cannot make informed investment decisions because they cannot read the documents that matter.

FinBridge closes that gap.

You paste or upload any financial document — an annual report, a fund fact sheet, a news article, an earnings call transcript — and FinBridge analyses it in seconds using a trained AI pipeline. You get back a clean, plain-English breakdown that anyone can understand, regardless of their financial background.

"You don't need to understand finance to make informed decisions — FinBridge reads it for you."


✨ Features

| Feature | What it does |
| --- | --- |
| 🎯 Sentiment Analysis | Classifies the document as Positive, Neutral, or Negative using FinBERT, a financial-domain transformer trained specifically on market language. Returns confidence scores for all three classes. |
| 💬 Plain English Summary | Uses Groq's LLaMA 3.3 70B model to generate a structured, jargon-free explanation. Includes a key numbers table, a simple story of what happened, good news, concerns, and a one-line verdict. Scales with document length. |
| ⚠️ Risk Flag Detection | Scans for 30+ genuine risk phrases (going concern, material weakness, covenant breach, fraud, liquidity risk, etc.) using a context-aware engine that avoids false positives from standard legal boilerplate. |
| 📚 Jargon Glossary | Automatically identifies 70+ financial terms in the document and provides plain-English definitions with mention counts. |
| 📑 PDF Export | Generates a professionally formatted PDF report with proper tables, coloured headers, risk cards, and a clean layout, with no garbled characters. |
| 📝 DOCX Export | Generates an editable Word document with the same content, ready to share, annotate, or submit. |
| 📄 TXT Export | Plain text version with ASCII table formatting. Works everywhere and has the smallest file size. |

🧠 How It Works

FinBridge runs every document through a four-stage AI pipeline:

┌─────────────────────────────────────────────────────────────────┐
│                        USER INPUT                               │
│              Paste text  OR  Upload PDF (max 10MB)              │
└─────────────────────┬───────────────────────────────────────────┘
                      │
          ┌───────────▼───────────┐
          │   TEXT EXTRACTION     │
          │   pdfplumber reads    │
          │   PDF → clean text    │
          └───────────┬───────────┘
                      │
        ┌─────────────┼─────────────────────────┐
        │             │             │            │
        ▼             ▼             ▼            ▼
  ┌──────────┐  ┌──────────┐  ┌────────┐  ┌─────────┐
  │ FinBERT  │  │  Groq /  │  │  Risk  │  │ Jargon  │
  │Sentiment │  │LLaMA 3.3 │  │Scanner │  │Detector │
  │Classifier│  │  70B LLM │  │30+ pat │  │70+ terms│
  │79.4% Acc │  │Plain Eng │  │Context │  │Glossary │
  └────┬─────┘  └────┬─────┘  └───┬────┘  └────┬────┘
       │              │            │             │
       └──────────────┴────────────┴─────────────┘
                      │
          ┌───────────▼───────────┐
          │   RESULTS DASHBOARD   │
          │  Sentiment · Summary  │
          │  Risk Flags · Jargon  │
          └───────────┬───────────┘
                      │
          ┌───────────▼───────────┐
          │   EXPORT REPORTS      │
          │   PDF · DOCX · TXT    │
          └───────────────────────┘

Stage 1 — Sentiment Classification (FinBERT)

The document text is fed into ProsusAI/finbert — a BERT-based transformer pre-trained on financial corpora and fine-tuned for 3-class sentiment classification. Unlike general-purpose sentiment models, FinBERT understands financial language nuances: "revenue declined" is negative, but "declining exposure to volatile assets" is neutral.

  • Model: ProsusAI/finbert
  • Classes: Positive · Neutral · Negative
  • Accuracy: 79.4% on held-out Financial PhraseBank test set
  • Input limit: 512 tokens (truncated if longer)
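
As a rough illustration of how the three confidence scores could be derived, the sketch below applies a softmax to three raw class scores. It does not load FinBERT: the label order, the function names, and the example scores are assumptions made for illustration, and the real label order comes from the model's own configuration.

```python
import math

# Assumed label order, for illustration only; the real order is defined
# by the model configuration, not by this sketch.
LABELS = ("Positive", "Neutral", "Negative")


def to_confidences(logits):
    """Turn one raw score per class into probabilities that sum to 1 (softmax)."""
    if len(logits) != len(LABELS):
        raise ValueError("expected one score per class")
    peak = max(logits)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return {label: e / total for label, e in zip(LABELS, exps)}


def top_label(confidences):
    """Return the class with the highest confidence."""
    return max(confidences, key=confidences.get)


# Made-up raw scores standing in for a classifier's output.
scores = to_confidences([0.2, 3.1, -1.0])
print(top_label(scores), scores)
```

Because the three values always sum to 1, they can be shown side by side as percentages for all three classes.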

Stage 2 — Plain English Summary (Groq / LLaMA 3.3 70B)

The document is sent to Groq's inference API with a prompt that instructs the model to explain the content at a Grade 6 reading level. Every financial term is explained in brackets on first use, and numbers are given real-world context. The output scales with document length: short documents get 2–3 paragraphs, while long annual reports get a full structured breakdown with tables.

  • Primary provider: Groq (LLaMA 3.3 70B) — free, very fast
  • Fallback 1: Google Gemini 1.5 Flash — free
  • Fallback 2: Anthropic Claude Haiku — paid
  • Fallback 3: Rule-based sentence extraction — always works offline
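
The provider chain above can be sketched as a loop that tries each provider in turn and drops to an offline extractor when all of them fail. This is a minimal sketch, not the code in explainer.py: the function names, the failing stand-in provider, and the "keep sentences that contain a number" rule are all assumptions.

```python
import re
from typing import Callable, List, Tuple


def extract_key_sentences(text: str, limit: int = 3) -> str:
    """Hypothetical offline fallback: prefer sentences that contain a digit."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    with_numbers = [s for s in sentences if re.search(r"\d", s)]
    return " ".join((with_numbers or sentences)[:limit])


def summarise(text: str, providers: List[Tuple[str, Callable[[str], str]]]) -> Tuple[str, str]:
    """Try each (name, callable) in priority order; fall back to rules if all fail."""
    for name, call in providers:
        try:
            result = call(text)
            if result and result.strip():
                return name, result
        except Exception:
            continue  # any provider error moves on to the next provider
    return "rule-based", extract_key_sentences(text)


def failing_provider(text: str) -> str:
    """Stand-in for a remote LLM call that is unavailable."""
    raise RuntimeError("simulated outage")


name, summary = summarise(
    "Revenue rose 12% in 2025. The board met twice. Net debt fell to $40m.",
    [("groq", failing_provider), ("gemini", failing_provider)],
)
print(name, "->", summary)
```

Keeping the last step free of network calls is what lets the chain always return something, even with no API key configured.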

Stage 3 — Risk Flag Detection (Rule Engine)

A context-aware pattern matcher scans for 30+ risk phrases across six categories: solvency/survival risks, legal & regulatory risks, financial performance risks, market risks, governance risks, and restatement risks. Crucially, the engine checks surrounding context to avoid false positives — for example, "going concern basis is still appropriate" is a positive statement and is not flagged, while "going concern qualification issued" is a genuine warning and is flagged.
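
One way to picture the context check is a pattern match followed by a look at the surrounding characters. The sketch below assumes a small hypothetical pattern list, a hypothetical list of reassuring phrases, and a fixed character window; the actual patterns and rules in risk_flags.py may differ.

```python
import re

# Hypothetical patterns and reassuring cues, not the project's actual lists.
RISK_PATTERNS = {
    "GOING CONCERN": r"going concern",
    "MATERIAL WEAKNESS": r"material weakness",
    "COVENANT BREACH": r"covenant breach",
}
REASSURING_CUES = (
    "is still appropriate",
    "remains appropriate",
    "no material",
    "not identified",
)


def find_risk_flags(text, window=60):
    """Return (label, context) pairs, skipping matches with reassuring context."""
    flags = []
    for label, pattern in RISK_PATTERNS.items():
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            start = max(0, match.start() - window)
            end = match.end() + window
            context = text[start:end]
            if any(cue in context.lower() for cue in REASSURING_CUES):
                continue  # boilerplate or positive statement, not a warning
            flags.append((label, context.strip()))
    return flags


safe = find_risk_flags(
    "The directors consider that the going concern basis is still appropriate."
)
flagged = find_risk_flags(
    "The auditor issued a going concern qualification this year."
)
print(safe, flagged)
```

A fixed window is crude, since a reassuring phrase in a neighbouring sentence can suppress a genuine flag; splitting on sentence boundaries first would be a natural refinement.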

Stage 4 — Jargon Detection (Glossary Engine)

A curated dictionary of 70+ financial terms is scanned against the document. Every term found is returned with its mention count and a plain-English definition written at the same Grade 6 level as the summary.
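
The glossary scan can be sketched as a whole-word, case-insensitive count per term. The three glossary entries and the function name below are hypothetical stand-ins for the project's 70+ term dictionary.

```python
import re
from collections import Counter

# Hypothetical glossary entries; the real dictionary lives in jargon.py.
GLOSSARY = {
    "liquidity": "How easily something can be turned into cash.",
    "yield": "The income an investment pays, shown as a percentage.",
    "monetary policy": "How a central bank manages money supply and interest rates.",
}


def detect_jargon(text):
    """Return (term, mention_count, definition) tuples, most frequent first."""
    counts = Counter()
    for term in GLOSSARY:
        # \b keeps "yield" from matching inside a longer word such as "yielded"
        pattern = r"\b" + re.escape(term) + r"\b"
        hits = re.findall(pattern, text, flags=re.IGNORECASE)
        if hits:
            counts[term] = len(hits)
    return [(term, n, GLOSSARY[term]) for term, n in counts.most_common()]


sample = (
    "Market liquidity remained constrained. The RBZ kept a tight "
    "monetary policy, so liquidity stayed low."
)
found = detect_jargon(sample)
for term, n, definition in found:
    print(f"{term} ({n}): {definition}")
```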


📊 Sentiment Model — Performance Comparison

Five models were trained and evaluated on the Financial PhraseBank dataset (5,842 sentences, 3-class classification: Positive / Neutral / Negative).

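| Model | Accuracy | Precision | Recall | F1 Score |
| --- | --- | --- | --- | --- |
| Naive Bayes | 68.9% | 74.3% | 68.9% | 62.9% |
| Logistic Regression | 69.3% | 70.7% | 69.3% | 69.9% |
| SVM (Linear) | 69.5% | 71.7% | 69.5% | 70.4% |
| LSTM | 55.4% | 55.4% | | |
| FinBERT | 79.4% | 79.0% | 79.0% | 79.0% |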

FinBERT achieves the highest score on every metric. Its F1 score of 79.0% is 8.6 percentage points above the best classical model (linear SVM, 70.4%), which suggests that domain-specific pre-training on financial corpora is well suited to this task.
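
For readers who want to reproduce figures of this kind, the sketch below computes accuracy and support-weighted precision, recall, and F1 from two label lists. Weighted averaging is an assumption (the source does not state how the per-class scores were averaged), and the toy labels are made up.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall, and F1."""
    total = len(y_true)
    support = Counter(y_true)  # number of true examples per class
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / total
    precision = recall = f1 = 0.0
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in pairs)
        predicted = sum(p == label for p in y_pred)
        actual = support[label]
        p_c = tp / predicted if predicted else 0.0
        r_c = tp / actual if actual else 0.0
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0
        weight = actual / total  # each class counts in proportion to its support
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels, invented for the example.
y_true = ["pos", "pos", "neu", "neu", "neu", "neg"]
y_pred = ["pos", "neu", "neu", "neu", "neg", "neg"]
metrics = weighted_metrics(y_true, y_pred)
print(metrics)
```

Under support weighting, recall is always equal to accuracy, which is the pattern seen in the Naive Bayes, Logistic Regression, and SVM rows.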


📁 Project Structure

finbridge/
│
├── app/                          ← Core AI application
│   ├── main.py                   ← Streamlit UI (complete dashboard)
│   ├── sentiment.py              ← FinBERT sentiment classifier
│   ├── explainer.py              ← Multi-provider LLM summariser
│   ├── risk_flags.py             ← Context-aware risk scanner (30+ patterns)
│   ├── jargon.py                 ← Financial glossary detector (70+ terms)
│   ├── pdf_reader.py             ← PDF text extraction (pdfplumber)
│   └── report_generator.py      ← PDF / DOCX / TXT export engine
│
├── config/
│   └── settings.py              ← API keys, model config, provider priority
│
├── examples/
│   └── sample_documents.txt     ← 3 test documents (positive, negative, neutral)
│
├── tests/
│   └── test_core.py             ← Unit tests for risk scanner and jargon detector
│
├── streamlit_app.py             ← HuggingFace Spaces entry point
├── requirements.txt             ← All Python dependencies
├── .env.example                 ← Template for API keys
├── .gitignore                   ← Excludes .env and large files
└── README.md                    ← This file

🚀 Quick Start

Prerequisites

  • Python 3.10 or higher
  • A free Groq API key (takes 2 minutes)

1. Clone the repository

git clone https://github.com/kupakwash/finbridge.git
cd finbridge

2. Create a virtual environment

# Windows
python -m venv venv
venv\Scripts\activate

# Mac / Linux
python -m venv venv
source venv/bin/activate

3. Install dependencies

pip install -r requirements.txt

⚠️ First install downloads PyTorch (~800MB). Allow 5–10 minutes on first run.

4. Configure API keys

# Copy the template
cp .env.example .env

Open .env and add your Groq key:

GROQ_API_KEY=gsk_your_key_here

Get your free Groq key at console.groq.com → API Keys → Create Key.

5. Run the app

streamlit run app/main.py

Open your browser at http://localhost:8501


🔑 API Keys

FinBridge supports three LLM providers. You only need one — Groq is recommended because it is free and fast.

| Provider | Key Variable | Cost | Where to get |
| --- | --- | --- | --- |
| Groq | GROQ_API_KEY | Free | console.groq.com |
| Google Gemini | GEMINI_API_KEY | Free | aistudio.google.com |
| Anthropic Claude | ANTHROPIC_API_KEY | ~$0.01/analysis | console.anthropic.com |

The app tries providers in this order: Groq → Gemini → Claude → Smart Fallback. If no key is configured, a rule-based fallback summary is generated automatically — the app always works.
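
The selection order described above could be derived from whichever keys are present in the environment. This is a minimal sketch under that assumption: the function name is hypothetical, and the real logic in config/settings.py may work differently.

```python
import os

# Priority order and variable names as listed in the table above;
# the function itself is a hypothetical illustration.
PROVIDER_KEYS = [
    ("Groq", "GROQ_API_KEY"),
    ("Gemini", "GEMINI_API_KEY"),
    ("Claude", "ANTHROPIC_API_KEY"),
]


def configured_providers(env=None):
    """List providers that have a non-empty key, ending with the offline fallback."""
    env = os.environ if env is None else env
    active = [name for name, var in PROVIDER_KEYS if env.get(var, "").strip()]
    return active + ["Smart Fallback"]


# Passing a plain dict makes the behaviour easy to check without touching os.environ.
print(configured_providers({"ANTHROPIC_API_KEY": "x", "GROQ_API_KEY": "gsk_demo"}))
print(configured_providers({}))
```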


🧪 Testing

python tests/test_core.py

Expected output:

✅ Risk flag: going concern detected
✅ Risk flag: empty text handled
✅ Risk flag: clean text returns 0 flags
✅ Risk summary: no flags message correct
✅ Jargon: bull market detected
✅ Jargon: empty text handled
✅ Explainer: short text rejected correctly
✅ Explainer: long text summary generated

✅ All 8 tests passed!

📄 Example Output

Given this input (Old Mutual ZWG Money Market Fund Fact Sheet):

"The Fund registered a return of 2.84% in Q4 against 3.71% in Q3 of 2025, bringing its full year return to 13.60%. The RBZ reaffirmed its commitment to a tight monetary policy. Resultantly, market liquidity remained constrained, and interest rates competitive above 10% per annum..."

FinBridge produces:

Sentiment: 🟡 Neutral (94.5% confidence)

Plain English Summary:

This is a money market fund — think of it like a savings account managed by professionals. Old Mutual pools your money with other investors and lends it to banks and the government for short periods, earning interest. That interest gets paid back to you monthly...

Risk Flags: LIQUIDITY RISK · INTEREST RATE RISK (both MEDIUM — manageable)

Jargon Detected: YIELD · LIQUIDITY · INTEREST RATE · PORTFOLIO · INFLATION · MONETARY POLICY


🌍 Why This Matters

  • 3.5 billion people globally have limited financial literacy
  • Only 27% of Indian adults are financially literate, compared with an average of 52% across advanced economies
  • Annual reports average 150–250 pages of dense financial language
  • Retail investors often lose money not only because markets are complex, but because the documents they rely on are hard to read

FinBridge makes financial documents accessible to everyone — students, first-time investors, small business owners, and retail traders — regardless of their educational background or location.

Roadmap to wider impact:

  • 🌐 Multilingual output (Shona, Swahili, Zulu, Hindi) — African & Asian markets
  • 📱 Mobile app — analysis on the go
  • 🔌 API access for fintech platforms and robo-advisors
  • 🏫 Educational mode — learn finance while reading real documents

🛠️ Tech Stack

| Layer | Technology | Purpose |
| --- | --- | --- |
| UI | Streamlit 1.32 | Web dashboard |
| Sentiment AI | FinBERT (HuggingFace Transformers) | Financial sentiment classification |
| Summary AI | Groq API / LLaMA 3.3 70B | Plain-English document explanation |
| PDF Parsing | pdfplumber | Extract text from uploaded PDFs |
| PDF Export | ReportLab | Generate formatted PDF reports |
| DOCX Export | python-docx | Generate Word document reports |
| Config | python-dotenv | Secure API key management |
| Deep Learning | PyTorch | FinBERT inference backend |

📜 License

This project is licensed under the MIT License — see LICENSE for details.


👤 Author

Kupakwashe T. Mapuranga

B.Tech Artificial Intelligence & Machine Learning
Symbiosis Institute of Technology, Pune, India

📧 kupakwashemapuranga@gmail.com
🐙 github.com/kupakwash

CA3 FinTech Application Project · 2026


Built with ❤️ to make financial knowledge accessible to everyone.
