From Wall Street jargon to Main Street clarity — instantly.
Built by Kupakwashe T. Mapuranga
B.Tech Artificial Intelligence & Machine Learning
Symbiosis Institute of Technology, Pune · CA3 FinTech Project · 2026
Most people cannot understand financial documents. Annual reports run 200+ pages. Earnings calls are full of jargon. Fund fact sheets use terms that take years to learn. This creates a massive gap — people cannot make informed investment decisions because they cannot read the documents that matter.
FinBridge closes that gap.
You paste or upload any financial document — an annual report, a fund fact sheet, a news article, an earnings call transcript — and FinBridge analyses it in seconds using a trained AI pipeline. You get back a clean, plain-English breakdown that anyone can understand, regardless of their financial background.
"You don't need to understand finance to make informed decisions — FinBridge reads it for you."
| Feature | What it does |
|---|---|
| 🎯 Sentiment Analysis | Classifies the document as Positive, Neutral, or Negative using FinBERT — a financial-domain transformer trained specifically on market language. Returns confidence scores for all three classes. |
| 💬 Plain English Summary | Uses Groq's LLaMA 3.3 70B model to generate a structured, jargon-free explanation. Includes a key numbers table, a simple story of what happened, good news, concerns, and a one-line verdict. Scales with document length. |
| Scans for 30+ genuine risk phrases (going concern, material weakness, covenant breach, fraud, liquidity risk, etc.) using a context-aware engine that avoids false positives from standard legal boilerplate. | |
| 📚 Jargon Glossary | Automatically identifies 70+ financial terms in the document and provides plain-English definitions with mention counts. |
| 📑 PDF Export | Generates a professionally formatted PDF report with proper tables, coloured headers, risk cards, and a clean layout — no garbled characters. |
| 📝 DOCX Export | Generates an editable Word document with the same content — ready to share, annotate, or submit. |
| 📄 TXT Export | Plain text version with ASCII table formatting — works everywhere, smallest file size. |
FinBridge runs every document through a four-stage AI pipeline:
┌─────────────────────────────────────────────────────────────────┐
│ USER INPUT │
│ Paste text OR Upload PDF (max 10MB) │
└─────────────────────┬───────────────────────────────────────────┘
│
┌───────────▼───────────┐
│ TEXT EXTRACTION │
│ pdfplumber reads │
│ PDF → clean text │
└───────────┬───────────┘
│
┌─────────────┼─────────────────────────┐
│ │ │ │
▼ ▼ ▼ ▼
┌──────────┐ ┌──────────┐ ┌────────┐ ┌─────────┐
│ FinBERT │ │ Groq / │ │ Risk │ │ Jargon │
│Sentiment │ │LLaMA 3.3 │ │Scanner │ │Detector │
│Classifier│ │ 70B LLM │ │30+ pat │ │70+ terms│
│79.4% F1 │ │Plain Eng │ │Context │ │Glossary │
└────┬─────┘ └────┬─────┘ └───┬────┘ └────┬────┘
│ │ │ │
└──────────────┴────────────┴─────────────┘
│
┌───────────▼───────────┐
│ RESULTS DASHBOARD │
│ Sentiment · Summary │
│ Risk Flags · Jargon │
└───────────┬───────────┘
│
┌───────────▼───────────┐
│ EXPORT REPORTS │
│ PDF · DOCX · TXT │
└───────────────────────┘
The document text is fed into ProsusAI/finbert — a BERT-based transformer pre-trained on financial corpora and fine-tuned for 3-class sentiment classification. Unlike general-purpose sentiment models, FinBERT understands financial language nuances: "revenue declined" is negative, but "declining exposure to volatile assets" is neutral.
- Model:
ProsusAI/finbert - Classes: Positive · Neutral · Negative
- Accuracy: 79.4% on held-out Financial PhraseBank test set
- Input limit: 512 tokens (truncated if longer)
The document is sent to Groq's inference API with a carefully engineered prompt that instructs the model to explain the content at Grade 6 reading level. Every financial term gets explained in brackets on first use. Numbers are given real-world context. The output scales with document length: short documents get 2-3 paragraphs, long annual reports get a full structured breakdown with tables.
- Primary provider: Groq (LLaMA 3.3 70B) — free, very fast
- Fallback 1: Google Gemini 1.5 Flash — free
- Fallback 2: Anthropic Claude Haiku — paid
- Fallback 3: Rule-based sentence extraction — always works offline
A context-aware pattern matcher scans for 30+ risk phrases across six categories: solvency/survival risks, legal & regulatory risks, financial performance risks, market risks, governance risks, and restatement risks. Crucially, the engine checks surrounding context to avoid false positives — for example, "going concern basis is still appropriate" is a positive statement and is not flagged, while "going concern qualification issued" is a genuine warning and is flagged.
A curated dictionary of 70+ financial terms is scanned against the document. Every term found is returned with its mention count and a plain-English definition written at the same Grade 6 level as the summary.
Five models were trained and evaluated on the Financial PhraseBank dataset (5,842 sentences, 3-class classification: Positive / Neutral / Negative).
| Model | Accuracy | Precision | Recall | F1 Score |
|---|---|---|---|---|
| Naive Bayes | 68.9% | 74.3% | 68.9% | 62.9% |
| Logistic Regression | 69.3% | 70.7% | 69.3% | 69.9% |
| SVM (Linear) | 69.5% | 71.7% | 69.5% | 70.4% |
| LSTM | 55.4% | — | — | 55.4% |
| FinBERT ✅ | 79.4% | 79.0% | 79.0% | 79.0% |
FinBERT outperforms all classical models by a margin of 8.6 percentage points in F1 score. Its domain-specific pre-training on financial corpora makes it uniquely suited for this task.
finbridge/
│
├── app/ ← Core AI application
│ ├── main.py ← Streamlit UI (complete dashboard)
│ ├── sentiment.py ← FinBERT sentiment classifier
│ ├── explainer.py ← Multi-provider LLM summariser
│ ├── risk_flags.py ← Context-aware risk scanner (30+ patterns)
│ ├── jargon.py ← Financial glossary detector (70+ terms)
│ ├── pdf_reader.py ← PDF text extraction (pdfplumber)
│ └── report_generator.py ← PDF / DOCX / TXT export engine
│
├── config/
│ └── settings.py ← API keys, model config, provider priority
│
├── examples/
│ └── sample_documents.txt ← 3 test documents (positive, negative, neutral)
│
├── tests/
│ └── test_core.py ← Unit tests for risk scanner and jargon detector
│
├── streamlit_app.py ← HuggingFace Spaces entry point
├── requirements.txt ← All Python dependencies
├── .env.example ← Template for API keys
├── .gitignore ← Excludes .env and large files
└── README.md ← This file
- Python 3.10 or higher
- A free Groq API key (takes 2 minutes)
git clone https://github.com/kupakwash/finbridge.git
cd finbridge# Windows
python -m venv venv
venv\Scripts\activate
# Mac / Linux
python -m venv venv
source venv/bin/activatepip install -r requirements.txt
⚠️ First install downloads PyTorch (~800MB). Allow 5–10 minutes on first run.
# Copy the template
cp .env.example .envOpen .env and add your Groq key:
GROQ_API_KEY=gsk_your_key_hereGet your free Groq key at console.groq.com → API Keys → Create Key.
streamlit run app/main.pyOpen your browser at http://localhost:8501
FinBridge supports three LLM providers. You only need one — Groq is recommended because it is free and fast.
| Provider | Key Variable | Cost | Where to get |
|---|---|---|---|
| Groq ⭐ | GROQ_API_KEY |
Free | console.groq.com |
| Google Gemini | GEMINI_API_KEY |
Free | aistudio.google.com |
| Anthropic Claude | ANTHROPIC_API_KEY |
~$0.01/analysis | console.anthropic.com |
The app tries providers in this order: Groq → Gemini → Claude → Smart Fallback. If no key is configured, a rule-based fallback summary is generated automatically — the app always works.
python tests/test_core.pyExpected output:
✅ Risk flag: going concern detected
✅ Risk flag: empty text handled
✅ Risk flag: clean text returns 0 flags
✅ Risk summary: no flags message correct
✅ Jargon: bull market detected
✅ Jargon: empty text handled
✅ Explainer: short text rejected correctly
✅ Explainer: long text summary generated
✅ All 8 tests passed!
Given this input (Old Mutual ZWG Money Market Fund Fact Sheet):
"The Fund registered a return of 2.84% in Q4 against 3.71% in Q3 of 2025, bringing its full year return to 13.60%. The RBZ reaffirmed its commitment to a tight monetary policy. Resultantly, market liquidity remained constrained, and interest rates competitive above 10% per annum..."
FinBridge produces:
Sentiment: 🟡 Neutral (94.5% confidence)
Plain English Summary:
This is a money market fund — think of it like a savings account managed by professionals. Old Mutual pools your money with other investors and lends it to banks and the government for short periods, earning interest. That interest gets paid back to you monthly...
Risk Flags: LIQUIDITY RISK · INTEREST RATE RISK (both MEDIUM — manageable)
Jargon Detected: YIELD · LIQUIDITY · INTEREST RATE · PORTFOLIO · INFLATION · MONETARY POLICY
- 3.5 billion people globally have limited financial literacy
- Only 27% of Indian adults are financially literate (vs 52% global average for advanced economies)
- Annual reports average 150–250 pages of dense financial language
- Retail investors lose money not because markets are complex — but because documents are unreadable
FinBridge makes financial documents accessible to everyone — students, first-time investors, small business owners, and retail traders — regardless of their educational background or location.
Roadmap to wider impact:
- 🌐 Multilingual output (Shona, Swahili, Zulu, Hindi) — African & Asian markets
- 📱 Mobile app — analysis on the go
- 🔌 API access for fintech platforms and robo-advisors
- 🏫 Educational mode — learn finance while reading real documents
| Layer | Technology | Purpose |
|---|---|---|
| UI | Streamlit 1.32 | Web dashboard |
| Sentiment AI | FinBERT (HuggingFace Transformers) | Financial sentiment classification |
| Summary AI | Groq API / LLaMA 3.3 70B | Plain-English document explanation |
| PDF Parsing | pdfplumber | Extract text from uploaded PDFs |
| PDF Export | ReportLab | Generate formatted PDF reports |
| DOCX Export | python-docx | Generate Word document reports |
| Config | python-dotenv | Secure API key management |
| Deep Learning | PyTorch | FinBERT inference backend |
This project is licensed under the MIT License — see LICENSE for details.
Kupakwashe T. Mapuranga
B.Tech Artificial Intelligence & Machine Learning
Symbiosis Institute of Technology, Pune, India
📧 kupakwashemapuranga@gmail.com
🐙 github.com/kupakwash
CA3 FinTech Application Project · 2026
Built with ❤️ to make financial knowledge accessible to everyone.