53 changes: 26 additions & 27 deletions .github/workflows/main.yml
@@ -2,37 +2,36 @@ name: Docker Image CI-CD

on:
push:
branches: [ "main" ]
branches: [ "master" ]
pull_request:
branches: [ "main" ]
branches: [ "master" ]

jobs:
build:

runs-on: ubuntu-latest

steps:
- name: Checkout code
uses: actions/checkout@v3

- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: '3.x'

- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v2

- name: Cache Docker layers
uses: actions/cache@v3
with:
path: /tmp/.buildx-cache
key: ${{ runner.os }}-buildx-${{ github.sha }}
restore-keys: |
${{ runner.os }}-buildx-

- name: Build Docker image
run: docker build -t informa-truth .

- name: Test the application (Run tests inside container)
run: docker run --rm informa-truth pytest tests/
- name: Checkout code
uses: actions/checkout@v3

- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: '3.x'

- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v2

- name: Cache Docker layers
uses: actions/cache@v3
with:
path: /tmp/.buildx-cache
key: ${{ runner.os }}-buildx-${{ github.sha }}
restore-keys: |
${{ runner.os }}-buildx-

- name: Build Docker image
run: docker build -t informa-truth .

- name: Test the application (Run tests inside container with PYTHONPATH)
run: docker run --rm -e PYTHONPATH=/app informa-truth pytest tests/
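
The updated test step passes `-e PYTHONPATH=/app` so that `pytest`, run inside the container, can resolve modules from the project root. The Dockerfile in this change also sets `ENV PYTHONPATH=/app`, so the two settings overlap. The sketch below illustrates the mechanism only: a temporary directory stands in for `/app`, and the module name `informa_demo_module` and both helper functions are hypothetical, not part of the repository.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Optional, Tuple


def can_import(module: str, pythonpath: Optional[str], cwd: str) -> bool:
    """Return True if a fresh interpreter started in `cwd` can import `module`."""
    # Start from the current environment but control PYTHONPATH explicitly.
    env = {k: v for k, v in os.environ.items() if k != "PYTHONPATH"}
    if pythonpath is not None:
        env["PYTHONPATH"] = pythonpath
    result = subprocess.run(
        [sys.executable, "-c", f"import {module}"],
        env=env,
        cwd=cwd,
        capture_output=True,
    )
    return result.returncode == 0


def demo() -> Tuple[bool, bool]:
    """Compare importability without and with PYTHONPATH pointing at the project root."""
    with tempfile.TemporaryDirectory() as project_root, \
            tempfile.TemporaryDirectory() as elsewhere:
        # Hypothetical module that lives in the stand-in project root.
        Path(project_root, "informa_demo_module.py").write_text("VALUE = 1\n")
        without_path = can_import("informa_demo_module", None, elsewhere)
        with_path = can_import("informa_demo_module", project_root, elsewhere)
        return without_path, with_path


if __name__ == "__main__":
    print(demo())
```

The child interpreter is started from an unrelated working directory on purpose, because `python -c` also puts the current directory on `sys.path` and would otherwise hide the effect of `PYTHONPATH`.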
13 changes: 8 additions & 5 deletions Dockerfile
@@ -1,17 +1,20 @@
# Use an official Python runtime as a parent image
FROM python:3.11-slim

# Set environment variable for PYTHONPATH
ENV PYTHONPATH=/app

# Set the working directory
WORKDIR /app

# Copy the current directory contents into the container
COPY . /app

# Install the dependencies
# Install dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Expose Streamlit default port
EXPOSE 8501
# Expose the port Flask runs on
EXPOSE 5000

# Correct command to run Streamlit app
CMD ["streamlit", "run", "app.py", "--server.port=8501", "--server.address=0.0.0.0"]
# Default command to run the Flask app
CMD ["python", "app.py"]
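
The image now exposes port 5000 and starts the app with `python app.py`. For a published port to be reachable from outside the container, the Flask server generally has to listen on `0.0.0.0` rather than Flask's default of `127.0.0.1`. Whether `app.py` already does this is not visible in this diff, so the following is only a sketch under that assumption. The helper `resolve_bind_address` and the optional `PORT` environment variable are hypothetical.

```python
import os
from typing import Mapping, Tuple

DEFAULT_PORT = 5000  # matches the EXPOSE line in the Dockerfile


def resolve_bind_address(env: Mapping[str, str] = os.environ) -> Tuple[str, int]:
    """Pick the host and port the web server should bind to inside a container."""
    port_text = env.get("PORT", "")
    # Fall back to the default when PORT is missing or not a plain integer.
    port = int(port_text) if port_text.isdigit() else DEFAULT_PORT
    # 0.0.0.0 listens on all interfaces, which container port mapping needs.
    return "0.0.0.0", port


# Assumed usage in app.py, given a Flask instance named `app`:
#     host, port = resolve_bind_address()
#     app.run(host=host, port=port)
```

Reading the port from the environment keeps the Dockerfile default intact while letting a hosting platform override it if it injects its own port.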
216 changes: 157 additions & 59 deletions README.md
@@ -1,12 +1,10 @@
# 📘 InformaTruth: AI-Driven News Veracity Analyzer
This project addresses the challenge by fine-tuning a transformer-based classification model (RoBERTa) on the LIAR dataset to automatically determine whether a news statement is real or fake. Additionally, it employs a generative LLM (FLAN-T5) to produce natural language explanations for its predictions, increasing user trust and transparency in the system.
# 📘 InformaTruth: AI-Driven News Authenticity Analyzer
🧠 A multi-modal fake news detector built on a fine-tuned RoBERTa classifier, with FLAN-T5 explanation generation and URL/PDF/text input support. A LangGraph-powered agentic pipeline orchestrates the Planner, Retriever, Tool Router, Fallback Agent, and LLM Answerer agents, with memory and dynamic tool augmentation.

[![InformaTruth](https://github.com/user-attachments/assets/42d5bc32-c739-4f5e-a8e6-9e89cc0a6e6e)](https://github.com/user-attachments/assets/42d5bc32-c739-4f5e-a8e6-9e89cc0a6e6e)
[![InformaTruth](https://github.com/user-attachments/assets/a5c9932e-03a9-4008-9b34-f8fa80687334)](https://github.com/user-attachments/assets/a5c9932e-03a9-4008-9b34-f8fa80687334)

---

## 🔍 Overview
In the digital age, misinformation spreads rapidly across news outlets, social media, and online platforms. With the increasing difficulty of distinguishing between credible journalism and deceptive content, there is a growing demand for automated systems that can detect fake news efficiently and explain their reasoning. Manual fact-checking is time-consuming, prone to bias, and often fails to scale with the speed of information propagation. It also includes a user-friendly UI, Rest API, modular components and a complete Dockerized CI/CD pipeline for enterprise deployment.
<!-- https://github.com/user-attachments/assets/a5c9932e-03a9-4008-9b34-f8fa80687334 -->

---

@@ -16,81 +14,181 @@ In the digital age, misinformation spreads rapidly across news outlets, social m

---

## 🔍 Overview
In the digital age, misinformation spreads rapidly across news outlets, social media, and online platforms, and distinguishing credible journalism from deceptive content is increasingly difficult. This agentic AI system detects fake news in text, PDF files, or website URLs using a fine-tuned RoBERTa model. It uses a multi-agent architecture built with LangGraph, including Planner, Retriever, Tool Router, and Explanation agents. Once a claim is classified, the system uses FLAN-T5 to generate human-readable reasoning. If local evidence is insufficient, it falls back on Wikipedia or DuckDuckGo search. The system is designed for production use and supports real-world fact-checking, multi-source ingestion, tool-augmented reasoning, and modular orchestration.

---

## ⚙️ Tech Stack
| **Category** | **Technology/Resource** |
|----------------------------|----------------------------------------------------------------------------------------|
| **Core Framework** | PyTorch, Transformers |
| **Classification Model** | Fine-tuned RoBERTa-base |
| **Explanation Model** | FLAN-T5-base |
| **Training Data** | LIAR Dataset (Political Fact-Checking) |
| **Evaluation Metrics** | Accuracy, Precision, Recall, F1-score |
| **Text Extraction** | Newspaper3k (URLs), PyMuPDF (PDFs) |
| **Training Framework** | HuggingFace Trainer |
| **Deployment** | Streamlit (Web Interface) |
| **Hosting** | Render |
| **Category** | **Technology/Resource** |
| --------------------------- | ------------------------------------------------------------------------------------------------------ |
| **Core Framework** | PyTorch, Transformers, HuggingFace |
| **Classification Model** | Fine-tuned RoBERTa-base on LIAR Dataset |
| **Explanation Model** | FLAN-T5-base (Zero-shot Prompting) |
| **Training Data** | LIAR Dataset (Political Fact-Checking) |
| **Evaluation Metrics** | Accuracy, Precision, Recall, F1-score |
| **Training Framework** | HuggingFace Trainer |
| **LangGraph Orchestration** | LangGraph (Multi-Agent Directed Acyclic Execution Graph) |
| **Agents Used** | PlannerAgent, InputHandlerAgent, ToolRouterAgent, ExecutorAgent, ExplanationAgent, FallbackSearchAgent |
| **Input Modalities** | Raw Text, Website URLs (via Newspaper3k), PDF Documents (via PyMuPDF) |
| **Tool Augmentation** | DuckDuckGo Search API (Fallback), Wikipedia (Planned), ToolRouter Logic |
| **Web Scraping** | Newspaper3k (HTML → Clean Article) |
| **PDF Parsing** | PyMuPDF |
| **Explainability** | Natural language justification generated using FLAN-T5 |
| **State Management** | Shared State Object (LangGraph-compatible) |
| **Deployment Interface** | Flask (HTML,CSS,JS) |
| **Hosting Platform** | Render (Docker) |
| **Version Control** | Git, GitHub |
| **Logging & Debugging** | Logs, Print Debugs, Custom Logger |
| **Input Support** | Text, URLs, PDF documents |
| **Explainability** | FLAN-T5 generated natural language explanations |

---

### ✅ Key Features
- **Multi-format input**: Supports raw text, URLs, and PDF files.
- **NLP Pipeline**: Includes summarization, classification, and LLM-based explanation.
- **Modular coding and logging**: Clean, modular code with logging.
- **Streamlit UI**: Clean, responsive frontend for interaction.
- **REST API**: For integration with other systems.
- **Dockerized**: Fully containerized for production deployments.
- **CI/CD**: GitHub Actions pipeline for testing, linting, and Docker validation.
## ✅ Key Features

* **🔄 Multi-Format Input Support**
Accepts raw **text**, **web URLs**, and **PDF documents** with automated preprocessing for each type.

* **🧠 Full NLP Pipeline**
Integrates summarization (optional), **fake news classification** (RoBERTa), and **natural language explanation** (FLAN-T5).

* **🧱 Modular Agent-Based Architecture**
Built using **LangGraph** with modular agents: `Planner`, `Tool Router`, `Executor`, `Explanation Agent`, and `Fallback Agent`.

* **📜 Explanation Generation**
Uses **FLAN-T5** to generate human-readable, zero-shot rationales for model predictions.

* **🧪 Tool-Augmented & Fallback Logic**
Dynamically queries **DuckDuckGo** when local context is insufficient, enabling robust fallback handling.

* **🧼 Clean, Modular Codebase with Logging**
Structured using clean architecture principles, agent separation, and informative logging.

* **🌐 Flask with Web UI**
User-friendly, interactive, and responsive frontend for input, output, and visual explanations.

* **🐳 Dockerized for Deployment**
Fully containerized setup with `Dockerfile` and `requirements.txt` for seamless deployment.

* **⚙️ CI/CD with GitHub Actions**
Automated pipelines for testing, linting, and Docker build validation to ensure code quality and production-readiness.
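
The multi-format input feature above implies a dispatch step that decides whether the user supplied raw text, a URL, or a PDF path before the matching extractor (Newspaper3k or PyMuPDF) runs. The repository's `InputHandler` is not shown in this diff, so the following is a minimal standard-library sketch of such a check. `InputKind` and `detect_input_kind` are hypothetical names, and the rules are assumptions.

```python
from enum import Enum
from urllib.parse import urlparse


class InputKind(Enum):
    TEXT = "text"
    URL = "url"
    PDF = "pdf"


def detect_input_kind(raw: str) -> InputKind:
    """Classify user input as a URL, a PDF path, or plain text."""
    candidate = raw.strip()
    # URLs and file paths are single tokens; anything with whitespace is prose.
    if not candidate or any(ch.isspace() for ch in candidate):
        return InputKind.TEXT
    try:
        parsed = urlparse(candidate)
    except ValueError:
        # Malformed URL-like strings are treated as plain text.
        return InputKind.TEXT
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return InputKind.URL
    if candidate.lower().endswith(".pdf"):
        return InputKind.PDF
    return InputKind.TEXT


if __name__ == "__main__":
    for sample in ("https://example.com/article", "news/news.pdf", "Taxes went down."):
        print(sample, "->", detect_input_kind(sample).value)
```

A web link that points at a PDF is classified as a URL here; a real handler might instead download the file and route it to the PDF parser.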

---

## 📦 Project Structure
## 📦 Project File Structure

```bash
InformaTruth/
├── src/
│ ├── config.py # Configuration
│ ├── data.py # Data handling
│ ├── inference.py # Model inference
│ ├── main.py # Main script
│ ├── model.py # Model definition
│ └── loggger.py # Logging
├── fine_tuned_liar_detector/ # Fine-tuned model
├── .github/ # GitHub Actions
│ └── workflows/
│ └── main.yml
├── agents/ # Modular agents (planner, executor, etc.)
│ ├── executor.py
│ ├── fallback_search.py
│ ├── input_handler.py
│ ├── planner.py
│ ├── router.py
│ └── __init__.py
├── fine_tuned_liar_detector/ # Fine-tuned RoBERTa model directory
│ ├── config.json
│ ├── vocab.json
│ ├── tokenizer_config.json
│ ├── special_tokens_map.json
│ ├── model.safetensors
│ └── merges.txt
├── graph/ # LangGraph state and builder logic
│ ├── builder.py
│ ├── state.py
│ └── __init__.py
├── models/ # Classification + LLM model loader
│ ├── classifier.py
│ ├── loader.py
│ └── __init__.py
├── notebook/
│ └── experiment.ipynb # Experimentation
├── news/ # Sample news or test input
│ └── news.pdf
├── test/
├── notebook/ # Jupyter notebooks for experimentation
│ ├── 1 Fine-Tuning.ipynb
│ └── 2 Fine-Tuning with Multi Agent.ipynb
├── static/ # Static files (CSS, JS)
│ ├── css/
│ │ └── style.css
│ └── js/
│ └── script.js
├── templates/ # HTML templates for Flask UI
│ ├── dj_base.html
│ └── dj_index.html
├── tests/ # Unit tests
│ └── test_app.py
├── .github/ # GitHub Actions
│ └── workflows/
│ └── main.yml
├── train/ # Training logic
│ ├── config.py
│ ├── data_loader.py
│ ├── predictor.py
│ ├── run.py
│ ├── trainer.py
│ └── __init__.py
├── utils/ # Utilities like logging, evaluation
│ ├── logger.py
│ ├── results.py
│ └── __init__.py
├── app.py # Streamlit app
├── app.png # Demo
├── demo.webm # Demo video
├── setup.py # Python setup file
├── Dockerfile # Dockerfile
├── flask_api.py # Rest API
├── requirements.txt # Dependencies
├── .gitignore # Git ignore file
├── LICENSE # License
└── README.md # This file
├── __init__.py
├── app.png # Demo
├── demo.webm # Demo video
├── app.py # Flask app entry point
├── main.py # Main script / orchestrator
├── config.py # Configuration file
├── setup.py # Project setup for pip install
├── render.yaml # Render deployment configuration
├── Dockerfile # Docker container spec
├── requirements.txt # Python dependencies
├── LICENSE # License file
├── .gitignore # Git ignore rules
├── .gitattributes # Git LFS rules
└── README.md # Readme
```

---

## 🧱 System Architecture
```mermaid
graph TD
A[Input] --> B{Input Type}
B -->|Text| C[Direct Processing]
B -->|URL| D[Newspaper3k Extraction]
B -->|PDF| E[PyMuPDF Extraction]
C & D & E --> F[Fine-Tuned RoBERTa Classification]
F --> G[FLAN-T5 Explanation]
G --> H[Streamlit UI Output]
A[User Input] --> B{Input Type}
B -->|Text| C[Direct Text Processing]
B -->|URL| D[Newspaper3k Parser]
B -->|PDF| E[PyMuPDF Parser]

C --> F[Text Cleaner]
D --> F
E --> F

F --> G[Context Validator]
G -->|Sufficient Context| H[RoBERTa Classifier]
G -->|Insufficient Context| I[Web Search Agent]

I --> J[Context Aggregator]
J --> H

H --> K[FLAN-T5 Explanation Generator]
K --> L[Output Formatter]

L --> M[Web UI using Flask,HTML,CSS,JS]

style M fill:#e3f2fd,stroke:#90caf9
style G fill:#fff9c4,stroke:#fbc02d
style I fill:#fbe9e7,stroke:#ff8a65
style H fill:#f1f8e9,stroke:#aed581
```
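
The control flow in the diagram above (clean the text, validate context, fall back to web search when context is thin, then classify and explain) can be sketched in plain Python. This is not the project's LangGraph code: ordinary functions stand in for the graph nodes, the classifier, explainer, and search tool are passed in as callables, and the word-count threshold `MIN_CONTEXT_WORDS` is a hypothetical stand-in for whatever the Context Validator actually checks.

```python
from dataclasses import dataclass
from typing import Callable, List

MIN_CONTEXT_WORDS = 20  # hypothetical threshold for "sufficient context"


@dataclass
class PipelineResult:
    label: str
    explanation: str
    used_fallback: bool


def run_pipeline(
    text: str,
    classify: Callable[[str], str],
    explain: Callable[[str, str], str],
    search: Callable[[str], List[str]],
    min_words: int = MIN_CONTEXT_WORDS,
) -> PipelineResult:
    """Follow the diagram: clean, validate, optionally search, classify, explain."""
    cleaned = " ".join(text.split())  # Text Cleaner: collapse whitespace
    used_fallback = False
    if len(cleaned.split()) < min_words:  # Context Validator
        snippets = search(cleaned)  # Web Search Agent
        cleaned = " ".join([cleaned, *snippets])  # Context Aggregator
        used_fallback = True
    label = classify(cleaned)  # stands in for the RoBERTa classifier
    explanation = explain(cleaned, label)  # stands in for FLAN-T5
    return PipelineResult(label, explanation, used_fallback)


if __name__ == "__main__":
    result = run_pipeline(
        "Aliens built the bridge.",
        classify=lambda t: "fake" if "aliens" in t.lower() else "real",
        explain=lambda t, label: f"Labelled {label} from {len(t.split())} words.",
        search=lambda q: ["No engineering record mentions aliens."],
    )
    print(result)
```

Passing the models in as callables keeps the routing logic testable without loading RoBERTa or FLAN-T5.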

---
@@ -167,4 +265,4 @@ jobs:
🔗 [Facebook](https://www.facebook.com/mdemon.hasan2001/)
🔗 [WhatsApp](https://wa.me/8801834363533)

---
---
7 changes: 7 additions & 0 deletions __init__.py
@@ -0,0 +1,7 @@
# Package initialization
from .models.loader import ModelLoader
from .graph.builder import PipelineBuilder
from .utils.logger import setup_logging
from .utils.results import display_results

__all__ = ['ModelLoader', 'PipelineBuilder', 'setup_logging', 'display_results']
7 changes: 7 additions & 0 deletions agents/__init__.py
@@ -0,0 +1,7 @@
from .input_handler import InputHandler
from .planner import Planner
from .fallback_search import FallbackSearch
from .router import Router
from .executor import Executor

__all__ = ['InputHandler', 'Planner', 'FallbackSearch', 'Router', 'Executor']
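
Both new `__init__.py` files re-export classes from submodules and list them in `__all__`, which lets callers import them from the package itself and limits what `from package import *` exposes. The sketch below reproduces that layout with a throwaway package written to a temporary directory. The package `demo_agents`, its contents, and the helper functions are hypothetical and only mirror the pattern, not the project's actual classes.

```python
import importlib
import sys
import tempfile
from pathlib import Path
from typing import List


def build_demo_package(root: Path) -> None:
    """Write a tiny package whose __init__.py re-exports one class."""
    pkg = root / "demo_agents"
    pkg.mkdir()
    (pkg / "planner.py").write_text("class Planner:\n    pass\n")
    (pkg / "__init__.py").write_text(
        "from .planner import Planner\n"
        "INTERNAL_FLAG = True  # deliberately not listed in __all__\n"
        "__all__ = ['Planner']\n"
    )


def star_import_names(root: Path) -> List[str]:
    """Return the public names that `from demo_agents import *` brings in."""
    sys.path.insert(0, str(root))
    try:
        importlib.invalidate_caches()
        namespace: dict = {}
        exec("from demo_agents import *", namespace)
        # Drop dunder entries such as __builtins__ that exec adds.
        return sorted(name for name in namespace if not name.startswith("__"))
    finally:
        # Undo the sys.path change and forget the throwaway package.
        sys.path.remove(str(root))
        for name in [m for m in sys.modules
                     if m == "demo_agents" or m.startswith("demo_agents.")]:
            del sys.modules[name]


def demo() -> List[str]:
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        build_demo_package(root)
        return star_import_names(root)


if __name__ == "__main__":
    print(demo())
```

Note that the relative imports used in the root-level `__init__.py` only resolve when the repository directory itself is imported as a package; this sketch does not cover how `app.py` or the tests import these modules.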