
tan-73/GDPR-Compliance-Checker


Legal Document Summarization & GDPR Recital Tracker

Streamlit app that summarizes legal documents, surfaces risks/clauses, and tracks GDPR recitals with email + Google Sheets notifications.

Features

  • Upload a PDF or DOCX file to get chunked LLM summaries (via Groq) along with risk and clause detection.
  • Visualizations (risk matrix, heatmap, distributions) and an exportable PDF report.
  • Q&A over the uploaded document via Groq.
  • GDPR recital tracker that pushes updates by email and to Google Sheets.
  • RAG CLI helper in rag_pipeline.py (Hugging Face Hub LLM + FAISS vector index over embeddings).
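The chunked summarization in the first bullet can be pictured as a simple map step: split the text into overlapping windows, summarize each window, and join the partial summaries. This is a minimal sketch, not the app's actual code. `chunk_text`, `summarize_document`, and the default sizes are hypothetical, and `summarize_chunk` stands in for whatever function calls Groq.

```python
from collections.abc import Callable


def chunk_text(text: str, max_chars: int = 2000, overlap: int = 200) -> list[str]:
    """Split text into windows of at most max_chars characters that overlap by `overlap`."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    if not 0 <= overlap < max_chars:
        raise ValueError("overlap must satisfy 0 <= overlap < max_chars")
    chunks: list[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        if start + max_chars >= len(text):
            break  # the last window already reaches the end of the text
        start += max_chars - overlap
    return chunks


def summarize_document(
    text: str,
    summarize_chunk: Callable[[str], str],  # hypothetical stand-in for the Groq call
    max_chars: int = 2000,
    overlap: int = 200,
) -> str:
    """Summarize each chunk independently and join the partial summaries in order."""
    partial = [summarize_chunk(chunk) for chunk in chunk_text(text, max_chars, overlap)]
    return "\n".join(partial)
```

For example, `summarize_document(text, lambda c: c[:100])` exercises the pipeline without any network access. The overlap helps keep short clauses that straddle a chunk boundary intact in at least one chunk; splitting by characters is a simplification, and a token-based splitter would track model context limits more closely.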

Quickstart

  1. Clone the repository and set up Python 3.10+.
  2. Create a virtualenv and install the dependencies:
    pip install -r requirements.txt
  3. Copy .env.example to .env and fill in keys/secrets.
  4. Run the Streamlit app:
    streamlit run app.py
  5. Optional: RAG CLI
    python rag_pipeline.py --pdf path/to/file.pdf
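Step 5 runs a retrieval-augmented pipeline: chunks relevant to a question are retrieved by embedding similarity and handed to the LLM as context. The sketch below illustrates only the retrieval and prompt-building steps and assumes the PDF text is already chunked. It swaps in a toy bag-of-words vector and a brute-force cosine-similarity search where rag_pipeline.py uses real embeddings and FAISS; `embed`, `cosine`, `top_k`, and `build_prompt` are hypothetical names.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Brute-force nearest neighbours; FAISS does this search efficiently at scale."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(question: str, chunks: list[str], k: int = 2) -> str:
    """Place the retrieved chunks in the prompt as grounding context."""
    context = "\n---\n".join(top_k(question, chunks, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

The brute-force loop compares the query against every chunk, which is fine for a single document; a FAISS index performs the same nearest-neighbour lookup over dense vectors much faster as the corpus grows.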

Required Environment

  • GROQ_API_KEY – Groq LLM key.
  • HUGGINGFACEHUB_API_TOKEN – for HF Hub LLM (RAG path).
  • Email: SENDER_EMAIL, EMAIL_PASS, RECEIVER_EMAIL, FEEDBACK_EMAIL, plus EMAIL_ADDRESS and EMAIL_PASSWORD for the recital tracker.
  • Google Sheets: GOOGLE_CREDENTIALS_PATH, GOOGLE_SHEET_ID.
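Because different features read different variables, a quick preflight check can report which ones are still unset before the app starts. This is a hypothetical helper, not part of the repository: the variable names come from the list above, but the feature grouping and `missing_env` are illustrative assumptions.

```python
import os
from collections.abc import Mapping
from typing import Optional

# Variable names from the list above; grouping them by feature is illustrative.
REQUIRED_VARS = {
    "groq": ["GROQ_API_KEY"],
    "rag": ["HUGGINGFACEHUB_API_TOKEN"],
    "email": ["SENDER_EMAIL", "EMAIL_PASS", "RECEIVER_EMAIL", "FEEDBACK_EMAIL"],
    "tracker": ["EMAIL_ADDRESS", "EMAIL_PASSWORD"],
    "sheets": ["GOOGLE_CREDENTIALS_PATH", "GOOGLE_SHEET_ID"],
}


def missing_env(env: Optional[Mapping[str, str]] = None) -> dict[str, list[str]]:
    """Return {feature: [unset variable names]} for every feature with gaps."""
    source = os.environ if env is None else env
    missing: dict[str, list[str]] = {}
    for feature, names in REQUIRED_VARS.items():
        # Treat empty strings as unset, since a blank .env entry is usually a mistake.
        absent = [name for name in names if not source.get(name)]
        if absent:
            missing[feature] = absent
    return missing


if __name__ == "__main__":
    for feature, names in missing_env().items():
        print(f"{feature}: missing {', '.join(names)}")
```

Passing a plain dict instead of `os.environ` keeps the check easy to test offline.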

Project Structure

  • app.py – Streamlit entry point and page navigation.
  • legal_document_analysis.py – document ingestion, summarization, risk visuals, PDF export, feedback.
  • Update_tracker.py – GDPR recital tracker + email/Sheets.
  • rag_pipeline.py – CLI RAG pipeline.
  • requirements.txt – runtime deps; see requirements-dev.txt for tooling.
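Update_tracker.py notifies users of recital changes by email. The following is a hedged sketch of composing and sending such a message with the standard library only. `build_recital_email`, `send_email`, the subject format, and the SMTP host and port are all assumptions; the real tracker may use a different provider or message layout, and the Google Sheets push is not shown. In practice the sender address and password would come from EMAIL_ADDRESS and EMAIL_PASSWORD.

```python
import smtplib
from email.message import EmailMessage


def build_recital_email(sender: str, recipient: str, recitals: list[tuple[int, str]]) -> EmailMessage:
    """Compose a plain-text update listing (recital number, summary) pairs."""
    msg = EmailMessage()
    msg["Subject"] = f"GDPR recital update: {len(recitals)} change(s)"
    msg["From"] = sender
    msg["To"] = recipient
    lines = [f"Recital {number}: {summary}" for number, summary in recitals]
    msg.set_content("\n".join(lines) if lines else "No recital changes detected.")
    return msg


def send_email(msg: EmailMessage, password: str, host: str = "smtp.example.com", port: int = 465) -> None:
    """Send over implicit TLS; the host and port here are placeholders, not the tracker's config."""
    with smtplib.SMTP_SSL(host, port, timeout=30) as server:
        server.login(msg["From"], password)
        server.send_message(msg)
```

Keeping message construction separate from sending means the content can be checked in tests without touching a mail server.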

Development

  • Lint/format: ruff and black (see requirements-dev.txt).
  • Tests: pytest in tests/ (offline; no API calls).
  • CI: basic GitHub Actions workflow at .github/workflows/ci.yml.
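One way to keep tests offline is to pass the LLM call in as a parameter, so a test can substitute a fake that records the prompt and returns a canned answer. The sketch below assumes that design; `answer_question` is a hypothetical helper, and the repository's own tests may instead patch the client with pytest's `monkeypatch`.

```python
from collections.abc import Callable


def answer_question(question: str, document: str, llm: Callable[[str], str]) -> str:
    """Hypothetical Q&A helper; `llm` is injected so tests never hit the network."""
    prompt = f"Document:\n{document}\n\nQuestion: {question}\nAnswer:"
    return llm(prompt).strip()


def test_answer_question_offline() -> None:
    prompts: list[str] = []

    def fake_llm(prompt: str) -> str:
        prompts.append(prompt)  # record what would have been sent to the API
        return "  The term is five years.  "

    answer = answer_question("How long is the term?", "The lease term is five years.", fake_llm)
    assert answer == "The term is five years."
    assert "How long is the term?" in prompts[0]
```

pytest collects `test_`-prefixed functions automatically, so a file like this under tests/ runs with a plain `pytest` invocation.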

Notes

  • No secrets are committed; keep credentials.json and .env out of git (see .gitignore).
  • The PDF export now uses built-in fonts and cleans up temporary files.
  • Network calls include timeouts and user-agent headers.
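A minimal sketch of a request that always carries a timeout and a User-Agent header, using only the standard library. The project may use a different HTTP client; `build_request`, `fetch_text`, the agent string, and the 10-second default are hypothetical.

```python
import urllib.request

# Hypothetical identifier and timeout; adjust to the project's own values.
USER_AGENT = "GDPR-Compliance-Checker/0.1"
DEFAULT_TIMEOUT = 10  # seconds


def build_request(url: str) -> urllib.request.Request:
    """Attach an explicit User-Agent so servers can identify the client."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def fetch_text(url: str, timeout: float = DEFAULT_TIMEOUT) -> str:
    """Fetch a URL and decode the body; the timeout stops a stalled server from hanging the app."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

Building the request in its own function lets tests inspect the headers without opening a connection.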

License

MIT License – see LICENSE.
