Streamlit app that summarizes legal documents, surfaces risks/clauses, and tracks GDPR recitals with email + Google Sheets notifications.
- Upload a PDF or DOCX and get chunked LLM summaries (Groq) plus risk and clause detection.
- Visualizations: risk matrix, heatmap, distributions, and exportable PDF report.
- Q&A over the uploaded document via Groq.
- GDPR recital tracker with email + Sheets push.
- RAG CLI helper in rag_pipeline.py (HF Hub LLM + FAISS embeddings).
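The chunked summarization mentioned above can be pictured roughly as follows. This is a minimal sketch, not the code in legal_document_analysis.py: the names chunk_text and summarize_document, the character-based windows, and the default sizes are all assumptions made for illustration, and summarize_chunk stands in for whatever function actually calls the LLM.

```python
from typing import Callable


def chunk_text(text: str, max_chars: int = 2000, overlap: int = 200) -> list[str]:
    """Split text into overlapping character windows (hypothetical helper)."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    if not 0 <= overlap < max_chars:
        raise ValueError("overlap must be in [0, max_chars)")
    step = max_chars - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + max_chars])
        # Stop once the current window already reaches the end of the text.
        if start + max_chars >= len(text):
            break
    return chunks


def summarize_document(
    text: str,
    summarize_chunk: Callable[[str], str],
    **chunk_kwargs: int,
) -> str:
    """Summarize each chunk separately and join the partial summaries."""
    partial = [summarize_chunk(chunk) for chunk in chunk_text(text, **chunk_kwargs)]
    return "\n".join(partial)
```

Passing the summarizer in as a callable keeps the sketch offline: any function that maps a string to a string can be used in place of a real LLM call.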
- Clone and set up Python 3.10+.
- Create a virtualenv and install:
pip install -r requirements.txt
- Copy .env.example to .env and fill in keys/secrets.
- Run the Streamlit app:
streamlit run app.py
- Optional: RAG CLI
python rag_pipeline.py --pdf path/to/file.pdf
- GROQ_API_KEY – Groq LLM key.
- HUGGINGFACEHUB_API_TOKEN – for HF Hub LLM (RAG path).
- Email: SENDER_EMAIL, EMAIL_PASS, RECEIVER_EMAIL, FEEDBACK_EMAIL (and EMAIL_ADDRESS/EMAIL_PASSWORD for tracker).
- Google Sheets: GOOGLE_CREDENTIALS_PATH, GOOGLE_SHEET_ID.
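A quick way to see which of these variables are still unset is a small check like the one below. It is only a sketch: the split into one required key and several optional groups is an assumption made for illustration (the project may treat them differently), and check_config is a hypothetical helper, not something shipped in the repository. It reads the process environment, so it should run after whatever loads .env.

```python
import os
from collections.abc import Iterable, Mapping
from typing import Optional

# Assumption: only the Groq key is needed for the core app; the rest are per feature.
REQUIRED_VARS = ("GROQ_API_KEY",)
OPTIONAL_GROUPS = {
    "rag": ("HUGGINGFACEHUB_API_TOKEN",),
    "email": ("SENDER_EMAIL", "EMAIL_PASS", "RECEIVER_EMAIL", "FEEDBACK_EMAIL"),
    "tracker_email": ("EMAIL_ADDRESS", "EMAIL_PASSWORD"),
    "sheets": ("GOOGLE_CREDENTIALS_PATH", "GOOGLE_SHEET_ID"),
}


def missing_vars(names: Iterable[str], env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return the names that are unset or empty in env (defaults to os.environ)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


def check_config(env: Optional[Mapping[str, str]] = None) -> dict[str, list[str]]:
    """Map each group name to the variables it is still missing."""
    report = {"required": missing_vars(REQUIRED_VARS, env)}
    for group, names in OPTIONAL_GROUPS.items():
        report[group] = missing_vars(names, env)
    return report


if __name__ == "__main__":
    for group, missing in check_config().items():
        status = "ok" if not missing else "missing: " + ", ".join(missing)
        print(f"{group}: {status}")
```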
- app.py – Streamlit navigation.
- legal_document_analysis.py – document ingestion, summarization, risk visuals, PDF export, feedback.
- Update_tracker.py – GDPR recital tracker + email/Sheets.
- rag_pipeline.py – CLI RAG pipeline.
- requirements.txt – runtime deps; see requirements-dev.txt for tooling.
- Lint/format: ruff and black (see requirements-dev.txt).
- Tests: pytest in tests/ (offline; no API calls).
- CI: basic GitHub Actions workflow at .github/workflows/ci.yml.
- No secrets are committed; keep credentials.json and .env out of git (see .gitignore).
- The PDF export now uses built-in fonts and cleans up temporary files.
- Network calls include timeouts and user-agent headers.
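For readers unfamiliar with the pattern in the last note, a request with an explicit timeout and User-Agent header might look like the sketch below. It uses only the standard library; the project may well use a different HTTP client, and the USER_AGENT string, DEFAULT_TIMEOUT value, and function names are hypothetical.

```python
import urllib.request

DEFAULT_TIMEOUT = 10  # seconds; illustrative value, not taken from the project
USER_AGENT = "legal-doc-analyzer/0.1"  # hypothetical identifier


def build_request(url: str) -> urllib.request.Request:
    """Create a request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def fetch(url: str, timeout: float = DEFAULT_TIMEOUT) -> bytes:
    """Fetch a URL; raises an error instead of hanging if the server stalls."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        return resp.read()
```

Without the timeout argument, urlopen falls back to the global socket default, which is normally no timeout at all, so a stalled server could block the app indefinitely.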
MIT License – see LICENSE.