Awesome AI Security

A curated, annotated list of resources for AI security.

View the interactive roadmap

ML Foundations

Essential machine learning concepts and courses to build a foundation before diving into AI security.

Stanford CS229: Machine Learning - Stanford's foundational ML course covering supervised learning, deep learning, generalization, and unsupervised learning. Taught by leaders in the field.
fast.ai Practical Deep Learning for Coders - Free, top-down approach to deep learning with PyTorch. Covers computer vision, NLP, and tabular data with hands-on Jupyter notebooks.
Deep Learning Specialization by Andrew Ng - A fantastic 5-course series on neural networks and deep learning fundamentals—beginner-friendly and taught by AI pioneer Andrew Ng. Auditable for free.

Deep Learning

Deep dive into neural networks, transformers, and the architectures behind modern AI systems.

Dive into Deep Learning - Interactive deep learning book with code (PyTorch, JAX, TensorFlow), math, and exercises. Adopted at 500+ universities including Stanford, MIT, Harvard.
Neural Networks and Deep Learning (Michael Nielsen) - Free online book explaining the core concepts behind neural networks with excellent intuition and interactive visualizations.
Deep Learning by Ian Goodfellow, Yoshua Bengio, Aaron Courville - The go-to textbook for deep learning theory and math. Freely available online.

Prompt Injection

Understand prompt injection attacks that manipulate LLM behavior through crafted inputs.

Prompt Injection & the Rise of Prompt Attacks: All You Need to Know - Explains prompt injection threats, examples, and mitigations.
OpenAI Says AI Browsers May Always Be Vulnerable to Prompt Injection Attacks - Discusses ongoing risks and hardening efforts for agentic AI like Atlas.
Prompt Injection Attacks in 2025: Risks, Defenses & Testing - Mainstream risks and testing strategies for prompt injections.
Prompt Injection Attacks: The Most Common AI Exploit in 2025 - Detection, blocking, and mitigation for growing prompt injection threats.
LLM01:2025 Prompt Injection - Updated risk overview for manipulating model behavior. OWASP Gen AI Security Project, 2025.
Rebuff - Self-hardening prompt injection detector by ProtectAI.
Garak - NVIDIA's LLM vulnerability scanner with dozens of plugins testing for jailbreaks, prompt injection, data leakage, and more.
Vigil LLM - Detects prompt injections and risky inputs.
EasyJailbreak - Framework for adversarial jailbreak prompts.
AI Village @ DEF CON - Challenges like LLM Jailbreak and AI security research.

Adversarial Attacks

Learn how adversarial examples fool neural networks and methods to defend against them.

A Brief Introduction to Adversarial Examples - High-level overview of adversarial examples fooling neural networks by Madry and Schmidt.
Key Concepts in AI Safety: Robustness and Adversarial Examples - Places adversarial robustness in AI safety context.
A Meta-Survey of Adversarial Attacks Against Artificial Intelligence Systems - Umbrella review of attacks on DNNs. Neurocomputing, 2025.
Adversarial Attacks and Defenses in AI Systems: Challenges, Strategies, and Future Directions - Comprehensive review of attacks and defenses. RSIS International, 2025.
A Survey of Adversarial Examples in Computer Vision - Attack algorithms and defenses for vision models. 2025.
Adversarial Robustness Toolbox (ART) - IBM library for attacks and defenses for ML security.
TextAttack - Library for adversarial attacks on NLP models.
NIST Adversarial ML Taxonomy - Standardized terminology and mitigations. NIST IR 100-2, 2025.
ACL 2024 Tutorial: Vulnerabilities of LLMs to Adversarial Attacks - Comprehensive overview of vulnerabilities in unimodal and multimodal LLMs from NLP and cybersecurity perspectives.

Poisoning & Backdoors

Data poisoning attacks and neural network backdoors that compromise model integrity.

Witches' Brew: Industrial Scale Data Poisoning via Gradient Matching - Clean-label poisoning at scale. arXiv:2009.02276, 2021.
Instruction Backdoor Attacks on Customized LLMs - Prompt-based backdoors in custom LLMs without weight modification. arXiv:2402.09179, 2024.
Poisoning Attacks Need Only a Few Points - Constant-size poisoning effective even on web-scale models. arXiv:2510.07192, 2025.
MNTD: Detecting AI Trojans Using Meta Neural Analysis - Black-box Trojan detection with high AUC. arXiv:1910.03137, 2021.
Beatrix: Robust Backdoor Detection via Gram Matrices - Activation-based detection effective against advanced backdoors. NDSS 2024.
OWASP Top 10 for LLM Applications 2025 - Critical risks including prompt injection, sensitive info disclosure, supply chain, and data poisoning.

Privacy & Extraction

Model extraction, membership inference, and training data extraction attacks.

Extracting Training Data from Large Language Models - Privacy attacks via memorization. USENIX Security 2021.
Model Leeching: An Extraction Attack Targeting LLMs - Practical model stealing from GPT-3.5 via API queries. arXiv:2309.10544, 2023.
A Watermark for Large Language Models - Statistical watermarking for detecting AI-generated text. arXiv:2301.10226, 2023.
Membership Inference Attacks on Machine Learning: A Survey - First comprehensive survey on MIAs with taxonomies for attacks and defenses. ACM Computing Surveys.
A Survey of Privacy Attacks in Machine Learning - Covers membership inference, reconstruction, and model extraction attacks. ACM Computing Surveys.
Membership Inference Attacks on Large-Scale Models: A Survey - MIAs targeting LLMs and LMMs across pre-training, fine-tuning, and RAG. arXiv:2503.19338, 2025.
TrustLLM Benchmark - Comprehensive trustworthiness benchmark spanning truthfulness, safety, fairness, robustness, privacy, and ethics.
awesome-ml-privacy-attacks - Curated list of 100+ papers on privacy attacks against machine learning.
LLM Security Papers (chawins/llm-sp) - Papers and resources on LLM security and privacy including indirect prompt injection research.

Tools & Frameworks

Security tools for testing and defending AI systems against adversarial attacks.

Counterfit - Microsoft penetration testing tool for ML systems.
AI Coding Tools Exploded in 2025: The First Security Exploits Followed - Discusses vulnerabilities in AI-generated code.
AI Agent Exploit Generation in Smart Contracts - Autonomous exploit generation using LLMs.
AI-Powered Attack Automation: When Machine Learning Writes the Exploit Code - Projections on AI-driven cyberattacks.
NeMo Guardrails - NVIDIA programmable guardrails for LLM safety and security.
SecML - Secure and explainable ML library with attacks and defenses.
Purple Llama (Meta) - Open-source LLM safety tools including Llama Guard, Prompt Guard, Code Shield, and CyberSec Eval benchmarks.

AI Pentesting

Using AI assistants and agents for automated penetration testing and security assessments.

PentestGPT - GPT-4 powered autonomous penetration testing agent. Published at USENIX Security 2024.
Top 10 AI Pentesting Tools (2025) - Highlights tools like Mindgard, Burp Suite, and PentestGPT.
Best AI Pentesting Tools in 2026 - Focus on detecting business logic flaws.
9 AI Enabled Cybersecurity Tools in 2025 - Includes PenTest++ and CIPHER for ethical hacking.
Top AI Pentesting Tools in 2025: PentestGPT vs. Penligent vs. PentestAI - Comparison of features and automation.
PentestGPT: An LLM-empowered Automatic Penetration Testing Tool - Design and evaluation of autonomous pentesting. arXiv:2308.06782, 2024.

Vulnerability Detection

AI-powered vulnerability scanning, code analysis, and bug detection.

A Survey of Bugs in AI-Generated Code - Empirical study on functional and security bugs. arXiv:2512.05239, 2025.
Exploring the Role of Generative AI in Enhancing Cybersecurity - GenAI for vulnerability detection and secure coding. Computers & Security, 2025.
GhidraGPT - Integrates GPT models into Ghidra for automated code analysis, vulnerability detection, and explanation generation.
Semgrep - AI-assisted SAST combining rules-based scanning with LLM-powered detection for business logic flaws like IDORs.
Everything You Wanted to Know About LLM-based Vulnerability Detection - Context-rich evaluation showing strong LLM performance. arXiv:2504.13474, 2025.
GitHub Copilot Security Evaluation - ~40% of Copilot-generated code contained vulnerabilities. NYU 2022.
LLMs in Software Security: A Survey of Vulnerability Detection Techniques - Comprehensive survey on using LLMs for code structure analysis and vulnerability detection. ACM Computing Surveys.

Exploit Generation

AI-assisted exploit development and attack automation techniques.

LLM Agents can Autonomously Exploit One-day Vulnerabilities - GPT-4 agents can exploit real CVEs given descriptions. Raises questions about LLM deployment. arXiv:2404.08144, 2024.
OWASP Gen AI Incident & Exploit Round-up, Q2'25 - Tracks exploits targeting/involving GenAI.

AI Security Tools

Tools that leverage AI for offensive security operations and analysis.

Hound - AI auditor that builds adaptive knowledge graphs for deep code reasoning. Uses tiered AI approach for autonomous vulnerability discovery.
HackGPT - LLM toolkit for offensive security.
HackingBuddyGPT - Autonomous red-teaming agent with benchmarks.
GhidrAssist - LLM extension for Ghidra with ReAct agentic mode for autonomous reverse engineering investigation.
PyRIT (Python Risk Identification Tool) - Microsoft red-teaming framework for generative AI. Automates adversarial prompt generation and risk assessment.
AI Security Analyzer - Generates security docs from codebases.
BurpGPT - Burp Suite extension for AI-powered vulnerability scanning.
CAI: Cybersecurity AI - Framework for building AI-driven security tools by Alias Robotics.

Benchmarks & Standards

Industry standards, threat frameworks, and evaluation benchmarks for AI security.

ScaBench - Smart contract audit benchmark with 500+ real-world vulnerabilities from Code4rena, Cantina, and Sherlock for evaluating AI audit agents.
RobustBench - Leaderboard for adversarial robustness benchmarking.
JailbreakBench - Benchmark for LLM jailbreak attacks and defenses.
Stanford AIR-Bench 2024 - AI safety benchmark aligned with emerging government regulations and company policies.
FLI AI Safety Index 2024 - Future of Life Institute's assessment of AI company safety practices and accountability.
MITRE ATLAS - Adversarial Threat Landscape for AI Systems. Threat matrix documenting real-world attacks on ML (like ATT&CK for AI).
NIST AI Risk Management Framework - Framework for managing AI risks throughout the AI lifecycle.

Books

Essential books covering AI security, adversarial ML, and security applications.

Adversarial Machine Learning (Cambridge) - Complete introduction to building robust ML in adversarial environments. By Joseph, Nelson, Rubinstein, and Tygar.
Adversarial Learning and Secure AI (Cambridge, 2023) - First textbook on adversarial learning. Hands-on projects for defending against attacks.
Adversarial Robustness for Machine Learning (Elsevier) - Comprehensive coverage of adversarial attack, defense, and verification by Pin-Yu Chen (IBM Research).
Machine Learning and Security - ML in cybersecurity and evasions by Clarence Chio and David Freeman (2018).
Artificial Intelligence: A Modern Approach - Broad AI algorithms background by Stuart Russell and Peter Norvig.

Communities & Events

AI security communities, conferences, and events to stay connected.

OWASP GenAI Security Project - Global initiative for GenAI security including Top 10 for LLMs, Agentic AI risks, and red teaming guides.
MLSecOps Podcast - ML security discussions and interviews.
GenAI Security Podcast - Generative AI security topics and news.
AI Vulnerability Database (AVID) - Community database of AI vulnerabilities and incidents.

Newsletters & Lists

Newsletters and awesome lists to stay current with AI security developments.

AI Security Newsletter - Research and threats digest collection.
TalEliyahu/Awesome-AI-Security - Governance and tools focus awesome list.
ottosulin/awesome-ai-security - Offensive tools and labs awesome list.
ElNiak/awesome-ai-cybersecurity - AI in cybersecurity awesome list.
corca-ai/awesome-llm-security - LLM-specific security awesome list.

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
.github/workflows		.github/workflows
public		public
scripts		scripts
src		src
.gitignore		.gitignore
BUIDL.md		BUIDL.md
CONTRIBUTION.md		CONTRIBUTION.md
README.md		README.md
index.html		index.html
package-lock.json		package-lock.json
package.json		package.json
postcss.config.js		postcss.config.js
tailwind.config.js		tailwind.config.js
tsconfig.json		tsconfig.json
tsconfig.node.json		tsconfig.node.json
vite.config.ts		vite.config.ts

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Awesome AI Security

ML Foundations

Deep Learning

Prompt Injection

Adversarial Attacks

Poisoning & Backdoors

Privacy & Extraction

Tools & Frameworks

AI Pentesting

Vulnerability Detection

Exploit Generation

AI Security Tools

Benchmarks & Standards

Books

Communities & Events

Newsletters & Lists

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Awesome AI Security

ML Foundations

Deep Learning

Prompt Injection

Adversarial Attacks

Poisoning & Backdoors

Privacy & Extraction

Tools & Frameworks

AI Pentesting

Vulnerability Detection

Exploit Generation

AI Security Tools

Benchmarks & Standards

Books

Communities & Events

Newsletters & Lists

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages