A curated, annotated list of resources for AI security.
Essential machine learning concepts and courses to build a foundation before diving into AI security.
- Stanford CS229: Machine Learning - Stanford's foundational ML course covering supervised learning, deep learning, generalization, and unsupervised learning. Taught by leaders in the field.
- fast.ai Practical Deep Learning for Coders - Free, top-down approach to deep learning with PyTorch. Covers computer vision, NLP, and tabular data with hands-on Jupyter notebooks.
- Deep Learning Specialization by Andrew Ng - A fantastic 5-course series on neural networks and deep learning fundamentals—beginner-friendly and taught by AI pioneer Andrew Ng. Auditable for free.
Deep dive into neural networks, transformers, and the architectures behind modern AI systems.
- Dive into Deep Learning - Interactive deep learning book with code (PyTorch, JAX, TensorFlow), math, and exercises. Adopted at 500+ universities including Stanford, MIT, Harvard.
- Neural Networks and Deep Learning (Michael Nielsen) - Free online book explaining the core concepts behind neural networks with excellent intuition and interactive visualizations.
- Deep Learning by Ian Goodfellow, Yoshua Bengio, Aaron Courville - The go-to textbook for deep learning theory and math. Freely available online.
Understand prompt injection attacks that manipulate LLM behavior through crafted inputs.
- Prompt Injection & the Rise of Prompt Attacks: All You Need to Know - Explains prompt injection threats, examples, and mitigations.
- OpenAI Says AI Browsers May Always Be Vulnerable to Prompt Injection Attacks - Discusses ongoing risks and hardening efforts for agentic AI like Atlas.
- Prompt Injection Attacks in 2025: Risks, Defenses & Testing - Mainstream risks and testing strategies for prompt injections.
- Prompt Injection Attacks: The Most Common AI Exploit in 2025 - Detection, blocking, and mitigation for growing prompt injection threats.
- LLM01:2025 Prompt Injection - Updated risk overview for manipulating model behavior. OWASP Gen AI Security Project, 2025.
- Rebuff - Self-hardening prompt injection detector by ProtectAI.
- Garak - NVIDIA's LLM vulnerability scanner with dozens of plugins testing for jailbreaks, prompt injection, data leakage, and more.
- Vigil LLM - Detects prompt injections and risky inputs.
- EasyJailbreak - Framework for adversarial jailbreak prompts.
- AI Village @ DEF CON - Challenges like LLM Jailbreak and AI security research.
Learn how adversarial examples fool neural networks and methods to defend against them.
- A Brief Introduction to Adversarial Examples - High-level overview of adversarial examples fooling neural networks by Madry and Schmidt.
- Key Concepts in AI Safety: Robustness and Adversarial Examples - Places adversarial robustness in AI safety context.
- A Meta-Survey of Adversarial Attacks Against Artificial Intelligence Systems - Umbrella review of attacks on DNNs. Neurocomputing, 2025.
- Adversarial Attacks and Defenses in AI Systems: Challenges, Strategies, and Future Directions - Comprehensive review of attacks and defenses. RSIS International, 2025.
- A Survey of Adversarial Examples in Computer Vision - Attack algorithms and defenses for vision models. 2025.
- Adversarial Robustness Toolbox (ART) - IBM library for attacks and defenses for ML security.
- TextAttack - Library for adversarial attacks on NLP models.
- NIST Adversarial ML Taxonomy - Standardized terminology and mitigations. NIST IR 100-2, 2025.
- ACL 2024 Tutorial: Vulnerabilities of LLMs to Adversarial Attacks - Comprehensive overview of vulnerabilities in unimodal and multimodal LLMs from NLP and cybersecurity perspectives.
Data poisoning attacks and neural network backdoors that compromise model integrity.
- Witches' Brew: Industrial Scale Data Poisoning via Gradient Matching - Clean-label poisoning at scale. arXiv:2009.02276, 2021.
- Instruction Backdoor Attacks on Customized LLMs - Prompt-based backdoors in custom LLMs without weight modification. arXiv:2402.09179, 2024.
- Poisoning Attacks Need Only a Few Points - Constant-size poisoning effective even on web-scale models. arXiv:2510.07192, 2025.
- MNTD: Detecting AI Trojans Using Meta Neural Analysis - Black-box Trojan detection with high AUC. arXiv:1910.03137, 2021.
- Beatrix: Robust Backdoor Detection via Gram Matrices - Activation-based detection effective against advanced backdoors. NDSS 2024.
- OWASP Top 10 for LLM Applications 2025 - Critical risks including prompt injection, sensitive info disclosure, supply chain, and data poisoning.
Model extraction, membership inference, and training data extraction attacks.
- Extracting Training Data from Large Language Models - Privacy attacks via memorization. USENIX Security 2021.
- Model Leeching: An Extraction Attack Targeting LLMs - Practical model stealing from GPT-3.5 via API queries. arXiv:2309.10544, 2023.
- A Watermark for Large Language Models - Statistical watermarking for detecting AI-generated text. arXiv:2301.10226, 2023.
- Membership Inference Attacks on Machine Learning: A Survey - First comprehensive survey on MIAs with taxonomies for attacks and defenses. ACM Computing Surveys.
- A Survey of Privacy Attacks in Machine Learning - Covers membership inference, reconstruction, and model extraction attacks. ACM Computing Surveys.
- Membership Inference Attacks on Large-Scale Models: A Survey - MIAs targeting LLMs and LMMs across pre-training, fine-tuning, and RAG. arXiv:2503.19338, 2025.
- TrustLLM Benchmark - Comprehensive trustworthiness benchmark spanning truthfulness, safety, fairness, robustness, privacy, and ethics.
- awesome-ml-privacy-attacks - Curated list of 100+ papers on privacy attacks against machine learning.
- LLM Security Papers (chawins/llm-sp) - Papers and resources on LLM security and privacy including indirect prompt injection research.
Security tools for testing and defending AI systems against adversarial attacks.
- Counterfit - Microsoft penetration testing tool for ML systems.
- AI Coding Tools Exploded in 2025: The First Security Exploits Followed - Discusses vulnerabilities in AI-generated code.
- AI Agent Exploit Generation in Smart Contracts - Autonomous exploit generation using LLMs.
- AI-Powered Attack Automation: When Machine Learning Writes the Exploit Code - Projections on AI-driven cyberattacks.
- NeMo Guardrails - NVIDIA programmable guardrails for LLM safety and security.
- SecML - Secure and explainable ML library with attacks and defenses.
- Purple Llama (Meta) - Open-source LLM safety tools including Llama Guard, Prompt Guard, Code Shield, and CyberSec Eval benchmarks.
Using AI assistants and agents for automated penetration testing and security assessments.
- PentestGPT - GPT-4 powered autonomous penetration testing agent. Published at USENIX Security 2024.
- Top 10 AI Pentesting Tools (2025) - Highlights tools like Mindgard, Burp Suite, and PentestGPT.
- Best AI Pentesting Tools in 2026 - Focus on detecting business logic flaws.
- 9 AI Enabled Cybersecurity Tools in 2025 - Includes PenTest++ and CIPHER for ethical hacking.
- Top AI Pentesting Tools in 2025: PentestGPT vs. Penligent vs. PentestAI - Comparison of features and automation.
- PentestGPT: An LLM-empowered Automatic Penetration Testing Tool - Design and evaluation of autonomous pentesting. arXiv:2308.06782, 2024.
AI-powered vulnerability scanning, code analysis, and bug detection.
- A Survey of Bugs in AI-Generated Code - Empirical study on functional and security bugs. arXiv:2512.05239, 2025.
- Exploring the Role of Generative AI in Enhancing Cybersecurity - GenAI for vulnerability detection and secure coding. Computers & Security, 2025.
- GhidraGPT - Integrates GPT models into Ghidra for automated code analysis, vulnerability detection, and explanation generation.
- Semgrep - AI-assisted SAST combining rules-based scanning with LLM-powered detection for business logic flaws like IDORs.
- Everything You Wanted to Know About LLM-based Vulnerability Detection - Context-rich evaluation showing strong LLM performance. arXiv:2504.13474, 2025.
- GitHub Copilot Security Evaluation - ~40% of Copilot-generated code contained vulnerabilities. NYU 2022.
- LLMs in Software Security: A Survey of Vulnerability Detection Techniques - Comprehensive survey on using LLMs for code structure analysis and vulnerability detection. ACM Computing Surveys.
AI-assisted exploit development and attack automation techniques.
- LLM Agents can Autonomously Exploit One-day Vulnerabilities - GPT-4 agents can exploit real CVEs given descriptions. Raises questions about LLM deployment. arXiv:2404.08144, 2024.
- OWASP Gen AI Incident & Exploit Round-up, Q2'25 - Tracks exploits targeting/involving GenAI.
Tools that leverage AI for offensive security operations and analysis.
- Hound - AI auditor that builds adaptive knowledge graphs for deep code reasoning. Uses tiered AI approach for autonomous vulnerability discovery.
- HackGPT - LLM toolkit for offensive security.
- HackingBuddyGPT - Autonomous red-teaming agent with benchmarks.
- GhidrAssist - LLM extension for Ghidra with ReAct agentic mode for autonomous reverse engineering investigation.
- PyRIT (Python Risk Identification Tool) - Microsoft red-teaming framework for generative AI. Automates adversarial prompt generation and risk assessment.
- AI Security Analyzer - Generates security docs from codebases.
- BurpGPT - Burp Suite extension for AI-powered vulnerability scanning.
- CAI: Cybersecurity AI - Framework for building AI-driven security tools by Alias Robotics.
Industry standards, threat frameworks, and evaluation benchmarks for AI security.
- ScaBench - Smart contract audit benchmark with 500+ real-world vulnerabilities from Code4rena, Cantina, and Sherlock for evaluating AI audit agents.
- RobustBench - Leaderboard for adversarial robustness benchmarking.
- JailbreakBench - Benchmark for LLM jailbreak attacks and defenses.
- Stanford AIR-Bench 2024 - AI safety benchmark aligned with emerging government regulations and company policies.
- FLI AI Safety Index 2024 - Future of Life Institute's assessment of AI company safety practices and accountability.
- MITRE ATLAS - Adversarial Threat Landscape for AI Systems. Threat matrix documenting real-world attacks on ML (like ATT&CK for AI).
- NIST AI Risk Management Framework - Framework for managing AI risks throughout the AI lifecycle.
Essential books covering AI security, adversarial ML, and security applications.
- Adversarial Machine Learning (Cambridge) - Complete introduction to building robust ML in adversarial environments. By Joseph, Nelson, Rubinstein, and Tygar.
- Adversarial Learning and Secure AI (Cambridge, 2023) - First textbook on adversarial learning. Hands-on projects for defending against attacks.
- Adversarial Robustness for Machine Learning (Elsevier) - Comprehensive coverage of adversarial attack, defense, and verification by Pin-Yu Chen (IBM Research).
- Machine Learning and Security - ML in cybersecurity and evasions by Clarence Chio and David Freeman (2018).
- Artificial Intelligence: A Modern Approach - Broad AI algorithms background by Stuart Russell and Peter Norvig.
AI security communities, conferences, and events to stay connected.
- OWASP GenAI Security Project - Global initiative for GenAI security including Top 10 for LLMs, Agentic AI risks, and red teaming guides.
- MLSecOps Podcast - ML security discussions and interviews.
- GenAI Security Podcast - Generative AI security topics and news.
- AI Vulnerability Database (AVID) - Community database of AI vulnerabilities and incidents.
Newsletters and awesome lists to stay current with AI security developments.
- AI Security Newsletter - Research and threats digest collection.
- TalEliyahu/Awesome-AI-Security - Governance and tools focus awesome list.
- ottosulin/awesome-ai-security - Offensive tools and labs awesome list.
- ElNiak/awesome-ai-cybersecurity - AI in cybersecurity awesome list.
- corca-ai/awesome-llm-security - LLM-specific security awesome list.
© muellerberndt · GitHub