AIRT — AI Red Team Academy | Free AI Security Course

Topics Covered

What is AI Red Teaming and why it matters
Traditional vs AI Red Teaming: key differences
The AI Attack Surface: models, APIs, training data, outputs, infrastructure
MITRE ATLAS Framework: 14 tactics, 66 techniques for AI adversary behavior
NVIDIA AI Kill Chain: Recon → Poison → Hijack → Persist → Impact
OWASP Top 10 for LLM Applications (2025 edition)
NIST AI 100-2: Adversarial ML Taxonomy
Threat modeling for AI systems
Legal and ethical considerations in AI red teaming
Setting up your AI red teaming lab environment

Deploy a complete AI red teaming environment with local LLMs (Ollama), vector databases, and testing tools. Includes a vulnerable chatbot application as your first target.

Deploy Ollama with a local LLM (Mistral 7B or Llama 3)
Set up ChromaDB vector database
Deploy a vulnerable AI chatbot application
Install and configure garak, PyRIT, and promptfoo
Run your first automated vulnerability scan with garak
Document findings using the MITRE ATLAS taxonomy

Download Lab

References

Topics Covered

Direct prompt injection: overriding system prompts
Indirect prompt injection: poisoning external context
Jailbreaking techniques: DAN, role-play, context manipulation
Encoding-based attacks: Base64, ROT13, Morse, Leetspeak, Unicode
Multi-turn attacks: Crescendo and context accumulation
Policy Puppetry and instruction hierarchy exploitation
Token-level attacks and adversarial suffixes
Automated prompt injection with evolutionary algorithms
Measuring Attack Success Rate (ASR)
Bypassing guardrails: character injection, AML evasion methods
Testing guardrail products: Azure Prompt Shield, Meta Prompt Guard, NeMo
Defense analysis: what works and what doesn't

Attack a series of increasingly hardened chatbots. Start with unprotected models, progress through guardrail-protected systems, and learn to systematically discover bypasses.

Perform direct prompt injection against an unprotected chatbot
Extract system prompts from a 'secure' application
Bypass content filters using encoding techniques (Base64, Unicode confusables)
Execute multi-turn Crescendo attacks
Use garak to automate jailbreak discovery
Test and bypass a DeBERTa-based prompt injection classifier
Implement evolutionary prompt generation to find novel bypasses
Calculate and report ASR across different attack strategies

Download Lab

References

Topics Covered

RAG architecture deep dive: ingestion, embedding, retrieval, generation
The RAG attack surface: every component is a target
Knowledge base poisoning: injecting malicious documents
Indirect prompt injection through retrieved context
HijackRAG: manipulating retrieval mechanisms (black-box and white-box)
Vector database security: the 3,000+ exposed databases problem
Embedding inversion attacks: reconstructing source data from vectors
Data poisoning in vector databases
Membership and attribute inference attacks
Semantic deception: fooling similarity search
Cross-context information conflicts
RAG credential harvesting (MITRE ATLAS technique)
Orchestration layer exploits: LangChain, LlamaIndex vulnerabilities
CVE-2025-27135: RAGFlow SQL injection case study
Microsoft 365 Copilot exploit chain: prompt injection + ASCII smuggling

Build and then systematically compromise a RAG application. Poison its knowledge base, hijack retrieval, perform embedding inversion, and exfiltrate data through the LLM.

Deploy a vulnerable RAG application with ChromaDB and LangChain
Poison the knowledge base with malicious documents
Perform indirect prompt injection through poisoned retrieved context
Execute embedding inversion to recover source text from vectors
Demonstrate membership inference against the vector database
Exploit semantic deception to manipulate search results
Chain RAG poisoning with data exfiltration via LLM output
Test unauthenticated vector database access
Identify and exploit orchestration framework vulnerabilities

Download Lab

References

Topics Covered

Multi-agent AI architectures: how agents communicate and coordinate
Trust relationships between agents and their exploitation
Communication interference and man-in-the-middle attacks on agents
Byzantine attacks and agent impersonation
Emergent exploitation: M-Spoiler framework for collective manipulation
Jailbreak propagation across multi-agent systems
Remote Code Execution (RCE) through agent tool use
Memory manipulation attacks on agent long-term memory
Thread injection in agent conversations
Over-permissioned agent actions and privilege escalation
Agent configuration modification for persistent backdoors
Activation trigger discovery and exploitation
AI Agent Tool Invocation for unauthorized actions
Zero-trust architecture for agent interactions
MITRE ATLAS 2025: 14 new agent-specific attack techniques

Attack a multi-agent customer service system where agents collaborate to handle requests. Compromise one agent to influence others, escalate privileges, and exfiltrate data through tool invocations.

Map the multi-agent system architecture and trust relationships
Perform agent impersonation through prompt injection
Demonstrate jailbreak propagation from one agent to another
Manipulate agent memory to create persistent backdoors
Exploit agent tool access to perform unauthorized actions
Execute data exfiltration via agent tool invocation
Discover and exploit activation triggers
Test inter-agent communication integrity
Implement and test zero-trust defenses

Download Lab

References

Topics Covered

The AI supply chain: models, datasets, frameworks, dependencies
Model poisoning: backdoors, sleeper agents, and trojan models
Malicious model serialization: pickle exploits and code execution
Typosquatting on model registries (openai-official, chatgpt-api, tensorfllow)
Training data poisoning: medical LLM case study ($5 to poison)
Backdoor triggers and sleeper agent models (Anthropic research)
Fine-tuning attacks: corrupting model behavior through adaptation
Framework vulnerabilities: LangChain, LlamaIndex, Haystack exploits
API key exposure and credential leakage in AI pipelines
Container and infrastructure security for ML deployments
Model theft via distillation and extraction attacks
SBOM for AI: Software and ML Bill of Materials
Supply chain attack case studies: 3CX, NullBulge/Hugging Face
Detecting and preventing model poisoning
Secure model provenance and integrity verification

Simulate supply chain attacks against an ML pipeline. Create a backdoored model, exploit pickle deserialization, demonstrate typosquatting, and poison training data to corrupt model behavior.

Create a model with a hidden backdoor trigger
Demonstrate malicious pickle deserialization for code execution
Simulate a typosquatting attack on a model registry
Poison training data to introduce targeted misclassification
Exploit insecure API key storage in an ML pipeline
Perform model extraction via API querying
Analyze a model for signs of poisoning or backdoors
Generate and validate an ML-SBOM
Implement model integrity verification checks

Download Lab

References

Topics Covered

Model extraction fundamentals: cloning models through API access
Query-based model stealing: strategies and optimization
Training data extraction from language models
Membership inference: was this data in the training set?
Attribute inference from model outputs
Side-channel attacks on LLMs: Whisper Leak traffic analysis
Token length side-channel for response reconstruction
Timing attacks on efficient inference (speculative decoding)
Cache-sharing timing attacks (InputSnatch)
TPUXtract: extracting neural network hyperparameters
Model inversion: reconstructing inputs from outputs
Intellectual property theft implications
Defenses: rate limiting, output perturbation, watermarking
API monitoring for extraction attempts
Differential privacy as a mitigation

Extract a proprietary model's behavior through strategic API querying. Perform membership inference, attempt training data extraction, and analyze encrypted traffic for information leakage.

Clone a target model's behavior through systematic API querying
Train a surrogate model matching the target's predictions
Perform membership inference to identify training data
Extract memorized training data from an LLM
Analyze encrypted LLM traffic for topic classification (Whisper Leak)
Demonstrate model inversion on a simple classifier
Implement and test rate-limiting defenses
Evaluate output perturbation as a defense mechanism
Generate an extraction detection report

Download Lab

References

Topics Covered

Why manual testing isn't enough: the automation imperative
garak deep dive: generators, probes, detectors, and analyzers
PyRIT architecture: datasets, orchestrators, converters, scoring
Promptfoo: declarative red teaming configuration and CI/CD integration
Designing attack datasets and seed prompts
Attack strategy selection and configuration
Scoring and evaluating model responses automatically
Multi-turn attack orchestration
Converter chains: encoding, obfuscation, and evasion
Benchmarking AI security: CVE Bench and evaluation frameworks
CI/CD integration: red teaming in the deployment pipeline
Generating actionable security reports
Custom probe development for domain-specific testing
Comparing and combining multiple tools
Building a continuous AI security testing program

Build and run an automated AI red teaming pipeline using garak, PyRIT, and promptfoo. Test multiple models, generate comprehensive reports, and integrate security testing into a CI/CD workflow.

Configure and run garak against a local LLM with multiple probe types
Build a PyRIT orchestrator with custom datasets and converters
Create a promptfoo red team configuration with multiple attack vectors
Compare vulnerability results across different models
Implement multi-turn attack automation with PyRIT
Build custom garak probes for domain-specific testing
Set up CI/CD integration with promptfoo
Generate and analyze comprehensive security assessment reports
Create a dashboard for tracking AI security posture over time

Download Lab

References

Topics Covered

From vulnerability to impact: thinking like a business adversary
Building AI exploitation chains: combining multiple weaknesses
Data exfiltration through AI systems
Privilege escalation via AI agent tool abuse
Lateral movement through AI infrastructure
Persistence mechanisms in AI systems
Impact categories: confidentiality, integrity, availability, safety
Business impact quantification for AI vulnerabilities
AI incident response: detection, containment, recovery
Writing effective AI red team reports
CVSS scoring adapted for AI vulnerabilities
Remediation strategies and defense-in-depth for AI
Communicating findings to technical and non-technical stakeholders
Building an AI security improvement roadmap
Continuous monitoring and re-testing

Conduct a complete AI red team engagement against a realistic AI-powered enterprise application. Perform reconnaissance, chain multiple exploits, demonstrate business impact, and deliver a professional report.

Perform comprehensive reconnaissance of a target AI application
Identify and document all attack surfaces
Chain prompt injection + RAG poisoning + data exfiltration
Demonstrate privilege escalation through agent tool abuse
Establish persistence in the AI system
Quantify business impact of discovered vulnerabilities
Write a professional AI red team report with CVSS scores
Present remediation recommendations prioritized by risk
Develop a 30/60/90 day security improvement plan

Download Lab

AI Red Team Academy

What You'll Learn

Prerequisites

Who This Is For

8 Modules. 60+ Hours. Hands-On.

Foundations of AI Red Teaming

Topics Covered

Lab 1: Setting Up Your AI Red Team Lab

References

Prompt Injection Attacks

Topics Covered

Lab 2: Prompt Injection Playground

References

RAG Exploitation & Vector Database Attacks

Topics Covered

Lab 3: Breaking RAG Systems

References

Multi-Agent System Exploitation

Topics Covered

Lab 4: Compromising Multi-Agent Systems

References

AI Supply Chain & Infrastructure Attacks

Topics Covered

Lab 5: AI Supply Chain Attack Simulation

References

Model Extraction & Inference Attacks

Topics Covered

Lab 6: Model Theft & Privacy Attacks

References

Automated AI Red Teaming at Scale

Topics Covered

Lab 7: Automated Red Teaming Pipeline

References

Post-Exploitation & Impact Analysis

Topics Covered

Lab 8: Full AI Red Team Engagement

References

Open-Source Arsenal

garak

PyRIT

promptfoo

Reference Frameworks

Set Up in 3 Commands

Community-Driven AI Security Education