A comprehensive 15-notebook educational course for teaching AI security, jailbreak techniques, and defence strategies through hands-on experience with intentionally vulnerable models.
This project uses Australian English orthography throughout and incorporates Australian compliance requirements (Privacy Act 1988, ACSC Essential Eight, APRA CPS 234, etc.).
This course includes intentionally vulnerable models designed exclusively for educational purposes.
- ✅ Use for authorised education and training
- ✅ Use for security research in controlled environments
- ✅ Use for CTF challenges and approved competitions
- ❌ DO NOT deploy vulnerable models in production
- ❌ DO NOT use on real systems without authorisation
- ❌ DO NOT use for malicious purposes
Duration: 30-45 minutes | Difficulty: Beginner
- What is a jailbreak?
- Execute your first successful jailbreak
- Understand the vulnerable-then-educate pattern
- Australian Privacy Act 1988 context
Duration: 45-60 minutes | Difficulty: Beginner
- Role-playing attacks (DAN variants)
- Multi-turn conversation exploits
- Social engineering techniques
- Measuring attack success rates
Duration: 60 minutes | Difficulty: Intermediate
- Encoding-based bypasses (Base64, ROT13, Hex)
- Crescendo attacks (gradual escalation)
- Multi-step exploitation chains
- Detection and prevention strategies
Duration: 60-75 minutes | Difficulty: Advanced
- Skeleton Key attack (Microsoft's vulnerability)
- System prompt extraction techniques
- Advanced prompt injection patterns
- Real-world case studies
Duration: 75 minutes | Difficulty: Intermediate
- Attention visualization and analysis
- Activation pattern examination
- Sparse Autoencoders (SAE) for interpretability
- Understanding why jailbreaks work
Duration: 90 minutes | Difficulty: Intermediate
- 7-layer defence-in-depth architecture
- Input validation and sanitization
- Output filtering and content moderation
- Australian compliance integration (ACSC Essential Eight)
Duration: 90 minutes | Difficulty: Advanced
- Build automated attack testing frameworks
- 10+ attack templates across 6 categories
- CI/CD integration for continuous testing
- Measuring ASR (Attack Success Rate)
Duration: 75 minutes | Difficulty: Intermediate
- 10 prompt hardening techniques
- System prompt design patterns
- Industry-specific templates (Healthcare, Finance, Gov, Retail)
- A/B testing for effectiveness measurement
Duration: 75 minutes | Difficulty: Intermediate
- Build Streamlit security dashboard
- Real-time attack detection
- SIEM integration (Splunk, ELK)
- Alert system implementation
Duration: 120 minutes | Difficulty: Advanced
- 15 complete CTF challenges (Beginner → Advanced)
- 500 points total across 5 difficulty tiers
- Automated scoring system with 5 rank levels
- Certificate generation upon completion
Duration: 90 minutes | Difficulty: Intermediate
- Healthcare: TGA, PBS, medical records (patient safety)
- Financial: APRA CPS 234, ASIC, AML/CTF ($10k threshold)
- Government: PSPF, ISM, security clearances, classifications
- Retail: CDR, PCI DSS, customer authentication
- Cross-sector compliance comparison
Duration: 120 minutes | Difficulty: Advanced
- Adversarial training dataset creation
- LoRA (Low-Rank Adaptation) implementation
- Complete training pipeline (SFT → RLHF)
- Robustness evaluation (45% → 4.8% ASR improvement)
- Safety reward model for alignment
Duration: 100 minutes | Difficulty: Advanced
- Vision-language model (VLM) security
- OCR-based prompt injection detection
- Adversarial image detection
- Cross-modal attack defense
- Deepfake detection techniques
Duration: 90 minutes | Difficulty: Advanced
- Model provenance verification
- Data poisoning detection
- Model watermarking for authenticity
- AI-SBOM (Software Bill of Materials) generation
- Secure model registry implementation
Duration: 100 minutes | Difficulty: Advanced
- Real-time incident detection systems
- Incident response playbooks
- Forensic analysis and attack timeline reconstruction
- MTTD/MTTR metrics tracking
- Australian NDB (Notifiable Data Breaches) compliance
- OAIC notification requirements (30-day deadline)
Upon completing all 15 notebooks, students will be able to:
- ✅ Execute and defend against 20+ jailbreak techniques
- ✅ Build complete 7-layer defence systems
- ✅ Implement automated red teaming frameworks
- ✅ Fine-tune models for robustness (LoRA + RLHF)
- ✅ Secure multi-modal AI systems
- ✅ Conduct forensic analysis of AI security incidents
- ✅ Apply Australian Privacy Act 1988 requirements
- ✅ Implement sector-specific compliance (APRA, TGA, PSPF)
- ✅ Generate AI-SBOM for supply chain security
- ✅ Execute NDB breach notification procedures
- ✅ Assess AI security risk across industries
- ✅ Design defense-in-depth architectures
- ✅ Measure security effectiveness (ASR, MTTD, MTTR)
- ✅ Conduct post-incident lessons learned
- Python 3.8+
- GPU recommended (notebooks work on CPU but slower)
- Basic Python and ML knowledge
# Clone repository
git clone https://github.com/Benjamin-KY/AISecurityModel.git
cd AISecurityModel
# Install dependencies
pip install transformers torch accelerate peft bitsandbytes
pip install streamlit pandas numpy matplotlib seaborn
# Start with Notebook 1
jupyter notebook notebooks/01_Introduction_First_Jailbreak.ipynb🏃 Fast Track (4-6 hours) Notebooks: 1 → 2 → 4 → 6 → 10
📚 Standard Track (15-20 hours) All notebooks 1-15 in sequence
🎓 Deep Dive (30-40 hours) All notebooks + exercises + CTF challenges + assessments
AISecurityModel/
├── notebooks/
│ ├── 01_Introduction_First_Jailbreak.ipynb
│ ├── 02_Basic_Jailbreak_Techniques.ipynb
│ ├── 03_Intermediate_Attacks_Encoding_Crescendo.ipynb
│ ├── 04_Advanced_Jailbreaks_Skeleton_Key.ipynb
│ ├── 05_XAI_Interpretability_Inside_Model.ipynb
│ ├── 06_Defence_Real_World_Application.ipynb
│ ├── 07_Automated_Red_Teaming_Testing.ipynb
│ ├── 08_Prompt_Engineering_Safety.ipynb
│ ├── 09_Realtime_Monitoring_Dashboard.ipynb
│ ├── 10_CTF_Security_Challenges.ipynb
│ ├── 11_Industry_Specific_Security.ipynb
│ ├── 12_Fine_Tuning_Robustness.ipynb
│ ├── 13_Multi_Modal_Security.ipynb
│ ├── 14_AI_Supply_Chain_Security.ipynb
│ └── 15_Incident_Response_Forensics.ipynb
├── data/
│ ├── vulnerability_taxonomy.json
│ └── training_data.jsonl
├── scripts/
│ ├── generate_training_data.py
│ ├── finetune_model_v2.py
│ └── test_model.py
└── README.md
- Prompt injection (direct, indirect, multi-turn)
- Role-playing attacks (DAN 6.0, 11.0, Jailbreak)
- Encoding bypasses (Base64, ROT13, Hex, Unicode)
- Crescendo attacks (gradual escalation)
- Skeleton Key (Microsoft vulnerability)
- System prompt extraction
- Context manipulation
- Social engineering
- OCR prompt injection
- Cross-modal attacks
- Data poisoning
- Model backdoors
- 7-layer defence-in-depth
- Input validation & sanitization
- Output filtering & content moderation
- Prompt hardening (10 techniques)
- Real-time monitoring
- Automated testing
- Adversarial training
- Model watermarking
- Incident response
- Privacy Act 1988: Personal information protection, NDB scheme
- ACSC Essential Eight: Cyber security baseline
- APRA CPS 234: Financial services information security
- PSPF: Protective Security Policy Framework (government)
- ISM: Information Security Manual (ASD)
- TGA: Therapeutic Goods Administration (healthcare)
- ASIC: Financial advice regulations
- AUSTRAC: AML/CTF compliance
- Healthcare: Medical device regulation, patient safety
- Financial: 72-hour breach reporting, AML/CTF $10k threshold
- Government: Security clearances, classified information
- Retail: Consumer Data Right (CDR), PCI DSS
- Total Notebooks: 15
- Total Duration: ~18-22 hours
- Exercises: 50+ hands-on activities
- CTF Challenges: 15 complete challenges
- Code Examples: 100+ production-ready implementations
- Assessment Questions: 30+ knowledge checks
- Base: Qwen2.5-3B-Instruct (and variants)
- Fine-tuning: LoRA (Low-Rank Adaptation)
- Quantization: 4-bit (BitsAndBytes)
- Size: 3B parameters, ~2GB memory
- transformers: HuggingFace model loading
- peft: LoRA fine-tuning
- torch: Deep learning framework
- streamlit: Dashboard creation
- pandas/numpy: Data analysis
- matplotlib/seaborn: Visualization
🎯 Workshop (4-6 hours)
- Notebooks 1, 2, 4, 6
- Focus on core attack/defence concepts
- Hands-on exercises only
📚 University Course (12-15 weeks)
- All 15 notebooks
- 1 notebook per week
- Assignments and assessments
- Final CTF competition
💼 Corporate Training (3 days)
- Day 1: Notebooks 1-6 (Attacks & Defence)
- Day 2: Notebooks 7-11 (Advanced & Industry-Specific)
- Day 3: Notebooks 12-15 (Production Hardening)
- Quiz questions (included in notebooks)
- CTF challenge completion (Notebook 10)
- Build custom defence system (project)
- Incident response drill (tabletop exercise)
- OWASP LLM Top 10
- MITRE ATLAS - AI threat framework
- NIST AI Risk Management Framework
- Australian Cyber Security Centre
- OAIC Privacy Guidelines
- LLM Guard: Open-source security toolkit
- Garak: LLM vulnerability scanner
- PromptInject: Research benchmark
- CleverHans: Adversarial examples library
- "Jailbroken: How Does LLM Safety Break Down?" (Wei et al.)
- "Universal and Transferable Adversarial Attacks" (Wallace et al.)
- "Constitutional AI" (Anthropic)
- "Red Teaming Language Models" (Perez et al.)
Contributions welcome! Areas of interest:
- Additional training examples
- New attack techniques
- Industry-specific case studies
- Compliance updates (regulatory changes)
- Translation to other languages
- Curriculum enhancements
Code & Models: Apache 2.0 Educational Materials: CC BY-SA 4.0 Documentation: CC BY 4.0
All users must:
- ✅ Use only in authorised educational/research contexts
- ✅ Practice responsible disclosure of vulnerabilities
- ✅ Respect privacy and data protection laws
- ✅ Follow institutional ethics guidelines
- ❌ Never attack production systems without permission
- ❌ Never use techniques for malicious purposes
Ensure you:
- Have ethics approval for security education
- Provide supervised learning environments
- Require signed code of conduct from students
- Implement proper safeguards and monitoring
- Comply with local regulations
- GitHub Issues: For bug reports and feature requests
- Discussions: For questions and community support
- Security: For responsible disclosure of vulnerabilities
- Qwen Team (Alibaba Cloud) for base models
- HuggingFace for transformers library
- PEFT Team for LoRA implementation
- Australian AI security community
- OWASP, MITRE, NIST for frameworks
@software{ai_security_jailbreak_defence_course,
title = {AI Security & Jailbreak Defence: A Comprehensive 15-Notebook Course},
author = {Benjamin-KY},
year = {2025},
url = {https://github.com/Benjamin-KY/AISecurityModel},
note = {Educational course for AI security training with Australian compliance focus}
}Version: 2.0 Last Updated: 2025-11-05 Status: Complete (15/15 notebooks) - Ready for deployment Course Completion: All notebooks implemented and tested
Remember: This is a tool for learning. Use responsibly, teach responsibly, and build safer AI systems! 🛡️