THE JAILBREAK INDEX
Definitive Analysis 2024-2025
DeepSeek Models

DeepSeek-R1
Difficulty: 1 | Success Rate: 91-100% | Vulnerability: Algorithmic Jailbreaking | Test Coverage: Extensive (5+)
Notes: 100% success rate in some tests. Most vulnerable reasoning model tested. Complete failure against all attack types.

DeepSeek Chat
Difficulty: 1 | Success Rate: 86-100% | Vulnerability: Timed-Release Attack | Test Coverage: Extensive (4+)
Notes: Perfect 100% success with controlled-release attacks. Complete guardrail bypass in all scenarios.
Grok Models

Grok 4
Difficulty: 1 | Success Rate: 98-99% | Vulnerability: Direct Prompt Injection | Test Coverage: Moderate (2-3)
Notes: 99% vulnerability without security prompting; improves to 90%+ resistance with a proper security prompt.

Grok 3
Difficulty: 1 | Success Rate: 86-100% | Vulnerability: Timed-Release Attack | Test Coverage: Moderate (2-3)
Notes: 100% multi-shot success. Highly vulnerable to indirect prompt injection via Twitter integration.
Google Models (Gemini/Gemma)

Gemini 2.5 Flash
Difficulty: 1 | Success Rate: 86.9-100% | Vulnerability: Timed-Release Attack | Test Coverage: Extensive (3+)
Notes: Perfect success on all 12 malicious intents tested. Complete failure against controlled-release methodology.

Gemini 2.5 Pro
Difficulty: 2 | Success Rate: 78-90% | Vulnerability: Jailbreaking | Test Coverage: Moderate (2-3)
Notes: Better resistance than the Flash variant but still highly vulnerable to sophisticated attacks.

Gemma-2-9B
Difficulty: 1 | Success Rate: 4% → 94% | Vulnerability: Multi-Agent Jailbreak | Test Coverage: Extensive (3+)
Notes: Dramatic vulnerability shift: 96% secure (1-shot) → 94% vulnerable (multi-shot).

Gemma-7B
Difficulty: 1 | Success Rate: 100% | Vulnerability: Adaptive Attack | Test Coverage: Moderate (2-3)
Notes: Perfect 100% vulnerability to adaptive attacks. No effective safety measures.
Mistral Models

Mistral Magistral
Difficulty: 1 | Success Rate: 85-92% | Vulnerability: Timed-Release Attack | Test Coverage: Moderate (2-3)
Notes: Near-perfect success with timing attacks; only a single failure across tested scenarios.

Mistral-7B
Difficulty: 1 | Success Rate: 18% → 95% | Vulnerability: Multi-Agent Jailbreak | Test Coverage: Extensive (4+)
Notes: Massive vulnerability increase from 18% (1-shot) to 95% (multi-shot). Strong single-turn defenses bypassed.

Mistral Large
Difficulty: 2 | Success Rate: 71-87% | Vulnerability: Roleplay Dynamics | Test Coverage: Moderate (2-3)
Notes: Roleplay attacks are the most effective vector. Consistent weakness to social engineering.
OpenAI GPT Models

GPT-3.5 Turbo
Difficulty: 1 | Success Rate: 90-100% | Vulnerability: Multi-Agent Jailbreak | Test Coverage: Extensive (5+)
Notes: Highest attack success rate in multi-agent testing. Universal vulnerability to adaptive attacks.

GPT-4
Difficulty: 1 | Success Rate: 92-96% | Vulnerability: Adaptive Attack | Test Coverage: Extensive (4+)
Notes: High vulnerability to PAP baseline methods. Significant security weaknesses despite safety training.

GPT-4 Turbo
Difficulty: 1 | Success Rate: 87-96% | Vulnerability: Adaptive Attack | Test Coverage: Extensive (5+)
Notes: 96% vulnerability to adaptive methods. Near-universal susceptibility despite advanced training.

GPT-4o
Difficulty: 3 | Success Rate: 33-67% | Vulnerability: Medical Prompt Injection | Test Coverage: Moderate
Notes: Better general resistance but vulnerable in the medical domain. 89% lesion miss rate.

GPT-5-main
Difficulty: 2 | Success Rate: 65-78% | Vulnerability: Jailbreaking | Test Coverage: Limited
Notes: Improved resistance over GPT-4 but still highly vulnerable. Security improvements insufficient.

GPT-5-thinking
Difficulty: 5 | Success Rate: 20-28% | Vulnerability: Jailbreaking | Test Coverage: Limited
Notes: BEST RESISTANCE among all frontier models. Benefits from reasoning architecture.
Anthropic Claude Models

Claude 2.x Series

Claude 2.0
Difficulty: 1 | Success Rate: 100% | Vulnerability: Adaptive Attack | Test Coverage: Extensive (4+)
Notes: Perfect 100% vulnerability. Complete failure of the Constitutional AI approach.

Claude 2.1
Difficulty: 1 | Success Rate: 100% | Vulnerability: Adaptive Attack | Test Coverage: Extensive (4+)
Notes: Perfect 100% vulnerability maintained. No meaningful security improvements.
Claude 3.x Series

Claude 3 Haiku
Difficulty: 1 | Success Rate: 100% | Vulnerability: Adaptive Attack | Test Coverage: Extensive (4+)
Notes: Perfect 100% vulnerability continues. Smallest model shows no security advantage.

Claude 3 Sonnet
Difficulty: 1 | Success Rate: 82-100% | Vulnerability: Adaptive Attack | Test Coverage: Extensive (5+)
Notes: 100% multi-shot vulnerability. Complete security failure despite advanced training.

Claude 3 Opus
Difficulty: 1 | Success Rate: 100% | Vulnerability: Adaptive Attack | Test Coverage: Extensive (4+)
Notes: Perfect 100% vulnerability. Largest model provides no security benefit.

Claude 3.5 Sonnet
Difficulty: 2 | Success Rate: 1.3% | Vulnerability: Psychological | Test Coverage: Moderate (2-3)
Notes: Significant improvement over the Claude 3 family. Still vulnerable to psychological manipulation.

Claude 3.7 Sonnet
Difficulty: 2 | Success Rate: 1.3-2.5% | Vulnerability: Coding Injection | Test Coverage: Moderate (2-3)
Notes: Blocks many injection attempts but remains vulnerable in coding contexts.
Claude 4.x Series

Claude 4 Haiku
Difficulty: 2 | Success Rate: 0.9% | Vulnerability: Psychological | Test Coverage: Moderate (2-3)
Notes: LOWEST vulnerability in the Claude 4 family. Good resistance, but still a concern for enterprise use.

Claude Sonnet 4 ⚠️
Difficulty: 1 | Success Rate: 4-92% | Vulnerability: Past-tense Jailbreaks | Test Coverage: Extensive (4+)
Notes: HIGHEST vulnerability in the Claude 4 family; up to 92% in frontier tests.

Claude Opus 4
Difficulty: 1 | Success Rate: 3-65% | Vulnerability: Past-tense Jailbreaks | Test Coverage: Extensive (4+)
Notes: High vulnerability. Released under ASL-3 protections due to capability concerns.

Claude Opus 4.1
Difficulty: 3 | Success Rate: 65% | Vulnerability: Jailbreaking | Test Coverage: Limited
Notes: Best resistance among Claude models, but a 65% vulnerability rate is still concerning.

Claude Sonnet 4.5
Difficulty: 3 | Success Rate: Unknown | Vulnerability: Prompt Injection | Test Coverage: Limited
Notes: LATEST model (Sep 2025). Claimed 'considerable progress' against prompt injection.
Claude Medical Variants

Claude-3 (Medical)
Difficulty: 4 | Success Rate: 25-33% | Vulnerability: Medical Injection | Test Coverage: Moderate
Notes: Best medical-domain performance, but a 70% lesion miss rate is still concerning.

Claude-3.5 (Medical)
Difficulty: 4 | Success Rate: 40% | Vulnerability: Medical Injection | Test Coverage: Moderate
Notes: Improved medical performance, but a 57% lesion miss rate poses patient-safety risks.
Other Major Models

Llama-2-Chat-7B
Difficulty: 1 | Success Rate: 100% | Vulnerability: Adaptive Attack | Test Coverage: Extensive (4+)
Notes: Perfect 100% vulnerability. Open-source model with no effective safety measures.

Llama-2-Chat-13B
Difficulty: 1 | Success Rate: 100% | Vulnerability: Adaptive Attack | Test Coverage: Extensive (4+)
Notes: Perfect 100% vulnerability. Larger model size provides no security advantage.

Llama-2-Chat-70B
Difficulty: 1 | Success Rate: 85-100% | Vulnerability: Adaptive Attack | Test Coverage: Extensive (5+)
Notes: Perfect 100% multi-shot vulnerability. Largest variant shows no security improvements.

Vicuna-7B
Difficulty: 1 | Success Rate: 6% → 78% | Vulnerability: Multi-Agent Jailbreak | Test Coverage: Moderate (2-3)
Notes: Extreme vulnerability shift: 93% secure (1-shot) → 78% vulnerable (multi-shot).

R2D2-7B
Difficulty: 1 | Success Rate: 100% | Vulnerability: Adaptive Attack | Test Coverage: Moderate (2-3)
Notes: Perfect 100% vulnerability despite adversarial training; defense training completely ineffective.

Reka Core (Medical)
Difficulty: 3 | Success Rate: 51% | Vulnerability: Medical Injection | Test Coverage: Limited
Notes: Moderate medical vulnerability, but a 92% lesion miss rate creates critical safety concerns.
Summary & Key Findings

Difficulty Distribution
- 65.8% of all models are EXTREMELY VULNERABLE (Difficulty 1)
- Only 2.6% achieve HARD resistance (Difficulty 5)
- Zero models achieve Very Hard or Extremely Hard resistance

Most Secure Models
1. GPT-5-thinking: Difficulty 5 | 20-28% vulnerable
2. Claude-3 (Medical): Difficulty 4 | 25-33% vulnerable
3. Claude-3.5 (Medical): Difficulty 4 | 40% vulnerable
4. Reka Core (Medical): Difficulty 3 | 51% vulnerable
5. Claude Opus 4.1: Difficulty 3 | 65% vulnerable

⚠️ Highest Risk (100% Success)
- All Claude 2.x/3.x variants
- GPT-3.5 Turbo
- DeepSeek-R1
- All Llama-2-Chat variants
- Gemma-7B & R2D2-7B
- Total: 15 models with 100% attack success

Critical Findings
- 15 models (39.5%) have perfect 100% attack success rates
- Multi-shot attacks dramatically increase vulnerability across all model families
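The headline percentages can be re-derived from the per-model counts. A minimal sketch follows; note that the 38-model total is inferred from the stated percentages (15/38 ≈ 39.5%) rather than given explicitly in the index:

```python
# Re-derive the summary percentages from the counts cited in the findings.
# TOTAL_MODELS = 38 is an inference from the stated figures, not an
# explicit number in the index.
TOTAL_MODELS = 38
difficulty_1 = 25        # "extremely vulnerable" models (Difficulty 1)
difficulty_5 = 1         # GPT-5-thinking, the only Difficulty-5 entry
perfect_success = 15     # models with a 100% attack success rate

def pct(n: int, total: int = TOTAL_MODELS) -> float:
    """Share of the index, rounded to one decimal place."""
    return round(100 * n / total, 1)

print(pct(difficulty_1))     # 65.8 -> "65.8% ... EXTREMELY VULNERABLE"
print(pct(difficulty_5))     # 2.6  -> "Only 2.6% achieve HARD resistance"
print(pct(perfect_success))  # 39.5 -> "15 models (39.5%) ... 100% success"
```

The three outputs match the percentages quoted in the findings, which is consistent with an index of 38 evaluated models.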
References & Citations
- DeepSeek Tests Reveal Severe Jailbreaking - cyberdefensewire.com
- Testing the DeepSeek-R1 Model - pointguardai.com
- Evaluating Security Risk in DeepSeek - cisco.com
- Bypassing Prompt Guards with Controlled Release - arxiv.org
- Grok4 Security Flaws Research - cyberscoop.com
- Grok 3 Indirect Prompt Injection - simonwillison.net
- Automatically Jailbreaking Frontier LLMs - transluce.org
- Agent-Driven Multi-Turn Decomposition Jailbreaks - aclanthology.org
- Jailbreaking Leading Safety-Aligned LLMs - arxiv.org
- Systematic Evaluation of Prompt Injection - arxiv.org
- Prompt Injection in Medical VLMs - nature.com
- PrompTrend: Continuous Vulnerability Discovery - arxiv.org
- Prompt Injection in Agentic Coding Tools - securecodewarrior.com
- Claude 4 Sonnet Security Analysis - lakera.ai
- Anthropic Claude 4: LLM Evolution - intuitionlabs.ai
- Involuntary Jailbreak - arxiv.org
- Introducing Claude Sonnet 4.5 - anthropic.com
- Anthropic's Transparency Hub - anthropic.com