THE JAILBREAK INDEX

Definitive Analysis 2024-2025

38 Models Analyzed
65.8% Extremely Vulnerable
Oct 15, 2025 Last Updated

๐Ÿค– DeepSeek Models

DeepSeek-R1

Difficulty: 1

Success Rate

91-100%

Vulnerability

Algorithmic Jailbreaking

Test Coverage

Extensive (5+)

Notes: 100% success rate. Most vulnerable reasoning model tested. Complete failure against all attack types.

DeepSeek Chat

Difficulty: 1

Success Rate

86-100%

Vulnerability

Timed-Release Attack

Test Coverage

Extensive (4+)

Notes: Perfect 100% success with controlled-release attacks. Complete guardrail bypass in all scenarios.

๐Ÿš€ Grok Models

Grok 4

Difficulty: 1

Success Rate

98-99%

Vulnerability

Direct Prompt Injection

Test Coverage

Moderate (2-3)

Notes: 99% vulnerability without security prompting. Improves to 90%+ security with proper setup.

Grok 3

Difficulty: 1

Success Rate

86-100%

Vulnerability

Timed-Release Attack

Test Coverage

Moderate (2-3)

Notes: 100% multi-shot success. Highly vulnerable to indirect prompt injection via Twitter integration.

๐Ÿ” Google Models (Gemini/Gemma)

Gemini 2.5 Flash

Difficulty: 1

Success Rate

86.9-100%

Vulnerability

Timed-Release Attack

Test Coverage

Extensive (3+)

Notes: Perfect success on all 12 malicious intents tested. Complete failure against controlled-release methodology.

Gemini 2.5 Pro

Difficulty: 2

Success Rate

78-90%

Vulnerability

Jailbreaking

Test Coverage

Moderate (2-3)

Notes: Better resistance than Flash variant but still highly vulnerable to sophisticated attacks.

Gemma-2-9B

Difficulty: 1

Success Rate

4% โ†’ 94%

Vulnerability

Multi-Agent Jailbreak

Test Coverage

Extensive (3+)

Notes: Dramatic vulnerability shift: 96% secure (1-shot) โ†’ 94% vulnerable (multi-shot).

Gemma-7B

Difficulty: 1

Success Rate

100%

Vulnerability

Adaptive Attack

Test Coverage

Moderate (2-3)

Notes: Perfect 100% vulnerability to adaptive attacks. No effective safety measures.

๐ŸŒช๏ธ Mistral Models

Mistral Magistral

Difficulty: 1

Success Rate

85-92%

Vulnerability

Timed-Release Attack

Test Coverage

Moderate (2-3)

Notes: Near-perfect success with timing attacks. Only single failure across tested scenarios.

Mistral-7B

Difficulty: 1

Success Rate

18% โ†’ 95%

Vulnerability

Multi-Agent Jailbreak

Test Coverage

Extensive (4+)

Notes: Massive vulnerability increase from 18% (1-shot) to 95% (multi-shot). Strong single-turn defenses bypassed.

Mistral Large

Difficulty: 2

Success Rate

71-87%

Vulnerability

Roleplay Dynamics

Test Coverage

Moderate (2-3)

Notes: Most effective against roleplay attacks. Consistent weakness to social engineering.

๐Ÿค– OpenAI GPT Models

GPT-3.5 Turbo

Difficulty: 1

Success Rate

90-100%

Vulnerability

Multi-Agent Jailbreak

Test Coverage

Extensive (5+)

Notes: Highest attack success rate in multi-agent testing. Universal vulnerability to adaptive attacks.

GPT-4

Difficulty: 1

Success Rate

92-96%

Vulnerability

Adaptive Attack

Test Coverage

Extensive (4+)

Notes: High vulnerability to PAP baseline methods. Significant security weaknesses despite safety training.

GPT-4 Turbo

Difficulty: 1

Success Rate

87-96%

Vulnerability

Adaptive Attack

Test Coverage

Extensive (5+)

Notes: 96% vulnerability to adaptive methods. Near-universal susceptibility despite advanced training.

GPT-4o

Difficulty: 3

Success Rate

33-67%

Vulnerability

Medical Prompt Injection

Test Coverage

Moderate

Notes: Better general resistance but vulnerable in medical domain. 89% lesion miss rate.

GPT-5-main

Difficulty: 2

Success Rate

65-78%

Vulnerability

Jailbreaking

Test Coverage

Limited

Notes: Improved resistance over GPT-4 but still highly vulnerable. Security improvements insufficient.

GPT-5-thinking ๐Ÿ†

Difficulty: 5

Success Rate

20-28%

Vulnerability

Jailbreaking

Test Coverage

Limited

Notes: BEST RESISTANCE among all frontier models. Benefits from reasoning architecture.

๐Ÿง  Anthropic Claude Models

Claude 2.x Series

Claude 2.0

Difficulty: 1

Success Rate

100%

Vulnerability

Adaptive Attack

Test Coverage

Extensive (4+)

Notes: Perfect 100% vulnerability. Complete failure of Constitutional AI approach.

Claude 2.1

Difficulty: 1

Success Rate

100%

Vulnerability

Adaptive Attack

Test Coverage

Extensive (4+)

Notes: Perfect 100% vulnerability maintained. No meaningful security improvements.

Claude 3.x Series

Claude 3 Haiku

Difficulty: 1

Success Rate

100%

Vulnerability

Adaptive Attack

Test Coverage

Extensive (4+)

Notes: Perfect 100% vulnerability continues. Smallest model shows no security advantage.

Claude 3 Sonnet

Difficulty: 1

Success Rate

82-100%

Vulnerability

Adaptive Attack

Test Coverage

Extensive (5+)

Notes: 100% multi-shot vulnerability. Complete security failure despite advanced training.

Claude 3 Opus

Difficulty: 1

Success Rate

100%

Vulnerability

Adaptive Attack

Test Coverage

Extensive (4+)

Notes: Perfect 100% vulnerability. Largest model provides no security benefit.

Claude 3.5 Sonnet

Difficulty: 2

Success Rate

1.3%

Vulnerability

Psychological

Test Coverage

Moderate (2-3)

Notes: Significant improvement over Claude 3 family. Still vulnerable to psychological manipulation.

Claude 3.7 Sonnet

Difficulty: 2

Success Rate

1.3-2.5%

Vulnerability

Coding Injection

Test Coverage

Moderate (2-3)

Notes: Blocks many injection attempts but remains vulnerable in coding contexts.

Claude 4.x Series

Claude 4 Haiku ๐Ÿฅ‡

Difficulty: 2

Success Rate

0.9%

Vulnerability

Psychological

Test Coverage

Moderate (2-3)

Notes: LOWEST vulnerability in Claude 4 family. Good resistance but still concerning for enterprise.

Claude Sonnet 4 โš ๏ธ

Difficulty: 1

Success Rate

4-92%

Vulnerability

Past-tense Jailbreaks

Test Coverage

Extensive (4+)

Notes: HIGHEST vulnerability in Claude 4 family. Up to 92% in frontier tests.

Claude Opus 4

Difficulty: 1

Success Rate

3-65%

Vulnerability

Past-tense Jailbreaks

Test Coverage

Extensive (4+)

Notes: High vulnerability. Released under ASL-3 protections due to capability concerns.

Claude Opus 4.1

Difficulty: 3

Success Rate

65%

Vulnerability

Jailbreaking

Test Coverage

Limited

Notes: Best resistance among Claude models but still concerning 65% vulnerability.

Claude Sonnet 4.5 ๐Ÿ†•

Difficulty: 3

Success Rate

Unknown

Vulnerability

Prompt Injection

Test Coverage

Limited

Notes: LATEST model (Sep 2025). Claimed 'considerable progress' against prompt injection.

Claude Medical Variants ๐Ÿฅ

Claude-3 (Medical)

Difficulty: 4

Success Rate

25-33%

Vulnerability

Medical Injection

Test Coverage

Moderate

Notes: Best medical domain performance but 70% lesion miss rate still concerning.

Claude-3.5 (Medical)

Difficulty: 4

Success Rate

40%

Vulnerability

Medical Injection

Test Coverage

Moderate

Notes: Improved medical performance but 57% lesion miss rate poses patient safety risks.

๐Ÿฆ™ Other Major Models

Llama-2-Chat-7B

Difficulty: 1

Success Rate

100%

Vulnerability

Adaptive Attack

Test Coverage

Extensive (4+)

Notes: Perfect 100% vulnerability. Open-source model with no effective safety measures.

Llama-2-Chat-13B

Difficulty: 1

Success Rate

100%

Vulnerability

Adaptive Attack

Test Coverage

Extensive (4+)

Notes: Perfect 100% vulnerability. Larger model size provides no security advantage.

Llama-2-Chat-70B

Difficulty: 1

Success Rate

85-100%

Vulnerability

Adaptive Attack

Test Coverage

Extensive (5+)

Notes: Perfect 100% multi-shot vulnerability. Largest variant shows no security improvements.

Vicuna-7B

Difficulty: 1

Success Rate

6% โ†’ 78%

Vulnerability

Multi-Agent Jailbreak

Test Coverage

Moderate (2-3)

Notes: Extreme vulnerability shift: 93% secure (1-shot) โ†’ 78% vulnerable (multi-shot).

R2D2-7B

Difficulty: 1

Success Rate

100%

Vulnerability

Adaptive Attack

Test Coverage

Moderate (2-3)

Notes: Perfect 100% vulnerability despite adversarial training. Defense training completely ineffective.

Reka Core (Medical)

Difficulty: 3

Success Rate

51%

Vulnerability

Medical Injection

Test Coverage

Limited

Notes: Moderate medical vulnerability but 92% lesion miss rate creates critical safety concerns.

๐Ÿ“ˆ Summary & Key Findings

Difficulty Distribution

๐Ÿ† Most Secure Models

  • 1. GPT-5-thinking: D5 | 20-28% vuln.
  • 2. Claude-3 (Medical): D4 | 25-33% vuln.
  • 3. Claude-3.5 (Medical): D4 | 40% vuln.
  • 4. Reka Core (Medical): D3 | 51% vuln.
  • 5. Claude Opus 4.1: D3 | 65% vuln.

โš ๏ธ Highest Risk (100% Success)

  • All Claude 2.x/3.x variants
  • GPT-3.5 Turbo
  • DeepSeek-R1
  • All Llama-2-Chat variants
  • Gemma-7B & R2D2-7B
  • Total: 15 models with 100% attack success

๐Ÿšจ Critical Findings

  • 65.8% of all models are EXTREMELY VULNERABLE (Difficulty 1)
  • Only 2.6% achieve HARD resistance (Difficulty 5)
  • Zero models achieve Very Hard or Extremely Hard resistance
  • 15 models (39.5%) have perfect 100% attack success rates
  • Multi-shot attacks dramatically increase vulnerability across all model families

๐Ÿ“š References & Citations

  1. DeepSeek Tests Reveal Severe Jailbreaking - cyberdefensewire.com
  2. Testing the DeepSeek-R1 Model - pointguardai.com
  3. Evaluating Security Risk in DeepSeek - cisco.com
  4. Bypassing Prompt Guards with Controlled Release - arxiv.org
  5. Grok4 Security Flaws Research - cyberscoop.com
  6. Grok 3 Indirect Prompt Injection - simonwillison.net
  7. Automatically Jailbreaking Frontier LLMs - transluce.org
  8. Agent-Driven Multi-Turn Decomposition Jailbreaks - aclanthology.org
  9. Jailbreaking Leading Safety-Aligned LLMs - arxiv.org
  10. Systematic Evaluation of Prompt Injection - arxiv.org
  11. Prompt Injection in Medical VLMs - nature.com
  12. PrompTrend: Continuous Vulnerability Discovery - arxiv.org
  13. Prompt Injection in Agentic Coding Tools - securecodewarrior.com
  14. Claude 4 Sonnet Security Analysis - lakera.ai
  15. Anthropic Claude 4: LLM Evolution - intuitionlabs.ai
  16. Involuntary Jailbreak - arxiv.org
  17. Introducing Claude Sonnet 4.5 - anthropic.com
  18. Anthropic's Transparency Hub - anthropic.com