Jailbreak Leaderboard
This page tracks which models are currently susceptible to known adversarial prompts. Scores reflect Ease of Exploitation (EoE): the higher the score, the easier the model is to jailbreak.
Scoring Methodology
EoE is scored from 0 to 10 based on four factors: the number of successful attack vectors, how consistently the bypass works, the skill required to pull it off, and whether mitigations exist. We test weekly using our standardized adversarial suite.
Full methodology: github.com/securityinvestigations/eoe-methodology
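The factors above are listed, but this page does not say how they are combined into one number. As a rough sketch, assuming a simple weighted sum of normalized factors, a 0 to 10 score could be computed as below. The weights, the `SuiteResult` fields and the `eoe_score` function are hypothetical and are not taken from the linked methodology.

```python
from dataclasses import dataclass

# HYPOTHETICAL weights: the four factors come from the methodology summary,
# but these numbers are illustrative only (they sum to 1.0).
WEIGHTS = {
    "vector_coverage": 0.35,  # share of tested vectors that succeeded
    "consistency": 0.30,      # share of attempts that bypassed safety
    "low_skill": 0.20,        # inverse of the skill required
    "no_mitigation": 0.15,    # 1.0 if no mitigation exists
}


@dataclass
class SuiteResult:
    """Hypothetical summary of one model's weekly adversarial-suite run."""
    vectors_tested: int
    vectors_successful: int
    attempts: int
    successful_attempts: int
    skill_required: float  # 0.0 = copy-paste prompt, 1.0 = expert crafting
    mitigation_exists: bool


def eoe_score(r: SuiteResult) -> float:
    """Combine normalized factors into a 0-10 Ease of Exploitation score."""
    skill = min(max(r.skill_required, 0.0), 1.0)  # clamp to [0, 1]
    factors = {
        "vector_coverage": r.vectors_successful / r.vectors_tested if r.vectors_tested else 0.0,
        "consistency": r.successful_attempts / r.attempts if r.attempts else 0.0,
        "low_skill": 1.0 - skill,
        "no_mitigation": 0.0 if r.mitigation_exists else 1.0,
    }
    raw = sum(WEIGHTS[name] * value for name, value in factors.items())
    return round(10 * raw, 1)  # scale to 0-10, one decimal like the table
```

A model where every vector works every time, with no skill needed and no mitigation, scores 10.0; one where nothing works and a mitigation exists scores 0.0. A weighted sum keeps each factor's contribution easy to read off, at the cost of ignoring interactions between factors.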
| Rank | Model | EoE Score | Primary Vector | Status | Last Tested |
|---|---|---|---|---|---|
| #01 | Llama-3-70b-Instruct | 9.2 / 10 | Multilingual Obfuscation | VULNERABLE | 2024-01-18 |
| #02 | Qwen2-72B-Instruct | 8.7 / 10 | DAN 15.0 Variant | VULNERABLE | 2024-01-17 |
| #03 | Mistral-Large | 7.8 / 10 | Roleplay / DAN 14.0 | PARTIAL PATCH | 2024-01-19 |
| #04 | Command R+ | 7.1 / 10 | Code Interpreter Abuse | PARTIAL PATCH | 2024-01-16 |
| #05 | Phi-3-medium | 6.4 / 10 | Base64 Payload Encoding | PARTIAL PATCH | 2024-01-15 |
| #06 | Gemini 1.5 Pro | 4.2 / 10 | Long Context Injection | PARTIAL PATCH | 2024-01-18 |
| #07 | GPT-4o | 2.1 / 10 | Image-based Injection | HARDENED | 2024-01-19 |
| #08 | GPT-4 Turbo | 1.9 / 10 | Token Manipulation | HARDENED | 2024-01-17 |
| #09 | Claude 3.5 Sonnet | 1.5 / 10 | ASCII Art Prompts | HARDENED | 2024-01-19 |
| #10 | Claude 3 Opus | 1.2 / 10 | N/A (no known vector) | HARDENED | 2024-01-19 |
Common Attack Vectors
DAN / Roleplay
Variants of the "Do Anything Now" (DAN) prompt convince the model to adopt an alternate persona that ignores its safety training. DAN 15.0 still works on several open-source models.
Affected: 23 models
Multilingual Obfuscation
Requests are written in low-resource languages or transliterated scripts. Safety training often does not generalize well to these languages.
Affected: 31 models
Base64 / Encoding
Harmful requests are encoded in base64, ROT13, or similar schemes. The model decodes the request and follows it without triggering its safety filters.
Affected: 18 models
Image-based Injection
Instructions are embedded in images sent to multimodal models. Text the model reads out of the image, or purely visual prompts, can bypass safety layers that only inspect text input.
Affected: 8 models
About This Data
We started tracking systematically in March 2023. Data from before that was reconstructed from public disclosures and archived Reddit threads. If you have findings we're missing, hit us up. Attribution is guaranteed.
Models are retested after vendor patches. If the EoE score drops, we update it; if it doesn't, we log the patch attempt as ineffective. No cap: some vendors have "patched" the same vulnerability three times and it still works.
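The retest rule above can be sketched as a small bookkeeping function. This is a minimal illustration under stated assumptions: `LeaderboardEntry`, `record_retest` and the model name used below are hypothetical, and a score that rises after a patch is treated the same as no drop, since the rule only covers those two cases.

```python
from dataclasses import dataclass


@dataclass
class LeaderboardEntry:
    """Hypothetical leaderboard row: current EoE plus a count of failed patches."""
    model: str
    eoe: float
    ineffective_patches: int = 0


def record_retest(entry: LeaderboardEntry, new_eoe: float) -> bool:
    """Apply the post-patch retest rule; return True if the patch was effective."""
    if new_eoe < entry.eoe:
        entry.eoe = new_eoe  # score dropped: update the board
        return True
    # No drop (assumption: a rise counts the same): log the attempt as ineffective.
    entry.ineffective_patches += 1
    return False


# Usage: three "patches" that change nothing, then one that works.
entry = LeaderboardEntry("example-model", 7.8)  # hypothetical model name
for retest_score in (7.8, 7.9, 7.8):
    record_retest(entry, retest_score)
print(entry.ineffective_patches)  # 3
record_retest(entry, 5.0)
print(entry.eoe)  # 5.0
```

Keeping a per-model counter makes the "patched three times and it still works" cases easy to surface.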