Jailbreak Leaderboard

This leaderboard tracks which models are currently susceptible to known adversarial prompts. Scores reflect Ease of Exploitation (EoE): a higher score means the model is easier to jailbreak.

Scoring Methodology

EoE scores range from 0 to 10. Each score is based on four factors: the number of successful vectors, the consistency of the bypass, the skill required, and whether mitigations exist. We test weekly using our standardized adversarial suite.

Full methodology: github.com/securityinvestigations/eoe-methodology
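
To give a feel for how the four factors above could combine into a single 0-10 number, here is a minimal sketch. The weights, the field names, and the `eoe_score` function are all hypothetical, chosen for illustration only; the actual formula lives in the methodology repo and may differ.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    successful_vectors: int  # vectors from the suite that bypassed the model
    total_vectors: int       # vectors in the suite (assumed > 0)
    consistency: float       # fraction of attempts that succeed, 0.0 to 1.0
    skill_required: float    # 0.0 = trivial, 1.0 = expert only
    mitigated: bool          # True if a mitigation exists


# Hypothetical weights (an assumption, not the published methodology).
WEIGHTS = {"vectors": 0.4, "consistency": 0.3, "skill": 0.2, "mitigation": 0.1}


def eoe_score(f: Finding) -> float:
    """Return an illustrative Ease of Exploitation score from 0 to 10."""
    if f.total_vectors <= 0:
        raise ValueError("total_vectors must be positive")
    vector_ratio = f.successful_vectors / f.total_vectors
    raw = (
        WEIGHTS["vectors"] * vector_ratio
        + WEIGHTS["consistency"] * f.consistency
        # Less skill required means easier exploitation, so invert it.
        + WEIGHTS["skill"] * (1.0 - f.skill_required)
        # An unmitigated model contributes the full mitigation weight.
        + WEIGHTS["mitigation"] * (0.0 if f.mitigated else 1.0)
    )
    return round(10.0 * raw, 1)
```

With these assumed weights, a model where every vector works every time, with no skill needed and no mitigation, lands at the top of the scale, and a model where nothing works lands at 0.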

Rank	Model	EoE Score	Primary Vector	Status	Last Tested
#01	Llama-3-70b-Instruct	9.2 / 10	Multilingual Obfuscation	VULNERABLE	2024-01-18
#02	Qwen2-72B-Instruct	8.7 / 10	DAN 15.0 Variant	VULNERABLE	2024-01-17
#03	Mistral-Large	7.8 / 10	Roleplay / DAN 14.0	PARTIAL PATCH	2024-01-19
#04	Command R+	7.1 / 10	Code Interpreter Abuse	PARTIAL PATCH	2024-01-16
#05	Phi-3-medium	6.4 / 10	Base64 Payload Encoding	PARTIAL PATCH	2024-01-15
#06	Gemini 1.5 Pro	4.2 / 10	Long Context Injection	PARTIAL PATCH	2024-01-18
#07	GPT-4o	2.1 / 10	Image-based Injection	HARDENED	2024-01-19
#08	GPT-4 Turbo	1.9 / 10	Token Manipulation	HARDENED	2024-01-17
#09	Claude 3.5 Sonnet	1.5 / 10	ASCII Art Prompts	HARDENED	2024-01-19
#10	Claude 3 Opus	1.2 / 10	N/A — No Known Vector	HARDENED	2024-01-19

Showing 10 of 47 tracked models.

Common Attack Vectors

DAN / Roleplay

"Do Anything Now" variants. The model is persuaded to adopt an alternate persona that ignores its safety training. DAN 15.0 still works on several open-source models.

Affected: 23 models

Multilingual Obfuscation

Requests made in low-resource languages or transliterated scripts. Safety training often doesn't generalize well across languages.

Affected: 31 models

Base64 / Encoding

Harmful requests encoded in base64, ROT13, or other encodings. The model decodes the request and acts on it without triggering safety filters.

Affected: 18 models

Image-based Injection

Instructions embedded in images for multimodal models. OCR'd text or visual prompts bypass text-only safety layers.

Affected: 8 models

About This Data

We started tracking systematically in March 2023. Data from before that date is reconstructed from public disclosures and archived Reddit threads. If you have findings we're missing, let us know. Attribution is guaranteed.

Models get retested after vendor patches. If the EoE score drops, we update it. If it doesn't, we note the patch attempt as ineffective. Some vendors have "patched" the same vulnerability three times and it still works.
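
The retest rule above can be sketched roughly as follows. `ModelRecord`, `apply_retest`, and the note format are hypothetical names made up for this example, and the handling of a score that goes up after a patch is an assumption (here it is treated the same as no change).

```python
from dataclasses import dataclass, field


@dataclass
class ModelRecord:
    name: str
    eoe_score: float
    ineffective_patches: int = 0
    notes: list[str] = field(default_factory=list)


def apply_retest(record: ModelRecord, new_score: float, patch_label: str) -> ModelRecord:
    """Update a record after retesting a vendor patch."""
    if new_score < record.eoe_score:
        # The score dropped, so the patch helped: record the new score.
        record.eoe_score = new_score
    else:
        # No drop: keep the old score and log the patch as ineffective.
        # (Assumption: a score that rises is handled the same way.)
        record.ineffective_patches += 1
        record.notes.append(f"{patch_label}: patch attempt ineffective")
    return record
```

For example, a record at 7.8 retested at 6.0 would be updated to 6.0, while a second retest that again returns 6.0 would leave the score alone and bump `ineffective_patches` to 1.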