Adversarial test suite targeting a research assistant with grandma-exploit-style attacks, with a rubric and triage flow.
Adversarial test suite targeting a research assistant with hypothetical-world-framing-style attacks, with a rubric and triage flow.
Adversarial test suite targeting a research assistant with reverse-psychology-refusal-style attacks, with a rubric and triage flow.
Adversarial test suite targeting a research assistant with translation-smuggling-style attacks, with a rubric and triage flow.
Adversarial test suite targeting a research assistant with fictional-character-persona-style attacks, with a rubric and triage flow.
Adversarial test suite targeting a research assistant with 'you are no longer Claude'-style attacks, with a rubric and triage flow.
Adversarial test suite targeting a research assistant with 'ignore previous instructions'-style attacks, with a rubric and triage flow.
Adversarial test suite targeting a research assistant with markdown-comment-smuggling-style attacks, with a rubric and triage flow.
Adversarial test suite targeting a research assistant with role-reversal (user-as-assistant)-style attacks, with a rubric and triage flow.
Adversarial test suite targeting a research assistant with chained-encoding (ROT13 inside base64)-style attacks, with a rubric and triage flow.
Adversarial test suite targeting a medical intake triage bot with 'you are no longer Claude'-style attacks, with a rubric and triage flow.
Adversarial test suite targeting a medical intake triage bot with 'ignore previous instructions'-style attacks, with a rubric and triage flow.