Layered defense design for a coding copilot deployment against recursive self-instruction attacks, using output content filter on DeepSeek-V3.
Layered defense design for a coding copilot deployment against indirect injection attacks via RAG documents, using output content filter on Claude 3.7 Sonnet.
Layered defense design for a coding copilot deployment against role-play jailbreak attacks, using dual-LLM architecture on o3.
Layered defense design for a coding copilot deployment against multi-turn manipulation attacks, using dual-LLM architecture on Llama 3.3 70B.
Layered defense design for a coding copilot deployment against tool-use hijacking attacks, using constitutional AI critique on Claude 4 Sonnet.
Layered defense design for a coding copilot deployment against prompt leaking attacks, using constitutional AI critique on o3-mini.
Layered defense design for a coding copilot deployment against DAN-style persona attacks, using canary tokens in system prompt on Llama 3.1 405B.
Layered defense design for a coding copilot deployment against markdown image exfiltration attacks, using canary tokens in system prompt on Claude 4.5 Sonnet.
Layered defense design for a coding copilot deployment against instruction smuggling attacks via URLs, using privilege separation between tool tiers on Command R+.
Layered defense design for a coding copilot deployment against invisible text injection attacks (zero-width chars), using privilege separation between tool tiers on Mistral Large.
Layered defense design for a coding copilot deployment against memory poisoning attacks, using re-prompting with quoted user input on Claude Haiku 4.
Layered defense design for a coding copilot deployment against recursive self-instruction attacks, using re-prompting with quoted user input on GPT-4o.