Inside Claude Code: Security Layer
Claude Code has a --dangerously-skip-permissions flag. Also known as YOLO mode. The name is doing a lot of work.
The flag bypasses all permission prompts and lets the AI execute freely. That sounds like the nuclear option. But there's a shadow AI running quietly in the background that most people don't know about.
The YOLO Classifier
utils/permissions/yoloClassifier.ts runs an independent classifier on every action the main model wants to take. The classifier result is binary: block or allow. And crucially, it uses sideQuery, a separate API call that never touches the main conversation, so the main agent can't see or influence its reasoning.
```typescript
const yoloClassifierResponseSchema = z.object({
  thinking: z.string(),
  shouldBlock: z.boolean(),
  reason: z.string(),
});
```

That's the structural defense. The classifier is blind to the main model's output on purpose. Assistant text is deliberately excluded from its context, because model-authored text could be crafted to influence the classifier's decision. What the classifier does see is a compact serialization of the entire session: all user messages and all assistant tool calls. Your whole conversation history, just stripped of assistant prose.
The session's CLAUDE.md is also injected, so actions that align with your stated project intent are judged accordingly.
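That context-building rule can be sketched as a small filter. This is a hypothetical illustration, not the real implementation: the message shapes and the `buildClassifierTranscript` name are invented, but the filtering logic follows the description above.

```typescript
// Keep user messages and assistant *tool calls*; drop assistant prose so the
// main model's text cannot steer the classifier. CLAUDE.md is injected up top.
type SessionMessage =
  | { role: "user"; text: string }
  | { role: "assistant"; text: string }
  | { role: "assistant_tool_call"; tool: string; input: string };

function buildClassifierTranscript(messages: SessionMessage[], claudeMd?: string): string {
  const lines: string[] = [];
  if (claudeMd) lines.push(`[CLAUDE.md]\n${claudeMd}`); // stated project intent
  for (const m of messages) {
    if (m.role === "user") lines.push(`user: ${m.text}`);
    else if (m.role === "assistant_tool_call") lines.push(`tool: ${m.tool}(${m.input})`);
    // assistant prose is deliberately excluded
  }
  return lines.join("\n");
}
```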
Two Stages, Not One
When tengu_auto_mode_config.twoStageClassifier is enabled, the classifier runs in two passes.
Stage 1 is fast. It runs with max_tokens=64 and stop sequences targeting an immediate <block>yes/no</block> response. The prompt tells it to err on the side of blocking. If it allows, the pipeline exits there. If it blocks, it escalates.
Stage 2 runs with max_tokens=4096 and chain-of-thought reasoning. Its whole purpose is reducing false positives from Stage 1's intentional conservatism. Both stages share the same system prompt and transcript, so Stage 2 gets a cache hit on the full prefix. Total latency budget: two API calls.
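The control flow of the two passes can be sketched like this. `classify` and the stage functions are stand-ins for the sideQuery calls, not the real interface; only the shape of the pipeline comes from the description above.

```typescript
type Verdict = { shouldBlock: boolean; reason: string };
type StageFn = (transcript: string, maxTokens: number) => Verdict;

function classify(transcript: string, stage1: StageFn, stage2: StageFn): Verdict {
  // Stage 1: cheap, deliberately conservative pass (max_tokens=64).
  // An allow exits the pipeline immediately.
  const fast = stage1(transcript, 64);
  if (!fast.shouldBlock) return fast;
  // Stage 2: chain-of-thought pass (max_tokens=4096) whose job is to
  // filter Stage 1's false positives. Same prompt + transcript prefix,
  // so the provider can serve it largely from cache.
  return stage2(transcript, 4096);
}
```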
What Skips the Classifier
Not everything goes through it. Two fast paths exist.
Read-only tools like FileRead, Grep, and Glob are on an allowlist and bypass the classifier entirely. They're safe by construction.
The second path is an acceptEdits mode check. Before hitting the classifier, the system re-evaluates whether the action would already be permitted just by accepting file edits within the project directory. If yes, it skips the API call. AgentTool and REPLTool are explicitly excluded from this path because REPL code can contain VM escapes between inner tool calls and the classifier needs to see the glue JavaScript.
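The two fast paths amount to a short-circuit check before the API call. A minimal sketch, with assumed names (`skipsClassifier` and the exact set contents are illustrative):

```typescript
const READ_ONLY_TOOLS = new Set(["FileRead", "Grep", "Glob"]); // safe by construction
const NEVER_FAST_PATH = new Set(["AgentTool", "REPLTool"]);    // classifier must see their glue code

function skipsClassifier(tool: string, allowedByAcceptEdits: boolean): boolean {
  if (READ_ONLY_TOOLS.has(tool)) return true;  // allowlisted, no classifier
  if (NEVER_FAST_PATH.has(tool)) return false; // always classified
  return allowedByAcceptEdits;                 // already-permitted edits skip the API call
}
```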
Fail Closed, Always
When the classifier API errors:
```typescript
// tengu_iron_gate_closed gate defaults to true
if (getFeatureValue_CACHED_WITH_REFRESH('tengu_iron_gate_closed', true, ...)) {
  // Block with retry guidance
} else {
  // Fail open: fall back to normal permission handling
}
```

The iron gate defaults closed. Classifier unavailability means deny. Unparseable response means deny. The gate only opens via remote feature flag. There's no default-open path.
When a transcript grows too large for the classifier's context window, the behavior is different: it falls back to manual prompting. In headless mode, it throws AbortError immediately, because the transcript only grows and retrying would never recover.
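Taken together, the failure handling looks roughly like this dispatch. The names are illustrative; the behavior is a restatement of the rules above under the assumption that "fall back to normal permission handling" means a manual prompt.

```typescript
type FailureKind = "api_error" | "unparseable" | "transcript_too_large";
type Outcome = "deny" | "manual_prompt" | "abort";

function onClassifierFailure(kind: FailureKind, headless: boolean, ironGateClosed = true): Outcome {
  if (kind === "transcript_too_large") {
    // Retrying can never help: the transcript only grows.
    return headless ? "abort" : "manual_prompt";
  }
  // API errors and unparseable responses: the iron gate defaults closed.
  return ironGateClosed ? "deny" : "manual_prompt";
}
```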
The Denial Circuit Breaker
The classifier tracks consecutive and total blocks:
```typescript
export const DENIAL_LIMITS = {
  maxConsecutive: 3,
  maxTotal: 20,
} as const;
```

Three blocked actions in a row, or twenty in total, escalate to manual prompting with a warning. In headless mode the agent is aborted entirely.
Dangerous Allow-Rules Are Stripped
Before entering auto mode, the permission setup strips allow rules that would let the model bypass the classifier. For all users, Bash(python:*), Bash(node:*), and Bash(eval:*) allow-rules are removed on entry. Those prefixes grant arbitrary code execution, which is exactly what the classifier exists to guard against.
For Anthropic-internal users the list is significantly longer. On top of the base set, curl, wget, git, gh, gh api, kubectl, aws, gcloud, gsutil, and a few internal tools are also stripped. The comment explains the threat model: "Network/exfil: gh gist create --public, gh api arbitrary HTTP, curl/wget POST." External users don't get curl or git stripped, because that threat surface is scoped to internal infrastructure.
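The stripping step is just a filter over the configured allow-rules. A minimal sketch, assuming rules are stored as `Bash(prefix:*)` strings as shown in the text (the helper itself is an assumption):

```typescript
// Base set stripped for all users on auto-mode entry; internal users get a
// longer list (curl, wget, git, gh, kubectl, cloud CLIs, ...).
const BASE_STRIPPED_PREFIXES = ["Bash(python:", "Bash(node:", "Bash(eval:"];

function stripDangerousAllowRules(rules: string[], stripped = BASE_STRIPPED_PREFIXES): string[] {
  return rules.filter((rule) => !stripped.some((p) => rule.startsWith(p)));
}
```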
Bypass-Immune Paths
Two tiers of path sensitivity exist, and the blog post I based this on got them backwards, so let me be precise.
The first tier is DANGEROUS_FILES: .gitconfig, .gitmodules, .bashrc, .bash_profile, .zshrc, .zprofile, .profile, .ripgreprc, .mcp.json, .claude.json, plus .git/ directories and IDE configs like .vscode/ and .idea/. These are marked classifierApprovable: true. The classifier can still approve edits to these files. They're sensitive but not unconditionally blocked.
The second tier is the truly bypass-immune set: suspicious Windows path patterns. NTFS Alternate Data Streams, 8.3 short names like GIT~1, long path prefixes (\\?\), trailing dots or spaces (.git.), DOS device names (.git.CON), and UNC paths. These are classifierApprovable: false. The classifier cannot approve them. They always require manual review.
The distinction matters. Sensitive config files get a human-readable flag and classifier review. Obfuscated Windows paths get a hard no regardless of what the classifier thinks.
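The two tiers can be sketched as a single classification function. The pattern lists below are abbreviated from the text and the regexes are illustrative approximations, not the real matchers.

```typescript
const DANGEROUS_FILES = [".gitconfig", ".bashrc", ".zshrc", ".mcp.json", ".claude.json"];

type PathSensitivity =
  | { sensitive: false }
  | { sensitive: true; classifierApprovable: boolean };

function classifyPath(path: string): PathSensitivity {
  // Tier 2: obfuscated Windows path tricks are never classifier-approvable.
  const windowsTricks = [
    /:.+/,       // NTFS alternate data streams (file.txt:stream)
    /~\d/,       // 8.3 short names like GIT~1
    /^\\\\\?\\/, // long-path prefix \\?\
    /[. ]$/,     // trailing dot or space
    /^\\\\/,     // UNC path
  ];
  if (windowsTricks.some((re) => re.test(path))) {
    return { sensitive: true, classifierApprovable: false };
  }
  // Tier 1: sensitive config files, still approvable by the classifier.
  const base = path.split("/").pop() ?? path;
  if (DANGEROUS_FILES.includes(base) || path.includes(".git/")) {
    return { sensitive: true, classifierApprovable: true };
  }
  return { sensitive: false };
}
```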
The Bash Security Checks
tools/BashTool/bashSecurity.ts has 23 distinct rules:
```typescript
const BASH_SECURITY_CHECK_IDS = {
  INCOMPLETE_COMMANDS: 1,     // Commands starting with tab or dash
  JQ_SYSTEM_FUNCTION: 2,      // jq system() function calls
  OBFUSCATED_FLAGS: 4,        // Obfuscated CLI flags
  SHELL_METACHARACTERS: 5,    // Dangerous shell metacharacters
  DANGEROUS_VARIABLES: 6,     // Dangerous environment variable injection
  IFS_INJECTION: 11,          // IFS variable injection
  PROC_ENVIRON_ACCESS: 13,    // /proc/environ access
  CONTROL_CHARACTERS: 17,     // Control characters
  UNICODE_WHITESPACE: 18,     // Unicode whitespace spoofing
  ZSH_DANGEROUS_COMMANDS: 20, // Zsh dangerous commands (zmodload, etc.)
  COMMENT_QUOTE_DESYNC: 22,   // Comment/quote state desync
  QUOTED_NEWLINE: 23,         // Newlines inside quotes
  // ... 23 total
};
```

Rule 18 (UNICODE_WHITESPACE) covers attacks that use zero-width Unicode characters to make what the security checker sees differ from what the shell actually executes. Classic prompt injection dressed up in invisible clothing.
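A minimal sketch of what a rule-18-style check might look for. The character list here is illustrative, not the real rule; it covers a few common invisible or non-standard whitespace codepoints.

```typescript
// NBSP, zero-width space/joiners, word joiner, BOM, soft hyphen.
const SUSPICIOUS_WHITESPACE = /[\u00a0\u200b\u200c\u200d\u2060\ufeff\u00ad]/;

function hasUnicodeWhitespaceSpoofing(command: string): boolean {
  return SUSPICIOUS_WHITESPACE.test(command);
}
```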
Rule 20 covers zmodload in Zsh. That's the gateway to module-based attacks: ztcp for TCP exfiltration, zpty for executing commands via pseudo-terminals, sysopen/syswrite/sysread for low-level file I/O, zf_rm/zf_mv for file operations that bypass binary checks.
The path validator also blocks TOCTOU-style validation gaps. Tilde variants like ~username, ~+ (expands to $PWD), and ~- (expands to $OLDPWD) are rejected because the validator resolves them to ./~username while the shell expands them to /home/username. That gap is exploitable. Shell expansion syntax in file paths ($VAR, $(cmd), %VAR%) is blocked for the same reason. UNC network paths like \\server\share are blocked to prevent credential leaking via SMB.
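Those rejections can be sketched as a few pattern checks. The function name, return shape, and exact regexes are assumptions for illustration:

```typescript
// Returns the rejection reason, or null if the path is accepted.
function rejectPath(path: string): string | null {
  if (/^~[^/]/.test(path)) return "tilde variant";          // ~username, ~+, ~- expand differently in the shell
  if (/\$[A-Za-z_{(]/.test(path)) return "shell expansion"; // $VAR, ${VAR}, $(cmd)
  if (/%[A-Za-z_]+%/.test(path)) return "shell expansion";  // Windows %VAR%
  if (/^\\\\/.test(path)) return "UNC path";                // \\server\share → SMB credential leak
  return null;
}
```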
The Takeaway
A single tool call in YOLO mode passes through at least five checkpoints:
- Current run mode (Plan / Auto / Bypass)
- User-defined rules in hooks
- YOLO classifier model analysis
- Bash command danger classification
- Rules engine from config files
Multiple permission sources are evaluated and the most restrictive result wins. "Dangerously skip permissions" is the flag name. The actual behavior is closer to "skip the interactive prompts but run everything through an independent AI security review instead."
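The "most restrictive wins" merge can be sketched as a fold over per-source decisions. The decision names and ordering here are illustrative assumptions, not the actual enum:

```typescript
type Decision = "allow" | "ask" | "deny";
const STRICTNESS: Record<Decision, number> = { allow: 0, ask: 1, deny: 2 };

// Fold to the strictest decision any source produced.
function mergeDecisions(decisions: Decision[]): Decision {
  return decisions.reduce(
    (worst, d) => (STRICTNESS[d] > STRICTNESS[worst] ? d : worst),
    "allow" as Decision,
  );
}
```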
The threat modeling here is serious. The name is just good marketing.