Anthropic builds Claude. Claude builds software. Their engineers are, by most accounts, among the best in the industry. When the Claude Code source leaked via an exposed npm source map in March 2026, we did what any code quality tool would do: we ran ratchet scan on it.
The score: 42 out of 100.
Before you close this tab thinking this is a hit piece, it isn't. The point isn't to embarrass Anthropic. The point is this: tech debt is structural, not personal. If the team building one of the most sophisticated AI coding assistants on the planet ships more than 11,000 duplicated lines and 106 console.log calls in production code, your team is in good company.
What is Ratchet? A CLI that scores production code 0–100 across six categories — testing, security, type safety, error handling, performance, and code quality — then autonomously fixes what it finds. No account required, no data leaves your machine, results in under a minute.
The Score Breakdown
| Category | Score | Max | Key Finding |
|---|---|---|---|
| Testing | 0 | 25 | Zero test files in src/; likely a separate test repo |
| Security | 10 | 15 | Zod validation across 199 files, 15 flagged hardcoded secrets |
| Type Safety | 9 | 15 | Only 40 any types in 512K lines, impressive restraint |
| Error Handling | 15 | 20 | 1,915 try/catch blocks, custom error classes throughout |
| Performance | 3 | 10 | 106 console.logs, 18 await-in-loops |
| Code Quality | 5 | 15 | 11,460 duplicate lines, 5,451 long lines, 111 TODOs |
| Total | 42 | 100 | 20,483 issues across 1,891 files |
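The category figures add up to the reported total, which suggests the overall score is a plain sum. A short tally, assuming exactly that (the type and variable names are illustrative, not part of Ratchet), also shows where the missing points went:

```typescript
// Scores and maxima copied from the breakdown table.
// Assumption: the total is a simple sum of category scores.
interface CategoryScore {
  name: string;
  score: number;
  max: number;
}

const categories: CategoryScore[] = [
  { name: "Testing", score: 0, max: 25 },
  { name: "Security", score: 10, max: 15 },
  { name: "Type Safety", score: 9, max: 15 },
  { name: "Error Handling", score: 15, max: 20 },
  { name: "Performance", score: 3, max: 10 },
  { name: "Code Quality", score: 5, max: 15 },
];

const total = categories.reduce((sum, c) => sum + c.score, 0);
const maxTotal = categories.reduce((sum, c) => sum + c.max, 0);

// Points lost per category, largest gap first.
const gaps = categories
  .map((c) => ({ name: c.name, lost: c.max - c.score }))
  .sort((a, b) => b.lost - a.lost);

console.log(`${total}/${maxTotal}`, gaps);
```

Sorted this way, Testing accounts for 25 of the 58 missing points, with Code Quality (10) and Performance (7) next.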
Where Anthropic Actually Excels
Error Handling: 15/20 — Their Strongest Category
1,915 try/catch blocks across the codebase. Custom error classes. Clear defensive patterns around file system operations, API calls, and shell execution. For a tool that runs arbitrary code on user machines, this depth of error handling is the right call — and it shows.
Two empty catch blocks and 423 async functions without explicit error handling are the only real gaps. In a 512K-line codebase, those numbers are surprisingly restrained.
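To make the two gap types concrete, here is a minimal sketch of an empty catch next to explicit handling with a custom error class. Every name in it (ToolExecutionError, flakyOperation, and so on) is hypothetical and not taken from the Claude Code source.

```typescript
// Hypothetical custom error class that keeps the original failure attached.
class ToolExecutionError extends Error {
  readonly original: unknown;

  constructor(message: string, original: unknown) {
    super(message);
    this.name = "ToolExecutionError";
    this.original = original;
  }
}

// Stand-in for any async operation that can fail (file I/O, API call, shell).
async function flakyOperation(shouldFail: boolean): Promise<string> {
  if (shouldFail) throw new Error("disk unavailable");
  return "ok";
}

// The pattern a scanner flags: the failure is swallowed and the caller
// receives undefined with no indication of what went wrong.
async function swallowed(shouldFail: boolean): Promise<string | undefined> {
  try {
    return await flakyOperation(shouldFail);
  } catch {
    // intentionally empty to illustrate the anti-pattern
  }
  return undefined;
}

// Explicit handling: the failure is wrapped with context and rethrown.
async function handled(shouldFail: boolean): Promise<string> {
  try {
    return await flakyOperation(shouldFail);
  } catch (err) {
    throw new ToolExecutionError("flakyOperation failed", err);
  }
}
```

The second form costs a few lines per call site, but the caller can tell a failure apart from an empty result.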
Type Safety + Security: Disciplined Where It Counts
Only 40 uses of any across 512,664 lines of TypeScript. That's roughly one type escape per 12,800 lines. For a team moving this fast, that's remarkable restraint.
Zod validation across 199 files. Auth middleware. Rate limiting. The 9,409-line permission system alone demonstrates that security wasn't an afterthought. The 15 flagged "hardcoded secrets" are almost certainly test fixtures — but the scanner correctly flags them.
The takeaway: When you build a tool that executes real shell commands on real machines, you think hard about what happens when things go wrong. Anthropic did. Their error handling and security discipline are genuinely above average.
Where 42 Comes From
Testing: 0/25 — The Big Asterisk
The single biggest contributor to a 42 score is a 0 on testing — a 25-point penalty. But there's important context: Ratchet scanned the src/ directory extracted from the npm bundle. Tests almost certainly live in a separate directory or repository not included in the distributed package.
What ships is not what developers work with. Ratchet's finding is accurate for the tree it was given: that tree contains no test files. If you scan production artifacts instead of your full repo root, you'll see the same pattern.
For your own scans: Run ratchet scan on your full repo root, not just the src/ dir, to get an accurate testing score.
Performance: 3/10 — The Honest Problem
106 console.log calls in production TypeScript. In a mature product used by millions of developers, these should be structured logging — levels, contexts, machine-parseable output. The most likely explanation: rapid prototyping that shipped. When you're moving fast on a competitive product, debug logging that works doesn't get cleaned up.
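As a rough illustration of what "levels, contexts, machine-parseable output" can look like, here is a minimal structured logger sketch. It is not how Claude Code or Ratchet handles logging; createLogger and the other names are hypothetical.

```typescript
// Minimal structured logger sketch. All names here are hypothetical.
type Level = "debug" | "info" | "warn" | "error";

const severity: Record<Level, number> = { debug: 10, info: 20, warn: 30, error: 40 };

type Context = Record<string, unknown>;
type Sink = (line: string) => void;

// Returns a log function that drops entries below minLevel and writes
// everything else to the sink as one JSON object per line.
function createLogger(minLevel: Level, sink: Sink) {
  return function log(level: Level, msg: string, context: Context = {}): void {
    if (severity[level] < severity[minLevel]) return;
    sink(JSON.stringify({ time: new Date().toISOString(), level, msg, context }));
  };
}

// Instead of: console.log("tool finished", name, ms)
const log = createLogger("info", (line) => console.log(line));
log("info", "tool finished", { tool: "grep", durationMs: 42 });
log("debug", "cache probe", { hit: true }); // dropped: below the "info" threshold
```

Passing the sink in as a parameter keeps the output destination swappable: stdout in development, a file or collector in production, an array in tests.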
18 await-in-loop patterns serialize what could run in parallel. In an agent that constantly makes API calls, tool calls, and file system operations, serialized loops represent latency users feel in every interaction.
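The await-in-loop pattern and its usual fix can be sketched as follows. fetchItem is a hypothetical stand-in for an API or file system call. The rewrite is only safe when the iterations do not depend on one another.

```typescript
// Hypothetical stand-in for an API call or file read that takes ~20 ms.
const delay = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function fetchItem(id: number): Promise<number> {
  await delay(20);
  return id * 2;
}

// Flagged pattern: each call waits for the previous one to finish,
// so total latency grows linearly with the number of ids.
async function sequential(ids: number[]): Promise<number[]> {
  const results: number[] = [];
  for (const id of ids) {
    results.push(await fetchItem(id));
  }
  return results;
}

// All calls start immediately; Promise.all resolves once every call has
// finished and returns results in the same order as the input.
async function parallel(ids: number[]): Promise<number[]> {
  return Promise.all(ids.map((id) => fetchItem(id)));
}
```

With three ids, sequential waits for three delays back to back while parallel waits for roughly one. When the calls hit a rate-limited service, a bounded concurrency pool is the safer middle ground.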
Code Quality: 5/15 — The Duplication Problem
11,460 repeated lines. This is the most structurally interesting finding. More than 11,000 lines across the codebase are near-verbatim duplicates — a signal that copy-paste drove implementation rather than shared abstractions.
When you have 43 tools, each implementing its own version of the same validation logic, the same configuration parsing, the same UI scaffold — and no one has time to abstract it because the next tool needs to ship tomorrow — this is what the codebase looks like. It's not laziness. It's compounding velocity pressure.
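One shape the missing abstraction could take is a shared factory that owns argument validation, so each tool only supplies its own body. This is a sketch under that assumption; defineTool, requireString, and the two example tools are hypothetical and do not reflect Claude Code's actual tool interface.

```typescript
// All names below are hypothetical.
type Input = Record<string, unknown>;
type Args = Record<string, string>;

// The validation every tool would otherwise copy and paste.
function requireString(input: Input, key: string): string {
  const value = input[key];
  if (typeof value !== "string" || value.length === 0) {
    throw new Error(`"${key}" must be a non-empty string`);
  }
  return value;
}

interface Tool {
  name: string;
  run(input: Input): string;
}

// Shared scaffold: validates the required keys once, then hands the
// checked arguments to the tool-specific body.
function defineTool(name: string, requiredKeys: string[], body: (args: Args) => string): Tool {
  return {
    name,
    run(input: Input): string {
      const args: Args = {};
      for (const key of requiredKeys) {
        args[key] = requireString(input, key);
      }
      return body(args);
    },
  };
}

const readTool = defineTool("read", ["path"], (a) => `read ${a.path}`);
const grepTool = defineTool("grep", ["pattern", "path"], (a) => `grep ${a.pattern} in ${a.path}`);
```

A new tool then costs one defineTool call rather than another copy of the validation block, and a fix to requireString reaches every tool at once.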
Worst 5 Subsystems
Ratchet scanned 21 subsystems. Scores ranged from 39 to 55. Here are the five lowest — the subsystems where debt concentrated hardest.
| Subsystem | Score | Files | Weakest Category | Worst Finding |
|---|---|---|---|---|
| types/ | 39 | 11 | Type Safety: 5/15 | 32 of 40 total any usages live here — type escape hatches propagate imprecision to every consumer |
| screens/ | 40 | 3 | Security: 5/15 | Missing auth controls, weak error handling (7/20) — looks like a late addition with less review |
| plugins/ | 41 | 2 | Security: 2/15 | No auth, no validation — experimental stub built without perimeter controls |
| voice/ | 41 | 1 | Security: 2/15 | Single file, no auth middleware — another experimental module shipped without hardening |
| commands/ | 41.5 | 189 | Error Handling: 8.5/20 | 189 files, lowest error handling of any major subsystem — CLI commands failing silently |
The pattern: the highest-scoring modules (bridge, keybindings — both 55) are small, focused interface layers. The lowest are either large surface-area modules (commands: 189 files) or experimental stubs (voice, plugins) shipped without the same rigor as core modules. Quality correlates with the criticality of the boundary being managed.
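The types/ finding is worth a concrete illustration. A value typed any silences the compiler for everything derived from it, while unknown forces a check at the boundary. The parser and Config shape below are hypothetical.

```typescript
// Hypothetical parsers: same runtime behavior, different static types.
function parseLoose(json: string): any {
  return JSON.parse(json);
}

function parseStrict(json: string): unknown {
  return JSON.parse(json);
}

interface Config {
  retries: number;
}

// Type guard that narrows unknown to Config after a runtime check.
function isConfig(value: unknown): value is Config {
  return (
    typeof value === "object" &&
    value !== null &&
    typeof (value as { retries?: unknown }).retries === "number"
  );
}

// With any, a misspelled property compiles and is undefined at runtime.
const loose = parseLoose('{"retries": 3}');
const typo = loose.retrys;

// With unknown, the value cannot be used until it has been narrowed.
const strict = parseStrict('{"retries": 3}');
const retries = isConfig(strict) ? strict.retries : 0;
```

Here typo is undefined and nothing in the toolchain complains, which is the imprecision that spreads to consumers. The unknown version surfaces the same mistake at compile time.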
What 42 Actually Means
A 42 puts Claude Code in the range typical of large, fast-moving production systems built by competent teams under competitive pressure. The bottom quartile (0–25) is where Ratchet finds real problems: empty catches everywhere, no type safety, hardcoded secrets in live paths. Claude Code isn't in that bucket.
Claude Code at 42 tells a specific story: a team that prioritized correctness and security (error handling, Zod validation, the 9,409-line permission system) over maintainability and hygiene (11K duplicated lines, console.logs, long lines). That's a reasonable set of tradeoffs for a product launching under competitive pressure.
The comparison that matters isn't Claude Code vs. some theoretical perfect codebase. It's Claude Code vs. your codebase. Run the scan. Find out your number. Then decide which tradeoffs you want to make differently.
What's your number?
Run Ratchet on your codebase. No account required, no data leaves your machine, results in under a minute.