Ratchet Scan · March 31, 2026

Claude Code Scored 42 out of 100

20,483 issues. Zero test files. 11,460 duplicate lines. 106 console.log calls in production. We scanned Anthropic's leaked Claude Code source with Ratchet. Here's every finding.

512,664 lines · 1,891 files · 20,483 issues · Ratchet v1.1.1
42/100 Overall Score
0/25 Testing
20,483 Total Issues
11,460 Duplicate Lines
1,915 Try/Catch Blocks
106 console.log calls

Anthropic builds Claude. Claude builds software. Their engineers are, by most accounts, among the best in the industry. So when the Claude Code source leaked via an exposed npm source map in March 2026, we did what anyone who builds a code quality tool would do: we ran ratchet scan on it.

The score: 42 out of 100.

Before you close this tab thinking it's a hit piece — it's not. The point isn't to embarrass Anthropic. The point is this: tech debt is structural, not personal. If the team building one of the most sophisticated AI coding assistants on the planet ships 11,000 duplicated lines and 106 console.log calls in production code, your team is in good company.

What is Ratchet? A CLI that scores production code 0–100 across six categories — testing, security, type safety, error handling, performance, and code quality — then autonomously fixes what it finds. No account required, no data leaves your machine, results in under a minute.

The Score Breakdown

Category        Score   Key Finding
Testing          0/25   Zero test files in src/ — likely a separate test repo
Security        10/15   Zod validation across 199 files; 15 flagged hardcoded secrets
Type Safety      9/15   Only 40 any types in 512K lines — impressive restraint
Error Handling  15/20   1,915 try/catch blocks, custom error classes throughout
Performance      3/10   106 console.logs, 18 await-in-loops
Code Quality     5/15   11,460 duplicate lines, 5,451 long lines, 111 TODOs
Total           42/100  20,483 issues across 1,891 files

Where Anthropic Actually Excels

Error Handling: 15/20 — Their Strongest Category

1,915 try/catch blocks across the codebase. Custom error classes. Clear defensive patterns around file system operations, API calls, and shell execution. For a tool that runs arbitrary code on user machines, this depth of error handling is the right call — and it shows.

Two empty catch blocks and 423 async functions without explicit error handling are the only real gaps. In a 512K-line codebase, those numbers are surprisingly restrained.

Type Safety + Security: Disciplined Where It Counts

Only 40 uses of any across 512,664 lines of TypeScript. That's one type escape per roughly 12,800 lines. For a team moving this fast, that's remarkable restraint.

Zod validation across 199 files. Auth middleware. Rate limiting. The 9,409-line permission system alone demonstrates that security wasn't an afterthought. The 15 flagged "hardcoded secrets" are almost certainly test fixtures — but the scanner correctly flags them.

The takeaway: When you build a tool that executes real shell commands on real machines, you think hard about what happens when things go wrong. Anthropic did. Their error handling and security discipline are genuinely above average.

Where 42 Comes From

Testing: 0/25 — The Big Asterisk

The single biggest contributor to a 42 score is a 0 on testing — a 25-point penalty. But there's important context: Ratchet scanned the src/ directory extracted from the npm bundle. Tests almost certainly live in a separate directory or repository not included in the distributed package.

What ships is not what developers work with. Ratchet flags this accurately. If you scan production artifacts instead of your full repo root, you'll see the same pattern.

For your own scans: Run ratchet scan on your full repo root, not just the src/ dir, to get an accurate testing score.

Performance: 3/10 — The Honest Problem

106 console.log calls in production TypeScript. In a mature product used by millions of developers, these should be structured logging — levels, contexts, machine-parseable output. The most likely explanation: rapid prototyping that shipped. When you're moving fast on a competitive product, debug logging that works doesn't get cleaned up.
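
What "structured logging" means in practice can be sketched in a few lines. This is an assumed design, not Anthropic's actual logger — createLogger and the field names are illustrative:

```typescript
// Minimal structured logger sketch: leveled, contextual, machine-parseable
// JSON lines instead of bare console.log calls.
type Level = "debug" | "info" | "warn" | "error";

function createLogger(context: Record<string, unknown>) {
  // Returns the formatted line so callers can route it anywhere;
  // a real logger would write to stderr or a sink instead.
  const emit = (
    level: Level,
    msg: string,
    fields: Record<string, unknown> = {},
  ): string =>
    JSON.stringify({ ts: new Date().toISOString(), level, msg, ...context, ...fields });

  return {
    debug: (msg: string, fields?: Record<string, unknown>) => emit("debug", msg, fields),
    info: (msg: string, fields?: Record<string, unknown>) => emit("info", msg, fields),
    warn: (msg: string, fields?: Record<string, unknown>) => emit("warn", msg, fields),
    error: (msg: string, fields?: Record<string, unknown>) => emit("error", msg, fields),
  };
}

// "tool-runner" is a hypothetical component name.
const log = createLogger({ component: "tool-runner" });
const line = log.info("tool finished", { tool: "bash", durationMs: 412 });
```

Each emitted line parses back into an object, so a log pipeline can filter by level or component instead of grepping free-form text.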

18 await-in-loop patterns serialize what could run in parallel. In an agent that constantly makes API calls, tool calls, and file system operations, serialized loops represent latency users feel in every interaction.
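
The serialized shape Ratchet flags, and the standard fix for independent operations, look like this (illustrative TypeScript, not code from the leak):

```typescript
// Stand-in for an API or file system call that takes ~20ms.
const fetchItem = (id: number): Promise<number> =>
  new Promise<number>((resolve) => setTimeout(() => resolve(id * 2), 20));

// Flagged pattern: each await blocks the next iteration (total ≈ n × 20ms).
async function serial(ids: number[]): Promise<number[]> {
  const out: number[] = [];
  for (const id of ids) {
    out.push(await fetchItem(id)); // await inside the loop serializes the work
  }
  return out;
}

// The fix when iterations are independent: start everything, await once
// (total ≈ 20ms regardless of n).
async function parallel(ids: number[]): Promise<number[]> {
  return Promise.all(ids.map((id) => fetchItem(id)));
}
```

The caveat, of course, is that the fix only applies when iterations don't depend on each other — which is exactly why these 18 sites need human (or agent) review rather than a blind rewrite.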

Code Quality: 5/15 — The Duplication Problem

11,460 repeated lines. This is the most structurally interesting finding. More than 11,000 lines across the codebase are near-verbatim duplicates — a signal that copy-paste drove implementation rather than shared abstractions.

When you have 43 tools, each implementing its own version of the same validation logic, the same configuration parsing, the same UI scaffold — and no one has time to abstract it because the next tool needs to ship tomorrow — this is what the codebase looks like. It's not laziness. It's compounding velocity pressure.
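
The fix is the boring one: a shared helper every tool imports, instead of dozens of private copies of the same checks. A sketch with hypothetical names — ToolInput and validateToolInput are illustrative, not identifiers from the leak:

```typescript
// Shared input contract that each of the ~43 tools would reuse
// instead of re-declaring its own near-identical version.
interface ToolInput {
  path?: string;
  timeoutMs?: number;
}

// One abstraction, imported everywhere: fix a validation bug here
// and every tool gets the fix, instead of patching 43 copies.
function validateToolInput(input: ToolInput): string[] {
  const errors: string[] = [];
  if (input.path !== undefined && input.path.includes("..")) {
    errors.push("path must not traverse upward");
  }
  if (
    input.timeoutMs !== undefined &&
    (input.timeoutMs <= 0 || input.timeoutMs > 600_000)
  ) {
    errors.push("timeoutMs must be between 1 and 600000");
  }
  return errors;
}
```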

0 test files in src/ · 106 console.log calls · 11,460 duplicate lines · 111 TODOs left behind

Worst 5 Subsystems

Ratchet scanned 21 subsystems. Scores ranged from 39 to 55. Here are the five lowest — the subsystems where debt concentrated hardest.

Subsystem   Score  Files  Weakest Category        Worst Finding
types/         39     11  Type Safety: 5/15       32 of 40 total any usages live here — type escape hatches propagate imprecision to every consumer
screens/       40      3  Security: 5/15          Missing auth controls, weak error handling (7/20) — looks like a late addition with less review
plugins/       41      2  Security: 2/15          No auth, no validation — experimental stub built without perimeter controls
voice/         41      1  Security: 2/15          Single file, no auth middleware — another experimental module shipped without hardening
commands/    41.5    189  Error Handling: 8.5/20  189 files, lowest error handling of any major subsystem — CLI commands failing silently

The pattern: the highest-scoring modules (bridge, keybindings — both 55) are small, focused interface layers. The lowest are either large surface-area modules (commands: 189 files) or experimental stubs (voice, plugins) shipped without the same rigor as core modules. Quality correlates with the criticality of the boundary being managed.

What 42 Actually Means

0–25 Critical
26–40 Poor
41–55 ← Claude Code
56–70 Moderate
71–85 Good
86–100 Excellent

A 42 puts Claude Code in the range typical of large, fast-moving production systems built by competent teams under competitive pressure. The bottom quartile (0–25) is where Ratchet finds real problems: empty catches everywhere, no type safety, hardcoded secrets in live paths. Claude Code isn't in that bucket.

Claude Code at 42 tells a specific story: a team that prioritized correctness and security (error handling, Zod validation, the 9,409-line permission system) over maintainability and hygiene (11K duplicated lines, console.logs, long lines). That's a reasonable set of tradeoffs for a product launching under competitive pressure.

The comparison that matters isn't Claude Code vs. some theoretical perfect codebase. It's Claude Code vs. your codebase. Run the scan. Find out your number. Then decide which tradeoffs you want to make differently.

Free to Start

What's your number?

Run Ratchet on your codebase. No account required, no data leaves your machine, results in under a minute.

$ npm install -g ratchet-run && ratchet scan

Get Started — It's Free →