⚙ Case Study · Dogfooding

How Ratchet Improved Itself: 74 → 98

We ran Ratchet on Ratchet's own source code. Here's the real trajectory — including bugs we found in the scanner itself, the false positives it was generating, and the architect-level cleanup that got us to 98.

74 — Starting score
98 — Final score (v1.1.0)
12 days — Mar 13–25, 2026
5 rollbacks auto-caught
1,725 tests passing
Scanner bugs found in ourselves

$ the full timeline

Six inflection points over 12 days. The dip at days 5–9 is the interesting one.

Mar 13 — Baseline: 74 (start)
Mar 16 — Pino + rate limiters: 83 (+9)
Mar 17 — Auth DRY + errors: 85 (+2)
Mar 22–23 — Scanner fixed: 86 (+1)
Mar 24 — Webhook + file classifier: 93 (+7)
Mar 25 — Architect cleanup: 98 (+5, current)

Scale: < 75 — Needs work · 75–89 — Good · 90+ — Strong

+24 — Points gained (74→98)
1,725 — Tests passing (89 files)
5 — Rollbacks auto-caught
567 — Duplicated lines eliminated

The Baseline: 74/100

First commit. We ran ratchet scan . on Ratchet's own source directory. Score: 74 out of 100.

The scan flagged what you'd expect from a fast-moving early codebase: overly broad rate limiters that treated all endpoints identically, unstructured console.* calls throughout the server, and a pattern-matching approach to security scanning that used regex without AST confirmation.

The score was accurate. That was the point — the tool wasn't going to flatter itself.

ratchet scan . — Mar 13
Ratchet Code Quality Scan
=========================
Scanning ./ (ratchet source)
Parsed TypeScript files
Type checked (tsc --noEmit)
Running detectors...

Score Breakdown
---------------
🔒 Security       11/15
📝 TypeSafety     12/15
⚠️ ErrorHandling  14/20
⚡ Performance     9/10
📖 CodeQuality    11/15
🧪 Testing        17/25

Quality Score: 74 / 100

Top issues:
  [rate-limiter] overly broad — all routes same limit
  [logging] unstructured console.* calls
  [security] unconfirmed regex patterns

Structured Logging + Rate Limiters: 74 → 83

Two passes drove the first big jump. First: migrating all console.* calls to Pino — structured, leveled, machine-readable. A central logger.ts module, consistent log levels across the server.
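The Pino migration itself is mostly configuration, but the shape that replaced the bare console.* calls can be sketched without the dependency. The sketch below is illustrative only, not Ratchet's actual logger.ts: a single shared instance, a level threshold, and one JSON line per event.

```typescript
// Hand-rolled stand-in for a structured, leveled logger (the real
// migration used Pino). Shows the shape that replaced console.* calls:
// machine-readable JSON lines, filtered by a minimum level.
type Level = "debug" | "info" | "warn" | "error";
const LEVELS: Record<Level, number> = { debug: 10, info: 20, warn: 30, error: 40 };

function createLogger(
  minLevel: Level = "info",
  sink: (line: string) => void = console.log,
) {
  const log = (level: Level, msg: string, ctx: Record<string, unknown> = {}) => {
    if (LEVELS[level] < LEVELS[minLevel]) return; // below threshold: drop
    sink(JSON.stringify({ level, msg, time: Date.now(), ...ctx }));
  };
  return {
    debug: (msg: string, ctx?: Record<string, unknown>) => log("debug", msg, ctx),
    info: (msg: string, ctx?: Record<string, unknown>) => log("info", msg, ctx),
    warn: (msg: string, ctx?: Record<string, unknown>) => log("warn", msg, ctx),
    error: (msg: string, ctx?: Record<string, unknown>) => log("error", msg, ctx),
  };
}
```

The payoff over console.log is that every line carries structured context (route, request id, whatever the caller attaches) that a log aggregator can query.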

Second: the rate limiter was applying a single broad limit to every route. Authentication endpoints, scan endpoints, and webhook endpoints all behave differently under load. We split the limiters by domain — stricter on auth, more permissive on read-heavy scan endpoints.
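The split can be sketched as a plain fixed-window counter keyed by domain. The auth and scan limits below match the Mar 16 run log; the webhook cap of 300 req/min and the implementation itself are assumptions for illustration, not the server's actual middleware.

```typescript
// Illustrative per-domain fixed-window rate limiter. auth: 20 req/min and
// scan: 60 req/min come from the case study; webhook: 300 req/min is an
// assumed "burst-tolerant" value.
type Domain = "auth" | "scan" | "webhook";
const LIMITS: Record<Domain, number> = { auth: 20, scan: 60, webhook: 300 };

function makeLimiter(now: () => number = Date.now) {
  const windows = new Map<string, { start: number; count: number }>();
  return (domain: Domain, clientId: string): boolean => {
    const key = `${domain}:${clientId}`;
    const t = now();
    const w = windows.get(key);
    if (!w || t - w.start >= 60_000) {
      windows.set(key, { start: t, count: 1 }); // open a fresh 1-minute window
      return true;
    }
    w.count += 1;
    return w.count <= LIMITS[domain]; // reject once past the domain's cap
  };
}
```

Because the key combines domain and client, a client exhausting its auth budget can still hit read-heavy scan endpoints.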

Both were real problems the tool correctly identified in itself. Neither fix was glamorous. Both moved the needle.

ratchet improve — Mar 16
Improvement: Structured logging
-------------------------------
- Migrate console.* → pino logger
- Create src/logger.ts (shared instance)
- Update log calls across server files
Applied · Tests passing
Score: 74 → 79 (+5)

Improvement: Route-aware rate limiting
--------------------------------------
- Split global limiter → per-domain limits
  - auth: strict (20 req/min)
  - scan: moderate (60 req/min)
  - webhooks: burst-tolerant
Applied · Tests passing
Score: 79 → 83 (+4)

Auth Utils DRY + Error Handling: 83 → 85

Two targeted improvements. Auth utility functions had grown duplicated across the codebase — token validation logic repeated in multiple handlers rather than centralized. We extracted a shared auth utils module.

The mutation error handler was catching errors but re-throwing them with the original stack lost. Routes that modified state weren't producing useful error context on failure. A structured handleMutationError() helper unified the pattern.
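handleMutationError() is named in the case study but its body isn't shown, so the sketch below is one plausible shape under that name: wrap the original error instead of re-throwing a fresh one, so the underlying stack survives into the structured error.

```typescript
// Assumed shape of a mutation error helper that preserves the original
// stack on re-throw by wrapping rather than replacing the error.
class MutationError extends Error {
  constructor(
    public readonly operation: string,
    public readonly original: Error,
  ) {
    super(`mutation "${operation}" failed: ${original.message}`);
    this.name = "MutationError";
    // Keep the original stack attached instead of losing it on re-throw.
    this.stack = `${this.name}: ${this.message}\ncaused by: ${original.stack}`;
  }
}

function handleMutationError(operation: string, err: unknown): never {
  const original = err instanceof Error ? err : new Error(String(err));
  throw new MutationError(operation, original);
}
```

Route handlers that modify state can then catch and funnel everything through one call, giving every mutation failure the same structured shape.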

Small increments. The kind that compound. The tool is good at finding them — and at applying them without touching unrelated code.

ratchet improve — Mar 17
Improvement: Auth utils DRY
---------------------------
- Extract shared src/utils/auth.ts
- Deduplicate token validation logic
- 3 files updated, 1 new module
Applied · TypeScript clean
Score: 83 → 84 (+1)

Improvement: Mutation error handler
-----------------------------------
- Add handleMutationError() utility
- Preserve stack context on re-throw
- Structured error shape for mutations
Applied · Tests passing
Score: 84 → 85 (+1)

We Found Bugs in the Scanner: 85 → 86

The net change looks small (+1). What happened underneath was not.

The scanner had false positives. Example code in the repository — fake API keys used in documentation and test fixtures — was triggering the security detector. The regex-based patterns couldn't distinguish a literal example string from a real leaked secret.

More critically: the file classifier wasn't excluding documentation directories and test fixtures from production code analysis. Test files were being scored as production coverage, inflating the apparent test/source ratio. When we fixed it, some scores recalibrated downward before the real improvements took hold.
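The classifier's job reduces to a small decision: is this path production code, test code, or excluded entirely? The exclusion list below mirrors the case study (docs/**, fixtures/**, examples/**); the function itself and its name are illustrative.

```typescript
// Illustrative production-vs-fixture file classifier. Exclusion dirs
// mirror the case study; the matching logic is an assumed sketch.
const EXCLUDED_DIRS = ["docs/", "fixtures/", "examples/"];

type FileClass = "production" | "test" | "excluded";

function classifyFile(path: string): FileClass {
  const p = path.replace(/\\/g, "/"); // normalize Windows separators
  if (EXCLUDED_DIRS.some((d) => p.startsWith(d) || p.includes(`/${d}`))) {
    return "excluded"; // never scored, never counted toward coverage
  }
  if (/\.(test|spec)\.[jt]sx?$/.test(p) || p.includes("__tests__/")) {
    return "test"; // counted as tests, not as production coverage
  }
  return "production";
}
```

With this split, the test/source ratio is computed only over production files, which is what recalibrated the apparent coverage downward.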

We replaced naive regex matching with AST confirmation: patterns now require a valid AST node context before firing. The file classifier gained production exclusion rules. Both changes made the tool more honest — and its scores more trustworthy.
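The two-stage gate looks like this in sketch form: a regex only produces a candidate, and a confirmation step must also pass before anything is flagged. In the real scanner that step is an AST lookup of the node the match falls inside; here a passed-in predicate stands in for it, and the key pattern is a made-up example.

```typescript
// Sketch of regex-plus-confirmation secret detection. The `confirm`
// callback stands in for the real AST-context check; the key pattern
// is illustrative, not Ratchet's actual detector.
interface Finding {
  file: string;
  index: number;
  match: string;
}

function detectSecrets(
  file: string,
  source: string,
  confirm: (f: Finding) => boolean, // AST/context confirmation in the real tool
): Finding[] {
  const pattern = /sk-[A-Za-z0-9-]{8,}/g;
  const findings: Finding[] = [];
  for (const m of source.matchAll(pattern)) {
    const candidate = { file, index: m.index ?? 0, match: m[0] };
    if (confirm(candidate)) findings.push(candidate); // regex alone never fires
  }
  return findings;
}
```

The design point is that the regex stage stays cheap and broad, while the confirmation stage carries the precision: a literal in a docs example fails confirmation and never surfaces.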

scanner accuracy overhaul — Mar 22
# False positive found in security scanner

Secret detector firing on example code:
  docs/examples/config.ts:12
  API_KEY = "sk-example-not-real-1234"
Pattern matched but not in production path.

# Fix 1: AST confirmation
- Require node context before flagging
- regex match alone → insufficient
- Must confirm: non-test, non-doc scope

# Fix 2: File classifier
- Exclude: docs/**, fixtures/**, examples/**
- Production code only for scoring
- Test ratio recalculated accurately

False positives eliminated
Score recalibrated (honest baseline)
Score: 85 → 86 (net +1, accuracy: significantly improved)

Webhook Verification + Security Push: 86 → 93

With the scanner now accurate, the remaining security points became visible. The webhook handler was accepting payloads without verifying the HMAC signature — a real security gap, not a false positive.

Adding verifyWebhookSignature() brought the security category from partial to near-complete. The file classifier also picked up additional production exclusion rules, further tightening the accuracy of the production code surface area.
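An HMAC check of this kind is compact. The sketch below uses Node's crypto module; verifyWebhookSignature() is the name from the case study, but the hex encoding and the exact signature format are assumptions. The two essentials are recomputing the digest over the raw body and comparing in constant time.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Sketch of an HMAC-SHA256 webhook signature check. Signature encoding
// (hex) is an assumption; the constant-time comparison is the point.
function verifyWebhookSignature(
  rawBody: string,
  signatureHex: string,
  secret: string,
): boolean {
  const expected = createHmac("sha256", secret).update(rawBody).digest();
  const received = Buffer.from(signatureHex, "hex");
  // Length check first: timingSafeEqual throws on mismatched lengths.
  if (received.length !== expected.length) return false;
  return timingSafeEqual(received, expected);
}
```

Note the comparison runs over the raw request body, before any JSON parsing: re-serializing a parsed payload can change byte order or whitespace and break the signature.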

This jump (+7) was the payoff from having fixed the scanner first. A less accurate scanner wouldn't have shown the real security gap — it would have been hidden behind noise.

ratchet scan . — Mar 24
Score Breakdown
---------------
🔒 Security       14/15  ← webhook sig added
📝 TypeSafety     15/15  maxed
⚠️ ErrorHandling  20/20  maxed
⚡ Performance    10/10  maxed
📖 CodeQuality    12/15  ← duplication remains
🧪 Testing        22/25

Quality Score: 93 / 100

Remaining gaps:
  Security -1 (minor: 1 input validation gap)
  Quality  -3 (567 duplicated lines across helpers)
  Testing  -3 (assertion density in 4 test files)

Architect Mode Finds What Clicks Miss: 93 → 98

The 567 duplicated lines were spread across shared engine helpers — similar patterns repeated across multiple files that individual click-by-click improvements had worked around but never eliminated. Each click improved something. None of them could see the full pattern.

Architect mode operates differently: it analyzes the entire codebase graph first, identifies cross-file duplication, then generates a coordinated refactor. One pass. It extracted the shared helpers, updated all references, and removed the duplication cleanly.
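One common way to find cross-file duplication is hashing normalized sliding windows of lines and reporting hashes that recur across files. The sketch below illustrates that general technique; it is not Ratchet's actual architect implementation, and the window size is arbitrary.

```typescript
import { createHash } from "node:crypto";

// Illustrative cross-file duplication finder: hash each whitespace-
// normalized window of `windowSize` lines, then report windows that
// appear in more than one file. Not the real architect pass.
function findDuplicateWindows(
  files: Record<string, string>,
  windowSize = 3,
): Map<string, string[]> {
  const seen = new Map<string, Set<string>>(); // window hash -> files
  for (const [name, text] of Object.entries(files)) {
    const lines = text
      .split("\n")
      .map((l) => l.trim())
      .filter((l) => l.length > 0);
    for (let i = 0; i + windowSize <= lines.length; i++) {
      const h = createHash("sha1")
        .update(lines.slice(i, i + windowSize).join("\n"))
        .digest("hex");
      let owners = seen.get(h);
      if (!owners) {
        owners = new Set();
        seen.set(h, owners);
      }
      owners.add(name);
    }
  }
  const clusters = new Map<string, string[]>();
  for (const [h, owners] of seen) {
    if (owners.size > 1) clusters.set(h, [...owners].sort());
  }
  return clusters;
}
```

Adjacent duplicated windows can then be merged into spans, which is how a report like "scanner.ts lines 44–91" falls out of the raw window hits.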

That was the last 5 points. Combined with the security and testing work already done, the final score settled at 98/100 — perfect in 5 of 6 categories. This was the v1.1.0 release commit.

ratchet architect — Mar 25
Architect Analysis
------------------
Scanning cross-file patterns...

Duplication cluster found:
  src/engine/scanner.ts   (lines 44–91)
  src/engine/improve.ts   (lines 12–58)
  src/engine/architect.ts (lines 77–124)
Pattern: shared helper logic, 567 lines total

Proposed: extract src/engine/helpers.ts
- 3 files updated
- 1 new shared module
- Zero behavior change

Approve? [y/N] y

Extracted helpers.ts
TypeScript compiles cleanly
Tests passing (1725/1725)
Score: 93 → 98/100 (+5)

$ score breakdown: 98/100

Perfect in 5 of 6 categories. The remaining 2 points are in Testing (assertion density).

Production Readiness Score
98/100
🔒 Security       15/15
📝 TypeSafety     15/15
⚠️ ErrorHandling  20/20
⚡ Performance    10/10
📖 CodeQuality    15/15
🧪 Testing        23/25

Remaining 2 points: assertion density in 4 test files (threshold: 2.0 assertions/test average).
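The metric itself is simple: average assertions per test in a file, flagged below a threshold. The counting below is a naive sketch by string matching; the real detector's method isn't described in the case study, only the 2.0 threshold.

```typescript
// Naive sketch of the assertion-density metric: assertions per test,
// counted by pattern matching. Threshold of 2.0 comes from the run log;
// everything else here is illustrative.
function assertionDensity(testSource: string): number {
  const tests = (testSource.match(/\b(it|test)\(/g) ?? []).length;
  const assertions = (testSource.match(/\bexpect\(|\bassert\./g) ?? []).length;
  return tests === 0 ? 0 : assertions / tests;
}

const meetsThreshold = (src: string, threshold = 2.0): boolean =>
  assertionDensity(src) >= threshold;
```

A low ratio usually means tests that execute code without checking much about it, which is why it lingers as the last deduction.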

$ 6 lessons from running it on ourselves

These became product improvements. Each one was a real finding, not a hypothetical.

🔬
Fix your scanner before trusting your score
We had false positives: fake secrets in example code triggering the security detector, test fixtures inflating coverage ratios. The 85→86 step was mostly accuracy work. A score from an inaccurate scanner is worse than no score.
🌳
AST confirmation beats regex alone
Pattern matching without AST context generates noise. A string that looks like a secret in a documentation example is not a secret. Requiring a valid AST node context before flagging eliminated false positives without missing real issues.
📁
Score production code, not test fixtures
The file classifier had to learn what "production code" means in this codebase: not docs/**, not fixtures/**, not examples/**. Without that distinction, coverage ratios are meaningless. Getting it right took iteration.
🏗️
Architect mode sees what clicks can't
567 duplicated lines spread across three engine files were invisible to individual improvements — each click improved something adjacent. Architect mode analyzed the full graph, found the pattern, and eliminated it in one coordinated refactor.
🛡️
The guard system earned its keep
5 rollbacks over 12 days. Each one was a real problem caught before it reached main: import ordering issues, a removed null check that downstream code depended on, partial applies from concurrent edits. The guard is not overhead — it's the whole point.
📈
98 is achievable in 12 days
The score went from 74 to 98 on a real production codebase with 1,725 passing tests, zero broken builds, and every change reviewable in git. The last two points (Testing: 23/25) are assertion density — a known, bounded problem.

$ the complete run log

ratchet · Full dogfood summary
Run Summary — ratchet self-improvement (v1.1.0)
===============================================
Metric                       Value
------                       -----
Duration                     12 days (Mar 13–25, 2026)
Starting score               74 / 100
Final score                  98 / 100
Net gain                     +24 pts
Tests passing (final)        1725 / 1725 (89 files)
Rollbacks auto-caught        5
Duplicated lines eliminated  567
False positives fixed        yes (AST confirmation + file classifier)
Build broken at any point    never
Release tag                  v1.1.0

Final category scores
---------------------
🔒 Security       15/15  perfect
📝 TypeSafety     15/15  perfect
⚠️ ErrorHandling  20/20  perfect
⚡ Performance    10/10  perfect
📖 CodeQuality    15/15  perfect
🧪 Testing        23/25  2 pts remaining (assertion density)

Verdict: 74→98 is real. Every commit is in git. Every rollback was automatic. Build never broke.
Free scan · No credit card

Try Ratchet on your codebase

Get your score in under 60 seconds. See exactly what's holding you back — before you commit to anything.

npm install -g ratchet-run && ratchet scan

Builder $19/mo · Pro $49/mo · BYOK (bring your own API key)