We ran Ratchet on Ratchet's own source code. Here's the real trajectory — including bugs we found in the scanner itself, the false positives it was generating, and the architect-level cleanup that got us to 98.
Six inflection points over 12 days. The dip at Day 5–9 is the interesting one.
First commit. We ran ratchet scan . on Ratchet's own source directory. Score: 74 out of 100.
The scan flagged what you'd expect from a fast-moving early codebase: overly broad rate limiters that treated all endpoints identically, unstructured console.* calls throughout the server, and a pattern-matching approach to security scanning that used regex without AST confirmation.
The score was accurate. That was the point — the tool wasn't going to flatter itself.
Two passes drove the first big jump. First: migrating all console.* calls to Pino — structured, leveled, machine-readable. A central logger.ts module, consistent log levels across the server.
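A minimal sketch of that structured-logging shape, using only the standard library. The real migration used Pino, whose output has the same form: one JSON object per line with a numeric level, a timestamp, a message, and structured fields. The names below (createLogger, WEIGHT, formatLine) are illustrative, not Ratchet's actual logger.ts API.

```typescript
type Level = "debug" | "info" | "warn" | "error";

// Pino's conventional numeric levels.
const WEIGHT: Record<Level, number> = { debug: 20, info: 30, warn: 40, error: 50 };

// Serialize one log line: structured, leveled, machine-readable.
function formatLine(level: Level, msg: string, fields: Record<string, unknown> = {}): string {
  return JSON.stringify({ level: WEIGHT[level], time: Date.now(), msg, ...fields });
}

// A tiny leveled logger: lines below the configured minimum are dropped.
function createLogger(minLevel: Level) {
  const emit = (level: Level, msg: string, fields?: Record<string, unknown>): void => {
    if (WEIGHT[level] < WEIGHT[minLevel]) return;
    process.stdout.write(formatLine(level, msg, fields) + "\n");
  };
  return {
    debug: (msg: string, fields?: Record<string, unknown>) => emit("debug", msg, fields),
    info: (msg: string, fields?: Record<string, unknown>) => emit("info", msg, fields),
    warn: (msg: string, fields?: Record<string, unknown>) => emit("warn", msg, fields),
    error: (msg: string, fields?: Record<string, unknown>) => emit("error", msg, fields),
  };
}
```

One central module exporting a logger like this gives every server file the same levels and the same line shape, which is what makes the logs machine-readable downstream.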
Second: the rate limiter was applying a single broad limit to every route. Authentication endpoints, scan endpoints, and webhook endpoints all behave differently under load. We split the limiters by domain — stricter on auth, more permissive on read-heavy scan endpoints.
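A fixed-window sketch of that split. The domain names and limits below are invented for illustration; a real server would typically express the same idea through rate-limiting middleware configured per router.

```typescript
interface LimitRule {
  windowMs: number; // length of the counting window
  max: number;      // requests allowed per window
}

// Hypothetical per-domain budgets: strict on auth, permissive on scans.
const LIMITS: Record<string, LimitRule> = {
  auth: { windowMs: 60_000, max: 10 },
  scan: { windowMs: 60_000, max: 120 },
  webhook: { windowMs: 60_000, max: 60 },
};

// Fixed-window counter per (domain, client) pair.
const windows = new Map<string, { start: number; count: number }>();

function allow(domain: keyof typeof LIMITS, clientId: string, now = Date.now()): boolean {
  const rule = LIMITS[domain];
  const key = `${domain}:${clientId}`;
  const w = windows.get(key);
  if (!w || now - w.start >= rule.windowMs) {
    windows.set(key, { start: now, count: 1 });
    return true;
  }
  w.count += 1;
  return w.count <= rule.max;
}
```

The point of the split is that each domain exhausts its own budget: a burst of scan traffic can no longer starve the login endpoints, and a credential-stuffing attempt on auth gets cut off long before it would trip a shared limit.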
Both were real problems the tool correctly identified in itself. Neither fix was glamorous. Both moved the needle.
Two targeted improvements. Auth utility functions had grown duplicated across the codebase — token validation logic repeated in multiple handlers rather than centralized. We extracted a shared auth utils module.
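A sketch of what "centralized" means in practice. parseBearer, isExpired, and the claim shape are hypothetical; the post does not show the actual helpers.

```typescript
// Hypothetical shared auth utils: token parsing and expiry checks live in
// one module instead of being repeated inside each handler.
interface TokenClaims {
  sub: string; // subject (user id)
  exp: number; // expiry, seconds since epoch
}

// Extract the token from an Authorization header, or null if malformed.
function parseBearer(header: string | undefined): string | null {
  if (!header || !header.startsWith("Bearer ")) return null;
  const token = header.slice("Bearer ".length).trim();
  return token.length > 0 ? token : null;
}

// One expiry check shared by every route that validates tokens.
function isExpired(claims: TokenClaims, nowSeconds = Math.floor(Date.now() / 1000)): boolean {
  return claims.exp <= nowSeconds;
}
```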
The mutation error handler was catching errors and re-throwing them in a way that discarded the original stack trace. Routes that modified state weren't producing useful error context on failure. A structured handleMutationError() helper unified the pattern.
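A sketch of that pattern. handleMutationError is the helper named in the post; the wrapper class and its fields are assumptions. The key move is to wrap rather than re-throw bare, so the original error and its stack ride along.

```typescript
// Hypothetical error wrapper: keeps the original error (and its stack)
// attached instead of discarding it on re-throw.
class MutationError extends Error {
  constructor(
    public readonly route: string,
    public readonly original: unknown, // the caught error, stack intact
  ) {
    super(`mutation failed on ${route}`);
    this.name = "MutationError";
  }
}

// One structured path for every state-modifying route's failure handling.
function handleMutationError(route: string, err: unknown): never {
  throw new MutationError(route, err);
}
```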
Small increments. The kind that compound. The tool is good at finding them — and at applying them without touching unrelated code.
The net change looks small (+1). What happened underneath was not.
The scanner had false positives. Example code in the repository — fake API keys used in documentation and test fixtures — was triggering the security detector. The regex-based patterns couldn't distinguish a literal example string from a real leaked secret.
More critically: the file classifier wasn't excluding documentation directories and test fixtures from production code analysis. Test files were being scored as production coverage, inflating the apparent test/source ratio. When we fixed it, some scores recalibrated downward before the real improvements took hold.
We replaced naive regex matching with AST confirmation: patterns now require a valid AST node context before firing. The file classifier gained production exclusion rules. Both changes made the tool more honest — and its scores more trustworthy.
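Both fixes can be sketched together. The secret pattern, the exclusion globs, and every name below are invented; the AST check uses the TypeScript compiler API, which is the natural choice for AST confirmation over a TypeScript codebase.

```typescript
import * as ts from "typescript";

// --- Fix 1: AST confirmation. A regex proposes a candidate; it only fires
// when the match sits inside an actual string-literal node in parsed source.
// Comments and prose never produce such a node, so documentation examples
// stop triggering the detector.
const KEY_PATTERN = /sk_live_[A-Za-z0-9]{8,}/; // invented secret shape

function findConfirmedSecrets(source: string): string[] {
  const sf = ts.createSourceFile("snippet.ts", source, ts.ScriptTarget.Latest, true);
  const hits: string[] = [];
  const visit = (node: ts.Node): void => {
    if (ts.isStringLiteral(node) && KEY_PATTERN.test(node.text)) hits.push(node.text);
    ts.forEachChild(node, visit);
  };
  visit(sf);
  return hits;
}

// --- Fix 2: production exclusion rules. A file counts toward the production
// surface only if no path segment is a docs/fixtures/tests directory and the
// filename is not a test or documentation suffix.
const EXCLUDED_DIRS = new Set(["docs", "examples", "fixtures", "__fixtures__", "test", "tests"]);
const EXCLUDED_SUFFIXES = [".test.ts", ".spec.ts", ".md"];

function isProductionFile(path: string): boolean {
  if (path.split("/").some((segment) => EXCLUDED_DIRS.has(segment))) return false;
  return !EXCLUDED_SUFFIXES.some((suffix) => path.endsWith(suffix));
}
```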
With the scanner now accurate, the remaining security points became visible. The webhook handler was accepting payloads without verifying the HMAC signature — a real security gap, not a false positive.
Adding verifyWebhookSignature() brought the security category from partial to near-complete. The file classifier also picked up additional production exclusion rules, further tightening the accuracy of the production code surface area.
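A sketch of HMAC verification in that shape. verifyWebhookSignature is the post's name for the helper; the algorithm (SHA-256) and hex signature encoding are assumptions.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Recompute the HMAC over the raw payload and compare it against the
// signature the sender supplied. timingSafeEqual (never ===) keeps the
// comparison constant-time, so attackers can't probe it byte by byte.
function verifyWebhookSignature(payload: string, signatureHex: string, secret: string): boolean {
  const expected = createHmac("sha256", secret).update(payload).digest();
  const provided = Buffer.from(signatureHex, "hex");
  if (provided.length !== expected.length) return false;
  return timingSafeEqual(provided, expected);
}
```

The important detail is verifying before parsing or acting on the payload: an unverified webhook endpoint lets anyone on the internet trigger those code paths.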
This jump (+7) was the payoff from having fixed the scanner first. A less accurate scanner wouldn't have shown the real security gap — it would have been hidden behind noise.
The 567 duplicated lines were spread across shared engine helpers — similar patterns repeated across multiple files that individual click-by-click improvements had worked around but never eliminated. Each click improved something. None of them could see the full pattern.
Architect mode operates differently: it analyzes the entire codebase graph first, identifies cross-file duplication, then generates a coordinated refactor. One pass. It extracted the shared helpers, updated all references, and removed the duplication cleanly.
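A toy version of the cross-file pass (not Ratchet's actual algorithm): hash every normalized window of lines in every file, then keep the windows that appear in more than one file. A per-file pass can never see these; a whole-graph pass can.

```typescript
import { createHash } from "node:crypto";

// Map from window hash to the files containing that window, keeping only
// windows that occur in more than one file.
function crossFileDuplicates(files: Record<string, string>, windowSize = 5): Map<string, string[]> {
  const owners = new Map<string, string[]>();
  for (const [path, text] of Object.entries(files)) {
    // Normalize: trim whitespace and drop blank lines, so formatting noise
    // doesn't hide structural duplication.
    const lines = text.split("\n").map((l) => l.trim()).filter((l) => l.length > 0);
    for (let i = 0; i + windowSize <= lines.length; i++) {
      const hash = createHash("sha1").update(lines.slice(i, i + windowSize).join("\n")).digest("hex");
      const list = owners.get(hash) ?? [];
      if (!list.includes(path)) list.push(path);
      owners.set(hash, list);
    }
  }
  const dupes = new Map<string, string[]>();
  owners.forEach((fileList, hash) => {
    if (fileList.length > 1) dupes.set(hash, fileList);
  });
  return dupes;
}
```

A real refactor then needs the reverse step, extracting each duplicated window into a shared helper and rewriting every owner, which is exactly the coordination a single-file pass cannot do.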
That was the last 5 points. Combined with the security and testing work already done, the final score settled at 98/100 — perfect in 4 of 6 categories. This was the v1.1.0 release commit.
The remaining 2 points are in Testing: assertion density in 4 test files falls below the 2.0 assertions-per-test threshold.
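The metric itself is simple. A sketch with assumed counting rules (the real scanner's rules may differ):

```typescript
// Count test declarations and assertions, then average. Files below the
// 2.0 threshold are the ones holding back the last 2 points.
function assertionDensity(source: string): number {
  const tests = (source.match(/\b(?:it|test)\s*\(/g) || []).length;
  const assertions = (source.match(/\bexpect\s*\(/g) || []).length;
  return tests === 0 ? 0 : assertions / tests;
}
```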
These became product improvements. Each one was a real finding, not a hypothetical.
Get your score in under 60 seconds. See exactly what's holding you back — before you commit to anything.
Builder $19/mo · Pro $49/mo · BYOK (bring your own API key)