We ran Ratchet on Ratchet's own source code. Here's every commit, every rollback, and everything we learned — including the bugs it found in itself.
Every inflection point — including the rollbacks that made the result trustworthy.
We ran ratchet scan . on Ratchet's own source directory. Score: 72 out of 100. Not embarrassing — but not what you'd want to show on a landing page.
The scan was honest about what it found: 44 console.* calls scattered across the codebase, a monolithic routes file approaching 2,000 lines, and shallow test coverage that was technically passing but not catching edge cases.
The thing about starting at 72: it felt low. It was low. But it was also accurate — which is exactly what a scorer is supposed to be.
The biggest single improvement came from splitting a ~2,000-line routes file into 13 domain-specific modules: admin, auth, scan, improve, and more.
Ratchet analyzed the handler groupings, extracted each domain into its own module, created a barrel export in index.ts, and wired the router. It reused the pattern from DeuceDiary (a game app we'd been testing on) and applied the same decomposition to our own codebase.
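To make the "handler groupings" step concrete, here is a minimal sketch of grouping routes by domain before splitting them into modules. It assumes a domain is simply the first path segment; the RouteDef shape, the groupByDomain function, and the sample routes are hypothetical and are not taken from Ratchet's source.

```typescript
// Hypothetical route record; the real route shape is not shown in this post.
interface RouteDef {
  method: "GET" | "POST" | "PUT" | "DELETE";
  path: string;
}

// Group routes by their first path segment, e.g. "/auth/login" -> "auth".
// Assumption: the first segment is a good enough proxy for the domain.
function groupByDomain(routes: RouteDef[]): Record<string, RouteDef[]> {
  const groups: Record<string, RouteDef[]> = {};
  for (const route of routes) {
    const domain = route.path.split("/").filter(Boolean)[0] ?? "root";
    const bucket = groups[domain] ?? [];
    bucket.push(route);
    groups[domain] = bucket;
  }
  return groups;
}

// Invented sample routes, for illustration only.
const routes: RouteDef[] = [
  { method: "POST", path: "/auth/login" },
  { method: "POST", path: "/auth/logout" },
  { method: "POST", path: "/scan" },
  { method: "GET", path: "/admin/users" },
];

const modules = groupByDomain(routes);
// Each key would become one module file (auth.ts, scan.ts, admin.ts),
// re-exported from a barrel index.ts.
console.log(Object.keys(modules));
```

A real decomposition would also have to move shared helpers and middleware, which a path-based grouping like this does not address.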
891 tests. All green. One diff that was large but reviewable. That was the moment the tool felt like it worked.
"The code it wrote matched the existing style, respected naming conventions, and didn't try to 'improve' things that were fine."
Two passes drove the next increment. First: migrating all 44 console.* calls to Pino — structured, leveled, machine-readable. Each call became logger.info(), logger.error(), or logger.warn(). A central logger.ts module kept it DRY.
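The shape of that migration can be sketched without any dependencies. The block below is a stand-in, not Pino itself: it only mimics the fields-then-message call style that Pino's logger.info(obj, msg) supports, and emits one JSON line per call. The createLogger and formatLine names, the injectable sink, and the sample log call are all assumptions made for illustration.

```typescript
// logger.ts (sketch): a dependency-free stand-in, NOT Pino itself.
type Level = "info" | "warn" | "error";
type Fields = Record<string, unknown>;

// Build one JSON log line: leveled, structured, machine-readable.
function formatLine(level: Level, fields: Fields, msg: string): string {
  return JSON.stringify({ level, time: Date.now(), ...fields, msg });
}

// The sink is injectable so output can be captured instead of printed.
function createLogger(sink: (line: string) => void = (line) => console.log(line)) {
  return {
    info: (fields: Fields, msg: string) => sink(formatLine("info", fields, msg)),
    warn: (fields: Fields, msg: string) => sink(formatLine("warn", fields, msg)),
    error: (fields: Fields, msg: string) => sink(formatLine("error", fields, msg)),
  };
}

const logger = createLogger();

// Before: console.log("scan finished for " + repo + " with score " + score);
// After: context travels as fields instead of being baked into a string.
logger.info({ repo: "ratchet", score: 72 }, "scan finished");
```

Keeping construction in one module is what makes the migration DRY: call sites import a ready logger and never configure one themselves.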
Second: seven route handlers were missing try/catch blocks — letting exceptions surface as opaque 500s. Ratchet wrapped them, added structured error responses, and extracted duplicated error patterns into a shared helper.
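A wrapper plus a shared helper is one common way to get this result. The sketch below assumes async handlers and uses minimal hypothetical Req and Res shapes so it runs without a web framework; toErrorResponse, withErrorHandling, and the error payload fields are illustrative names, not Ratchet's actual helper.

```typescript
// Minimal request/response shapes so the sketch runs without a framework.
// These are hypothetical; the post does not show the real handler signature.
interface Req { path: string }
interface Res {
  statusCode: number;
  body: unknown;
}
type Handler = (req: Req, res: Res) => Promise<void>;

// Shared helper: turn any thrown value into one structured error payload.
function toErrorResponse(err: unknown, path: string) {
  const message = err instanceof Error ? err.message : String(err);
  return { status: 500, body: { error: "internal_error", message, path } };
}

// Wrap a handler once instead of repeating try/catch in each route.
function withErrorHandling(handler: Handler): Handler {
  return async (req, res) => {
    try {
      await handler(req, res);
    } catch (err) {
      const { status, body } = toErrorResponse(err, req.path);
      res.statusCode = status;
      res.body = body;
    }
  };
}

// A handler that always fails, to show the wrapper in action.
const scanHandler: Handler = withErrorHandling(async () => {
  throw new Error("repository not found");
});

const res: Res = { statusCode: 200, body: null };
scanHandler({ path: "/scan" }, res).then(() => console.log(res.statusCode, res.body));
```

The response is still a 500, but the body now says what failed and where. A production version would likely log the full error and return a less detailed message to the client.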
Small increments. But the kind that compound. By this point, the scanner was honest about where the ceiling was.
Day 6 was the most instructive. Running in parallel worker mode — three workers simultaneously — produced a score of 98/100. For about four minutes, we thought we'd cracked it.
We hadn't. Two workers had modified the same auth middleware file, and one removed an import the other depended on. A third introduced infinite recursion in a utility function. The guard system flagged all three changes before they reached main.
Rollback. Score landed at 80 — below where we were before, because some of the parallel changes had been partially applied.
This was the moment the conservative, one-at-a-time design philosophy stopped being a limitation and started being a feature. The tool caught its own mistakes. No broken build reached production.
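The collision at the heart of that rollback, two workers editing the same file, is cheap to detect before merging. The following is a minimal sketch of such a check, not a description of Ratchet's guard system, whose internals this post does not cover. The WorkerChange shape, the findOverlaps function, and the file paths are invented for illustration.

```typescript
// Hypothetical shape: which files each parallel worker touched.
interface WorkerChange {
  worker: string;
  files: string[];
}

// Return every file touched by more than one worker, with the workers involved.
function findOverlaps(changes: WorkerChange[]): Record<string, string[]> {
  const touchedBy: Record<string, string[]> = {};
  for (const change of changes) {
    for (const file of change.files) {
      const workers = touchedBy[file] ?? [];
      if (workers.indexOf(change.worker) === -1) workers.push(change.worker);
      touchedBy[file] = workers;
    }
  }
  const overlaps: Record<string, string[]> = {};
  for (const file of Object.keys(touchedBy)) {
    const workers = touchedBy[file] ?? [];
    if (workers.length > 1) overlaps[file] = workers;
  }
  return overlaps;
}

// Invented example data mirroring the situation described above.
const overlaps = findOverlaps([
  { worker: "worker-1", files: ["src/middleware/auth.ts", "src/routes/scan.ts"] },
  { worker: "worker-2", files: ["src/middleware/auth.ts"] },
  { worker: "worker-3", files: ["src/utils/walk.ts"] },
]);
console.log(overlaps);
```

A file-level check like this would catch the shared-middleware conflict but not the recursion bug, which is why running the tests after each change still matters.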
After the rollback, we diagnosed the gap. The scoring guide showed exactly where points were hiding: test quality was at 6/8 (assertions per test averaging 1.4, below the 2.0 threshold), and the structured logging subcategory still had headroom at 3/7 because a few log calls had slipped through the Pino migration.
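Assertion density is just assertions divided by tests, so it can be approximated from source text. This sketch assumes tests are declared with it(...) or test(...) and assert with expect(...), which the post does not confirm. The assertionDensity function and the sample test file are hypothetical, and Ratchet's scorer may well count differently.

```typescript
// Assumption: tests use it(...) or test(...) and assert with expect(...).
const THRESHOLD = 2.0; // the per-test threshold quoted in the scoring guide

// Rough text-based count; a real scorer would more likely parse the AST,
// since a regex also matches occurrences inside comments and strings.
function assertionDensity(source: string): number {
  const tests = (source.match(/\b(?:it|test)\s*\(/g) ?? []).length;
  const assertions = (source.match(/\bexpect\s*\(/g) ?? []).length;
  return tests === 0 ? 0 : assertions / tests;
}

// Invented test file: two tests, three assertions.
const sample = `
test("parses a score", () => {
  expect(parse("72/100")).toBe(72);
});
test("rejects garbage", () => {
  expect(() => parse("n/a")).toThrow();
  expect(parse("")).toBeNull();
});
`;

const density = assertionDensity(sample);
console.log(density.toFixed(2), density >= THRESHOLD ? "meets threshold" : "below threshold");
```

The first test in the sample shows the kind of case the metric targets: it passes, but with a single assertion it checks only the happy path.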
Two focused sequential clicks: raise assertion density in 12 test files, then sweep up the remaining console calls. Score moved from 80 to 84 to 86. All 891 tests still green.
The plateau was honest too — at 86, the remaining points require architectural changes, not mechanical fixes. Ratchet is good at the mechanical work.
Where every point comes from — and where the next ones are hiding.
Remaining 14 points: assertion density (test quality 6→8), structured logging (3→7), duplication cleanup.
These became product improvements. All of them are in the backlog with tickets.
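To decide which gap to chase first, the rule of thumb in this write-up is priority = (maxScore − current) × fixProbability. Here is a small sketch applying it to the two subcategories with known scores. The current and maxScore values come from the breakdown above; the fixProbability values are invented for illustration and are not Ratchet's estimates.

```typescript
interface Subcategory {
  name: string;
  current: number;
  maxScore: number;
  fixProbability: number; // 0..1 estimate; the values below are invented
}

// priority = (maxScore - current) * fixProbability
function priority(s: Subcategory): number {
  return (s.maxScore - s.current) * s.fixProbability;
}

const gaps: Subcategory[] = [
  { name: "test quality", current: 6, maxScore: 8, fixProbability: 0.9 },
  { name: "structured logging", current: 3, maxScore: 7, fixProbability: 0.5 },
];

// Sort a copy, highest expected gain first.
const ranked = gaps.slice().sort((a, b) => priority(b) - priority(a));
for (const s of ranked) {
  console.log(s.name, priority(s).toFixed(1));
}
```

With these made-up probabilities, structured logging ranks first: its larger gap outweighs the lower chance of an automated fix. The ranking flips easily, so the probability estimate deserves as much care as the score gap.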
priority = (maxScore − current) × fixProbability.
ratchet scan output is your best friend: the subcategory table matters.
Get your score in under 60 seconds. See exactly what's holding you back before you commit to anything.
Builder $19/mo · Pro $49/mo · BYOK (bring your own API key)