Ratchet Score Specification
An open, citable standard for measuring code quality across 6 weighted categories and 9 programming languages. Aligned with ISO/IEC 25010 and NIST SSDF.
This document defines the Ratchet Score: a normalized, reproducible, language-aware measure of software code quality expressed as a dimensionless integer on a scale of 0 to 100. The score quantifies quality across six weighted categories: Testing (25%), Security (15%), TypeSafety (15%), ErrorHandling (20%), Performance (10%), and CodeQuality (15%). It is designed to be implementation-agnostic, citable in regulatory and compliance contexts, and suitable for automated quality gates, insurance underwriting, procurement evaluations, and developer tooling.
§1 Scope
1.1 What This Specification Covers
This specification defines the six scoring categories and their relative weights, the subcategory rubrics used to derive per-category scores, the normalization procedure that maps raw scores to a 0–100 final score, language-specific adaptations for nine supported languages, conformance levels for implementations and scored repositories, and the step-by-step scoring methodology a conformant implementation MUST follow.
1.2 What This Specification Does Not Cover
This specification does not address: execution correctness, runtime performance, functional completeness, business logic correctness, dependency vulnerability scanning (CVEs in third-party packages), or licensing compliance.
1.3 Supported Languages
The nine supported languages are TypeScript, JavaScript, Python, Go, Rust, Java, Kotlin, C#, and PHP. Language-specific provisions for each are given in §5.
§2 Normative References
The following documents are referenced normatively. In cases of conflict, this specification takes precedence.
| Reference | Title | Relevance |
|---|---|---|
| ISO/IEC 25010:2011 | Systems and software Quality Requirements and Evaluation (SQuaRE) | Quality characteristic taxonomy informing the Ratchet category model |
| NIST SP 800-218 (SSDF v1.1) | Secure Software Development Framework | Secure development practices against which Security and ErrorHandling are calibrated |
| CWE Top 25 (2023) | Common Weakness Enumeration Top 25 Most Dangerous Software Weaknesses | Anti-patterns detected under the Security category |
| OWASP Top 10 (2021) | Open Web Application Security Project Top 10 | Input validation, secrets management, and authentication subcategories |
| RFC 2119 | Key words for use in RFCs to Indicate Requirement Levels | Normative language (MUST, SHOULD, MAY) used throughout |
§3 Terms and Definitions
3.1 Ratchet Score
The final normalized integer score on [0, 100] produced by applying the scoring model in §4. Higher is better.
3.2 Category
One of six top-level scoring dimensions: Testing, Security, TypeSafety, ErrorHandling, Performance, CodeQuality. Each has a defined maximum score (weight) and subcategories.
3.3 Subcategory
A measurable signal within a category. Subcategory maximum scores sum to the category maximum.
3.4 Raw Score
The integer score assigned to a subcategory, from 0 to the subcategory's maximum, computed by the rubric in §4.3.
3.5 Production Code
Source files that are not test files and not in non-production directories (scripts, migrations, fixtures, etc.). The primary input to all categories except Testing.
3.6 Comment-Stripped Source
A version of a source file with comments and string literals replaced by whitespace of equal length. It is used for pattern matching so that commented-out code and code-like text inside strings do not produce false positives.
3.7 Conformant Implementation
A scoring engine that produces Ratchet Scores by following §7, applying rubrics in §4.3, and respecting language-specific provisions in §5.
§4 Scoring Model
4.1 Category Weights
The Ratchet Score is the sum of six category scores. The sum of all category maxima MUST equal 100.
| Category | Maximum (weight) |
|---|---|
| Testing | 25 |
| Security | 15 |
| TypeSafety | 15 |
| ErrorHandling | 20 |
| Performance | 10 |
| CodeQuality | 15 |
| Total | 100 |
ISO/IEC 25010 alignment: Testing maps to Maintainability→Testability; Security maps to Security; TypeSafety maps to Reliability→Fault Tolerance; ErrorHandling maps to Reliability→Recoverability; Performance maps to Performance Efficiency; CodeQuality maps to Maintainability.
4.2 Subcategory Structure
| Category | Subcategory | Max |
|---|---|---|
| Testing (25) | Coverage Ratio | 8 |
| | Edge Case Depth | 9 |
| | Test Quality | 8 |
| Security (15) | Secrets & Env Vars | 3 |
| | Input Validation | 6 |
| | Auth & Rate Limiting | 6 |
| TypeSafety (15) | Strict Config | 7 |
| | Type Density | 8 |
| ErrorHandling (20) | Coverage | 8 |
| | Empty Catches | 5 |
| | Structured Logging | 7 |
| Performance (10) | Async Patterns | 3 |
| | Console Cleanup | 5 |
| | Import Hygiene | 2 |
| CodeQuality (15) | Function Length | 4 |
| | Line Length | 4 |
| | Dead Code | 4 |
| | Duplication | 3 |
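The two structural invariants (subcategory maxima sum to their category maximum, and category maxima sum to 100) can be checked mechanically. The following is a minimal, non-normative sketch; the names `CATEGORY_MAXIMA`, `SUBCATEGORY_MAXIMA`, and `validate_model` are hypothetical and do not come from the reference implementation.

```python
# Category maxima (weights) as stated in this specification.
CATEGORY_MAXIMA = {
    "Testing": 25, "Security": 15, "TypeSafety": 15,
    "ErrorHandling": 20, "Performance": 10, "CodeQuality": 15,
}

# Subcategory maxima transcribed from the table in §4.2.
SUBCATEGORY_MAXIMA = {
    "Testing": {"Coverage Ratio": 8, "Edge Case Depth": 9, "Test Quality": 8},
    "Security": {"Secrets & Env Vars": 3, "Input Validation": 6, "Auth & Rate Limiting": 6},
    "TypeSafety": {"Strict Config": 7, "Type Density": 8},
    "ErrorHandling": {"Coverage": 8, "Empty Catches": 5, "Structured Logging": 7},
    "Performance": {"Async Patterns": 3, "Console Cleanup": 5, "Import Hygiene": 2},
    "CodeQuality": {"Function Length": 4, "Line Length": 4, "Dead Code": 4, "Duplication": 3},
}


def validate_model(categories: dict, subcategories: dict) -> None:
    """Raise ValueError if either structural invariant is violated."""
    total = sum(categories.values())
    if total != 100:
        raise ValueError(f"category maxima sum to {total}, expected 100")
    for name, maximum in categories.items():
        sub_total = sum(subcategories[name].values())
        if sub_total != maximum:
            raise ValueError(f"{name}: subcategories sum to {sub_total}, expected {maximum}")


validate_model(CATEGORY_MAXIMA, SUBCATEGORY_MAXIMA)  # passes silently
```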
4.3 Selected Rubrics
Testing — Coverage Ratio (max: 8)
| Ratio (test files / prod files) | Score |
|---|---|
| R ≥ 1.0 | 8 |
| 0.75 ≤ R < 1.0 | 6 |
| 0.50 ≤ R < 0.75 | 4 |
| 0.25 ≤ R < 0.50 | 2 |
| 0 < R < 0.25 | 1 |
| R = 0 (no tests) | 0 |
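The Coverage Ratio rubric above can be sketched as a threshold lookup. This is an illustration only: the function name is hypothetical, and the behavior when there are test files but no production files is an assumption, since the rubric does not define that case.

```python
def coverage_ratio_score(test_files: int, prod_files: int) -> int:
    """Map the test-file / production-file ratio R to a 0-8 score."""
    if test_files <= 0:
        return 0  # R = 0: no tests
    if prod_files <= 0:
        # Assumption: the rubric does not define R without production files.
        raise ValueError("ratio is undefined without production files")
    ratio = test_files / prod_files
    # (lower bound, score) pairs, checked from the highest band down.
    for lower_bound, score in ((1.0, 8), (0.75, 6), (0.50, 4), (0.25, 2)):
        if ratio >= lower_bound:
            return score
    return 1  # 0 < R < 0.25
```

For example, a repository with 3 test files and 4 production files has R = 0.75 and receives 6 of 8 points.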
Security — Secrets & Env Vars (max: 3)
| Condition | Score |
|---|---|
| No secrets detected | 3 |
| 1 potential secret detected | 1 |
| 2+ secrets detected | 0 |
ErrorHandling — Empty Catches (max: 5)
Counts silent failure patterns: empty catch blocks, bare except:, ignored error returns (_ = f()), unguarded .unwrap() calls.
| Silent failure count | Score |
|---|---|
| 0 | 5 |
| 1–2 | 4 |
| 3–5 | 3 |
| 6–10 | 2 |
| 11–20 | 1 |
| > 20 | 0 |
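As a non-normative illustration, the banded table above maps onto a binary search over the inclusive upper bound of each band. The function name is hypothetical.

```python
import bisect

# Inclusive upper bound of each band in the Empty Catches table.
_UPPER_BOUNDS = [0, 2, 5, 10, 20]
# Score for each band; the final entry covers counts above 20.
_SCORES = [5, 4, 3, 2, 1, 0]


def empty_catch_score(silent_failures: int) -> int:
    """Map a silent-failure count to the 0-5 Empty Catches score."""
    if silent_failures < 0:
        raise ValueError("count must be non-negative")
    # bisect_left returns the index of the first bound >= the count.
    return _SCORES[bisect.bisect_left(_UPPER_BOUNDS, silent_failures)]
```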
4.4 Score Normalization
The final Ratchet Score is computed as:
RatchetScore = Σ(category_score_i)
where i ∈ {Testing, Security, TypeSafety, ErrorHandling, Performance, CodeQuality}
and Σ(category_max_i) = 100
Because category maxima sum to 100, no additional scaling is required. The score is naturally bounded to [0, 100].
| Score | Grade | Label |
|---|---|---|
| 95–100 | A+ | Exceptional |
| 85–94 | A | Excellent |
| 70–84 | B | Good |
| 50–69 | C | Fair |
| 30–49 | D | Poor |
| 0–29 | F | Critical |
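The grade table can be applied with a simple descending scan. This is a sketch under the assumption that the input is already a valid integer score; `GRADE_BANDS` and `grade_for` are hypothetical names.

```python
# (minimum score, grade, label) rows from the table above, highest band first.
GRADE_BANDS = [
    (95, "A+", "Exceptional"),
    (85, "A", "Excellent"),
    (70, "B", "Good"),
    (50, "C", "Fair"),
    (30, "D", "Poor"),
    (0, "F", "Critical"),
]


def grade_for(score: int) -> tuple:
    """Return (grade, label) for a Ratchet Score in [0, 100]."""
    if not 0 <= score <= 100:
        raise ValueError("score must be within [0, 100]")
    for minimum, grade, label in GRADE_BANDS:
        if score >= minimum:
            return grade, label
    raise AssertionError("unreachable: the last band starts at 0")
```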
§5 Language-Specific Provisions
The Ratchet Score MUST be adapted to the idioms of the language under analysis. For categories not mentioned under a given language, the general rubrics in §4.3 apply.
TypeScript
TypeSafety Strict Config is evaluated against tsconfig.json: "strict": true → 7pts; noImplicitAny + strictNullChecks → 5pts; noImplicitAny alone → 3pts; TypeScript present, no flags → 1pt. Type Density measures the density of explicit any annotations per 100 lines.
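One possible reading of the Strict Config tiers, sketched in Python over an already-parsed compilerOptions object. The function name is hypothetical. The sketch assumes that strictNullChecks without noImplicitAny falls into the 1-point tier, which the text does not state explicitly, and it does not resolve `extends` chains or comments in tsconfig.json.

```python
def strict_config_points(compiler_options: dict) -> int:
    """Score the TypeScript Strict Config subcategory (max 7)."""
    if compiler_options.get("strict") is True:
        return 7
    no_implicit_any = compiler_options.get("noImplicitAny") is True
    strict_null_checks = compiler_options.get("strictNullChecks") is True
    if no_implicit_any and strict_null_checks:
        return 5
    if no_implicit_any:
        return 3
    # Assumption: any other flag combination counts as "no flags".
    return 1
```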
JavaScript
JavaScript receives a fixed TypeSafety score of 0/15. JavaScript is dynamically typed by design; scoring 0 accurately reflects the inherent risk without distorting other categories. Implementations SHOULD communicate this clearly.
Python
TypeSafety evaluated against pyrightconfig.json, mypy.ini, or pyproject.toml sections → up to 7pts. Bare except: (no exception type) MUST be counted as empty catch equivalents. print() calls are penalized unless a logging library is imported.
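Bare except clauses can be found without text matching by walking Python's own syntax tree. Note that this swaps in an AST-based technique in place of the pattern matching on comment-stripped source described in §7; it is offered only as a cross-check, and the function name is hypothetical.

```python
import ast


def count_bare_excepts(source: str) -> int:
    """Count `except:` handlers that name no exception type."""
    tree = ast.parse(source)
    return sum(
        1
        for node in ast.walk(tree)
        # A handler with type None is a bare `except:` clause.
        if isinstance(node, ast.ExceptHandler) and node.type is None
    )
```

Because the parser ignores comments and string contents, commented-out `except:` lines are not counted.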
Go
Strict Config: 7/7 (Go compiler enforces types by design). Type Density measures interface{} / any usage. ErrorHandling counts if err != nil {} patterns; ignored errors (_ = f()) MUST be penalized as empty catches.
Rust
Strict Config: 7/7. Type Density: 8/8 (idiomatic Rust has no type escape hatches). .unwrap() and .expect() MUST be counted as empty catch equivalents (they panic on error). The ? operator is correct error propagation and MUST NOT be penalized.
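A rough sketch of the Rust rule as a regular expression over comment-stripped source. The pattern and function name are assumptions, not taken from the reference implementation. It matches `.unwrap()` and `.expect(` only, so `.unwrap_or(...)` and the `?` operator are left alone.

```python
import re

# Matches `.unwrap()` and `.expect(`; `.unwrap_or(...)` and
# `.expect_err(...)` do not match because the parenthesis must follow directly.
_PANICKING_CALL = re.compile(r"\.(?:unwrap\(\s*\)|expect\s*\()")


def count_rust_silent_failures(stripped_source: str) -> int:
    """Count unwrap/expect calls treated as empty catch equivalents."""
    return len(_PANICKING_CALL.findall(stripped_source))
```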
Java
TypeSafety evaluated against Maven compiler plugin configuration or Gradle's sourceCompatibility. Async patterns detected via CompletableFuture, @Async, ExecutorService.
Kotlin
TypeSafety evaluated against build.gradle.kts; -Werror or allWarningsAsErrors → 7pts. Type Density measures !! (non-null assertion operator) usage — a null-safety escape hatch.
C# (CSharp)
TypeSafety evaluated against <Nullable>enable</Nullable> in .csproj → 7pts. Supports MSTest, NUnit, and xUnit test frameworks. Async patterns via async Task and async ValueTask.
PHP
TypeSafety evaluated against PHPStan (phpstan.neon) or Psalm (psalm.xml) → 7pts each; no static analysis → 1pt. Debug output includes var_dump(), print_r(), and dd() calls.
§6 Conformance Levels
Four graduated conformance levels indicate overall code quality, analogous to WCAG accessibility conformance levels (A/AA/AAA). They provide actionable quality targets rather than a binary pass/fail threshold.
6.1 Conformance Claims
A repository MAY claim a conformance level if a Ratchet Score produced by a conformant implementation meets the level's minimum score. Claims SHOULD include the specification version, score, date, implementation version, and git commit SHA.
This repository scored 87/100 under Ratchet Score Specification v1.0.0 (Ratchet Gold), assessed on 2026-03-27 using ratchet-run v2.4.1 at commit a3f9d2c.
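The recommended claim fields can be held in a small record and rendered into the sentence form shown above. This is an illustrative sketch; the class and field names are hypothetical, and the level name is passed in rather than derived from the score.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConformanceClaim:
    spec_version: str    # e.g. "1.0.0"
    score: int           # Ratchet Score, 0-100
    level: str           # conformance level name, supplied by the caller
    assessed_on: str     # ISO 8601 date
    implementation: str  # implementation name and version
    commit: str          # git commit SHA

    def render(self) -> str:
        return (
            f"This repository scored {self.score}/100 under Ratchet Score "
            f"Specification v{self.spec_version} ({self.level}), assessed on "
            f"{self.assessed_on} using {self.implementation} at commit {self.commit}."
        )
```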
6.2 Implementation Conformance
An implementation is conformant if it: (1) implements all six categories with specified maxima, (2) applies language-specific provisions in §5, (3) scores ≥90% of benchmark repositories within ±3 points of the reference implementation, (4) reports the specification version, and (5) discloses its methodology without undisclosed normalization factors.
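Criterion (3) can be checked as follows. This is a minimal sketch, assuming scores are available as mappings from repository name to score; the function name is hypothetical, and treating a repository missing from the candidate results as a miss is an assumption.

```python
def meets_benchmark(candidate: dict, reference: dict, tolerance: int = 3) -> bool:
    """True if >= 90% of reference repositories are matched within the tolerance."""
    if not reference:
        raise ValueError("reference scores must not be empty")
    hits = 0
    for repo, expected in reference.items():
        actual = candidate.get(repo)
        # Assumption: a repository the candidate did not score counts as a miss.
        if actual is not None and abs(actual - expected) <= tolerance:
            hits += 1
    # Integer comparison avoids floating-point rounding at exactly 90%.
    return hits * 10 >= 9 * len(reference)
```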
§7 Scoring Methodology
A conformant implementation MUST follow these steps in order.
Step 1 — File Discovery
Recursively traverse the target directory. MUST exclude: node_modules, dist, .git, .next, build, coverage, __pycache__, .cache, vendor, out. MUST exclude non-production directories from production code analysis: scripts, migrations, seed, fixtures, examples, __mocks__. Files in .ratchetignore MUST be excluded. Only supported file extensions (§5) are included. Files MUST be separated into production and test sets using language-specific rules.
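The directory rules of Step 1 can be sketched with `os.walk`, pruning excluded directories in place so they are never descended into. The function and constant names are hypothetical. The sketch leaves out .ratchetignore handling and the production/test split, because their syntax and rules are not given in this section, and it takes the set of supported extensions as a parameter.

```python
import os

# Directories that MUST be skipped entirely.
ALWAYS_EXCLUDED = {
    "node_modules", "dist", ".git", ".next", "build",
    "coverage", "__pycache__", ".cache", "vendor", "out",
}
# Directories whose files are kept out of production code analysis.
NON_PRODUCTION = {"scripts", "migrations", "seed", "fixtures", "examples", "__mocks__"}


def discover_files(root: str, extensions: set) -> tuple:
    """Return (production_candidates, non_production) file path lists."""
    production, non_production = [], []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk does not descend into excluded directories.
        dirnames[:] = sorted(d for d in dirnames if d not in ALWAYS_EXCLUDED)
        parts = set(os.path.relpath(dirpath, root).split(os.sep))
        in_non_production = bool(parts & NON_PRODUCTION)
        for name in sorted(filenames):
            if os.path.splitext(name)[1] not in extensions:
                continue
            target = non_production if in_non_production else production
            target.append(os.path.join(dirpath, name))
    return production, non_production
```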
Step 2 — Content Loading & Preprocessing
Read all files into memory. For pattern matching, use comment-stripped source (comments and string literals replaced with whitespace of equal length). Use original source for line length and function length calculations.
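The equal-length replacement can be sketched as a small state machine. This illustration assumes C-style syntax only (`//` and `/* */` comments, single- and double-quoted strings with backslash escapes) and preserves newlines so that line numbers stay aligned. It does not handle template literals, regex literals, Rust lifetimes, or Python-style comments; the function name is hypothetical.

```python
def _blank(ch: str) -> str:
    """Replace a character with a space, keeping newlines intact."""
    return "\n" if ch == "\n" else " "


def strip_comments_and_strings(source: str) -> str:
    out = []
    state = "code"  # "code", "line", "block", or the active quote character
    i, n = 0, len(source)
    while i < n:
        ch = source[i]
        nxt = source[i + 1] if i + 1 < n else ""
        pair = ch + nxt
        if state == "code":
            if pair in ("//", "/*"):
                state = "line" if pair == "//" else "block"
                out.append("  ")
                i += 2
                continue
            if ch in ('"', "'"):
                state = ch  # remember which quote closes the literal
                out.append(" ")
            else:
                out.append(ch)
        elif state == "line":
            if ch == "\n":
                state = "code"
            out.append(_blank(ch))
        elif state == "block":
            if pair == "*/":
                state = "code"
                out.append("  ")
                i += 2
                continue
            out.append(_blank(ch))
        else:  # inside a string literal
            if ch == "\\" and nxt:
                out.append(" " + _blank(nxt))  # skip the escaped character
                i += 2
                continue
            if ch == state or ch == "\n":
                state = "code"  # closing quote, or unterminated literal
            out.append(_blank(ch))
        i += 1
    return "".join(out)
```

Because every replaced character becomes exactly one space, match offsets in the stripped text point at the same positions in the original file.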
Step 3 — Language Detection
Determine the primary language as the language with the most production source files. If multiple languages exist, apply language-specific rules per file. TypeSafety MUST use the language of each individual file.
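A minimal sketch of primary-language detection by counting production files per language. The extension-to-language mapping is an assumption, since this section does not list the supported extensions, and the tie-breaking behavior (first language encountered wins) is likewise not defined by the text.

```python
import os
from collections import Counter

# Assumed mapping; the specification does not enumerate extensions here.
EXTENSION_LANGUAGE = {
    ".ts": "TypeScript", ".tsx": "TypeScript", ".js": "JavaScript",
    ".py": "Python", ".go": "Go", ".rs": "Rust", ".java": "Java",
    ".kt": "Kotlin", ".cs": "C#", ".php": "PHP",
}


def primary_language(production_paths: list):
    """Return the language with the most production files, or None."""
    counts = Counter(
        EXTENSION_LANGUAGE[ext]
        for ext in (os.path.splitext(p)[1] for p in production_paths)
        if ext in EXTENSION_LANGUAGE
    )
    if not counts:
        return None
    # most_common orders equal counts by first appearance.
    return counts.most_common(1)[0][0]
```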
Step 4 — Category Scoring
For each category: compute all subcategory raw scores using §4.3 rubrics and §5 provisions; sum to produce the category score; clamp to [0, category_maximum]. Scoring is performed over the production file set, except Testing which uses both sets.
Step 5 — Aggregation
Sum all six category scores. The result is naturally bounded to [0, 100].
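Steps 4 and 5 reduce to a clamped sum per category followed by a plain sum. The sketch below uses hypothetical function names and assumes each category is supplied as a pair of (subcategory raw scores, category maximum).

```python
def category_score(raw_scores: dict, maximum: int) -> int:
    """Sum subcategory raw scores and clamp to [0, maximum]."""
    return max(0, min(sum(raw_scores.values()), maximum))


def ratchet_score(categories: dict) -> int:
    """Sum the clamped category scores; `categories` maps name -> (raw_scores, maximum)."""
    return sum(category_score(raw, maximum) for raw, maximum in categories.values())
```

The clamp matters only if a rubric ever yields more than its subcategory maximum; with well-formed rubrics the sum already lies within range.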
Step 6 — Required Output
A conformant implementation MUST produce: final Ratchet Score; each category name, score, and maximum; each subcategory name, score, maximum, and summary; specification version (e.g., RSS v1.0.0); file counts; primary language detected.
A conformant implementation SHOULD also produce actionable remediation suggestions and file paths for each finding.
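The required output fields could be carried in a structure like the one below. The specification does not prescribe a serialization format, so the class names, field names, and the choice of JSON are all assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class SubcategoryResult:
    name: str
    score: int
    maximum: int
    summary: str


@dataclass
class CategoryResult:
    name: str
    score: int
    maximum: int
    subcategories: List[SubcategoryResult] = field(default_factory=list)


@dataclass
class ScanReport:
    spec_version: str            # e.g. "RSS v1.0.0"
    score: int                   # final Ratchet Score
    primary_language: str
    file_counts: Dict[str, int]  # e.g. {"production": 40, "test": 22}
    categories: List[CategoryResult] = field(default_factory=list)


def to_json(report: ScanReport) -> str:
    """Serialize a report; asdict recurses into nested dataclasses."""
    return json.dumps(asdict(report), indent=2)
```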
§8 Versioning
The specification uses Semantic Versioning 2.0.0 (MAJOR.MINOR.PATCH).
| Version Type | Trigger | Compatibility Guarantee |
|---|---|---|
| MAJOR | Breaking changes producing materially different scores | 90-day migration grace period; both versions reported during transition |
| MINOR | New subcategories, new language support, rubric clarifications | Scores MUST NOT decrease; new points only |
| PATCH | Bug fixes and clarifications with no intended scoring change | Fully backwards compatible |
Backwards Compatibility Guarantees
Within a MAJOR version: scores MUST NOT decrease due to specification updates alone; category maxima are immutable; the six core category names are immutable; conformance level thresholds are immutable. The reference implementation will support scoring under previous MAJOR versions for at least 24 months after a new MAJOR release.
Appendix A Reference Implementation
ratchet-run (npm)
The primary reference implementation is the ratchet-run npm package, the authoritative reference for resolving rubric ambiguities.
npx ratchet-run scan
Source: github.com/kcemate/ratchet
The reference implementation contains: the scoring engine (src/commands/scan.ts), language pattern libraries (src/core/language-rules.ts), shared constants (src/core/scan-constants.ts), and a test suite (2,247+ cases as of v1.0). Implementors building alternative conformant implementations SHOULD use the ratchet-oss test suite as a validation harness.
This specification is hosted canonically at ratchetcli.com/docs/specification and version-controlled in the docs/ directory of the ratchet-oss repository.
Appendix B Benchmark Dataset
Ratchet Code Quality Benchmark v0.1
The Ratchet Code Quality Benchmark v0.1 is a curated dataset of open-source repositories with human-validated Ratchet Scores, used to validate conformance of alternative implementations, calibrate rubric updates, and provide industry percentile baselines.
The benchmark includes repositories across all nine supported languages with scores ranging from 30 to 98, representing real-world codebases from the open-source ecosystem.
A conformant implementation MUST score ≥90% of benchmark repositories within ±3 points of the reference implementation. A difference of ±1 point caused by whitespace normalization does not by itself make an implementation non-conformant.
Contact the Ratchet Standards Working Group for benchmark dataset access.
"Ratchet Score Specification v1.0.0 (RSS v1.0.0)" Ratchet Standards Working Group, March 2026. https://ratchetcli.com/docs/specification Licensed under CC BY 4.0.
Ratchet Score Specification v1.0.0 — 2026-03-27
© 2026 Ratchet. Licensed under Creative Commons Attribution 4.0 International.