This document defines the Ratchet Score, a normalized, reproducible, language-aware measure of software code quality expressed as a dimensionless integer on a scale of 0 to 100. The score quantifies quality across six weighted categories: Testing (25%), Security (15%), TypeSafety (15%), ErrorHandling (20%), Performance (10%), and CodeQuality (15%). It is designed to be implementation-agnostic, citable in regulatory and compliance contexts, and suitable for automated quality gates, insurance underwriting, procurement evaluations, and developer tooling.

§1 Scope

1.1 What This Specification Covers

This specification defines the six scoring categories and their relative weights, the subcategory rubrics used to derive per-category scores, the normalization procedure that maps raw scores to a 0–100 final score, language-specific adaptations for nine supported languages, conformance levels for implementations and scored repositories, and the step-by-step scoring methodology a conformant implementation MUST follow.

1.2 What This Specification Does Not Cover

This specification does not address: execution correctness, runtime performance (the Performance category scores static source patterns only), functional completeness, business logic correctness, dependency vulnerability scanning (CVEs in third-party packages), or licensing compliance.

1.3 Supported Languages

TypeScript, JavaScript, Python, Go, Rust, Java, Kotlin, C#, PHP

§2 Normative References

The following documents are referenced normatively. In cases of conflict, this specification takes precedence.

Reference | Title | Relevance
ISO/IEC 25010:2011 | Systems and software Quality Requirements and Evaluation (SQuaRE) | Quality characteristic taxonomy informing the Ratchet category model
NIST SP 800-218 (SSDF v1.1) | Secure Software Development Framework | Secure development practices against which Security and ErrorHandling are calibrated
CWE Top 25 (2023) | Common Weakness Enumeration Most Dangerous Weaknesses | Anti-patterns detected under the Security category
OWASP Top 10 (2021) | Open Web Application Security Project Top 10 | Input validation, secrets management, and authentication subcategories
RFC 2119 | Key words for use in RFCs to Indicate Requirement Levels | Normative language (MUST, SHOULD, MAY) used throughout

§3 Terms and Definitions

3.1 Ratchet Score

The final normalized integer score on [0, 100] produced by applying the scoring model in §4. Higher is better.

3.2 Category

One of six top-level scoring dimensions: Testing, Security, TypeSafety, ErrorHandling, Performance, CodeQuality. Each has a defined maximum score (weight) and subcategories.

3.3 Subcategory

A measurable signal within a category. Subcategory maximum scores sum to the category maximum.

3.4 Raw Score

The integer score assigned to a subcategory, from 0 to the subcategory's maximum, computed by the rubric in §4.3.

3.5 Production Code

Source files that are not test files and not in non-production directories (scripts, migrations, fixtures, etc.). The primary input to all categories except Testing.

3.6 Comment-Stripped Source

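A version of a source file in which comments and string literals are replaced by whitespace of equal length, so that character offsets still match the original file. It is used for pattern matching, to eliminate false positives from commented-out code and from text inside strings.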
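A minimal sketch of comment-stripping for C-family syntax (TypeScript, JavaScript, Java, Go, C#), assuming // and /* */ comments and "...", '...' and `...` literals. The function name stripCommentsAndStrings is hypothetical. It does not handle Python # comments, regex literals, or template-literal interpolation (a whole template literal is blanked), so a real implementation would need per-language lexers. Newlines inside blanked spans are kept so line numbers also survive.

```typescript
// Hypothetical helper: blank out comments and string literals in C-family source.
// Every removed character becomes a space except "\n", so offsets and line
// numbers are identical to the original file.
function stripCommentsAndStrings(src: string): string {
  const out = src.split("");
  const blank = (from: number, to: number): void => {
    for (let k = from; k < to; k++) {
      if (out[k] !== "\n") out[k] = " ";
    }
  };
  let i = 0;
  while (i < src.length) {
    const c = src[i];
    const next = src[i + 1];
    if (c === "/" && next === "/") {
      // Line comment: runs up to (not including) the next newline.
      let j = src.indexOf("\n", i);
      if (j === -1) j = src.length;
      blank(i, j);
      i = j;
    } else if (c === "/" && next === "*") {
      // Block comment: runs through the closing */ (or to end of file).
      const close = src.indexOf("*/", i + 2);
      const j = close === -1 ? src.length : close + 2;
      blank(i, j);
      i = j;
    } else if (c === '"' || c === "'" || c === "`") {
      // String literal: skip escaped characters until the matching quote.
      let j = i + 1;
      while (j < src.length && src[j] !== c) {
        j += src[j] === "\\" ? 2 : 1;
      }
      j = Math.min(j + 1, src.length); // include the closing quote, never overrun
      blank(i, j);
      i = j;
    } else {
      i++;
    }
  }
  return out.join("");
}
```

Pattern detectors then run on the stripped text, while line-length and function-length checks keep using the original source.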

3.7 Conformant Implementation

A scoring engine that produces Ratchet Scores by following §7, applying rubrics in §4.3, and respecting language-specific provisions in §5.

§4 Scoring Model

4.1 Category Weights

The Ratchet Score is the sum of six category scores. The sum of all category maxima MUST equal 100.

Category | Max points
Testing | 25
Error Handling | 20
Security | 15
Type Safety | 15
Code Quality | 15
Performance | 10

ISO 25010 alignment: Testing maps to Maintainability→Testability; Security maps to Security; TypeSafety maps to Reliability→Fault Tolerance; ErrorHandling maps to Reliability→Recoverability; Performance maps to Performance Efficiency; CodeQuality maps to Maintainability.

4.2 Subcategory Structure

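Category | Subcategory | Max
Testing (25) | Coverage Ratio | 8
Testing (25) | Edge Case Depth | 9
Testing (25) | Test Quality | 8
Security (15) | Secrets & Env Vars | 3
Security (15) | Input Validation | 6
Security (15) | Auth & Rate Limiting | 6
TypeSafety (15) | Strict Config | 7
TypeSafety (15) | Type Density | 8
ErrorHandling (20) | Coverage | 8
ErrorHandling (20) | Empty Catches | 5
ErrorHandling (20) | Structured Logging | 7
Performance (10) | Async Patterns | 3
Performance (10) | Console Cleanup | 5
Performance (10) | Import Hygiene | 2
CodeQuality (15) | Function Length | 4
CodeQuality (15) | Line Length | 4
CodeQuality (15) | Dead Code | 4
CodeQuality (15) | Duplication | 3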

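The subcategory table can be encoded as data so that an implementation checks the two structural invariants at startup: subcategory maxima sum to their category maximum (§3.3), and category maxima sum to 100 (§4.1). This is a sketch; the constant and function names are hypothetical.

```typescript
// Hypothetical encoding of the §4.2 table: category -> subcategory -> max points.
const SUBCATEGORY_MAXIMA: Record<string, Record<string, number>> = {
  Testing: { "Coverage Ratio": 8, "Edge Case Depth": 9, "Test Quality": 8 },
  Security: { "Secrets & Env Vars": 3, "Input Validation": 6, "Auth & Rate Limiting": 6 },
  TypeSafety: { "Strict Config": 7, "Type Density": 8 },
  ErrorHandling: { Coverage: 8, "Empty Catches": 5, "Structured Logging": 7 },
  Performance: { "Async Patterns": 3, "Console Cleanup": 5, "Import Hygiene": 2 },
  CodeQuality: { "Function Length": 4, "Line Length": 4, "Dead Code": 4, Duplication: 3 },
};

const sumValues = (obj: Record<string, number>): number =>
  Object.keys(obj).reduce((total, key) => total + obj[key], 0);

// Category maximum = sum of its subcategory maxima (§3.3).
function categoryMax(category: string): number {
  return sumValues(SUBCATEGORY_MAXIMA[category]);
}

// Sum of all category maxima; §4.1 requires exactly 100.
function totalMax(): number {
  return Object.keys(SUBCATEGORY_MAXIMA).reduce((total, c) => total + categoryMax(c), 0);
}
```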
4.3 Selected Rubrics

Testing — Coverage Ratio (max: 8)

Ratio R (test files / production files) | Score
R ≥ 1.0 | 8
0.75 ≤ R < 1.0 | 6
0.50 ≤ R < 0.75 | 4
0.25 ≤ R < 0.50 | 2
0 < R < 0.25 | 1
R = 0 (no tests) | 0
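The Coverage Ratio rubric is a step function of R. A sketch, with a hypothetical function name; the specification does not say what happens when there are no production files, so this version assumes a score of 0 in that case.

```typescript
// Coverage Ratio (max 8). R = test files / production files.
// Assumption: zero production files is treated like "no tests" and scores 0.
function coverageRatioScore(testFiles: number, prodFiles: number): number {
  if (prodFiles === 0 || testFiles === 0) return 0;
  const r = testFiles / prodFiles;
  if (r >= 1.0) return 8;
  if (r >= 0.75) return 6;
  if (r >= 0.5) return 4;
  if (r >= 0.25) return 2;
  return 1; // 0 < R < 0.25
}
```

For example, 30 test files against 40 production files gives R = 0.75 and a score of 6.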

Security — Secrets & Env Vars (max: 3)

Condition | Score
No secrets detected | 3
1 potential secret detected | 1
2+ secrets detected | 0

ErrorHandling — Empty Catches (max: 5)

Counts silent failure patterns: empty catch blocks, bare except:, ignored error returns (_ = f()), unguarded .unwrap() calls.

Silent failure count | Score
0 | 5
1–2 | 4
3–5 | 3
6–10 | 2
11–20 | 1
> 20 | 0
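Both the Secrets & Env Vars table and the Empty Catches table map a finding count to a score through ordered bands. One way to express this is a generic band lookup; the names Band, scoreFromBands, emptyCatchScore and secretsScore are hypothetical, and counting the findings themselves is left to the language-specific detectors.

```typescript
// Each band is [largest count in the band (inclusive), score]; bands are ordered.
type Band = readonly [maxCount: number, score: number];

function scoreFromBands(count: number, bands: readonly Band[], fallback: number): number {
  for (const [maxCount, score] of bands) {
    if (count <= maxCount) return score;
  }
  return fallback; // count exceeds every band
}

// Empty Catches (max 5): 0 → 5, 1–2 → 4, 3–5 → 3, 6–10 → 2, 11–20 → 1, > 20 → 0.
const EMPTY_CATCH_BANDS: readonly Band[] = [[0, 5], [2, 4], [5, 3], [10, 2], [20, 1]];
// Secrets & Env Vars (max 3): 0 → 3, 1 → 1, 2+ → 0.
const SECRETS_BANDS: readonly Band[] = [[0, 3], [1, 1]];

const emptyCatchScore = (n: number): number => scoreFromBands(n, EMPTY_CATCH_BANDS, 0);
const secretsScore = (n: number): number => scoreFromBands(n, SECRETS_BANDS, 0);
```

Keeping thresholds as data rather than nested conditionals makes a rubric clarification under a MINOR version a one-line change.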

4.4 Score Normalization

The final Ratchet Score is computed as:

RatchetScore = Σ(category_score_i)
  where i ∈ {Testing, Security, TypeSafety, ErrorHandling, Performance, CodeQuality}
  and Σ(category_max_i) = 100

Because category maxima sum to 100 and each category score is clamped to [0, category_max] (§7, Step 4), no additional scaling is required. The score is naturally bounded to [0, 100].

Score | Grade | Label
95–100 | A+ | Exceptional
85–94 | A | Excellent
70–84 | B | Good
50–69 | C | Fair
30–49 | D | Poor
0–29 | F | Critical
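A sketch of aggregation followed by grade lookup, with a worked example. The interface and function names are hypothetical, and the category scores in the example are made up for illustration only.

```typescript
interface CategoryResult { name: string; score: number; max: number; }

const clamp = (x: number, lo: number, hi: number): number => Math.min(Math.max(x, lo), hi);

// Ratchet Score = sum of category scores, each clamped to [0, max].
function ratchetScore(categories: CategoryResult[]): number {
  return categories.reduce((total, c) => total + clamp(c.score, 0, c.max), 0);
}

// Grade bands from the table above.
function grade(score: number): string {
  if (score >= 95) return "A+";
  if (score >= 85) return "A";
  if (score >= 70) return "B";
  if (score >= 50) return "C";
  if (score >= 30) return "D";
  return "F";
}

// Illustrative (made-up) category scores.
const example: CategoryResult[] = [
  { name: "Testing", score: 20, max: 25 },
  { name: "Security", score: 12, max: 15 },
  { name: "TypeSafety", score: 15, max: 15 },
  { name: "ErrorHandling", score: 16, max: 20 },
  { name: "Performance", score: 8, max: 10 },
  { name: "CodeQuality", score: 12, max: 15 },
];
// 20 + 12 + 15 + 16 + 8 + 12 = 83, which falls in the 70–84 band (grade B).
```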

§5 Language-Specific Provisions

The Ratchet Score MUST be adapted to the idioms of the language under analysis. For categories not mentioned under a given language, the general rubrics in §4.3 apply.

TypeScript

TypeSafety Strict Config is evaluated against tsconfig.json: "strict": true → 7pts; noImplicitAny + strictNullChecks → 5pts; noImplicitAny alone → 3pts; TypeScript present, no flags → 1pt. Type Density measures usage of the any type per 100 lines.
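A sketch of the TypeScript provisions, assuming the compilerOptions object has already been read from tsconfig.json (tsconfig.json may contain comments, so a plain JSON.parse is not always enough). Function names are hypothetical. The any counter is a rough word-boundary heuristic meant for comment-stripped source; it would also count an identifier named any, and the mapping from density to Type Density points is not shown because this section does not define it.

```typescript
interface TsCompilerOptions {
  strict?: boolean;
  noImplicitAny?: boolean;
  strictNullChecks?: boolean;
}

// Strict Config (max 7) for TypeScript projects.
function tsStrictConfigScore(opts: TsCompilerOptions): number {
  if (opts.strict === true) return 7;
  if (opts.noImplicitAny === true && opts.strictNullChecks === true) return 5;
  if (opts.noImplicitAny === true) return 3;
  return 1; // TypeScript present, none of the listed flags set
}

// Type Density input: occurrences of the word "any" per 100 lines
// (heuristic; run on comment-stripped source so comments and strings do not count).
function anyPer100Lines(strippedSource: string): number {
  const lines = strippedSource.split("\n").length;
  const hits = strippedSource.match(/\bany\b/g)?.length ?? 0;
  return (hits / lines) * 100;
}
```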

JavaScript

JavaScript receives a fixed TypeSafety score of 0/15. JavaScript is dynamically typed by design; scoring 0 reflects that inherent risk without distorting other categories. Implementations SHOULD state clearly in their output that this 0/15 is a fixed language-level score rather than a finding about the repository.

Python

TypeSafety is evaluated against pyrightconfig.json, mypy.ini, or the corresponding pyproject.toml sections ([tool.pyright], [tool.mypy]) → up to 7pts. Bare except: (no exception type) MUST be counted as an empty catch equivalent. print() calls are penalized unless a logging library is imported.
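A sketch of the two Python-specific signals, written as regex heuristics over Python source with comments and strings already blanked. Function names are hypothetical. The specification does not list which logging libraries qualify, so this sketch assumes the standard library logging module or loguru.

```typescript
// Bare "except:" (no exception type) counts toward Empty Catches.
function countBareExcepts(strippedPy: string): number {
  return strippedPy.match(/^[ \t]*except[ \t]*:/gm)?.length ?? 0;
}

// print() calls are penalized only when no logging library is imported.
// Assumption: "logging library" = stdlib logging or loguru.
function penalizedPrintCalls(strippedPy: string): number {
  const hasLogging =
    /^[ \t]*(import[ \t]+(logging|loguru)\b|from[ \t]+(logging|loguru)\b)/m.test(strippedPy);
  if (hasLogging) return 0;
  return strippedPy.match(/\bprint[ \t]*\(/g)?.length ?? 0;
}
```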

Go

Strict Config: 7/7 (Go compiler enforces types by design). Type Density measures interface{} / any usage. ErrorHandling counts if err != nil {} patterns; ignored errors (_ = f()) MUST be penalized as empty catches.
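A heuristic sketch of the Go error-handling signals on comment-stripped Go source. Function names are hypothetical. It reads "if err != nil {}" with empty braces as a silent-failure pattern, which is one interpretation of the rule above; it also only catches single-value blank assignments such as _ = f(), not forms like _, _ = f().

```typescript
// Ignored errors: a call result assigned to the blank identifier, e.g. "_ = f()".
function countIgnoredErrors(strippedGo: string): number {
  return strippedGo.match(/^[ \t]*_[ \t]*=[ \t]*[\w.]+\(/gm)?.length ?? 0;
}

// Empty error checks: "if err != nil {}" with only whitespace between the braces.
function countEmptyErrChecks(strippedGo: string): number {
  return strippedGo.match(/if[ \t]+err[ \t]*!=[ \t]*nil[ \t]*\{\s*\}/g)?.length ?? 0;
}
```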

Rust

Strict Config: 7/7. Type Density: 8/8 (idiomatic Rust has no type escape hatches). .unwrap() and .expect() MUST be counted as empty catch equivalents (they panic on error). The ? operator is correct error propagation and MUST NOT be penalized.
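A sketch of the Rust rule on comment-stripped source (the function name is hypothetical). Because the regexes require "(" immediately after the method name, non-panicking variants such as .unwrap_or(...) and .unwrap_or_else(...) are not counted, and the ? operator never matches.

```typescript
// Count .unwrap() and .expect(...) calls as empty catch equivalents.
function countRustPanics(strippedRs: string): number {
  const unwraps = strippedRs.match(/\.unwrap\(\)/g)?.length ?? 0;
  const expects = strippedRs.match(/\.expect\(/g)?.length ?? 0;
  return unwraps + expects;
}
```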

Java

TypeSafety evaluated against Maven compiler plugin configuration or Gradle's sourceCompatibility. Async patterns detected via CompletableFuture, @Async, ExecutorService.

Kotlin

TypeSafety evaluated against build.gradle.kts; -Werror or allWarningsAsErrors → 7pts. Type Density measures !! (non-null assertion operator) usage — a null-safety escape hatch.

C# (CSharp)

TypeSafety evaluated against <Nullable>enable</Nullable> in .csproj → 7pts. Supports MSTest, NUnit, and xUnit test frameworks. Async patterns via async Task and async ValueTask.

PHP

TypeSafety evaluated against PHPStan (phpstan.neon) or Psalm (psalm.xml) → 7pts each; no static analysis → 1pt. Debug output includes var_dump(), print_r(), and dd() calls.

§6 Conformance Levels

Four graduated conformance levels indicate overall code quality, analogous to WCAG accessibility conformance levels (A/AA/AAA). They provide actionable quality targets rather than a binary pass/fail threshold.

Level | Minimum score | Description
Bronze | 50+ | Basic test coverage, no critical secrets, some error handling
Silver | 70+ | Good coverage (≥50% ratio), validation present, structured logging
Gold | 85+ | Strong coverage (≥75% ratio), strict types, comprehensive error handling
Platinum | 95+ | Near-perfect across all categories; exemplary engineering practice
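Since §6.1 makes a claim depend only on meeting the level's minimum score, the level lookup reduces to a threshold check. A sketch with hypothetical names:

```typescript
type ConformanceLevel = "Platinum" | "Gold" | "Silver" | "Bronze" | null;

// Highest level whose minimum score is met; null means no level may be claimed.
function conformanceLevel(score: number): ConformanceLevel {
  if (score >= 95) return "Platinum";
  if (score >= 85) return "Gold";
  if (score >= 70) return "Silver";
  if (score >= 50) return "Bronze";
  return null;
}
```

A score of 87, as in the example statement below, maps to Gold.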

6.1 Conformance Claims

A repository MAY claim a conformance level if a Ratchet Score produced by a conformant implementation meets the level's minimum score. Claims SHOULD include the specification version, score, date, implementation version, and git commit SHA.

Example Conformance Statement
This repository scored 87/100 under Ratchet Score Specification v1.0.0
(Ratchet Gold), assessed on 2026-03-27 using ratchet-run v2.4.1
at commit a3f9d2c.

6.2 Implementation Conformance

An implementation is conformant if it: (1) implements all six categories with specified maxima, (2) applies language-specific provisions in §5, (3) scores ≥90% of benchmark repositories within ±3 points of the reference implementation, (4) reports the specification version, and (5) discloses its methodology without undisclosed normalization factors.
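Criterion (3) can be checked mechanically once both implementations have scored the benchmark. A sketch with hypothetical names; treating an empty result set as a failure is an assumption, not a rule from this specification.

```typescript
interface BenchmarkResult {
  repo: string;
  reference: number; // score from the reference implementation
  candidate: number; // score from the implementation under test
}

// True if at least minShare of repositories are within ±tolerance points.
function meetsBenchmarkCriterion(results: BenchmarkResult[], tolerance = 3, minShare = 0.9): boolean {
  if (results.length === 0) return false; // assumption: an empty run proves nothing
  const within = results.filter(r => Math.abs(r.candidate - r.reference) <= tolerance).length;
  return within / results.length >= minShare;
}
```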

§7 Scoring Methodology

A conformant implementation MUST follow these steps in order.

Step 1 — File Discovery

Recursively traverse the target directory. MUST exclude: node_modules, dist, .git, .next, build, coverage, __pycache__, .cache, vendor, out. MUST exclude non-production directories from production code analysis: scripts, migrations, seed, fixtures, examples, __mocks__. Files matched by .ratchetignore MUST be excluded. Only files with extensions of the supported languages (§1.3, §5) are included. Files MUST be separated into production and test sets using language-specific rules.
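A sketch of Step 1 as a classifier over repository-relative paths with "/" separators. The two directory lists come from Step 1; the extension list and the test-file patterns are illustrative assumptions, since test detection is language-specific and not enumerated here. .ratchetignore handling is simplified to a set of exact paths because its matching syntax is not defined in this section.

```typescript
const EXCLUDED_DIRS = new Set(["node_modules", "dist", ".git", ".next", "build",
  "coverage", "__pycache__", ".cache", "vendor", "out"]);
const NON_PRODUCTION_DIRS = new Set(["scripts", "migrations", "seed", "fixtures", "examples", "__mocks__"]);
// Assumed extension list and test patterns (illustrative, not normative).
const SUPPORTED_EXTENSIONS = [".ts", ".tsx", ".js", ".jsx", ".py", ".go", ".rs", ".java", ".kt", ".cs", ".php"];
const TEST_PATTERNS = [/\.(test|spec)\.[jt]sx?$/, /(^|\/)test_[^/]*\.py$/, /_test\.go$/, /(^|\/)(tests?|__tests__)\//];

type FileClass = "production" | "test" | "ignored";

function classifyPath(path: string, ignored: Set<string> = new Set()): FileClass {
  const dirs = path.split("/").slice(0, -1);
  if (ignored.has(path) || dirs.some(d => EXCLUDED_DIRS.has(d))) return "ignored";
  if (!SUPPORTED_EXTENSIONS.some(ext => path.endsWith(ext))) return "ignored";
  if (TEST_PATTERNS.some(re => re.test(path))) return "test";
  // Non-production directories are kept out of the production set.
  if (dirs.some(d => NON_PRODUCTION_DIRS.has(d))) return "ignored";
  return "production";
}
```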

Step 2 — Content Loading & Preprocessing

Read all files into memory. For pattern matching, use comment-stripped source (comments and string literals replaced with whitespace of equal length). Use original source for line length and function length calculations.

Step 3 — Language Detection

Determine the primary language as the language with the most production source files. If multiple languages exist, apply language-specific rules per file. TypeSafety MUST use the language of each individual file.
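A sketch of primary-language detection by counting production files per language. The extension map is an assumption, and because this specification defines no tie-breaker, this version breaks ties alphabetically; a conformant implementation would need to document whatever rule it uses.

```typescript
// Assumed extension-to-language map for the nine supported languages.
const EXTENSION_LANGUAGE: Record<string, string> = {
  ".ts": "TypeScript", ".tsx": "TypeScript", ".js": "JavaScript", ".jsx": "JavaScript",
  ".py": "Python", ".go": "Go", ".rs": "Rust", ".java": "Java", ".kt": "Kotlin",
  ".cs": "C#", ".php": "PHP",
};

function languageOf(path: string): string | undefined {
  const dot = path.lastIndexOf(".");
  return dot === -1 ? undefined : EXTENSION_LANGUAGE[path.slice(dot)];
}

function primaryLanguage(productionFiles: string[]): string | undefined {
  const counts: Record<string, number> = {};
  for (const f of productionFiles) {
    const lang = languageOf(f);
    if (lang !== undefined) counts[lang] = (counts[lang] ?? 0) + 1;
  }
  // Most files wins; equal counts fall back to alphabetical order (assumption).
  const ranked = Object.keys(counts).sort((a, b) => counts[b] - counts[a] || a.localeCompare(b));
  return ranked[0];
}
```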

Step 4 — Category Scoring

For each category: compute all subcategory raw scores using §4.3 rubrics and §5 provisions; sum to produce the category score; clamp to [0, category_maximum]. Scoring is performed over the production file set, except Testing which uses both sets.

Step 5 — Aggregation

Sum all six category scores. The result is naturally bounded to [0, 100].

Step 6 — Required Output

A conformant implementation MUST produce: final Ratchet Score; each category name, score, and maximum; each subcategory name, score, maximum, and summary; specification version (e.g., RSS v1.0.0); file counts; primary language detected.
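One possible shape for the required output, plus a consistency check an implementation might run before emitting a report. Only the listed contents come from this step; every field and function name here is hypothetical, and splitting file counts into production and test is an assumption.

```typescript
interface SubcategoryReport { name: string; score: number; max: number; summary: string; }
interface CategoryReport { name: string; score: number; max: number; subcategories: SubcategoryReport[]; }
interface RatchetReport {
  specVersion: string; // e.g. "RSS v1.0.0"
  score: number;       // final Ratchet Score, 0–100
  categories: CategoryReport[];
  fileCounts: { production: number; test: number };
  primaryLanguage: string;
}

// Each category lies in [0, max] and equals its subcategory sum; the total equals the score.
function isConsistent(r: RatchetReport): boolean {
  const categoriesOk = r.categories.every(c =>
    c.score >= 0 && c.score <= c.max &&
    c.subcategories.reduce((s, sc) => s + sc.score, 0) === c.score);
  const total = r.categories.reduce((s, c) => s + c.score, 0);
  return categoriesOk && total === r.score && r.score >= 0 && r.score <= 100;
}
```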

A conformant implementation SHOULD also produce actionable remediation suggestions and file paths for each finding.

§8 Versioning

The specification uses Semantic Versioning 2.0.0 (MAJOR.MINOR.PATCH).

Version type | Trigger | Compatibility guarantee
MAJOR | Breaking changes producing materially different scores | 90-day migration grace period; both versions reported during transition
MINOR | New subcategories, new language support, rubric clarifications | Scores MUST NOT decrease; new points only
PATCH | Bug fixes and clarifications with no intended scoring change | Fully backwards compatible

Backwards Compatibility Guarantees

Within a MAJOR version: scores MUST NOT decrease due to specification updates alone; category maxima are immutable; the six core category names are immutable; conformance level thresholds are immutable. The reference implementation will support scoring under previous MAJOR versions for at least 24 months after a new MAJOR release.

Appendix A: Reference Implementation

ratchet-run (npm)

The primary reference implementation is the ratchet-run npm package, the authoritative reference for resolving rubric ambiguities.

npx ratchet-run scan

Source: github.com/kcemate/ratchet

The reference implementation contains: the scoring engine (src/commands/scan.ts), language pattern libraries (src/core/language-rules.ts), shared constants (src/core/scan-constants.ts), and a test suite (2,247+ cases as of v1.0). Implementors building alternative conformant implementations SHOULD use the ratchet-oss test suite as a validation harness.

This specification is hosted canonically at ratchetcli.com/docs/specification and version-controlled in the docs/ directory of the ratchet-oss repository.

Appendix B: Benchmark Dataset

Ratchet Code Quality Benchmark v0.1

The Ratchet Code Quality Benchmark v0.1 is a curated dataset of open-source repositories with human-validated Ratchet Scores, used to validate conformance of alternative implementations, calibrate rubric updates, and provide industry percentile baselines.

The benchmark includes repositories across all nine supported languages with scores ranging from 30 to 98, representing real-world codebases from the open-source ecosystem.

A conformant implementation MUST score ≥90% of benchmark repositories within ±3 points of the reference implementation. Differences of ±1 point attributable to whitespace normalization do not count against conformance.

Contact the Ratchet Standards Working Group for benchmark dataset access.

How to Cite This Document
"Ratchet Score Specification v1.0.0 (RSS v1.0.0)"
Ratchet Standards Working Group, March 2026.
https://ratchetcli.com/docs/specification
Licensed under CC BY 4.0.

Ratchet Score Specification v1.0.0 — 2026-03-27
© 2026 Ratchet. Licensed under Creative Commons Attribution 4.0 International.