Ratchet Score Specification v1.0 — The Open Code Quality Standard

This document defines the Ratchet Score: a normalized, reproducible, language-aware measure of software code quality expressed as a dimensionless integer on a scale of 0 to 100. The score quantifies quality across six weighted categories: Testing (25%), Security (15%), TypeSafety (15%), ErrorHandling (20%), Performance (10%), and CodeQuality (15%). It is designed to be implementation-agnostic, citable in regulatory and compliance contexts, and suitable for automated quality gates, insurance underwriting, procurement evaluations, and developer tooling.

1 Scope
2 Normative References
3 Terms and Definitions
4 Scoring Model
5 Language-Specific Provisions
6 Conformance Levels
7 Scoring Methodology
8 Versioning
A Reference Implementation
B Benchmark Dataset

§1 Scope

1.1 What This Specification Covers

This specification defines the six scoring categories and their relative weights, the subcategory rubrics used to derive per-category scores, the normalization procedure that maps raw scores to a 0–100 final score, language-specific adaptations for nine supported languages, conformance levels for implementations and scored repositories, and the step-by-step scoring methodology a conformant implementation MUST follow.

1.2 What This Specification Does Not Cover

This specification does not address: execution correctness, runtime performance, functional completeness, business logic correctness, dependency vulnerability scanning (CVEs in third-party packages), or licensing compliance.

1.3 Supported Languages

TypeScript JavaScript Python Go Rust Java Kotlin C# PHP

§2 Normative References

The following documents are referenced normatively. In cases of conflict, this specification takes precedence.

Reference	Title	Relevance
ISO/IEC 25010:2011	Systems and software Quality Requirements and Evaluation (SQuaRE)	Quality characteristic taxonomy informing the Ratchet category model
NIST SP 800-218 (SSDF v1.1)	Secure Software Development Framework	Secure development practices against which Security and ErrorHandling are calibrated
CWE Top 25 (2023)	Common Weakness Enumeration Most Dangerous Weaknesses	Anti-patterns detected under the Security category
OWASP Top 10 (2021)	Open Web Application Security Project Top 10	Input validation, secrets management, and authentication subcategories
RFC 2119	Key words for use in RFCs to Indicate Requirement Levels	Normative language (MUST, SHOULD, MAY) used throughout

§3 Terms and Definitions

3.1 Ratchet Score

The final normalized integer score on [0, 100] produced by applying the scoring model in §4. Higher is better.

3.2 Category

One of six top-level scoring dimensions: Testing, Security, TypeSafety, ErrorHandling, Performance, CodeQuality. Each has a defined maximum score (weight) and subcategories.

3.3 Subcategory

A measurable signal within a category. Subcategory maximum scores sum to the category maximum.

3.4 Raw Score

The integer score assigned to a subcategory, from 0 to the subcategory's maximum, computed by the rubric in §4.3.

3.5 Production Code

Source files that are not test files and not in non-production directories (scripts, migrations, fixtures, etc.). The primary input to all categories except Testing.

3.6 Comment-Stripped Source

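A version of a source file with comments and string literals replaced by whitespace of equal length, so that line and column positions are preserved. It is used for pattern matching to eliminate false positives from commented-out code and from code-like text inside string literals.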
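The transformation in 3.6 can be sketched as a small state machine. This is a minimal illustration, not the reference implementation: it assumes C-style syntax (// and /* */ comments, single- and double-quoted strings) and ignores template literals, raw strings, and nested comments. The function name strip_comments_and_strings is hypothetical.

```python
def strip_comments_and_strings(source: str) -> str:
    """Blank out comments and string literals, preserving length and newlines."""
    out = []
    i, n = 0, len(source)
    state = "code"  # one of: code, line_comment, block_comment, string
    quote = ""
    while i < n:
        ch = source[i]
        nxt = source[i + 1] if i + 1 < n else ""
        if state == "code":
            if ch == "/" and nxt in ("/", "*"):
                # Comment opener: blank both characters and switch state
                state = "line_comment" if nxt == "/" else "block_comment"
                out.append("  ")
                i += 2
                continue
            if ch in ('"', "'"):
                state, quote = "string", ch
                out.append(" ")
            else:
                out.append(ch)
        elif state == "line_comment":
            if ch == "\n":
                state = "code"
                out.append("\n")
            else:
                out.append(" ")
        elif state == "block_comment":
            if ch == "*" and nxt == "/":
                state = "code"
                out.append("  ")
                i += 2
                continue
            out.append("\n" if ch == "\n" else " ")
        else:  # inside a string literal
            if ch == "\\" and nxt:
                # Escape sequence: blank the backslash and the escaped character
                out.append(" " + ("\n" if nxt == "\n" else " "))
                i += 2
                continue
            if ch == quote:
                state = "code"
            out.append("\n" if ch == "\n" else " ")
        i += 1
    return "".join(out)
```

Because every replaced character becomes a space and newlines are kept, a match offset in the stripped text points at the same line and column in the original file.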

3.7 Conformant Implementation

A scoring engine that produces Ratchet Scores by following §7, applying rubrics in §4.3, and respecting language-specific provisions in §5.

§4 Scoring Model

4.1 Category Weights

The Ratchet Score is the sum of six category scores. The sum of all category maxima MUST equal 100.

Category	Max
Testing	25
ErrorHandling	20
Security	15
TypeSafety	15
CodeQuality	15
Performance	10

ISO 25010 alignment: Testing maps to Maintainability→Testability; Security maps to Security; TypeSafety maps to Reliability→Fault Tolerance; ErrorHandling maps to Reliability→Recoverability; Performance maps to Performance Efficiency; CodeQuality maps to Maintainability.

4.2 Subcategory Structure

Category	Subcategory	Max
Testing (25)	Coverage Ratio	8
	Edge Case Depth	9
	Test Quality	8
Security (15)	Secrets & Env Vars	3
	Input Validation	6
	Auth & Rate Limiting	6
TypeSafety (15)	Strict Config	7
	Type Density	8
ErrorHandling (20)	Coverage	8
	Empty Catches	5
	Structured Logging	7
Performance (10)	Async Patterns	3
	Console Cleanup	5
	Import Hygiene	2
CodeQuality (15)	Function Length	4
	Line Length	4
	Dead Code	4
	Duplication	3
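The two invariants stated so far (subcategory maxima sum to their category maximum, and category maxima sum to 100) can be checked mechanically. The sketch below transcribes the table above into dictionaries; the names RUBRIC, CATEGORY_MAX, and validate_rubric are illustrative and not part of the specification.

```python
# Subcategory maxima, transcribed from the table in 4.2
RUBRIC = {
    "Testing": {"Coverage Ratio": 8, "Edge Case Depth": 9, "Test Quality": 8},
    "Security": {"Secrets & Env Vars": 3, "Input Validation": 6,
                 "Auth & Rate Limiting": 6},
    "TypeSafety": {"Strict Config": 7, "Type Density": 8},
    "ErrorHandling": {"Coverage": 8, "Empty Catches": 5,
                      "Structured Logging": 7},
    "Performance": {"Async Patterns": 3, "Console Cleanup": 5,
                    "Import Hygiene": 2},
    "CodeQuality": {"Function Length": 4, "Line Length": 4, "Dead Code": 4,
                    "Duplication": 3},
}

# Category maxima from 4.1
CATEGORY_MAX = {
    "Testing": 25, "Security": 15, "TypeSafety": 15,
    "ErrorHandling": 20, "Performance": 10, "CodeQuality": 15,
}


def validate_rubric(rubric: dict, category_max: dict) -> bool:
    """Raise ValueError if either summation invariant is violated."""
    for category, subcategories in rubric.items():
        total = sum(subcategories.values())
        if total != category_max[category]:
            raise ValueError(
                f"{category}: subcategories sum to {total}, "
                f"expected {category_max[category]}"
            )
    if sum(category_max.values()) != 100:
        raise ValueError("category maxima must sum to 100")
    return True
```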

4.3 Selected Rubrics

Testing — Coverage Ratio (max: 8)

Ratio (test files / prod files)	Score
R ≥ 1.0	8
0.75 ≤ R < 1.0	6
0.50 ≤ R < 0.75	4
0.25 ≤ R < 0.50	2
0 < R < 0.25	1
R = 0 (no tests)	0
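As a sketch, the Coverage Ratio bands translate into a few integer comparisons, which avoids floating-point edge cases at the band boundaries. The function name is hypothetical, and the behavior when there are no production files is an assumption: the rubric does not define that case, so this sketch returns 0.

```python
def coverage_ratio_score(test_files: int, prod_files: int) -> int:
    """Map the test-to-production file ratio R onto the 0-8 rubric."""
    if test_files <= 0:
        return 0  # R = 0 (no tests)
    if prod_files <= 0:
        return 0  # assumption: ratio is undefined without production files
    # Compare 4 * test_files against multiples of prod_files so that the
    # thresholds 0.25, 0.50, 0.75, and 1.0 are evaluated exactly.
    scaled = 4 * test_files
    if scaled >= 4 * prod_files:
        return 8  # R >= 1.0
    if scaled >= 3 * prod_files:
        return 6  # 0.75 <= R < 1.0
    if scaled >= 2 * prod_files:
        return 4  # 0.50 <= R < 0.75
    if scaled >= prod_files:
        return 2  # 0.25 <= R < 0.50
    return 1      # 0 < R < 0.25
```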

Security — Secrets & Env Vars (max: 3)

Condition	Score
No secrets detected	3
1 potential secret detected	1
2+ secrets detected	0

ErrorHandling — Empty Catches (max: 5)

Counts silent failure patterns: empty catch blocks, bare except:, ignored error returns (_ = f()), unguarded .unwrap() calls.

Silent failure count	Score
0	5
1–2	4
3–5	3
6–10	2
11–20	1
> 20	0
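The Empty Catches table is a banded lookup. A minimal sketch, assuming the silent failure count has already been computed from comment-stripped source (the function name is illustrative):

```python
# (inclusive upper bound of the band, score), in ascending order
EMPTY_CATCH_BANDS = [(0, 5), (2, 4), (5, 3), (10, 2), (20, 1)]


def empty_catches_score(silent_failures: int) -> int:
    """Map a silent failure count onto the 0-5 rubric."""
    if silent_failures < 0:
        raise ValueError("count cannot be negative")
    for upper_bound, score in EMPTY_CATCH_BANDS:
        if silent_failures <= upper_bound:
            return score
    return 0  # more than 20 silent failures
```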

4.4 Score Normalization

The final Ratchet Score is computed as:

RatchetScore = Σ(category_score_i)
  where i ∈ {Testing, Security, TypeSafety, ErrorHandling, Performance, CodeQuality}
  and Σ(category_max_i) = 100

Because category maxima sum to 100, no additional scaling is required. The score is naturally bounded to [0, 100].

Score	Grade	Label
95–100	A+	Exceptional
85–94	A	Excellent
70–84	B	Good
50–69	C	Fair
30–49	D	Poor
0–29	F	Critical
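Aggregation and grading can be illustrated together. In the sketch below the per-category scores are made-up example values, and the function names are hypothetical; only the summation rule and the grade bands come from this section.

```python
# (minimum score, grade, label), highest band first
GRADE_BANDS = [
    (95, "A+", "Exceptional"),
    (85, "A", "Excellent"),
    (70, "B", "Good"),
    (50, "C", "Fair"),
    (30, "D", "Poor"),
    (0, "F", "Critical"),
]


def ratchet_score(category_scores: dict) -> int:
    """Sum the six category scores; no scaling is applied."""
    return sum(category_scores.values())


def grade(score: int) -> tuple:
    """Return the (grade, label) pair for a score in [0, 100]."""
    if not 0 <= score <= 100:
        raise ValueError("score must be within [0, 100]")
    for minimum, letter, label in GRADE_BANDS:
        if score >= minimum:
            return letter, label


# Example values only, not taken from any real repository
example = {
    "Testing": 22, "Security": 13, "TypeSafety": 15,
    "ErrorHandling": 17, "Performance": 8, "CodeQuality": 12,
}
total = ratchet_score(example)  # 22 + 13 + 15 + 17 + 8 + 12 = 87
```

With these example values the total is 87, which falls in the 85–94 band (grade A, Excellent).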

§5 Language-Specific Provisions

The Ratchet Score MUST be adapted to the idioms of the language under analysis. For categories not mentioned under a given language, the general rubrics in §4.3 apply.

TypeScript

TypeSafety Strict Config is evaluated against tsconfig.json: "strict": true → 7pts; noImplicitAny + strictNullChecks → 5pts; noImplicitAny alone → 3pts; TypeScript present, no flags → 1pt. Type Density measures the density of annotations that use the any type, per 100 lines.
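The Strict Config tiers for TypeScript can be sketched as follows. The function takes the already-parsed compilerOptions object as a dictionary, because tsconfig.json may contain comments that a strict JSON parser rejects. Two points are assumptions of this sketch rather than statements of the specification: strictNullChecks without noImplicitAny is scored as "no flags" (1pt), and flags inherited through extends are not resolved.

```python
def ts_strict_config_score(compiler_options: dict) -> int:
    """Score the Strict Config subcategory (max 7) for a TypeScript project."""
    if compiler_options.get("strict") is True:
        return 7
    no_implicit_any = compiler_options.get("noImplicitAny") is True
    strict_null_checks = compiler_options.get("strictNullChecks") is True
    if no_implicit_any and strict_null_checks:
        return 5
    if no_implicit_any:
        return 3
    # TypeScript present but none of the scored flag combinations set.
    # Assumption: strictNullChecks on its own also lands here.
    return 1
```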

JavaScript

JavaScript receives a fixed TypeSafety score of 0/15. JavaScript is dynamically typed by design; scoring 0 accurately reflects the inherent risk without distorting other categories. Implementations SHOULD communicate this clearly.

Python

TypeSafety evaluated against pyrightconfig.json, mypy.ini, or pyproject.toml sections → up to 7pts. Bare except: (no exception type) MUST be counted as empty catch equivalents. print() calls are penalized unless a logging library is imported.
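One way to detect bare except: clauses is a line-anchored regular expression run over comment-stripped source. This is a sketch under that assumption; the pattern and function name are illustrative, and a conformant implementation may use a parser instead.

```python
import re

# An "except" keyword followed directly by a colon, with no exception type
BARE_EXCEPT = re.compile(r"^[ \t]*except[ \t]*:", re.MULTILINE)


def count_bare_excepts(stripped_source: str) -> int:
    """Count bare except: clauses in comment-stripped Python source."""
    return len(BARE_EXCEPT.findall(stripped_source))
```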

Go

Strict Config: 7/7 (Go compiler enforces types by design). Type Density measures interface{} / any usage. ErrorHandling counts if err != nil {} patterns; ignored errors (_ = f()) MUST be penalized as empty catches.

Rust

Strict Config: 7/7. Type Density: 8/8 (idiomatic Rust has no type escape hatches). .unwrap() and .expect() MUST be counted as empty catch equivalents (they panic on error). The ? operator is correct error propagation and MUST NOT be penalized.
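A minimal sketch of the Rust counting rule, assuming the input is comment-stripped source so that string arguments to .expect() are already blanked. The regular expression deliberately does not match .unwrap_or(), .unwrap_or_default(), or the ? operator; the names are illustrative.

```python
import re

# .unwrap( or .expect( exactly; methods such as unwrap_or do not match
# because the character after the name must be whitespace or "("
PANICKING_CALL = re.compile(r"\.(?:unwrap|expect)\s*\(")


def count_rust_silent_failures(stripped_source: str) -> int:
    """Count .unwrap() and .expect() calls in comment-stripped Rust source."""
    return len(PANICKING_CALL.findall(stripped_source))
```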

Java

TypeSafety evaluated against Maven compiler plugin configuration or Gradle's sourceCompatibility. Async patterns detected via CompletableFuture, @Async, ExecutorService.

Kotlin

TypeSafety evaluated against build.gradle.kts; -Werror or allWarningsAsErrors → 7pts. Type Density measures !! (non-null assertion operator) usage — a null-safety escape hatch.

C# (CSharp)

TypeSafety evaluated against <Nullable>enable</Nullable> in .csproj → 7pts. Supports MSTest, NUnit, and xUnit test frameworks. Async patterns via async Task and async ValueTask.

PHP

TypeSafety evaluated against PHPStan (phpstan.neon) or Psalm (psalm.xml) → 7pts each; no static analysis → 1pt. Debug output includes var_dump(), print_r(), and dd() calls.

§6 Conformance Levels

Four graduated conformance levels indicate overall code quality, analogous to WCAG accessibility conformance levels (A/AA/AAA). They provide actionable quality targets rather than a binary pass/fail threshold.

Level	Minimum Score	Typical Characteristics
Bronze	50+	Basic test coverage, no critical secrets, some error handling
Silver	70+	Good coverage (≥50% ratio), validation present, structured logging
Gold	85+	Strong coverage (≥75% ratio), strict types, comprehensive error handling
Platinum	95+	Near-perfect across all categories; exemplary engineering practice
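Mapping a score to its conformance level is a threshold lookup. The sketch below uses only the minimum scores given above; the function name is hypothetical, and returning None for scores below 50 is a choice of this sketch.

```python
from typing import Optional

# (minimum score, level), highest level first
CONFORMANCE_LEVELS = [
    (95, "Platinum"),
    (85, "Gold"),
    (70, "Silver"),
    (50, "Bronze"),
]


def conformance_level(score: int) -> Optional[str]:
    """Return the highest level whose minimum the score meets, or None."""
    for minimum, level in CONFORMANCE_LEVELS:
        if score >= minimum:
            return level
    return None  # below Bronze: no level can be claimed
```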

6.1 Conformance Claims

A repository MAY claim a conformance level if a Ratchet Score produced by a conformant implementation meets the level's minimum score. Claims SHOULD include the specification version, score, date, implementation version, and git commit SHA.

Example Conformance Statement

This repository scored 87/100 under Ratchet Score Specification v1.0.0
(Ratchet Gold), assessed on 2026-03-27 using ratchet-run v2.4.1
at commit a3f9d2c.

6.2 Implementation Conformance

An implementation is conformant if it: (1) implements all six categories with specified maxima, (2) applies language-specific provisions in §5, (3) scores ≥90% of benchmark repositories within ±3 points of the reference implementation, (4) reports the specification version, and (5) discloses its methodology without undisclosed normalization factors.
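Criterion (3) can be checked with a simple comparison over the benchmark set. In this sketch both inputs are dictionaries mapping a repository identifier to its score; the function name and the decision to count repositories missing from the candidate results as misses are assumptions.

```python
def meets_benchmark_criterion(
    candidate: dict,
    reference: dict,
    tolerance: int = 3,
    required_percent: int = 90,
) -> bool:
    """True if enough candidate scores fall within the tolerance."""
    if not reference:
        raise ValueError("reference scores must not be empty")
    within = sum(
        1
        for repo, ref_score in reference.items()
        if repo in candidate and abs(candidate[repo] - ref_score) <= tolerance
    )
    # Integer arithmetic avoids floating-point error at exactly 90%
    return within * 100 >= required_percent * len(reference)
```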

§7 Scoring Methodology

A conformant implementation MUST follow these steps in order.

Step 1 — File Discovery

Recursively traverse the target directory. MUST exclude: node_modules, dist, .git, .next, build, coverage, __pycache__, .cache, vendor, out. MUST exclude non-production directories from production code analysis: scripts, migrations, seed, fixtures, examples, __mocks__. Files in .ratchetignore MUST be excluded. Only supported file extensions (§5) are included. Files MUST be separated into production and test sets using language-specific rules.
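The directory exclusions in Step 1 can be sketched with os.walk. This illustration covers only the two exclusion lists; it does not implement .ratchetignore handling or the language-specific split into production and test sets, and the set of extensions is passed in by the caller because this step does not enumerate them. All names are hypothetical.

```python
import os

# Directories that are never traversed
EXCLUDED_DIRS = {
    "node_modules", "dist", ".git", ".next", "build",
    "coverage", "__pycache__", ".cache", "vendor", "out",
}

# Directories whose files are kept out of production code analysis
NON_PRODUCTION_DIRS = {
    "scripts", "migrations", "seed", "fixtures", "examples", "__mocks__",
}


def discover_files(root: str, extensions: set) -> tuple:
    """Return (production, non_production) lists of paths relative to root."""
    production, non_production = [], []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune excluded directories in place so os.walk skips them
        dirnames[:] = [d for d in dirnames if d not in EXCLUDED_DIRS]
        parts = os.path.relpath(dirpath, root).split(os.sep)
        in_non_production = any(p in NON_PRODUCTION_DIRS for p in parts)
        for name in filenames:
            if os.path.splitext(name)[1] not in extensions:
                continue
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            (non_production if in_non_production else production).append(rel)
    return sorted(production), sorted(non_production)
```

For example, discover_files(".", {".ts", ".tsx"}) would return the TypeScript files under the current directory, with anything under scripts or fixtures reported in the second list.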

Step 2 — Content Loading & Preprocessing

Read all files into memory. For pattern matching, use comment-stripped source (comments and string literals replaced with whitespace of equal length). Use original source for line length and function length calculations.

Step 3 — Language Detection

Determine the primary language as the language with the most production source files. If multiple languages exist, apply language-specific rules per file. TypeSafety MUST use the language of each individual file.
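Primary language detection reduces to counting production files per language. The extension map below is an assumption made for illustration (the specification names the languages, not their file extensions), as is the alphabetical tie-break, which is there only to keep the result deterministic.

```python
import os
from collections import Counter
from typing import Optional

# Hypothetical extension map; a real implementation may recognize more
EXTENSION_LANGUAGE = {
    ".ts": "TypeScript", ".tsx": "TypeScript",
    ".js": "JavaScript", ".jsx": "JavaScript",
    ".py": "Python", ".go": "Go", ".rs": "Rust",
    ".java": "Java", ".kt": "Kotlin", ".cs": "C#", ".php": "PHP",
}


def language_of(path: str) -> Optional[str]:
    """Return the language for a file path, or None if unsupported."""
    return EXTENSION_LANGUAGE.get(os.path.splitext(path)[1].lower())


def primary_language(production_files: list) -> Optional[str]:
    """Return the language with the most production files."""
    counts = Counter(
        lang for lang in map(language_of, production_files) if lang
    )
    if not counts:
        return None
    # Highest count wins; ties are broken alphabetically (assumption)
    return min(counts, key=lambda lang: (-counts[lang], lang))
```

Per-file rules, including TypeSafety, would then call language_of on each file rather than rely on the primary language.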

Step 4 — Category Scoring

For each category: compute all subcategory raw scores using §4.3 rubrics and §5 provisions; sum to produce the category score; clamp to [0, category_maximum]. Scoring is performed over the production file set, except Testing which uses both sets.
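The sum-and-clamp rule in Step 4 can be sketched as below. Clamping each raw score to its own subcategory maximum before summing is an assumption of this sketch, based on the definition of Raw Score in §3.4; treating a missing subcategory as 0 is also an assumption.

```python
def category_score(
    raw_scores: dict, subcategory_max: dict, category_max: int
) -> int:
    """Sum subcategory raw scores and clamp to [0, category_max]."""
    total = 0
    for name, maximum in subcategory_max.items():
        raw = raw_scores.get(name, 0)  # missing subcategory scores 0
        total += max(0, min(raw, maximum))
    return max(0, min(total, category_max))


# ErrorHandling maxima from 4.2; the raw scores are example values
error_handling_max = {
    "Coverage": 8, "Empty Catches": 5, "Structured Logging": 7,
}
example_raw = {"Coverage": 8, "Empty Catches": 4, "Structured Logging": 5}
example_total = category_score(example_raw, error_handling_max, 20)
```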

Step 5 — Aggregation

Sum all six category scores. The result is naturally bounded to [0, 100].

Step 6 — Required Output

A conformant implementation MUST produce: final Ratchet Score; each category name, score, and maximum; each subcategory name, score, maximum, and summary; specification version (e.g., RSS v1.0.0); file counts; primary language detected.

A conformant implementation SHOULD also produce actionable remediation suggestions and file paths for each finding.

§8 Versioning

The specification uses Semantic Versioning 2.0.0 (MAJOR.MINOR.PATCH).

Version Type	Trigger	Compatibility Guarantee
MAJOR	Breaking changes producing materially different scores	90-day migration grace period; both versions reported during transition
MINOR	New subcategories, new language support, rubric clarifications	Scores MUST NOT decrease; new points only
PATCH	Bug fixes and clarifications with no intended scoring change	Fully backwards compatible

Backwards Compatibility Guarantees

Within a MAJOR version: scores MUST NOT decrease due to specification updates alone; category maxima are immutable; the six core category names are immutable; conformance level thresholds are immutable. The reference implementation will support scoring under previous MAJOR versions for at least 24 months after a new MAJOR release.

A Reference Implementation

ratchet-run (npm)

The primary reference implementation is the ratchet-run npm package, the authoritative reference for resolving rubric ambiguities.

npx ratchet-run scan

Source: github.com/kcemate/ratchet

The reference implementation contains: the scoring engine (src/commands/scan.ts), language pattern libraries (src/core/language-rules.ts), shared constants (src/core/scan-constants.ts), and a test suite (2,247+ cases as of v1.0). Implementors building alternative conformant implementations SHOULD use the ratchet-oss test suite as a validation harness.

This specification is hosted canonically at ratchetcli.com/docs/specification and version-controlled in the docs/ directory of the ratchet-oss repository.

B Benchmark Dataset

Ratchet Code Quality Benchmark v0.1

The Ratchet Code Quality Benchmark v0.1 is a curated dataset of open-source repositories with human-validated Ratchet Scores, used to validate conformance of alternative implementations, calibrate rubric updates, and provide industry percentile baselines.

The benchmark includes repositories across all nine supported languages with scores ranging from 30 to 98, representing real-world codebases from the open-source ecosystem.

A conformant implementation MUST score ≥90% of benchmark repositories within ±3 points of the reference implementation. Differences of up to ±1 point caused by whitespace normalization do not, on their own, make an implementation non-conformant.

Contact the Ratchet Standards Working Group for benchmark dataset access.

How to Cite This Document

"Ratchet Score Specification v1.0.0 (RSS v1.0.0)"
Ratchet Standards Working Group, March 2026.
https://ratchetcli.com/docs/specification
Licensed under CC BY 4.0.

Ratchet Score Specification v1.0.0 — 2026-03-27
© 2026 Ratchet. Licensed under Creative Commons Attribution 4.0 International.
