Coverage measurement · A methodology note

What 62% Means · JSR 354 RI · v1.4.5

A coverage measurement of the JSR 354 reference implementation, run against the latest stable release. The number that comes back is lower than most readers will expect. The reasons it is lower are more interesting than the number.

This is the second in a series on test-quality measurement. The first piece argued that mutation reports are routinely read with less methodological care than the underlying data deserves. This piece makes the same argument about coverage reports, on a codebase no one wrote for the purpose of being measured: the reference implementation of JSR 354, the Java standard for monetary values.

JSR 354 is the dependency you reach for when you need correct money handling in a Java application — currency-aware arithmetic, rounding under named modes, decimal precision the JVM's primitives don't give you. A Series-B fintech rebuilding a ledger system, a regtech firm normalising cross-border transaction data, a payments processor reconciling against partner statements — any of them might pull JSR 354 from Maven Central in the next sprint. The version they would pull is 1.4.5, released 22 March 2025. That is the version measured here. The substrate is what their own engineers shipped, with their own tests, on their own latest stable release.

§1One number

61.8%

Project  JSR 354 reference implementation
Version  1.4.5 · released 22 March 2025
Module   moneta-core · 19,408 instructions
Tests    608 · all passing
Tool     JaCoCo 0.8.11

Sixty-one point eight per cent line coverage on the core module of a JSR reference implementation. All 608 unit tests pass. The build is green. This is the version on Maven Central. This is the version the production application would pull.

Most readers will expect a higher number. Reference implementations of standards are supposed to be exemplary. JSR 354 has been actively maintained for over a decade, has a Technology Compatibility Kit, and ships with a test suite the maintainers consider release-ready. And the number is 61.8.

The other modules tell a more uneven story:

JaCoCo 0.8.11 · all modules · v1.4.5

Module               Tests   Line %    Branch %   Instr %
moneta-core          608     61.8%     52.1%      58.9%
moneta-test          13      88.9%     84.6%      91.0%
moneta-convert-base  33      33.7%     18.5%      30.4%
moneta-convert-imf   39      87.5%     82.5%      84.6%
moneta-convert-ecb   39      no data   no data    no data

Five modules. Two above 80% on line coverage. One — the largest, the one containing every Money class a consuming application would actually instantiate — at 61.8%. One at barely a third. One reporting no data at all despite a passing build. The number is the start of the article, not its conclusion. The next four sections work through what the number means, what it doesn't, what it omits, and what would have to be true for it to mean what most readers assume it does.

§2What the number measures, what it doesn't

Line coverage measures the fraction of source lines executed at least once during a test run. Branch coverage measures the fraction of conditional outcomes (each if's true and false sides, each switch case) reached at least once. Both are observability metrics: they record what was touched. Neither records what was checked.
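The difference is easy to reproduce on a toy method. In this sketch (a hypothetical class, not from moneta-core), a single test input executes every line of `classify` — 100% line coverage — while the false side of the conditional is never taken, so branch coverage is only 50%:

```java
// Hypothetical example -- not from the JSR 354 codebase.
public class CoverageDemo {

    // One if-statement: two branch outcomes.
    static String classify(int cents) {
        String label = "credit";
        if (cents < 0) {
            label = "debit";
        }
        return label;
    }

    public static void main(String[] args) {
        // A "test suite" of one input. Every line above runs
        // (assignment, condition, true-branch body, return): line coverage 100%.
        // The false outcome of the condition is never taken: branch coverage 50%.
        System.out.println(classify(-250)); // prints "debit"
    }
}
```

A tool reporting lines would score this suite perfect; a tool reporting branches would score it half-done. Same code, same test, two honest numbers.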

The distinction is usually treated as a footnote. On a real codebase it is the entire story. Consider org.javamoney.moneta.format.MonetaryAmountDecimalFormat:

MonetaryAmountDecimalFormat · org.javamoney.moneta.format · moneta-core · v1.4.5

Line coverage 70.6% · Most lines in the class are executed by the test suite at least once.
Branch coverage 0.0% · None of the class's ten conditional branches is exercised on both its true and false sides.

The class is touched. The lines run. A coverage dashboard summarising this class as "70.6%" is reporting an honest measurement. But the class contains ten conditional branches — formatting decisions about negative amounts, thousand separators, currency placement, precision rules — and the test suite exercises none of them on both sides. A consuming application that hits any of those conditionals in the false direction is running code the test suite has never seen run. The dashboard does not say that. The dashboard says 70.6%.

MonetaryAmountDecimalFormat is the cleanest example in the codebase but not the only one. Nine other classes in moneta-core show line coverage above 70% with branch coverage below 60%, including spi.AbstractCurrencyConversion, spi.DefaultRounding, and the ConvertNumberValue floating-point conversion variants. In every case the same pattern: tests exercise the class's surface, tests don't probe its decisions.

The fix is not to read branch coverage instead of line coverage. Branch coverage is closer to what the reader wants but still an observability metric — it records that both sides of a conditional were reached, not that the test asserted on what they returned. A test that reaches both sides of an if and asserts nothing about the result still raises branch coverage. The next floor below branch coverage — the one that does measure verification — is mutation testing, which the previous article in this series covered. We come back to mutation testing in §5.
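The gap between reaching and checking can be made concrete with a deliberately assertion-free test. In this hypothetical sketch (not from the moneta-core suite), the first test drives both sides of the conditional — branch coverage reads 100% — yet it would still pass if the two return values were swapped; only the second test would catch that mutant:

```java
// Hypothetical sketch of "touch without verify" -- not real moneta-core tests.
public class TouchVsVerify {

    static String classify(int cents) {
        return cents < 0 ? "debit" : "credit";
    }

    // Both branch outcomes reached, nothing asserted: branch coverage 100%,
    // but a mutant swapping "debit" and "credit" survives this test.
    static void touchingTest() {
        classify(-1);
        classify(1);
    }

    // The same two inputs with assertions: the swap mutant is now killed.
    static void verifyingTest() {
        if (!classify(-1).equals("debit"))  throw new AssertionError("negative case");
        if (!classify(1).equals("credit")) throw new AssertionError("positive case");
    }

    public static void main(String[] args) {
        touchingTest();
        verifyingTest();
        System.out.println("both tests pass; only the second one verifies");
    }
}
```

Both tests produce identical coverage reports. Only a mutation run distinguishes them.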

§3The 0% column

Coverage gaps in real codebases are rarely uniformly distributed. They cluster, and the clusters reveal something about how the test harness reaches code rather than what the code does. The moneta-core module contains a striking example: two structurally parallel HTTP loader implementations, in two adjacent packages, with the same class names and (broadly) the same responsibilities. One is exercised by tests. The other is not exercised at all.

spi.loader.okhttp
  • LoadableHttpResource · 60.5%
  • OkHttpLoaderService · 31.4%
  • OkHttpScheduler · 49.1%
  • HttpLoadDataService · 44.0%
  • HttpLoadLocalDataService · 31.2%
spi.loader.urlconnection
  • LoadableURLResource · 12.6%
  • URLConnectionLoaderService · 0.0%
  • URLConnectionScheduler · 0.0%
  • URLLoadDataService · 0.0%
  • URLLoadLocalDataService · 0.0%

The two packages exist because JSR 354 supports two HTTP transport strategies. The OkHttp variant is used when the OkHttp dependency is on the classpath. The URLConnection variant is the JDK-native fallback, used when a deploying team prefers not to add a third-party HTTP library. Both are documented. Both are reachable by configuration. Both are part of what a consuming application is depending on when it pulls the artefact.

From the test suite's point of view, only one of them exists.

The pattern continues outside the loader packages. Six classes in the OSGI runtime support code (internal.OSGIServiceProvider, OSGIServiceHelper, internal.OSGIActivator, others) are also at 0% line coverage. None of these classes runs in a standard JUnit-style test environment — they exist to bridge JSR 354 into an OSGI container at runtime — and the test suite does not start an OSGI container. The OSGI integration is reachable in production. It is unreachable in unit tests.

Thirty-one classes in moneta-core sit below 50% line coverage. Roughly half of them follow one of these two patterns: alternate-runtime entrypoints (OSGI, urlconnection-fallback) that are structurally outside the test harness's reach; thin-test core utilities (MonetaryConfig, DefaultMonetaryContextFactory, AbstractRateProvider, DefaultCashRounding, the *Producer factories) where the harness can reach but doesn't.

Test reachability is not behavioural reachability. The 0% column is what that gap looks like in production code.

This is a more honest framing of the headline number than the headline number itself can carry. The 61.8% is a single statistic averaging across a population of classes, some of which are well-tested in their hot paths, some of which are entirely outside the test harness's reach. A consuming team reading 61.8 and asking "is this enough?" is asking a question the number cannot answer. The same number could mean "the tests focus on the parts that matter and leave runtime-only code alone" or it could mean "an entire HTTP transport strategy ships with no tests at all." On moneta-core, it means both.

§4Silent coverage loss

The fifth row in the table at the top of this article is empty. Thirty-nine tests in moneta-convert-ecb ran. All thirty-nine passed. JaCoCo reports no data. The build was green and the report was empty.

The mechanism is mundane and worth understanding because it is reproducible across any Maven project that combines JaCoCo with custom Surefire arguments. Maven's Surefire plugin accepts a single <argLine> configuration value: the JVM arguments to pass to the forked test process. If multiple <argLine> entries are declared in the plugin's configuration block, Maven applies last-one-wins semantics — repeated declarations override earlier ones rather than concatenating. The JaCoCo agent works by injecting itself via the argLine: its prepare-agent goal sets the property to -javaagent:…/jacocoagent.jar. Any subsequent <argLine> declaration in a module's pom.xml overwrites that. The agent never attaches. The tests run uninstrumented. The coverage report finds no execution data and silently reports nothing.

01 jacoco:prepare-agent sets argLine to -javaagent:…/jacocoagent.jar=destfile=…
02 module's pom.xml declares <argLine>-Xmx1g …</argLine> in Surefire config
03 Maven applies last-one-wins; agent argLine is dropped
04 tests fork without the agent, run, all pass
05 JaCoCo report goal finds no jacoco-unit.exec file
06 single line in build log: "Skipping JaCoCo execution due to missing execution data file"
07 build is green; no error; module's coverage line in any aggregated dashboard reads zero or absent

This is not a JSR 354 bug, exactly. The Surefire <argLine> override semantics are documented. JaCoCo's reliance on the property is documented. The interaction is well-known to anyone who has spent time integrating the two. But it is silent: there is no warning, no error, no failed build step. A team setting up CI for a multi-module project, copying coverage thresholds from a template, will see a working build and a report with the headline numbers they expected for the modules where instrumentation worked. The module where it didn't will show as zero, or be omitted, or — depending on how the dashboard aggregates — be invisibly excluded from the average.
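A common mitigation — offered here as a sketch rather than a prescription, with illustrative version numbers — is to have Surefire's <argLine> reference the property that prepare-agent sets, using Maven's late @{...} evaluation, so custom JVM flags append to the agent argument instead of replacing it:

```xml
<!-- Sketch: append to, rather than overwrite, the JaCoCo agent argLine.
     Plugin versions are illustrative. -->
<plugin>
  <groupId>org.jacoco</groupId>
  <artifactId>jacoco-maven-plugin</artifactId>
  <version>0.8.11</version>
  <executions>
    <execution>
      <goals><goal>prepare-agent</goal></goals> <!-- sets the argLine property -->
    </execution>
  </executions>
</plugin>
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <!-- @{argLine} is resolved late, after prepare-agent has run, so the
         -javaagent:... argument survives alongside the custom flag. -->
    <argLine>@{argLine} -Xmx1g</argLine>
  </configuration>
</plugin>
```

The @{...} syntax is Surefire's late property replacement; an ${argLine} reference resolved at configuration time can still bind before the agent property exists, which is why the JaCoCo documentation recommends the late form.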

The point is not that this is rare. The point is that the coverage number on a dashboard is not necessarily a measurement that ran. A team reading 61.8% on moneta-core and trusting it implicitly trusts a number of upstream things: that the agent attached, that the instrumented bytecode is the bytecode that executed, that the report walked the same compiled artefact the production build will publish. Most of the time, on most projects, those things are true. The article you are reading is the one that exists because, on this project, on this module, one of them wasn't.

§5What mutation testing would tell you, and why no one ran it

The natural next step after a coverage measurement is a mutation testing run. Mutation testing — described in detail in the previous article — measures whether the test suite verifies behaviour, not just whether it touches code. On MonetaryAmountDecimalFormat with its 70.6% line and 0% branch coverage, mutation testing is precisely the tool that would expose what the tests fail to verify. The signal would be unambiguous: a low mutation score on a class with a respectable line coverage number is the canonical "touch but don't verify" pattern made measurable.

It was not run. The methodology section below explains why, in detail and on the record. The short version: PIT, the dominant mutation testing tool in the Java ecosystem, requires a TestNG bridge plugin to discover JSR 354's TestNG-based test suite. That bridge can only be activated by modifying the project's pom.xml. Modifying the project under measurement would compromise the article's central claim: that this is what the codebase ships, with the test suite the maintainers ship, exactly as a downstream consumer would receive it. The pattern of declining to modify the system under test is borrowed from clinical and physical measurement. The cost is that no mutation data is available for this article. The benefit is that what is reported is unambiguously about JSR 354 1.4.5, not about a derivative.

Methodology · PIT mutation testing was attempted and not completed

PIT versions 1.15.0 and 1.18.2 were attempted on moneta-core. Both produced an identical error during plugin initialisation:

"Failed to execute goal org.pitest:pitest-maven:<version>:mutationCoverage … Please check you have correctly installed the pitest plugin for your project's test library (JUnit 5, TestNG, JUnit 4 etc)."

JSR 354 1.4.5 uses TestNG. PIT requires the matching adapter (org.pitest:pitest-testng-plugin) loaded into the pitest-maven plugin's classloader. The adapter cannot be added through Maven's user-level settings.xml: the settings.xml schema (http://maven.apache.org/xsd/settings-1.0.0.xsd) defines its Profile element as accepting only id, activation, properties, repositories, and pluginRepositories. The build, pluginManagement, and plugins elements that would be required to declare a plugin dependency are not permitted in settings.xml profiles, by design. Verified empirically: a fully-formed profile containing those elements is silently stripped by the parser, leaving only the elements the schema accepts.
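For the record, the modification that was declined would have been roughly the following addition to moneta-core/pom.xml — a sketch only, with illustrative version numbers — attaching the TestNG adapter to the pitest-maven plugin's own classloader:

```xml
<!-- The pom.xml change that was declined. Sketch only; versions illustrative. -->
<plugin>
  <groupId>org.pitest</groupId>
  <artifactId>pitest-maven</artifactId>
  <version>1.18.2</version>
  <dependencies>
    <!-- TestNG adapter, loaded into the plugin classloader so PIT can
         discover the TestNG-based test suite -->
    <dependency>
      <groupId>org.pitest</groupId>
      <artifactId>pitest-testng-plugin</artifactId>
      <version>1.0.0</version>
    </dependency>
  </dependencies>
</plugin>
```

Note that this declaration can only live in the project's own pom.xml: it is exactly the <plugin> element that the settings.xml profile schema does not permit.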

The remaining route — modifying moneta-core/pom.xml — was declined. Modifying the project's build configuration to enable measurement, even temporarily, would invalidate the claim that the measurement is of JSR 354 1.4.5 as shipped. No PIT data is therefore reported. All numbers in this article are from JaCoCo against the unmodified codebase.

The wider implication is more pointed than the local one. Coverage is observable from outside a project. Anyone with a JDK and a Maven installation can clone JSR 354, run JaCoCo, and produce the numbers in this article. Mutation score, in 2026, is not observable from outside without modification. The tooling friction is not theoretical; the article you are reading is empirical evidence of it. The practical consequence is that coverage numbers go publicly unchallenged in a way that mutation scores cannot, simply because the latter are not available to be challenged in the first place.

This asymmetry is most of the reason that coverage numbers have an outsized role in how teams talk about test quality, and most of the reason mutation scores are conspicuously absent from external technical communication. If you are a CTO wondering why your dependencies' coverage numbers are knowable while their mutation scores are not, the answer is not that mutation testing is unimportant. The answer is that it is currently inaccessible from the seat where the question is being asked.

§6How to read a coverage number

Three questions to bring to any coverage report — your own or a dependency's — that would have caught the issues in this article:

  1. Did instrumentation actually run on every module the report covers?

    A passing build and a coverage report are not evidence that the agent attached on every module. Check the build log for "Skipping JaCoCo execution due to missing execution data file" or its equivalent in your tooling. If a module reports zero or no data while its tests passed, the question is whether the tests ran without instrumentation, not whether they failed. moneta-convert-ecb in this article is the worked example: 39 tests, all passing, no coverage data.

  2. What runs only outside the test harness?

    OSGI containers, alternate-dependency code paths, JNI bridges, native-runtime fallbacks, framework integration points: code reachable in production but not in mvn test. These are the classes that show 0% in your coverage report and 100% in your dependency-graph reachability. The urlconnection loader package in moneta-core is the worked example: structurally identical to a tested package, structurally invisible to the test suite.

  3. Is the test "touching" or "verifying"?

    Branch coverage is the next floor down from line coverage and answers part of the question — a test that reaches both sides of a conditional has at least probed the decision. But branch coverage still does not measure assertion strength. Mutation score does. Where mutation testing is available — your own codebase, where you can install a TestNG bridge — run it. Where it is not — third-party dependencies, where you cannot — read line and branch coverage with full awareness that "touched" is not "checked." MonetaryAmountDecimalFormat at 70.6% line and 0% branch is the worked example.

And one for any team computing coverage on its own code: agree what "good" means before the number arrives. A team that decides 80% is the threshold and then measures 78% is having a different conversation from a team that measures 78% and then asks what the threshold should be. The first conversation is about the test suite. The second is about the number.