NICE ESF Standard 15: Real-World Evidence and the Eternal Pilot Problem

By: Ollie Curtis, MSc
Last Updated: April 30, 2026
Date Created: April 30, 2026

Most DHT companies fail NICE Evidence Standard 15 for the same reason. They confuse a successful pilot with real-world evidence. The two are not the same thing, and the gap between them is the single best explanation of why so many digital health products get stuck in what the NHS knows as the eternal pilot — products that work brilliantly under supported conditions and somehow never quite scale.

This is not a procurement problem. It is an evidence problem. Standard 15 is the part of the NICE Evidence Standards Framework that asks whether your DHT actually delivers its claimed benefits in the conditions the NHS will deploy it in — not the conditions you ran the pilot in. Most submissions fail that test before a commissioner reads past the first page.

If you are a founder reading Standard 14 (effectiveness) and Standard 15 (real-world evidence) for the first time, here is the thing nobody tells you. Standard 14 gets you through the door. Standard 15 keeps you in the building. Get the second one wrong and you will be running pilots for the rest of your company’s life.

What Standard 15 actually asks for

Standard 15 applies to every DHT — Tier A, B and C — and it asks one fundamental question. If you removed the founder team’s weekly site visits, the engaged clinical champion, the consented patient cohort and the bespoke implementation support, would the product still deliver?

NICE breaks that down into five practical dimensions. Adoption and engagement that survives without manufacturer-led prompting. Effectiveness in the full mess of NHS deployment — heterogeneous patients, variable IT infrastructure, real workforce constraints. Generalisability across multiple sites and demographics. A defensible comparator: matched controls, historical baselines, or concurrent non-adopting sites. And, for AI-driven products, ongoing performance monitoring with drift detection and override-rate tracking.

The NICE Real-World Evidence Framework, published in June 2022, made the standard sharper still. NICE no longer treats real-world evidence as a soft supplement to clinical trial data. It treats it as a core part of the evidence base, with explicit expectations about study design, data quality and analytical rigour. If your real-world evidence reads like a marketing case study, NICE will treat it like one.

Pilot data is not real-world evidence

The most expensive misunderstanding in digital health is treating a successful pilot as Standard 15 evidence. It is not, and the differences are not subtle.

A typical NHS digital health pilot has manufacturer support on the ground, a clinical champion personally invested in the product, a recruited and consented patient cohort, and a clear implementation runway. Real-world deployment has none of these. Once you withdraw the scaffolding, performance frequently degrades, sometimes dramatically. Researchers call this the efficacy-effectiveness gap. NHS commissioners have a less polite name for it: every product they have ever decommissioned.

The framework that has done most to legitimise this concern in the academic literature is Greenhalgh and colleagues’ NASSS framework (non-adoption, abandonment, and challenges to scale-up, spread and sustainability), which catalogues the conditions under which health technologies are not adopted, are abandoned, or fail to scale, spread or be sustained. NHS commissioners increasingly cite it in their evaluations. If your evidence package does not engage with these failure modes, you are leaving the most important question unanswered: will this still work next year, without you?

Where companies actually go wrong

Five failure modes recur in Standard 15 submissions, and they are predictable:

1. Mistaking a single supported pilot for real-world evidence

A 50-patient pilot at one trust, with implementation support, is not real-world evidence. It is a controlled study conducted in a real setting. Useful, but not what the standard is asking for.

2. Reporting uptake without engagement depth

Downloads and registrations are not benefits realisation. A platform that 70% of eligible patients sign up to but only 30% of eligible patients actively use is not delivering its claimed benefits to 70% of the population. It is delivering them to 30%, and you need to say that.
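The arithmetic is simple, but it helps to make the denominators explicit. The sketch below is illustrative only: the `EngagementFunnel` class, its field names and the cohort of 1,000 patients are assumptions, and it treats both percentages above as shares of the eligible population. What counts as "active" is something your protocol has to define.

```python
from dataclasses import dataclass


@dataclass
class EngagementFunnel:
    """Hypothetical funnel counts for one deployment site."""
    eligible: int    # patients offered the product
    registered: int  # patients who signed up
    active: int      # patients meeting your protocol's active-use definition

    def uptake_rate(self) -> float:
        # Sign-ups as a share of everyone offered the product
        return self.registered / self.eligible

    def active_rate(self) -> float:
        # Active users as a share of everyone offered the product
        return self.active / self.eligible

    def activation_rate(self) -> float:
        # Active users as a share of sign-ups only
        return self.active / self.registered


# Illustrative numbers matching the 70% / 30% example in the text
funnel = EngagementFunnel(eligible=1000, registered=700, active=300)
print(f"Uptake: {funnel.uptake_rate():.0%}")
print(f"Active use: {funnel.active_rate():.0%}")
print(f"Activation among sign-ups: {funnel.activation_rate():.0%}")
```

Reporting all three figures side by side makes it hard to mistake sign-ups for benefits realisation.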

3. Cherry-picking sites

Showing the two trusts where the product worked while omitting the three where it struggled or was abandoned is selection bias. NHS commissioners are sophisticated enough to ask. Honest disclosure of deployment variation builds more trust than curated success stories.

4. No comparator

Real-world data without a comparison — to matched controls, historical rates, or concurrent non-adopting sites — is descriptive, not evaluative. It tells you what happened. It does not tell you whether it was better.
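One way to see the difference between descriptive and evaluative data is a simple difference-in-differences calculation against concurrent non-adopting sites. This is a minimal sketch, not a method NICE prescribes: the function name and the per-site rates are made up for illustration, and the approach assumes adopting and non-adopting sites would have followed similar trends without the product.

```python
from statistics import mean


def difference_in_differences(adopt_before, adopt_after,
                              control_before, control_after):
    """Change at adopting sites minus change at non-adopting sites."""
    adopt_change = mean(adopt_after) - mean(adopt_before)
    control_change = mean(control_after) - mean(control_before)
    return adopt_change - control_change


# Illustrative, made-up missed-appointment rates (one value per site)
adopting_before = [0.12, 0.14]
adopting_after = [0.09, 0.11]
control_before = [0.13, 0.11]
control_after = [0.12, 0.10]

raw_change = mean(adopting_after) - mean(adopting_before)
effect = difference_in_differences(
    adopting_before, adopting_after, control_before, control_after
)
print(f"Raw change at adopting sites: {raw_change:+.3f}")
print(f"Change net of comparator trend: {effect:+.3f}")
```

In this made-up example, part of the improvement at adopting sites also appears at the comparator sites, so the raw before-and-after figure overstates what the product did. A real analysis would also need matching, confidence intervals and a pre-specified protocol.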

5. Ignoring the performance cliff

Pilots tend to perform well during the supported window and degrade once your team steps back. If your evidence covers only the supported period, you are not demonstrating sustainable performance. You are demonstrating what your team can do when it is in the room.
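A rough way to quantify the cliff is to compare the same metric inside and outside the supported window. The sketch below is an assumption-laden illustration: `post_support_retention`, the monthly active-use figures and the four-month support period are all hypothetical.

```python
from statistics import mean


def post_support_retention(monthly_values, support_months):
    """Ratio of mean post-support performance to mean supported performance.

    monthly_values: metric per month, in deployment order
    support_months: number of initial months with manufacturer support
    """
    supported = monthly_values[:support_months]
    unsupported = monthly_values[support_months:]
    if not supported or not unsupported:
        raise ValueError("Need data from both supported and post-support periods")
    return mean(unsupported) / mean(supported)


# Illustrative, made-up monthly active-use rates at one site
active_use = [0.60, 0.62, 0.58, 0.60, 0.45, 0.40, 0.35]
retention = post_support_retention(active_use, support_months=4)
print(f"Post-support performance retained: {retention:.0%}")
```

The function refuses to return a number when there is no post-support data, which is the situation described above: evidence that covers only the supported period cannot say anything about sustainability.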

What good actually looks like

The strongest Standard 15 submissions are organisational programmes, not deliverables. They start the day the product is deployed, not the month before a renewal bid.

In practice that means a structured data-collection protocol running across every active deployment site, capturing operational metrics (uptake, engagement depth), outcome metrics (clinical endpoints, resource use, efficiency gains) and safety metrics (adverse events, errors, override rates) at defined intervals. It means a data governance framework compliant with Standard 5 and UK GDPR before you collect a single record. And it means a reporting structure that distinguishes operational data from outcome data from safety data, because commissioners are tired of being shown a single composite number that conflates all three. Reporting against the STROBE guidelines for observational studies is the closest thing the field has to a defensible default.
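The separation of operational, outcome and safety data can be built into the reporting record itself. The sketch below shows one possible shape, assuming a per-site, per-interval report; every class name, field name and value is hypothetical and would need to be replaced by the metrics your own protocol defines.

```python
from dataclasses import dataclass, asdict
from datetime import date


@dataclass(frozen=True)
class OperationalMetrics:
    eligible_patients: int
    registered_patients: int
    active_patients: int


@dataclass(frozen=True)
class OutcomeMetrics:
    clinical_endpoint_events: int
    bed_days_used: float


@dataclass(frozen=True)
class SafetyMetrics:
    adverse_events: int
    clinician_overrides: int


@dataclass(frozen=True)
class SiteIntervalReport:
    """One report per site per defined interval, with the three
    metric families kept in separate sections."""
    site_id: str
    period_start: date
    period_end: date
    operational: OperationalMetrics
    outcome: OutcomeMetrics
    safety: SafetyMetrics


# Illustrative, made-up values for a single site and quarter
report = SiteIntervalReport(
    site_id="site-001",
    period_start=date(2026, 1, 1),
    period_end=date(2026, 3, 31),
    operational=OperationalMetrics(1000, 700, 300),
    outcome=OutcomeMetrics(clinical_endpoint_events=42, bed_days_used=310.5),
    safety=SafetyMetrics(adverse_events=2, clinician_overrides=17),
)

sections = asdict(report)
print(sorted(key for key in sections if isinstance(sections[key], dict)))
```

Because each family lives in its own section, a composite score cannot be reported without someone deliberately combining them.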

For AI and adaptive products, the bar is higher. Continuous performance monitoring infrastructure — model dashboards, drift detection, override-rate tracking, and a defined response protocol when performance degrades — needs to be live from deployment. This is the ongoing version of Standard 16 (post-market surveillance) and the foundation of any defensible response to Standard 4 subgroup-bias concerns.
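Override-rate tracking is the easiest of these to sketch. The example below is a minimal illustration, assuming a rolling window over the most recent clinician decisions and a fixed alert threshold; the `OverrideRateMonitor` class, the window size and the 20% threshold are hypothetical choices. It covers override rates only, not statistical drift detection on model inputs or outputs.

```python
from collections import deque


class OverrideRateMonitor:
    """Tracks how often clinicians override the product's recommendation
    across the most recent `window` decisions."""

    def __init__(self, window: int, alert_threshold: float):
        self._decisions = deque(maxlen=window)  # oldest entries drop off
        self.alert_threshold = alert_threshold

    def record(self, overridden: bool) -> None:
        self._decisions.append(overridden)

    def override_rate(self) -> float:
        if not self._decisions:
            return 0.0
        return sum(self._decisions) / len(self._decisions)

    def needs_review(self) -> bool:
        # Only alert once the window is full, to avoid noisy early alarms
        window_full = len(self._decisions) == self._decisions.maxlen
        return window_full and self.override_rate() > self.alert_threshold


monitor = OverrideRateMonitor(window=10, alert_threshold=0.2)

# Ten decisions with one override: within tolerance
for overridden in [False] * 9 + [True]:
    monitor.record(overridden)
early_alert = monitor.needs_review()

# Four consecutive overrides push the rolling rate above the threshold
for _ in range(4):
    monitor.record(True)
late_alert = monitor.needs_review()

print(f"Alert after first ten decisions: {early_alert}")
print(f"Alert after four more overrides: {late_alert}")
```

The alert is only the trigger. The defined response protocol (who reviews, how quickly, and what happens to the product in the meantime) still has to exist outside the code.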

The structural opportunity is real. The NHS Federated Data Platform is now live or in delivery across the majority of secondary-care trusts. The Clinical Practice Research Datalink continues to grow as a primary-care evidence asset. And the independent evaluation of the NHS AI Lab, published in npj Digital Medicine, makes the case explicit: the products that engaged early with national data infrastructure generated stronger evidence and scaled faster. Companies that build their real-world evidence programme around this infrastructure have an advantage that newer entrants will struggle to replicate.

What this means for founders

If you are building a Tier B or C DHT, three things follow directly.

First, design real-world evidence generation into the product itself, not into a Standard 15 “evidence chapter” you write later. The instrumentation, governance and analytics infrastructure you need to generate real-world evidence at scale is product-level architecture. Bolting it on later is expensive and frequently impossible.

Second, treat your first deployment site not as a pilot but as Site One of an evidence programme. The protocol, the comparator strategy, the subgroup analyses, and the post-support performance window — all of it should be defined before you go live. The King’s Fund’s analysis of innovation spread in the NHS is still essentially correct: spread fails because companies design for proof of concept, not for the conditions of system-wide adoption.

Third, accept that this is a multi-year investment with compounding returns. The companies with the largest real-world evidence bases — multi-site, multi-year, multi-subgroup — develop a defensibility that is genuinely hard to replicate. Standard 15, taken seriously, is one of the strongest moats available in digital health. Most companies treat it as a compliance hurdle. The few that treat it as a strategic asset are the ones that scale.

So What?

NICE ESF Standard 15 is the part of the framework that decides whether your company is in the digital health business or in the pilot business. Controlled effectiveness evidence under Standard 14 is necessary, but it is not what NHS commissioners reach for when they decide whether to renew. They reach for evidence that the product worked at the second site, the third site, and the site you stopped supporting eight months ago.

Build that evidence as a programme — instrumented from product Day One, reported transparently, generalised across the populations the NHS will actually deploy you in. The companies that take this seriously stop running pilots and start running deployments. The companies that don’t, don’t.

Healthonomix helps DHT companies design real-world evidence programmes that satisfy NICE ESF and stand up to commissioner scrutiny. Get in touch to discuss your evidence strategy.

Ollie Curtis, MSc

Founder @ Healthonomix. Health Economist & Digital Healthcare Consultant
