LOO-CRPS Conformal Regression with Optimal Binning

About

This demo illustrates optimal binning for regression via LOO-CRPS minimisation, with full conformal prediction intervals and the Venn prediction band.

Method overview

  1. Optimal binning: given n training points (x_i, y_i) sorted by x, the x-axis is partitioned into K contiguous bins. The bin boundaries are chosen by dynamic programming, in O(n²K) time, to minimise the total LOO-CRPS (leave-one-out Continuous Ranked Probability Score). CRPS is a proper scoring rule for distributional predictions.
  2. K selection: the number of bins K is chosen by a 50/50 alternating train/test split. Within-sample LOO-CRPS is subject to in-sample optimism because the boundaries are fitted to the same data, so K must be selected on held-out data.
  3. Predictive distribution: for a new x*, the predictive distribution is the empirical CDF (ECDF) of the training labels in the bin containing x*.
  4. Venn band: for a bin holding m training labels, the family of ECDFs obtained by adding each hypothetical label y_h spans a band of constant width 1/(m+1) around the ECDF.
  5. Conformal prediction: CRPS is used as the nonconformity score. The prediction set Γ_ε = {y_h : p(y_h) > ε}, where p(y_h) is the conformal p-value of the hypothetical label y_h, satisfies P(Y* ∈ Γ_ε) ≥ 1−ε under exchangeability within the bin (a finite-sample, distribution-free guarantee).
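
Step 1 can be sketched in JavaScript as follows. This is a deliberately naive illustration, not the demo's implementation: the bin cost is recomputed from scratch for every candidate bin, so it is far slower than the O(n²K) quoted above. The names crpsEcdf, looCrps, and optimalBins and the minSize parameter are assumptions introduced here. The CRPS of an ECDF is computed with the standard identity CRPS(F, y) = E|Z − y| − ½ E|Z − Z'|.

```javascript
// CRPS of the empirical CDF of `sample` evaluated at observation y,
// using the identity CRPS(F, y) = E|Z - y| - 0.5 * E|Z - Z'|.
function crpsEcdf(sample, y) {
  const m = sample.length;
  let a = 0, b = 0;
  for (const z of sample) {
    a += Math.abs(z - y);
    for (const w of sample) b += Math.abs(z - w);
  }
  return a / m - b / (2 * m * m);
}

// Total leave-one-out CRPS of one bin: each label is scored against
// the ECDF of the remaining labels. Needs at least two labels.
function looCrps(ys) {
  let total = 0;
  for (let i = 0; i < ys.length; i++) {
    const rest = ys.filter((_, j) => j !== i);
    total += crpsEcdf(rest, ys[i]);
  }
  return total;
}

// Dynamic programme over contiguous bins of labels already sorted by x.
// best[k][j] = minimal total LOO-CRPS when ys[0..j) is split into k bins.
// minSize (assumed here) is the smallest allowed bin size.
function optimalBins(ys, K, minSize = 2) {
  const n = ys.length;
  const best = Array.from({ length: K + 1 }, () => new Array(n + 1).fill(Infinity));
  const prev = Array.from({ length: K + 1 }, () => new Array(n + 1).fill(-1));
  best[0][0] = 0;
  for (let k = 1; k <= K; k++) {
    for (let j = k * minSize; j <= n; j++) {
      for (let i = (k - 1) * minSize; i <= j - minSize; i++) {
        if (best[k - 1][i] === Infinity) continue;
        const c = best[k - 1][i] + looCrps(ys.slice(i, j));
        if (c < best[k][j]) { best[k][j] = c; prev[k][j] = i; }
      }
    }
  }
  if (best[K][n] === Infinity) return null; // too few points for K bins
  // Recover the bin start indices by walking the back-pointers.
  const starts = [];
  for (let k = K, j = n; k >= 1; k--) { j = prev[k][j]; starts.unshift(j); }
  return { cost: best[K][n], starts };
}

// Two well-separated clusters of labels, split into K = 2 bins.
const fit = optimalBins([0, 0.1, 0, 0.1, 5, 5.1, 5, 5.1], 2);
console.log(fit.starts); // bin start indices: [0, 4]
```

The sketch works on the labels alone because the points are assumed to be pre-sorted by x; a boundary between indices i−1 and i then corresponds to a cut on the x-axis between the two neighbouring x values.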

Controls

| Control | Description |
| --- | --- |
| Preset | Choose a pre-defined data-generating process |
| Code editor | Edit the Data Generating Process directly. The top section must return a sample of Y given X = x. The variables x (number) and rng (RNG object) are in scope. |
| n | Number of training observations |
| x lo / x hi | Range of x for training data |
| K_max | Maximum number of bins considered during CV |
| Coverage (1−ε) | Target coverage level for conformal intervals |
| Seed | Random seed |

Writing a custom Data Generating Process

The editor has two sections separated by a special comment line. The first section is the DGP body: a JS function body with x (number) and rng (RNG object) in scope that must return one sample of Y | X=x.

The rng object has: .normal(mean, std), .uniform(lo, hi), .gamma(shape, scale), .random().
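
To try a DGP body outside the demo, a stand-in rng exposing the same four method names can be sketched as below. Everything here is an assumption made for illustration: makeRng, the mulberry32 generator, the Box-Muller and Marsaglia-Tsang samplers, and wrapping the body with new Function are not taken from the demo, whose actual RNG may behave differently.

```javascript
// Hypothetical stand-in for the demo's rng object (the real one may differ).
function makeRng(seed) {
  let s = seed >>> 0;
  // mulberry32: small seeded generator returning floats in [0, 1).
  const random = () => {
    s = (s + 0x6D2B79F5) >>> 0;
    let t = s;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
  const uniform = (lo, hi) => lo + (hi - lo) * random();
  // Box-Muller transform; 1 - random() keeps the log argument in (0, 1].
  const normal = (mean, std) =>
    mean + std * Math.sqrt(-2 * Math.log(1 - random())) * Math.cos(2 * Math.PI * random());
  // Marsaglia-Tsang method; shapes below 1 use the boosting identity.
  const gamma = (shape, scale) => {
    if (shape < 1) return gamma(shape + 1, scale) * Math.pow(random(), 1 / shape);
    const d = shape - 1 / 3;
    const c = 1 / Math.sqrt(9 * d);
    for (;;) {
      const z = normal(0, 1);
      const v = Math.pow(1 + c * z, 3);
      if (v <= 0) continue;
      const u = random();
      if (Math.log(u) < 0.5 * z * z + d - d * v + d * Math.log(v)) return d * v * scale;
    }
  };
  return { random, uniform, normal, gamma };
}

// A DGP body as it might be typed into the editor: linear trend plus skewed noise.
const body = "return 2 * x + rng.gamma(2, 0.5);";
const dgp = new Function("x", "rng", body); // assumed wrapping of the editor text
const rng = makeRng(42);
const ys = Array.from({ length: 20000 }, () => dgp(1.0, rng));
const mean = ys.reduce((a, b) => a + b, 0) / ys.length;
console.log(mean); // should be close to 2 + 2 * 0.5 = 3
```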

The optional second section (after // --- trueQuantile(x, p) ---) is the quantile body: a JS function body with x, p, and erfInv in scope that returns the p-th quantile of Y | X=x. When provided, oracle intervals (dashed red) are shown on the Fan Plot.

```javascript
// Y | X=x ~ Normal(x, 0.3*(1+x))
const std = 0.3 * (1 + x);
return rng.normal(x, std);

// --- trueQuantile(x, p) ---
const std = 0.3 * (1 + x);
const z = Math.sqrt(2) * erfInv(2 * p - 1);
return x + std * z;
```
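
The quantile section above can be checked outside the demo with a sketch like the following. It assumes the editor text is split on the separator comment and each part becomes its own function body, which would explain why both sections may declare const std. The erf and erfInv helpers here are hypothetical stand-ins (an Abramowitz-Stegun approximation inverted by bisection); the erfInv supplied by the demo may be implemented differently.

```javascript
// Abramowitz-Stegun 7.1.26 approximation of erf (absolute error about 1.5e-7).
function erf(x) {
  const sign = x < 0 ? -1 : 1;
  const ax = Math.abs(x);
  const t = 1 / (1 + 0.3275911 * ax);
  const poly =
    ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t - 0.284496736) * t +
      0.254829592) * t;
  return sign * (1 - poly * Math.exp(-ax * ax));
}

// Hypothetical erfInv: inverts erf by bisection on [-6, 6].
function erfInv(y) {
  let lo = -6, hi = 6;
  for (let i = 0; i < 60; i++) {
    const mid = (lo + hi) / 2;
    if (erf(mid) < y) lo = mid; else hi = mid;
  }
  return (lo + hi) / 2;
}

const editorText = `
// Y | X=x ~ Normal(x, 0.3*(1+x))
const std = 0.3 * (1 + x);
return rng.normal(x, std);

// --- trueQuantile(x, p) ---
const std = 0.3 * (1 + x);
const z = Math.sqrt(2) * erfInv(2 * p - 1);
return x + std * z;
`;

// Assumed parsing: everything after the separator line is the quantile body.
const SEPARATOR = "// --- trueQuantile(x, p) ---";
const [, quantileBody] = editorText.split(SEPARATOR);
const quantileFn = new Function("x", "p", "erfInv", quantileBody);
const trueQuantile = (x, p) => quantileFn(x, p, erfInv);

// At x = 1 the standard deviation is 0.6, so the central 95% interval is
// roughly 1 ± 1.96 * 0.6.
console.log(trueQuantile(1, 0.025), trueQuantile(1, 0.5), trueQuantile(1, 0.975));
```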

Plots

| Tab | What is shown |
| --- | --- |
| K Selection | Left: within-sample LOO-CRPS (subject to in-sample optimism, monotone in K). Right: cross-validated test CRPS with U-shape; K* is marked. |
| Partition | Training data scatter with coloured bin spans and boundary lines. |
| Venn Band | Within-bin ECDF and Venn prediction band (width 1/(m+1)) for four representative test points. |
| P-value curves | Conformal p-value p(y_h) for each bin; shaded region is prediction set. |
| Fan Plot | Conformal prediction intervals across the x-range. |
| Coverage | P-value histogram + empirical coverage curve on fresh test points. |
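
The per-bin quantities behind the Venn Band and P-value curves tabs can be sketched as follows. The source states only that CRPS is the nonconformity score; scoring each point by the CRPS of the ECDF of the other points in the augmented bin is an assumption made here, as are the function names vennBand, pValue, and predictionSet and the use of a finite grid of candidate labels.

```javascript
// CRPS of the empirical CDF of `sample` at y: E|Z - y| - 0.5 * E|Z - Z'|.
function crpsEcdf(sample, y) {
  const m = sample.length;
  let a = 0, b = 0;
  for (const z of sample) {
    a += Math.abs(z - y);
    for (const w of sample) b += Math.abs(z - w);
  }
  return a / m - b / (2 * m * m);
}

// Venn band at threshold t: envelope, over all hypothetical labels, of the
// ECDFs of the m bin labels augmented with one extra label.
function vennBand(ys, t) {
  const m = ys.length;
  const count = ys.filter((v) => v <= t).length;
  return { lower: count / (m + 1), upper: (count + 1) / (m + 1) };
}

// Conformal p-value of the hypothetical label yh within one bin.
// Assumed score: CRPS of the ECDF of the other augmented points.
function pValue(ys, yh) {
  const aug = [...ys, yh];
  const scores = aug.map((v, i) => crpsEcdf(aug.filter((_, j) => j !== i), v));
  const testScore = scores[scores.length - 1];
  // Fraction of augmented points at least as nonconforming as yh.
  return scores.filter((s) => s >= testScore).length / aug.length;
}

// Prediction set restricted to a grid of candidate labels: keep p(yh) > eps.
function predictionSet(ys, grid, eps) {
  return grid.filter((yh) => pValue(ys, yh) > eps);
}

const bin = Array.from({ length: 19 }, (_, i) => i + 1); // labels 1..19, m = 19
console.log(vennBand(bin, 10)); // band width is 1 / (m + 1) = 0.05
console.log(predictionSet(bin, [-1000, 10, 1000], 0.1)); // [ 10 ]
```

With m = 19 labels the smallest attainable p-value is 1/20, so extreme candidates are excluded at ε = 0.1 while central ones are kept.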

References