About
This demo illustrates optimal binning for regression via LOO-CRPS minimisation, with full conformal prediction intervals and the Venn prediction band.
Method overview
- Optimal binning: given n training points (x_i, y_i) sorted by x, the x-axis is partitioned into K contiguous bins. Bin boundaries minimise the total LOO-CRPS (Leave-One-Out Continuous Ranked Probability Score, a proper scoring rule for distributional predictions) via dynamic programming in O(n²K) time.
- K selection: the number of bins K is chosen by a 50/50 alternating train/test split. Within-sample LOO-CRPS is subject to in-sample optimism, so K must be selected on held-out data.
- Predictive distribution: for a new x*, the predictive distribution is the within-bin empirical CDF.
- Venn band: adding each hypothetical label yh to the m training labels in the bin yields a family of ECDFs; together they span a band of constant width 1/(m+1) around the within-bin ECDF.
- Conformal prediction: CRPS is used as the nonconformity score. The prediction set Γε = {yh : p(yh) > ε}, where p(yh) is the conformal p-value of the hypothetical label yh, satisfies P(Y* ∈ Γε) ≥ 1−ε under exchangeability within the bin (finite-sample, distribution-free).
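The optimal-binning step can be sketched in a few lines. This is a minimal illustration, not the demo's implementation: the function names (`crpsEcdf`, `looCrps`, `optimalBins`) and the `minSize = 2` constraint are assumptions made here, and the bin costs are filled in naively, so it is only practical for small n. Once the cost table exists, the dynamic programme itself runs in O(n²K).

```javascript
// CRPS of the empirical distribution of `sample`, evaluated at `y`,
// in the energy form: E|Z - y| - 0.5 * E|Z - Z'|.
function crpsEcdf(sample, y) {
  const k = sample.length;
  let absToY = 0;
  let pairwise = 0;
  for (let a = 0; a < k; a++) {
    absToY += Math.abs(sample[a] - y);
    for (let b = 0; b < k; b++) pairwise += Math.abs(sample[a] - sample[b]);
  }
  return absToY / k - pairwise / (2 * k * k);
}

// Total leave-one-out CRPS of one bin: each label is scored
// against the ECDF of the other labels in the same bin.
function looCrps(ys) {
  let total = 0;
  for (let i = 0; i < ys.length; i++) {
    const others = ys.filter((_, j) => j !== i);
    total += crpsEcdf(others, ys[i]);
  }
  return total;
}

// Dynamic programme over contiguous bins. `ys` holds the labels ordered by x.
// minSize >= 2 is an assumption of this sketch: LOO needs another point in the bin.
function optimalBins(ys, K, minSize = 2) {
  const n = ys.length;
  // cost[i][j] = LOO-CRPS of the bin holding points i..j-1 (naive fill).
  const cost = Array.from({ length: n + 1 }, () => new Array(n + 1).fill(Infinity));
  for (let i = 0; i < n; i++)
    for (let j = i + minSize; j <= n; j++) cost[i][j] = looCrps(ys.slice(i, j));

  // best[k][j] = minimal total cost of splitting the first j points into k bins.
  const best = Array.from({ length: K + 1 }, () => new Array(n + 1).fill(Infinity));
  const prev = Array.from({ length: K + 1 }, () => new Array(n + 1).fill(-1));
  best[0][0] = 0;
  for (let k = 1; k <= K; k++)
    for (let j = 1; j <= n; j++)
      for (let i = 0; i < j; i++) {
        const c = best[k - 1][i] + cost[i][j];
        if (c < best[k][j]) { best[k][j] = c; prev[k][j] = i; }
      }
  if (!Number.isFinite(best[K][n])) throw new Error('no feasible partition');

  // Backtrack to recover the start index of every bin.
  const starts = [];
  for (let k = K, j = n; k >= 1; k--) { j = prev[k][j]; starts.unshift(j); }
  return { total: best[K][n], starts };
}

// Two well-separated clusters of labels, already ordered by x.
const ys = [0, 0.1, -0.1, 0.05, 10, 10.1, 9.9, 10.05];
const fit = optimalBins(ys, 2);
console.log(fit.starts, fit.total);
```

Each entry of `starts` is the index of the first point in a bin, so the bin boundaries on the x-axis lie between consecutive sorted points.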
Controls
| Control | Description |
|---|---|
| Preset | Choose a pre-defined data-generating process |
| Code editor | Edit the data-generating process directly. The top section must return a sample from Y given X = x; the variables x (number) and rng (RNG object) are in scope. |
| n | Number of training observations |
| x lo / x hi | Range of x for training data |
| K_max | Maximum number of bins considered during CV |
| Coverage (1−ε) | Target coverage level for conformal intervals |
| Seed | Random seed |
Writing a custom Data Generating Process
The editor has two sections separated by a special comment line.
The first section is the DGP body: a JS function body with x (number)
and rng (RNG object) in scope that must return one sample of Y | X=x.
The rng object has: .normal(mean, std), .uniform(lo, hi),
.gamma(shape, scale), .random().
The optional second section (after // --- trueQuantile(x, p) ---) is the
quantile body: a JS function body with x, p, and
erfInv in scope that returns the p-th quantile of Y | X=x.
When provided, oracle intervals (dashed red) are shown on the Fan Plot.
```javascript
// Y | X=x ~ Normal(x, 0.3*(1+x))
const std = 0.3 * (1 + x);
return rng.normal(x, std);
// --- trueQuantile(x, p) ---
const std = 0.3 * (1 + x);
const z = Math.sqrt(2) * erfInv(2 * p - 1);
return x + std * z;
```
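To try a custom DGP outside the demo, the two editor sections can be turned into ordinary functions. The sketch below is an assumption about how such text could be compiled, not a description of the demo's internals: `compileDgp`, `makeRng`, the separator pattern, and this `erfInv` are hypothetical stand-ins. The `erfInv` here uses Winitzki's closed-form approximation (accurate to roughly three significant digits), and the stub RNG implements only `.random()`, `.uniform()`, and `.normal()`, not `.gamma()`.

```javascript
// Editor text, copied from the example in this section.
const editorText = `
// Y | X=x ~ Normal(x, 0.3*(1+x))
const std = 0.3 * (1 + x);
return rng.normal(x, std);
// --- trueQuantile(x, p) ---
const std = 0.3 * (1 + x);
const z = Math.sqrt(2) * erfInv(2 * p - 1);
return x + std * z;
`;

// Assumed separator pattern; the demo's actual parsing may differ.
const SEPARATOR = /^[ \t]*\/\/ --- trueQuantile\(x, p\) ---[ \t]*$/m;

// Split the editor text into a sampler and an optional quantile function.
function compileDgp(source) {
  const [dgpBody, quantileBody] = source.split(SEPARATOR);
  const sample = new Function('x', 'rng', dgpBody);
  const hasQuantile = quantileBody !== undefined && quantileBody.trim() !== '';
  const trueQuantile = hasQuantile
    ? new Function('x', 'p', 'erfInv', quantileBody)
    : null;
  return { sample, trueQuantile };
}

// Stand-in for the demo's erfInv: Winitzki's approximation with a = 0.147.
function erfInv(x) {
  const a = 0.147;
  const ln = Math.log(1 - x * x);
  const t = 2 / (Math.PI * a) + ln / 2;
  const inner = Math.sqrt(t * t - ln / a) - t;
  return Math.sign(x) * Math.sqrt(Math.max(0, inner));
}

// Stand-in seeded RNG (mulberry32 core, Box-Muller normals).
function makeRng(seed) {
  let a = seed >>> 0;
  const random = () => {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), a | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
  const uniform = (lo, hi) => lo + (hi - lo) * random();
  const normal = (mean, std) => {
    const u1 = 1 - random(); // in (0, 1], so the log is finite
    const u2 = random();
    return mean + std * Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
  };
  return { random, uniform, normal };
}

const { sample, trueQuantile } = compileDgp(editorText);
const rng = makeRng(42);
const draw = sample(1.0, rng);                 // one sample of Y | X = 1
const median = trueQuantile(1.0, 0.5, erfInv); // median of Y | X = 1
console.log(draw, median);
```

Both sections may declare the same local names (here `std`) because each one becomes the body of its own function.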
Plots
| Tab | What is shown |
|---|---|
| K Selection | Left: within-sample LOO-CRPS (subject to in-sample optimism, monotone in K). Right: cross-validated test CRPS with U-shape; K* is marked. |
| Partition | Training data scatter with coloured bin spans and boundary lines. |
| Venn Band | Within-bin ECDF and Venn prediction band (width 1/(m+1)) for four representative test points. |
| P-value curves | Conformal p-value p(yh) for each bin; the shaded region is the prediction set. |
| Fan Plot | Conformal prediction intervals across the x-range. |
| Coverage | P-value histogram + empirical coverage curve on fresh test points. |
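The quantities behind the Venn Band and P-value curves tabs can be sketched for a single bin. The band follows directly from the definition: adding one hypothetical label to m labels moves the ECDF by at most 1/(m+1). For the p-value, the exact nonconformity score used by the demo is not spelled out here, so the sketch assumes one plausible reading: each label in the augmented bin is scored by the CRPS of the ECDF of the remaining labels. The names `vennBand`, `conformalPValue`, and `predictionSet` are illustrative.

```javascript
// CRPS of the empirical distribution of `sample`, evaluated at `y`.
function crpsEcdf(sample, y) {
  const k = sample.length;
  let absToY = 0;
  let pairwise = 0;
  for (let a = 0; a < k; a++) {
    absToY += Math.abs(sample[a] - y);
    for (let b = 0; b < k; b++) pairwise += Math.abs(sample[a] - sample[b]);
  }
  return absToY / k - pairwise / (2 * k * k);
}

// Envelope, at threshold t, of the ECDFs of ys plus one hypothetical label.
// A label above t gives the lower value; a label at or below t gives the upper one.
function vennBand(ys, t) {
  const m = ys.length;
  const below = ys.filter((y) => y <= t).length;
  return { lower: below / (m + 1), upper: (below + 1) / (m + 1) };
}

// Full conformal p-value of a hypothetical label yh within one bin.
// Assumed score: CRPS of the leave-one-out ECDF of the augmented bin.
function conformalPValue(ys, yh) {
  const augmented = [...ys, yh];
  const scores = augmented.map((z, i) =>
    crpsEcdf(augmented.filter((_, j) => j !== i), z));
  const last = scores[scores.length - 1];
  // Fraction of labels that conform no better than the hypothetical one.
  return scores.filter((s) => s >= last).length / augmented.length;
}

// Prediction set on a grid of candidate labels: keep those with p > epsilon.
function predictionSet(ys, grid, epsilon) {
  return grid.filter((yh) => conformalPValue(ys, yh) > epsilon);
}

const binLabels = [1, 2, 3, 4, 5, 6, 7, 8, 9];
const band = vennBand(binLabels, 4.5);
const kept = predictionSet(binLabels, [-100, 5, 100], 0.2);
console.log(band, kept);
```

In this toy bin a candidate far from the data is the least conforming of the ten labels, so its p-value is 1/(m+1) and it drops out of the prediction set at ε = 0.2, while a central candidate is kept.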
References
- Vovk, V., Gammerman, A. & Shafer, G. (2005). Algorithmic Learning in a Random World. Springer.
- Source code: github.com/ptocca/RegressionVenn