Learn how Fospha measures model performance using nRMSE, R², and back-testing inside its transparent Glassbox framework. A practical guide to understanding model accuracy, detecting drift, and building trust in your marketing measurement.
Accurate measurement underpins every effective marketing decision. When a model guides how budgets are distributed, its reliability must be clear and demonstrable.
At Fospha, accuracy isn’t a single score; it’s a continuous discipline - a framework of checks that keeps predictions stable, interpretable, and transparent over time. This discipline sits inside Glassbox, Fospha’s commitment to full transparency across every modeling layer.
Fospha is a complete glass box: every model layer, validation step, and metric is open to inspection. You can see:
1. how the ensemble model is constructed
2. how different measurement components contribute (click, impression, post-purchase, halo)
3. the validation metrics behind every prediction
4. the daily, ad-level outputs those decisions rely on
This transparency is the foundation for the accuracy signals we report because a model can only be trusted if its workings and its performance are visible.
Within the Glassbox framework, Fospha evaluates accuracy using three complementary components:
1. nRMSE: a measure of predictive error
2. R²: an indicator of how much of your sales variance the model explains
3. Back‑testing: a type of out-of-sample validation that evaluates how well the model generalizes to unseen future periods by preserving the time order of the data.
These three signals work together to provide a complete view of model performance.
We work alongside the rest of your measurement stack — incrementality tests, attribution, and MMM — making each more usable day-to-day through transparent, always-on accuracy signals.
True measurement of a model’s accuracy requires two views:
1. How well the model learns from historical data
2. How well it performs on data it hasn’t seen
These two dimensions reflect the classic bias–variance tradeoff. A model that matches history too closely will often fail on tomorrow’s data because it is over-fitted. A model with a slightly imperfect fit during training is often more reliable if it predicts new data consistently.
To balance this, we use:
- Performance metrics such as nRMSE and R²
- Out-of-sample validation to evaluate performance on unseen data
Together, these help us tune model complexity so it learns meaningful structure without absorbing noise - delivering stable, decision-ready predictions.
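To make this concrete, here is a minimal, self-contained sketch - synthetic data and a generic polynomial fit, not Fospha’s actual model - showing how an over-flexible model can match history closely yet degrade on a held-out window:

```python
import numpy as np

# Illustrative daily sales series (synthetic, for demonstration only).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 120)
sales = 100 + 40 * t + 10 * np.sin(12 * t) + rng.normal(0, 4, size=t.size)

# Chronological split: learn on the first 90 days, hold out the last 30.
train_t, holdout_t = t[:90], t[90:]
train_y, holdout_y = sales[:90], sales[90:]

def rmse(actual, predicted):
    return np.sqrt(np.mean((actual - predicted) ** 2))

# Two candidate models: a simple trend and a deliberately over-flexible polynomial.
# The flexible fit usually looks better in-sample but worse out-of-sample (over-fitting).
for degree in (2, 10):
    coeffs = np.polyfit(train_t, train_y, deg=degree)
    in_sample = rmse(train_y, np.polyval(coeffs, train_t))
    out_of_sample = rmse(holdout_y, np.polyval(coeffs, holdout_t))
    print(f"degree {degree:>2}: in-sample RMSE = {in_sample:6.2f}, "
          f"held-out RMSE = {out_of_sample:8.2f}")
```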
Normalized Root Mean Squared Error (nRMSE) measures how closely the model’s predictions align with observed outcomes. We track nRMSE continuously so performance stays transparent and stable over time.
At Fospha, we compute nRMSE by dividing RMSE by the mean of observed outcomes, which makes the metric comparable across brands and scales. Other normalization conventions exist (e.g., using the range or standard deviation), so it’s important to confirm definitions when comparing providers.
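As a worked illustration of the mean-normalized convention (using numpy and made-up numbers, not Fospha’s internal code):

```python
import numpy as np

def nrmse(actual: np.ndarray, predicted: np.ndarray) -> float:
    """RMSE divided by the mean of the observed outcomes (mean-normalized nRMSE)."""
    rmse = np.sqrt(np.mean((actual - predicted) ** 2))
    return rmse / np.mean(actual)

# Illustrative daily sales (actual) vs model predictions.
actual = np.array([120.0, 135.0, 128.0, 150.0, 142.0])
predicted = np.array([118.0, 140.0, 125.0, 147.0, 145.0])

print(f"nRMSE: {nrmse(actual, predicted):.3f}")  # ~0.025, i.e. about 2.5% of mean sales
```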
In practice, we use nRMSE to:
- Evaluate performance on held-out (unseen) periods as well as training data
- Monitor predictive performance daily to detect issues early
- Provide a consistent, comparable measure of model error across time
Key point: track the trend, not a single number. A low, stable nRMSE time series is a strong signal of dependable predictive performance, subject to the quality and stability of the underlying data. nRMSE is computed daily for every model we run, including click-based components and impression-based MMM, so performance is always visible.
R² represents the proportion of variation in the output that can be explained by the inputs the model learns from. It reflects in-sample fit (how well the model captures patterns in the training data) rather than predictive accuracy.
A practical way to interpret R² is to read it as the share of the “ups and downs” the model is able to explain.
For example:
An R² of 0.90 means the model explains about 90% of the variation in your historical sales - the rises, dips, and shifts - based on the inputs it learns from.
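For readers who want to see the calculation, here is a minimal sketch of the standard coefficient-of-determination formula, using illustrative numbers rather than real sales data:

```python
import numpy as np

def r_squared(actual: np.ndarray, predicted: np.ndarray) -> float:
    """Proportion of variation in `actual` explained by the model's fitted values."""
    ss_res = np.sum((actual - predicted) ** 2)        # unexplained variation
    ss_tot = np.sum((actual - np.mean(actual)) ** 2)  # total variation around the mean
    return 1.0 - ss_res / ss_tot

# Illustrative historical sales vs in-sample model fit.
actual = np.array([120.0, 135.0, 128.0, 150.0, 142.0])
fitted = np.array([118.0, 140.0, 125.0, 147.0, 145.0])

print(f"R²: {r_squared(actual, fitted):.2f}")  # ≈ 0.90: about 90% of the variation is explained
```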
However, in time-series settings, R² can appear artificially inflated due to trends, seasonality, non-stationarity, or leakage. For this reason, we always interpret R² alongside out-of-sample performance metrics such as nRMSE.
Interpreting R² in context:
- High R² with weak predictive accuracy can indicate over-fitting
- Moderate R² with strong predictive accuracy can reflect a well-calibrated model in a complex, noisy environment
Back-testing allows us to evaluate how well a model performs on data it hasn’t seen. While nRMSE measures predictive error and R² reflects in-sample fit, back-testing shows whether those relationships hold outside the training window.
At its simplest, back-testing compares model performance on the data it learns from with performance on future periods it hasn’t seen. If performance degrades on the unseen periods, it may indicate over-fitting or instability. If performance remains consistent, it suggests the model is learning meaningful structure rather than memorizing historical noise.
Back-testing provides an additional layer of confidence that the model will behave reliably in real-world, forward-facing conditions.
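A minimal sketch of the idea - generic walk-forward splits on synthetic data, not Fospha’s exact procedure - looks like this:

```python
import numpy as np

def walk_forward_splits(n_days: int, initial_train: int, horizon: int):
    """Yield (train_indices, test_indices) pairs that preserve time order."""
    start = initial_train
    while start + horizon <= n_days:
        yield np.arange(0, start), np.arange(start, start + horizon)
        start += horizon

def nrmse(actual, predicted):
    rmse = np.sqrt(np.mean((actual - predicted) ** 2))
    return rmse / np.mean(actual)

# Illustrative daily sales series.
rng = np.random.default_rng(1)
days = np.arange(180)
sales = 100 + 0.4 * days + rng.normal(0, 6, size=days.size)

for train_idx, test_idx in walk_forward_splits(len(days), initial_train=120, horizon=20):
    # Fit a simple trend model on the past only (a stand-in for the real model).
    coeffs = np.polyfit(days[train_idx], sales[train_idx], deg=1)
    preds = np.polyval(coeffs, days[test_idx])
    print(f"train 0-{train_idx[-1]}, test {test_idx[0]}-{test_idx[-1]}: "
          f"out-of-sample nRMSE = {nrmse(sales[test_idx], preds):.3f}")
```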
The accuracy of Fospha’s models is monitored continuously. Each modeling cycle follows a structured, repeatable loop:
1. Data refresh and retraining
2. Evaluation on held-out periods to assess generalization
3. Ongoing monitoring of performance metrics such as nRMSE and R² to track predictive error, model fit, and stability over time
4. Transparent reporting, with accuracy measures available to customers on request
Glassbox delivers clarity into how the model is performing. Metrics such as nRMSE trends and other validation outputs are monitored daily and shared with customers whenever they want a deeper look - complete with plain-English definitions and guidance so accuracy is easy to understand and verify.
This reflects Fospha’s Glassbox promise: accuracy metrics are openly available, definitions are clear, and the validation process is evidence-based.
What qualifies as “good” performance depends on the business and data environment. Still, reliable models typically show that:
- nRMSE trend is stable within a customer‑specific, empirically defined healthy range
- R² is credible for market complexity and read alongside nRMSE
Note: Healthy ranges are brand‑specific and derived empirically; we aim to keep accuracy within a stable band while improving generalization.
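One simple way to operationalize such a band (a generic sketch with hypothetical thresholds, not Fospha’s internal logic) is to flag days where the latest nRMSE falls outside a range derived from its own recent history:

```python
import numpy as np

def outside_healthy_band(nrmse_history, window=30, k=3.0):
    """Check whether the latest nRMSE falls outside a band derived from its recent history."""
    series = np.asarray(nrmse_history, dtype=float)
    baseline = series[-(window + 1):-1]          # trailing window, excluding the latest value
    centre, spread = baseline.mean(), baseline.std()
    lower, upper = centre - k * spread, centre + k * spread
    return not (lower <= series[-1] <= upper), (lower, upper)

# Illustrative daily nRMSE history: stable around 0.08, then a sudden jump.
rng = np.random.default_rng(2)
history = list(0.08 + rng.normal(0, 0.005, 60)) + [0.15]

drifting, (lower, upper) = outside_healthy_band(history)
print(f"healthy band ≈ {lower:.3f}-{upper:.3f}, drift flagged: {drifting}")
```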
Whether you’re using Fospha or building in-house, a sound validation process typically includes:
1. Balancing fit and prediction: Use metrics such as R² and nRMSE to evaluate both in-sample fit and out-of-sample predictive performance. Each provides a complementary signal about how well the model reflects historical patterns and how reliably it generalizes to new data.
2. Continuous monitoring: Track model accuracy over time and investigate emerging signs of drift or instability early, rather than waiting for issues to compound.
3. Transparency: Ensure performance metrics and their definitions can be shared openly so stakeholders can understand model behaviour and trust the outputs.
Fospha supports this through daily, internal monitoring of core accuracy indicators such as nRMSE and R². These metrics are available to customers on request and typically shared via their CSM, ensuring brands have full transparency whenever they want to validate model behaviour, understand performance changes, or deepen trust in the outputs.
When comparing accuracy across providers, confirm how the error is normalized (how nRMSE is computed) and how out-of-sample performance is tested. Different conventions can change reported values without reflecting a real difference in model quality.
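As a small illustration of why the convention matters, the same underlying RMSE produces very different “nRMSE” figures under different normalizers (illustrative numbers only):

```python
import numpy as np

actual = np.array([120.0, 135.0, 128.0, 150.0, 142.0])
predicted = np.array([118.0, 140.0, 125.0, 147.0, 145.0])

rmse = np.sqrt(np.mean((actual - predicted) ** 2))

# Three common normalization conventions applied to the same underlying error.
print(f"RMSE / mean : {rmse / actual.mean():.3f}")                   # the mean-normalized convention Fospha uses
print(f"RMSE / range: {rmse / (actual.max() - actual.min()):.3f}")
print(f"RMSE / std  : {rmse / actual.std():.3f}")
```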
Accuracy supports measurable trust across teams. Fospha’s Glassbox approach makes accuracy visible, routine, and actionable:
- You can observe trends, not just single results
- You can review model health whenever you need clarity
When evaluating measurement partners, consider asking:
1. How frequently are models retrained and validated?
2. What defines a healthy accuracy range for our context?
The clarity of their answers reveals the strength of their measurement discipline.
