Quantile Metric: A New Approach to Compare Different Aggregation Methods for Point and Interval Estimates
Four common elicitation formats used in judgment and decision making (JDM) research and in forecasting studies are point estimates of quantity, point-probability estimates, subjective confidence intervals of quantity, and vague intervals of quantity. Because these four formats are evaluated with different calibration measures, it is difficult to compare their relative accuracy directly. Besides the elicitation method, forecast quality can also be influenced by the size of the group of individual forecasts being combined and by the aggregation method used (Larrick & Soll, 2006; Park & Budescu, 2005; Chen et al., 2016). Much effort has been devoted to developing methods that reduce forecasting bias, and it is equally important to develop methods for comparing forecasts under different combinations of elicitation and aggregation methods. In this study, we developed a new standardized, easy-to-use comparison metric that can be applied across different elicitation methods, different aggregation methods, or different combinations of the two. In this method, the quantile metric, the raw performance score of the aggregated forecast (e.g., its Brier score or Q score) is mapped onto the cumulative distribution of the individual forecasters' scores on the same measure (e.g., the cumulative distribution of individual Brier scores or Q scores), so that the quality of the aggregate is calibrated against the distribution of individual forecasters. Because quantile metric scores are defined in the same way and share a common 0 to 100 scale, it is both possible and convenient to compare aggregated performance across elicitation methods, aggregation methods (including different aggregation group sizes), and evaluation measures (e.g., comparing Q score to hit rate).
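The mapping described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the dissertation's implementation: the helper names (`brier_score`, `quantile_metric`, `mean_aggregate`) and the toy data are hypothetical, and the tie rule (half credit for each tie, i.e. a mid-rank empirical CDF) and the orientation (a higher quantile means the aggregate outperforms more individuals) are our assumptions about how the percentile is read off the individual distribution.

```python
from typing import Sequence


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between probability forecasts and 0/1 outcomes (lower is better)."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(outcomes)


def quantile_metric(aggregate_score: float, individual_scores: Sequence[float],
                    lower_is_better: bool = True) -> float:
    """Place an aggregate score on the 0-100 scale defined by the individual scores.

    Assumed convention: full credit for each individual the aggregate outperforms,
    half credit for each tie (mid-rank empirical CDF).
    """
    if not individual_scores:
        raise ValueError("need at least one individual score")
    if lower_is_better:
        beaten = sum(s > aggregate_score for s in individual_scores)
    else:
        beaten = sum(s < aggregate_score for s in individual_scores)
    ties = sum(s == aggregate_score for s in individual_scores)
    return 100.0 * (beaten + 0.5 * ties) / len(individual_scores)


def mean_aggregate(forecasts: Sequence[Sequence[float]]) -> list:
    """Unweighted mean of the forecasters' probabilities, event by event."""
    return [sum(col) / len(col) for col in zip(*forecasts)]


# Hypothetical toy data: 4 forecasters, 3 binary events.
outcomes = [1, 0, 1]
forecasts = [
    [0.9, 0.2, 0.6],
    [0.6, 0.4, 0.8],
    [0.3, 0.1, 0.9],
    [0.8, 0.7, 0.4],
]

individual = [brier_score(f, outcomes) for f in forecasts]
aggregate = brier_score(mean_aggregate(forecasts), outcomes)
q = quantile_metric(aggregate, individual)
print(f"individual Brier: {[round(s, 3) for s in individual]}")
print(f"aggregate Brier: {aggregate:.3f}, quantile metric: {q:.1f}")
```

With these toy numbers the mean forecast's Brier score (about 0.117) beats three of the four forecasters, so it lands at 75. For a measure where higher is better, such as hit rate, pass `lower_is_better=False`; the result is on the same 0 to 100 scale, which is what makes cross-measure comparisons possible.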
To validate the quantile metric, we re-analyzed data from Budescu & Du (2007) and analyzed quarterly forecasts of GDP growth and inflation from the European Central Bank (ECB) Survey of Professional Forecasters. Using multiple illustrative examples, we demonstrate that the quantile metric can be easily implemented to compare the quality of forecasts obtained from (a) different aggregation methods (e.g., mean vs. median aggregation), (b) different elicitation methods (e.g., point-probability forecasts vs. probability-interval forecasts), (c) different aggregation group sizes, and (d) different calibration measures (e.g., hit rate vs. Q score), and to present the longitudinal trajectory of forecasting quality for time series forecasts.
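Comparison (c), across aggregation group sizes, can be sketched by aggregating every subset of k forecasters and averaging the resulting quantile metric scores. This is a hypothetical illustration, not the procedure used in the dissertation: it assumes exhaustive enumeration of all size-k subsets (feasible only for small panels; a larger panel would call for random sampling), unweighted mean aggregation, Brier scoring, a mid-rank rule for ties, and invented toy data and helper names (`brier_score`, `quantile_metric`, `group_size_quantile`).

```python
from itertools import combinations
from typing import Sequence


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between probability forecasts and 0/1 outcomes (lower is better)."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(outcomes)


def quantile_metric(aggregate_score: float, individual_scores: Sequence[float]) -> float:
    """0-100 position of a lower-is-better score; ties get half credit (assumed mid-rank rule)."""
    beaten = sum(s > aggregate_score for s in individual_scores)
    ties = sum(s == aggregate_score for s in individual_scores)
    return 100.0 * (beaten + 0.5 * ties) / len(individual_scores)


def group_size_quantile(forecasts: Sequence[Sequence[float]],
                        outcomes: Sequence[int], k: int) -> float:
    """Average quantile metric of mean-aggregated forecasts over all size-k subsets."""
    # The reference distribution is always the full panel of individual scores.
    individual = [brier_score(f, outcomes) for f in forecasts]
    scores = []
    for subset in combinations(forecasts, k):
        # Event-by-event unweighted mean within the subset.
        aggregate = [sum(col) / len(col) for col in zip(*subset)]
        scores.append(quantile_metric(brier_score(aggregate, outcomes), individual))
    return sum(scores) / len(scores)


# Hypothetical toy data: 4 forecasters, 3 binary events.
outcomes = [1, 0, 1]
forecasts = [
    [0.9, 0.2, 0.6],
    [0.6, 0.4, 0.8],
    [0.3, 0.1, 0.9],
    [0.8, 0.7, 0.4],
]

for k in range(1, len(forecasts) + 1):
    print(f"k={k}: {group_size_quantile(forecasts, outcomes, k):.1f}")
```

On this toy panel the average quantile rises from 50 at k = 1 (under the mid-rank rule, a randomly chosen single forecaster sits at the middle of the distribution on average) to 75 at k = 4, where the only subset is the full panel.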
Han, Ying, "Quantile Metric: A New Approach to Compare Different Aggregation Methods for Point and Interval Estimates" (2018). ETD Collection for Fordham University. AAI10929369.