EA - Wisdom of the Crowd vs. "the Best of the Best of the Best" by nikos

The Nonlinear Library: EA Forum - A podcast by The Nonlinear Fund

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Wisdom of the Crowd vs. "the Best of the Best of the Best", published by nikos on April 4, 2023 on The Effective Altruism Forum.

Summary

- This post asks whether we can improve forecasts for binary questions merely by selecting a few accomplished forecasters from a larger pool.
- Using Metaculus data, it compares:
  - the Community Prediction (a recency-weighted median of all forecasts),
  - a counterfactual Community Prediction that combines forecasts from only the best 5, 10, ..., 30 forecasters based on past performance (the "Best"),
  - a counterfactual Community Prediction with all other forecasters (the "Rest"), and
  - the Metaculus Prediction, Metaculus' proprietary aggregation algorithm that weighs forecasters based on past performance and extremises forecasts (i.e. pushes them towards either 0 or 1).
- The ensemble of the "Best" almost always performs worse on average than the Community Prediction with all forecasters.
- The "Best" outperforms the ensemble of all other forecasters (the "Rest") in some instances.
  - The "Best" never outperforms the "Rest" on average for questions with more than 200 forecasters.
  - Performance of the "Best" improves as the size of the group increases. It never outperforms the "Rest" on average at size 5, sometimes outperforms it at size 10-20, and reliably outperforms it at size 20+ (but only for questions with fewer than 200 forecasters).
- The Metaculus Prediction on average outperforms all other approaches in most instances, but may have less of an advantage against the Community Prediction for questions with more forecasters.
- The code is published here.

Conflict of interest note

I am an employee of Metaculus.
I think this didn't influence my analysis, but then of course I'd think that, and there may be things I haven't thought about.

Introduction

Let's say you had access to a large number of forecasters and you were interested in getting the best possible forecast for something. Maybe you're running a prediction platform (good job!). Or you're the head of an important organisation that needs to make an important decision. Or you just really really really want to correctly guess the weight of an ox.

What are you going to do? Most likely, you would ask everyone for their forecast, throw the individual predictions together, stir a bit, and pull out some combined forecast. The easiest way to do this is to just take the mean or median of all individual forecasts, or, probably better for binary forecasts, the geometric mean of odds. If you stir a bit harder, you could get a weighted, rather than an unweighted, combination of forecasts. That is, when combining predictions you give forecasters different weights based on their past performance. This seems like an obvious idea, but in reality it is really hard to pull off. This is called the forecast combination puzzle: estimating weights from past data is often noisy or biased, and therefore a simple unweighted ensemble often performs best.

Instead of estimating precise weights, you could just decide to take the X best forecasters based on past performance and use only their forecasts to form a smaller ensemble. (Effectively, this would just give those forecasters a weight of 1 and everyone else a weight of 0.) Presumably, when choosing your X, there is a trade-off between "having better forecasters" and "having more forecasters" (see this and this analysis on why more forecasters might be good).

(Note that what I'm analysing here is not actually a selection of the best available forecasters.
The selection process is quite distinct from the one used for, say, Superforecasters or Metaculus Pro Forecasters, who are identified using a variety of criteria. And see the Discussion section for additional factors not studied here that would likely affect the performance of such a forecasting group.)

Methods

To get some insights, I analys...
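
As an illustration of the aggregation options described in the Introduction (mean, median, geometric mean of odds, and keeping only the X best forecasters), here is a minimal Python sketch. It is not the code used for the analysis in the post. The function names, the use of the Brier score as the measure of past performance, and the toy numbers are all assumptions made for this example, and it ignores the recency weighting that the Community Prediction applies.

```python
import math
from statistics import median

EPS = 1e-6  # assumption: clip probabilities so the odds stay finite


def geo_mean_odds(probs):
    """Combine binary forecasts via the geometric mean of odds."""
    clipped = [min(max(p, EPS), 1 - EPS) for p in probs]
    # Averaging log-odds is the same as taking the geometric mean of odds.
    log_odds = [math.log(p / (1 - p)) for p in clipped]
    avg = sum(log_odds) / len(log_odds)
    return 1 / (1 + math.exp(-avg))  # convert log-odds back to a probability


def best_k(past_scores, k):
    """Names of the k forecasters with the lowest (best) past score.

    past_scores maps forecaster name to a mean past Brier score
    (a hypothetical choice of performance measure; lower is better).
    """
    return sorted(past_scores, key=past_scores.get)[:k]


def aggregate_best(forecasts, past_scores, k, combine=median):
    """Weight 1 for the k best forecasters, weight 0 for everyone else."""
    chosen = [name for name in best_k(past_scores, k) if name in forecasts]
    return combine([forecasts[name] for name in chosen])


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    forecasts = {"ann": 0.9, "bob": 0.1, "cat": 0.7, "dan": 0.6}
    past_scores = {"ann": 0.10, "bob": 0.30, "cat": 0.20, "dan": 0.25}

    print("median of all:", median(forecasts.values()))
    print("geo mean of odds:", geo_mean_odds(list(forecasts.values())))
    print("median of best 2:", aggregate_best(forecasts, past_scores, k=2))
```

Choosing `k` here is exactly the trade-off mentioned above: a small `k` keeps only forecasters with strong track records, while a large `k` averages over more opinions.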