🚨 Research highlight 🚨

A thread on what @MJvandenAssem and I learned about improving estimation accuracy by studying decisions made in casinos.

A story involving an unlucky ox, pearls, diamonds, poker chips, and €300,000. 👇
The quality of decisions depends on the accuracy of estimates of relevant quantities. One way to improve accuracy is to combine estimates of a group of individuals: aggregated estimates generally outperform most of the underlying estimates, and are often close to the true value.
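Why does averaging help? For squared error there is an exact identity, sometimes called the diversity prediction theorem: the crowd mean's error equals the average individual error minus the spread (diversity) of the estimates around the mean. So the mean can never do worse than the average member, though beating "most" members, as the thread says, is an empirical regularity rather than a guarantee. A minimal Python sketch; the guesses below are hypothetical, and only the 1,198 lb true value comes from Galton:

```python
from statistics import fmean


def crowd_decomposition(estimates: list[float], truth: float) -> tuple[float, float, float]:
    """Split squared errors into crowd error, average individual error and diversity.

    For squared error the identity
        crowd_error == avg_individual_error - diversity
    holds exactly, so the crowd mean is never worse than the average member.
    """
    crowd = fmean(estimates)
    crowd_error = (crowd - truth) ** 2
    avg_individual_error = fmean((x - truth) ** 2 for x in estimates)
    diversity = fmean((x - crowd) ** 2 for x in estimates)  # spread around the crowd mean
    return crowd_error, avg_individual_error, diversity


# Hypothetical guesses of the ox's weight (lb); 1,198 lb is Galton's true value.
guesses = [1100, 1250, 1180, 1300, 1150]
crowd_err, avg_err, diversity = crowd_decomposition(guesses, 1198)
beaten = sum((g - 1198) ** 2 > crowd_err for g in guesses)  # members the mean beats
print(crowd_err, avg_err, diversity, beaten)
```

Here the mean (1,196 lb) beats every one of the five hypothetical guessers.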
This phenomenon has become known as ‘the wisdom of crowds’. It was first described in Nature by the renowned British scientist Sir Francis Galton, who witnessed a weight-judging competition at the 1906 West of England Fat Stock and Poultry Exhibition.
Visitors could win a prize by paying sixpence and estimating the weight of an exhibited ox after it had been “slaughtered and dressed”.
Galton collected all 800 tickets with estimates and found that the aggregate judgment of the group closely approximated the true value: the mean judgment was 1,197 lb, and the true value was 1,198 lb.
Similar results have since been observed in a wide range of experiments, and this form of aggregation has been successfully applied to improve, for example, economic forecasts, medical judgments, and meteorological predictions.
Unfortunately, there are many situations in which it is infeasible to collect judgments of others. Recent research proposes that a similar aggregation principle applies to repeated judgments from the same person: ‘the wisdom of the inner crowd’.
We test this promising approach in a real-world context: we use proprietary data comprising 1.2 million observations from three incentivized guessing competitions organized by the Dutch state-owned casino chain Holland Casino.
During the last weeks of 2013, 2014, and 2015, anybody who visited one of the casinos received a voucher with a login code. Via a terminal inside the casino and via the Internet, this code granted access to an estimation competition.
Participants were asked to estimate the number of objects (representing pearls, diamonds, or poker chips) in a transparent plastic container located just inside the entrance. A prize of €100,000 was shared equally by those whose estimate was closest to the actual value.
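The prize rule can be sketched in a few lines of Python. This is an illustration under assumptions the thread does not state: "closest" means the smallest absolute difference, and each winning entry (not each player) receives one equal share. Entry IDs and counts are hypothetical.

```python
def split_prize(entries: dict[str, int], true_count: int, prize: float = 100_000.0) -> dict[str, float]:
    """Share the prize equally among the entries closest to the true count.

    Assumptions (not stated in the thread): 'closest' means the smallest absolute
    difference, and each winning entry (not each player) gets one equal share.
    """
    best = min(abs(guess - true_count) for guess in entries.values())
    winners = [entry_id for entry_id, guess in entries.items() if abs(guess - true_count) == best]
    share = prize / len(winners)
    return {entry_id: share for entry_id in winners}


# Hypothetical entries and true count; two entries tie at a distance of 50.
result = split_prize({"a": 9_500, "b": 10_050, "c": 9_950, "d": 10_200}, true_count=10_000)
print(result)
```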
Our pseudonymized data sets contain all entries for the three years: a total of 369,260 estimates from 163,719 different players in 2013, 388,352 estimates from 154,790 players in 2014, and 407,622 estimates from 162,275 players in 2015. Many players submitted multiple estimates.
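A quick arithmetic check of these counts in Python: the three years add up to the "1.2 million observations" mentioned earlier, and players submitted about 2.3 to 2.5 estimates each on average. Player counts are per competition year and may overlap across years, so they are not summed here.

```python
# (estimates, distinct players) per competition, as reported in the thread
competitions = {
    2013: (369_260, 163_719),
    2014: (388_352, 154_790),
    2015: (407_622, 162_275),
}

total_estimates = sum(n for n, _ in competitions.values())
per_player = {year: n / p for year, (n, p) in competitions.items()}

print(total_estimates)  # roughly 1.2 million in total
for year, ratio in per_player.items():
    print(year, round(ratio, 2))  # average estimates per player that year
```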
The distributions of the estimates have a log-normal, right-skewed shape. People generally estimate large numerical values in a logarithmically compressed manner. This seems to be the result of an innate intuition for numbers, with numbers logarithmically encoded in the brain.
Given the log-normal distributions of the estimates, we follow the convention of using a logarithmic transformation. To make the distributions comparable across the three competitions, we divide the estimates by the true value before taking the logarithm.
This transformation yields approximately normal distributions, where zero represents the true value and deviations from zero measure the estimation error. The main conclusions do not rely on these transformations, but they do make the data much easier to work with.
To aggregate estimates both across and within individuals, we use the arithmetic mean of this transformed variable. That is equivalent to taking the geometric mean of the untransformed data. The aggregates approximate the true values reasonably well (see the previous figure).
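The transformation and the equivalence with the geometric mean can be sketched in Python. The estimates and the true count below are hypothetical; the point is that averaging log(estimate / truth) and back-transforming gives exactly the geometric mean of the raw estimates, which is less pulled up by large overestimates than the arithmetic mean.

```python
import math
from statistics import fmean, geometric_mean


def log_error(estimate: float, truth: float) -> float:
    """Transformed estimate: 0 means exact, > 0 overestimate, < 0 underestimate."""
    return math.log(estimate / truth)


def aggregate(estimates: list[float], truth: float) -> float:
    """Arithmetic mean of the log errors (the aggregation used in the thread)."""
    return fmean(log_error(x, truth) for x in estimates)


estimates = [4_000.0, 9_000.0, 12_000.0, 30_000.0]  # hypothetical guesses
truth = 10_000.0  # hypothetical true count

mean_log = aggregate(estimates, truth)
# Back-transforming the mean log error recovers the geometric mean of the raw estimates.
implied = truth * math.exp(mean_log)
print(round(implied), round(fmean(estimates)))
```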
Our accuracy measure is the mean squared error (MSE). The figure shows the MSE of aggregations across the first t estimates for players who provided at least K=5 or K=10 estimates in a given year (in black). In all cases, the MSE declines with t, at a decreasing marginal rate.
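One way to compute such a curve, sketched in Python under the assumption that each player's estimates are already on the log(estimate / truth) scale and in submission order; the paper's exact procedure may differ, and the data are hypothetical. Because the true value is 0 on this scale, the squared error of an aggregate is just its square.

```python
from statistics import fmean


def mse_curve(players: list[list[float]], k: int) -> list[float]:
    """MSE of within-person means over the first t = 1..k estimates.

    `players` holds one list per player of transformed estimates, log(estimate / truth),
    in submission order. Only players with at least k estimates are used.
    """
    eligible = [errs for errs in players if len(errs) >= k]
    return [fmean(fmean(errs[:t]) ** 2 for errs in eligible) for t in range(1, k + 1)]


# Hypothetical data: three players with three estimates each, plus one with a single entry
# (excluded because it has fewer than k estimates).
players = [[0.4, -0.2, 0.1], [-0.3, -0.1, -0.2], [0.5, 0.3, -0.2], [0.9]]
curve = mse_curve(players, k=3)
print([round(m, 4) for m in curve])
```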
The figure also plots the MSEs after averaging different players’ estimates (in dark grey). Aggregating across individuals works substantially better than aggregating judgments from one individual: the ‘outer crowd’ MSE declines at a much faster rate than the ‘inner crowd’ MSE.
In addition, the figure shows that the expected potential benefit from aggregating an infinite number of estimates from a single individual (dotted line) barely exceeds the expected benefit from aggregating the judgments of two randomly selected individuals.
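Why the inner crowd flattens out can be seen in a simple additive error model, assumed here for illustration and ignoring learning: error = population bias + person-level bias (variance var_person) + independent noise (variance var_noise). Averaging one person's estimates only shrinks the noise term, while averaging different people shrinks both. The parameter values are hypothetical; with equal variances, the inner-crowd limit exactly matches a crowd of two.

```python
def inner_crowd_mse(t: int, bias: float, var_person: float, var_noise: float) -> float:
    """Average of t estimates from ONE person: the person's own bias never averages out."""
    return bias**2 + var_person + var_noise / t


def outer_crowd_mse(n: int, bias: float, var_person: float, var_noise: float) -> float:
    """Average of one estimate each from n DIFFERENT people: both error terms shrink."""
    return bias**2 + (var_person + var_noise) / n


# Hypothetical parameters on the log-error scale.
params = dict(bias=0.1, var_person=0.5, var_noise=0.5)
inner_limit = params["bias"] ** 2 + params["var_person"]  # limit as t -> infinity
print(inner_limit, outer_crowd_mse(2, **params))
```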
Finally, the figure shows that people improve their estimates over time (in light grey). This may be because people are exploiting aggregation benefits after talking with their friends, but could also be because reconsidering the problem leads them to better approaches.
In the previous analyses, the benefit of aggregating estimates from the same person may partly derive from such learning effects. For practical purposes, the exact sources and their contributions to the gain from within-person aggregation are unimportant.
Here we are also interested in the strength of within-person aggregation in the absence of learning. Therefore, we have analogously investigated the pattern of the MSE when the first K estimates from the same person are aggregated in random order.
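A Monte Carlo sketch of the random-order version in Python, assuming log-error data as before: each player's first K estimates are repeatedly shuffled before averaging, so any systematic trend across submissions (such as learning) no longer shapes the curve. This is an illustration, not necessarily the paper's exact procedure; the number of shuffles and the data are hypothetical.

```python
import random
from statistics import fmean


def random_order_mse_curve(players: list[list[float]], k: int, n_shuffles: int = 500, seed: int = 0) -> list[float]:
    """Monte Carlo MSE curve when each player's first k log-errors are averaged in random order.

    Only players with at least k estimates are used; the true value is 0 on the
    log(estimate / truth) scale, so each aggregate's squared error is its square.
    """
    rng = random.Random(seed)
    eligible = [errs[:k] for errs in players if len(errs) >= k]
    totals = [0.0] * k
    for _ in range(n_shuffles):
        for errs in eligible:
            order = rng.sample(errs, k)  # a random permutation of the k estimates
            for t in range(1, k + 1):
                totals[t - 1] += fmean(order[:t]) ** 2
    n_draws = n_shuffles * len(eligible)
    return [total / n_draws for total in totals]


# Hypothetical players whose estimates improve over time (a learning trend).
players = [[0.6, 0.3, 0.0], [-0.5, -0.3, -0.1]]
curve = random_order_mse_curve(players, k=3)
print([round(m, 3) for m in curve])
```

At t = K every ordering gives the same average, so the last point of the curve does not depend on the shuffling.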
We find that the ‘pure’ within-person aggregation benefit is considerably lower than the benefit of aggregating two judgments from different individuals: aggregating an infinite number of judgments from a single individual approximates the wisdom of roughly 1.5 individuals.
Can a single person do better? Previous work suggests that accuracy gains are higher if there is a delay between estimates, allowing people to take a fresh perspective.
We exploit the variation in the timing between players’ first and second estimates to investigate the effect of delay on the benefit of aggregation. Figure a shows that accuracy gain increases almost monotonically with the delay.
For two estimates provided at a single point in time—a participant could enter up to five estimates simultaneously—the average accuracy gain from aggregation is 16–18%. For estimates submitted more than 5 weeks apart, the average accuracy gain is approximately 30%.
Figure b indicates that the increase in accuracy gain is a consequence of the decrease in correlation between estimates: the correlation coefficient decreases from over 0.8 when people entered the estimates simultaneously to approximately 0.5 when weeks passed between attempts.
Still, the benefits of within-person aggregation remain limited even if one fully exploits the benefit of delay. We statistically decompose the estimation error into a population bias, individual-level bias, and time-dependent noise term (details in the picture).
Allowing a single individual to make an infinite number of estimates, each an infinite amount of time apart, would roughly approximate the wisdom of two individuals.
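A back-of-the-envelope link between correlation and "equivalent number of people", under the same kind of simple additive model (an assumption for illustration, not necessarily the paper's derivation). If two estimates from one person correlate at ρ = var_u / (var_u + var_v), the inner-crowd limit b² + var_u equals the outer-crowd MSE b² + (var_u + var_v)/n exactly when n = 1/ρ. For example, ρ = 0.5 gives n = 2 and ρ ≈ 0.67 gives n = 1.5.

```python
def equivalent_crowd_size(rho: float) -> float:
    """How many different people one person's infinite inner crowd is worth.

    Assumes e = b + u_i + v_it with independent terms, so two estimates from the same
    person correlate at rho = var_u / (var_u + var_v). Then
        inner limit:       b**2 + var_u
        outer crowd of n:  b**2 + (var_u + var_v) / n
    and the two are equal when n = (var_u + var_v) / var_u = 1 / rho.
    """
    if not 0 < rho <= 1:
        raise ValueError("rho must lie in (0, 1]")
    return 1 / rho


for rho in (0.8, 2 / 3, 0.5):
    print(round(rho, 2), round(equivalent_crowd_size(rho), 2))
```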
In conclusion, we find that the effectiveness of within-person aggregation is considerably lower than that of between-person aggregation: the average of a large number of judgments from one person is barely better than the average of two judgments from different people.
The efficacy difference is a consequence of the existence of individual-level systematic errors (idiosyncratic bias). The effect of these errors can be eliminated by combining estimates from multiple people, not by combining multiple estimates from a single person.
Within-person aggregation is potentially useful in situations where only one individual can make sufficiently informed estimates, such as in strictly personal matters or under high degrees of specialization, but between-person aggregation should be preferred whenever practicable.
This paper was joint work with @MJvandenAssem and is available here: http://rdcu.be/A1xF .
You can follow @dvdolder.