Our new paper https://rdcu.be/b5r1Y shows that the mental models subjects bring into a two-stage task can drastically change their behavior. This thread is a teaser of this work, led by @carolfehersilva, now out in @NatureHumBehav #neuroeconomics @econ_uzh
The punchline is that humans are much more model-based than previously assumed, and apparent links between model-free learning in the two-stage task and habits, working memory, OCD, or age need to be revisited.
This is because the standard assumptions and analysis of the two-stage task can falsely conclude that purely model-based agents are influenced by model-free learning.
The two-stage task has been the workhorse experimental paradigm for distinguishing between simple model-free (MF) and model-based (MB) learning since @nathanieldaw @gershbrain @benosaka Peter Dayan, and Ray Dolan introduced it in 2011.
The beauty of the two-stage task is that it makes clear and distinct predictions for how MF vs MB agents should behave. Briefly, purely MF agents disregard the structure of the environment and are more likely to repeat actions that yielded reward in the past.
In contrast, MB agents use their understanding of the probabilistic transition structure of the task to guide their actions. Thus, MF agents show a main effect of reward and MB agents are characterized by reward X transition interactions.
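To make these two predictions concrete, here is a minimal simulation sketch. It is not the code from the paper. It assumes a simplified task (one option per second-stage state, a 0.7 common-transition probability, reward probabilities drifting within 0.25 to 0.75) and hypothetical parameter values (`alpha`, `beta`). The names `simulate` and `stay_effects` are ours, chosen for illustration.

```python
import math
import random

COMMON_P = 0.7  # assumed probability of the common transition


def simulate(agent, n_trials=20000, alpha=0.5, beta=5.0, seed=0):
    """Simulate a simplified two-stage task; return (choice, common, reward) per trial."""
    rng = random.Random(seed)
    reward_p = [0.6, 0.4]  # reward probability of each second-stage state
    q_mf = [0.5, 0.5]      # model-free values of the two first-stage actions
    v = [0.5, 0.5]         # learned values of the two second-stage states
    trials = []
    for _ in range(n_trials):
        if agent == "MF":
            q = q_mf  # ignores the transition structure entirely
        else:
            # Model-based: combine state values using the known transition model.
            q = [COMMON_P * v[a] + (1 - COMMON_P) * v[1 - a] for a in (0, 1)]
        p_choose_1 = 1.0 / (1.0 + math.exp(-beta * (q[1] - q[0])))  # softmax
        choice = 1 if rng.random() < p_choose_1 else 0
        common = rng.random() < COMMON_P
        state = choice if common else 1 - choice
        reward = 1 if rng.random() < reward_p[state] else 0
        q_mf[choice] += alpha * (reward - q_mf[choice])  # credit the action taken
        v[state] += alpha * (reward - v[state])          # credit the state reached
        for s in (0, 1):  # slow random-walk drift, clipped to [0.25, 0.75]
            step = rng.gauss(0.0, 0.025)
            reward_p[s] = min(0.75, max(0.25, reward_p[s] + step))
        trials.append((choice, common, reward))
    return trials


def stay_effects(trials):
    """Return (reward main effect, reward x transition interaction) on P(stay)."""
    counts = {}  # (reward, common) on the previous trial -> [stays, total]
    for prev, cur in zip(trials, trials[1:]):
        cell = counts.setdefault((prev[2], prev[1]), [0, 0])
        cell[0] += cur[0] == prev[0]
        cell[1] += 1
    p = {k: stays / n for k, (stays, n) in counts.items()}
    main = (p[1, True] + p[1, False] - p[0, True] - p[0, False]) / 2
    inter = (p[1, True] - p[1, False]) - (p[0, True] - p[0, False])
    return main, inter


for name in ("MF", "MB"):
    main, inter = stay_effects(simulate(name))
    print(f"{name}: reward main effect={main:.2f}, interaction={inter:.2f}")
```

Under these assumptions the MF agent's stay probabilities should be dominated by the reward main effect, and the MB agent's by the reward × transition interaction.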
Dozens of studies have shown that human behavior doesn't match either purely MF or MB predictions. Instead, it seems like a hybrid of the two learning types. The hybrid pattern shown below has both a main effect of reward and an interaction with transition.
The standard interpretation of this seemingly hybrid behavior has been that it is evidence that MF and MB learning take place in parallel and can compete to drive our actions.
The environmental and individual characteristics that might determine the relative influence of MF vs MB learning have been studied extensively in the social, biological, and medical sciences literature - often using the two-stage task as a probe.
The problem is that hybrid-like behavior in the two-stage task can be the result of many different learning algorithms and is not specific to a weighted combination of the simple MF and correct MB algorithms the task is designed to test.
The (too) strong assumption when using the two-stage task to distinguish simple MF from correct MB learning is that behavior is limited to only those algorithms or something in between. This case is depicted by the white spheres below.
However, in reality there are unlimited variations on the simple MF and correct MB algorithms. These variations are depicted by the gray spheres in this image.
We demonstrate that purely model-based agents can show hybrid-like behavior on the two-stage task if they use a model of the task that deviates from the true structure. The agents understand the probabilistic transitions, but get other aspects wrong.
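As a rough illustration of how this can happen, the sketch below simulates a purely model-based learner that knows the transition probabilities but holds one wrong belief: it updates its state values more strongly after common than after rare transitions. This is a hypothetical example under our own assumptions (fixed 0.5 reward probability, made-up learning rates, the illustrative function name `mb_stay_effects`), not necessarily one of the models analyzed in the paper.

```python
import math
import random


def mb_stay_effects(alpha_common, alpha_rare, n_trials=20000, beta=5.0, seed=1):
    """Pure model-based agent; returns (reward main effect, interaction) on P(stay)."""
    rng = random.Random(seed)
    v = [0.5, 0.5]  # learned values of the two second-stage states
    counts = {}     # (reward, common) on the previous trial -> [stays, total]
    prev = None
    for _ in range(n_trials):
        # First-stage values use the correct transition probabilities (0.7 / 0.3).
        q = [0.7 * v[a] + 0.3 * v[1 - a] for a in (0, 1)]
        p_choose_1 = 1.0 / (1.0 + math.exp(-beta * (q[1] - q[0])))
        choice = 1 if rng.random() < p_choose_1 else 0
        if prev is not None:
            cell = counts.setdefault(prev[1:], [0, 0])
            cell[0] += choice == prev[0]
            cell[1] += 1
        common = rng.random() < 0.7
        state = choice if common else 1 - choice
        reward = 1 if rng.random() < 0.5 else 0  # assumed fixed reward probability
        # The deviation from the true task: learning rate depends on transition type.
        alpha = alpha_common if common else alpha_rare
        v[state] += alpha * (reward - v[state])
        prev = (choice, reward, common)
    p = {k: stays / n for k, (stays, n) in counts.items()}
    main = (p[1, True] + p[1, False] - p[0, True] - p[0, False]) / 2
    inter = (p[1, True] - p[1, False]) - (p[0, True] - p[0, False])
    return main, inter


print("equal learning rates:", mb_stay_effects(0.5, 0.5))
print("transition-dependent:", mb_stay_effects(0.8, 0.2))
```

With equal learning rates the reward main effect should stay near zero, while the transition-dependent variant should show both a main effect and an interaction, i.e. a hybrid-looking pattern from an agent with no model-free component.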
The question then is whether or not human participants form mental models of the task that deviate from its true structure. We provide several forms of evidence that they do, even if they can report understanding the basics of the transition structure.
Critically, when we gave subjects highly detailed instructions that explained the two-stage task within a story that included reasons for each component, they behaved in a manner indistinguishable from purely model-based agents.
Thus, when healthy adult humans understand and form the correct mental model of the two-stage task, they demonstrate model-based learning.
Hybrid-like behavior in previous studies may be the result of competition between MF and MB learning, as generally assumed, OR it may be the result of participants using (partially) incorrect mental models of the task. We can't be sure!
This doesn't change the fact that behavior on the two-stage task is associated with individual differences in working memory, compulsive symptomatology, etc. However, it does mean that we need to rethink the conclusions we draw from these results.
The two-stage task may be revealing differences in how well individuals form and maintain the correct mental model of the task rather than their relative degree of MF versus MB learning. Again, we can't be sure!
The ideas behind the two-stage task and algorithmic implementations of psychological concepts in general have been extremely influential and beneficial in pushing the cycle of theory and experimentation further.
We hope that our results serve to spark efforts to develop new and better empirical methods for testing critical concepts and theories of reward learning and decision making.
You can follow @Todd_A_Hare.