[1/6] Our new preprint is now available on arXiv. We revisit baselines in policy gradient methods and show that they have a much bigger role than simply variance reduction! With
Wesley Chung, Valentin Thomas, and @le_roux_nicolas.
https://arxiv.org/pdf/2008.13773.pdf
[2/6] We show, for example, that two different baselines that lead to the *same* variance can induce different learning dynamics. It is not about variance, but about the direction of the gradient, which is affected by the baseline! We have both empirical and theoretical results on this.
[3/6] In fact, the baseline can impact the convergence point of the solution, even though it doesn't change the expectation of the gradients! We showed this empirically and theoretically. To do so theoretically, we looked at the stochastic estimates, not the expected setting.
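To make the "doesn't change the expectation" part concrete, here is a minimal sketch on a 3-armed softmax bandit. It is only an illustration, not the setup or code from the preprint: the parameters, rewards, and function names (`softmax`, `sample_estimate`, `expected_gradient`) are all assumptions made up for this example. It computes the exact expected REINFORCE gradient for two very different baselines and also exposes the per-sample estimate, which does depend on the baseline.

```python
import math

def softmax(theta):
    """Softmax policy over arms, shifted by the max for numerical stability."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]

def sample_estimate(theta, arm, reward, baseline):
    """Single-sample REINFORCE estimate (r - b) * grad log pi(arm)."""
    pi = softmax(theta)
    # For a softmax policy, d log pi(arm) / d theta_k = 1[k == arm] - pi[k].
    return [(reward - baseline) * ((1.0 if k == arm else 0.0) - pi[k])
            for k in range(len(theta))]

def expected_gradient(theta, rewards, baseline):
    """Exact expectation of the estimate, with arms drawn from the policy."""
    pi = softmax(theta)
    grad = [0.0] * len(theta)
    for arm, r in enumerate(rewards):
        est = sample_estimate(theta, arm, r, baseline)
        grad = [g + pi[arm] * e for g, e in zip(grad, est)]
    return grad

theta = [0.5, -0.2, 0.1]   # hypothetical policy parameters
rewards = [1.0, 0.7, 0.0]  # hypothetical deterministic rewards per arm

g_no_baseline = expected_gradient(theta, rewards, baseline=0.0)
g_big_baseline = expected_gradient(theta, rewards, baseline=5.0)
```

Under these assumptions the two expected gradients agree up to floating-point error, while `sample_estimate(theta, 0, 1.0, 0.0)` and `sample_estimate(theta, 0, 1.0, 5.0)` point in opposite directions. That gap between the expected update and the individual stochastic updates is where the baseline gets its influence.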
[4/6] We also discuss a different way to speed up learning while ensuring convergence: importance sampling. However, we are talking about *designing* the sampling distribution instead of just correcting for trajectories someone else gave you. This opens up so many possibilities!
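As a rough illustration of why the choice of sampling distribution matters, the sketch below uses textbook importance sampling on a 3-armed bandit. It is not the scheme from the preprint: the policy `pi`, the rewards, and the helper `is_mean_and_variance` are hypothetical. It computes the exact mean and variance of the importance-weighted reward estimate under three sampling distributions: the policy itself, a uniform distribution, and one designed to be proportional to `pi[a] * rewards[a]`.

```python
def is_mean_and_variance(pi, q, rewards):
    """Exact mean and variance of (pi[a] / q[a]) * rewards[a] with a ~ q.

    Assumes q[a] > 0 for every arm.
    """
    arms = range(len(q))
    values = [pi[a] / q[a] * rewards[a] for a in arms]
    mean = sum(q[a] * values[a] for a in arms)
    second_moment = sum(q[a] * values[a] ** 2 for a in arms)
    return mean, second_moment - mean ** 2

pi = [0.6, 0.3, 0.1]       # hypothetical target policy
rewards = [1.0, 0.7, 0.2]  # hypothetical positive rewards per arm

# Quantity we want to estimate: the expected reward under pi.
target = sum(p * r for p, r in zip(pi, rewards))

# Three choices of sampling distribution.
on_policy = is_mean_and_variance(pi, pi, rewards)
uniform = is_mean_and_variance(pi, [1.0 / 3.0] * 3, rewards)
designed_q = [p * r / target for p, r in zip(pi, rewards)]
designed = is_mean_and_variance(pi, designed_q, rewards)
```

All three distributions give the same mean (the estimate stays unbiased), but their variances differ, and the designed one drives the variance to zero in this toy case with strictly positive rewards. Treat this as the generic importance-sampling fact only; which sampling distributions are good for policy gradient learning dynamics is the question the thread is pointing at.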
[5/6] I learned a lot while working with Wesley, Valentin, and Nicolas on this. I was often surprised by how the "folk knowledge" I had about baselines in PG methods was wrong. Carefully analyzing these things was very rewarding. Assumptions in theoretical results do matter!
[6/6] There's more to be done. In a sense, we're starting a conversation. In optimization, we often talk about curvature and variance but, in RL, it is more complicated than that 😅. I'm particularly excited about the consequences this can have on how we think about exploration.
[7/6] I hadn't realized @wes_chung has a Twitter account. My bad 🙄. I'm tagging him in this thread.
You can follow @MarlosCMachado.