there seems to be an interesting asymmetry between MCMC and variational inference, in the sense that

- there are some cases in which MCMC can be used (≈ naturally) to improve VI (one concrete instance is sketched below), but
- there are relatively fewer in which VI methods can be used to improve MCMC.

[ 1 / 4 ]
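
for concreteness, one way MCMC slots into VI fairly naturally - a minimal sketch in the spirit of markovian score climbing (Naesseth et al., 2020); the toy target, step sizes and iteration counts are all illustrative assumptions. a persistent Metropolis chain targets p, and q is updated by stochastic gradient ascent on log q evaluated at the chain state; at stationarity this minimises the inclusive KL(p || q), an objective that plain ELBO ascent doesn't target.

```python
import numpy as np

rng = np.random.default_rng(1)
scales = np.array([1.0, 3.0])

def log_target(theta):
    # toy Gaussian target with unequal coordinate scales
    return -0.5 * np.sum((theta / scales) ** 2)

mu, log_sig = np.zeros(2), np.zeros(2)
theta, logp = np.zeros(2), log_target(np.zeros(2))
lr = 5e-3

for _ in range(50_000):
    # one random-walk Metropolis step of the persistent chain targeting p
    prop = theta + rng.standard_normal(2)
    logp_prop = log_target(prop)
    if np.log(rng.uniform()) < logp_prop - logp:
        theta, logp = prop, logp_prop
    # stochastic ascent on log q(theta) w.r.t. (mu, log_sig),
    # i.e. a noisy gradient step on E_p[log q]
    sig = np.exp(log_sig)
    z = (theta - mu) / sig
    mu = mu + lr * z / sig                # d/dmu log q = (theta - mu) / sig^2
    log_sig = log_sig + lr * (z**2 - 1)   # d/dlog_sig log q = z^2 - 1

print("fitted mu :", mu)                  # ~ [0, 0]
print("fitted sig:", np.exp(log_sig))     # ~ scales
```
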
i think it's a local / global thing; it's relatively plausible that any local algorithm (e.g. MCMC, gradient descent, ...) could be embedded as a subroutine in some other method. on the other hand, VI is basically trying to solve a global problem, and often (?) not solving it that well.

[ 2 / 4 ]
most of the more convincing 'VI-within-MCMC' schemes seem to boil down to either preconditioning or reparametrisation (a sketch follows below). it's a sensible enough goal - to be worthwhile, you only need to improve on the original parametrisation, which should be possible fairly often.

[ 3 / 4 ]
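
a minimal sketch of the reparametrisation version - the anisotropic toy target, the ADVI-style diagonal-Gaussian fit, and all step sizes are illustrative assumptions, not a canonical recipe. fit q = N(mu, diag(sig^2)) by stochastic ELBO ascent, then run random-walk Metropolis in the whitened coordinates z = (theta - mu) / sig; sampling in z with a single scalar step size is exactly a proposal preconditioned by the variational scales.

```python
import numpy as np

rng = np.random.default_rng(0)
scales = np.array([1.0, 100.0])

def log_target(theta):
    # toy anisotropic Gaussian: coordinate scales differ by a factor of 100
    return -0.5 * np.sum((theta / scales) ** 2)

def grad_log_target(theta):
    return -theta / scales**2

# stage 1: ADVI-style fit of q = N(mu, diag(sig^2)) by pathwise ELBO gradients
mu, log_sig = np.zeros(2), np.zeros(2)
lr, n_mc = 1e-2, 32
for _ in range(5_000):
    eps = rng.standard_normal((n_mc, 2))
    sig = np.exp(log_sig)
    theta = mu + sig * eps                    # reparametrisation trick
    g = grad_log_target(theta)
    mu = mu + lr * g.mean(axis=0)
    log_sig = log_sig + lr * ((g * eps * sig).mean(axis=0) + 1.0)  # +1: entropy term

# stage 2: random-walk Metropolis in the whitened coordinates z = (theta - mu)/sig
sig = np.exp(log_sig)
z, logp = np.zeros(2), log_target(mu)
samples, n_acc, n_iter = [], 0, 20_000
for _ in range(n_iter):
    z_prop = z + 1.7 * rng.standard_normal(2)    # ~ 2.38/sqrt(d) on a whitened target
    logp_prop = log_target(mu + sig * z_prop)
    if np.log(rng.uniform()) < logp_prop - logp:
        z, logp, n_acc = z_prop, logp_prop, n_acc + 1
    samples.append(mu + sig * z)

print("acceptance rate:", n_acc / n_iter)                   # healthy despite anisotropy
print("sample stds    :", np.asarray(samples).std(axis=0))  # ~ [1, 100]
```

on this toy target, an unpreconditioned sampler would need a 100x smaller step in one coordinate than the other; after the variational whitening one well-tuned scale serves both, which is the 'improve on the original parametrisation' bar being cleared.
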
i'm curious to know whether there are other useful ways of nesting VI in MCMC, though it's not clear to me what the right approach is. it's tricky to pin down what a variational fit buys you in this context that e.g. a { MAP / Laplace approx. / ... } doesn't (a comparison is sketched below).

[ 4 / 4 ]
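
for comparison, the Laplace version of the same preconditioner, reusing the toy target above (the Newton-step MAP search is an illustrative assumption, and needs gradients and Hessians). on a Gaussian target the mode plus inverse curvature recovers exactly the (mu, sig) the ELBO fit found, so the two preconditioners coincide; they only come apart on non-Gaussian targets, where a KL(q || p) fit averages over a region while Laplace only sees curvature at the mode - presumably the gap where a variational fit would earn its keep, if anywhere.

```python
import numpy as np

scales = np.array([1.0, 100.0])    # same toy target as above

def grad_log_target(theta):
    return -theta / scales**2

def hess_log_target(theta):
    return np.diag(-1.0 / scales**2)

theta = np.array([5.0, 50.0])      # arbitrary start
for _ in range(10):                # Newton ascent to the MAP; one step suffices here
    theta = theta - np.linalg.solve(hess_log_target(theta), grad_log_target(theta))

cov = np.linalg.inv(-hess_log_target(theta))     # Laplace covariance at the mode
print("MAP         :", theta)                    # ~ [0, 0]
print("Laplace stds:", np.sqrt(np.diag(cov)))    # ~ [1, 100], same as the VI fit
```
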