Counter-intuitively, if your recommender is trained on data collected from past interactions with recs, retraining new models regularly doesn’t resolve the issue of performance decaying over time and can actually make it worse. Here’s why...
The usual concern is that the performance of a model will decline as the distribution of data encountered during serving deviates from the distribution of the training data.

This is an accurate concern about a real phenomenon, and is a good reason to retrain on a regular basis.
However, if the training data is collected from interactions with past recommendations, then the data you collected was determined by which recommendations were made, and those were determined by the last model served in production. See the issue?
Any bias introduced by the previous model changes which data you train the next model on—and the whole purpose of a recommender model is to be intentionally biased toward relevant items.
That doesn’t sound so bad at first, but as more and more subsequent models are trained on increasingly biased data, it can lead the recommender to collapse in on itself and recommend an increasingly narrow set of mostly popular items to a broader and broader slice of the users.
A similar effect applies to evaluation: when evaluating a new model on data collected from interactions with past recs, you don’t have data about interactions with items that weren’t recommended, so models that are similar to the previous model will tend to look the best.
Retraining regularly is a good idea, but without correcting for the bias that current models introduce into future training and evaluation data, it can accelerate this destructive feedback loop.
So how do you correct for this effect? Well, you need some data about interactions that the current model wouldn’t normally collect, so instead of a recommendation policy that greedily exploits the best possible recs (according to the model), you introduce some exploration.
That lets you sample at least a little bit of data about interactions you wouldn’t normally see, which makes counterfactual/off-policy training and evaluation possible.
In essence, you collect a much broader but still very biased dataset, and then use statistics (e.g. inverse propensity weighting) to reweight the resulting logged data for training and evaluation so that it looks like it was collected with a more uniform recommendation policy instead of a greedy one.
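To make that concrete, here’s a minimal sketch of inverse propensity scoring (IPS) for off-policy estimation in Python. The `LoggedInteraction` fields and the `new_policy_prob` callable are illustrative assumptions, not any particular library’s API; the key requirement is that the logging policy’s probability of showing each item (its propensity) was recorded at serving time.

```python
# A minimal IPS sketch (illustrative, not a production implementation).
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class LoggedInteraction:
    user_id: str
    item_id: str
    reward: float       # e.g. 1.0 for a click, 0.0 otherwise
    propensity: float   # P(item shown | logging policy), must be > 0


def ips_estimate(
    logs: List[LoggedInteraction],
    new_policy_prob: Callable[[str, str], float],  # P(item | user) under the new policy
) -> float:
    """Estimate the new policy's expected reward from logged interactions.

    Each logged reward is reweighted by the ratio of the new policy's
    probability of recommending that item to the logging policy's
    probability, which undoes the logging policy's bias toward its own
    favorite items.
    """
    total = 0.0
    for log in logs:
        weight = new_policy_prob(log.user_id, log.item_id) / log.propensity
        total += weight * log.reward
    return total / len(logs)
```

The same weights can be plugged into a training objective (as in BLBF, linked below) so the new model learns as if the data had come from a more uniform policy.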
To accomplish this, one of the first things to do is introduce some exploration into the items being served to users (whether those come from recs or from an existing non-algorithmic source), even though exploration often sits at the very end of a recommendations pipeline.
It doesn’t have to be anything fancy: even an epsilon-greedy policy, where some small percentage of the time an item is chosen at random (perhaps from a pool of potentially relevant candidate items), gives you broader data about interactions you wouldn’t have seen otherwise.
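For illustration, here’s what an epsilon-greedy pick over a candidate pool might look like, including the propensity you’d log so the reweighting above is possible later. The function and parameter names are hypothetical.

```python
# A minimal epsilon-greedy sketch (illustrative names and defaults).
import random
from typing import Dict, Tuple


def epsilon_greedy_pick(
    scores: Dict[str, float],   # candidate item id -> model score
    epsilon: float = 0.05,      # fraction of traffic spent exploring
) -> Tuple[str, float]:
    """Return (chosen item, propensity of that choice under this policy)."""
    candidates = list(scores)
    best = max(candidates, key=scores.get)
    n = len(candidates)

    # Most of the time, serve the model's top pick; occasionally, pick a
    # random candidate so interactions outside the model's comfort zone
    # get logged at all.
    if random.random() < epsilon:
        chosen = random.choice(candidates)
    else:
        chosen = best

    # The greedy item gets the (1 - epsilon) greedy mass plus its share of
    # the uniform exploration mass; every other item gets only the uniform
    # share. Logging this number is what makes IPS reweighting possible.
    uniform_share = epsilon / n
    propensity = uniform_share + ((1.0 - epsilon) if chosen == best else 0.0)
    return chosen, propensity
```

Logging the propensity alongside the impression and any resulting interaction is the piece that ties exploration back to the counterfactual training and evaluation described above.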
I’ll close with some papers and pointers to more info.

This excellent RecSys paper demonstrates what happens when you don’t correct for the biases introduced by past models, algorithms, and recs:

https://dl.acm.org/doi/10.1145/3240323.3240370
This paper has both a good summary of Batch Learning from Logged Bandit Feedback (BLBF) and a great list of references: http://www.cs.cornell.edu/people/tj/publications/wang_etal_19a.pdf
And for more on counterfactual and off-policy methods, check out the proceedings of the past RecSys REVEAL workshops.

2019: https://sites.google.com/view/reveal2019/proceedings

2020: https://sites.google.com/view/reveal2020/proceedings