I asked data scientists what challenges they were facing in 2020 and the resounding answer was how difficult it was to deploy models to production. I thought writing 3-4 blog posts would solve it.
I'm 8 posts in and there's still so much to say. Here's a breakdown of each post:

Part 1 - What does it even mean to "deploy a model?" How does deployment fit into the machine learning process? What factors should you take into consideration when deciding how to deploy? https://mlinproduction.com/what-does-it-mean-to-deploy-a-machine-learning-model-deployment-series-01/
Part 2 - Deployment is considerably easier when you're working with the right interfaces. Doubly important when you're using models across different frameworks and languages. So what's the right interface to make deployment easier? https://mlinproduction.com/software-interfaces-for-machine-learning-deployment-deployment-series-02/
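The gist of the interface idea can be sketched as a single `predict()` contract that every model satisfies regardless of framework. This is a minimal illustration, not the post's exact design, and the class and method names are made up:

```python
from abc import ABC, abstractmethod

class Model(ABC):
    """Framework-agnostic prediction interface (names are illustrative)."""

    @abstractmethod
    def predict(self, features):
        """Map a batch of feature dicts to a batch of predictions."""

class ThresholdModel(Model):
    """Toy implementation: no ML framework needed to satisfy the interface."""

    def __init__(self, threshold):
        self.threshold = threshold

    def predict(self, features):
        return [1.0 if row["score"] > self.threshold else 0.0 for row in features]

model = ThresholdModel(threshold=0.5)
print(model.predict([{"score": 0.9}, {"score": 0.1}]))  # [1.0, 0.0]
```

Serving code then depends only on `Model.predict`, so swapping scikit-learn for XGBoost (or Python for another language behind the same contract) doesn't touch the deployment layer.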
Part 3 - If you can precompute and cache predictions in batch, DO IT! It's much easier than deploying and maintaining APIs and other near real time infrastructure. Here's how to do batch inference. https://mlinproduction.com/batch-inference-for-machine-learning-deployment-deployment-series-03/
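A minimal sketch of that precompute-and-cache pattern, with a plain dict standing in for a real key-value store like Redis and a stub in place of a trained model:

```python
def score(user):
    # Stand-in for a real model's predict() call
    return user["visits"] / 10

def run_batch_job(users, cache):
    """Scheduled job: score every user and write results to the cache."""
    for user in users:
        cache[user["id"]] = score(user)

cache = {}
run_batch_job([{"id": "u1", "visits": 3}, {"id": "u2", "visits": 7}], cache)

# Serving is now a cheap lookup -- no model server to deploy or maintain:
print(cache["u2"])  # 0.7
```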
Part 4 - But when you need predictions in real time, you need online inference. There are many gotchas in online inference: you need to query data from multiple sources in real time, you'll need A/B testing, you need rollout strategies... https://mlinproduction.com/the-challenges-of-online-inference-deployment-series-04/
Part 5 - If after learning about those challenges you decide you still need online inference, bless your heart. There are a lot of posts on Flask APIs, but that's the easiest part. You need versioning, autoscaling, and the ability to A/B test models. https://mlinproduction.com/online-inference-for-ml-deployment-deployment-series-05/
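As a rough sketch of the versioning piece (the web framework is omitted to keep it self-contained; in practice this logic sits behind a Flask or FastAPI route, and the lambdas stand in for loaded models):

```python
# Hypothetical version-aware request handler
MODELS = {
    "v1": lambda features: sum(features),               # stand-in model
    "v2": lambda features: sum(features) / len(features),
}
DEFAULT_VERSION = "v2"

def handle_predict(payload):
    """Resolve a model version per request, falling back to a default."""
    version = payload.get("model_version", DEFAULT_VERSION)
    if version not in MODELS:
        return {"error": f"unknown model version {version!r}", "status": 404}
    prediction = MODELS[version](payload["features"])
    return {"model_version": version, "prediction": prediction, "status": 200}

print(handle_predict({"features": [2.0, 4.0]}))  # default v2 -> mean = 3.0
```

Pinning a `model_version` in the request is also one building block for A/B tests and gradual rollouts: the router, not the client, decides which version each request sees.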
Part 6 - Where do you store all these trained models? Where do you track metadata and lineage? How do you retrieve models at inference time? That's where you'll need a model registry. https://mlinproduction.com/model-registries-for-ml-deployment-deployment-series-06/
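A toy in-memory version of the idea (real registries like MLflow's persist to a database and track far richer lineage; every name here is illustrative):

```python
import time

class ModelRegistry:
    """Toy registry: versioned models plus metadata, retrievable by name."""

    def __init__(self):
        self._store = {}   # (name, version) -> {"model": ..., "metadata": ...}
        self._latest = {}  # name -> latest version number

    def register(self, name, model, metadata):
        version = self._latest.get(name, 0) + 1
        self._store[(name, version)] = {
            "model": model,
            "metadata": {**metadata, "registered_at": time.time()},
        }
        self._latest[name] = version
        return version

    def get(self, name, version=None):
        """Retrieve a model at inference time; default to the latest version."""
        version = version or self._latest[name]
        return self._store[(name, version)]["model"]

registry = ModelRegistry()
registry.register("churn", model=lambda x: 0.3, metadata={"train_data": "2020-06"})
v2 = registry.register("churn", model=lambda x: 0.7, metadata={"train_data": "2020-07"})
print(v2, registry.get("churn")(None))  # 2 0.7
```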
Part 7 - It's not enough to use aggregate metrics to understand model performance. You need to know how the model does on subslices of data. You need machine learning unit tests. https://mlinproduction.com/testing-machine-learning-models-deployment-series-07/
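One way to sketch a slice-level check, assuming you have predictions and labels tagged with a slice attribute (the data and the 0.5 accuracy floor are made up):

```python
def accuracy(rows):
    return sum(r["pred"] == r["label"] for r in rows) / len(rows)

def slice_accuracies(rows, slice_key):
    """Group rows by a slice attribute and compute accuracy per group."""
    groups = {}
    for r in rows:
        groups.setdefault(r[slice_key], []).append(r)
    return {value: accuracy(g) for value, g in groups.items()}

rows = [
    {"pred": 1, "label": 1, "country": "US"},
    {"pred": 0, "label": 0, "country": "US"},
    {"pred": 1, "label": 0, "country": "BR"},
    {"pred": 0, "label": 0, "country": "BR"},
]
print(accuracy(rows))                     # 0.75 overall
print(slice_accuracies(rows, "country"))  # {'US': 1.0, 'BR': 0.5}

# The "unit test": fail the model if any slice falls below a floor.
assert all(acc >= 0.5 for acc in slice_accuracies(rows, "country").values())
```

The aggregate 0.75 hides that the model is only a coin flip on the BR slice, which is exactly what per-slice assertions catch.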
Part 8 - Just because a model passes its unit tests doesn't mean it will move the product metrics. The only way to establish causality is through online validation. Like any other feature, models need to be A/B tested. https://mlinproduction.com/ab-test-ml-models-deployment-series-08/
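A common mechanism for assigning users to model variants is deterministic hash-based bucketing, so each user sticks to the same variant across requests. A sketch under that assumption (the experiment name and 50/50 split are made up):

```python
import hashlib

def assign_variant(user_id, experiment, treatment_share=0.5):
    """Deterministic, sticky bucketing: same user always gets same variant."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # roughly uniform in [0, 1]
    return "treatment" if bucket < treatment_share else "control"

# Stickiness: repeated calls for the same user agree.
assert assign_variant("u42", "new-model") == assign_variant("u42", "new-model")

counts = {"treatment": 0, "control": 0}
for i in range(10_000):
    counts[assign_variant(f"user-{i}", "new-model")] += 1
print(counts)  # roughly a 50/50 split
```

Salting the hash with the experiment name keeps assignments independent across experiments, so being in one test's treatment group doesn't bias another's.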
I'm planning on writing a few more posts. The next one will be on rollout strategies (dark launches vs canary vs blue-green). But if there's something you think I missed, shout it out!