As a researcher, I used to think that beating state of the art is the most important thing in machine learning.

Now, with a bit more experience, I know that I was wrong.

This is a thread about a few other critical points that often don't get enough attention!
1️⃣ Data engineering.

As the saying goes: garbage in, garbage out. Data is more important than the model itself!

➡️ Does the training data reflect the problem well?
➡️ Is it balanced?
➡️ Does it contain bias?
➡️ Do you have a good test set?
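Checks like these can be automated before training ever starts. Here's a minimal sketch (hypothetical helper names, plain Python) of a class-balance check:

```python
from collections import Counter

def class_balance(labels):
    """Return the fraction of examples in each class."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}

def is_imbalanced(labels, threshold=0.8):
    """Flag datasets where a single class exceeds `threshold` of all examples."""
    return max(class_balance(labels).values()) > threshold

labels = ["cat"] * 90 + ["dog"] * 10
print(class_balance(labels))  # {'cat': 0.9, 'dog': 0.1}
print(is_imbalanced(labels))  # True
```

Running this on every new data snapshot catches imbalance before it silently skews your model.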
2️⃣ User-friendly interface.

Models are ultimately consumed by other developers and services, not just by the team that trained them.

➡️ Does it have a well-documented API?
➡️ Is it straightforward to deploy and use?
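One way to make a model easy to consume is a thin, well-documented wrapper with a stable contract. A minimal sketch (the `DummyModel` and field names are illustrative assumptions, not a specific library's API):

```python
from dataclasses import dataclass

@dataclass
class PredictionService:
    """A thin, documented wrapper around a trained model.

    Assumes `model` is any object exposing a `predict(features)` method.
    Callers get a dict with the label plus the model version, so responses
    stay traceable after redeployments.
    """
    model: object
    version: str = "1.0.0"

    def predict(self, features: list) -> dict:
        """Return the prediction along with metadata callers need."""
        label = self.model.predict(features)
        return {"label": label, "model_version": self.version}

class DummyModel:
    """Stand-in model for the example."""
    def predict(self, features):
        return "positive" if sum(features) > 0 else "negative"

service = PredictionService(DummyModel(), version="2.1.0")
print(service.predict([0.3, 0.7]))  # {'label': 'positive', 'model_version': '2.1.0'}
```

Versioning the response up front saves painful debugging when multiple model versions are live at once.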
3️⃣ Deployment to production.

In software engineering, the faster you can push changes to production, the faster you can move. This is true for machine learning as well.

➡️ Do you have a robust and automated deployment pipeline?
➡️ Can you roll back if a new model performs worse?
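A deployment pipeline can encode that rollback question as an explicit gate. A minimal sketch (hypothetical function, assuming a higher-is-better metric such as accuracy):

```python
def should_promote(new_metric: float, baseline_metric: float,
                   min_improvement: float = 0.0) -> bool:
    """Promote the candidate model only if it doesn't regress.

    `min_improvement` lets you demand a margin instead of a tie,
    which guards against promoting on metric noise.
    """
    return new_metric >= baseline_metric + min_improvement

# A candidate that scores worse than the live model is rejected:
print(should_promote(0.87, 0.91))  # False
print(should_promote(0.93, 0.91))  # True
```

Wiring a check like this into CI means a bad model never reaches users by accident.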
4️⃣ Testing and monitoring.

Testing is not a one-and-done thing. Performance should be monitored constantly, because over time the data coming from production can slowly drift away from the training data.

➡️ Is your infrastructure ready to spot this drift?
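A crude but useful drift signal is how far the production mean of a feature has moved, measured in training standard deviations. A minimal sketch (hypothetical helper names, standard library only; real systems use richer tests like KS or PSI):

```python
import statistics

def drift_score(train_values, prod_values):
    """Shift of the production mean, in units of the training stdev."""
    mu = statistics.mean(train_values)
    sigma = statistics.stdev(train_values)
    return abs(statistics.mean(prod_values) - mu) / sigma

def has_drifted(train_values, prod_values, threshold=2.0):
    """Flag a feature whose production distribution has moved too far."""
    return drift_score(train_values, prod_values) > threshold

train = [10.0, 11.0, 9.5, 10.5, 10.2]
prod = [14.0, 13.5, 14.2, 13.8, 14.1]
print(has_drifted(train, prod))  # True
```

Run per feature on a schedule and alert when the score crosses the threshold, so retraining happens before accuracy quietly degrades.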
This is just the tip of the iceberg. There is so much beyond training a model!

What are the key questions you ask when deploying models to production?
You can follow @TivadarDanka.