10 questions that spark conversations, make you think, and give you a solid foundation of practical Machine Learning.

🧵👇
(Some) interviews are broken.

They focus on trivia and expect candidates to recall concepts that aren't even relevant for the job.

This is garbage.

Instead, focus on problems that scientists and engineers face every day while doing their jobs: 👇
Acme Inc. is building a model to classify images into several different categories.

Unfortunately, they don't have a lot of images for some of the classes.

How would you handle such an imbalanced dataset?

(1 of 10)
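Two standard answers are class weighting and oversampling. Here's a quick sketch in plain Python (class names and counts are made up for illustration):

```python
import random
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class inversely to its frequency, using the common
    n_samples / (n_classes * count) heuristic."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}

def oversample_minority(samples, labels, seed=0):
    """Randomly duplicate under-represented classes until every class
    matches the size of the largest one."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        out_x.extend(xs + extra)
        out_y.extend([y] * target)
    return out_x, out_y

labels = ["common"] * 90 + ["rare"] * 10
weights = balanced_class_weights(labels)  # the rare class gets a 9x larger weight
images, new_labels = oversample_minority(list(range(100)), labels)
```

Either approach (or both, plus augmentation of the rare classes) keeps the model from ignoring the classes it rarely sees.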
Acme Inc. gives you access to the data and code used to train their model.

They have been using it for some time with mixed results. They suspect the model might be overfitting.

How do you find out whether this is the case and fix it?

(2 of 10)
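The classic signal is a gap between training and validation performance, and one classic fix is early stopping. A minimal sketch (the loss numbers are invented):

```python
def generalization_gap(train_acc, val_acc):
    """A large train/validation gap is the telltale overfitting signal."""
    return train_acc - val_acc

def early_stopping_epoch(val_losses, patience=3):
    """Stop training once validation loss has not improved
    for `patience` consecutive epochs."""
    best, best_epoch = float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch
    return len(val_losses) - 1

# Validation loss bottoms out at epoch 3, then climbs: stop at epoch 6.
losses = [1.0, 0.8, 0.6, 0.5, 0.55, 0.6, 0.7, 0.8]
stop = early_stopping_epoch(losses, patience=3)
```

Other fixes worth mentioning: more data, augmentation, regularization, or a smaller model.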
Acme Inc. has a deep learning image classification model that performs very well with most images.

Unfortunately, this is not good enough when lives are at stake.

How do you determine the uncertainty of predictions from the model to reduce the mistakes?

(3 of 10)
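One common answer: score each prediction's uncertainty and route the uncertain ones to a human. A sketch using predictive entropy, with a Monte Carlo dropout-style average over several stochastic forward passes (the probabilities are illustrative):

```python
import math

def predictive_entropy(probs):
    """Shannon entropy of a predicted class distribution;
    higher means the model is less certain about this image."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def mc_dropout(prob_samples):
    """Average T stochastic forward passes (e.g. with dropout left on
    at inference) and score uncertainty on the averaged prediction."""
    T = len(prob_samples)
    mean = [sum(col) / T for col in zip(*prob_samples)]
    return mean, predictive_entropy(mean)

confident = predictive_entropy([0.98, 0.01, 0.01])
uncertain = predictive_entropy([0.34, 0.33, 0.33])
# Act only on low-entropy predictions; send the rest to a human reviewer.
```

When lives are at stake, the design goal shifts from "maximize accuracy" to "know when not to trust the model."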
Acme Inc. wants you to build a couple of models using a dataset they built over the last few years.

But it turns out that capturing a lot of sensor data at scale is a complicated task.

How do you handle missing or corrupted data in this dataset?

(4 of 10)
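A reasonable baseline answer: impute the gaps and keep a missingness indicator, so the model can still learn from the fact that a sensor reading was absent. A toy sketch for a numeric column (the readings are made up):

```python
import math
from statistics import mean

def is_missing(v):
    """Treat both None and NaN as missing sensor readings."""
    return v is None or (isinstance(v, float) and math.isnan(v))

def impute_numeric(values):
    """Fill gaps with the column mean and return a missingness indicator."""
    observed = [v for v in values if not is_missing(v)]
    fill = mean(observed)
    filled = [fill if is_missing(v) else v for v in values]
    indicator = [1 if is_missing(v) else 0 for v in values]
    return filled, indicator

readings = [1.0, None, 3.0, float("nan"), 5.0]
filled, indicator = impute_numeric(readings)
```

Stronger answers mention median imputation for skewed columns, model-based imputation, and dropping columns that are mostly corrupted.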
Acme Inc. is deploying a model to predict whether their equipment is about to break.

They'd like to minimize mistakes, especially false negatives, whose cost is prohibitive.

How would you design a system that minimizes Type II errors?

(5 of 10)
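One concrete answer: tune the decision threshold on a validation set until recall hits the target, accepting more false positives in exchange for fewer missed failures. A sketch (scores and labels are invented):

```python
import math

def threshold_for_recall(scores, labels, target_recall=0.95):
    """Pick the highest decision threshold that still catches the target
    fraction of true positives. Lowering the threshold trades extra
    false positives for fewer false negatives (Type II errors)."""
    positive_scores = sorted(
        (s for s, y in zip(scores, labels) if y == 1), reverse=True
    )
    k = math.ceil(target_recall * len(positive_scores))
    return positive_scores[k - 1]

scores = [0.9, 0.8, 0.7, 0.4, 0.35, 0.2, 0.1]
labels = [1,   0,   1,   1,   0,    1,   0]
t = threshold_for_recall(scores, labels, target_recall=0.75)
caught = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
```

You can also bake the asymmetry into training with a cost-sensitive loss that penalizes false negatives more heavily.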
Over the last few years, Acme Inc. was able to collect a lot of data.

Unfortunately, labeling the data is very costly, and they would like you to help.

How would you label as much data as possible, minimizing the cost of doing so?

(6 of 10)
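The answer they're usually fishing for is active learning: spend the annotation budget on the examples the current model is least sure about, retrain, repeat. A margin-sampling sketch (the probabilities are illustrative):

```python
def margin(probs):
    """Gap between the two most likely classes; a small gap
    means the model is unsure."""
    top_two = sorted(probs, reverse=True)[:2]
    return top_two[0] - top_two[1]

def pick_for_labeling(unlabeled_probs, budget):
    """Uncertainty sampling: label the examples with the smallest margin."""
    ranked = sorted(range(len(unlabeled_probs)),
                    key=lambda i: margin(unlabeled_probs[i]))
    return ranked[:budget]

predictions = [
    [0.95, 0.05],  # confident; labeling it adds little
    [0.52, 0.48],  # borderline; labeling it is most informative
    [0.70, 0.30],
]
to_label = pick_for_labeling(predictions, budget=1)
```

Semi-supervised learning and weak labeling (programmatic labeling functions) are good complements to mention.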
Acme Inc. is building a pipeline that goes all the way from training to deployment.

There's a single missing piece to complete the process:

How would you automatically determine whether the new model is better than the one in production?

(7 of 10)
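A solid answer: evaluate champion and challenger on the same holdout set and promote only when the improvement is unlikely to be noise. A bootstrap sketch (the correctness vectors are made up; `challenger_win_rate` is a hypothetical helper, not a library call):

```python
import random

def challenger_win_rate(champion_correct, challenger_correct,
                        n_boot=2000, seed=0):
    """Resample the same holdout set and count how often the candidate
    model's accuracy beats production's; promote only above, say, 0.95."""
    rng = random.Random(seed)
    n = len(champion_correct)
    wins = 0
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        if sum(challenger_correct[i] for i in idx) > \
           sum(champion_correct[i] for i in idx):
            wins += 1
    return wins / n_boot

champion = [1] * 70 + [0] * 30    # 70% accuracy on the holdout
challenger = [1] * 85 + [0] * 15  # 85% on the same examples
win_rate = challenger_win_rate(champion, challenger)
```

In production, shadow deployments and A/B tests extend the same idea to live traffic.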
Acme Inc. has accumulated some data that they want to use in a classification problem.

Before giving you a job, they would like to know which algorithm you'd use.

How would you approach the process of selecting a suitable algorithm?

(8 of 10)
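The process most interviewers want to hear: establish a baseline, cross-validate a few candidates, and compare mean and spread rather than a single split. A sketch (the algorithm names and fold scores are invented):

```python
from statistics import mean, stdev

def rank_candidates(cv_scores):
    """cv_scores maps algorithm name -> per-fold validation scores.
    Rank by mean score; the spread flags unstable candidates, and a
    near-tie is a good reason to prefer the simpler model."""
    ranked = sorted(cv_scores.items(), key=lambda kv: mean(kv[1]), reverse=True)
    return [(name, round(mean(s), 3), round(stdev(s), 3)) for name, s in ranked]

scores = {
    "logistic_regression": [0.81, 0.79, 0.80, 0.82, 0.78],
    "gradient_boosting":   [0.86, 0.84, 0.85, 0.87, 0.83],
    "knn":                 [0.74, 0.70, 0.76, 0.72, 0.71],
}
leaderboard = rank_candidates(scores)
best = leaderboard[0][0]
```

Data size, interpretability requirements, and latency budgets should all factor into the final pick, not just the score.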
The latest version of Acme Inc.'s model is showing a whopping 99% accuracy at detecting fraudulent transactions.

In production, however, a manual audit found that the model isn't catching any of the fraud.

How would you tackle this problem?

(9 of 10)
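This is the accuracy paradox: with heavily imbalanced classes, a model that flags nothing can still score 99% accuracy. The fix starts with the right metrics. A sketch (the fraud rate is illustrative):

```python
def precision_recall(predictions, labels):
    """Precision and recall on the positive (fraud) class."""
    tp = sum(1 for p, y in zip(predictions, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(predictions, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(predictions, labels) if p == 0 and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# 1% of transactions are fraudulent; the model flags nothing.
labels = [1] * 1 + [0] * 99
predictions = [0] * 100
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
precision, recall = precision_recall(predictions, labels)
# 99% accuracy, yet 0% recall on the class that matters
```

From there: track precision/recall or PR-AUC instead of accuracy, and revisit the imbalance-handling techniques from question 1.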
THIS PAGE INTENTIONALLY LEFT BLANK

(10 of 10)
This is the second time that Acme Inc.'s algorithms have been significantly off when predicting election results.

They'd like to try something new: predicting the outcome of the upcoming election using Twitter data.

How would you design this solution?

(11 of 10)
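A naive baseline aggregates per-candidate tweet sentiment into a vote share; the interesting part of the answer is everything a real design must add on top. A toy sketch (the candidates, scorer, and tweets are all invented):

```python
from collections import defaultdict

def poll_from_tweets(tweets, sentiment):
    """Aggregate per-candidate sentiment into a normalized share.
    A real design also needs bot filtering, demographic reweighting
    (Twitter users are not a representative sample of voters), and
    calibration against actual polls."""
    totals = defaultdict(float)
    for candidate, text in tweets:
        totals[candidate] += sentiment(text)
    norm = sum(totals.values())
    return {c: v / norm for c, v in totals.items()} if norm else {}

def toy_sentiment(text):
    """Stand-in scorer: +1 per positive word. A real system would use
    a trained sentiment model."""
    positive = {"great", "love", "win"}
    return sum(word in positive for word in text.lower().split())

tweets = [
    ("alice", "love her plan, she will win"),
    ("alice", "great debate"),
    ("bob", "great speech"),
]
shares = poll_from_tweets(tweets, toy_sentiment)
```

The sampling-bias discussion is what separates a strong answer from a weak one here.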
You can follow @svpino.