💡How do you frame an ML Problem?

💡What is the ML Mindset?

Below are some takeaways from the "Introduction to Machine Learning Problem Framing" course by @googledevs

#MachineLearning

🧵👇
In this thread, I will summarize some concepts that cover the following topics:

1⃣ Common ML Problems
2⃣ The ML Mindset
3⃣ Identifying Good Problems for ML
4⃣ Hard ML Problems
5⃣ Deciding on ML
6⃣ Formulate Your Problem as an ML Problem

👇
1⃣ Common ML Problems (1/4)

In simple terms, ML is the process of training a model to make useful predictions using a dataset.

This predictive model can then generate predictions about previously unseen data.

👇
1⃣ Common ML Problems (2/4)

Supervised learning is a type of ML where the model is provided with labelled training data.

Features are measurements or descriptions; the label is essentially the "answer."

👇
1⃣ Common ML Problems (3/4)

During training, the algorithm gradually determines the relationship between features and their corresponding labels.

Supervised machine learning finds patterns between data and labels that can be expressed mathematically as functions.
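
To make "patterns expressed as functions" concrete, here is a minimal sketch that fits a straight line to labelled examples with ordinary least squares. The toy data and the names `fit_line` and `predict` are made up for illustration and do not come from the course.

```python
def fit_line(xs, ys):
    """Ordinary least squares for a single feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical labelled training data: each feature x comes with its label y.
features = [1.0, 2.0, 3.0, 4.0]
labels = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(features, labels)


def predict(x):
    """The learned function, usable on feature values never seen in training."""
    return slope * x + intercept
```

The learned relationship is just a function, so calling `predict(10.0)` applies it to an input that was not in the training data.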

👇
1⃣ Common ML Problems (4/4)

In unsupervised learning, the goal is to identify meaningful patterns in the data.

To accomplish this, the machine must learn from an unlabeled data set.
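
As a rough illustration of learning from unlabeled data, the sketch below runs a tiny one-dimensional k-means. The points, the starting centers, and the function name `kmeans_1d` are assumptions chosen for the example. k-means is only one of many unsupervised techniques.

```python
def kmeans_1d(points, centers, iterations=10):
    """Tiny 1-D k-means: returns (final centers, cluster index per point)."""
    centers = list(centers)
    assignments = []
    for _ in range(iterations):
        # Assign each point to its nearest center.
        assignments = [
            min(range(len(centers)), key=lambda k: abs(p - centers[k]))
            for p in points
        ]
        # Move each center to the mean of the points assigned to it.
        for k in range(len(centers)):
            members = [p for p, a in zip(points, assignments) if a == k]
            if members:
                centers[k] = sum(members) / len(members)
    return centers, assignments


# Hypothetical unlabeled measurements: no "answer" is attached to any point.
points = [1.0, 1.2, 0.8, 9.0, 9.5, 10.1]
centers, assignments = kmeans_1d(points, centers=[0.0, 5.0])
```

The algorithm groups the points, but nothing in the output says what each group means. That interpretation is still up to you.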

👇
2⃣ The ML Mindset (1/2)

In software engineering, you can reason from requirements to a workable design, but with ML, it will be necessary to experiment to find a workable model.

👇
2⃣ The ML Mindset (2/2)

Get comfortable with some uncertainty.

Implementing ML is different from traditional programming.

It is helpful to think of the ML process as an experiment where we run test after test after test to converge on a workable model.

👇
3⃣ Identifying Good Problems for ML (1/7)

💡Clear Use Case

Start with the problem, not the solution. Make sure you aren't treating ML as a hammer for your problems.

Focus on problems that would be difficult to solve with traditional programming.

👇
3⃣ Identifying Good Problems for ML (2/7)

Think of ML as just one of the tools in your toolkit and only bring it out when appropriate.

Ask yourself the following questions:
1. What problem is my product facing?
2. Would it be a good problem for ML?

👇
3⃣ Identifying Good Problems for ML (3/7)

💡Know the Problem Before Focusing on the Data

Make sure you understand the problem clearly.

Exploratory data analysis (EDA) can help you understand your data, but you can't claim that the patterns you find generalize until you check them against unseen data.
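
One common way to run that check is to hold out part of the data before exploring, then test the pattern on the held-out part. The sketch below assumes a made-up dataset of (hours_active, churned) pairs; `split`, `accuracy`, and `low_activity_rule` are hypothetical helper names.

```python
import random


def split(rows, holdout_fraction=0.25, seed=0):
    """Shuffle the rows and hold out a fraction that exploration never sees."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - holdout_fraction))
    return rows[:cut], rows[cut:]


def accuracy(rule, rows):
    """Fraction of (feature, label) rows where the rule matches the label."""
    return sum(rule(x) == y for x, y in rows) / len(rows)


# Hypothetical data: (hours_active, churned) pairs.
rows = [(hours, hours < 5) for hours in range(20)]
explored, unseen = split(rows)


def low_activity_rule(hours):
    """A pattern spotted while exploring: low activity goes with churn."""
    return hours < 5


# Only the score on the unseen rows says anything about generalization.
unseen_accuracy = accuracy(low_activity_rule, unseen)
```

In this toy case the rule matches the way the labels were generated, so it holds on the unseen rows. On real data the two scores can differ a lot.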

👇
3⃣ Identifying Good Problems for ML (4/7)

Failure to check could lead you in the wrong direction or reinforce stereotypes or bias.

💡ML requires a lot of relevant data

How much is "a lot"? That depends on the problem, but more data typically improves your model.

👇
3⃣ Identifying Good Problems for ML (5/7)

💡Your features contain predictive power

You should not try to make ML do the hard work of discovering which features are relevant.

If you try lots of features without a hypothesis, you may falsely believe that some of them are relevant signals.

👇
3⃣ Identifying Good Problems for ML (6/7)

Predictions vs. Decisions

💡Aim to make decisions, not just predictions.

That means your product should take action on the output of the model.

Make sure your predictions allow you to take useful action.

👇
3⃣ Identifying Good Problems for ML (7/7)

ML is better at making decisions than giving you insights.

If you have a lot of data and want to find out "interesting" things about it, statistical approaches make more sense.

👇
4⃣ Hard ML Problems (1/5)

💡Clustering

What does each cluster mean in an unsupervised learning problem?

Sometimes, it is challenging to determine what action to take based on the cluster.

👇
4⃣ Hard ML Problems (2/5)

💡Anomaly Detection

How do you decide what constitutes an anomaly to get labelled data?

One option is to define a heuristic and use it to label anomalies.

If your heuristics are sufficiently complicated, then it may be worth considering ML.
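
A heuristic labeller can be very small. The sketch below flags values that lie more than a chosen number of standard deviations from the mean. The readings, the threshold of 2.0, and the name `label_anomalies` are illustrative assumptions, not a recommendation from the course.

```python
import statistics


def label_anomalies(values, threshold=2.0):
    """Heuristic labels: True when a value is more than `threshold`
    population standard deviations away from the mean."""
    mean = statistics.fmean(values)
    spread = statistics.pstdev(values)
    if spread == 0:
        # All values are identical, so nothing stands out.
        return [False] * len(values)
    return [abs(v - mean) / spread > threshold for v in values]


# Hypothetical sensor readings with one obvious outlier at the end.
readings = [10, 11, 9, 10, 12, 10, 11, 50]
labels = label_anomalies(readings)
```

Labels produced this way can serve as a starting point for labelled data, with the caveat that they inherit whatever the heuristic gets wrong.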

👇
4⃣ Hard ML Problems (3/5)

💡Causation

ML can identify correlations or connections between two or more things.

Determining causation (one event or factor causing another) is much harder.

You can't determine causation from only observational data.

👇
4⃣ Hard ML Problems (4/5)

💡No Existing Data

If you have no data to train a model, then ML cannot help you. Without data, try a different solution, such as a rule-based system.

Once you have training data, try to find patterns in it.

👇
4⃣ Hard ML Problems (5/5)

If there are no patterns or only trivial patterns, then ML probably will not provide value.

If there are many patterns and it is important to make accurate predictions, then using ML might be the right approach.

👇
5⃣ Deciding on ML (1/6)

💡Start Clearly and Simply

What would you like your ML model to do?

At this point, the statement can be qualitative, but make sure this captures your real goal, not an indirect goal.

👇
5⃣ Deciding on ML (2/6)

💡What is Your Ideal Outcome?

The ML model should produce a desirable outcome.

This outcome may be quite different from how you assess the model and its quality.

👇
5⃣ Deciding on ML (3/6)

💡Success and Failure Metrics

Quantify It.

How will you know if your system has succeeded or failed?

Your success and failure metrics should be phrased independently of your evaluation metrics.

Set your success metrics before you begin.

👇
5⃣ Deciding on ML (4/6)

Are the metrics measurable?

Watch for too little signal in your data, or data that isn’t predictive, to determine if your hypothesis might be wrong.

Failing fast will enable you to revise your hypothesis earlier in the process and prevent lost time.

👇
5⃣ Deciding on ML (5/6)

💡What Output Would You Like the ML Model to Produce?

The output must be quantifiable with a clear definition that the machine can produce.

If not, use a proxy label instead.

The output should be connected to your ideal outcome.

👇
5⃣ Deciding on ML (6/6)

💡Heuristics

Don't launch a fancy ML model that can't beat a heuristic.

Non-ML solutions can sometimes be simpler to maintain than ML solutions.
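
One way to act on this is to score a simple heuristic on the same test set as the model before launching. The sketch below uses "always predict the most common label" as the heuristic. The labels, the predictions, and the function names are all made up for illustration.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def majority_baseline(train_labels):
    """Heuristic: always predict the most common training label."""
    return max(set(train_labels), key=train_labels.count)


def beats_heuristic(model_predictions, test_labels, train_labels, margin=0.0):
    """Compare model accuracy against the majority-label heuristic."""
    baseline_label = majority_baseline(train_labels)
    baseline_acc = accuracy([baseline_label] * len(test_labels), test_labels)
    model_acc = accuracy(model_predictions, test_labels)
    return model_acc > baseline_acc + margin, model_acc, baseline_acc


# Hypothetical labels and model outputs.
train_labels = [0, 0, 0, 1]
test_labels = [0, 0, 1, 0, 1]
model_predictions = [0, 0, 1, 0, 0]

ok, model_acc, baseline_acc = beats_heuristic(
    model_predictions, test_labels, train_labels
)
```

The `margin` parameter is there because a model that barely edges out the heuristic may not justify its extra maintenance cost.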

👇
6⃣ Formulate Your Problem as an ML Problem

1. Articulate your problem.
2. Start simple.
3. Identify Your Data Sources.
4. Design your data for the model.
5. Determine where data comes from.
6. Determine easily obtained inputs.
7. Ability to Learn.
8. Think About Potential Bias.
You can follow @hmatalonga.