

Below are some takeaways from the "Introduction to Machine Learning Problem Framing" course by @googledevs
#MachineLearning


In this thread, I will summarize some concepts that cover the following topics:
Common ML Problems
The ML Mindset
Identifying Good Problems for ML
Hard ML Problems
Deciding on ML
Formulate Your Problem as an ML Problem








In simple terms, ML is the process of training a model to make useful predictions using a dataset.
This predictive model can then generate predictions about previously unseen data.


Supervised learning is a type of ML where the model is provided with labelled training data.
Features are measurements or descriptions; the label is essentially the "answer."


During training, the algorithm gradually determines the relationship between features and their corresponding labels.
Supervised machine learning finds patterns between data and labels that can be expressed mathematically as functions.
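
To make "patterns expressed as functions" concrete, here is a minimal sketch that fits a straight line to labelled examples with ordinary least squares. The house-size data and the names `fit_line` and `predict` are made up for illustration; they are not from the course.

```python
# Hypothetical toy data: feature = house size (m^2), label = price (thousands).
features = [50.0, 60.0, 80.0, 100.0]
labels = [150.0, 180.0, 240.0, 300.0]


def fit_line(xs, ys):
    """Ordinary least squares for y = w * x + b with a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    w = cov / var
    b = mean_y - w * mean_x
    return w, b


# "Training": learn the relationship between the features and the labels.
w, b = fit_line(features, labels)


def predict(x):
    """The learned function, applied to a previously unseen feature value."""
    return w * x + b


print(predict(120.0))
```

The learned `w` and `b` are the whole "model" here. Real models have far more parameters, but the idea is the same.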


In unsupervised learning, the goal is to identify meaningful patterns in the data.
To accomplish this, the machine must learn from an unlabeled dataset.
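
As a rough illustration of learning from unlabeled data, this sketch runs a plain k-means on a handful of 1-D values. k-means is just one example of an unsupervised method, chosen here for brevity; the data, the starting centers, and the function names are assumptions for this sketch.

```python
def assign(values, centers):
    """Put each value in the group of its nearest center."""
    groups = [[] for _ in centers]
    for v in values:
        nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
        groups[nearest].append(v)
    return groups


def kmeans_1d(values, centers, steps=10):
    """Plain k-means on 1-D data; `centers` are the starting guesses."""
    centers = list(centers)
    for _ in range(steps):
        groups = assign(values, centers)
        # Move each center to the mean of its group; keep it if the group is empty.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, assign(values, centers)


# No labels anywhere: the algorithm only sees the raw values.
values = [1.0, 1.2, 0.8, 9.0, 9.5, 8.5]
centers, groups = kmeans_1d(values, centers=[0.0, 5.0])
print(centers, groups)
```

The algorithm finds two groups, but it does not say what they mean. That interpretation is still up to you.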


In software engineering, you can reason from requirements to a workable design. With ML, you have to experiment to find a workable model.


Get comfortable with some uncertainty.
Implementing ML is different from traditional programming.
It is helpful to think of the ML process as an experiment where we run test after test after test to converge on a workable model.



Start with the problem, not the solution. Make sure you aren't treating ML as a hammer for your problems.
Focus on problems that would be difficult to solve with traditional programming.


Think of ML as just one of the tools in your toolkit and only bring it out when appropriate.
Ask yourself the following questions:
1. What problem is my product facing?
2. Would it be a good problem for ML?



Make sure you understand the problem clearly.
EDA (exploratory data analysis) can help you understand your data, but you can't claim that the patterns you find generalize until you check them against unseen data.


Failure to check could lead you in the wrong direction or reinforce stereotypes or bias.
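
One common way to "check against unseen data" is a simple holdout split: find the pattern on one part of the data and measure it on the part you set aside. The sketch below assumes made-up `(feature, label)` rows and a very simple threshold rule; the helper names are hypothetical.

```python
import random


def split(rows, holdout_fraction=0.25, seed=0):
    """Shuffle once, then set aside the last part as unseen data."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - holdout_fraction))
    return rows[:cut], rows[cut:]


def class_mean(rows, label):
    """Average feature value among rows that carry the given label."""
    xs = [x for x, y in rows if y == label]
    return sum(xs) / len(xs)


def accuracy(rule, rows):
    """Share of rows where the rule's guess matches the label."""
    return sum(rule(x) == y for x, y in rows) / len(rows)


# Hypothetical (feature, label) rows, made up for this sketch.
rows = [(x, int(x >= 10)) for x in range(20)]
train, holdout = split(rows)

# Pattern "found" during EDA, using the training rows only:
# the midpoint between the average feature value of each class.
threshold = (class_mean(train, 0) + class_mean(train, 1)) / 2


def rule(x):
    return int(x >= threshold)


train_accuracy = accuracy(rule, train)
holdout_accuracy = accuracy(rule, holdout)  # the number that matters
print(train_accuracy, holdout_accuracy)
```

If the holdout number is much worse than the training number, the pattern probably does not generalize.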

ML needs a lot of data. How much is "a lot"? That depends on the problem, but more data typically improves your model.



You should not try to make ML do the hard work of discovering which features are relevant.
If you try lots of features without a hypothesis, you may falsely conclude that some of them are relevant signals.


Predictions vs. Decisions

Your product should take action on the output of the model.
Make sure your predictions allow you to take useful action.


ML is better at making decisions than giving you insights.
If you have a lot of data and want to find out "interesting" things about it, statistical approaches make more sense.



What does each cluster mean in an unsupervised learning problem?
Sometimes, it is challenging to determine what action to take based on the cluster.



How do you decide what constitutes an anomaly to get labelled data?
One option is to define a heuristic and use it to label anomalies.
If your heuristics are sufficiently complicated, then it may be worth considering ML.
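
Here is a small sketch of what "define a heuristic and use it to label anomalies" could look like. The rule used (distance from the median, measured in median absolute deviations) and the cutoff `k=3.0` are assumptions picked for this example, not something the course prescribes.

```python
import statistics


def label_anomalies(values, k=3.0):
    """Label a value 1 (anomaly) if it sits more than k MADs from the median.

    MAD = median absolute deviation. If MAD is 0, any value that differs
    from the median gets flagged.
    """
    center = statistics.median(values)
    mad = statistics.median([abs(v - center) for v in values])
    return [int(abs(v - center) > k * mad) for v in values]


# Hypothetical sensor readings, made up for this sketch.
readings = [10.0, 10.2, 9.9, 10.1, 9.8, 25.0, 10.0]
anomaly_labels = label_anomalies(readings)
print(anomaly_labels)
```

Labels produced this way can serve as training data later, but the model can only be as good as the heuristic that made them.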



ML can identify correlations or connections between two or more things.
Determining causation (one event or factor causing another) is much harder.
You can't determine causation from only observational data.
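
A quick illustration of why correlation is not causation: in the made-up numbers below, temperature drives both ice cream sales and sunburn cases, so the two end up strongly correlated even though neither causes the other. All values and names are invented for this sketch.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical daily observations. Both series follow the temperature,
# which acts as a hidden common cause (a confounder).
temperature = [15, 18, 21, 24, 27, 30, 33]
ice_cream_sales = [31, 35, 42, 49, 53, 60, 67]
sunburn_cases = [7.5, 9.5, 10.0, 12.0, 14.0, 14.5, 16.5]

r = pearson(ice_cream_sales, sunburn_cases)
print(round(r, 3))
```

A model trained on this data would happily use ice cream sales to predict sunburns. That can be a fine prediction, but it tells you nothing about what would happen if you banned ice cream.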



If you have no data to train a model, then ML cannot help you. Without data, try a different solution like a rule-based system.
Once you have training data, try to find patterns in it.


If there are no patterns or only trivial patterns, then ML probably will not provide value.
If there are many patterns and it is important to make accurate predictions, then using ML might be the right approach.



What would you like your ML model to do?
At this point, the statement can be qualitative, but make sure this captures your real goal, not an indirect goal.



The ML model should produce a desirable outcome.
This outcome may be quite different from how you assess the model and its quality.



Quantify It.
How will you know if your system has succeeded or failed?
Your success and failure metrics should be phrased independently of your evaluation metrics.
Set your success metrics before you begin.


Are the metrics measurable?
Watch for too little signal in your data, or data that isn’t predictive, to determine if your hypothesis might be wrong.
Failing fast will enable you to revise your hypothesis earlier in the process and prevent lost time.



The output must be quantifiable with a clear definition that the machine can produce.
If not, use a proxy label instead.
The output should be connected to your ideal outcome.
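
A proxy label can be as simple as a rule over something you can actually measure. In this sketch, the ideal outcome "the viewer found the video useful" is not directly observable, so watch time stands in for it. The scenario, the 50% threshold, and the name `proxy_label` are all assumptions for illustration.

```python
def proxy_label(watch_seconds, video_seconds, threshold=0.5):
    """Hypothetical proxy for 'the viewer found the video useful'.

    Returns 1 if the viewer watched at least `threshold` of the video, else 0.
    """
    if video_seconds <= 0:
        raise ValueError("video_seconds must be positive")
    return int(watch_seconds / video_seconds >= threshold)


print(proxy_label(90, 120), proxy_label(20, 120))
```

The proxy is only useful to the extent that it tracks the ideal outcome, so it is worth questioning the choice of rule and threshold.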



Don't launch a fancy ML model that can't beat a heuristic.
Non-ML solutions can sometimes be simpler to maintain than ML solutions.
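
One way to keep yourself honest is to score a trivial heuristic and your candidate model on the same held-out rows before deciding anything. The sketch below uses a majority-class baseline; the data, the stand-in `candidate_model`, and the `margin` of 0.05 are assumptions made up for this example.

```python
from collections import Counter


def majority_baseline(train_labels):
    """Heuristic: always predict the most common label seen in training."""
    most_common_label, _ = Counter(train_labels).most_common(1)[0]
    return lambda _feature: most_common_label


def accuracy(predict, rows):
    """Share of (feature, label) rows that `predict` gets right."""
    return sum(predict(x) == y for x, y in rows) / len(rows)


# Hypothetical data, made up for this sketch.
train_labels = [0, 0, 0, 1, 0, 0, 1, 0]
holdout = [(0.2, 0), (0.9, 1), (0.4, 0), (0.8, 1), (0.1, 0)]

baseline = majority_baseline(train_labels)


def candidate_model(x):
    """Stand-in for a trained model; here just a fixed threshold."""
    return int(x > 0.5)


baseline_accuracy = accuracy(baseline, holdout)
model_accuracy = accuracy(candidate_model, holdout)

# Only consider launching if the model clearly beats the heuristic.
margin = 0.05
worth_launching = model_accuracy > baseline_accuracy + margin
print(baseline_accuracy, model_accuracy, worth_launching)
```

If the gap is small, the simpler heuristic may be the better choice because it is easier to maintain.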


1. Articulate your problem.
2. Start simple.
3. Identify your data sources.
4. Design your data for the model.
5. Determine where data comes from.
6. Determine easily obtained inputs.
7. Ability to learn.
8. Think about potential bias.


This thread got incredibly long.
I hope it is helpful in some way. All the material is sourced from the course: https://developers.google.com/machine-learning/problem-framing
