[2/25]

Machine learning beginners are often confused about the difference between regression and classification.

Even if you have a rough understanding of the difference, it can help to think more deeply about how they both work.

#datascience #machinelearning
[3/25]

First, let’s review what machine learning is. This will help frame our discussion of regression and classification.

#datascience #data #machinelearning
[4/25]

Machine learning is a process that enables computers to improve their performance on a task as they are exposed to more experience.

#datascience #data #machinelearning
[5/25]

So when we do machine learning, there are different parts of the process:

– the task
– the algorithm
– the experience (i.e., the training data)
– the performance measure

#datascience #data #machinelearning
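Here's a rough sketch of how those four parts can show up in code. Everything in it is an assumption for illustration: the data is made up, and least squares with mean squared error is just one possible choice of algorithm and performance measure.

```python
# Experience: a tiny, made-up training set of (x, y) pairs.
train = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]


def fit_line(data):
    """Algorithm: ordinary least squares for y = slope * x + intercept."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    sxx = sum((x - mean_x) ** 2 for x, _ in data)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_squared_error(data, slope, intercept):
    """Performance measure: average squared gap between prediction and truth."""
    return sum((slope * x + intercept - y) ** 2 for x, y in data) / len(data)


# Task: predict y from x.
slope, intercept = fit_line(train)
mse = mean_squared_error(train, slope, intercept)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, mse={mse:.4f}")
```

Swap in a different task, algorithm, data set, or metric and you have a different machine learning setup, but the same four parts are still there.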
[6/25]

Why does this matter?

Because regression and classification are types of machine learning *tasks*

#datascience #data #machinelearning
[7/25]

To be clear: there are many other types of machine learning tasks, like:

– machine translation
– denoising
– imputation of missing values
– anomaly detection

And many others.

But regression and classification are two of the most common.

#datascience #machinelearning
[8/25]

The point is, regression systems and classification systems have different purposes in machine learning.

They do different things.

And that being the case, one way to distinguish between regression and classification is by looking at the *output* of these systems.
[9/25]

So as tasks, what are we trying to do in regression and classification?

Speaking roughly:
– in regression we’re trying to predict a *number*
– in classification, we’re trying to predict a *class*

#machinelearning #datascience #data
[10/25]

So in regression systems, the output is typically a numeric value.

#datascience #data #machinelearning
[11/25]

Some regression examples are things like:
– predicting housing prices
– predicting a person’s salary
– predicting a person’s weight

#datascience #data #machinelearning
[12/25]

And some classification examples are things like:
– categorizing cats vs dogs
– identifying “spam” email vs “ham” email (bad email vs good email)
– predicting if a person will buy a product (buyer vs non-buyer)

#datascience #data #machinelearning
[13/25]

So roughly, you can think of it this way:

– If the output variable is numeric then it’s a regression problem

– If the output variable is categorical then it’s a classification problem

#datascience #data #machinelearning
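To make the output difference concrete, here's a minimal sketch that uses the same toy method (copy the target of the closest training example) for both kinds of problem. The data sets and the `nearest` helper are hypothetical; the only thing that changes between the two cases is the type of the target.

```python
def nearest(train, x):
    """Return the training row whose feature value is closest to x."""
    return min(train, key=lambda row: abs(row[0] - x))


# Regression: the target is a number (made-up house size in m^2 -> price).
price_data = [(50, 150_000.0), (80, 240_000.0), (120, 360_000.0)]

# Classification: the target is a class (made-up link count -> spam/ham).
email_data = [(0, "ham"), (1, "ham"), (9, "spam")]

predicted_price = nearest(price_data, 85)[1]
predicted_label = nearest(email_data, 7)[1]

print(type(predicted_price).__name__, predicted_price)
print(type(predicted_label).__name__, predicted_label)
```

The prediction step is identical in both cases. What makes one a regression problem and the other a classification problem is what comes out.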
[14/25]

So you can think of regression vs classification in terms of the *outputs*

But you can also think of regression vs classification in terms of the *inputs*

(meaning the input data)

#datascience #data #machinelearning
[15/25]

First of all, remember that regression and classification are typically categorized as types of supervised learning.

#datascience #data #machinelearning
[16/25]

In supervised learning, the training data has a so-called “target” variable that we’re trying to predict.

During model building, this variable “supervises” the training: the model’s predictions are compared against it and corrected.

#datascience #data #machinelearning
[18/25]

So for both regression and classification, the input data (i.e., training data) has a Y variable that is populated with values.

But the format of those values is different.

#datascience #data #machinelearning
[19/25]

In regression, the values in the Y variable are *numeric*

In classification, the values in the Y variable are *categorical*

#datascience #data #machinelearning
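As a rough illustration, you could guess the task just by looking at the values in Y. The `guess_task` helper below is hypothetical and deliberately simplistic: it only checks the Python type of each target value.

```python
from numbers import Real


def guess_task(y_values):
    """Rough heuristic: numeric targets suggest regression, anything else classification."""
    # bool is a subclass of int in Python, so exclude it explicitly:
    # True/False targets are class labels, not quantities.
    if all(isinstance(v, Real) and not isinstance(v, bool) for v in y_values):
        return "regression"
    return "classification"


print(guess_task([52_000.0, 61_500.0, 48_250.0]))  # salaries
print(guess_task(["cat", "dog", "dog"]))           # animal labels
print(guess_task([True, False, True]))             # buyer vs non-buyer
```

One caveat: classes are often stored as integer codes (0 and 1, for example), and this heuristic would wrongly call that regression. What matters is what the values mean, not just how they are stored.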
[20/25]

This is true for the training data …

But again, it’s also true for the outputs.

When we make predictions with these systems, regression predicts a numeric value and classification predicts a categorical value.

#datascience #data #machinelearning
[21/25]

So regression and classification are different in terms of the training data that we use to build the systems …

And also the output that these systems produce.

#datascience #data #machinelearning
[22/25]

Having said all of this, it’s not always completely clear cut.

Sometimes the line between regression and classification is blurry.

#datascience #data #machinelearning
[23/25]

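For example, logistic regression is a technique that we typically use for *classification*

… but technically, it’s a regression technique: it predicts a probability (a number between 0 and 1), and we turn that number into a class by applying a threshold.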

So the distinction between regression and classification is not always clear.

#datascience #data #machinelearning
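Here's a minimal sketch of that two-step idea for a single input feature. The weight, bias, and 0.5 threshold are hand-picked assumptions for illustration, not values fitted to any data, and the function names are made up.

```python
import math


def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(x, weight, bias):
    """The 'regression' part: a continuous probability estimate."""
    return sigmoid(weight * x + bias)


def predict_class(x, weight, bias, threshold=0.5):
    """The 'classification' part: threshold the probability into a label."""
    return "buyer" if predict_proba(x, weight, bias) >= threshold else "non-buyer"


# Hypothetical, hand-picked parameters (not fitted to any data).
weight, bias = 1.5, -3.0
for x in (0.0, 2.0, 4.0):
    print(x, round(predict_proba(x, weight, bias), 3), predict_class(x, weight, bias))
```

The model itself outputs a number. It only becomes a classifier once we decide where to draw the line.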
[25/25]

And if you want more machine learning explanations, follow me here on Twitter: @Josh_Ebner

Almost every day, I post threads to explain machine learning and data science in a way that makes them clear and easy to understand.

#datascience #data #machinelearning