At the beginning of this year I started moving from web development into machine learning. Data cleaning was one of the things I found extremely difficult.

Here's how you can get started with data cleaning.
(so that you don't make the mistakes I did)

🧵👇
First of all, what is data cleaning? 🤔

Data cleaning is the process of properly formatting your data before you feed it to your neural network. This is very important, because your neural net's accuracy can take a serious hit if the data you feed in is not right.
In the real world, data will be incredibly messy. It is your job to filter the data and format it the right way. This picture explains data cleaning really well👇
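As a rough sketch of what "filter and format" can mean in code, here is a tiny example using pandas. The table and its column names (`name`, `age`, `city`) are made up for illustration, and the cleaning rules are just one reasonable choice; real data will need its own.

```python
import pandas as pd

# Hypothetical messy data: stray whitespace, inconsistent casing,
# a duplicate row, a missing value, and numbers stored as text.
raw = pd.DataFrame({
    "name": [" Alice", "bob ", "bob ", "Carol"],
    "age": ["25", "31", "31", None],
    "city": ["delhi", "Mumbai", "Mumbai", "DELHI"],
})

clean = (
    raw.drop_duplicates()          # filter: remove exact duplicate rows
       .dropna(subset=["age"])     # filter: drop rows with no age
       .assign(
           # format: trim whitespace and normalise casing
           name=lambda d: d["name"].str.strip().str.title(),
           city=lambda d: d["city"].str.title(),
           # format: turn the text ages into integers
           age=lambda d: d["age"].astype(int),
       )
       .reset_index(drop=True)
)

print(clean)
```

Dropping rows with missing values is the simplest option, not always the best one. Filling them in is another common approach.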
So how do you get started with data cleaning?
You need to know a few slightly advanced concepts first. Check out this thread for more info👇 https://twitter.com/PrasoonPratham/status/1313745702439153664?s=20
Now let's look at the libraries you must learn 👇

Pandas: Load data from files
NumPy: Modify data loaded with Pandas
Matplotlib + Seaborn: Visualise data
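
Here is a minimal sketch of how these libraries can fit together. The CSV contents, the column names (`height_cm`, `weight_kg`) and the helper `plot_scaled` are assumptions for illustration only. An in-memory string stands in for a real file, and Seaborn is left out to keep things short.

```python
import io

import numpy as np
import pandas as pd

# Stand-in for a file on disk; pd.read_csv takes a file path the same way.
csv_file = io.StringIO("height_cm,weight_kg\n170,65\n180,80\n160,\n175,72\n")

# Pandas: load the data (the empty weight becomes NaN)
df = pd.read_csv(csv_file)

# NumPy: modify it. Fill each missing value with its column mean,
# then scale every column to zero mean and unit variance.
values = df.to_numpy(dtype=float)
col_means = np.nanmean(values, axis=0)
values = np.where(np.isnan(values), col_means, values)
scaled = (values - values.mean(axis=0)) / values.std(axis=0)

scaled_df = pd.DataFrame(scaled, columns=df.columns)
print(scaled_df)


def plot_scaled(frame):
    """Matplotlib: visualise the cleaned data as a scatter plot."""
    import matplotlib.pyplot as plt

    frame.plot.scatter(x="height_cm", y="weight_kg")
    plt.show()
```

Calling `plot_scaled(scaled_df)` should open a scatter plot of the scaled columns, assuming Matplotlib is installed.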
Where can you learn them?

freeCodeCamp has you covered with this course
You can follow @PrasoonPratham.