At the beginning of this year I started moving from web development into machine learning. Data cleaning was one of the things I found extremely difficult.
Here's how you can get started with data cleaning.
(so that you don't make the mistakes I did)


First of all, what is data cleaning?
Data cleaning is the process of properly formatting your data before you feed it to your neural network. This matters because data that is not right can seriously hurt the accuracy of your neural net.

In the real world, data will be incredibly messy. It is your job to filter the data and format it the right way. This picture explains data cleaning really well.
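
To make "messy" concrete, here is a minimal sketch of cleaning a small table with pandas. The records, the column names and the `clean` helper are all made up for illustration, and filling missing ages with the median is just one common choice, not the only right one.

```python
import pandas as pd

# Made-up messy records: stray whitespace, inconsistent casing,
# a non-numeric age, a missing age, and an exact duplicate row.
raw = pd.DataFrame(
    {
        "name": [" alice", "bob ", "bob ", "Carol", "dave"],
        "age": ["29", "thirty", "thirty", None, "41"],
        "city": ["paris", "PARIS", "PARIS", "Rome", " rome"],
    }
)


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a tidied copy of the frame."""
    # Remove rows that are exact copies of an earlier row.
    out = df.drop_duplicates().copy()
    # Normalise text columns: trim whitespace, use consistent casing.
    out["name"] = out["name"].str.strip().str.title()
    out["city"] = out["city"].str.strip().str.title()
    # Turn ages into numbers; values that cannot be parsed become NaN.
    out["age"] = pd.to_numeric(out["age"], errors="coerce")
    # Fill the missing ages with the median of the known ones.
    out["age"] = out["age"].fillna(out["age"].median())
    return out.reset_index(drop=True)


cleaned = clean(raw)
print(cleaned)
```

After this, every row has a properly cased name and city and a numeric age, which is the kind of shape a model can actually work with.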

So how do you get started with data cleaning?
You must know some slightly advanced concepts first. Check out this thread for more info:
https://twitter.com/PrasoonPratham/status/1313745702439153664?s=20

Now let's look at the libraries you must learn:
pandas: load data from files
NumPy: modify data loaded with pandas
Matplotlib + Seaborn: visualise data
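
Here is a rough sketch of how these libraries fit together in practice. The CSV text, the column names, the `-1` "missing" marker and the three helper functions are assumptions made up for this example. The plot uses plain Matplotlib only (Seaborn builds on top of it), and it is imported inside `plot` so the loading and modifying steps run even without a plotting library installed.

```python
import io

import numpy as np
import pandas as pd

# Stand-in for a CSV file on disk; columns and values are made up.
CSV_TEXT = """height_cm,weight_kg
170,65
182,-1
165,58
"""


def load(source) -> pd.DataFrame:
    # pandas: read tabular data from a path or a file-like object.
    return pd.read_csv(source)


def transform(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # NumPy: treat the -1 marker as a missing value.
    out["weight_kg"] = np.where(out["weight_kg"] < 0, np.nan, out["weight_kg"])
    # NumPy: element-wise arithmetic on the underlying array.
    out["height_m"] = out["height_cm"].to_numpy() / 100.0
    return out


def plot(df: pd.DataFrame):
    # Matplotlib: imported lazily so the steps above do not depend on it.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.scatter(df["height_m"], df["weight_kg"])
    ax.set_xlabel("height (m)")
    ax.set_ylabel("weight (kg)")
    return fig


df = transform(load(io.StringIO(CSV_TEXT)))
print(df)
```

With a real file you would pass its path to `load` instead of the `io.StringIO` object, and call `plot(df)` to look at the result.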

Where to learn them from?
FreeCodeCamp has you covered with this course
Practising these skills on Kaggle is the next thing you have to do!
The Titanic dataset is the best place to start: https://www.kaggle.com/c/titanic
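
As a starting point, here is a minimal sketch of a first cleaning pass on the Titanic data. It assumes the column names `Age`, `Embarked`, `Cabin` and `Sex` and the file name `train.csv` as I remember them from the Kaggle download, so check them against your copy. The `clean_titanic` helper and the sample rows are made up, and the choices in it (median age, most common port, dropping `Cabin`) are only one reasonable option.

```python
import pandas as pd


def clean_titanic(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical first-pass cleaner for Titanic-style data."""
    out = df.copy()
    # Fill missing ages with the median age.
    out["Age"] = out["Age"].fillna(out["Age"].median())
    # Fill missing embarkation ports with the most common port.
    out["Embarked"] = out["Embarked"].fillna(out["Embarked"].mode().iloc[0])
    # Cabin is mostly empty in this sketch, so drop the column.
    out = out.drop(columns=["Cabin"])
    # Encode the text column as numbers a model can use.
    out["Sex"] = out["Sex"].map({"male": 0, "female": 1})
    return out


# Tiny made-up sample using the assumed column names (not real rows).
sample = pd.DataFrame(
    {
        "Survived": [0, 1, 1, 0],
        "Sex": ["male", "female", "female", "male"],
        "Age": [22.0, None, 26.0, 35.0],
        "Cabin": [None, "C85", None, None],
        "Embarked": ["S", "C", None, "S"],
    }
)

cleaned = clean_titanic(sample)
# Count the missing values left in each column.
print(cleaned.isna().sum())
```

With the real data you would run something like `clean_titanic(pd.read_csv("train.csv"))` and then check the remaining missing values the same way.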