In this post, we will look at what data cleaning is and why machine learning algorithms need it.


Data Cleaning:

Data cleaning is the process of identifying the incorrect, incomplete, inaccurate, irrelevant, or missing parts of the data and then modifying, replacing, or deleting them as needed.
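To make the three actions in that definition concrete, here is a minimal sketch in plain Python. The records, the field names (`customer`, `product`, `quantity`), and the choice to fill bad quantities with the median are all assumptions made for illustration; a real project would choose its rules based on the data.

```python
from statistics import median

# Hypothetical raw records; field names and values are made up for illustration.
raw_records = [
    {"customer": "A", "product": "Laptop ", "quantity": 2},
    {"customer": "B", "product": "laptop", "quantity": None},  # missing quantity
    {"customer": "C", "product": "Phone", "quantity": -5},     # impossible value
    {"customer": "D", "product": None, "quantity": 1},         # missing product
    {"customer": "E", "product": "phone", "quantity": 4},
]

def is_valid_quantity(q):
    # Assumption: a valid quantity is a positive whole number.
    return isinstance(q, int) and q > 0

def clean_records(records):
    # Delete: drop rows where the product is missing, since it cannot be recovered.
    kept = [dict(r) for r in records if r["product"] is not None]

    # Modify: normalise product names so "Laptop " and "laptop" match.
    for r in kept:
        r["product"] = r["product"].strip().lower()

    # Replace: fill missing or impossible quantities with the median of the
    # valid ones (assumes at least one valid quantity exists).
    fill_value = median(r["quantity"] for r in kept if is_valid_quantity(r["quantity"]))
    for r in kept:
        if not is_valid_quantity(r["quantity"]):
            r["quantity"] = fill_value
    return kept

cleaned = clean_records(raw_records)
for row in cleaned:
    print(row)
```

Filling with the median is only one possible replacement strategy; dropping those rows or using a different estimate may suit other datasets better.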

Why is accurate data important?

Data is the most valuable asset in analytics and machine learning, and it is needed everywhere in computing and business. Real-world data often contains incomplete, inconsistent, or missing values. If the data is corrupted, it can hinder processing or produce inaccurate results.

Let’s look at some examples of why data cleaning matters.

Suppose you are the general manager of a company. Your company collects data from the customers who buy its products. You want to know which products people are most interested in so that you can increase production of those products. But if the data is corrupted or contains missing values, it will mislead you into making the wrong decision, and the company will pay for it.
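The scenario above can be sketched in a few lines of Python. The purchase log and product names below are invented for illustration; the point is only that inconsistent spellings and a missing entry change which product appears to be the best seller.

```python
from collections import Counter

# Hypothetical purchase log with inconsistent spellings and one missing entry.
purchases = ["Widget", "widget ", "WIDGET", "Gadget", "Gadget", None, "widget"]

# Counting the raw values treats every spelling as a different product.
raw_counts = Counter(purchases)
raw_top = raw_counts.most_common(1)[0][0]

# Cleaning: drop the missing entry, then trim whitespace and lowercase the names.
cleaned = [p.strip().lower() for p in purchases if p is not None]
clean_counts = Counter(cleaned)
clean_top = clean_counts.most_common(1)[0][0]

print("Top product before cleaning:", raw_top)
print("Top product after cleaning:", clean_top)
```

On the raw log, "Gadget" looks like the most popular product because the widget sales are split across four spellings. After cleaning, "widget" comes out ahead with four sales against two.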

In machine learning, if the data is irrelevant or error-prone, the resulting model will be incorrect.

The cleaner your data, the better the model you can build. So, we need to process or clean the data before using it. Without quality data, it would be foolish to expect a good outcome.

Posted May 14, 2022 in Machine Learning category
