If you are interviewing for a data analyst job, expect this question: it is one of the most frequently asked in data analyst interviews.
Data cleansing primarily refers to the process of detecting and removing errors and inconsistencies from the data to improve data quality.
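As a minimal sketch of what detecting and removing errors and inconsistencies can look like, the following plain-Python example works on a few hypothetical records. The field names `id`, `city`, and `age`, the function name `clean`, and the validation rules are illustrative assumptions, not a standard. The sketch normalizes inconsistent text, reports missing or out-of-range values, and drops duplicate rows. Dropping bad rows is only one option; depending on the data, you might correct or impute them instead.

```python
from typing import Any

# Hypothetical sample records; field names and rules are for illustration only.
raw_records = [
    {"id": 1, "city": "New York ", "age": 34},
    {"id": 2, "city": "new york", "age": 29},
    {"id": 2, "city": "new york", "age": 29},   # exact duplicate
    {"id": 3, "city": "Boston", "age": -5},     # invalid (negative) age
    {"id": 4, "city": None, "age": 41},         # missing city
]


def clean(records: list[dict[str, Any]]) -> tuple[list[dict[str, Any]], list[str]]:
    """Return the cleaned records plus a list of the problems detected."""
    cleaned: list[dict[str, Any]] = []
    problems: list[str] = []
    seen: set[tuple] = set()
    for rec in records:
        city = rec["city"]
        # Inconsistency: normalize surrounding whitespace and letter case.
        if isinstance(city, str):
            city = city.strip().title()
        age = rec["age"]
        # Errors: report missing or out-of-range values and drop the row.
        if city is None:
            problems.append(f"id={rec['id']}: missing city")
            continue
        if not isinstance(age, int) or age < 0:
            problems.append(f"id={rec['id']}: invalid age {age!r}")
            continue
        # Duplicates: skip rows already seen after normalization.
        key = (rec["id"], city, age)
        if key in seen:
            problems.append(f"id={rec['id']}: duplicate row")
            continue
        seen.add(key)
        cleaned.append({"id": rec["id"], "city": city, "age": age})
    return cleaned, problems


cleaned, problems = clean(raw_records)
```

Normalizing before checking for duplicates matters here: `"New York "` and `"new york"` only become comparable once whitespace and case are made consistent.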
Common ways to clean data include:
- Segregating the data according to its attributes.
- Breaking large datasets into smaller chunks and cleaning each chunk separately.
- Analyzing summary statistics for each column, such as the number of missing values, the number of distinct values, and the minimum and maximum.
- Creating a set of reusable utility functions or scripts for common cleaning tasks.
- Keeping track of every cleansing operation so that individual steps can easily be added or removed later, if required.
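Several of the practices above can be sketched together in plain Python. This is a minimal illustration, assuming rows are stored as a list of dictionaries; the helper names `column_stats`, `strip_text`, `drop_missing`, `chunks`, and `clean`, and the sample columns `name` and `score`, are hypothetical. In practice a dataframe library such as pandas is more common. The sketch computes per-column statistics, defines small reusable cleaning steps, processes the data in chunks, and logs every operation so steps can be added to or removed from the list and re-run.

```python
from collections.abc import Callable, Iterator
from typing import Any

# A row maps column names to values; a step maps a list of rows to a new list.
Row = dict[str, Any]
Step = Callable[[list[Row]], list[Row]]


def column_stats(rows: list[Row]) -> dict[str, dict[str, Any]]:
    """Per-column summary: missing count, distinct count, min/max of numbers."""
    stats: dict[str, dict[str, Any]] = {}
    for col in sorted({key for row in rows for key in row}):
        values = [row.get(col) for row in rows]
        present = [v for v in values if v is not None]
        numbers = [v for v in present
                   if isinstance(v, (int, float)) and not isinstance(v, bool)]
        stats[col] = {
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
            "min": min(numbers) if numbers else None,
            "max": max(numbers) if numbers else None,
        }
    return stats


def strip_text(col: str) -> Step:
    """Reusable step: trim surrounding whitespace in one text column."""
    def step(rows: list[Row]) -> list[Row]:
        return [{**r, col: r[col].strip()} if isinstance(r.get(col), str) else r
                for r in rows]
    step.__name__ = f"strip_text({col})"
    return step


def drop_missing(col: str) -> Step:
    """Reusable step: drop rows where the given column is missing."""
    def step(rows: list[Row]) -> list[Row]:
        return [r for r in rows if r.get(col) is not None]
    step.__name__ = f"drop_missing({col})"
    return step


def chunks(rows: list[Row], size: int) -> Iterator[list[Row]]:
    """Split a large dataset into chunks of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def clean(rows: list[Row], steps: list[Step], chunk_size: int,
          log: list[str]) -> list[Row]:
    """Apply the steps to each chunk in order, logging every operation."""
    result: list[Row] = []
    for i, chunk in enumerate(chunks(rows, chunk_size)):
        for step in steps:
            before = len(chunk)
            chunk = step(chunk)
            log.append(f"chunk {i}: {step.__name__} rows {before} -> {len(chunk)}")
        result.extend(chunk)
    return result


# Hypothetical sample data.
rows = [
    {"name": " Ana ", "score": 90},
    {"name": "Ben", "score": None},
    {"name": None, "score": 75},
]
stats = column_stats(rows)
log: list[str] = []
cleaned = clean(rows, [strip_text("name"), drop_missing("name"),
                       drop_missing("score")], chunk_size=2, log=log)
```

Chunk-by-chunk cleaning works here because each step only looks at one row at a time. A step that needs the whole dataset, such as removing duplicates, would need a separate pass across chunks.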