Why You Need Data Preprocessing
Incomplete, noisy, and inconsistent data are commonplace in large real-world databases and data warehouses. Incomplete data can occur for a number of reasons. Attributes of interest may not always be available, such as customer information for sales transactions. Other data may have been left out because it was not considered important at the time of entry. Relevant data may not be recorded due to a misunderstanding or because of equipment malfunctions. Data that were inconsistent with other recorded data may have been deleted. Furthermore, the history of modifications to the data may not have been recorded. Missing data, particularly tuples with a missing value for some attribute, lower the quality of the mining results. Therefore, to improve the quality of the data and, consequently, of the mining results, data preprocessing is needed.
OR,
By now, you’ve surely realized why data preprocessing is so important. Since mistakes, redundancies, missing values, and inconsistencies all compromise the integrity of the data set, you need to fix all of those issues to get a more accurate outcome. Imagine you are training a Machine Learning algorithm to deal with your customers’ purchases using a faulty dataset. Chances are that the system will develop biases and deviations that produce a poor user experience.
Thus, before using that data for the purpose you want, you need it to be as organized and “clean” as possible. There are several ways to do so, depending on what kind of problem you’re tackling. Ideally, you’d use all of the following techniques to get a better data set.
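Before picking a technique, it helps to see which of these problems a data set actually has. The following is a minimal sketch in plain Python, not a complete data-quality tool. The records, the field names, the helper functions audit and inconsistent_codes, and the 0 to 120 age range are all assumptions made up for illustration.

```python
# Hypothetical sales records; field names and values are invented for illustration.
records = [
    {"id": 1, "customer": "Ana", "age": 34, "country": "US"},
    {"id": 2, "customer": None, "age": 41, "country": "USA"},  # incomplete: no customer
    {"id": 3, "customer": "Raj", "age": -5, "country": "IN"},  # noisy: impossible age
    {"id": 3, "customer": "Raj", "age": -5, "country": "IN"},  # redundant: exact duplicate
]


def audit(rows):
    """Count simple data-quality problems in a list of dict records."""
    report = {"missing": 0, "noisy": 0, "duplicates": 0}
    seen = set()
    for row in rows:
        # Incomplete: at least one field was left empty.
        if any(value is None for value in row.values()):
            report["missing"] += 1
        # Noisy: a value outside a plausible range (assumed 0 to 120 for age).
        age = row.get("age")
        if age is not None and not 0 <= age <= 120:
            report["noisy"] += 1
        # Redundant: the same record has already been seen.
        key = tuple(sorted(row.items()))
        if key in seen:
            report["duplicates"] += 1
        seen.add(key)
    return report


def inconsistent_codes(rows, field):
    """Return the distinct values of a field so conflicting spellings stand out."""
    return sorted({row[field] for row in rows if row[field] is not None})


print(audit(records))
print(inconsistent_codes(records, "country"))
```

On this sample the audit counts one record with a missing value, two with an implausible age, and one duplicate, while the country listing shows "US" and "USA" as separate codes, which is the kind of inconsistency that has to be reconciled before mining.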