Posts

Showing posts from September, 2018

Data Preprocessing for Machine learning in Python

Data preprocessing is a technique used to convert raw data into a clean data set. In simple terms, it refers to the transformations applied to your data before feeding it to the algorithm.

Need for Data Preprocessing

Quality data: low-quality data gives low-quality mining results. Quality decisions must be based on quality data; for example, duplicate or missing data may cause incorrect or even misleading statistics.

Tasks of Data Preprocessing

Data preprocessing involves several steps, described below.

Data Cleaning

This is the first step in data preprocessing. The primary focus is on handling missing data and noisy data, detecting and removing outliers, and minimizing duplication and computed biases within the data.

Data Integration

This process is u
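
To make the data cleaning step described above more concrete, here is a minimal sketch using pandas. The sample data, the column names, and the helper `clean` are hypothetical and not taken from the post; filling missing values with the median and dropping values outside 1.5 times the interquartile range are just one common set of choices, assumed here for illustration.

```python
import pandas as pd


def clean(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Hypothetical helper: de-duplicate, impute, and drop outliers in one column."""
    # Remove exact duplicate rows.
    out = df.drop_duplicates().copy()
    # Fill missing values with the column median (an assumed strategy).
    out[column] = out[column].fillna(out[column].median())
    # Keep only values within 1.5 * IQR of the quartiles (an assumed rule).
    q1, q3 = out[column].quantile([0.25, 0.75])
    iqr = q3 - q1
    keep = out[column].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    return out[keep].reset_index(drop=True)


# Made-up sample: one duplicate row, one missing age, one implausible age.
raw = pd.DataFrame(
    {
        "age": [25, 25, 31, None, 29, 27, 200],
        "city": ["A", "A", "B", "C", "D", "E", "F"],
    }
)
cleaned = clean(raw, "age")
print(cleaned)
```

On real data, the imputation strategy and the outlier threshold should be chosen per column rather than applied blindly.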