Multi-approaches on scrubbing data for medium-sized enterprises

Faiz, T (2019) Multi-approaches on scrubbing data for medium-sized enterprises. In: 2019 International Conference on Digitization (ICD), 18-19 November 2019, Sharjah, United Arab Emirates.

53.pdf - Published Version
Restricted to Registered users only


Tidy, fit-for-purpose data are a prerequisite for analyzing data and for making sound business decisions. Data scrubbing, or data cleaning, is the process of identifying errors and inconsistencies in the data and fixing them before analysis. An organization's decisions rely on data quality, which makes data scrubbing a very important step towards its productivity. Untidy data can arise when importing data from multiple sources, and it includes missing values, corrupt records, data type mismatches, unwanted special characters, and duplicates. Current research lacks coverage of the latest data scrubbing techniques practiced by medium-sized enterprises. This article covers possible data errors, a literature review, and the data science project life cycle. It explains how to clean data using Python libraries for exploratory data analysis, such as Pandas, NumPy, and scikit-learn, and libraries for data visualization, such as Matplotlib, Seaborn, and Plotly.
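
The kinds of errors named in the abstract can be illustrated with a minimal Pandas sketch. It is not taken from the paper: the function name scrub, the column names name and revenue, the sample records, and the choice of median imputation are all assumptions made for illustration.

```python
import pandas as pd


def scrub(frames):
    """Combine several source tables and apply basic cleaning steps.

    Hypothetical helper: assumes every frame has 'name' and 'revenue' columns.
    """
    # Multiple sources: stack the tables into a single frame.
    df = pd.concat(frames, ignore_index=True)

    # Special characters: keep only letters, digits, and spaces in names.
    df["name"] = (
        df["name"].str.replace(r"[^A-Za-z0-9 ]", "", regex=True).str.strip()
    )

    # Data type mismatch: coerce revenue to numeric; unparseable values become NaN.
    df["revenue"] = pd.to_numeric(df["revenue"], errors="coerce")

    # Missing values: drop rows without a name, then fill missing revenue
    # with the median (an illustrative choice, not a recommendation).
    df = df.dropna(subset=["name"]).copy()
    df["revenue"] = df["revenue"].fillna(df["revenue"].median())

    # Duplicates: discard rows that are identical after cleaning.
    return df.drop_duplicates().reset_index(drop=True)


# Made-up sample records standing in for two data sources.
source_a = pd.DataFrame({"name": ["Acme#", "Beta Ltd"], "revenue": ["100", "n/a"]})
source_b = pd.DataFrame({"name": ["Acme", None, "Gamma!"], "revenue": [100, 250, 300]})

clean = scrub([source_a, source_b])
print(clean)
```

The order of the steps matters in this sketch: special characters are stripped before duplicates are dropped, so that "Acme#" and "Acme" are recognized as the same record.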

Affiliation: Skyline University College
SUC Author(s): Faiz, T
All Author(s): Faiz, T
Item Type: Conference or Workshop Item (Paper)
Uncontrolled Keywords: Data Scrubbing, Data Cleaning, Data Cleansing, Exploratory data analysis, Python – Data Cleaning, Data Quality, Pandas Library, Data Pre-processing, Data transformation
Subjects: B Information Technology > BQ Data Analytics
B Information Technology > BT Data Management
Divisions: Skyline University College > School of IT
Depositing User: Mr Veeramani Rasu
Date Deposited: 28 Nov 2021 09:07
Last Modified: 28 Nov 2021 09:07
