Hey let say you are in your office and your manager told you
CLEAN THAT DATA OUT!
and normally you didn't know what he really want to ask for so first let's discuss
WHAT IS DATA CLEANSING OR CLEANING ?
Data cleansing is a process of detecting and rectifying (or deleting) of untrustworthy, inaccurate or outdated information from a data set, archives, table, or database. It helps you to identify incomplete, incorrect, inaccurate or irrelevant parts of the data. Sometimes it also called Data Mining
It consist of different processes which includes:
- Parsing.
- correcting.
- standardizing.
- matching.
- consolidation.
- dealing with missing data.
- dealing with Incorrect & noise data.
These things people use to do with various tools but today we discuss applying these techniques using Microsoft Excel:
#1. First Name Extraction
we are going to use the feature of Flash Filling for this purpose. Just write first name of the person and excel will automatically suggest what you want to express then just press enter after it reflects the whole column.
This thing can also be achieve by Pressing CTRL + E
Similarly we can do this for Last name, Just enter the first letter and excel will do the rest some exception may occur while in the case where we have middle(eg. Wilone O'Kielt) we will do this manually
#2. Number to Percentage
let say we some range of values in our column and all we do is turn all those values into percentages
this can be achieve by 2 methods first one is little bit lengthy
1. All we have to do is click on some different cell write 100 there and copy it and then select the whole column and right click on column gives us right bar where we have to select paste special where we have to select Divide operation. Now everything is done we get each number in the term of fraction and afterwards all we need to apply those number into percentage which we get in Home Tab.
2.This steps 2 steps lets take this step more simpler all we gonna do is inspite of taking 100 write 1% and in the paste special section Run Multiply operation inspite of Divide which gives same result as above.
#3. Remove all blank rows
Sometimes we have some raw data having some Null values all we need to do is remove them from our Table all we gonna do is select entire Table and Press F5 which will give us to go to tool click Special.
Now next we select Blank Radio button which will highlight all blank cells/rows. Next step is deleting all highlighted cells, Pressing CTRL - (minus) will deleting each rows in our table after choosing entire row.
#4 Find common values in 2 list
Suppose we have 2 columns of Invoice Paid by Ross and Monika we have to find same/Duplicate values in both columns
In Home Tab go to Conditional Formatting select Highlight Cells Rules and select Duplicate values gives us all Duplicate values
#5 Clean-up ugly formats
Suppose we have some messed up/noise data. Clearing them is as easy as giving cookies to your dog
just go to Home Tab and in clear section press Clear Formats. Boom as simple and tidy.
Wooohooo Time comes for Bonus Trick
#Remove extra spaces(Using Trim Function)
Suppose we have some abrupt data with spaces in somewhere between them in that case all we have to do is select whole column/Table and drag it to an end while doing this select the column where we want it to enter and write the function TRIM() in the first row double tap on the edge of that row specify all column with the correct value.
#6. Unstack data
As we see Address is inappropriate way taking 3 columns for some data set i.e, street, city, state.
to overcome this situation we first design or specify the area where we have to enter the desired data in city column enter = and select the first item in our Address column afterwards do this for all columns and dragging down or double click will fill out all columns. Select all and CTRL + C will copy all values after that you need Right click them all so to fill assign them into numeric value
once this step is done Select all or CTRL+A and do the same trick we did before in the case of remove all blank cell step we only keep in mind inspite of selecting entire Row select shift cells up, Cause deleting entire row we will lose our whole table and we proceed the hole steps again.#7 Removing Duplicate combinations
This is a common trick but still many people doesn't know this trick where we remove the duplicate values available in our Table so all we gonna do is in our Data Tab select Remove Duplicates in the right end corner select in which columns we want to remove the duplicate value.
#8 Remove all Spotted Value
suppose we are in our office where we are shared some used data where someone as per their requirement spotted some color for example yellow then all we need to do is select whole table or CTRL + A and press CTRL + F we see find box appear in which we have to find format and afterwards we go to fill section and select the desired color as yellow. The Result will popup in our Result window where we have select all by pressing CTRL key and dragging down the cursor will select our entire values or rows afterwards select CTRL - will show us the option where we to choose delete entire row.
lets say we have some value like address where we have to edit the numeric value out and the remaining as usual so what we gonna do is repeat the steps we did in the first name extraction
column just write the value numeric value in another column and it will flash fill the numbers including the words which are written afterwards so deleting those words Excels will again reflects the desired result which is only numeric value.
#10 Get file name from full path
suppose we have some Drive path (for eg. C:\Users\user\OneDrive\Desktop) or email address given with semi colon(;) available in our Table or any column Select all and press CTRL + H, a Find and Replace tool popped up where we have to fill *\ remain empty our replace with space which will lead to delete our entire path delete except for desired path for example in our case only Desktop will remain.
So We are done with this but wait do you want another bonus Trick so here we have an another Bonus Trick.
#Bonus2 Dealing with Numbers Formatted as Text
whenever you enter numbers in a column excel treat it as a char for getting rid of this condition select all and press another cell with no value in it then Right click and select paste special where we have to select Add operation where excel will notice oh its a numeric value and it will add up all rows with our chosen empty cell which excel will treat as 0.
πΏπππππ & π½ππππππ...
0 comments:
Post a Comment