In a spreadsheet containing a lot of data, you may find that some of the data are repeated across the spreadsheet. In such a case, you will be interested in removing the duplicate entries. If the number of entries are just a few, you could possibly go through the worksheet and manually delete all the duplicates. But in the case of a large volume of entries in the spreadsheet, it becomes impractical to remove the duplicates manually.
In such instances, you will have to employ some features in Excel to remove the duplicates. A more specialized case is where you are only interested in removing data entries based on the repeated occurrence of a particular information in a single or multiple columns.
Let’s take a look at the data below.
In this example, you find out that we have a number of people who were born at the same place. So, assuming you are only interested in getting only one person born from each place. This means that we have to remove all entries that have the place of birth to be Accra.
In this example, therefore, you are looking to remove duplicates from the spreadsheet based on just a single column.
Here are a few methods that you can use to achieve this.
- Using the remove duplicates feature
How to remove duplicates in excel based on one column using the remove duplicate feature
Follow the steps below to remove duplicates from the spreadsheet based on the values in a particular column.
- Using the sample data given above, the column of interest will be the column B
- Select all the data on the spreadsheet
- Click on the Data tab
- Look for the Data tools group on the ribbon towards the right side
- Click on Remove Duplicates
- The Remove Duplicates dialog box opens up
- By default select all is selected and hence all columns that are included in your data selection will be automatically selected
- You should uncheck the ones that you are not interested in. If there are a lot of them, you can use the “Unselect All” button
- Check the column you want
- In our example, you have to go ahead and use the column “Place of Birth”
- So, go ahead and uncheck all the other columns
- Click OK
- You get an alert box giving information on the number of duplicates that were detected and removed
- Click OK to continue
How to remove duplicates in excel using COUNTIF function
Even though using the first option as discussed above is way easier, you can also use the COUNTIF function in excel to remove duplicates. As such, if you really want to master Excel and for that matter want to learn an extra way to remove duplicates then you can go ahead with this.
First of all, you have to understand how the COUNTIF function works. As the name of the function suggests, it is going to count the number of occurrences if a certain condition is met.
How then are we going to use the COUNTIF function to remove duplicates you may ask.
It’s simple. With the help of the COUNTIF function, you are going to count the number of times a particular data occurs in the column of interest. You will then apply filters to separate the ones that occur more than once. Then go ahead to delete the excesses.
How to use COUNTIF function
The COUNTIF function takes in two parameters. COUNTIF(range, criteria for counting). The range specifies all the cells that it has to go through to check if the condition (criteria) exists and do the counting for you.
In Excel, you indicate a range by writing the first cell followed by a column and then the last cell. For example, A2:A8 in Excel indicates that, start from A2 through to A8. So, in such a case, it will check them in the order A2, A3, A4, A5, A6, A7 and end at A8.
Using the range above, you can write the formula as COUNTIF(A2:A8, A2). This means that, go through all the cells from A2 to A8 and count how many times the value in A2 appears.
To be able to use the COUNTIF function to remove duplicates, we need to understand the concept of absolute referencing in Excel. A dollar sign ($) is used to indicate absolute referencing. It simply means that the reference cell used in the formula should change when the formula is applied to other cells.
So, for instance, if you write a formula that contains $A$2 in one cell (say, Cell A5), you can apply the same formula in another cell (say, Cell B5) and it will still use the initial cell A2 and not B2.
Read more on absolute referencing in Excel here
Using COUNTIF to count duplicate entries
- Let’s create a new column in our spreadsheet and call it count to keep the number of times an item appears
- Since Column B is the column of interest in our sample data that will be the condition for our COUNTIF function. This means that for each cell in the Column B, the function will check how many times the value in that cell has already occurred.
- Note that we are interested in knowing the number of times a particular data has appeared so far and hence it’s going to be a cumulative value in each duplicated cell and not the total number of times an item appears in the whole document.
- This means that we have to write the formula such that at each cell it is going to count the number of times the value in the column of interest has been counted so far and indicate the value.
- So, if it is the second time, it will write 2 in our count column and the next one will write 3 and the rest will follow in that order.
- For items that are appearing for the first time, the count value will be one.
This can be achieved with the help of absolute referencing. So, let’s use the same sample excel spreadsheet we used earlier. The column of interest is Column B. Let’s write the formula for just the first row and apply the same formula to all other rows.
For the first row, our range will be $B$2:$B2. This means check from cell B2 to B2. When you apply this formula to the next row, it is going to become $B$2:$B3 which will also mean checking from cell B2 to B3. The next becomes from B2 to B4 and in that order till you get to the last cell.
So the formula for the first row will be COUNTIF($B$2:$B2, $B2). The implication of this formula is that, check the number of times the value in cell B2 occurs from cell B2 to B2. When you apply the same formula to the next row just by dragging the count cell with the formula across to the next row, it becomes COUNTIF($B$2:$B3, $B3).
The starting point doesn’t change because we are using absolute referencing with the aid of the dollar sign ($). Hence, the next count cell will count the number of times the value in cell B3 appears from B2 to B3.
Now when you drag it across all the cells in the count column, it will automatically apply the formula to all of them and calculate the number of times each value has already occurred.
As you can see, the number of times Accra appears has been accurately determined. Now the next thing to do is to use filters to remove all the count values that are greater than 1 since those will be counted as the duplicates based on column B.
- Select all the dataset on the spreadsheet. You can easily do that by using the keyboard shortcut CTRL+A
- Click on the Data tab
- Look for sort and filter group on the ribbon
- There is a funnel icon as part of the tools in the sort and filter group with the label filter beneath it
- Click on the filter icon
- This brings up arrows besides the each column heading
- Click on the dropdown icon beside the count column heading
- From the list on the dropdown menu, click on Number filters
- Another dropdown pop ups
- Click on Greater than
- Since we are interested in duplicates meaning any count value greater than 1 so go ahead and type 1 into the textbox beside the is greater than textbox
- Then click OK
- You will now find only the dataset which has count values of greater than 1
- Right click on the selected dataset
- Click on delete row
- Confirm the deletion of the entire row by clicking OK
- All duplicates are now removed