UtterAccess HomeUtterAccess Wiki

Welcome Guest ( Log In | Register )

Custom Search
Edit Discussion
> Duplicate Values and Unique Values    
Duplicate Values and Unique Values

Duplicate Values and Unique Values in Excel

There are a number applications where identifying or removing duplicate values is needed.

The first question is, “What is a duplicate?” For example, if the letter A appears twice in a column of data does it count as two duplicates or an original and one duplicate?

The second question is, “What do you want to do with the duplicates?” Do you want to remove them so you have a unique list, or do you want to keep them and identify them?

If you run a search engine inquirery on “Excel Duplicate Values” or “Excel Unique Values” you will get instructions on how to use the Advanced Filter or the Remove Duplicates command in the Data Tools Group on the Data ribbon. These are manual processes and they work.

However sometimes you want to accomplish these steps with formulas so they happen automatically.

There are five ways to identify duplicates. For the sake of discussion, it is assumed that Column A and / or Column B contain the data to be checked for duplicates.

Contents

Method 1 – By Sorting

The first method is to sort the data in Column A and check each cell with the cell above it. If the value above is equal to the current value then the current value is a duplicate. For example if A3 = A2, then A3 is a duplicate of A2. This method has one advantage, the original (unique value) and the duplicates are shown together. However this advantage is offset by the need to sort the data.

Method 2 – By Counting

The second method is to use a helper column to test =COUNTIF(A:A,A2). This formula yields the number of times the value in Cell A2 is found in the column. Filtering on the results = 1 identifies unique values, but if a value has a count > 1, this method does not acknowledge that one of them is unique. At best, this method tells you what cells do not have a duplicate.

Method 3 – By Matching

The third method is to use a helper column to test =MATCH(A2,A:A,0) = ROW(). This expression evaluates to TRUE for the first occurrence of a value in column A, and FALSE if the value has appeared before. This formula could be expanded to give you more information:

=IF(MATCH(A2,A:A,0)=ROW()=TRUE,"Unique","Duplicate of row" & MATCH(A2,A:A,0))

This formula yields “Unique” for the first occurrence of the value or it identifies the row on where the value first appeared.

If you just need a count of unique values, then you do not need a helper column. The following formula can be used:

SUMPRODUCT ((MATCH(Range,Column,0) = ROW(RANGE)) * 1)

For example

SUMPRODUCT ((MATCH(A2:A12, A:A, 0) = ROW (A2:A12)) * 1)  

Method 4 – By Pivot Table

The fourth method is even easier

The quickest means of getting a unique list of values is to use a pivot table. The pivot table has the additional value of being able to filter the data and sort it with a simple right click of the mouse.

Method 5 - By Array Formula

From office.microsoft.com

Count Numbers (Contiguous Range)

=SUM(IF(FREQUENCY(Range,Range)>0,1)) Count the number of unique number values in Range, but do not count blank cells or text values.

Count Numbers and Text (Contiguous Range)

=SUM(IF(FREQUENCY(MATCH(Range,Range,0),MATCH(Range,Range,0))>0,1)) Count the number of unique text and number values in Range (which must not contain blank cells.

Count Numbers and Text (Non-contiguous Range)

=SUM(IF(FREQUENCY(IF(LEN(Range)>0,MATCH(Range,Range,0),""), IF(LEN(Range)>0,MATCH(Range,Range,0),""))>0,1)) Count the number of unique text and number values in Range, but do not count blank cells.


Attached is a spreadsheet showing the 5 methods. Media:Sample_Unique.zip

Edit Discussion
Custom Search
Thank you for your support!
This page has been accessed 5,476 times.  This page was last modified 14:15, 28 September 2013 by dflak.   Disclaimers