What's new
USCHO Fan Forum

This is a sample guest message. Register a free account today to become a member! Once signed in, you'll be able to participate on this site by adding your own topics and posts, as well as connect with other members through your own private inbox!

  • The USCHO Fan Forum has migrated to a new plaform, xenForo. Most of the function of the forum should work in familiar ways. Please note that you can switch between light and dark modes by clicking on the gear icon in the upper right of the main menu bar. We are hoping that this new platform will prove to be faster and more reliable. Please feel free to explore its features.

Excel question

Slap Shot

I got nothing
If I have a spreadsheet (Office version 2003) with say 10,000 records, many of which have dupcliated data in every column, other than sorting and manually deleting them, is there a systematic way to delete duplicate rows?

[edit] And if not via Excel what if I converted it to Access?

Thanks.
 
Last edited:
Re: Excel question

nm - I figured it out as long as it's sorted and the duplicates are next to each other: =IF(B2=B3,1,0)

All unique records are assinged a '0' and all the dupes are assinged a '1' as long as I drag the formula throughout the unique column. I then sorted that column and was able to delete all rows with a '1'.
 
Re: Excel question

If I have a spreadsheet (Office version 2003) with say 10,000 records, many of which have dupcliated data in every column, other than sorting and manually deleting them, is there a systematic way to delete duplicate rows?

[edit] And if not via Excel what if I converted it to Access?

Thanks.

There is simple formula's you can use to de-dup a spreadsheet as well.

First you would want to sort your data so that the duplicates end up next to each other.

So maybe you sort by the first 3 columns: First name, Last name, Account Number (example).

Then you can add a new column to the end of your data called DUPS that will compare the rows to see if that data matches.

=IF(CONCATENATE(A2,B2,C2)=CONCATENATE(A1,B1,C1),"dup","")

That formula only checks in a single direction, but would catch all of them in the example situation.

Once you have all the DUPS identified you can simply filter on the DUP row to only show ones that say "dup". Then delete all of those rows. Once you unfilter your data you should only have the unique rows you want.

Edit: Yep you figured out the same basic principle. :)
 
Re: Excel question

Doing it that way will filter the data.... But the data will still be there unless you delete it.

The old school way of doing it would be to add a column and create an if statement. In that new column you would create an if statement in the 2nd to the top row. Something like this:

=IF(B7=B8,"yes","no")

B7 & B8 are the cell you are comparing, so it would obviously be different on your spreadsheet. But in this sample you testing to see if the value in B8 equals the value above it. If the value is the same it puts "yes" in the cell, if it does not match it puts "no" in that cell.

Once you have the first formula created, copy it down your entire list. Then you will have a bunch of yes and no's listed in that column. I would then highlight the entire column, copy it, then go to your edit menu and select past special. Then only select values. It will paste the yes and no's over themselves but the formulas will be gone. Then just sort that column and you can organize all of the yes's and no's and delete which ever you want in bulk delete. In this case you want to remove the duplicates, so you would delete all of the yes's.

Make sense?


That is the complete old school way of how I would do it.
 
Re: Excel question

Jup and JF - thanks for the posts and that is exactly what I did, although it took me some trial and error because I first sorted without pasting the special values and the formula obviously no longer worked since the records by row had changed.

Thanks guys!
 
Back
Top