UtterAccess.com
X   Site Message
(Message will auto close in 2 seconds)

Welcome to UtterAccess! Please ( Login   or   Register )

Custom Search
 
   Reply to this topicStart new topic
> I'm after an easy way to delete/remove duplicate records?    
 
   
gsmcellular
post Aug 22 2006, 03:26 PM
Post#1



Posts: 9
Joined: 19-July 06



Hi,
I've got an email address database with several thousand addresses within it.
It only has one column which houses the address, no primary key or anything.
I was just wondering, is there an easy way to remove duplicates with a query etc.
I don't mind if I have to export the data again and rebuild it, or if there is a way I can get the query to delete the dupes for me.
Can anyone help.
Regards,
Graeme
Go to the top of the page
 
Nwulf
post Aug 22 2006, 03:36 PM
Post#2



Posts: 814
Joined: 19-June 01
From: MN


Create a make table query to a new table. In that query, do a group by and it will not show the duplocates. Run the query and after it is ran, delete your "primary" email address table and then copy in the new on from the newly created table.
would backup the email address table before deleting anything out first...you never know what could happen sad.gif
Nick
Go to the top of the page
 
gsmcellular
post Aug 22 2006, 03:57 PM
Post#3



Posts: 9
Joined: 19-July 06



Hi,
I've got as far as making a query to new table query, but how do I do the group by thing? Do I put that in criteria, or is it a seperate thing like an expression or something?
Cheers,
Graeme
Go to the top of the page
 
vandensype
post Aug 22 2006, 03:59 PM
Post#4



Posts: 203
Joined: 27-June 05



Hit the group by (sigma symbo) button on the toolbar.
Go to the top of the page
 
truittb
post Aug 22 2006, 07:02 PM
Post#5


Retired Moderator
Posts: 13,563
Joined: 23-June 02
From: Texas (Is there anywhere else?)


Add an Autonumber field and name it ID to the table and save it.
When create this query using your table name and field name
SELECT *
FROM YourTable
WHERE ID Not In (SELECT Max(ID) FROM YourTable GROUP BY AddressField)
Run it as a Select query to be sure you are returning only the duplicates. When you are satisified, change it to a Delete query. You should have a backup of the table before you delete anything.
Go to the top of the page
 
GroverParkGeorge
post Aug 22 2006, 07:56 PM
Post#6


UA Admin
Posts: 31,244
Joined: 20-June 02
From: Newcastle, WA


This query will remove only the 2nd (or greater) instance of duplicate valued records in a table? Why didn't I know that before?
Edited by: GroverParkGeorge on Tue Aug 22 20:57:36 EDT 2006.
Go to the top of the page
 
truittb
post Aug 22 2006, 08:53 PM
Post#7


Retired Moderator
Posts: 13,563
Joined: 23-June 02
From: Texas (Is there anywhere else?)


Isn't SQL great. o! laugh.gif
Go to the top of the page
 
GroverParkGeorge
post Aug 22 2006, 10:01 PM
Post#8


UA Admin
Posts: 31,244
Joined: 20-June 02
From: Newcastle, WA


Way cool. It's so obvious looking at your SQL, but it never occurred to me before. I've literally wasted two hours today de-duping a recordset the old-fashioned way. Where do I sign up for some of them smart pills?
eorge
Go to the top of the page
 
GroverParkGeorge
post Aug 23 2006, 01:14 PM
Post#9


UA Admin
Posts: 31,244
Joined: 20-June 02
From: Newcastle, WA


One more piece of feedback. On a table with 8700 +/- records, the SQL runs fairly slowly (60 second minimum), but it does return each redundant instance of the duplicated addresses. Very cool.
nfortunately, I can't just blow out these records because there are other factors that i have to account for, but once I figure out how to sort the recordset and create the IDs so that I get the record(s) I want to keep in the Max position, this is going to shave hours off my job over the next few weeks as we update 20+ groups of similar or larger size.
notworthy.gif notworthy.gif
George
Go to the top of the page
 
truittb
post Aug 23 2006, 03:49 PM
Post#10


Retired Moderator
Posts: 13,563
Joined: 23-June 02
From: Texas (Is there anywhere else?)


Subqueries are slow, but in this case it is usually faster than temp tables etc.
Glad I could help make the light blub come on. lightbulb.gif
Go to the top of the page
 
Ender
post Aug 23 2006, 04:03 PM
Post#11


UtterAccess VIP
Posts: 1,283
Joined: 11-February 04
From: AZ


George if you want speed, use truitt's example, but get rid of the not in (that's what's slowing it down) and use a left join against the group by query.
Go to the top of the page
 
GroverParkGeorge
post Aug 23 2006, 05:18 PM
Post#12


UA Admin
Posts: 31,244
Joined: 20-June 02
From: Newcastle, WA


I don't think that would work. My first test shows the result is that EVERY record in the table is returned if I remove the "NOT". That means I'd delete EVERY record, not duplicates.

I'll defer to Truitt on this one, but it sure looks to me like the only valid approach is to use "NOT" in the subquery.


George
Edited by: GroverParkGeorge on Wed Aug 23 18:42:28 EDT 2006.
Go to the top of the page
 
Ender
post Aug 23 2006, 06:56 PM
Post#13


UtterAccess VIP
Posts: 1,283
Joined: 11-February 04
From: AZ


You don't just remove the not you remove the whole not in and do a left join. This means you also end up without a subquery since you left join against that query.
For example:
qry_max = SELECT Max(ID) as max_id FROM YourTable GROUP BY AddressField
SELECT *
FROM YourTable left join qry_max on YourTable.id = max_id
where qry_max.max_id is null
You can of course defer to truitt, but I almost always use a left join instead of a not in because the not in command is so inefficient.
Go to the top of the page
 
GroverParkGeorge
post Aug 23 2006, 07:19 PM
Post#14


UA Admin
Posts: 31,244
Joined: 20-June 02
From: Newcastle, WA


Okay, now I understand what you meant to say.
Thank you for clarifying. I'll give that a try as well.
George
Go to the top of the page
 
GroverParkGeorge
post Aug 25 2006, 11:52 AM
Post#15


UA Admin
Posts: 31,244
Joined: 20-June 02
From: Newcastle, WA


Works a treat. I figured out that I can actually use the base query in various flavors to match as few or as many fields as I want:
UPDATE tmpMemberFile SET tmpMemberFile .MemberStatusID = 9
WHERE (((tmpMemberFile .MemberID) Not In (SELECT MAX(MemberID) FROM tmpMemberFile GROUP BY [Address1] & " " & [LastName] & " " & [FirstName] )));
Extra Parentheses courtesy of the Access Query Grid)
UPDATE tmpMemberFile SET tmpMemberFile .MemberStatusID = 9
WHERE (((tmpMemberFile .MemberID) Not In (SELECT MAX(MemberID) FROM tmpMemberFile GROUP BY [Address1] & " " & [LastName] )));
And so on.
The first version matches both first and last names, the second only last names, meaning I can exclude spouses who reside at the same address.
I've not tried Ender's suggestion yet, but I'm going there next.
Go to the top of the page
 
truittb
post Aug 25 2006, 11:57 AM
Post#16


Retired Moderator
Posts: 13,563
Joined: 23-June 02
From: Texas (Is there anywhere else?)


Glad to help.
Go to the top of the page
 
Ender
post Aug 25 2006, 03:14 PM
Post#17


UtterAccess VIP
Posts: 1,283
Joined: 11-February 04
From: AZ


My different tack on truitt's solution is only worth exploring if you are waiting excessively on the "not in". If it's running fine then no worries, unless you've got the time and just want to test it. In which case, I'd be interested in your results. Personally I prefer the not in because it is easy to see what is happening just by looking at the query. This makes long term maintenance easier. It's only when I have really large data sets and excessive wait times that I go with the left join.
Go to the top of the page
 
GroverParkGeorge
post Aug 28 2006, 01:03 PM
Post#18


UA Admin
Posts: 31,244
Joined: 20-June 02
From: Newcastle, WA


Sorry to be so slow getting back to you on this. Work keeps interfering with other work so to speak.
It's starting to look like your left join technique is going to be significantly quicker, even though, as you point out, it takes a bit more to set up in the first place.
More definite feedback later, after I test a bit more.
George
Go to the top of the page
 
GroverParkGeorge
post Aug 29 2006, 11:40 AM
Post#19


UA Admin
Posts: 31,244
Joined: 20-June 02
From: Newcastle, WA


Okay, the left join technique runs faster, hands down. On recrodsets the size we are working with (8,000-35,000 member names), it's not even close.
It does call for a small additional amount of prep work to generate the MaxID query each time to feed the second query, but I've been building a form and module that will take any temporary input file and generate on-the-fly SQL for the required queries to run from VBA. Once that's done, I figure my basic de-duping work will be cut in half.
yayhandclap.gif thanks.gif
George
Go to the top of the page
 
Ender
post Aug 29 2006, 01:30 PM
Post#20


UtterAccess VIP
Posts: 1,283
Joined: 11-February 04
From: AZ


Glad I could help - I just hate waiting on code.
Go to the top of the page
 


Custom Search
RSSSearch   Top   Lo-Fi    17th December 2017 - 04:14 PM