My Assistant
![]() ![]() |
|
|
Aug 22 2006, 03:26 PM
Post
#1
|
|
|
New Member Posts: 13 |
Hi,
I've got an email address database with several thousand addresses within it. It only has one column which houses the address, no primary key or anything. I was just wondering, is there an easy way to remove duplicates with a query etc. I don't mind if I have to export the data again and rebuild it, or if there is a way I can get the query to delete the dupes for me. Can anyone help. Regards, Graeme |
|
|
|
Aug 22 2006, 03:36 PM
Post
#2
|
|
|
UtterAccess Guru Posts: 814 From: MN |
Create a make table query to a new table. In that query, do a group by and it will not show the duplocates. Run the query and after it is ran, delete your "primary" email address table and then copy in the new on from the newly created table.
I would backup the email address table before deleting anything out first...you never know what could happen (IMG:http://www.utteraccess.com/forum/style_emoticons/default/sad.gif) Nick |
|
|
|
Aug 22 2006, 03:57 PM
Post
#3
|
|
|
New Member Posts: 13 |
Hi,
I've got as far as making a query to new table query, but how do I do the group by thing? Do I put that in criteria, or is it a seperate thing like an expression or something? Cheers, Graeme |
|
|
|
Aug 22 2006, 03:59 PM
Post
#4
|
|
|
UtterAccess Addict Posts: 201 |
Hit the group by (sigma symbo) button on the toolbar.
|
|
|
|
Aug 22 2006, 07:02 PM
Post
#5
|
|
|
Retired Moderator Posts: 13,544 From: Texas (Is there anywhere else?) |
Add an Autonumber field and name it ID to the table and save it.
Then create this query using your table name and field name SELECT * FROM YourTable WHERE ID Not In (SELECT Max(ID) FROM YourTable GROUP BY AddressField) Run it as a Select query to be sure you are returning only the duplicates. When you are satisified, change it to a Delete query. You should have a backup of the table before you delete anything. |
|
|
|
Aug 22 2006, 07:56 PM
Post
#6
|
|
|
UA Admin Posts: 19,245 From: Newcastle, WA |
This query will remove only the 2nd (or greater) instance of duplicate valued records in a table? Why didn't I know that before?
Edited by: GroverParkGeorge on Tue Aug 22 20:57:36 EDT 2006. |
|
|
|
Aug 22 2006, 08:53 PM
Post
#7
|
|
|
Retired Moderator Posts: 13,544 From: Texas (Is there anywhere else?) |
Isn't SQL great. o! (IMG:http://www.utteraccess.com/forum/style_emoticons/default/laugh.gif)
|
|
|
|
Aug 22 2006, 10:01 PM
Post
#8
|
|
|
UA Admin Posts: 19,245 From: Newcastle, WA |
Way cool. It's so obvious looking at your SQL, but it never occurred to me before. I've literally wasted two hours today de-duping a recordset the old-fashioned way. Where do I sign up for some of them smart pills?
George |
|
|
|
Aug 23 2006, 01:14 PM
Post
#9
|
|
|
UA Admin Posts: 19,245 From: Newcastle, WA |
One more piece of feedback. On a table with 8700 +/- records, the SQL runs fairly slowly (60 second minimum), but it does return each redundant instance of the duplicated addresses. Very cool.
Unfortunately, I can't just blow out these records because there are other factors that i have to account for, but once I figure out how to sort the recordset and create the IDs so that I get the record(s) I want to keep in the Max position, this is going to shave hours off my job over the next few weeks as we update 20+ groups of similar or larger size. (IMG:http://www.utteraccess.com/forum/style_emoticons/default/notworthy.gif) (IMG:http://www.utteraccess.com/forum/style_emoticons/default/notworthy.gif) George |
|
|
|
Aug 23 2006, 03:49 PM
Post
#10
|
|
|
Retired Moderator Posts: 13,544 From: Texas (Is there anywhere else?) |
Subqueries are slow, but in this case it is usually faster than temp tables etc.
Glad I could help make the light blub come on. (IMG:http://www.utteraccess.com/forum/style_emoticons/default/lightbulb.gif) |
|
|
|
Aug 23 2006, 04:03 PM
Post
#11
|
|
|
UtterAccess VIP Posts: 1,283 From: AZ |
George if you want speed, use truitt's example, but get rid of the not in (that's what's slowing it down) and use a left join against the group by query.
|
|
|
|
Aug 23 2006, 05:18 PM
Post
#12
|
|
|
UA Admin Posts: 19,245 From: Newcastle, WA |
I don't think that would work. My first test shows the result is that EVERY record in the table is returned if I remove the "NOT". That means I'd delete EVERY record, not duplicates.
I'll defer to Truitt on this one, but it sure looks to me like the only valid approach is to use "NOT" in the subquery. George Edited by: GroverParkGeorge on Wed Aug 23 18:42:28 EDT 2006. |
|
|
|
Aug 23 2006, 06:56 PM
Post
#13
|
|
|
UtterAccess VIP Posts: 1,283 From: AZ |
You don't just remove the not you remove the whole not in and do a left join. This means you also end up without a subquery since you left join against that query.
For example: qry_max = SELECT Max(ID) as max_id FROM YourTable GROUP BY AddressField SELECT * FROM YourTable left join qry_max on YourTable.id = max_id where qry_max.max_id is null You can of course defer to truitt, but I almost always use a left join instead of a not in because the not in command is so inefficient. |
|
|
|
Aug 23 2006, 07:19 PM
Post
#14
|
|
|
UA Admin Posts: 19,245 From: Newcastle, WA |
Okay, now I understand what you meant to say.
Thank you for clarifying. I'll give that a try as well. George |
|
|
|
Aug 25 2006, 11:52 AM
Post
#15
|
|
|
UA Admin Posts: 19,245 From: Newcastle, WA |
Works a treat. I figured out that I can actually use the base query in various flavors to match as few or as many fields as I want:
UPDATE tmpMemberFile SET tmpMemberFile .MemberStatusID = 9 WHERE (((tmpMemberFile .MemberID) Not In (SELECT MAX(MemberID) FROM tmpMemberFile GROUP BY [Address1] & " " & [LastName] & " " & [FirstName] ))); (Extra Parentheses courtesy of the Access Query Grid) UPDATE tmpMemberFile SET tmpMemberFile .MemberStatusID = 9 WHERE (((tmpMemberFile .MemberID) Not In (SELECT MAX(MemberID) FROM tmpMemberFile GROUP BY [Address1] & " " & [LastName] ))); And so on. The first version matches both first and last names, the second only last names, meaning I can exclude spouses who reside at the same address. I've not tried Ender's suggestion yet, but I'm going there next. |
|
|
|
Aug 25 2006, 11:57 AM
Post
#16
|
|
|
Retired Moderator Posts: 13,544 From: Texas (Is there anywhere else?) |
Glad to help.
|
|
|
|
Aug 25 2006, 03:14 PM
Post
#17
|
|
|
UtterAccess VIP Posts: 1,283 From: AZ |
My different tack on truitt's solution is only worth exploring if you are waiting excessively on the "not in". If it's running fine then no worries, unless you've got the time and just want to test it. In which case, I'd be interested in your results. Personally I prefer the not in because it is easy to see what is happening just by looking at the query. This makes long term maintenance easier. It's only when I have really large data sets and excessive wait times that I go with the left join.
|
|
|
|
Aug 28 2006, 01:03 PM
Post
#18
|
|
|
UA Admin Posts: 19,245 From: Newcastle, WA |
Sorry to be so slow getting back to you on this. Work keeps interfering with other work so to speak.
It's starting to look like your left join technique is going to be significantly quicker, even though, as you point out, it takes a bit more to set up in the first place. More definite feedback later, after I test a bit more. George |
|
|
|
Aug 29 2006, 11:40 AM
Post
#19
|
|
|
UA Admin Posts: 19,245 From: Newcastle, WA |
Okay, the left join technique runs faster, hands down. On recrodsets the size we are working with (8,000-35,000 member names), it's not even close.
It does call for a small additional amount of prep work to generate the MaxID query each time to feed the second query, but I've been building a form and module that will take any temporary input file and generate on-the-fly SQL for the required queries to run from VBA. Once that's done, I figure my basic de-duping work will be cut in half. (IMG:http://www.utteraccess.com/forum/style_emoticons/default/yayhandclap.gif) (IMG:http://www.utteraccess.com/forum/style_emoticons/default/thanks.gif) George |
|
|
|
Aug 29 2006, 01:30 PM
Post
#20
|
|
|
UtterAccess VIP Posts: 1,283 From: AZ |
Glad I could help - I just hate waiting on code.
|
|
|
|
![]() ![]() |
|
Go to Top · Lo-Fi Version | Time is now: 23rd May 2013 - 08:38 AM |