UtterAccess.com
> Duplicates Removal Code, Access 2013    
 
   
SomekindaVB
post Dec 18 2017, 06:13 PM
Post#1



Posts: 252
Joined: 15-December 16



Hi all, and merry holiday season!

So I have been advised by my boss that I have this last day to get this [censored] right before I have to chase down another method.

Out of sheer desperation I am posting the full duplicate-kill code. The records have already been imported; the code orders them alphabetically to determine whether each record is the same as the preceding one.

Can anybody tell me if there are any glaring errors? Note the code generally works, but tends to miss some duplicates.

Requirements: it cannot change any existing records. It must always delete the newer records.
Because I am so tired of this failing all the bloody time, newer records must be executed with maximum brutality!



CODE
Function RemoveDuplicatesFromMasterRecord() As Long
'Cycles through the selected records and removes any that are duplicates.
'This leaves the first record alone, so it's important that the SQL is ordered correctly.
Dim rsWorkItem As DAO.Recordset
Dim strSQL As String
Dim strPrevious As String, strCurrentRecord As String, strSummary As String
Dim lngClientID As Long, lngEmailDuplicates As Long, i As Long
Dim dDueDate As Date

    'Set the SQL string - note that this method combines multiple fields to order them
    strSQL = _
        "SELECT Tbl_Workflow_New.ClientId, [ClientID] & '/' & [Due Date] & '/' & [DMSEmailSummary] AS Combination, " & _
        "CLng([Due Date]) AS DueDate, Tbl_Workflow_New.DMSEmailSummary, Tbl_Workflow_New.ID " & _
        "FROM Tbl_Workflow_New " & _
        "WHERE (((Tbl_Workflow_New.RecordStarted)>Date()-30) AND ((Tbl_Workflow_New.WorkItemType)='Covenant')) " & _
        "ORDER BY Tbl_Workflow_New.ClientId, [ClientID] & '/' & [Due Date] & '/' & [DMSEmailSummary]"
    Debug.Print strSQL
    'Open the recordset, already ordered by the SQL above
    Set rsWorkItem = CurrentDb.OpenRecordset(strSQL, dbOpenDynaset)
    With rsWorkItem
        'The ORDER BY must guarantee the record to keep comes first;
        ''.Sort = "RecordStarted" was meant to prove the temporal order of imports
        If Not .RecordCount = 0 Then .MoveLast: .MoveFirst
        strPrevious = ""
        For i = 1 To .RecordCount
            'Populate variables
            lngClientID = .Fields("ClientID")
            strSummary = Nz(.Fields("DMSEmailSummary"), "No Summary")
            dDueDate = .Fields("DueDate")
            strCurrentRecord = .Fields("Combination")
            'Compare this record's combined key against the previous record's
            If strPrevious = strCurrentRecord Then
                'Lay down some pain on this duplicate skuzbag!
                On Error Resume Next 'a locked record will silently survive this delete
                .Delete
                On Error GoTo 0
                lngEmailDuplicates = lngEmailDuplicates + 1
            End If
            'Remember the combined key exactly as the SQL built it, so the comparison
            'cannot drift out of sync with the ORDER BY expression
            strPrevious = strCurrentRecord
            .MoveNext
        Next i
    End With
    'Pass the count to the function
    RemoveDuplicatesFromMasterRecord = lngEmailDuplicates
End Function


Cheers
 
RJD
post Dec 18 2017, 06:58 PM
Post#2


UtterAccess VIP
Posts: 8,481
Joined: 25-October 10
From: Gulf South USA


Hi: While I haven't dug into your code completely, it seems you could do this much more easily. Create a query sorted on all fields, then use DCount (matching all the non-ID fields, differentiated by a unique record ID such as an Autonumber, which you can add if there isn't one already) to create a sequence number. Then delete all records with a sequence number greater than 1. That is pretty straightforward and is easily testable before executing. Back up the table, of course, before deleting - as you no doubt already know (I'm a belt-and-suspenders guy myself).
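
In Access SQL, that sequence number might look something like this - an untested sketch, with table and field names borrowed from the code above, assuming ID is a unique Autonumber, a US-style date format for the # literal, and no apostrophes in the summary text:

CODE
SELECT ID, ClientID, [Due Date], DMSEmailSummary,
       DCount("ID", "Tbl_Workflow_New",
              "ClientID=" & [ClientID]
              & " AND [Due Date]=#" & [Due Date] & "#"
              & " AND DMSEmailSummary='" & [DMSEmailSummary] & "'"
              & " AND ID<=" & [ID]) AS Seq
FROM Tbl_Workflow_New;

Records with Seq > 1 are the later duplicates, so a delete query with that same DCount expression (> 1) in its WHERE clause would remove exactly those.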

Just some thoughts.

... and Seasons Greetings as well!

HTH
Joe

--------------------
"Each problem that I solved became a rule, which served afterwards to solve other problems."
"You just keep pushing. You just keep pushing. I made every mistake that could be made. But I just kept pushing."

Rene Descartes 1596-1650 (Mathematician and Philosopher)
 
SomekindaVB
post Dec 18 2017, 07:15 PM
Post#3



Posts: 252
Joined: 15-December 16



Thanks for the tip RJD.

Doesn't this method require running through each record, then running a second query to remove the highest ID value?
 
RJD
post Dec 18 2017, 07:51 PM
Post#4


UtterAccess VIP
Posts: 8,481
Joined: 25-October 10
From: Gulf South USA


Yes, you have to run through each record - but don't you have to do that anyway to see which ones are duplicates? And you should be able to do that either with one query pass or using a subquery. Does that cause a problem?

HTH
Joe

 
RJD
post Dec 18 2017, 08:09 PM
Post#5


UtterAccess VIP
Posts: 8,481
Joined: 25-October 10
From: Gulf South USA


Hi again: Take a look at the demo attached. Look at the table. Then run the query. Then look at the table again. Note that there is one query with a subquery in the WHERE clause, along with a DMin to spare the record with the lowest ID.

In your case you will have to deal with all the fields for which you want to check duplication.
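
For anyone following along without opening the attachment, the delete query is presumably shaped something like this - my guess at the demo, untested, with hypothetical names TestDupes, FldA and ID:

CODE
DELETE FROM TestDupes
WHERE FldA IN
      (SELECT T.FldA FROM TestDupes AS T
       GROUP BY T.FldA
       HAVING Count(*) > 1)
  AND ID <> DMin("ID", "TestDupes", "FldA='" & [FldA] & "'");

The subquery limits the delete to values that occur more than once, and the DMin comparison spares the earliest (lowest-ID) copy of each group.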

HTH
Joe
Attached File(s): RemoveDuplicates.zip (18.76K), 12 downloads
 

 
SomekindaVB
post Dec 18 2017, 10:05 PM
Post#6



Posts: 252
Joined: 15-December 16



The problem is I can see the duplicates - I know how many there are. It's just that removing the latest-added record seems to be the most complex thing in the world. I'm going freaking nuts with this [censored]
 
RJD
post Dec 18 2017, 10:09 PM
Post#7


UtterAccess VIP
Posts: 8,481
Joined: 25-October 10
From: Gulf South USA


Hi again: Well, I guess I'm not seeing the difficulty here. Perhaps you could post some records, some with duplicates, for us to look at. That way, seeing what you are seeing, we might understand better - and give you a procedure/query that will do what you want.

And could you explain more what you mean by "removing the latest-added record seems to be the most complex thing in the world"?

HTH
Joe

 
SomekindaVB
post Dec 18 2017, 10:25 PM
Post#8



Posts: 252
Joined: 15-December 16



Hi again RJD,

Thanks for the example. This is the best working example I've seen. Can I make this work with a combined set of fields?

e.g. [ClientID] & '/' & [Due Date] & '/' & [DMSEmailSummary] AS Combination. The duplicate record is defined by multiple fields together.

Or must I specify each field? And do they need to be ordered in any particular way?

I tried this in the IN clause: IN (SELECT Tmp.ClientID, Tmp.DueDate, Tmp.Summary, Count(*) FROM Tbl_Workflow_New AS Tmp GROUP BY Tmp.ClientID, Tmp.DueDate, Tmp.Summary HAVING ((Count(*))>1)), but it didn't like it a whole bunch.
This post has been edited by SomekindaVB: Dec 18 2017, 10:53 PM
 
RJD
post Dec 18 2017, 11:15 PM
Post#9


UtterAccess VIP
Posts: 8,481
Joined: 25-October 10
From: Gulf South USA


Yes, you should be able to concatenate the fields to be checked as duplicates, as long as you format them all as text - and I would control the length of each field as well. A date, for example, could be handled with Format([Due Date],'yyyymmdd'), and numerics with something like Format([SomeField],'00000').
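
For example (untested, with the field names taken from your posted SQL), the concatenation might be:

CODE
Format([ClientID],"00000") & "/" & Format([Due Date],"yyyymmdd") & "/" & [DMSEmailSummary]

One note on your edited attempt: Access accepts only a single expression on each side of IN (...), which is probably why your multi-column subquery was rejected. Concatenating the same fields on both sides of the IN is one way around that.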

See the revision to my demo attached, which concatenates three fields for duplicate comparison.

If you are still having difficulty, if you give us a few record examples, perhaps we can give you some better specifics on dealing with this ...

HTH
Joe
Attached File(s): RemoveDuplicates_Rev1.zip (19.01K), 5 downloads
 

 
MadPiet
post Dec 18 2017, 11:27 PM
Post#10



Posts: 2,426
Joined: 27-February 09



Move to SQL Server, relink tables, and do it in T-SQL:

CODE
-- keep the FIRST/earliest record
-- delete all non-first items.

WITH DupesCTE (FirstName, DupeNum)
AS
    (
        SELECT FirstName
            , ROW_NUMBER() OVER (PARTITION BY FirstName ORDER BY ID) AS dupeNum
        FROM TestDupes
    )
DELETE
FROM DupesCTE
WHERE DupeNum>1;


yeah, just kidding.
 
merlenicholson
post Dec 18 2017, 11:35 PM
Post#11



Posts: 520
Joined: 23-November 11
From: Tampa, Florida, USA - UTC -5:00


SomekindaVB, I would go to great lengths to avoid looping through a table one record at a time unless it's a small table, it's a one-off job, or the processing time doesn't matter. I suspect the reason your code misses some duplicates is that it needs to run more than once - up to as many passes as the largest number of copies of any one record. (I could be wrong about this. Maybe. Probably.)

I use this simple SQL to eliminate duplicate records in a table that already has them (obviously it's easier to just not add dups in the first place). But first, I would try:

"DELETE Tbl_Workflow_New.* FROM Tbl_Workflow_New WHERE ID NOT IN (SELECT MIN(ID) FROM Tbl_Workflow_New GROUP BY [ClientID], [Due Date], [DMSEmailSummary]);"

My only hesitation is whether or not it will delete the last record added. I think it will, provided there's an ID autoincrement field, and the latest added is always a higher ID. That's why I used the MIN function instead of MAX.

Second method:
From your original post it appears that the records are added (appended) to the main table from other tables or processes, and that is the most efficient time to eliminate duplicates - just don't add them in the first place. Simply add one single index to your main table using the three fields and mark the index Unique = True; as the records are appended, those that are duplicates will simply error out. If you are appending new records from a table, set DoCmd.SetWarnings False first so you don't have to contend with the message box summarizing how many records were appended and how many were rejected by the index. This method ensures that, in case of a duplicate, the one that already exists in the main table is kept.
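
In code, creating that index is one statement - a minimal sketch, where the index name NoDupes is my own invention and the field names come from your posted SQL:

CODE
'Create a unique index across all three fields; duplicate appends will then fail
CurrentDb.Execute "CREATE UNIQUE INDEX NoDupes ON Tbl_Workflow_New " & _
                  "(ClientID, [Due Date], DMSEmailSummary)"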

Third method:
Create a new table identical to the main table (structure only, no data) and create a new single index using the three fields, marked Unique. Append the entire main table's data into the new one. The duplicates will error out and the result will be perfect - and it happens quickly. Then rename the tables. I've done this entirely in code; it couldn't be easier, but most times I do it as an ad hoc one-off because it's so easy. It goes much faster if you create the same index in the main table, especially if it's large. The drawback is that it probably will not preserve the oldest record reliably.
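
Roughly, in code - untested, with Tbl_Workflow_Clean and the index name being hypothetical; note that SELECT INTO does not preserve an Autonumber or existing indexes, so copying the structure with Access's Paste > Structure Only is safer:

CODE
'Copy the structure only - WHERE 1=0 selects no rows
CurrentDb.Execute "SELECT * INTO Tbl_Workflow_Clean FROM Tbl_Workflow_New WHERE 1=0"
'Unique three-field index, so duplicate rows error out on append
CurrentDb.Execute "CREATE UNIQUE INDEX NoDupes ON Tbl_Workflow_Clean " & _
                  "(ClientID, [Due Date], DMSEmailSummary)"
'Append everything; without dbFailOnError the duplicates are silently dropped
CurrentDb.Execute "INSERT INTO Tbl_Workflow_Clean SELECT * FROM Tbl_Workflow_New"
'Finally swap the names (manually, or with DoCmd.Rename)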



--------------------
Merle Nicholson
Ninety percent of all mental errors are in your head. - Yogi Berra -
 
SomekindaVB
post Dec 19 2017, 12:16 AM
Post#12



Posts: 252
Joined: 15-December 16



Madpiet - I have a project to move the entire BE to SQL Server coming up, but not yet. There will be a whole ton of new questions then, I'm sure.

merlenicholson - thanks so much for your suggestions. I'm working on the DELETE string now to make sure it's gonna work before I delete everything. As for the 2nd one, this seems like a brilliant solution. I knew about this but didn't know how to make it work; I think I might have the idea now. I'll have to find out how to do this and then trial it. I have a concatenation field of the three fields - is this the one to be set to unique, so it will error out?

RJD - thanks for your advice. Unfortunately the FE is over 20MB, so I can't provide it, and I didn't have enough time to make a basic one. But I have been playing around with your DBs as a means to work on this issue. It helps me understand what other people say, also. So thanks so much for your efforts.
 
merlenicholson
post Dec 19 2017, 01:38 AM
Post#13



Posts: 520
Joined: 23-November 11
From: Tampa, Florida, USA - UTC -5:00


The concatenation field is not necessary - it's just a maintenance nuisance. The new index is a single index of three fields, with the Unique property set to True. See your December 11 posting on the same subject, my last reply; the image attachment shows it all. See Find First Issue.

 
RJD
post Dec 19 2017, 08:17 AM
Post#14


UtterAccess VIP
Posts: 8,481
Joined: 25-October 10
From: Gulf South USA


Merle, you are absolutely correct, of course - the concatenated field is NOT necessary! I had started with a duplicates query and simply continued with that in the solution. My bad. I must have been punchier than usual last night! Your solution is far better, and the OP should go with it. Thanks for stepping in and correcting my approach.

Regards,
Joe

 
SomekindaVB
post Dec 19 2017, 04:20 PM
Post#15



Posts: 252
Joined: 15-December 16



merlenicholson - sorry if I'm being a bit thick,

but I don't really understand this method. I get that you can make a single field unique, which would prevent duplicates, but I have multiple fields, and each of them individually can hold the same value - e.g. same client ID, same date, same summary.

What makes a record a duplicate is when all three fields are exactly the same. This is why I thought the concatenation field would be the one needed.
 
RJD
post Dec 19 2017, 05:10 PM
Post#16


UtterAccess VIP
Posts: 8,481
Joined: 25-October 10
From: Gulf South USA


Hi: Pardon me for jumping back in. I'll address the concatenation issue and let Merle continue with the index issue...

Merle was completely correct about not requiring concatenation. I was a wee bit overzealous in adding the query part that limits the results to records with duplicates, and then using concatenation to make the matches. This is, as Merle pointed out, not necessary at all. I altered my last demo to use Merle's approach. Look at the table, then at the query, then run the query and look at the table again. See if this is what you are trying to do.

Note that the subquery in the criteria effectively concatenates the matching fields by using GROUP BY, and selects the lowest ID of each group to exclude from the delete. Again, this assumes you have a unique record identifier (such as an Autonumber). If not, you can add one easily.

And thanks again to Merle for hitting my head with a 2X4 to make me see that the simpler way is far better...

HTH
Joe


Attached File(s): RemoveDuplicates_Rev2.zip (20.2K), 6 downloads
 

 
SomekindaVB
post Dec 19 2017, 05:44 PM
Post#17



Posts: 252
Joined: 15-December 16




RJD and merlenicholson,

Thanks so much for your continuing efforts to wisen me up. I've learned more in the last couple of days than I have in months on this problem.

I think I've got the indexing concept into my head, and it seems like a brilliant way to go. The catch is that I'm not appending data from a table to a table; I'm writing data from an ADO recordset to the table in code. I've made the indexes, but the code did not catch any duplicates.

CODE
Set rsADO = New ADODB.Recordset
rsADO.Fields.Append "ClientID", adInteger, , adFldIsNullable
'...etc
rsADO.Open 'a fabricated recordset must be opened before AddNew
rsADO.AddNew
rsADO.Fields("ClientID") = lngClientID
'...etc
rsADO.Update


I don't know how to append an ADO recordset to a table except by writing to it in a loop - e.g. for each record in the ADO recordset, copy it to the table.
 
merlenicholson
post Dec 19 2017, 11:49 PM
Post#18



Posts: 520
Joined: 23-November 11
From: Tampa, Florida, USA - UTC -5:00


If you have created the index as I described - one index, three fields, Unique - as I illustrate here:
Attached File: Capture.JPG (123.88K), 1 download


The Index Name doesn't matter, as long as it's not "PrimaryKey" or some other reserved word. The index uses all three fields.

I would change the last line of your code to these three lines:
CODE
On Local Error Resume Next 'make sure processing continues: an index violation will raise an error here
rsADO.Update               'appends the row unless it violates the unique index
On Local Error GoTo 0      'reset the error trap


All untested of course. Test it with both duplicate data and non-duplicate data.

You're not locked into looping one record at a time, but let's try to get some results first. Get something that works, trim it down to a workable example and post the .accdb if you'd like to pursue that.

 
SomekindaVB
post Dec 19 2017, 11:52 PM
Post#19



Posts: 252
Joined: 15-December 16



I should say I got the indexing solution to work - except for one tiny, small problem:

CODE
With Workitem

    ![ClientID] = rsADO.Fields("ClientID").Value                 'Client ID
    ![Due Date] = rsADO.Fields("Due Date").Value
    ![DMSEmailSummary] = rsADO.Fields("DMSEmailSummary").Value   'DMSEmailSummary

    On Error GoTo NextADORecord
    DoCmd.SetWarnings False
    .Update 'a duplicate will throw error 3022
    DoCmd.SetWarnings True
    On Error GoTo 0

End With


So the 3022 is thrown, and no amount of setting warnings to False, pleading, or begging will prevent that blasted error msg from driving me mad. Sure, SetWarnings may work in a macro, but in VBA it doesn't work too well.

What gives with error 3022 that it doesn't wanna do what it's told? Even the On Error command doesn't wanna play nice.
This post has been edited by SomekindaVB: Dec 19 2017, 11:52 PM
 
merlenicholson
post Dec 20 2017, 12:02 AM
Post#20



Posts: 520
Joined: 23-November 11
From: Tampa, Florida, USA - UTC -5:00


As I suggested:
CODE
With Workitem

    ![ClientID] = rsADO.Fields("ClientID").Value                 'Client ID
    ![Due Date] = rsADO.Fields("Due Date").Value
    ![DMSEmailSummary] = rsADO.Fields("DMSEmailSummary").Value   'DMSEmailSummary

    DoCmd.SetWarnings False
    On Local Error Resume Next
    Err.Clear
    .Update 'a duplicate will throw error 3022, which we deliberately ignore
    If Err.Number <> 0 And Err.Number <> 3022 Then
        MsgBox "Error other than duplicate: " & Err.Number & " - " & Err.Description
        Err.Clear
    End If
    DoCmd.SetWarnings True
    On Error GoTo 0

End With

The two Err.Clear are not really necessary, but sometimes unusual circumstances interfere.
This post has been edited by merlenicholson: Dec 20 2017, 12:06 AM

 