UtterAccess.com
X   Site Message
(Message will auto close in 2 seconds)

Welcome to UtterAccess! Please ( Login   or   Register )

Custom Search
 
   Reply to this topicStart new topic
> Deleting Duplicate Records, Access 2016    
 
   
ecovindaloo
post Feb 12 2020, 11:46 AM
Post#1



Posts: 295
Joined: 1-November 08
From: New York Area, USA


I'm trying to delete duplicate records in a table. I'm trying to figure out the best way to do this.

The issue is that I need to keep one of the records and use the ID to update multiple tables to the final ID of the record that I'm trying to keep.

I've attached a snapshot of a query showing an example of what I'm trying to do. In the example I want to keep the first record because it has the most information and delete the second record. I will use the PatientID from the first record and update records in multiple tables currently using the PatientID of the second record.

In the end I'm just trying to get the PatientID of the record to keep and the PatientID of the records to remove. I have the code to update the tables and everything else that I need to do.

I'm not sure if I've made my question clear. If not, let me know.
Attached File(s)
Attached File  RecordsetExample_2_12_20_.JPG ( 27.47K )Number of downloads: 8
 
Go to the top of the page
 
theDBguy
post Feb 12 2020, 11:51 AM
Post#2


UA Moderator
Posts: 77,490
Joined: 19-June 07
From: SunnySandyEggo


Hi. Does your table have a PK field? If so, are you able to add the PK field to that query's result, so you can tell which actual record you want to save? Also, from the image, it's easy enough for you to say you want to keep the first one, because it has more data, but how to tell the computer that would be a little harder. You'll need to instruct the database to consider "something" as candidate for deletion or retention. If that "something" is more data, then you'll have to somehow translate it into logic or code the computer can understand.

--------------------
Just my 2 cents... "And if I claim to be a wise man, it surely means that I don't know" - Kansas
Access Website | Access Blog | Email
Go to the top of the page
 
ecovindaloo
post Feb 12 2020, 01:17 PM
Post#3



Posts: 295
Joined: 1-November 08
From: New York Area, USA


The PatientID is the Primary Key field.

What I was thinking of doing was to step through each record in the query and get a count of which fields are populated, excluding the first and last name fields. The record with the highest count would be the record to keep.

I was just wondering if there was an easy way to do this that I wasn't aware of.
Go to the top of the page
 
orange999
post Feb 12 2020, 01:25 PM
Post#4



Posts: 2,041
Joined: 10-February 08
From: Ottawa, Ont, Canada; West Palm Beach, FL


Removing the duplicate is one issue, but I'd be concerned how the second (or replicates if they exist) have gotten into your system. Perhaps some additional validation is in order --just saying.

--------------------
Good luck with your project!
Go to the top of the page
 
ecovindaloo
post Feb 12 2020, 01:28 PM
Post#5



Posts: 295
Joined: 1-November 08
From: New York Area, USA


This wasn't a database that I created. I'm just adding updates and fixing issues in the database. The database wasn't set up correctly in the beginning. The client wanted to have the duplicates in the database to have some sort of history.

I've convinced the client that this is not the way to go and to clean up the data.
Go to the top of the page
 
theDBguy
post Feb 12 2020, 01:37 PM
Post#6


UA Moderator
Posts: 77,490
Joined: 19-June 07
From: SunnySandyEggo


Hi. No matter what you do, I think you will have to do them in several steps anyway. For example:

1. Create a Find Duplicates query
2. Create a query with a calculated column showing which record is a retention/deletion candidate (this is that count empty column part you talked about)
3. Create another query joining the first two query, so you can identify the PK fields with duplicates
4. Create another query to identify the PKs from the above query with the most value in the calculated column
5. Use that query to delete the duplicates (but probably after you have updated the related tables with the new PK value)

Hope that helps...

--------------------
Just my 2 cents... "And if I claim to be a wise man, it surely means that I don't know" - Kansas
Access Website | Access Blog | Email
Go to the top of the page
 
ecovindaloo
post Feb 12 2020, 02:11 PM
Post#7



Posts: 295
Joined: 1-November 08
From: New York Area, USA


This is what I ended up doing. I passed the recordset to a function which looped through the recordset. The function counts the fields that are populated for the current record and then compares that number to the previous highest number count. Fields are then populated based on the count and at the end the two fields are combined using the colon as delimiter. This is passed back to the original function which will then update all of the appropriate tables with the PatientID which is being kept and updating the records with the PatientIDs that are being removed in the end.

Here's the function:
Public Function FindPtData(rst As DAO.Recordset) As String
'------------------------------------------------------------------------------------------
'2/12/20 Loop through records to decide which record to keep and which records to delete
'------------------------------------------------------------------------------------------
Dim PtIds As String
Dim PtIdKeep As Long
Dim PtIdDelete As String
Dim rstCount As Integer
Dim i As Integer
Dim FieldTotal As Integer

On Error GoTo FindPtData_Error

FieldTotal = 0
PtIdDelete = vbNullString

With rst
.MoveFirst

Do Until .EOF
'Check to see which fields are populated
i = 0

If Not IsNull(.Fields(3)) Then
i = i + 1
End If

If Not IsNull(.Fields(4)) Then
i = i + 1
End If

If Not IsNull(.Fields(5)) Then
i = i + 1
End If

If Not IsNull(.Fields(6)) Then
i = i + 1
End If

If Not IsNull(.Fields(7)) Then
i = i + 1
End If

If Not IsNull(.Fields(8)) Then
i = i + 1
End If

'Update fields
If i > FieldTotal Then
If PtIdKeep <> 0 Then
PtIdDelete = PtIdKeep & "," & PtIdDelete
End If

PtIdKeep = .Fields(0)
ElseIf FieldTotal > i Then
PtIdDelete = .Fields(0) & "," & PtIdDelete
End If

FieldTotal = i

.MoveNext
Loop
End With

'Clean up string
If Right(PtIdDelete, 1) = "," Then
PtIdDelete = Left(PtIdDelete, Len(PtIdDelete) - 1)
End If

'Combine strings using ":" as delimeter
PtIds = PtIdKeep & ":" & PtIdDelete

FindPtData_Exit:
FindPtData = PtIds
Exit Function

FindPtData_Error:
MsgBox Err.Description, , "FindPtData"
PtIds = vbNullString
Resume FindPtData_Exit

End Function

I've never had to do something like this before. It does work, although it seems a little convoluted to me.

Maybe someone else can use this function in their project down the line.

Let me know what you think of this solution.
Go to the top of the page
 
nvogel
post Feb 12 2020, 04:13 PM
Post#8



Posts: 1,061
Joined: 26-January 14
From: London, UK


The logic being applied in your code seems a bit arbitrary. How will you know if you are keeping the correct version of data? From my position of ignorance about your actual circumstances this seems a bit scary in the context of personally identifiable information and healthcare. I'm sure you know that there are laws and regulations and certain standards that have to be applied in those cases. Those are my views for what they are worth, but I expect you'll want to get approval from a responsible authority rather than just rely on comments in this forum.
This post has been edited by nvogel: Feb 12 2020, 04:13 PM
Go to the top of the page
 
MadPiet
post Feb 12 2020, 04:16 PM
Post#9



Posts: 3,465
Joined: 27-February 09



If it's medical information, it's covered by HIPPA, and those violations can be cripplingly expensive. Yikes.
I remember "cracking" an Access database with Word.
Go to the top of the page
 
nvogel
post Feb 12 2020, 04:20 PM
Post#10



Posts: 1,061
Joined: 26-January 14
From: London, UK


Yes but HIPAA, not HIPPA. A very common slip smile.gif
Go to the top of the page
 
MadPiet
post Feb 12 2020, 04:24 PM
Post#11



Posts: 3,465
Joined: 27-February 09



Oh, can never remember if it's Privacy & Portability or not. =)

I remember working on a problem like that in 2007, and they had names, birthdates, and Social Security Numbers in an Access database... and at one point, I read part of another "security by obscurity" Access database with Word. Wasn't pretty, but I could read parts of it. It was a lawsuit waiting to happen.
Go to the top of the page
 
theDBguy
post Feb 12 2020, 04:30 PM
Post#12


UA Moderator
Posts: 77,490
Joined: 19-June 07
From: SunnySandyEggo


QUOTE (MadPiet)
Oh, can never remember if it's Privacy & Portability or not. =)

Hi. I had to look it up. Portability, yes, but I guess it's not so much as Privacy but more of Accountability, in this case.

--------------------
Just my 2 cents... "And if I claim to be a wise man, it surely means that I don't know" - Kansas
Access Website | Access Blog | Email
Go to the top of the page
 
ecovindaloo
post Feb 12 2020, 05:00 PM
Post#13



Posts: 295
Joined: 1-November 08
From: New York Area, USA


The company is only keeping the basic information for these patients. They don't keep the SS# or other relevant information.

Also this isn't a medical office.
Go to the top of the page
 
WildBird
post Feb 12 2020, 05:19 PM
Post#14


UtterAccess VIP
Posts: 3,724
Joined: 19-August 03
From: Auckland, Little Australia


What happens if 2 'duplicate' records are found, and there is data in both with different fields? I.e. record 1 has address, but no state. Record 2 has state, but no address?

I would be wary of deleting anything without manual intervention to tidy it up. Are there other tables that relate to this table? If so, what records are they referencing?

--------------------
Beer, natures brain defragging tool.
Go to the top of the page
 
ecovindaloo
post Feb 12 2020, 05:25 PM
Post#15



Posts: 295
Joined: 1-November 08
From: New York Area, USA


There will be an Excel file created showing all of the duplicate records. This will be the backup.

From looking at the duplicate data it looks like one record has most of the information like email address, address, city, state, zip code, and the duplicate record has only the first and last name.

The data never should have been going into the database like this from the beginning.
Go to the top of the page
 
WildBird
post Feb 12 2020, 05:58 PM
Post#16


UtterAccess VIP
Posts: 3,724
Joined: 19-August 03
From: Auckland, Little Australia


Backup the database obviously, dont rely on Excel copy as you will lose autonumbers etc.

Anyway, in the example pic, it has email in both. If this is the case, you could try something like

Create a query, group on email, and first name, last name. Get the min ID.

Create another query, same as above, but get Max of ID.

Add another field to the table, name it something like "Keep". Make it Yes/No

Make a form, and have 2 subforms or datasheets (Sorry, I do very little forms!) but basically you want one recordsource to be the min query, the other to be the max query.

Might be a way you could synch them up, anyway, ideally you want to be able to see both records side by side, and be able to select which one to 'keep'.

Not sure if this will work, but I am on a caffeine rush right now and it seems to make some sort of sense to me!

--------------------
Beer, natures brain defragging tool.
Go to the top of the page
 
ecovindaloo
post Feb 12 2020, 06:19 PM
Post#17



Posts: 295
Joined: 1-November 08
From: New York Area, USA


I've already posted my solution earlier in this thread. I created a function to handle this.
Go to the top of the page
 


Custom Search


RSSSearch   Top   Lo-Fi    18th February 2020 - 01:27 PM