Printable Version of Topic

Click here to view this topic in its original format

UtterAccess Forums _ Access Q and A _ Duplicates

Posted by: rzw0wr Nov 13 2019, 11:52 PM

I have a list of names in a table.
ID, FirstName and LastName.

I created a duplicate query to find them.
Is there a way to send the user a message from that query or is there another better way to find duplicates in a table.


Thank you,
Dale

Posted by: theDBguy Nov 14 2019, 12:09 AM

Hi Dale. Those are two separate process. You got one done, just need to do the other. Do you have the email info as well?

Posted by: rzw0wr Nov 14 2019, 12:30 AM

You have me a little confused.

What email info are you looking for wink.gif

Posted by: nvogel Nov 14 2019, 08:09 AM

I think theDBguy's point may be that it isn't clear what emails you want to send. Are you trying to send emails to every person returned in your query or are you trying to send one email to someone that includes the *results* of your query? If it's the former then you must have the email addresses listed somewhere that you haven't specified. If it's the latter then presumably someone still has to push a button to execute the query, in which case sending the email can be part of the process executed when the button is pushed. In either case, sending the email is different to running the query.

Posted by: Jeff B. Nov 14 2019, 09:23 AM

If that second column contains the concatenated First Name and Last Name, it might present some difficulties down the line. For example, how would you sort by "Last Name"?

Also, if you have a "Rob Smith", and a "Robert Smith", and a "Bob Smith", how will you/your users know if those are all the same person. For that matter, what if you legitimately have more than one "Rob Smith"?

Posted by: rzw0wr Nov 14 2019, 05:17 PM

I am not sending email to anyone.

The list is a list of customers.

I want to make sure that a customer is not entered more than once on the list.

The query I have now finds these duplicates for me.

If there are 2 or more John Does the they would need something to ID them as different people.

I am trying to figure out what I can do with the query.
If I make it a delete query then it deletes both names on the list.

Thanks for the replies
Dale

Posted by: Gerrit Nov 14 2019, 05:28 PM

As it is possible to have two or more Robert Smiths (being different persons with the same name) in your database then the alternative is to search for duplicates in other fields for each person. Say search for identical address's for each Robert Smith?

Posted by: orange999 Nov 14 2019, 07:02 PM

To differentiate customers whose names are duplicates, you will need other criteria to make a determination of uniqueness or not.
Perhaps phone number, street address/state/city/zipcode, BirthDate.....

Posted by: WildBird Nov 14 2019, 07:22 PM

And just to add to the mix, you might 2 Robert Smith's with different addresses, but what is to say this isn't the same person, and they simply moved? Not an easy one if you don't have a truly unique field for each person.

Posted by: rzw0wr Nov 14 2019, 10:56 PM

I got it.
Problem solved.

I used a little DAO in VB.

If the customer moved then my niece would change their address.

Thank you for the replies,
Dale

Posted by: FrankRuperto Nov 30 2019, 08:14 PM

If you are capturing their date of birth, it helps to locate duplicates because it is very rare that you find two or more people with the same last names and same date of births. I use the following qry to identify duplicate customer records.

CODE
SELECT [LastName], [FirstName], [DateOfBirth], [IdentType], [IdentNum], [Address1], [Address2], [City], [State], [TelephoneNum1]
FROM tblCustomer
WHERE (((tblCustomer.LastName) In (SELECT [LastName] FROM [tblCustomer] As Tmp GROUP BY [LastName] HAVING COUNT(*) > 1 )))
ORDER BY tblCustomer.LastName, tblCustomer.FirstName;




Posted by: nvogel Nov 30 2019, 11:19 PM

Hi Frank,

The problem with using IN like that is that it limits you to comparing one column at a time. Using a join instead means you can include as many columns as you like for the duplicate checking criteria.

SELECT c.*
FROM tblCustomer AS c,
(SELECT LastName, FirstName FROM tblCustomer GROUP BY LastName, FirstName HAVING COUNT(*)>1) AS d
WHERE c.LastName = d.LastName AND c.FirstName = d.FirstName;

Obviously names tend to be very frequently duplicated and don't necessarily help you find actual duplication problems. As regards data of birth, probably most companies can't or won't ask customers for that information. In some situations (e.g. employment or health related) keeping the customer's date of birth is OK but those seem like the exception rather than the rule. The data protection practice of not capturing more data than required is called minimisation and in some jurisdictions it is the law and not just a good practice.

Posted by: FrankRuperto Dec 1 2019, 02:25 AM

My users are pawnbrokers so DoB's and all other info on customers photo ID's are captured when users feed their ID's into a scanner. One column, like LastName, is fine because the dups report is filtering custs with the same last names and same DoB's, and the report also shows all the other info that's on the scanned ID's. So we have a larger result set that we can visually examine to see if there are too many coincidences. Sometimes existing customers supply different ID's and if the users dont visually recognize the customer, and those customers say they've never done business there before, then duplicate records slip through the cracks. On a similar topic, some users wanted the ability to add new values on_the_fly in certain comboxes like ItemNames, Brands, Models and they ended up adding mispellings and synonyms of existing values, despite all the training, cheat sheets I provided, plus I emphasized to first check the existing values before deciding to add new one's, ugh pullhair.gif

Posted by: Jeff B. Dec 1 2019, 08:51 AM

Frank

I wish rare duplicate names + birthdates were true I once worked with a fellow who spent the better part of a year convincing "the authorities" that he wasn't the person they were looking for. Turned out he shared a name AND birthdate with a serious felon living 100 miles away. and one of the other fellows posting here relates that there were TWO other folks besides him with the same name working at the same organization. Go figure!

Posted by: FrankRuperto Dec 1 2019, 10:04 AM

Jeff B.

Those exceptions will occur and authorities rely on fingerprints and dna to resolve a person's identity, but for a business that tracks say 10,000 people, the odds are still very low. So far I have not detected two or more different people with the same last names and DoB's within one pawnshop site. Spanish last names where my users are located consist of father's surname and mother's surname, so that lowers the probabilities even more, but that still does not guarantee it will never happen. My app has a customer history table that tracks any changes made to a customer's profile data. When a new or existing customer provides a photo ID, the app does a lookup in the history table to see if that ID's info already exists and displays the customer record if true, so that also helps prevent dup customers. We are planning to implement fingerprint scanners since regulations require their capture anyway when customers pawn or sell stuff, and we can leverage those fingerprints to uniquely identitfy customers. But as nvogel mentioned earlier, not all business models capture DoB's and other personal data, and with the rise of identity theft I can see these organizations that track large amounts of people grappling with duplicate records, or records with slight differences that belong to the same person.

Posted by: FrankRuperto Dec 1 2019, 12:04 PM

Despite all the safeguards we can implement to minimize the creation of more than one master record for a same person, we can not totally prevent them and automate their detection, and must rely on humans to periodically examine suspect data. So what happens when we do find more than one master record which belongs to the same person, and the master table has some or many related detail tables with data in them?... Which master record do we keep, and which detail records that belonged to the duplicate masters do we merge with the master record we kept?... We attempted to automate the process of merging the detail data to the master that was selected to be preserved by updating all the foreign keys, but this proved to not be practical because this process involves human decision-making as to which detail records to merge and which one's to discard. Our users are also not capable of doing this on their own, so we collaborate with them in deciding what to keep, what to discard and do it for them. This and maintaining some of their lookup tables is part of our support services.

Posted by: gemmathehusky Dec 2 2019, 07:42 AM

The problem is that computer data is not the same as the underlying real world data.

That's why although you can design a perfect system using surrogate autonumber keys, you need some unique real world information to manage and prevent "duplicates" in your table. For very big data populations, eg the population of a country, it's next to impossible to manage without error.

Posted by: WildBird Dec 2 2019, 02:49 PM

Frank,

A friend of mine was working at a bank, and was typing names into a system. She came across 2 that had very similar names, literally 1 letter different, something like Antonio and Antonion (cant remember real names, but similar and very 'rare' to say the least one of the names). Same data of birth, same address etc. She deleted the one with the weird name thinking it was a typo. Turns out, they were twins. Major issue to put the deleted one back into the system. So twins, same data of birth, same last name, same address. Twins are not exactly rare I guess.

DNA and fingerprints. I will argue that these are not definite. They might be able to tell if they ARE not a match, but can not guarantee that they are a match. Close, but no way to guarantee that with current technology that there are not 2 people with 'identical' fingerprints or DNA, either alive now, or previously alive, or even being born in the future.

Posted by: nvogel Dec 2 2019, 03:34 PM

There's no need for anything like DNA in most cases. Having the same person signed-up multiple times for a business relationship isn't necessarily a problem.

I might open a second account with Amazon under a different login just so that I can discreetly buy a birthday present for my wife. That's not a problem for Amazon as long as I'm not trying to defraud them. In fact de-duplicating and closing my second account might even be considered unethical, contrary to data protection laws and/or a breach of terms and conditions. The business key for my accounts would be the login name which would be unique for each account (probably a unique email address would be required as well).

In banking there has to be a "know your customer" process that ensures accounts get assigned to a suitably identified person. Multiple identification methods can be used but what matters is due diligence, compliance with regulations and achieving a satisfactory level of fraud-prevention.

Posted by: FrankRuperto Dec 2 2019, 05:35 PM

WildBird,

Parents usually name twins differently to disambiguate, so your example is an extreme rarity. When detecting possible dups, users should investigate the data against realworld facts before merging data. A good design is to never physically delete records, rather logically delete using a flag field and selecting a reason code. Implementing a related history table that tracks any profile changes also provides an audit trail. Government agencies like Social Security and Dept. Motor Vehicles combine records so they can reference history of all data. There are people who legitimately have more than one socail security number, legally changed their names, etc. Fingerprints and DNA are not totally unique, but they are 99.98% reliable. Scanning photo ID's that have https://www.aamva.org/uploadedFiles/MainSite/Content/SolutionsBestPractices/BestPracticesModelLegislation(1)/BarCodeDataEncodingReqmtsBestPractice.pdf helps prevents data entry errors, but the data on the photo ID could also be wrong. But atleast we are able to minimize the probabilities of duplicate records and incorrect data.

nvogel,

Some business cases warrant having only one customer profile record per person. Banking systems use a centralized CIF (Customer Information FIle) per customer. One customer can have several accounts, but the are all linked to the customer's unique CIF record. If bank discovers more than one CIF for the same customer, they will merge all the detail data to the CIF that will remain active and inactivate (logically delete) the other dup CIF records.

Posted by: WildBird Dec 2 2019, 05:58 PM

QUOTE
Parents usually name twins differently to disambiguate, so your example is an extreme rarity.


Haven't you seen The Newhart show? Hi, I'm Larry, this is my brother Darrel, and my other brother, Darrel.

And I wouldn't say extreme rarity, there could be a lot who have the same first initial. I worked overseas in a few developing countries. They often don't know their birthdays, just maybe year and month.

To me, a good design caters to the lowest common denominator if possible.





Posted by: FrankRuperto Dec 2 2019, 07:00 PM

WildBird,

I agree, I have been overseas where tons of people have identical last and first names, names like "Harish Patel", "Shamin Begum", "John Smith", "Mario Di Roma", etc.
In a Crow Indian Reservation close to where I used to live in Billings Montana USA, almost everyone's last names were "Moccasin", "Yellowtail", "Curley".
Parents in the old days named their children with a given name and a surname describing where they're from, the family's trade/skill, etc. and these names have propagated throughout generations to present day.
So we can't rely on just surname's and given names, rather we have to gather as many attributes possible from each person and use a combination of all that to build a unique profile.

Posted by: WildBird Dec 2 2019, 08:37 PM

Frank,

We all know names are in no way unique. I myself work sitting next to someone with same first name, and a few others in the immediate vicinity, so I get Coop, not Stephen. Its just white noise now when I hear Stephen.

What I have issues with is people thinking fingerprints or DNA are in fact unique. While I agree they can be used to differentiate i.e. you can say that they are different, I don't believe it is possible to say they are unique.

Take a simple example. You have a database of 100 million customer records. You are looking for a Bob Smith (assume we search all variations of Bob, Robert, Bobby, etc.), born 1st January, 2000. A simple SQL search finds 1. Does this mean that there is only 1 Bob Smith, born 1st January, 2000? In this dataset, yes. But with 100 million customers, there must be some history. So you look at archive tables, to expand the data set. Search still only turns up 1. Does this mean there is only 1 Bob Smith, born 1st January, 2000? Well no, because he may not have been a customer. So even if you looked at all the providers in the country, and in their archive tables, there might not be another Bob Smith in the country. But what about overseas? Even if you looked at every provider, and every archive table, there is still a chance that there was another Bob Smith, born on the same day, but never was a customer anywhere in the world.

My point is, that comparing fingerprints or DNA, this is simply a number of fields, more than the simple example of names and date of birth, against a known dataset. The issue is that it is impossible to compare against every single person, living or dead. If we are talking about a crime and literally life or death, then I can only see DNA and fingerprints being able to be used to say 2 samples DONT match.

I also feel we have moved away, ever so subtly, from the original post :-)

Posted by: nvogel Dec 3 2019, 05:30 AM

Is this even relevant to database design or data management? It's not a problem because hardly anyone is interested in identifying customers by DNA fingerprints. Millions of people and their database systems work just fine with other identification schemes because they are perfectly adequate for their purpose.

Posted by: gemmathehusky Dec 3 2019, 07:06 AM

I think you can take it for granted that fingerprints are unique. The chance that 2 different people with the same name have the same fingerprints but are different people are vanishingly small.

If all this wasn't the case, then replication keys wouldn't work either. Access (other databases?) are predicated on the fact that a random numeric key won't be duplicated, and there are only 2 billion long integers. A database for every person in the US with a population of 300m, might start getting random hits for key numbers only, though. You would be getting serious issues with China, and a population of 1.4billion.

I assume a bigint uses an 8-byte integer, not a 4-byte integer.


@nvogel
With regard to data systems being "adequate" - see this example in the UK
https://www.bbc.co.UK/news/UK-politics-eu-referendum-35959949

reported "immigration" is a third of the number of NI (ie SSN) numbers requested.

I wonder if there are stats available for the US. Apparently you have used about half the capacity of SSN numbers available.

Posted by: FrankRuperto Dec 3 2019, 07:44 AM

If fingerprints are good enough evidence for a court to decide if a person is guilty or not, then they are reliable enough to uniquely identify someone in a database ohyeah.gif
So getting back to the OP's question, using just LastName and FirstName is not going to be good enough to prevent duplicates and the OP will have to periodically check for them.

P.S. Dave, that BBC link you posted is 404.