UtterAccess.com
X   Site Message
(Message will auto close in 2 seconds)

Welcome to UtterAccess! Please ( Login   or   Register )

Custom Search
 
   Reply to this topicStart new topic
> Upsert Query Inserting All Data (including Duplicate), Access 2016    
 
   
Zaddicus
post Jun 7 2019, 04:37 AM
Post#1



Posts: 136
Joined: 3-April 19
From: Cardiff


Guess who's back... (At this point I'm surprised no one from UtterAccess has put a hit out on me... or maybe you have wary.gif )

So, today I'm here regarding Upsert queries. One that I been using in particular for the last month and realised just this morning it's not working as intended.
Let's set the scenario.

I have a mass import process based on multiple files to import event data into our database. Thing is the same data could appear in multiple spreadsheets so I import them all into a staging table and was attempting to use an upsert to insert new/update duplicates. However... It basically just inserts all the data even if it is a "duplicate".

How do I determine if a data is a duplicate you ask? Well, the table the excel sheets are being inserted into is for event attendees. In order to determine whether an 'Attendee' is unique or not we check 3 values in the staging table and compare to the attendee table

ContactID = Do they exist in the master data (This is FK - if they don't exist in contacts table they should not be added as attendee)
EventID = What event are they attending (Also FK)
TypeID = What attendee type are they (Again FK - Sponsor, Delegate, Speaker, etc...)

This seems simple enough however clearly I've missed aq trick because I done goofed and resulted in duplicate data for the last month (A weekend of data cleaning is going to be fun) Anyway, let's look at some sample data.

Sample Data Before Upsert:
CODE
+---------+-----------+--------+-------+
| EventID | ContactID | TypeID | Entry |
+---------+-----------+--------+-------+
|       1 |         1 |      1 |     1 |
|       1 |         2 |      1 |     1 |
|       1 |         3 |      2 |     1 |
|       1 |         2 |      1 |    -1 |
|       2 |         1 |      3 |     1 |
|       2 |         2 |      2 |     1 |
|       2 |         2 |      1 |     1 |
|       2 |         3 |      1 |     1 |
|       3 |         1 |      1 |     1 |
|       3 |         2 |      1 |     1 |
|       3 |         3 |      2 |     1 |
|       3 |         3 |      3 |     1 |
|       3 |         4 |      1 |     1 |
|       3 |         2 |      1 |    -1 |
|       2 |         3 |      1 |    -1 |
+---------+-----------+--------+-------+

Expected Results: (Actual results is just all the data in the table above pullhair.gif )
CODE
+---------+-----------+--------+-------+
| EventID | ContactID | TypeID | Entry |
+---------+-----------+--------+-------+
|       1 |         1 |      1 |     1 |
|       1 |         2 |      1 |     1 |
|       1 |         3 |      2 |     1 |
|       2 |         1 |      3 |     1 |
|       2 |         2 |      2 |     1 |
|       2 |         2 |      1 |     1 |
|       2 |         3 |      1 |     1 |
|       3 |         1 |      1 |     1 |
|       3 |         2 |      1 |     1 |
|       3 |         3 |      2 |     1 |
|       3 |         3 |      3 |     1 |
|       3 |         4 |      1 |     1 |
+---------+-----------+--------+-------+

Basically anyone in the first table with -1 entry is a 'duplicate' as they are the same person going to the same event with the same attendance type.

So now that I've set the scenario and tried my best to explain the data it's time for you to look at the disgusting SQL statement I slapped together:
SQL
UPDATE Out_Data LEFT JOIN In_Data ON (Out_Data.TypeID = In_Data.TypeID) AND (Out_Data.ContactID = In_Data.ContactID) AND (Out_Data.EventID = In_Data.EventID) SET In_Data.EventID = [Out_Data]![EventID], In_Data.ContactID = [Out_Data]![ContactID], In_Data.TypeID = [Out_Data]![TypeID], In_Data.Entry = [Out_Data]![Entry];

I feel that I should maybe add some where clauses (Like <> or is not "" - but I dunno) - Although below is another upsert query I made however it only checks against 1 criteria and does work as intended without a where clause
SQL
UPDATE tbl_Contacts RIGHT JOIN tbl_STG_Suppression ON tbl_Contacts.Email = tbl_STG_Suppression.Email SET tbl_Contacts.Suppress = -1, tbl_Contacts.Email = [tbl_STG_Suppression]![Email];

Would someone be very kind to explain where I have gone wrong in adapting the query.

Thanks in advance,
The hopeless one

--------------------
~ Castiel


Skype: live:castielswift
Go to the top of the page
 
gemmathehusky
post Jun 7 2019, 05:00 AM
Post#2


UtterAccess VIP
Posts: 4,725
Joined: 5-June 07
From: UK


I think you will have to do this vba top process a record set. You can find the duplicate keys with a query, but you can't delete the records.


basically the pseudo code is this


get a query of the records, sorted in sequential order to identify the records
then do this sort of process

CODE
open the query as a recordset

while not end of file
     if current record is different from the previous record then
           save the details for comparison with the next record
    else
          'current record is the same as the previous record so
          report it if you want it reporting
          delete it
   end if

   goto next record
wend
close the recordset





--------------------
Dave (Male)

(Gemma was my dog)
Go to the top of the page
 
Zaddicus
post Jun 7 2019, 05:30 AM
Post#3



Posts: 136
Joined: 3-April 19
From: Cardiff


QUOTE
I think you will have to do this vba top process a record set.

Darn, I was hoping to avoid VBA for this process because VBA process with a lot of data always crash my access and do not make any changes to the tables on the back-end server, however queries work fine.

QUOTE
You can find the duplicate keys with a query, but you can't delete the records

I definitely do not want to delete any data! Just update existing or add new data based on the 3 criteria 'ContactID', 'EventID', 'AttendeeTypeID'.

So for example let's imagine I have a concatenate field (I will not add one though because calculated fields through SSMS are a nightmare and just don't behave properly)
CODE
+---------+-----------+--------+-------------+----------------+
| EventID | ContactID | TypeID | CONCATENATE | Desired Action |
+---------+-----------+--------+-------------+----------------+
|       1 |         2 |      1 |       1.2.1 | Insert         |
|       3 |         1 |      1 |       3.1.1 | Insert         |
|       1 |         2 |      1 |       1.2.1 | Update         |
|       2 |         1 |      3 |       2.1.3 | Insert         |
|       2 |         1 |      2 |       2.1.2 | Insert         |
|       3 |         1 |      1 |       3.1.1 | Update         |
|       3 |         1 |      2 |       3.1.2 | Insert         |
+---------+-----------+--------+-------------+----------------+


Basically I need the upsert query to follow the 'Desired action' where as now all it does is insert everything - The left Join should allow for the update/insert according to a check of the existing data. I know it is possible I'm just not exactly sure what it is I'm missing.



--------------------
~ Castiel


Skype: live:castielswift
Go to the top of the page
 
Zaddicus
post Jun 7 2019, 06:11 AM
Post#4



Posts: 136
Joined: 3-April 19
From: Cardiff


Update! I've made a change to try and get this sorted ASAP as I am currently holding back the entire company and their work due to the data being unavailable! But even with the change the upsert is just NOT BEHAVING!

I added a new field to concatenate the contactID,EventID,TypeID but seperate vaules with - (Basically Calculated field, [ContactID]&"-"&[EventID]&"-"&[TypeID])

I changed the query to LEFT JOIN on this - In theory the upsert should see the value 2-1-1 twice and only insert one recordset/row as the other is a duplicate... but it adds both! because clearly Im useless at a simple SQL update statement.

SQL
UPDATE Out_Data LEFT JOIN In_Data ON Out_Data.Concat = In_Data.Concat SET In_Data.EventID = [Out_Data]![EventID], In_Data.ContactID = [Out_Data]![ContactID], In_Data.TypeID = [Out_Data]![TypeID], In_Data.Entry = [Out_Data]![Entry];



This post has been edited by Zaddicus: Jun 7 2019, 06:13 AM

--------------------
~ Castiel


Skype: live:castielswift
Go to the top of the page
 
gemmathehusky
post Jun 7 2019, 06:30 AM
Post#5


UtterAccess VIP
Posts: 4,725
Joined: 5-June 07
From: UK


Maybe I misunderstood.

If you want to delete the rows with -1 then, a simple query, or similar in code will do it
currentdb.execute "delete * from tablename where entry = -1"


but are you just showing the entry value to be what you would like to see, then?

In which case you need the data sorted in order of the first 3 columns, and then you can use the recordset logic.
The values for the first 3 columns match the previous row for the red items - so they are duplicates - then you can either delete the row, or set an appropriate indicator


EventID | ContactID | TypeID | Entry |
+---------+-----------+--------+-------+
| 1 | 1 | 1 | 1 |
| 1 | 2 | 1 | 1 |
| 1 | 2 | 1 | -1 |
| 1 | 3 | 2 | 1 |
| 2 | 1 | 3 | 1 |
| 2 | 2 | 2 | 1 |
| 2 | 2 | 1 | 1 |
| 2 | 3 | 1 | 1 |
| 2 | 3 | 1 | -1 |
| 3 | 1 | 1 | 1 |
| 3 | 2 | 1 | 1 |
| 3 | 2 | 1 | -1 |
| 3 | 3 | 2 | 1 |
| 3 | 3 | 3 | 1 |
| 3 | 4 | 1 | 1 |



Alternatively, you could prevent the duplicates occurring by adding a unique key to the table, based on the first 3 columns - you will then get a run time error of some sort (3155 offhand?) , when you try to insert the duplicate. But you won't be able to add the unique key until you get rid of the existing duplicates.

--------------------
Dave (Male)

(Gemma was my dog)
Go to the top of the page
 
Zaddicus
post Jun 7 2019, 07:09 AM
Post#6



Posts: 136
Joined: 3-April 19
From: Cardiff


I think I found the problem, just trying to figure the best solution.

So all data gets imported into a staging table. The upsert then checks the concatenated field against the attendees table and updates the existing records if there is a match and appends any new records.

Problem occurs if a duplicate record exists in the Staging table and not the Attendees table. it will then append both those records even though they are duplicate because they do njot exist in the attendees table.

What I need is a 'dedupe' delete query to run before the upset query.

So before the upsert it deletes any duplicate field leaving 1 record... I was going to try something like

SQL
DELETE * FROM tbl_STG_AttendeeImport WHERE Count tbl_STG_AttendeeImport.Concat >1


However this just doesn't work.

What I need - A Query to remove all duplicate recordsets from the staging table but ensuring at least 1 recordset remains (this will be the unique record)

Once this is done the initial Upsert will work as intended...
This post has been edited by Zaddicus: Jun 7 2019, 07:10 AM

--------------------
~ Castiel


Skype: live:castielswift
Go to the top of the page
 
gemmathehusky
post Jun 7 2019, 07:33 AM
Post#7


UtterAccess VIP
Posts: 4,725
Joined: 5-June 07
From: UK


As I said, I don't think you can do that- you either need a unique key in the staging table to prevent the duplicates getting there, or you have to have a manual (or coded) process to get rid of the duplicates.

The trouble is that Access can identify (say) that there are 3 matching examples of a particular row of data, but it can't tell them apart, and it can't delete all but 1.

There MAY be a way of using dmax to pick just one of each rows, and append those into the main table - If there are no duplicates it doesn't matter. If there are more than one, then it will just pick a random one. Try a dmax query, and see if the count is correct. However, if use dmax, you will possibly find the append query is non-updateable - for the same reason. The dmax query identifies the unique data, but it doesn't know which row it needs to use.

The VBA code is very easy.

--------------------
Dave (Male)

(Gemma was my dog)
Go to the top of the page
 
theDBguy
post Jun 7 2019, 07:34 AM
Post#8


Access Wiki and Forums Moderator
Posts: 75,724
Joined: 19-June 07
From: SunnySandyEggo


Hi. Just curious, have you tried simply setting a multi field unique index on the staging table?

Edit: Oops, sorry, I was a little slow.

--------------------
Just my 2 cents... "And if I claim to be a wise man, it surely means that I don't know" - Kansas
Access Website | Access Blog | Email
Go to the top of the page
 
Zaddicus
post Jun 7 2019, 07:41 AM
Post#9



Posts: 136
Joined: 3-April 19
From: Cardiff


Alas guess I'll just have to accept the fact that there may be the odd duplicate data input into the attendees table as I can not rely on VBA for any functions that involve altering tables/data (Thanks MSSQL Server)

--------------------
~ Castiel


Skype: live:castielswift
Go to the top of the page
 
gemmathehusky
post Jun 7 2019, 08:18 AM
Post#10


UtterAccess VIP
Posts: 4,725
Joined: 5-June 07
From: UK


Why can you not rely on VBA? That's what we all do. We design robust and well tested and de-bugged applications using stored queries, and VBA.


--------------------
Dave (Male)

(Gemma was my dog)
Go to the top of the page
 
Zaddicus
post Jun 7 2019, 08:33 AM
Post#11



Posts: 136
Joined: 3-April 19
From: Cardiff


CODE
Why can you not rely on VBA? That's what we all do. We design robust and well tested and de-bugged applications using stored queries, and VBA.


for Single use on my pc only VBA is awesome and does everything I want it to, the second I split the front end and hosted the back-end on an MSSQL server it went to pot.
All my VBA functions would cause the front-end to crash/not respond eventually resulting in the user having to ALT+F4 for some odd reason with no changes were made on the back-end (EVERY TIME).
I tried running the VBA import process I had written overnight, came back in the next morning and it had errored with 0 changes made to the back-end.

However if I run data changes through a query it was okay with that.

Tl;Dr - Can't use VBA because MSSQL is a steaming pile of dog [censored] and times-out as soon as a process starts.


--------------------
~ Castiel


Skype: live:castielswift
Go to the top of the page
 
MadPiet
post Jun 8 2019, 08:24 PM
Post#12



Posts: 3,171
Joined: 27-February 09



If you're importing the data into a table in TempDB, then why not just use T-SQL for the whole thing?

The "do not insert records that exist already on some table" is easy:

INSERT INTO DestinationTable(field1, field2, …)
SELECT field1, field2, …
FROM #TempTable tt
WHERE EXISTS (SELECT 1 FROM DestinationTable d WHERE d.field1 = tt.field1)

if you wanted todo an update then you could use the same WHERE EXISTS clause or use an inner join to filter out all the records you don't want.
e.g.
WHERE tt.Field1 = otherTable.Field2
This post has been edited by MadPiet: Jun 8 2019, 08:44 PM
Go to the top of the page
 
MadPiet
post Jun 9 2019, 12:05 AM
Post#13



Posts: 3,171
Joined: 27-February 09



How about something like this? If given the choice, I would absolutely do this in SQL Server. ROW_NUMBER() is your friend. And don't try to do the INSERT and UPDATE in a single query. You can use WHEN UNMATCHED, but it's easier to write one INSERT statement and one UPDATE statement and just change the WHERE clause on one so they're opposites of one another. Something like:

First off, take the easy way. You don't need to do this in one query.

CODE
INSERT INTO #TempTable(ContactID, EventID, TypeID)
SELECT ContactID, EventID, TypeID
FROM ...
WHERE....


If there are potentially duplicates in that group (ContactID, EventID, TypeID), you can use ROW_NUMBER() to eliminate them:

CODE
SELECT dupes.ContactID
    , dupes.EventID
    , dupes. TypeID
FROM
    (SELECT ContactID
        , EventID
        , TypeID
        , ROW_NUMBER() OVER (PARTITION BY ContactID, EventID, TypeID) AS rn
    FROM MyTable) dupes
WHERE dupes.rn = 1;


Then you can do whatever you want with that. To find all the records that are in one table and not in another:

CODE
SELECT d.ContactID
    , d.EventID
    , d.TypeID
FROM dupes d
    LEFT JOIN OtherTable o
    ON d.ContactID = 0.ContactID
    AND d.EventID = o.EventID
    AND d.TypeID = o.TypeID;

Then just append that somewhere:

CODE
INSERT INTO FinalTable (contactID, EventID, TypeID)
    SELECT d.ContactID
        , d.EventID
        , d.TypeID
    FROM dupes d
        LEFT JOIN OtherTable o
        ON d.ContactID = 0.ContactID
        AND d.EventID = o.EventID
        AND d.TypeID = o.TypeID;

Go to the top of the page
 
Zaddicus
post Jun 10 2019, 05:25 AM
Post#14



Posts: 136
Joined: 3-April 19
From: Cardiff


QUOTE
How about something like this? If given the choice, I would absolutely do this in SQL Server. ROW_NUMBER() is your friend. And don't try to do the INSERT and UPDATE in a single query. You can use WHEN UNMATCHED, but it's easier to write one INSERT statement and one UPDATE statement and just change the WHERE clause on one so they're opposites of one another. Something like:


All of this looks extremely promising, I shall see if I can get it to work with the current process offline first before attempting to code it into the 'Live' version.

Thanks for this smile.gif

--------------------
~ Castiel


Skype: live:castielswift
Go to the top of the page
 
MadPiet
post Jun 10 2019, 03:04 PM
Post#15



Posts: 3,171
Joined: 27-February 09



Good plan. Never ever run code on live data you haven't tested beforehand. That could be really dangerous.
Go to the top of the page
 


Custom Search


RSSSearch   Top   Lo-Fi    20th July 2019 - 07:31 PM