UtterAccess.com
X   Site Message
(Message will auto close in 2 seconds)

Welcome to UtterAccess! Please ( Login   or   Register )

Custom Search
 
   Reply to this topicStart new topic
> SQL Code To Remove Duplicates    
 
   
lclayton1997
post May 28 2019, 01:21 PM
Post#1



Posts: 162
Joined: 2-September 15



I have two tables.
BGE_returns which I named R
and
Product which I named P

I need to join the two based on the left 6 digits of Returned_sku from R and the Prod_ID from P

I wrote the code but it brings back two records for each (exactly doubling the database results)

I tried Distinct, but that eliminated too many records (not exactly half)

I've tried several inner join commands but I'm not familiar enough with SQL to write it properly.
Could someone please help me?

CODE
SELECT
R.own_co_code,
R.sell_co_code,
R.ord_id,
R.returned_sku,
R.return_date,
r.pick_location,
substr(digits(p.CORP_ID),4,2) || '-'||
substr(digits(p.SERIES_nbr),6,5) || '-'||  
substr(digits(p.PRODuct_series_SEQ),3,3) as sku


from
aradmin.bge_returns r,
aradmin.product p

where
r.return_date between '2019-03-30' and '2019-05-04'
and r.cntry_code = '01'
and r.PICK_LOCATION = 'REPAK'
and r.own_co_code <> '18'
and left(r.returned_sku,6) = p.prod_id
Go to the top of the page
 
GroverParkGeorge
post May 28 2019, 01:35 PM
Post#2


UA Admin
Posts: 35,307
Joined: 20-June 02
From: Newcastle, WA


The first question to address would be whether these are truly duplicates (i.e. the same record returned twice) or the same records returned twice from the One side table because there are two matching records in the table on the one side.

--------------------
My Real Name Is George. Grover Park Consulting is where I do business.
How to Ask a Good Question
Beginning SQL Server
Go to the top of the page
 
lclayton1997
post May 28 2019, 01:42 PM
Post#3



Posts: 162
Joined: 2-September 15



I believe the latter is correct.
there are two resulting from the p table

because when I run it without the r. = p.
I only get one record for each.
Go to the top of the page
 
GroverParkGeorge
post May 28 2019, 01:58 PM
Post#4


UA Admin
Posts: 35,307
Joined: 20-June 02
From: Newcastle, WA


So, we have a more or less expected result, where two (or more) records in a many side table are returned in a query because the join used includes the related records from the joined tables, and there are two (or more) many-side records for each one-side record.

You have to figure out how to limit the returned recordset to one of those.

I think the reason you saw a different result with DISTINCT is probably that there are not always two many-side records for each one in the one-side table, so that changes the record count to only those that are truly unique.

One approach might be to use a subquery that returns ONLY the required records (without duplicates) from the many-side table, and join that to the one-side table to get that unique result set.

--------------------
My Real Name Is George. Grover Park Consulting is where I do business.
How to Ask a Good Question
Beginning SQL Server
Go to the top of the page
 
cheekybuddha
post May 28 2019, 02:00 PM
Post#5


UtterAccess VIP
Posts: 11,419
Joined: 6-December 03
From: Telegraph Hill


Perhaps try it this way:
CODE
SELECT
  r.own_co_code,
  r.sell_co_code,
  r.ord_id,
  r.returned_sku,
  r.return_date,
  r.pick_location,
  substr(digits(p.CORP_ID),4,2) || '-'||
  substr(digits(p.SERIES_nbr),6,5) || '-'||  
  substr(digits(p.PRODuct_series_SEQ),3,3) as sku
FROM aradmin.bge_returns r
INNER JOIN aradmin.product p
        ON p.prod_id = LEFT(r.returned_sku, 6)
WHERE r.return_date BETWEEN '2019-03-30' and '2019-05-04'
  AND r.cntry_code = '01'
  AND r.pick_location = 'REPAK'
  AND r.own_co_code <> '18'

--------------------


Regards,

David Marten
Go to the top of the page
 
lclayton1997
post May 28 2019, 02:10 PM
Post#6



Posts: 162
Joined: 2-September 15



That resulted in two times the data as well.

there are 24,084 records before the join

this returns 48,168 (exactly double)
Go to the top of the page
 
cheekybuddha
post May 28 2019, 02:21 PM
Post#7


UtterAccess VIP
Posts: 11,419
Joined: 6-December 03
From: Telegraph Hill


In which case, I suspect that George is on the money.

Have you tried a sub query?

--------------------


Regards,

David Marten
Go to the top of the page
 
lclayton1997
post May 28 2019, 02:44 PM
Post#8



Posts: 162
Joined: 2-September 15



Yes I believe this will work, however I don't know how to do that.
Could you please expand?

I'm researching also but not getting anywhere frown.gif

This is what I have so far but returns zero records:

CODE
SELECT*
FROM(SELECT
  r.own_co_code,
  r.sell_co_code,
  r.ord_id,
  left(r.returned_sku,6) as sku,
  r.return_date,
  r.pick_location
  

from aradmin.bge_returns r

WHERE r.return_date BETWEEN '2019-03-30' and '2019-05-04'
  AND r.cntry_code = '01'
  AND r.pick_location = 'REPAK'
  AND r.own_co_code <> '18')
as DETAIL

INNER JOIN aradmin.product
        ON  detail.sku= aradmin.product.prod_id





and detail.sku in

(select p.prod_id
from aradmin.product as p
group by p.prod_id
having count(p.prod_id)=1)

order by detail.sku

This post has been edited by lclayton1997: May 28 2019, 02:45 PM
Go to the top of the page
 
cheekybuddha
post May 28 2019, 03:12 PM
Post#9


UtterAccess VIP
Posts: 11,419
Joined: 6-December 03
From: Telegraph Hill


Your query seems along the right lines...

Just got to pop out for half an hour,

but out of curiosity, what happens when you run:
CODE
SELECT
  LEFT(r.returned_sku, 6),
  COUNT(*) AS returns
FROM aradmin.bge_returns r
GROUP BY LEFT(r.returned_sku, 6)
HAVING COUNT(*) > 1
;


Will look more when I get back, or George or others will look in before

--------------------


Regards,

David Marten
Go to the top of the page
 
lclayton1997
post May 28 2019, 03:19 PM
Post#10



Posts: 162
Joined: 2-September 15



returns with 90,201 rows
Go to the top of the page
 
cheekybuddha
post May 28 2019, 03:40 PM
Post#11


UtterAccess VIP
Posts: 11,419
Joined: 6-December 03
From: Telegraph Hill


I thought you said it only had 24,084 records.

--------------------


Regards,

David Marten
Go to the top of the page
 
lclayton1997
post May 28 2019, 03:51 PM
Post#12



Posts: 162
Joined: 2-September 15



i copied your code verbatim but it is missing all the where stmts.

i did this and it returned 2,756 rows

CODE
SELECT
  LEFT(r.returned_sku, 6),
  COUNT(*) AS returns
FROM aradmin.bge_returns r
WHERE r.return_date BETWEEN '2019-03-30' and '2019-05-04'
  AND r.cntry_code = '01'
  AND r.pick_location = 'REPAK'
  AND r.own_co_code <> '18'
GROUP BY LEFT(r.returned_sku, 6)
HAVING COUNT(*) > 1
;


Sorry I don't know coding that well.
When I run the query without the second table and with the where stmts I get 24,084 records.
the first table "r" has the Left(r.returned_sku, 6) which corresponds to table "p" prod_id.

Go to the top of the page
 
lclayton1997
post May 28 2019, 04:05 PM
Post#13



Posts: 162
Joined: 2-September 15



I got this to return the 24,084 records and the headings from both tables, but there is nothing in the cols from the P table.
It should be showing the prod_id. I think I need to put the fields in here somewhere but where?

CODE
SELECT*
FROM(SELECT
  r.own_co_code,
  r.sell_co_code,
  r.ord_id,
  left(r.returned_sku,6) as sku,
  r.return_date,
  r.pick_location
  

from aradmin.bge_returns r

WHERE r.return_date BETWEEN '2019-03-30' and '2019-05-04'
  AND r.cntry_code = '01'
  AND r.pick_location = 'REPAK'
  AND r.own_co_code <> '18')
as DETAIL

left JOIN  aradmin.product
        ON  detail.sku = aradmin.product.prod_id

and detail.sku in

( select p.prod_id
from aradmin.product as p
group by p.prod_id
having count(p.prod_id)=1)

order by detail.sku
Go to the top of the page
 
cheekybuddha
post May 28 2019, 04:20 PM
Post#14


UtterAccess VIP
Posts: 11,419
Joined: 6-December 03
From: Telegraph Hill


Is prod_id the primary key of table aradmin.product? If so, then there should only ever be a COUNT(prod_id) = 1 for every record since it will be unique, and therefore not necessary in the WHERE clause.

Does this get you closer?
CODE
SELECT
  r.own_co_code,
  r.sell_co_code,
  r.ord_id,
  r.sku,
  r.return_date,
  r.pick_location,
  substr(digits(p.CORP_ID),4,2) || '-'||
  substr(digits(p.SERIES_nbr),6,5) || '-'||  
  substr(digits(p.PRODuct_series_SEQ),3,3) as prod_sku  
FROM (
  SELECT
    own_co_code,
    sell_co_code,
    ord_id,
    LEFT(returned_sku,6) as sku,
    return_date,
    pick_location
  FROM aradmin.bge_returns
  WHERE return_date BETWEEN '2019-03-30' and '2019-05-04'
    AND cntry_code = '01'
    AND pick_location = 'REPAK'
    AND own_co_code <> '18'
) r
INNER JOIN aradmin.product p
        ON r.sku = p.prod_id
ORDER BY r.sku
;

--------------------


Regards,

David Marten
Go to the top of the page
 
lclayton1997
post May 28 2019, 04:33 PM
Post#15



Posts: 162
Joined: 2-September 15



Yes Prod_Id is a primary key.
still duplicates the records = 48,168
I tried left join and still got 2 for every record = 48,168

My last code got me the 24,084 with the columns from both tables, but the data was blank from the second (p) table.
I can't figure out why.
Go to the top of the page
 
lclayton1997
post May 28 2019, 04:51 PM
Post#16



Posts: 162
Joined: 2-September 15



I found the problem.
There is a hidden zero in front of the r.return_sku's that don't have 9 digits.
Could they make this any harder?

That is why the data is not showing but the columns are there.

Now to figure out how to write this to take off leading zeros.
left(r.returned_sku,6) as sku

Any Suggestions for that?
Once that is solved the code will work.

CODE
SELECT*
FROM(SELECT
  r.own_co_code,
  r.sell_co_code,
  r.ord_id,
  left(r.returned_sku,6) as sku,
  r.return_date,
  r.pick_location
  
from aradmin.bge_returns r

WHERE r.return_date BETWEEN '2019-03-30' and '2019-05-04'
  AND r.cntry_code = '01'
  AND r.pick_location = 'REPAK'
  AND r.own_co_code <> '18')

as DETAIL

left JOIN  aradmin.product
  ON  detail.sku = aradmin.product.prod_id
  and detail.sku in
( select p.prod_id
from aradmin.product as p
group by p.prod_id
having count(p.prod_id)=1)

order by detail.sku
Go to the top of the page
 
cheekybuddha
post May 28 2019, 04:55 PM
Post#17


UtterAccess VIP
Posts: 11,419
Joined: 6-December 03
From: Telegraph Hill


I've just seen that you are posting the PHP/MySQL forum.

What database are you running on?

--------------------


Regards,

David Marten
Go to the top of the page
 
lclayton1997
post May 29 2019, 12:31 PM
Post#18



Posts: 162
Joined: 2-September 15



I am only given the SQL Query Tool by my work.
I don't have any database other than to connect by ODBC and write SQL in the query tool.
Makes it very difficult.

Where would I post that?

Thanks for your help.

I have the code now that gives me the correct prod_id.
I ran those against the P table and they came up with the correct information.
But
when I run this code, it still is not giving me the Prod_ID from the second Table.

the one piece i have not added to the code yet to get the product number is this:
substr(digits(p.CORP_ID),4,2) || '-'||
substr(digits(p.SERIES_nbr),6,5) || '-'||
substr(digits(p.PRODuct_series_SEQ),3,3) as sku

I figured I would add it after I could get any of it to come in from the second table.


CODE
SELECT*
FROM(SELECT
  r.own_co_code,
  r.sell_co_code,
  r.ord_id,
REPLACE(LTRIM(REPLACE(  left(r.returned_sku,6),'0',' ')),' ','0') as sku,
  r.return_date,
  r.pick_location
  
from aradmin.bge_returns r

WHERE r.return_date BETWEEN '2019-03-30' and '2019-05-04'
  AND r.cntry_code = '01'
  AND r.pick_location = 'REPAK'
  AND r.own_co_code <> '18')

as DETAIL

left JOIN  aradmin.product
  ON  detail.sku = aradmin.product.prod_id
  and detail.sku in
( select p.prod_id
from aradmin.product as p
group by p.prod_id
having count(p.prod_id)=1)

order by detail.sku

This post has been edited by lclayton1997: May 29 2019, 12:48 PM
Go to the top of the page
 
lclayton1997
post May 29 2019, 01:30 PM
Post#19



Posts: 162
Joined: 2-September 15



I solved it.
My table two had two country codes hence the duplicates.
once I did a where in the code it worked.

CODE
SELECT*
FROM(SELECT
  r.own_co_code,
  r.sell_co_code,
  r.ord_id,
REPLACE(LTRIM(REPLACE(  left(r.returned_sku,6),'0',' ')),' ','0') as sku,
  r.return_date,
  r.pick_location
  
from aradmin.bge_returns r

WHERE r.return_date BETWEEN '2019-03-30' and '2019-05-03'
  AND r.cntry_code = '01'
  AND r.pick_location = 'REPAK'
  AND r.cntry_code = '01'
  AND r.own_co_code <> '18')

as DETAIL

left JOIN aradmin.product
  ON  detail.sku = aradmin.product.prod_id
and aradmin.product.cntry_code = '01'


order by detail.sku
Go to the top of the page
 


Custom Search


RSSSearch   Top   Lo-Fi    19th July 2019 - 05:07 AM