UtterAccess.com
X   Site Message
(Message will auto close in 2 seconds)

Welcome to UtterAccess! Please ( Login   or   Register )

Custom Search
 
   Reply to this topicStart new topic
> Interquartile Range, Access 2016    
 
   
top_jimmy44
post Dec 21 2017, 10:38 AM
Post#1



Posts: 480
Joined: 7-December 04



I am working on a project where I need to identify outliers in a set of data and use that identification as a "trigger" to inform users that there could be an issue. My thought was to use the interquartile range to identify the outliers.

I've found a few sources online, the most promising being the following - http://access.mvps.org/access/queries/qry0019.htm

The code in that link identifies the percentile, but looks specifically at the entire data set. My data table has numbers for different categories so instead of finding percentile for the entire data set, I need to find the percentiles for each category. This function looks promising but I'm not sure how I would go about changing the code so that I can identify XPercentile by category, not for the entire data set.

Can it be done or is there another way to go about calculating this in Access?

I also read that it is possible to "Employ Excel's statistical functions from Access by calling the Excel application reference in the VBA code window. Define an array and then use the QUARTILE() function from the Excel Object Library." The only problem is I don't know how to make that happen either.

I did search for Interquartile range on UtterAccess but came back with 0 results. I guess I'm the first.

Thanks.
Jim
Go to the top of the page
 
GroverParkGeorge
post Dec 21 2017, 10:52 AM
Post#2


UA Admin
Posts: 31,632
Joined: 20-June 02
From: Newcastle, WA


Statistics is not my expertise, but speaking strictly from a SQL perspective, "...I need to find the percentiles for each category...", could you not create a union query that would bring in the calculations from subqueries filtered for each category?

--------------------
Go to the top of the page
 
top_jimmy44
post Dec 21 2017, 11:05 AM
Post#3



Posts: 480
Joined: 7-December 04



That is true. UNION ALL would bring them together, but I have over 300 categories in this data set. That may get a bit cumbersome.
Go to the top of the page
 
GroverParkGeorge
post Dec 21 2017, 11:46 AM
Post#4


UA Admin
Posts: 31,632
Joined: 20-June 02
From: Newcastle, WA


Ouch. Well, that's a non-starter.

Plan B. A work table for each category and run the final report against that.

You could open a recordset based on categories and iterate it to run the parameterized queries to put each record into it.

Temp tables tend to be a problem because of bloat, but that may be a reasonable tradeoff.

--------------------
Go to the top of the page
 
top_jimmy44
post Dec 21 2017, 11:50 AM
Post#5



Posts: 480
Joined: 7-December 04



I know that Quartile is fairly simple within Excel so I may just pull a sample data set and run it through Excel first, see if this calculation will be a good predictor of what I am trying to identify. If that proves true, I'll move forward with it. I just don't want to put in all this work to make this happen in Access only to find it doesn't help.
Go to the top of the page
 
theDBguy
post Dec 21 2017, 12:02 PM
Post#6


Access Wiki and Forums Moderator
Posts: 71,430
Joined: 19-June 07
From: SunnySandyEggo


Hi,

Pardon me for jumping in but I seem to recall posting a sample code in the forums on how to execute Excel functions from within Access. I'll see if I can remember where I put it...

--------------------
Just my 2 cents... "And if I claim to be a wise man, it surely means that I don't know" - Kansas
Microsoft Access MVP | Access Website | Access Blog | Email
Go to the top of the page
 
haresfur
post Dec 24 2017, 02:21 AM
Post#7



Posts: 225
Joined: 4-April 12
From: Bendigo, Australia


This isn't an answer to the Access part of the question, but if you use the interquartile range as a trigger then you guarantee that half your records will be identified as potential outliers. Is that really what you want?

--------------------
-- Evan
Go to the top of the page
 
top_jimmy44
post Jan 3 2018, 08:13 AM
Post#8



Posts: 480
Joined: 7-December 04



Sorry it took so long to reply. Holiday break from work keeps me offline.

I do not want over half of the records to be considered outliers. I only want "extreme outliers" meaning I want to identify something that is wrong or out of the ordinary in a major way. For example, if I were to use monthly deposits as just a example, let's say each month, for the past year, I receive a deposit from the same source anywhere between $60-75 and then the next month I receive a deposit of $250, that would be out of the ordinary. If it were 80-85, it would be more than usual, but nothing extreme.

Go to the top of the page
 
GroverParkGeorge
post Jan 3 2018, 09:09 AM
Post#9


UA Admin
Posts: 31,632
Joined: 20-June 02
From: Newcastle, WA


In order for a computer system, such as Access, to identify an "extreme outlier" from a merely "interesting outlier", you'll need to provide a specific, measurable criteria to each measured result. e.g. "Is it more than 10% greater, or smaller, than any previous value?" or "Is it more than 10 points higher, or lower, than the average of all previous values. Or whatever "extreme outlier" means to you.

Unlike people, computers can't apply situational logic without a set of rules to follow. My favorite analogy is that ambiguity is required for humor, poetry and politics. It is anathema to computers.

So, calculate your values, max, min, average, mean, whatever. Define the ranges of "normal" and "abnormal" results. Place those ranges into a table against which you can compare those calculated results.

--------------------
Go to the top of the page
 
top_jimmy44
post Jan 3 2018, 02:24 PM
Post#10



Posts: 480
Joined: 7-December 04



Sounds like a plan. I did find another interesting problem though.

My initial thought was to identify my counts by week number, then identify the current week using Now(), last week using Now()-1, last 4 weeks using Now()-4, etc. Came back from holiday break and my queries were showing nothing. I was comparing last week counts to the mean, mean for last 4 weeks but since we are in Week 1 of 2018, Now()-1 = 0 (week 0 does not exist).

Is there a workaround for this? For example, if I want to compare the total for week 1 of 2018 to the total from week 53 of 2017, or if I want to compare total for week 1 of 2018 to the last 4 weeks of 2017 (50, 51, 52, 53), how would that work? I'm thinking I'd need to just ignore the week numbers and work with DateDiff and identify the prior week and last 4 weeks by range.
Go to the top of the page
 
GroverParkGeorge
post Jan 3 2018, 03:11 PM
Post#11


UA Admin
Posts: 31,632
Joined: 20-June 02
From: Newcastle, WA


There is a way to do this, I think, by combining both Week and Year, Year([YourDateGoesHere]) & DatePart("WW", [YourDateGoesHere])

That would yield, for example, 20181 from

Year(#1/3/2018#) & DatePart("ww",#1/3/2018#)

Further refinement might be useful, but that would generate unique values for every week in every year within a typical range, i.e. one that your data is likely to include.

--------------------
Go to the top of the page
 


Custom Search
RSSSearch   Top   Lo-Fi    21st January 2018 - 03:39 PM