Full Version: Aggregate Totals Of All Combinations
UtterAccess Discussion Forums > Microsoft® Access > Access Queries
lex
Hello.

I have four columns and want to get aggregate totals (min/max/avg) for every combination of values in the four columns. What is the best way of doing this? Do I need to write vba to do this?

Thank you for your time.
Lex
Doug Steele
CODE
SELECT Min(Field1) AS MinField1, Max(Field1) AS MaxField1, Avg(Field1) AS AvgField1,
Min(Field2) AS MinField2, Max(Field2) AS MaxField2, Avg(Field2) AS AvgField2,
Min(Field3) AS MinField3, Max(Field3) AS MaxField3, Avg(Field3) AS AvgField3,
Min(Field4) AS MinField4, Max(Field4) AS MaxField4, Avg(Field4) AS AvgField4
FROM MyTable
lex
Thanks, Doug - but I don't think I was being clear.

I have a dataset that includes four columns (Route, LatPos, Bound, LayerTyp), and there many be many different Routes, LatPos's, Bound's, and LayerTyp's (user definable).

I would think I need to write code to create the sql statement on the fly, looping through each one, creating a different sql statement for each combination - unless there is another way in access.

Am I making sense?

Thanks
MadPiet
QUOTE (lex @ May 23 2012, 04:14 PM) *
Thanks, Doug - but I don't think I was being clear.

I have a dataset that includes four columns (Route, LatPos, Bound, LayerTyp), and there many be many different Routes, LatPos's, Bound's, and LayerTyp's (user definable).

I would think I need to write code to create the sql statement on the fly, looping through each one, creating a different sql statement for each combination - unless there is another way in access.

Am I making sense?

Thanks


Sounds to me like you're looking for at least a partial cartesian product. No VBA required. If all the possible values for (Route, LatPos, Bound, LayerTyp) are in one or more tables, you could do something like this:

SELECT t1.Route, t2.LatPos, t3.Bound, t4.LayerTyp
FROM t1, t2, t3, t4
WHERE ... <apply any filters you need here>

effectively what happens is you get ALL possible combinations of the four sets. Then you can filter that any way you like. The easiest way to get your head around this is to start with a really small table or two (like 2-4 records per table), and play around with the queries so you can see and visually verify the results. Then once you are sure that you are getting the results you want, select from the real datasets/tables...
lex
Thanks, MadPiet.

But to formulate the sql, I need to know the values in each set - therefore, I must first query the each of the four columns, then formulate the combinations writing the sql on the fly - I don't see any other way of doing it other than writing vba.

I'm trying to automate it.

Again, thanks
Lex
Doug Steele
Maybe it's just me, but it's definitely not clear what you're trying to do.

Can you present a small example? (Note that I'm not talking about attaching a sample database. I'm suggesting giving a small example in words, showing some sample data)
lex
ID, Route, LatPos, Bound, LayerTyp, Thickness
1, I-275, Shoulder, NB, Agg, 6.0
2, I-275, Main1, SB, Sand, 10
3, I-275, Main1, SB, Sand, 9
4, I-275, Main1, SB, Agg, 4
5, I-275, Main2, NB, Sand, 10
6, I-94, Main2, SB, Sand, 10
7, I-94, Main2, SB, Agg, 4
8, I-94, Main2, SB, HMA, 4
9, I-275, Main2, EB, Concrete, 6
... (hope that enough to get the point across)

I want to formulate every combination of the VALUES for the Route (ie there are two - I-275 and I-94), LatPos (there are three), Bound (there are three), LayerTyp (there are four) and get aggregate totals (min/max/avg) for the thicknesses.

Again, I'm trying to automate it, and don't know all the values as I sit here.

Anyhow - thanks.
Lex
RAZMaddaz
If I do a Crosstab Query and do what I think you want, below is what I get. Is this anything like what you want?


Route LatPos LayerTyp EB NB SB Total Of Thickness
I-275 Main1 Agg 4 4
I-275 Main1 Sand 19 19
I-275 Main2 Concrete 6 6
I-275 Main2 Sand 10 10
I-275 Shoulder Agg 6 6
I-94 Main2 Agg 4 4
I-94 Main2 HMA 4 4
I-94 Main2 Sand 10 10
lex
Thanks, RAZMaddaz - but I think there are more combinations than that.

Combining the two Routes with the 3 LatPos - gives 6 combinations
those 6 with the 3 bounds - gives 18 combinations
those 18 with the 4 LayerTypes - gives 72 combinations

I've not worked with Crosstabs before - do you think it will do that?

Lex
RAZMaddaz
Try using the Crosstab wizard and see if you are to get more combinations, remember I'm only using the data you supplied above.

RAZMaddaz
RAZMaddaz
Here is a copy of the Table and Query I created. It is in Access2007, so if you have something before Access2007, let me know and I'll save the dbase in an earlier version.
lex
Thanks RAZMaddaz. Again, I've not used crosstabs before - but will try it (may be a while before I get into it).

So you entered it just I showed? Again - I see 72 combinations - so I don't see why the output you show below only has 8 rows. There should be 72 rows with min/max/avg thicknesses for each combination (given I didn't generate much data - the min/max/ave may all the close to the same - but there should still be 72 rows).

Thanks again
Doug Steele
From what you're describing, it sounds as though MadPiet's correct that you need to use a Cartesian Product. Did you try his suggested SQL?
MadPiet
Here's the Cartesian Product one... it's just a query with a bunch of tables but no joins between them...

CODE
SELECT DISTINCT LexData.Route, LexData_1.LatPos, LexData_2.LayerTyp, LexData_3.EB AS Thickness
FROM LexData, LexData AS LexData_1, LexData AS LexData_2, LexData AS LexData_3;


The DISTINCT is there to remove the duplicates that I was getting for some reason - probably because the nature of a Cartesian Product query, but no matter. I just dropped in your table "Lex_Data" 4 times and didn't join them (as you can see), and I got I think 96 records. then you can just create a simple totals query to get MIN, MAX and AVG of Thickness or whatever, and just group by the other columns. The numbers didn't look like they made sense, but then I didn't really look at them - maybe because it's fake data.

CODE
SELECT qcpRoute_LatPos_LayerType_Thickness.Route, qcpRoute_LatPos_LayerType_Thickness.LatPos, qcpRoute_LatPos_LayerType_Thickness.LayerTyp, Min(qcpRoute_LatPos_LayerType_Thickness.Thickness) AS MinOfThickness, Max(qcpRoute_LatPos_LayerType_Thickness.Thickness) AS MaxOfThickness, Avg(qcpRoute_LatPos_LayerType_Thickness.Thickness) AS AvgOfThickness
FROM qcpRoute_LatPos_LayerType_Thickness
GROUP BY qcpRoute_LatPos_LayerType_Thickness.Route, qcpRoute_LatPos_LayerType_Thickness.LatPos, qcpRoute_LatPos_LayerType_Thickness.LayerTyp;


I could be wrong, but I think that's at least close to what you're after. Crosstab is a red herring... that's an entirely different beast.
lex
Hello, again. Sorry for taking so long to reply. I'm a dirt (geotech) engineer by trade and have a full-time job... so sometimes it takes me a while to get to things.

I tried the crosstab and *think*/*thought* that was working (I didn't check against anything real), but then got MadPiet's sql and statement

QUOTE
"Crosstab is a red herring... that's an entirely different beast"


If you could elaborate on the "red herring" - is it not what I should be doing?

In the crosstab, I noticed TRANSFORM and PIVOT and don't know much about it - so I have to look into it.

I don't quite get your cartesian below (when I have somemore time I will try it and report back).

Thanks to all for your comments... please stay tuned as I hope to develop this more from your kinds offerings.

Lex

Doug Steele
"Red Herring" may not have been the appropriate terminology: "may not be what you need in this situation" might be more appropriate.

A cross-tab query only summarizes data that exists in the table. Since you want to summarize data that doesn't exist in the table, you need to use a Cartesian Product to ensure that all combinations do, in fact, exist. Not sure whether my articles Uses for Cartesian Products in MS Access and Reporting what's not there would help or hinder the explaination!

That being said, once you've created a query that returns a Cartesian Product, you could use a crosstab query against it for certain.

(Thinking a little more about this, I'm hard-pressed to understand what you're trying to do! Why do you need to include readings you have not obtained in your summary of the data?)
lex
Thanks, Doug.

Let me just say - I'm honored to have you reading my post. You've already provided me alot of help with your web postings - so Thanks.

I took a brief look at your article but will need to have more time to read it (it's rather long) - but maybe I can address your question

QUOTE
Why do you need to include readings you have not obtained in your summary of the data?


I think what you mean (restating?) is Why would I report nulls (get aggregates on nulls). Several reasons - but probably the first is I'm not smart enough to do otherwise. I'm working in a field that is very subjective and changing. My sample data below is not true data (if more time - I could post additional - but it's ubiquitous). The columns stated are just a start, but essentially, my goal is to allow the user to select the combinations of the columns to obtain aggregate totals (ie specific value of "Route" AND specific value of "Route", AND specific value of "LatPos, AND specific value of "Bound with a cross-tab on "LayerTyp".

Anyhow - thanks again. I obviously need more time to think - and hopefully will have some this weekend.

Have a great holiday!

Lex
Doug Steele
Unless the sample data you posted is extremely similar to your true data, not posting real data can often confuse the issue...

Looking at the data you posted, I don't see what purpose calculating minimum and maximum serves. I-275 is the minimum route, I-94 is the maximum. So what? And how do you expect to get an average for text values? And if no instances of particular layer type exist on a given route, what point to calculating minimum, maximum and average thicknesses of that layer type for that route?

Actually, this isn't a holiday here in Canada: last Monday (May 21) was (it commemorated the birthday of Queen Victoria, who was queen of England when Canada was granted independence.) But then, I just recently retired, so every day is a holiday! laugh.gif You have a good holiday though.
lex
Hello again, Doug.

Sorry for not posting real data. I don't know how much space on the site here I should devote to my data. Maybe I should have posted a file. Also, this is a very long story, so I don't always feel it is beneficial to post all the gory details - but have decided to type some out here.

Click to view attachment

Attached is the file I am working with. It is actually data out of a geotechnical program that is based on Access. There are datapoints missing that should be added to tables within the other program. I have to pitch this to those working with it as they don't "get" things like normalization, etc. The program documents soil (and pavement) strata from the field. As a note - soil is a very non-exact science and I believe those working with it tend to act in that manner. I have been struggling with that for years.

The office has been looking at this data, using excel to determine aggregate totals for the thicknesses of "layers" (again - non-exact - a layer is not always well defined),and outputting to a Word document in memo format (aggregate totals actually end up in a table within Word). There are "layers" defined from the field, then there are different "layers" defined in the office based on what the field says (even what and who the field is gets convoluted as many different people touch the data before it gets to this point).

My plan of the "pitch" is to show benefit using Access (create dog-and-pony to show benefit for change), and then *possibly* change the way they create/manage their data.

So, the pertinent data from this program is as follows

table POINT, GintRecID, PointID, Location

table LITHOLOGY, GintRecID, PointID, Depth, Description

Again, the above tables are output from another program.

Within the POINT.location lies the Route, LatPos, and Bound data (basically, they look at it and type/sort in Excel).
within the LITHOLOGY.Description lies the LayerTyp
within the LITHOLOGY.Depth lies the Thickness

My plan was to create temporary tables to append POINT and LITHOLOGY using values from lookups (one for each of the Route, LatPos, Bound, LayerTyp) and forms to allow them to read from the tables and select appropriate values. The thickness is (I *believe*) is code that I must tackle by once they select the LayerTyp, finding adjacent layers of the same LayerTyp, determining the thickness, and perform the necessary aggregate totals and output them.

Now, *maybe* I have developed my problem statement appropriately... ? I'm sorry if I have made this confusing.

In addition to the above, I was planning on having the system propose values for Route, LatPos, Bound, and LayerTyp to the user for verification. This would require searching for specific values within POINT.location and LITHOLOGY.Description, showing the proposed values for verification/updating within my proposed tables tblLithAddndm and tblPointAddndm.


Getting back to some of your questions...

As I sit here, I am not sure what values of Route, LatPos, Bound, and LayerTyp will exist for the dataset - so my plan was to display aggregate totals of every combination of them (including nulls), and let them cut/paste or type into MS Word.

I read through your article and found the statement

QUOTE
"a crosstab query will generate Null values if there is no value for a particular intersection of the left-side field and the top field, like regular Select queries".


I believe that you took issue with my wanting to generate nulls - but given that my users are reading and transcribing (under my current plan) I don't see an issue. A null simply means that the combination doesn't exist.

With respect to

"
QUOTE
I don't see what purpose calculating minimum and maximum serves."


I think you can see now that I'm only calculating aggregate totals on the thickness.

So, in summarizing, I think I'm putting the cart before the horse a little. I should figure out how to get my "layer" thicknesses first (I'm thinking code to find adjacent "layers" that are the same), then try to get my aggregate totals. If you have any thoughts at all about what I'm doing - please share them.

I'm sorry for not realizing you were not in the US. I easily forget that I may be talking with those outside the US. I'm glad to hear that you are retired! I hope you enjoy every minute - again - thanks for your help. I enjoy figuring these things out too.

Lex

This is a "lo-fi" version of our main content. To view the full version with more information, formatting and images, please click here.