UtterAccess HomeUtterAccess Wiki

Welcome Guest ( Log In | Register )

Custom Search
Edit Discussion
> Array Formulas    
Array Formulas

Contents

Understanding Array Formulas

I was going to title this article, “Array Formulas Made Easy” but there is no royal road to understanding them. I would also like to acknowledge the patient people who explained it to me.

What is an array formula? Don’t read more into it than the name suggests. An array formula is a formula that gets its information from an array of cells (range) rather than individual cells. So instead of using Cells B2 and C2, the formula might be using ranges B2:B17 and C2:C17.

Array formulas eliminate the use of “helper columns” to do intermediate calculations and applying formulas to them.

When not to use an array formula

There are a number of formulas built into Excel that make using array formulas unnecessary.

The simplest of these are SUM and COUNT. To add or count a single array of cells without any qualification, then these formulas work.

If there is a single criteria associated with the sum or count of elements in the array, SUMIF and COUNTIF work. These formulas add or count one range based on the values in another.

What happens when the need is to add or count the values in one range based on more than one criterion? Excel provides a “regular” formula for this too: SUMPRODUCT. SUMPRODUCT is actually an array formula in disguise and this article gets under the hood to explain how it works since it demonstrates the logic behind array formulas.

Users of Excel 2007 or later also have SUMIFS and COUNTIFS to do the work of SUMPRODUCT. For an environment in which everyone has these formulas, they work well. In a mixed environment, use SUMPRODUCT since it is understood by earlier versions of Excel.

Using array formulas

However, sometimes the need is to do more than add or count things such as: finding the maximum, minimum, average, standard deviation or other aggregates of an array based on values in other arrays. These computations require array formulas.

Array formulas are entered the same way as any other formula except that instead of pressing ENTER to “commit” the formula, press CTRL-SHIFT-ENTER. There are really only two things to know to make array formulas work: (1) False =0 and True = 1 and (2) Any multiple of zero is still zero.

The following mockup of charitable donations is used during the rest of this demonstration:

Image:Array1.jpg

Examining how SUMPRODUCT works gives a good understanding on how array formulas work.

Once again, the name says it all. SUMPRODUCT is simply the sum of the products. The first example explains how SUMPRODUCT computes the number of female contributors from the North Region.

The formula for this is

   =SUMPRODUCT ((B2:B17 = “Female”) * (D2:D17 = “North”)).

The following is a view of the data showing where the conditions are TRUE in green, and where the conditions are FALSE in red.

Image:Array2.jpg

Now remembering that TRUE = 1 and FALSE = 0, the data now looks like:

Image:Array3.jpg

To compute the SUMPRODUCT, first compute the product of the two columns (shown in the “Product” helper column), then compute their sum.

The same formula can also be written as an array formula in a slightly more generic form:

   =SUM ((B2:B17 = “Female”) * (D2:D17 = “North”))

SUMPRODUCT can do more than count; it can also add. To get the dollar amount of contributions from female donors in the North Region, add in the additional term. The formula and data now look like:

   =SUM ((B2:B17 = “Female”) * (D2:D17 = “North”) * (E2:E17))

Image:Array4.jpg

There is no criterion associated with the contribution amount. The contribution doesn’t have to be evaluated as zero or one. It is whatever value it is.

SUMPRODUCT, SUMIFS and COUNTIFS are all that is needed to get a count or sum of items (see also final notes at the end of this article).

To get the other aggregates to work, an array formula is needed. The article already presented how SUM can be used as an array formula.

   =SUM ((B2:B17 = “Female”) * (D2:D17 = “North”) * (E2:E17))

Logically, it would seem that simply replacing SUM with another aggregate would make the formula work such as replacing SUM with MAX to find the maximum contribution made by female contributors in the North Region.

On the surface, this formula seems to work in that the result is $20 which is the correct answer.

The technique falls apart when other aggregates are applied. The expected result for MIN is $1. However

   =MIN ((B2:B17 = “Female”) * (D2:D17 = “North”) * (E2:E17))

Yields zero! What went wrong?

What went wrong is all those “logical zeros”. Taking the sum of the product column is the same as:

   = SUM (0, 0, 0, 0, 0, 0, 0, 0, 20 , 0, 0, 0, 1, 0, 0, 0)

The fact that there are extra zeros in there makes no difference to SUM. However, it does make a difference to other aggregates, even the seemly correct MAX. If instead of positive values for contributions, the numbers were negative then MAX would be evaluating -20, -1 and the rest zeros. MAX then returns one of the logical zeros. This is what is happening with MIN. It is returning a logical zero.

The way around this is to exclude rows that logically evaluate to zero. The best way to create an array formula is to compute it for one row at a time, and then apply it to the entire array.

Write an equation that will either exclude the row or use its value. In this case:

  =IF(B2="Female",IF(D2="North",E2,"Exclude"),"Exclude")

Image:Array5.jpg

Applying MIN to the new helper column evaluates MIN {“Exclude”,20,1} or $1.

To rewrite the equation as an array equation, extend the ranges (E,g, B2 becomes B2:B17) and drop the “Exclude.”

   =MIN(IF(B2:B17="Female",IF(D2:D17="North",E2:E17)))

Note: =IF(AND(B2="Female",D2="North"),E2,"Exclude") will get you the same Exclude Helper Column results, but =MIN(IF(AND(B2:B17="Female",D2:D17="North"),E2:E17)) does not work. This technique only works if the model is created by nesting the IF statements.

Excluding non-qualifying rows not only works for MIN, but for all the aggregate functions. SUM is the only aggregate that can get away with using the shortcut.

A final note

The only requirement for array formulas to work is that the arrays be of the same size.

Typically array formulas are used to work with data on the same set of rows on the same worksheet; however it can also be used to compare ranges that are on other sheets, and even in other workbooks.

Array formulas can also compare offset rows where the decisions may depend on a “record’s” previous value. For example comparing range B3:B102 against C2:C101 or even comparing a column to itself: B3:B102 against B2:B101.

Array formulas can also compare rows to columns as in comparing range B2:B10 down to range C1:K1 across.

In the examples above, array formulas multiply conditions together meaning that the result is true only if each element is true. This technique is analogous to a logical AND statement. The formula could also be crafted to add the results together. This technique is analogous to a logical OR statement.

For example, to find the number of female contributors from the North and South regions, two SUMPRODUCTS can be used:

   =SUM ((B2:B17 = “Female”) * (D2:D17 = “North”)) + SUM ((B2:B17 = “Female”) * (D2:D17 = “South”))

Or the criteria can be combined.

   =SUM((B2:B17="Female")*((D2:D17="North")+(D2:D17="South")))

The latter formula essentially says:

   Count where Gender = “Female” AND (Region = “North” OR Region = “South”)
Edit Discussion
Custom Search
Thank you for your support!
This page has been accessed 9,150 times.  This page was last modified 20:38, 8 July 2013 by dflak. Contributions by Jack Leach and Glenn Lloyd  Disclaimers