Description

Computes basic parametric statistics on one or more arrays.

Return Type

A Variant value.  

A Variant holding a scalar or array of BASIC_STATISTICS UDTs. If ArrayToTest is a 1-D array, the returned Variant will be a scalar UDT. If ArrayToTest is an N-D array (where N is 2 or more), the returned Variant will be an (N-1)-dimensional array of UDTs whose shape is identical to the shape of ArrayToTest up to, but not including, the fastest-moving dimension.

Syntax

object.McBasicStatistics (ArrayToTest)

The McBasicStatistics Method syntax has these parts:

Part           Description
object         An expression evaluating to an object of type McOMGlobal.
ArrayToTest    Required. A Variant value.

An array of one or more dimensions holding the values on which statistics are to be computed along the fastest-moving dimension. All dimensions except the analyzed one must be of one size.

Remarks

This method computes parametric statistics on one or more arrays of values and returns one or more BASIC_STATISTICS structures filled with the results. For non-parametric statistics, use the McRankedValues method; for basic histogram statistics, use the McHistogramStatistics method.

When passed a 1-D array, the routine returns a single scalar BASIC_STATISTICS structure with the results of statistics taken on the values of the source array. When passed a 2-D or higher dimensioned array, statistics are computed along the fastest-moving dimension (the leftmost in VB, the rightmost in C/C++). This dimension may be variable in size, which can happen if ArrayToTest is itself an array of Variant, where each Variant is an array (the McLineProfiles.ProfileValues property used in the Examples is exposed in this way). The result returned for arrays of two or more dimensions is an array of BASIC_STATISTICS structures with dimensionality one less than that of ArrayToTest (i.e., the fastest-moving dimension is replaced by the BASIC_STATISTICS results).
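The dimension-reducing behavior described above can be illustrated outside the object model. The following is a minimal Python sketch, not the McBasicStatistics implementation: the helper name reduce_fastest_dimension is hypothetical, nested lists stand in for the Variant arrays, and a plain mean stands in for the full BASIC_STATISTICS structure. In nested Python lists the innermost level corresponds to the fastest-moving dimension (the rightmost index, as in C/C++).

```python
from statistics import fmean

def reduce_fastest_dimension(values, summarize):
    """Apply `summarize` to every innermost list of a nested list.

    Hypothetical helper: the innermost level plays the role of the
    fastest-moving dimension and is replaced by one result per list.
    """
    # A flat sequence of numbers is the 1-D case: return one scalar result.
    if not values or not isinstance(values[0], (list, tuple)):
        return summarize(values)
    # Otherwise recurse, keeping every outer dimension intact.
    return [reduce_fastest_dimension(row, summarize) for row in values]

# 1-D input gives a single result.
single = reduce_fastest_dimension([1.0, 2.0, 3.0], fmean)

# 2-D input whose rows differ in length gives a 1-D list of results.
per_row = reduce_fastest_dimension([[1.0, 2.0, 3.0], [10.0, 20.0]], fmean)
```

Note that the second call passes rows of different lengths, mirroring the case in which ArrayToTest is an array of Variant arrays.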

Missing values in Single or Double ArrayToTest sources are detected (see McMissingSingle, McMissingDouble, and McIsMissingValue). All such values are skipped and their number is placed in the BASIC_STATISTICS.CountOfMissing field. The BASIC_STATISTICS.Count field is a count of all non-missing values analyzed and is the “N” used for all statistics. Should all values be missing, then BASIC_STATISTICS.Count will be zero, BASIC_STATISTICS.CountOfMissing will be the length of the source array and all other fields of the returned BASIC_STATISTICS structure are filled with the McMissingDouble value. If the source array is empty (zero-length), then BASIC_STATISTICS.Count will be zero, BASIC_STATISTICS.CountOfMissing will be zero and all other fields of the returned BASIC_STATISTICS structure will also be zero.
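The three cases described above (some values missing, all values missing, empty source) can be sketched as follows. This is an illustration only: NaN is used as an assumed stand-in for McMissingDouble, the function name count_and_mean is hypothetical, and "Mean" is a hypothetical field name chosen to represent "all other fields" of the structure.

```python
import math

MISSING = float("nan")  # assumption: stand-in for McMissingDouble

def count_and_mean(values):
    """Return Count, CountOfMissing and a sample statistic for a 1-D list."""
    present = [v for v in values if not math.isnan(v)]
    count = len(present)                      # the "N" used for statistics
    count_of_missing = len(values) - count    # skipped values
    if not values:
        # Empty source: every field is zero.
        mean = 0.0
    elif count == 0:
        # Every value missing: statistic fields hold the missing marker.
        mean = MISSING
    else:
        mean = math.fsum(present) / count
    return {"Count": count, "CountOfMissing": count_of_missing, "Mean": mean}
```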

The actual statistical computations are standard, but a few items deserve discussion. Variance is computed using a “corrected two-pass algorithm” (Chan, Golub and LeVeque, 1983, American Statistician, vol. 37, pp. 242-247), which tends to minimize rounding error. Variance (the second moment) is averaged by dividing the sum of squared deviations from the mean by N (BASIC_STATISTICS.Count), not N-1. If you want the unbiased estimate of the population variance (the sum of squared deviations divided by N-1), multiply BASIC_STATISTICS.Variance by N/(N-1).
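For readers unfamiliar with the corrected two-pass formula, here is a minimal Python sketch of the technique as it is usually stated. It is not the library's code; the function names are hypothetical. The first pass computes the mean, and the second pass accumulates both the deviations and the squared deviations. The sum of deviations would be exactly zero in exact arithmetic, so subtracting its square divided by N compensates for rounding error in the mean.

```python
def corrected_two_pass_variance(values):
    """Variance averaged over N, using the corrected two-pass formula."""
    n = len(values)
    if n == 0:
        return 0.0
    # Pass 1: the mean.
    mean = sum(values) / n
    # Pass 2: sums of deviations and of squared deviations.
    sum_dev = 0.0
    sum_sq = 0.0
    for v in values:
        d = v - mean
        sum_dev += d
        sum_sq += d * d
    # The correction term removes the effect of rounding error in `mean`.
    return (sum_sq - sum_dev * sum_dev / n) / n

def unbiased_variance(values):
    """Rescale the divide-by-N variance by N/(N-1); needs at least 2 values."""
    n = len(values)
    return corrected_two_pass_variance(values) * n / (n - 1)
```

For the values 2, 4, 4, 4, 5, 5, 7, 9 the mean is 5 and the sum of squared deviations is 32, so the divide-by-N variance is 4 and the rescaled value is 32/7.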

The third moment, Skew, is presented in two ways. BASIC_STATISTICS.RawSkew is the average cubed deviation from the mean. BASIC_STATISTICS.NormalizedSkew is the above value divided by the standard deviation (BASIC_STATISTICS.StdDev) cubed, which results in a dimensionless quantity with a mean of zero and a standard deviation on the order of sqrt(15/N) for Gaussian distributions. It will be negative for skew to the left and positive for skew to the right.
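The two skew quantities can be written out directly from the definitions above. The following Python sketch is illustrative only; the function name skew is hypothetical, and it assumes a non-empty input whose values are not all equal (otherwise the standard deviation is zero and the normalized value is undefined).

```python
import math

def skew(values):
    """Return (raw_skew, normalized_skew) for a list of numbers."""
    n = len(values)
    mean = math.fsum(values) / n
    devs = [v - mean for v in values]
    variance = math.fsum(d * d for d in devs) / n   # divided by N, not N-1
    raw = math.fsum(d ** 3 for d in devs) / n       # average cubed deviation
    std_dev = math.sqrt(variance)
    normalized = raw / std_dev ** 3                 # dimensionless
    return raw, normalized
```

For the values 1, 2, 3, 4, 10 the mean is 4, the variance is 10, and the raw skew is 36; the normalized skew is positive because the long tail lies to the right. Mirroring the data about zero flips the sign.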

The fourth moment, Kurtosis, is also presented in two ways. BASIC_STATISTICS.RawKurtosis is the average fourth power of the deviation from the mean. BASIC_STATISTICS.NormalizedKurtosis is the above value divided by the fourth power of the standard deviation, with 3 subtracted from the result. This is a dimensionless quantity with a mean of zero and a standard deviation on the order of sqrt(96/N) for Gaussian distributions. It will be negative for flat distributions (bread-box shaped) and positive for pointy distributions.
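The kurtosis definitions above translate in the same way. This Python sketch is an illustration under the same assumptions as before (hypothetical function name, non-empty input, values not all equal); it is not the library's implementation.

```python
import math

def kurtosis(values):
    """Return (raw_kurtosis, normalized_kurtosis) for a list of numbers."""
    n = len(values)
    mean = math.fsum(values) / n
    devs = [v - mean for v in values]
    variance = math.fsum(d * d for d in devs) / n   # divided by N, not N-1
    raw = math.fsum(d ** 4 for d in devs) / n       # average 4th-power deviation
    # variance ** 2 is the fourth power of the standard deviation.
    normalized = raw / variance ** 2 - 3.0
    return raw, normalized
```

For the evenly spread values 1, 2, 3, 4, 5 the variance is 2 and the raw kurtosis is 6.8, giving a normalized value of 6.8/4 - 3 = -1.3, which is negative as expected for a flat distribution. A sample with most values at the center and a few far out, such as eight zeros plus -5 and 5, gives a positive value.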

Be aware that both Skew and Kurtosis are extremely sensitive to outliers. For more details, see Cramer, H., 1946, Mathematical Methods of Statistics (Princeton University Press), and Press, Teukolsky, Vetterling, and Flannery, “Numerical Recipes in C”, second edition (Cambridge University Press), Section 14.1.