After we have collected a set of data, we have to begin describing it. To start, we are interested in two broad characterizations of data: measures of central tendency (or location) and measures of dispersion.
Measures of central tendency attempt to get a handle on the "middle" or representative observation. We will use three of them: the mode, the median, and the mean. Even for one data set, the three measures can give different answers, because each measure looks at a different aspect of "middle-ness."
For instance, let's say we collected the scores on a recent exam and found them to be: 75, 83, 98, 43, 92, 83, 85, 90, and 87. What is the "middle" score? Let's use the three measures of central tendency developed in class to find the answer.
Mode--is the value with the maximum frequency in the data set, or the value that occurs most often.
In our example the mode is 83, since it occurs twice and every other score occurs only once.
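As a quick check of the mode, here is a minimal Python sketch that tallies the exam scores. The variable names are illustrative only, and the sketch keeps every value that ties for the top frequency, which is one reasonable convention rather than the only one.

```python
from collections import Counter

scores = [75, 83, 98, 43, 92, 83, 85, 90, 87]

# Count how often each score occurs.
counts = Counter(scores)
top = max(counts.values())

# Keep every value that reaches the top frequency, so ties are not hidden.
modes = sorted(value for value, freq in counts.items() if freq == top)

print(modes, top)  # [83] 2
```

If two scores had each occurred twice, `modes` would list both of them, and the data would be called bimodal.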
Median--is a positional rank, or the value having the same number of observations above and below it. This requires that the observations be measured at least at the ordinal level.
Thus for our exam score example, we first must order the data: 43, 75, 83, 83, 85, 87, 90, 92, 98. The median here is 85, because there are four observations below the score 85 and four above it. Had there been an even number of observations in the data set, we simply would have averaged the middle two to come up with the median. (For example, if another student took the exam and got a 93, we would have averaged 85 and 87 to come up with a median of 86.)
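The ordering-then-counting procedure for the median can be sketched in a few lines of Python. The function name `median` is just a label chosen here; it sorts the data, takes the middle value when the number of observations is odd, and averages the two middle values when it is even.

```python
def median(values):
    """Middle value of the sorted data; mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


scores = [75, 83, 98, 43, 92, 83, 85, 90, 87]

print(median(scores))         # 85
print(median(scores + [93]))  # 86.0
```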
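Mean ($\bar{x}$)--is an arithmetic measure defined as the sum of all observations divided by the total number of observations:

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

where $x_i$ is the i-th observation and $n$ is the total number of observations.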

In our original data set the mean is 81.8 (or approximately 82).
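The arithmetic behind the 81.8 can be reproduced with a short Python sketch. It also prints the median and mode from Python's standard `statistics` module, so the three "middles" for the exam scores can be compared side by side.

```python
import statistics

scores = [75, 83, 98, 43, 92, 83, 85, 90, 87]

# Sum of all observations divided by the number of observations: 736 / 9.
mean = sum(scores) / len(scores)
print(round(mean, 1))  # 81.8

# The one very low score (43) pulls the mean below the median and the mode.
print(statistics.median(scores), statistics.mode(scores))  # 85 83
```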
Now that we have a sense of the "middle" or typical value of the data, we want to get a sense of how the data vary around the "middle" that we just found. The measures used to describe this variation are called measures of dispersion.
We will break these measures up into two categories depending on how the data are measured.
I. For Discrete/Nominal or Ordinal Variables:
These statistics measure variation in a sample by comparing the cases (i.e.,
scores or observations) to one another.
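Variation Ratio (V)--is the ratio of non-modal observations to the total number of observations (or one minus the proportion of observations in the modal category).

$$V = 1 - \frac{f_{\text{mode}}}{N}$$

where $f_{\text{mode}}$ is the number of cases in the modal category and $N$ is the total number of cases.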
Minimum value for V: 0, where all cases are modal (a constant!)
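Maximum value for V: approaches 1, when the cases are spread so thinly that the modal category holds only a tiny share of them. (V can never reach exactly 1, because the modal category always contains at least one case.)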
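A minimal Python sketch of the variation ratio follows, assuming V is computed as one minus the modal count divided by the total number of cases, which is how the verbal definition reads. The function name and the ten-case sample are invented purely for illustration.

```python
from collections import Counter


def variation_ratio(observations):
    """Share of observations that fall outside the modal category."""
    counts = Counter(observations)
    modal_count = max(counts.values())
    return 1 - modal_count / len(observations)


# Hypothetical sample of 10 cases in three categories (made up for illustration).
sample = ["A"] * 5 + ["B"] * 3 + ["C"] * 2

print(variation_ratio(sample))      # 0.5
print(variation_ratio(["A"] * 10))  # 0.0 (a constant: every case is modal)
```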
Index of Diversity (D)--indicates the likelihood that two observations drawn at random from the sample are from different categories of the variable. The larger the number for D, the greater the dispersion.

$$D = 1 - \sum_{i=1}^{K} p_i^2$$

where:
K = number of categories of the variable
$p_i$ = proportion of cases in category i
The problem with D is that you can only compare D scores when the number of categories is held constant, because the maximum possible value of D depends on the number of categories.
Minimum value for D: 0 (when all cases fall into a single category)
Maximum value for D: (K-1)/K (a function of the number of categories of
the variable)
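The index of diversity can be sketched in Python as follows, assuming the usual form D = 1 minus the sum of squared category proportions, which agrees with the minimum and maximum values just listed. The function name and both samples are hypothetical.

```python
from collections import Counter


def index_of_diversity(observations):
    """1 minus the sum of squared category proportions."""
    counts = Counter(observations)
    n = len(observations)
    return 1 - sum((count / n) ** 2 for count in counts.values())


# Hypothetical sample: proportions 0.5, 0.3 and 0.2 across three categories.
sample = ["A"] * 5 + ["B"] * 3 + ["C"] * 2
d = index_of_diversity(sample)
print(round(d, 2))  # 0.62

# An even split over K = 3 categories reaches the maximum (K-1)/K = 2/3.
even = ["A", "B", "C"] * 4
print(round(index_of_diversity(even), 4))  # 0.6667
```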
Index of Qualitative Variation (IQV)--standardized version of D. Same interpretation as D, but IQV has a maximum value of 1, regardless of the number of categories of the variable.

$$IQV = \frac{D}{(K-1)/K} = \frac{K\left(1 - \sum_{i=1}^{K} p_i^2\right)}{K-1}$$

The improvement over D is that we can compare IQV scores across samples with different numbers of categories (K), because IQV is standardized by the maximum value D can take.
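To see the standardization at work, here is a small Python sketch that divides D by its maximum, (K-1)/K. It assumes that form of the IQV, and the optional `k` argument is a convenience added here for the case where some category of the variable has no cases in the sample.

```python
from collections import Counter


def iqv(observations, k=None):
    """Index of diversity divided by its maximum possible value, (K-1)/K."""
    counts = Counter(observations)
    n = len(observations)
    if k is None:
        # Assumes every category of the variable shows up in the sample.
        k = len(counts)
    if k < 2:
        raise ValueError("IQV needs a variable with at least two categories")
    d = 1 - sum((count / n) ** 2 for count in counts.values())
    return d / ((k - 1) / k)


# Hypothetical samples: even splits over 2 and over 3 categories both reach 1.
print(round(iqv(["A", "B"] * 5), 2))       # 1.0
print(round(iqv(["A", "B", "C"] * 4), 2))  # 1.0

# Uneven hypothetical sample with proportions 0.5, 0.3, 0.2.
print(round(iqv(["A"] * 5 + ["B"] * 3 + ["C"] * 2), 2))  # 0.93
```

Both even splits score 1 even though their D values differ (0.5 versus about 0.67), which is the sense in which IQV can be compared across different numbers of categories.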
II. For Continuous/Interval Variables:
With the exception of the range, these statistics measure variation
in the sample by comparing the scores to the mean.
Range--difference between extreme values of a distribution (high value minus low value).
R = Highest value - Lowest value.
Minimum value for the range: 0 (where all observations are on the same
value)
Maximum value for the range: the distance from the lowest to the highest
possible measurement
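For the exam scores, the range takes one line of Python. The second calculation, which drops the lowest score, is added here only to illustrate how heavily the range depends on the two extreme values.

```python
scores = [75, 83, 98, 43, 92, 83, 85, 90, 87]

# Highest value minus lowest value: 98 - 43.
score_range = max(scores) - min(scores)
print(score_range)  # 55

# Removing the single lowest score shrinks the range to 98 - 75.
without_lowest = [x for x in scores if x != min(scores)]
print(max(without_lowest) - min(without_lowest))  # 23
```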
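Average Deviation--sum of the absolute deviations of scores (cases or observations) about the mean, divided by the total number of scores. (Absolute values are needed because the signed deviations about the mean always sum to zero.)

$$AD = \frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n}$$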
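A short Python sketch of the average deviation for the exam scores follows, assuming the absolute-value form of the statistic. It first shows that the signed deviations cancel out, then averages their absolute values.

```python
scores = [75, 83, 98, 43, 92, 83, 85, 90, 87]

n = len(scores)
mean = sum(scores) / n

# Signed deviations about the mean cancel out (up to floating-point rounding).
signed_total = sum(x - mean for x in scores)
print(abs(signed_total) < 1e-9)  # True

# Averaging the absolute deviations gives a usable measure of spread.
average_deviation = sum(abs(x - mean) for x in scores) / n
print(round(average_deviation, 2))  # 10.12
```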

Variance--the sum of the squared deviations of the scores about the mean, divided by the total number of scores minus 1. (Note: we divide by (n-1) because, otherwise, the sample variance will be a biased estimator of the population variance. We will talk more later in the semester about what this means.)

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$$
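The effect of dividing by (n-1) rather than n can be seen in a small Python sketch on the exam scores. The variable names are illustrative; the hand calculation is cross-checked against `statistics.variance`, which also uses the (n-1) divisor.

```python
import statistics

scores = [75, 83, 98, 43, 92, 83, 85, 90, 87]

n = len(scores)
mean = sum(scores) / n
squared_devs = [(x - mean) ** 2 for x in scores]

# Sample variance: divide by n - 1.
sample_variance = sum(squared_devs) / (n - 1)
print(round(sample_variance, 2))  # 253.19

# Dividing by n instead gives a smaller number.
divided_by_n = sum(squared_devs) / n
print(round(divided_by_n, 2))  # 225.06

# Cross-check against the standard library.
print(abs(sample_variance - statistics.variance(scores)) < 1e-9)  # True
```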

Standard Deviation (s)--the square root of the variance. Think of it as the "average" or "typical" amount of variation around the mean. Loosely speaking, an "average" observation is about one standard deviation away from the mean.

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$$

Minimum value for s.d.: 0 (where there is no variation from the mean
at all; all cases are on the mean)
Maximum value for s.d.: approximately half of the total possible range
of measurement (for example, on a 100 point scale, max. s.d. is roughly
50)
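The following Python sketch computes the standard deviation of the exam scores and then illustrates the rough maximum described above. The "extreme" data set, with half the cases at 0 and half at 100, is invented here to show the most spread-out pattern possible on a 100-point scale.

```python
import math
import statistics

scores = [75, 83, 98, 43, 92, 83, 85, 90, 87]

# Square root of the (n - 1) variance.
s = math.sqrt(statistics.variance(scores))
print(round(s, 2))  # 15.91

# Hypothetical worst case on a 0-100 scale: half the cases at each end.
extreme = [0] * 5 + [100] * 5
print(round(statistics.pstdev(extreme), 2))  # 50.0 (dividing by n)
print(round(statistics.stdev(extreme), 2))   # 52.7 (dividing by n - 1)
```

With the (n-1) divisor the result sits slightly above 50 in a small sample, which is why the maximum is only "approximately" half the possible range.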
Chebychev's Inequality Theorem:
For ANY distribution, at least $(1 - 1/k^2) \times 100\%$ of the observations lie within k standard deviations of the mean (for any k greater than 1).
At least 75% lie within 2 standard deviations
At least 89% lie within 3 standard deviations.
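The bound can be checked against the exam scores with a short Python sketch. The helper names are illustrative, and the standard deviation is the (n-1) version; the point is only that the observed share within k standard deviations should never fall below the guaranteed minimum.

```python
import statistics


def chebyshev_bound(k):
    """Minimum share of observations within k standard deviations (k > 1)."""
    return 1 - 1 / k ** 2


def share_within(values, k):
    """Observed share of values within k standard deviations of the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    inside = [x for x in values if abs(x - mean) <= k * sd]
    return len(inside) / len(values)


scores = [75, 83, 98, 43, 92, 83, 85, 90, 87]

for k in (2, 3):
    print(k, round(chebyshev_bound(k), 3), round(share_within(scores, k), 3))
# 2 0.75 0.889
# 3 0.889 1.0
```

Only the score of 43 falls more than two standard deviations from the mean, so 8 of the 9 scores (about 89%) are inside, comfortably above the 75% minimum.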
For Normally Distributed data:
About 68% lie within ONE standard deviation
About 95% lie within 2
About 99.7% lie within 3.
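These percentages come from the normal curve itself and can be reproduced with Python's standard `math.erf` function, using the identity that the share of a normal distribution within k standard deviations of the mean equals erf(k/√2). The function name below is chosen here for illustration.

```python
import math


def normal_share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(k, round(100 * normal_share_within(k), 1))
# 1 68.3
# 2 95.4
# 3 99.7
```

Compared with the Chebyshev minimums of 75% and 89%, the normal figures are much higher, because knowing the shape of the distribution allows a far more precise statement.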
Posted February 18, 1998
Copyright Chris Fastnow