Beyond measures of central tendency (mean, median), statistics uses measures of dispersion and distribution shape to describe a data set.
The standard deviation (σ) is a number that quantifies the amount of variation or dispersion of a set of data values.
It is calculated as the square root of the variance, which is the average of the squared differences from the mean.
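As a rough sketch, this calculation can be written in a few lines of Python; the data values below and the population (divide-by-n) form of the variance are assumptions made for the example.

```python
import math

def std_dev(values):
    # Variance: the average of the squared differences from the mean
    mean = sum(values) / len(values)
    variance = sum((x - mean) ** 2 for x in values) / len(values)
    # Standard deviation: the square root of the variance
    return math.sqrt(variance)

scores = [72, 85, 90, 68, 95, 80]  # hypothetical data set
print(std_dev(scores))             # ≈ 9.5
```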
The normal distribution, often called the bell curve, is a very common and important probability distribution in statistics. It is symmetric about the mean. Many natural phenomena, such as height, blood pressure, and measurement errors, are approximately normally distributed.
For a normal distribution, a predictable percentage of the data falls within a certain number of standard deviations of the mean:
- About 68% of the data falls within one standard deviation of the mean.
- About 95% falls within two standard deviations.
- About 99.7% falls within three standard deviations.
This is known as the empirical rule (or the 68-95-99.7 rule).
This rule is a powerful tool for quickly assessing the spread of data and identifying outliers in a normally distributed data set.
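As a quick sanity check of the rule, the sketch below (assuming NumPy is available) draws a large synthetic normal sample and measures how much of it lies within one, two, and three standard deviations of the mean; the sample size, seed, and parameters are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=80, scale=5, size=100_000)  # mean 80, standard deviation 5

mean, sd = data.mean(), data.std()
for k in (1, 2, 3):
    # Fraction of values lying within k standard deviations of the mean
    within = np.mean(np.abs(data - mean) <= k * sd)
    print(f"within {k} sd: {within:.1%}")  # comes out near 68%, 95%, 99.7%
```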
If a data set has a very small standard deviation, what does this tell you about the data points?
In a perfect normal distribution, what is the relationship between the mean, median, and mode?
The scores on a test are normally distributed with a mean of 80 and a standard deviation of 5. According to the empirical rule, what percentage of students scored between 75 and 85?
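For the last question, the interval from 75 to 85 is one standard deviation on either side of the mean, so the empirical rule gives about 68%. A quick numeric check of the exact normal probability, assuming SciPy is available:

```python
from scipy.stats import norm

# P(75 <= X <= 85) for X ~ Normal(mean=80, sd=5)
p = norm.cdf(85, loc=80, scale=5) - norm.cdf(75, loc=80, scale=5)
print(f"{p:.1%}")  # about 68.3%
```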