Variability

Variability and its properties

Mahesha Godekere Distribution

Measures of variability describe how spread out the values in a dataset are. The most common measures of variability are the range, the interquartile range (IQR), the variance, and the standard deviation.

Range

The range measures the total spread of values in a quantitative dataset. Unlike other, more popular measures of dispersion, the range measures total dispersion quite literally: it is the difference between the largest and the smallest value in the dataset.

\( \text{Range} = X_{max} - X_{min} \)

where \(X_{max}\) is the maximum value in the dataset and \(X_{min}\) is the minimum value.
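As a quick illustration, here is a minimal Python sketch of the range formula, applied to the dataset used in Example 1 below. The function name `data_range` is a hypothetical choice (it avoids shadowing Python's built-in `range`).

```python
def data_range(values):
    """Range = X_max - X_min: the total spread of a dataset."""
    if not values:
        raise ValueError("the range is undefined for an empty dataset")
    return max(values) - min(values)


data = [6, 47, 49, 15, 43, 41, 7, 39, 43, 41, 36]
print(data_range(data))  # 49 - 6 = 43
```

Because only the two extreme values enter the calculation, the other nine values could change freely without affecting the result.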

Mid-Range

The mid-range of a set of statistical data values is the arithmetic mean of the maximum and minimum values in the dataset:

\( \text{Mid-range} = \frac{X_{max} + X_{min}}{2} \)

where \(X_{max}\) is the maximum value in the dataset and \(X_{min}\) is the minimum value.
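A short sketch of the mid-range, again using the Example 1 data. It also prints half the range, \((X_{max} - X_{min})/2\), to show that the two quantities differ: the mid-range is a location (a central value), while half the range is a spread. The function name `mid_range` is hypothetical.

```python
def mid_range(values):
    """Mid-range = (X_max + X_min) / 2, the midpoint of the extremes."""
    if not values:
        raise ValueError("the mid-range is undefined for an empty dataset")
    return (max(values) + min(values)) / 2


data = [6, 47, 49, 15, 43, 41, 7, 39, 43, 41, 36]
print(mid_range(data))                  # (49 + 6) / 2 = 27.5
print((max(data) - min(data)) / 2)      # half the range: (49 - 6) / 2 = 21.5
```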

Variance

Variance measures how far the values in a dataset are spread out from their mean. For a population of N values with mean μ, the variance \(σ^2\) is the sum of the squared distances of each value from the mean, divided by N:

\( \text{Variance} = \sigma^2 = \frac{\sum (X - \mu)^2}{N} \)

Figure: samples from two populations with the same mean but different variances. The red population has mean 100 and variance 100 (SD = 10); the blue population has mean 100 and variance 2500 (SD = 50).
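The population variance formula translates directly into code. This sketch uses a small illustrative dataset (not from the original text) with mean 5 and checks the hand-written function against `statistics.pvariance` from Python's standard library, which computes the same population quantity. The name `population_variance` is hypothetical.

```python
import statistics


def population_variance(values):
    """sigma^2 = sum((X - mu)^2) / N: mean squared distance from the mean."""
    n = len(values)
    if n == 0:
        raise ValueError("variance is undefined for an empty dataset")
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values; mean mu = 5
# Squared distances: 9, 1, 1, 1, 0, 0, 4, 16 -> sum 32, divided by N = 8
print(population_variance(data))  # 4.0
# The standard-library population variance should agree.
print(statistics.pvariance(data))
```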

Standard Deviation

Standard deviation (σ) quantifies the amount of variation, or spread, in a set of data values. It is the square root of the variance, so it is expressed in the same units as the data; informally, it describes the typical distance between the data values and the mean.

A low standard deviation indicates that the data points tend to be close to the mean, while a high standard deviation indicates that the data points are spread out over a wider range of values.

\( \text{Population standard deviation} = \sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}} \)

\( \text{Sample standard deviation} = s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} \)

where \( \bar{x} \) is the sample mean and n is the number of values in the sample.
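To contrast the two formulas above, this sketch computes both on the same illustrative dataset and cross-checks them against `statistics.pstdev` (population) and `statistics.stdev` (sample, n - 1 denominator). The helper names `population_sd` and `sample_sd` are hypothetical.

```python
import math
import statistics


def population_sd(values):
    """sigma = sqrt(sum((X - mu)^2) / N)."""
    n = len(values)
    if n == 0:
        raise ValueError("need at least one value")
    mu = sum(values) / n
    return math.sqrt(sum((x - mu) ** 2 for x in values) / n)


def sample_sd(values):
    """s = sqrt(sum((x - x_bar)^2) / (n - 1)), with Bessel's correction."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values for a sample SD")
    x_bar = sum(values) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in values) / (n - 1))


data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values
print(population_sd(data))  # sqrt(32 / 8) = 2.0
print(sample_sd(data))      # sqrt(32 / 7), about 2.138
print(statistics.pstdev(data), statistics.stdev(data))
```

The sample version is always slightly larger for the same data, because dividing by n - 1 instead of n compensates for measuring distances from the sample mean rather than the unknown population mean.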

Figure: a normal distribution shaded by distance from the mean. For a normal distribution, values within one standard deviation of the mean (dark blue) account for 68.27 percent of the set; within two standard deviations (medium and dark blue), 95.45 percent; within three standard deviations (light, medium, and dark blue), 99.73 percent; and within four standard deviations, 99.994 percent.
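The percentages quoted above can be reproduced from the standard normal distribution: the probability that a normal value lies within k standard deviations of its mean is \( \operatorname{erf}(k / \sqrt{2}) \). A minimal check using `math.erf`; the function name `normal_coverage` is hypothetical.

```python
import math


def normal_coverage(k):
    """Fraction of a normal distribution within k SDs of the mean.

    For a standard normal Z, P(|Z| < k) = erf(k / sqrt(2)).
    """
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3, 4):
    print(f"within {k} SD: {100 * normal_coverage(k):.3f}%")
```

Rounded, these give the 68.27, 95.45, 99.73, and 99.994 percent figures; they hold only for normally distributed data.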

Quartiles

Quartiles are the values that divide a list of numbers into quarters.

  1. The lower quartile, or first quartile (Q1), is the median of the lower half of the ordered data, after the data have been split into two equal halves at the median (when the number of values is odd, the median itself belongs to neither half). About 25% of the values are smaller than Q1 and 75% are larger.

  2. The upper quartile, or third quartile (Q3), is the median of the upper half of the ordered data. About 75% of the values are smaller than Q3 and 25% are larger. The median itself is the second quartile, Q2.

Example 1 – Upper and lower quartiles

Data: 6, 47, 49, 15, 43, 41, 7, 39, 43, 41, 36

Ordered data: 6, 7, 15, 36, 39, 41, 41, 43, 43, 47, 49

Median: 41 (the 6th of the 11 ordered values)
Lower half: 6, 7, 15, 36, 39
Upper half: 41, 43, 43, 47, 49
Lower quartile = first quartile Q1 : 15
Upper quartile = third quartile Q3 : 43

Note: the second quartile, Q2, is simply the median.
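The median-of-halves procedure used in Example 1 can be sketched as follows. This is one of several quartile conventions; here the median is excluded from both halves when the count is odd, which matches the example. The helper names `median` and `quartiles` are hypothetical.

```python
def median(sorted_vals):
    """Median of an already-sorted, non-empty list."""
    n = len(sorted_vals)
    mid = n // 2
    if n % 2 == 1:
        return sorted_vals[mid]
    return (sorted_vals[mid - 1] + sorted_vals[mid]) / 2


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    s = sorted(values)
    n = len(s)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = s[:half]
    # For odd n, skip the median itself so it is in neither half.
    upper = s[half + 1:] if n % 2 == 1 else s[half:]
    return median(lower), median(s), median(upper)


data = [6, 47, 49, 15, 43, 41, 7, 39, 43, 41, 36]
print(quartiles(data))  # (15, 41, 43)
```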

Interquartile Range (IQR)

The interquartile range (IQR) is the distance between the first quartile (Q1) and the third quartile (Q3). The middle 50% of the data lies within this range.

\( IQR = Q_3 - Q_1 \)
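For the Example 1 data, the IQR can be computed with Python's standard-library `statistics.quantiles`. With its default `method="exclusive"`, it returns the same Q1 and Q3 as the median-of-halves method for this particular 11-value dataset. Quartile conventions differ between tools, though: NumPy's default linear interpolation, for example, gives Q1 = 25.5 here, so the IQR of the same data can vary with the method used.

```python
import statistics

data = [6, 47, 49, 15, 43, 41, 7, 39, 43, 41, 36]

# n=4 returns the three cut points Q1, Q2, Q3 (default method="exclusive").
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1
print(q1, q3, iqr)  # 15.0 43.0 28.0
```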

Z-Score

The z-score of a data point \( x \) is the number of standard deviations by which it lies above (positive z) or below (negative z) the mean.

Z-Score = \( \frac { x - μ } {σ} \)
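A minimal sketch of the z-score formula, using the two populations from the variance figure (mean 100, with SD 10 for the red population and SD 50 for the blue). The same value of x is far from typical in one population and unremarkable in the other. The function name `z_score` is hypothetical.

```python
def z_score(x, mu, sigma):
    """z = (x - mu) / sigma: signed distance from the mean in SD units."""
    if sigma <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mu) / sigma


print(z_score(120, 100, 10))  # red population: 2.0
print(z_score(120, 100, 50))  # blue population: 0.4
print(z_score(85, 100, 10))   # below the mean: -1.5
```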