Chapter 3

Arithmetic mean

the primary measure of central location. Referred to as simply the "mean".

Mean can

be affected by outliers

Median

another measure of central location. The middle value of a data set when the values are sorted. If the mean and median differ significantly, it is likely that the data contain outliers or that the distribution is skewed.

Mode

the value that occurs most frequently. A data set can have more than one mode, or even no mode.

Percentile

The median is called the 50th percentile.

Calculating the pth percentile

A. First arrange the data in ascending order

Quintiles

divide the data set into fifths

Deciles

divide the data into tenths

Percentiles

divide the data set into hundredths

Geometric Mean

is a multiplicative average, as opposed to an additive average (the arithmetic mean).
It is the relevant measure when evaluating investment returns over several years and when calculating average growth rates.

The Geometric Mean return

Suppose you invested $1,000 in a stock that had a 10% return in 2009 and a -10% return in 2010. The arithmetic mean suggests that by the end of year 2010, you would be right back where you started with $1,000 worth of stock. It is true that the arithmetic
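As a rough check of the example above, the sketch below compounds the two yearly returns and compares the arithmetic mean with the geometric mean return, G = [(1 + R1)(1 + R2)...(1 + Rn)]^(1/n) - 1. The function and variable names are illustrative only.

```python
import math

def geometric_mean_return(returns):
    """Geometric mean return for a list of periodic returns given as decimals."""
    growth = math.prod(1 + r for r in returns)  # compound growth factor
    return growth ** (1 / len(returns)) - 1

returns = [0.10, -0.10]   # 10% in 2009, -10% in 2010
initial = 1000.0          # dollars invested

ending_value = initial * math.prod(1 + r for r in returns)
arithmetic = sum(returns) / len(returns)
geometric = geometric_mean_return(returns)

print(f"ending value:    {ending_value:.2f}")
print(f"arithmetic mean: {arithmetic:.4%}")
print(f"geometric mean:  {geometric:.4%}")
```

The ending value is about $990 rather than $1,000, and the geometric mean return is slightly negative (roughly -0.5% per year), which the arithmetic mean of 0% hides.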

range

is the simplest measure of dispersion; it is the difference between the maximum and the minimum values in a data set.

The Mean Absolute Deviation

A good measure of dispersion should consider differences of all observations from the mean.

The Mean Absolute Deviation (MAD)

is an average of the absolute differences between the observations and the mean

Beginning to end, the steps to calculate MAD

1. Calculate the arithmetic mean for the data set
2. Find the absolute difference between each value and the mean
3. Sum the absolute differences
4. Divide by the sample (or the population) size
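The four steps can be sketched in a few lines. This is a minimal illustration; the data values and the function name are made up for the example.

```python
def mean_absolute_deviation(data):
    """MAD: average absolute difference between each value and the mean."""
    mean = sum(data) / len(data)                # step 1: arithmetic mean
    abs_diffs = [abs(x - mean) for x in data]   # step 2: absolute differences
    total = sum(abs_diffs)                      # step 3: sum them
    return total / len(data)                    # step 4: divide by the number of observations

values = [2, 4, 6, 8, 10]   # hypothetical data, mean = 6
mad = mean_absolute_deviation(values)
print(mad)
```

For these values the absolute differences are 4, 2, 0, 2, 4, so the MAD is 12 / 5 = 2.4.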

Variance

the average of the squared differences between observations and the mean.

Variance & Standard Deviation

Variance and standard deviation (SD) are the two most widely used measures of dispersion. Instead of calculating the average of the absolute differences from the mean, as in MAD, we calculate the average of the squared differences from the mean. The squaring of differences from t

the average of the squared differences from the mean is the

population variance
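A small sketch of the calculation, assuming hypothetical data. The population variance divides the sum of squared differences by N, while the sample variance divides by n - 1; the standard deviation is the positive square root of the variance. The results are cross-checked against Python's standard `statistics` module.

```python
import math
import statistics

data = [2, 4, 6, 8, 10]   # hypothetical data, mean = 6
mean = statistics.mean(data)
squared_diffs = [(x - mean) ** 2 for x in data]

population_variance = sum(squared_diffs) / len(data)      # divide by N
sample_variance = sum(squared_diffs) / (len(data) - 1)    # divide by n - 1
population_sd = math.sqrt(population_variance)
sample_sd = math.sqrt(sample_variance)

# Cross-check the manual results against the standard library.
assert math.isclose(population_variance, statistics.pvariance(data))
assert math.isclose(sample_variance, statistics.variance(data))

print(population_variance, sample_variance)
print(round(population_sd, 4), round(sample_sd, 4))
```

Here the squared differences sum to 40, giving a population variance of 8 and a sample variance of 10.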

The coefficient of variation (CV)

in some instances, analysis entails comparing two or more data sets that have different means or units of measurement.
The CV serves as a relative measure of dispersion and adjusts for differences in the magnitudes of the means.

Calculating the CV

calculated by dividing a data set's standard deviation by its mean. The CV is a unitless measure that allows for direct comparisons of mean-adjusted dispersion across different data sets.
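To illustrate why a relative measure helps, the sketch below compares two hypothetical data sets on very different scales. The sample standard deviation is used here as an assumption; the same idea applies with the population version.

```python
import statistics

def coefficient_of_variation(data):
    """Sample standard deviation divided by the mean (unitless)."""
    return statistics.stdev(data) / statistics.mean(data)

small_scale = [10, 12, 14]         # hypothetical: mean 12, s = 2
large_scale = [1000, 1100, 1200]   # hypothetical: mean 1100, s = 100

cv_small = coefficient_of_variation(small_scale)
cv_large = coefficient_of_variation(large_scale)
print(round(cv_small, 4), round(cv_large, 4))
```

Although the second data set has a much larger standard deviation (100 versus 2), its CV is smaller (about 0.09 versus 0.17), so it is less dispersed relative to its mean.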

Mean-Variance Analysis

postulates that we measure the performance of an asset by its rate of return and evaluate this rate of return in terms of its reward (mean) and risk (variance). In general, investments with higher average returns are also associated with higher risk.

Sharpe ratio

defined with the reward specified in terms of the population mean and the variability specified in terms of the population variance. However, we often compute the Sharpe ratio in terms of the sample mean and sample variance, where the return is usually expressed as a
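A minimal sketch of the sample version, using the common definition: the mean return in excess of a risk-free rate, divided by the standard deviation of the returns. The annual returns and the risk-free rate below are hypothetical.

```python
import statistics

def sharpe_ratio(returns, risk_free_rate):
    """(sample mean return - risk-free rate) / sample standard deviation."""
    excess = statistics.mean(returns) - risk_free_rate
    return excess / statistics.stdev(returns)

annual_returns = [8, 12, 10, 14, 6]   # hypothetical returns, in percent
risk_free = 2                         # hypothetical risk-free rate, in percent

sharpe = sharpe_ratio(annual_returns, risk_free)
print(round(sharpe, 4))
```

A higher ratio indicates more reward per unit of risk, so the ratio is mainly useful for comparing investments against one another.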

The Empirical Rule

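Approximately 68%, 95%, and almost all (about 99.7%) of observations fall within 1, 2, and 3 standard deviations of the mean, respectively.
Applies to symmetric, bell-shaped distributions; can be used to approximate the proportion of data points that fall within a specified number of standard deviations from the mean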

Chebyshev's theorem

For any data set, the proportion of observations that lie within k standard deviations from the mean is at least 1 - 1/k^2, where k is any number greater than 1
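The bound is easy to tabulate. The sketch below evaluates 1 - 1/k^2 for a few values of k; the function name is illustrative.

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of observations within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k**2

bounds = {k: chebyshev_lower_bound(k) for k in (1.5, 2, 3)}
for k, bound in bounds.items():
    print(f"k = {k}: at least {bound:.4f}")
```

For k = 2 the guarantee is at least 75% and for k = 3 at least about 88.9%. These are lower bounds that hold for any distribution, so they are weaker than the bell-shaped percentages of the empirical rule.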

Summarizing Grouped data

the mean and the variance are the most widely used descriptive measures in statistics.
The summary measures for grouped data are approximate values

weighted mean

when a mean is calculated and some observations are given greater importance or value than others, we refer to this measure of central location as the weighted mean
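As a short illustration, the weighted mean is the sum of each value times its weight, divided by the sum of the weights. The scores and weights below are hypothetical.

```python
def weighted_mean(values, weights):
    """Sum of weight * value, divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

scores = [80, 90, 70]        # hypothetical exam scores
weights = [0.5, 0.3, 0.2]    # hypothetical weights (sum to 1)

result = weighted_mean(scores, weights)
print(round(result, 2))
```

The unweighted mean of these scores is 80, while the weighted mean is 81 because the highest score carries more weight than the lowest.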

when calculating a mean for grouped data,

the midpoint of each class is used to approximate the individual values in that class

median

generally, the _____ is the best measure of central location when outliers are present

Covariance

the numerical measure that reveals only the direction of the linear relationship between two variables

The correlation coefficient

describes both the direction and the strength of the linear relationship between two variables.

Correlation coefficient

the closer the correlation coefficient gets to 1 or -1 the stronger the relationship; the closer it gets to 0, the weaker the relationship
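A minimal sketch of both measures on hypothetical paired data. It assumes the sample versions: the covariance divides by n - 1, and the correlation coefficient divides the covariance by the product of the two sample standard deviations.

```python
import statistics

def sample_covariance(x, y):
    """Average cross-product of deviations from the means, dividing by n - 1."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return cross / (len(x) - 1)

def correlation(x, y):
    """Covariance scaled by both standard deviations; always between -1 and 1."""
    return sample_covariance(x, y) / (statistics.stdev(x) * statistics.stdev(y))

x = [1, 2, 3, 4, 5]   # hypothetical data
y = [2, 4, 5, 4, 5]

cov_xy = sample_covariance(x, y)
r_xy = correlation(x, y)
print(cov_xy, round(r_xy, 4))
```

The covariance of 1.5 only indicates a positive linear relationship, since its size depends on the units of x and y. The correlation of about 0.77 is unit-free and also indicates that the relationship is fairly strong.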

mode

when summarizing a qualitative data set, the ____ is the best measure of central location

when a box plot is constructed, an outlier is a data point that is farther than

1.5 × IQR below Q1 or above Q3, where IQR = Q3 - Q1
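The rule can be sketched as follows. Quartile conventions differ between textbooks and software; this example assumes the default ("exclusive") method of Python's `statistics.quantiles`, and the data are hypothetical.

```python
import statistics

def box_plot_outliers(data):
    """Return values farther than 1.5 * IQR below Q1 or above Q3."""
    q1, _median, q3 = statistics.quantiles(data, n=4)  # assumed quartile convention
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [x for x in data if x < lower_fence or x > upper_fence]

observations = [2, 4, 5, 6, 7, 8, 9, 10, 30]   # hypothetical data
outliers = box_plot_outliers(observations)
print(outliers)
```

With these values Q1 = 4.5 and Q3 = 9.5, so IQR = 5 and the fences are -3 and 17; only the value 30 lies outside them.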

in a box plot, if the median is right of center and the left whisker is longer than the right whisker, then the distribution is

negatively skewed

steps required to calculate a particular percentile for a data set

1. arrange data set in ascending order
2. determine the approximate location of the percentile by calculating L
3. determine whether the value provided by L is an integer
4. select or interpolate the appropriate value from the data set
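The steps might look like this in code. The notes do not give the formula for L, so this sketch assumes the commonly used location L = (n + 1) * p / 100 with linear interpolation between neighboring values; other conventions give slightly different answers.

```python
def percentile(data, p):
    """pth percentile using the assumed location L = (n + 1) * p / 100."""
    values = sorted(data)                 # step 1: ascending order
    n = len(values)
    location = (n + 1) * p / 100          # step 2: approximate location L (1-based)
    if location <= 1:                     # L falls before the first value
        return values[0]
    if location >= n:                     # L falls at or after the last value
        return values[-1]
    lower = int(location)                 # step 3: integer part of L
    fraction = location - lower           # zero when L is an integer
    # step 4: select the value, or interpolate between the two neighbors
    return values[lower - 1] + fraction * (values[lower] - values[lower - 1])

numbers = [50, 15, 40, 20, 35]   # hypothetical data
p50 = percentile(numbers, 50)
p40 = percentile(numbers, 40)
print(p50, p40)
```

For p = 50 the location is L = 3, an integer, so the third sorted value (35) is selected directly. For p = 40 the location is L = 2.4, so the result is interpolated 40% of the way from 20 to 35, which is about 26.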

Steps in order, from beginning to end, to calculate a mean for grouped data

1. find the midpoint for each class of grouped data
2. multiply the midpoint of each class by the number of observations in its class
3. sum the products of the midpoints and observations
4. divide by the total number of observations
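These steps can be sketched as follows, with hypothetical classes given as (lower limit, upper limit, frequency). Because midpoints stand in for the individual values, the result is an approximation of the true mean.

```python
def grouped_mean(classes):
    """Approximate mean for grouped data given (lower, upper, frequency) classes."""
    total_products = 0.0
    total_count = 0
    for lower, upper, frequency in classes:
        midpoint = (lower + upper) / 2            # step 1: class midpoint
        total_products += midpoint * frequency    # steps 2 and 3: multiply and sum
        total_count += frequency
    return total_products / total_count           # step 4: divide by total observations

frequency_table = [(0, 10, 5), (10, 20, 8), (20, 30, 7)]   # hypothetical classes
approx_mean = grouped_mean(frequency_table)
print(approx_mean)
```

Here the midpoints are 5, 15, and 25, the products sum to 25 + 120 + 175 = 320, and dividing by the 20 observations gives a mean of 16.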

x bar

sample mean

Calculate the mean absolute deviation

1. calculate the arithmetic mean for the data set
2. Find the absolute difference between each value and the mean
3. Sum the absolute differences
4. Divide by the sample (or the population) size.

coefficient of variation best described as

a relative measure of dispersion

quartiles

divide the data set into 4 equal parts (quarters)

standard deviation

the SD is the positive square root of the variance, so it is calculated by using squared differences from the mean

the average of the absolute differences between the values of the data set and the mean is

MAD

when calculating average growth rates, we apply the formula for the

geometric mean