stat ch.10-12 notes

types of variables

quantitative
categorical

quantitative variable

takes numerical values for which arithmetic operations make sense
examples: amount of money, number of children, distance

categorical variable

places an individual into one of several groups or categories
examples: gender, race, academic major, zip code

distribution of a variable

tells us what values it takes and how often each value occurs.
Described by
tables or graphs
numerical summaries

frequency (count)

the number of times a value of a variable occurs in the data

relative frequency

proportion (fraction or percent) of all observations that have a given value
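
A minimal sketch of how counts and relative frequencies could be tabulated for a categorical variable. The data set `majors` and the helper `frequency_table` are made up for illustration and are not from the notes.

```python
from collections import Counter

# Hypothetical sample: academic majors of 8 students (made-up data).
majors = ["math", "biology", "math", "art", "biology", "math", "art", "math"]

def frequency_table(values):
    """Return {value: (frequency, relative frequency)} for a list of values."""
    counts = Counter(values)
    n = len(values)
    return {value: (count, count / n) for value, count in counts.items()}

table = frequency_table(majors)
for value, (count, rel) in sorted(table.items()):
    print(f"{value:8s} frequency={count}  relative frequency={rel:.2f}")
```

The relative frequencies always add up to 1 (100%), which is a quick check on a hand-built table.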

Basic graphs for summarizing categorical variables (data) are

pie charts and bar graphs

Pie chart

shows the amount of data that
belongs to each category as a proportional
part of a circle

Bar graph

shows the amount of data that
belongs to each category as proportionally
sized rectangular areas (bars)
- Categories are on the horizontal axis
- Frequencies (or relative frequencies) are on the vertical axis

Pictograms

Variation of the bar graph
All pictures should have the same width, otherwise the
pictures can mislead the reader.
Avoid!

Line Graphs

Shows behavior of a quantitative variable over time
Time marked on horizontal axis
Frequency (or relative frequency) of variable marked on vertical axis

Patterns in Line Graphs

Look for the overall pattern
- Trend: a long-term upward or downward movement over time
Look for deviations from the overall pattern
- Spikes and plunges
Look for seasonal variation
- A change over time that has a regular pattern; the pattern repeats itself at known regular intervals of time

Scales on Line Graphs

Scales can change the observed pattern.

Basic graphs for displaying quantitative variables (data) are

histograms
stemplots

Histograms

1. Divide the data into classes of equal width.
2. Count the number (frequency) of observations in each class.
3. Draw the histogram.
- Variable scale is on the horizontal axis
- Frequency (or relative frequency) scale is on the vertical axis
- Each bar represents a class, and its height shows the frequency of that class
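
Steps 1 and 2 above can be sketched in a few lines of Python. The function `class_counts`, its parameters, and the `scores` data are hypothetical names and made-up values used only for illustration. The sketch assumes each class includes its lower boundary but not its upper one.

```python
def class_counts(data, start, width, num_classes):
    """Count observations in equal-width classes.

    Class i covers start + i*width up to (but not including) start + (i+1)*width.
    Observations outside all classes are ignored.
    """
    counts = [0] * num_classes
    for x in data:
        index = int((x - start) // width)
        if 0 <= index < num_classes:
            counts[index] += 1
    return counts

# Made-up exam scores (whole numbers).
scores = [52, 58, 61, 67, 70, 71, 74, 78, 83, 85, 88, 94]
counts = class_counts(scores, start=50, width=10, num_classes=5)

# Crude text histogram: one star per observation in the class.
for i, count in enumerate(counts):
    low = 50 + 10 * i
    print(f"{low}-{low + 9}: {'*' * count}")
```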

Shapes of Distributions

Symmetric distribution
Skewed distribution

Symmetric distribution

the right and left sides of the histogram are approximately mirror images of each
other.

Skewed distribution

one side of the center line contains more data than the other.
- Skewed to the right - the right side of the histogram extends much farther than the left side
- Skewed to the left - the left side of the histogram extends much farther than the right side

Interpreting Histograms

An outlier in any graph of data is an individual observation that falls outside the overall pattern of the graph.
To see the overall pattern of a histogram, ignore any outliers.
- When describing a distribution of a histogram, state the shape and whether

Stemplots

- Used for small data sets (usually fewer than 100 values)
- Similar to a histogram, but they display the
actual values of the observations
How To Make
1. Separate each observation into a stem [all but the final rightmost digit of (rounded) data] and a leaf (the final digit)
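
A minimal sketch of the stem-and-leaf split, assuming the data are non-negative whole numbers (already rounded). The function name `stemplot` and the sample `ages` are made up for illustration. Stems with no leaves are still listed so that gaps in the data stay visible.

```python
from collections import defaultdict

def stemplot(values):
    """Return stemplot rows: stem = all but the last digit, leaf = last digit."""
    leaves = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        leaves[stem].append(leaf)
    rows = []
    for stem in range(min(leaves), max(leaves) + 1):
        row_leaves = "".join(str(d) for d in leaves.get(stem, []))
        rows.append(f"{stem} | {row_leaves}")
    return rows

# Made-up ages.
ages = [23, 25, 31, 38, 38, 42, 47, 60]
rows = stemplot(ages)
for row in rows:
    print(row)
```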

Describing Distributions with Numbers

- A graph gives the best overall picture of a
distribution.
- We also need numbers to summarize the center and spread of a distribution

Numerical Summaries: Descriptive Statistics

Median

Median

the midpoint of a distribution
when the observations are arranged in
increasing order; half the observations are
smaller and the other half are larger.
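To find the median of a distribution:
- List the data in order from smallest to largest.
- If n is odd, the median is the middle observation in the ordered list.
- If n is even, the median is the average of the two middle observations.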
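
A minimal sketch of the usual median rule (middle value when n is odd, average of the two middle values when n is even). The function name `median` is chosen here for illustration.

```python
def median(values):
    """Median of a non-empty list of numbers."""
    ordered = sorted(values)          # list the data from smallest to largest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]           # odd n: the single middle observation
    return (ordered[mid - 1] + ordered[mid]) / 2   # even n: average the two middle ones

print(median([5, 1, 3]))       # three observations
print(median([4, 1, 3, 2]))    # four observations
```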

Measure of Spread

When describing a distribution with numbers, give both a measure of center and a measure
of spread.
- If you choose the median to describe the center, you might use quartiles to describe the
spread.

Quartiles

divide ordered data into four equally
sized parts.

First Quartile (Q1)

the value such that 25% of the data
values lie below Q1 and 75% of the data values lie above Q1

Third Quartile (Q3)

the value such that 75% of the data
values lie below Q3 and 25% of the data values lie above Q3

Second Quartile (Q2)

median

Finding Quartiles

- If n is odd: Split the data at the median but do not include the
median in either half.
- If n is even: Split the data between the 2 values that are
averaged to get the median.
- Q1 is the median of the smaller observations
- Q3 is the median of the larger observations
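
A sketch of the split-at-the-median rule, assuming at least two observations. The function names are hypothetical. Statistical software often uses other quartile conventions, so its results may differ slightly from this hand method.

```python
def median_of_sorted(ordered):
    """Median of an already sorted, non-empty list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(values):
    """Return (Q1, median, Q3) using the split-at-the-median rule."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]                      # smaller observations
    if n % 2 == 1:
        upper = ordered[half + 1:]              # odd n: leave the median out
    else:
        upper = ordered[half:]                  # even n: split between the two middle values
    return median_of_sorted(lower), median_of_sorted(ordered), median_of_sorted(upper)

print(quartiles([1, 2, 3, 4, 5, 6, 7]))      # odd n
print(quartiles([1, 2, 3, 4, 5, 6, 7, 8]))   # even n
```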

5-number summary

of a data set consists
of the following descriptive statistics: minimum, Q1, median, Q3, maximum.

boxplot

a graph of the 5-number
summary.

Constructing Boxplots

1. Compute the 5-number summary
2. Draw a number line that spans the range of the data
3. Draw a vertical line at Q1 and Q3 and make a box
4. Draw a vertical line in the box at the median
5. Draw lines from the box out to the minimum and the maximum

Mean and Standard Deviation

The most common numerical description of a
distribution
- The mean is a measure of center
- The standard deviation is a measure of spread

Mean

The mean (x̄) of a set of n observations is the
average.
- To find the mean, add the data values and
divide by n.

Standard Deviation

gives the average
distance of the observations from the mean

To find the standard deviation

1. Find the distance of each observation from the mean
and square each of these distances
Distance: deviation from the mean = observation - mean (x - x̄)
2. Average the squared distances by dividing their sum by
n-1. This value is the variance (s^2).
3. The standard deviation (s) is the square root of the variance.
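
The three steps can be sketched as follows. The function name `standard_deviation` and the sample `data` are made up for illustration, and the sketch assumes at least two observations so that dividing by n-1 is possible.

```python
import math

def standard_deviation(values):
    """Sample standard deviation s, dividing by n - 1."""
    n = len(values)
    mean = sum(values) / n
    # Step 1: squared deviation of each observation from the mean.
    squared_deviations = [(x - mean) ** 2 for x in values]
    # Step 2: variance = sum of squared deviations divided by n - 1.
    variance = sum(squared_deviations) / (n - 1)
    # Step 3: standard deviation = square root of the variance.
    return math.sqrt(variance)

# Made-up data with mean 5 and squared deviations summing to 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]
s = standard_deviation(data)
print(round(s, 3))
```

Python's built-in `statistics.stdev` uses the same n-1 divisor, so it can serve as a cross-check.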

Properties of the Standard Deviation

The standard deviation (s) measures spread
about the mean
s = 0 only when there is no spread. This
happens only when all observations have the
same value

Choosing a Numerical Summary

How can we decide which of the two descriptions
of center and spread we should use?
The mean and the standard deviation are strongly
affected by extreme values. The median and quartiles
are less affected.
- The 5-number summary is usually better than the mean and standard deviation for describing a skewed distribution or a distribution with outliers.
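
A small illustration of how one extreme value moves the mean much more than the median. The `incomes` values are made up (in thousands of dollars) and only serve as an example.

```python
import statistics

# Made-up incomes, in thousands of dollars.
incomes = [30, 32, 35, 38, 40]
with_outlier = incomes + [400]      # add one extreme value

mean_before = statistics.mean(incomes)
median_before = statistics.median(incomes)
mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)

print(f"without outlier: mean={mean_before}, median={median_before}")
print(f"with outlier:    mean={mean_after:.1f}, median={median_after}")
```

Here the mean rises from 35 to nearly 96, while the median only moves from 35 to 36.5.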
