types of variables
quantitative
categorical
quantitative variable
takes numerical values for which arithmetic operations make sense
examples: amount of money, number of children, distance
categorical variable
places an individual into one of several groups or categories
examples: gender, race, academic major, zip code
distribution of a variable
tells us what values it takes and how often each value occurs.
Described by
tables or graphs
numerical summaries
frequency (count)
the number of times a value of a variable occurs in the data
relative frequency
proportion (fraction or percent) of all observations that have a given value
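As a minimal sketch of the two definitions above, the following Python counts how often each value occurs and divides by the number of observations. The function name `frequency_table` and the list of majors are made up for illustration.

```python
from collections import Counter

def frequency_table(values):
    """Map each distinct value to (frequency, relative frequency)."""
    counts = Counter(values)   # frequency: how many times each value occurs
    total = len(values)        # number of observations
    return {value: (count, count / total) for value, count in counts.items()}

# Hypothetical data: academic majors of eight students.
majors = ["math", "biology", "math", "history", "biology", "math", "art", "math"]
table = frequency_table(majors)
for value, (count, rel) in sorted(table.items()):
    print(f"{value:10s} count={count}  relative frequency={rel:.3f}")
```

The relative frequencies always add up to 1 (or 100%), which is a quick check on a table.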
Basic graphs for summarizing categorical variables (data) are
pie charts and bar graphs
Pie chart
shows the amount of data that
belongs to each category as a proportional
part of a circle
Bar graph
shows the amount of data that
belongs to each category as proportionally
sized rectangular areas (bars)
- Categories are on the horizontal axis
- Frequencies (or relative frequencies) are on the vertical axis
Pictograms
Variation of the bar graph
All pictures should have the same width, otherwise the
pictures can mislead the reader.
Avoid!
Line Graphs
Show the behavior of a quantitative variable over time
- Time is marked on the horizontal axis
- The value of the variable (often a frequency or relative frequency) is marked on the vertical axis
Patterns in Line Graphs
Look for overall pattern
- Trend: a long-term upward or downward
movement over time
Look for deviations from the overall pattern
- Spikes and plunges
Look for seasonal variation
- A change over time that has a regular pattern; the pattern repeats itself at known, regular intervals of time
Scales on Line Graphs
Scales can change the observed pattern.
Basic graphs for displaying quantitative variables (data) are
histograms
stemplots
Histograms
1. Divide the data into classes of equal width.
2. Count the number (frequency) of observations in each class.
3. Draw the histogram.
- Variable scale is on the horizontal axis
- Frequency (or relative frequency) scale is on the vertical axis
- Each bar represents one class; its height is the frequency (or relative frequency) of that class
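Steps 1 and 2 above can be sketched in Python as follows. The function name `histogram_counts`, the class boundaries, and the exam scores are assumptions chosen for illustration. To keep the sketch plain text, the bars are printed sideways, one row per class, instead of being drawn on a vertical frequency axis.

```python
def histogram_counts(data, low, width, n_classes):
    """Count observations in n_classes equal-width classes starting at low.

    Class k covers low + k*width <= x < low + (k+1)*width.
    Observations outside all classes are ignored.
    """
    counts = [0] * n_classes
    for x in data:
        k = int((x - low) // width)   # index of the class containing x
        if 0 <= k < n_classes:
            counts[k] += 1
    return counts

# Hypothetical data: 12 exam scores, classes 50-59, 60-69, ..., 90-99.
scores = [52, 67, 71, 74, 78, 81, 83, 85, 88, 90, 94, 99]
counts = histogram_counts(scores, low=50, width=10, n_classes=5)
for k, count in enumerate(counts):
    start = 50 + 10 * k
    print(f"{start:3d}-{start + 9:3d} | {'*' * count}")
```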
Shapes of Distributions
Symmetric distribution
Skewed distribution
Symmetric distribution
the right and left sides of the histogram are approximately mirror images of each
other.
Skewed distribution
one side of the center line contains more data than the other.
- Skewed to the right - the right side of the histogram extends much farther than the left side
- Skewed to the left - the left side of the histogram extends much farther than the right side
Interpreting Histograms
An outlier in any graph of data is an individual observation that falls outside the overall pattern of the graph.
To see the overall pattern of a histogram, ignore any outliers.
- When describing a distribution of a histogram, state the shape and whether
Stemplots
- Used for small data sets (usually fewer than 100 values)
- Similar to histograms, but they display the
actual values of the observations
How To Make
1. Separate each observation into a stem [all but the final rightmost digit of (rounded) data] and a leaf [the final digit]
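A minimal Python sketch of the stem-and-leaf split, assuming non-negative whole-number data and the usual layout of one row per stem with leaves in increasing order. The function name `stemplot` and the data are made up for illustration.

```python
from collections import defaultdict

def stemplot(data):
    """Return stemplot rows for a non-empty list of non-negative integers.

    Stem = all but the final digit, leaf = the final digit.
    """
    leaves = defaultdict(list)
    for x in sorted(data):
        stem, leaf = divmod(x, 10)   # e.g. 47 -> stem 4, leaf 7
        leaves[stem].append(leaf)
    rows = []
    # Include stems with no leaves so gaps in the data stay visible.
    for stem in range(min(leaves), max(leaves) + 1):
        row = "".join(str(leaf) for leaf in leaves[stem])
        rows.append(f"{stem:2d} | {row}")
    return rows

# Hypothetical data set.
for row in stemplot([12, 15, 21, 23, 23, 40, 47]):
    print(row)
```

Because every leaf is an actual digit from the data, the original observations can be read back from the plot, which is not possible with a histogram.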
Describing Distributions with Numbers
- A graph gives the best overall picture of a
distribution.
- We also need numbers to summarize the center and spread of a distribution
Numerical Summaries: Descriptive Statistics
Median
Median
the midpoint of a distribution
when the observations are arranged in
increasing order; half the observations are
smaller and the other half are larger.
To find the median of a distribution:
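- List the data in order from smallest to largest.
- If n is odd, the median is the middle value in the ordered list.
- If n is even, the median is the average of the two middle values.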
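A short Python sketch of the procedure, assuming the usual rule: the middle value when n is odd, and the average of the two middle values when n is even. The function name `median` is chosen for illustration.

```python
def median(data):
    """Median of a non-empty list of numbers."""
    ordered = sorted(data)   # list the data from smallest to largest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # odd n: the middle value
    return (ordered[mid - 1] + ordered[mid]) / 2   # even n: average the two middle values

print(median([7, 1, 5]))      # three observations
print(median([4, 1, 3, 2]))   # four observations
```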
Measure of Spread
When describing a distribution with numbers, give both a measure of center and a measure
of spread.
- If we choose the median to describe the center, we might want to use the quartiles to describe the
spread.
Quartiles
divide ordered data into four equally
sized parts.
First Quartile (Q1)
the value such that 25% of the data
values lie below Q1 and 75% of the data values lie above Q1
Third Quartile (Q3)
the value such that 75% of the data
values lie below Q3 and 25% of the data values lie above Q3
Second Quartile (Q2)
median
Finding Quartiles
- If n is odd: Split the data at the median but do not include the
median in either half.
- Q1 is the median of the smaller observations
- Q3 is the median of the larger observations
- If n is even: Split the data between the 2 values that are
averaged to get the median. Q1 and Q3 are again the medians of the
smaller and larger halves.
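The splitting rule can be sketched in Python as below. This assumes the method described in these notes, where the median is left out of both halves when n is odd. The function names are illustrative, and statistical software often uses other conventions, so its quartiles may differ slightly on small data sets.

```python
def median_of_sorted(ordered):
    """Median of an already sorted, non-empty list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(data):
    """Return (Q1, median, Q3) for a list with at least two observations."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]   # the smaller observations
    # Odd n: skip the median itself. Even n: split between the two middle values.
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return median_of_sorted(lower), median_of_sorted(ordered), median_of_sorted(upper)

print(quartiles([1, 2, 3, 4, 5, 6, 7]))      # n odd
print(quartiles([1, 2, 3, 4, 5, 6, 7, 8]))   # n even
```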
5-number summary
of a data set consists
of the following descriptive statistics:
minimum, Q1, median, Q3, maximum
boxplot
a graph of the 5-number
summary.
Constructing Boxplots
1. Compute the 5-number summary
2. Draw a number line that spans the range of the data
3. Draw a vertical line at Q1 and Q3 and make a box
4. Draw a vertical line in the box at the median
5. Draw lines from the box out to the minimum and the maximum
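Steps 2 through 5 can be imitated in plain text. The sketch below is only an illustration under assumed conventions: the function name `text_boxplot`, the default width, and the characters used (`[` and `]` for Q1 and Q3, `|` for the median, `-` for the lines out to the minimum and maximum) are all choices made here, and the five numbers are taken as given instead of being computed.

```python
def text_boxplot(minimum, q1, median, q3, maximum, width=41):
    """Draw a one-line text boxplot from a 5-number summary."""
    span = maximum - minimum
    if span <= 0:
        raise ValueError("maximum must be greater than minimum")

    def column(value):
        # Position of a value on a number line that is `width` characters wide.
        return round((value - minimum) / span * (width - 1))

    c_q1, c_med, c_q3 = column(q1), column(median), column(q3)
    line = ["-"] * width                # lines out to the minimum and maximum
    for col in range(c_q1, c_q3 + 1):
        line[col] = "="                 # the box from Q1 to Q3
    line[c_q1] = "["
    line[c_q3] = "]"
    line[c_med] = "|"                   # the median inside the box
    return "".join(line)

# Hypothetical 5-number summary: 0, 10, 20, 30, 40.
print(text_boxplot(0, 10, 20, 30, 40))
```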
Mean and Standard Deviation
The most common numerical description of a
distribution
- The mean is a measure of center
- The standard deviation is a measure of spread
Mean
The mean ($\bar{x}$) of a set of n observations is the
average.
- To find the mean, add the data values and
divide by n: $\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}$
Standard Deviation
gives, roughly, the average
distance of the observations from the mean
To find the standard deviation
1. Find the distance of each observation from the mean $\bar{x}$
and square each of these distances
Distance: deviation from the mean = $x_i - \bar{x}$
2. Average the squared distances by dividing their sum by
n-1. This value is the variance: $s^2 = \frac{1}{n-1} \sum (x_i - \bar{x})^2$
3. The standard deviation (s) is the square root of the variance: $s = \sqrt{s^2}$
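The three steps can be followed literally in Python. This is a sketch that takes s to be the square root of the variance computed with the n-1 divisor; the function names and the data are chosen for illustration.

```python
import math

def mean(data):
    """Add the data values and divide by n."""
    return sum(data) / len(data)

def sample_std(data):
    """Standard deviation with the n-1 divisor (needs at least two observations)."""
    m = mean(data)
    squared = [(x - m) ** 2 for x in data]       # step 1: squared deviations from the mean
    variance = sum(squared) / (len(data) - 1)    # step 2: divide the sum by n-1
    return math.sqrt(variance)                   # step 3: square root of the variance

# Hypothetical data set.
values = [1, 2, 3, 4, 5]
print(mean(values), sample_std(values))
```

For these values the mean is 3, the squared deviations add to 10, and the variance is 10 / 4 = 2.5, so s is the square root of 2.5.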
Properties of the Standard Deviation
The standard deviation (s) measures spread
about the mean
s = 0 only when there is no spread. This
happens only when all observations have the
same value
Choosing a Numerical Summary
How can we decide which of the two descriptions
of center and spread we should use?
The mean and the standard deviation are strongly
affected by extreme values. The median and quartiles
are less affected.
- The 5-number summary is usually better than the
mean and standard deviation for describing a skewed
distribution or a distribution with outliers.
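A small experiment makes the effect of an extreme value concrete. The sketch below uses Python's standard `statistics` module; the data values and the helper name `summarize` are invented for illustration.

```python
import statistics

def summarize(data):
    """Mean, median, and standard deviation (n-1 divisor) of a data set."""
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "stdev": statistics.stdev(data),
    }

# Hypothetical data: seven typical values, then the same data plus one extreme value.
typical = [30, 32, 35, 38, 40, 41, 45]
with_outlier = typical + [400]

before = summarize(typical)
after = summarize(with_outlier)
for name in ("mean", "median", "stdev"):
    print(f"{name:6s} before={before[name]:8.2f}  after={after[name]:8.2f}")
```

Adding the single value 400 moves the median from 38 to 39, while the mean more than doubles and the standard deviation grows many times over.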