stats ch. 1-ch. 5 sec. 2

What is meant by a variable?

A characteristic of people or things that can take different values from one individual to another.

What is the difference between a categorical variable and a quantitative variable?

Categorical (qualitative): places each individual into one of several categories, e.g., gender, eye color.
Quantitative: takes numerical values, e.g., age, height.

What is meant by exploratory data analysis?

Exploratory data analysis uses graphs and numerical summaries to describe the variables in a data set and the relations among them.

What is meant by the distribution of a variable?

The distribution of a variable tells us what values the variable takes and how often it takes these values.

What two types of charts/graphs are usually most appropriate for categorical data?

Bar graph and pie chart.

When describing the overall pattern of the distribution of a quantitative variable, what three features should you mention?

Shape, center, and spread.

What is a simple way to describe the center of the distribution of a quantitative variable?

For a roughly symmetric (e.g., normal) distribution, use the mean. For a skewed distribution, use the median, the midpoint of the ordered values.

How do you describe the spread of the distribution of a quantitative variable?

With the standard deviation when the mean is used as the center, or with the range or IQR when the median is used.

Informally define an outlier.

A data value that is much smaller or much larger than the rest of the data.

List four graphs that are used for quantitative data.

Dot plot, histogram, stem-and-leaf plot, box plot.

What information is lost when you choose a histogram over a dot plot or stem plot?

The individual data values; a histogram shows only how many values fall in each interval (bin).

In statistics, what are the most common measures of center?

The mean and the median.

Explain how to calculate the mean.

Add the data values, then divide by the number of values (the sample size n).
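A minimal Python sketch of that recipe. The function name `mean` and the sample values are made up for illustration; Python's standard library also provides `statistics.mean`.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)

print(mean([2, 4, 4, 5, 9]))  # 24 / 5 = 4.8
```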

Explain how to find the median.

Put the data values in order from smallest to largest. If there is an odd number of values, the median is the middle value; if there is an even number, the median is the average of the two middle values.
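The same steps as a short Python sketch (the function name `median` and the example data are illustrative; `statistics.median` in the standard library does the same job).

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values when n is even."""
    if not values:
        raise ValueError("the median of an empty data set is undefined")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                          # odd n: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2     # even n: average the two middle values

print(median([7, 1, 3]))      # sorted [1, 3, 7] -> 3
print(median([7, 1, 3, 10]))  # sorted [1, 3, 7, 10] -> (3 + 7) / 2 = 5.0
```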

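Explain why the median is resistant to extreme observations, but the mean is nonresistant.

The median is always the middle value of the ordered data, so making an extreme value even more extreme does not change it. The mean uses every value, so outliers pull it toward them.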
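A quick check of this in Python, using made-up data and the standard library's `statistics.mean` and `statistics.median`: replacing the largest value with an extreme one moves the mean a lot but leaves the median where it was.

```python
from statistics import mean, median

data = [10, 12, 13, 15, 20]
with_outlier = [10, 12, 13, 15, 200]  # largest value replaced by an extreme one

print(mean(data), median(data))                  # mean 14, median 13
print(mean(with_outlier), median(with_outlier))  # mean 50, median still 13
```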

The mean and the median are close together if the distribution is what?

Symmetric (for example, a normal distribution).

In a skewed distribution, which will be farther toward the long tail, the mean or the median?

The mean.
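A small illustration with hypothetical right-skewed data (values chosen only for the example): most observations are small and a few stretch far to the right, so the mean lands to the right of the median, toward the long tail.

```python
from statistics import mean, median

# Hypothetical right-skewed data: a cluster of small values and a long right tail.
data = [1, 2, 2, 3, 3, 4, 6, 9, 15]

print(mean(data), median(data))  # mean 5, median 3
```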

Which measure of center is most appropriate for a highly skewed distribution, the mean or the median?

The median.

What is the definition of the range?

The distance spanned by the entire data set: range = maximum - minimum.

Explain how to calculate the first quartile Q1 and the third quartile Q3.

Find the median of the entire ordered data set. Q1 is the median of the observations below the overall median (the lower half), and Q3 is the median of the observations above it (the upper half).
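A sketch of the median-of-halves method in Python, using the standard library's `statistics.median`. The helper name `quartiles` is hypothetical, and the handling of odd n is an assumed convention: the overall median is left out of both halves. Statistical software uses several slightly different quartile rules, so its answers can differ a little.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves of the ordered data.

    Assumed convention: when n is odd, the overall median belongs to neither half.
    """
    ordered = sorted(values)
    if len(ordered) < 2:
        raise ValueError("need at least two observations")
    half = len(ordered) // 2
    lower = ordered[:half]                     # values left of the median position
    upper = ordered[half + len(ordered) % 2:]  # skip the middle value when n is odd
    return median(lower), median(upper)

print(quartiles([1, 3, 5, 7, 9, 11, 13]))  # halves [1, 3, 5] and [9, 11, 13]: Q1 = 3, Q3 = 11
print(quartiles([1, 3, 5, 7, 9, 11]))      # halves [1, 3, 5] and [7, 9, 11]: Q1 = 3, Q3 = 9
```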

What is the interquartile range?

The range of the middle 50% of the data: IQR = Q3 - Q1.

Explain why it might be better to use the IQR instead of the range to describe the spread of the distribution.

The range depends only on the two most extreme values, so a single outlier can inflate it. The IQR covers only the middle 50% of the data, so it is not sensitive to outliers.
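A sketch comparing the two measures on made-up data: changing the maximum from 10 to 100 multiplies the range by more than ten but leaves the IQR unchanged. The helper names `quartiles` and `spread` are hypothetical, and quartiles follow the median-of-halves method with the assumed convention that the overall median is excluded from both halves when n is odd.

```python
from statistics import median

def quartiles(values):
    """(Q1, Q3) by the median-of-halves method; the overall median is excluded when n is odd."""
    ordered = sorted(values)
    half = len(ordered) // 2
    return median(ordered[:half]), median(ordered[half + len(ordered) % 2:])

def spread(values):
    """Return (range, IQR) for the data."""
    q1, q3 = quartiles(values)
    return max(values) - min(values), q3 - q1

print(spread([2, 4, 5, 6, 7, 8, 10]))   # range 8, IQR 4
print(spread([2, 4, 5, 6, 7, 8, 100]))  # range 98, IQR still 4
```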

What is the IQR-based "rule of thumb" for outliers?

A potential outlier is a data value that lies more than 1.5 interquartile ranges below the first quartile or above the third quartile, where IQR = Q3 - Q1. In other words, flag values below Q1 - 1.5(IQR) or above Q3 + 1.5(IQR).
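The 1.5 × IQR rule as a Python sketch. The helper names `quartiles` and `iqr_outliers` and the data are made up for illustration; quartiles use the median-of-halves method, with the assumed convention that the overall median is excluded from both halves when n is odd.

```python
from statistics import median

def quartiles(values):
    """(Q1, Q3) by the median-of-halves method; the overall median is excluded when n is odd."""
    ordered = sorted(values)
    half = len(ordered) // 2
    return median(ordered[:half]), median(ordered[half + len(ordered) % 2:])

def iqr_outliers(values):
    """Return the values that fall outside the fences Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in values if x < low_fence or x > high_fence]

# Q1 = 4, Q3 = 8, IQR = 4, so the fences are 4 - 6 = -2 and 8 + 6 = 14.
print(iqr_outliers([2, 4, 5, 6, 7, 8, 100]))  # [100]
```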

What is the five-number summary?

Minimum, first quartile (Q1), median, third quartile (Q3), maximum.
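A sketch that returns the five numbers in order. The function name `five_number_summary` and the data are hypothetical, and the quartiles assume the median-of-halves method (overall median excluded from both halves when n is odd).

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for at least two observations."""
    ordered = sorted(values)
    if len(ordered) < 2:
        raise ValueError("need at least two observations")
    half = len(ordered) // 2
    q1 = median(ordered[:half])                     # median of the lower half
    q3 = median(ordered[half + len(ordered) % 2:])  # median of the upper half
    return ordered[0], q1, median(ordered), q3, ordered[-1]

print(five_number_summary([7, 2, 10, 5, 8, 4, 6]))  # (2, 4, 6, 8, 10)
```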

What type of graph gives a picture of the five-number summary?

A box plot.

The box in a box plot represents what percentage of the data?

About 50%, the middle half of the data, from Q1 to Q3.

What does the middle line of a box plot represent?

The median.

Can the value of the mean be identified from a box plot?

No; a box plot shows only the five-number summary.

What does standard deviation measure?

The spread of the observations around their mean.

Can the standard deviation ever be negative?

No; it is the square root of the variance, so it is never less than 0.

Is the standard deviation resistant or nonresistant to extreme observations?

Nonresistant; outliers can strongly affect it.

When is it better to use the five-number summary rather than the mean and standard deviation?

When the distribution is strongly skewed (left or right) or has outliers.

The box of a box plot contains about half the data.

True.

When a distribution is strongly skewed to the right, the median is less than the mean.

True.

A data set always has a mode.

False.

When a distribution is strongly skewed to the right, the five-number summary is a better measure of center and spread than the mean and standard deviation.

True.

How can we use IQR to determine outliers?

An observation is a suspected outlier if it is more than 1.5 × IQR above the third quartile or below the first quartile.

Explain why the median is resistant to extreme observations, but the mean is nonresistant.

The median is resistant because it is based only on the middle one or two observations of the ordered list. The mean is sensitive to the influence of a few extreme observations. Even if there are no outliers, a skewed distribution will pull the mean toward the long tail.

When does standard deviation equal zero?

The standard deviation = 0 only when there is no spread. This happens only when all observations have the same value. Otherwise s > 0. As the observations become more spread out about their mean, s gets larger.

What is the relationship between variance and standard deviation?

The standard deviation s is the square root of the variance s².
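For n observations x_1, ..., x_n with mean x̄, the usual sample formulas (dividing by n - 1) make the relationship explicit:

```latex
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2, \qquad s = \sqrt{s^2}
```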

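What does standard deviation measure? How do we calculate it?

The standard deviation is a measure of spread. It measures spread around the mean and should be used only when the mean is chosen as the measure of center. To calculate it, find each observation's deviation from the mean, square the deviations, add them up, divide by n - 1 to get the variance, and take the square root.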
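The calculation can be sketched in Python: find each deviation from the mean, square it, sum, divide by n - 1, and take the square root. The name `sample_sd` and the data are made up for illustration; the standard library's `statistics.stdev` computes the same sample quantity. The second example also shows that s = 0 when every observation is the same.

```python
import math

def sample_sd(values):
    """Sample standard deviation: square root of the variance computed with n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    xbar = sum(values) / n                                   # the mean
    variance = sum((x - xbar) ** 2 for x in values) / (n - 1)
    return math.sqrt(variance)

print(sample_sd([2, 4, 6]))  # deviations -2, 0, 2; squares sum to 8; 8 / 2 = 4; s = 2.0
print(sample_sd([5, 5, 5]))  # no spread, so s = 0.0
```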

Is standard deviation resistant or nonresistant to extreme observations? Explain.

s, like the mean, is not resistant. Strong skewness or a few outliers can make s very large.