AP Statistics Semester 1 Quiz/Checkpoint Questions

Which phase of inferential statistics is sometimes considered to be the most crucial because errors in this phase are the most difficult to correct?

Data gathering
(Data gathering is often considered the most critical phase of inferential statistics. It's crucial to have an unbiased and representative sample for a statistical study. It's also usually the most time-consuming phase.)

What is the term for organizing and summarizing data without a particular question in mind?

Exploratory data analysis
(In the term exploratory data analysis the word exploratory implies that researchers are looking at the data but not expecting to find a particular pattern.)

The newspaper uses a line graph to show the performance of stocks over the last month. This is an example of:

descriptive statistics
(The data were gathered and organized into a graph, which is an example of descriptive statistics.)

You claim that you're healthier than your friends. To support your claim, you randomly select some of your friends and track their meals for a month. You also track your meals during the same month. What you are doing is:

inferential statistics
(This is an example of inferential statistics. The data you collected is a sample used to infer whether you're healthier than your friends.)

Which one of the following activities is not an example of data gathering?

Reaching a conclusion about the results of a reading program

Inferential statistics is used in each of the following except:

creating a pictograph of the number of people struck by lightning each year.
(In creating a pictograph, you haven't attempted to predict or compare anything. You've used descriptive statistics.)

Let's say that a researcher administers a new type of vitamin supplement to a sample of 30 rats. Thirty other rats didn't receive the supplement. Later, he compares the weights of the supplement group with the non-supplement control group. In this case th

continuous data
(Weights are measured, so the data are continuous.)

You want to know how often residents of cold climates vacation in warm destinations. You randomly sample 50 residents and find out how many annual trips to warm destinations they've taken during their adult lives. True or False: This is an

false
(The number of trips is counted and therefore discrete.)

Which of these are categorical data?

The different types of anteaters.
(other options were weight, length, etc)
(The different types of anteaters cannot be expressed as a number, so this is an example of qualitative data.)

Your class is participating in an Internet game show and must choose whether a prize is behind door #1 or door #2. You take a vote. The numbers for the doors are an example of which kind of data?

Categorical

When your class participates in an Internet game show and counts the votes for door #1 and door #2, the counts are examples of what kind of data?

Counted numerical

True or False: You perform a study to see how long the grass in your yard will live if you don't water it all summer. You conclude that no one's yard can live for more than three weeks without water. This is an example of descriptive statistics.

false

Which of the following would be an example of the use of inferential statistics?
I. You have your entire class's math grades, and you calculate the average math grade for your class.
II. You have your entire class's math grades and you use the grades to f

II and III

You want to know something about your neighbors, so you give them a survey. The survey collects the following data about each family on your block: family size, the kind of pets they have, the grade of the youngest child in the family, the family's annual

Only annual income

A family's phone number qualifies as what kinds of data?

Discrete and categorical

Besides phone numbers, what are the other categorical variables in your survey? (The survey asks for family size, the kind of pets they have, the grade of the youngest child in the family, the family's annual income in dollars, what the dad does for a liv

Grade of youngest child, dad's occupation, whether mom works, and kind of pets

True or False: In a survey of your neighbors (asking for family size, the kind of pets they have, the grade of the youngest child in the family, the family's annual income in dollars, what the dad does for a living, whether the mom works, and their phone

true

Which of the following combinations of data types is not possible?
- discrete and categorical
- continuous and categorical
- discrete and numeric
- continuous and numeric
- all combinations of data types are possible

The combination of continuous and categorical is not possible.

You're using a ruler to measure lengths of stick-bugs. The ruler is marked for every centimeter. You end up taking all lengths to the nearest centimeter. (In other words, you can have 5 centimeters and 6 centimeters, but not 5.5 centimeters.)
True or Fals

true
(Sometimes the definitions get fuzzy. Measured data are usually thought of as continuous, but in this case the measured data are also discrete because they can only have certain values that are whole numbers. But you can also think of the data as rou

The midpoint of the interval whose boundaries are 27.5 and 38.5 is:

33
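
The midpoint is just the average of the two class boundaries. A quick check of the arithmetic (the variable names are only for illustration):

```python
# Class boundaries taken from the question above.
lower_boundary = 27.5
upper_boundary = 38.5

# Midpoint of an interval = average of its two boundaries.
midpoint = (lower_boundary + upper_boundary) / 2
print(midpoint)  # 33.0
```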

Which of the following indicates how many times every value in a distribution appears?

Frequency table
(The phrase how many times implies the count for each data value, which is best shown through a frequency table.)

Which of the following statements is true for numerical data?

It can be measured.

Which of the following would most likely be graphed as a bar chart rather than a histogram?
- number of students that use windows laptops vs. macintosh ones
- the number of cars in each color in a parking lot
- (one other option I forgot; obviously categorical)

all of the above
(Each of these is an example of categorical data. They're counts of the members of a category rather than measured values of a numeric variable.)

True or False: If the total area of all the bars in a histogram is 1, the area of each bar is proportional to the total number of data values.

true
(Think about each bar as representing a proportion of the total area. A histogram can be thought of as an "area-picture" of a frequency table; the areas of the bars represent the frequencies, and a large area indicates a large frequency.)

Which of the following represents a plot or graph of the cumulative counts across each of the intervals or midpoints?

Cumulative frequency plot
(This indicates the cumulative count of observations across each of the intervals.)

Consider a complete table of relative frequencies. The sum of the relative frequency column in such a table must be:

1

True or False: It's possible to determine the frequencies (counts) within each interval from a cumulative frequency plot.

true
(In a cumulative frequency table, the difference between successive entries in a column is equal to the frequency of the lower entry. All frequencies can be "recaptured" by computing all such differences.)
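
A minimal sketch of "recapturing" frequencies by taking differences. The cumulative counts below are made up for illustration; they are not from the course materials.

```python
# Hypothetical cumulative counts for five consecutive intervals.
cumulative_counts = [3, 8, 15, 19, 20]

# The first interval's frequency is its cumulative count; every later
# frequency is the difference between successive cumulative counts.
frequencies = [cumulative_counts[0]] + [
    later - earlier
    for earlier, later in zip(cumulative_counts, cumulative_counts[1:])
]
print(frequencies)  # [3, 5, 7, 4, 1]
```

The recovered frequencies add back up to the last cumulative count (20), which is a handy sanity check.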

Histograms are most useful in displaying:

large numeric data sets
(Histograms are used to display frequencies across intervals of numeric data.)

True or False: In a histogram, a single building or class contains all the values of the data set.

false

A histogram class is a collection of all the observations that fall between two:

class limits

What is the term for the width of a building in a histogram?

class interval

For this question, refer to the histogram you created for this Self Check. (The data you need for creating this histogram is at the end of your Study Guide.)
Change your histogram so that $5,000 is added to the Xmin and Xmax.
Change Xmin to $45,000
Change

$105,000 ≤ x < $115,000
$135,000 ≤ x < $145,000
$145,000 ≤ x < $155,000
$165,000 ≤ x < $175,000

What can you do with a calculator- or computer-generated histogram that you can't do with a hand-drawn stem-and-leaf plot?

You can change the class interval.

True or False: When you decrease the Xscl value, you decrease the class interval. By doing this, you increase the number of classes or "buildings."

true

The entire group we're interested in is called a:

population

True or False: We usually don't have to sample because we can always gather data from every population member.

false

We call any numerical fact about a population a:

parameter

n (for the size of the group), x̄ (for the mean), and s (for the standard deviation) are all measures calculated from what group?

sample

Most statisticians use statistics instead of parameters because:

data from an entire population is almost always very difficult to obtain.

True or False: A randomly selected sample is made up of any group of population members that's easy to find.

false

If we wanted to gather a sample representing all residents in a town, what is the problem with drawing a simple random sample from the local phone book?

A phone book isn't a complete listing of all population members.

True or False: A population contains 60% women and 40% men. To reflect the population group, we make sure that our sample also contains 60% women and 40% men. This is an example of a simple random sample.

false
(In a simple random sample, any possible combination of people must be equally likely. This statement describes a sample in which there are restrictions.)

Which of the following is a sample of the population of all high school students? (HINT: A sample is not always a simple random sample.)
- High school students taking chemistry
- Your math class
- All high school students in your state
- High school stude

All of the above
(Each of these is a subset of the population, which is all high school students. Any subset of a population is a sample of that population, though it may not be random.)

Which of the following is a sample of the population of bookstores on the West Coast of the United States?
I. All bookstores in California
II. Randomly selected children's bookstores
III. Internet bookstores

I only
(Only bookstores in California count as a sample of bookstores on the West Coast of the U.S. The others may overlap with the population of West Coast bookstores, but these groups probably have some bookstores that aren't on the West Coast.)

True or False: Your population of interest is whatever you decide it is. A population can be anything as long as it's defined as a population.

true
(If you're interested in trees in general, but elm trees in particular, especially those close to where you live, you could say that your population is not trees in general, but elm trees in the park next to your house. Then an appropriate sample wou

You're measuring the weights of a group of dogs. You get a mean weight of x̄ = 48.8 pounds and a standard deviation of s = 17.5 pounds. The dogs are:

a sample.
(The symbols used for the mean and standard deviation are symbols for statistics, not parameters. This is a clue that the group of dogs is a sample from some population of interest.)

Every ten years the United States takes a census, which is a survey of every person in the country. If you took the census data that told you the number of people in the United States, and if from all of those numbers you calculated the mean age, what sym

N and μ.
(A census counts every member of a given population (though in practice it isn't always successful at reaching everyone). The symbol for the parameter population size is N, and the symbol for the parameter population mean is μ (mu). n and x̄ a

True or False: If your sample is made up of power tools randomly selected from one hardware store, your population of interest is all power tools sold in hardware stores

false
(If you randomly select the tools, but only from one hardware store, the relevant population is all power tools in that hardware store alone. You can't assume that all hardware stores carry the same kinds of power tools.)

True or False: If the population of interest is all day care centers in the United States, a sample of day care centers could be either all day care centers in New York City or a randomly selected group of day care centers throughout the United States. Ei

false
(A simple random sample of day care centers in the U.S., rather than a sample that comes from only one city, is more likely to produce statistics that accurately estimate the population parameters you're interested in. This is because in an SRS each

A scientist is testing certain sampling strategies to see which one is best. She's gathered data from an entire population and calculates a population mean μ (mu) = 14 and a population standard deviation σ (sigma) = 5. She draws five different samples from

Sample 2
(The statistics from Sample 2 are the best estimates of both population parameters, indicating that this is probably the best sampling strategy.)

The kind of sampling strategy least likely to produce statistics that are good estimates of population parameters is a:

haphazard sample

True or False: A simple random sample is not just a sample where every population member has an equal chance of being drawn.

true
(A simple random sample also has the requirement that all possible samples are equally likely, meaning that every possible combination of population members has the same chance of occurring.)
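
One way to picture "all possible samples are equally likely" is to list every possible sample from a tiny population. This is only a sketch; the five-member population and the sample size of 2 are made up.

```python
import random
from itertools import combinations

# Hypothetical population of five members.
population = ["A", "B", "C", "D", "E"]
sample_size = 2

# Every possible sample of size 2 (order doesn't matter).
possible_samples = list(combinations(population, sample_size))
print(len(possible_samples))  # 10

# In a simple random sample, each of these 10 samples has the same chance.
chance_of_each_sample = 1 / len(possible_samples)
print(chance_of_each_sample)  # 0.1

# random.sample draws without replacement, so every combination of
# members is equally likely to be the one drawn.
drawn = random.sample(population, sample_size)
```

A sample forced to match the population's 60% women / 40% men split rules out some of these combinations, which is why it isn't a simple random sample.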

Consider these eight observations: {11, 6, 2, 5, 8, 4, 4, 9}. What is the mean?

6.125

Consider these eight observations: {11, 6, 2, 5, 8, 4, 4, 9}. What is the median?

5.5
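
Both answers for these eight observations can be checked with Python's standard `statistics` module (a quick sketch, not part of the original quiz):

```python
import statistics

observations = [11, 6, 2, 5, 8, 4, 4, 9]

# Mean: sum of the values (49) divided by how many there are (8).
mean_value = statistics.mean(observations)
print(mean_value)  # 6.125

# Median: with an even count, average the two middle values of the
# sorted list (2, 4, 4, 5, 6, 8, 9, 11), here 5 and 6.
median_value = statistics.median(observations)
print(median_value)  # 5.5
```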

A class has the following distribution of eye colors: 10 blue, 18 brown, 5 green. Which measure of central tendency should you use to find the eye color of the typical class member? What do you get when you use this measure?

mode; brown
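
The mode is just the category with the largest count. A small sketch using the counts from the question:

```python
from collections import Counter

# Eye-color counts from the question above.
eye_colors = Counter({"blue": 10, "brown": 18, "green": 5})

# most_common(1) returns the single category with the highest count.
modal_color, modal_count = eye_colors.most_common(1)[0]
print(modal_color, modal_count)  # brown 18
```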

Since the distribution of housing prices in a community is usually skewed right, which measure of center should you use for housing prices?

median

The blood pressure reading for any age group is normally distributed. The normal distribution is symmetric and mound-shaped. Therefore, the correct measure of central tendency to use for blood pressure is:

the mean.

A hockey team has completed 35 games. The team's median goals per game is 2. Which of the following must be true about the team's goal total so far?

The median doesn't allow us to infer the exact goal total, AND it is at least 36.
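
Where the 36 comes from: with 35 games, the median is the 18th value in sorted order, so the 18 games from the median upward each have at least 2 goals. A sketch of that reasoning (the "extreme" season below is a made-up worst case):

```python
import statistics

games = 35
median_goals = 2

# The median game plus every game above it in sorted order: 18 games.
games_at_or_above_median = games - games // 2

# Each of those games has at least the median number of goals.
minimum_total = games_at_or_above_median * median_goals
print(minimum_total)  # 36

# Hypothetical lowest-scoring season that still has a median of 2:
# 17 shutouts and 18 two-goal games.
extreme_season = [0] * 17 + [2] * 18
print(statistics.median(extreme_season), sum(extreme_season))  # 2 36
```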

A basketball fan thinks the large salaries of NBA players will force the NBA to raise ticket prices. Here's how she came to this conclusion: since salaries are part of the total NBA operating costs, she used the NBA average salary to infer the total. Whic

the mean

What is the only measure of center that can be used with categorical, or non-numeric, data?

mode

Six radio listeners are surveyed. Their favorite FM stations are: 89.1, 89.1, 89.1, 94.7, 94.7, and 104.3. Based on these data, you want to name the favorite station of a typical listener. You should name:

the mode, which is 89.1

Mr. Thompson wants to curve students' exam scores based on the highest score in the class. He takes the highest score (which happens to be an outlier) and treats it as the perfect score. He then computes everyone else's score as a percentage of this perfe

Grading scores relative to the median score

If you list and graph the dates of coins in people's pockets and purses you'd probably find that the graph's distribution is skewed left, because more recent dates are more common. If you want to express the average date of a coin you'd use:

the median

A synonym for variation is:

spread.
(While distance is a component of calculating variation, it's not a synonym.)

In inferential statistics, variation is an essential measurement for:

making predictions.

Each of the following data sets has a mean of 40.
I {38, 43, 47, 27, and 45}
II {41, 40, 39, 42, and 38}
III {59, 41, 53, 17, and 30}
Estimate their population standard deviations (represented by sigma: σ) and list them from smallest to largest according

II (1.41), I (7.16), III (15.23)
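
The quiz asks for estimates, but the exact population standard deviations can be checked with `statistics.pstdev`, which divides by N rather than n - 1 (a sketch; the dictionary layout is just for convenience):

```python
import statistics

data_sets = {
    "I": [38, 43, 47, 27, 45],
    "II": [41, 40, 39, 42, 38],
    "III": [59, 41, 53, 17, 30],
}

# Population standard deviation of each set, rounded to 2 decimals.
sigmas = {name: round(statistics.pstdev(values), 2)
          for name, values in data_sets.items()}
print(sigmas)  # {'I': 7.16, 'II': 1.41, 'III': 15.23}

# Order the sets from smallest to largest spread.
ordered = sorted(sigmas, key=sigmas.get)
print(ordered)  # ['II', 'I', 'III']
```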

Estimate, to the nearest whole number, the sample standard deviation of this data set, which is a sample from a larger population: {71, 75, 65, 73, 69, 77, and 67}. The mean of the data in this sample is 71.

4
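
For a sample, the sum of squared deviations is divided by n - 1 instead of n. A worked check of the estimate (a sketch using the standard library):

```python
import math
import statistics

sample = [71, 75, 65, 73, 69, 77, 67]
sample_mean = statistics.mean(sample)  # 71

# Sum of squared deviations from the mean: 0+16+36+4+4+36+16 = 112.
squared_deviations = sum((x - sample_mean) ** 2 for x in sample)

# Sample standard deviation divides by n - 1 = 6 degrees of freedom.
s_by_hand = math.sqrt(squared_deviations / (len(sample) - 1))

# statistics.stdev uses the same n - 1 formula.
s_library = statistics.stdev(sample)
print(round(s_by_hand, 2), round(s_library))  # 4.32 4
```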

What measure of central tendency do you use with standard deviation?

Mean

How many degrees of freedom does a sample of 12 have when you calculate a standard deviation?

11
(n - 1, n = 12, 12 - 1 = 11)

All the following statements about the sample standard deviation are true, except:

the standard deviation is negative when there are extreme values in the sample.
(variation can never be in negative numbers)

True or False: Degrees of freedom are used to calculate both the population standard deviation and sample standard deviation formulas.

false

Based on the data in the table below, what is the smallest number of births (in thousands) a month could possibly have and still be an upper outlier?

For a data set like births, the upper and lower outliers apparently need to be whole numbers.
I.e., 357.7 is the correct calculation, but 358 is the correct answer.

In a distribution with many values, which of the following percentiles is equal to the median?

50th percentile

In a distribution with many values, which of the percentiles is also known as Q1?

25th percentile

If you have a data set of 40 whole numbers, which of the following could be true about the five-number summary?

The upper quartile does not have to be a whole number.

Why is the interquartile range (IQR) considered to be a resistant statistic?

Adding a new extreme observation has little effect on it.
(The IQR is based on the quartiles, which, like the median, are resistant. That is, a new extreme value added to the data set will have a much larger effect on the mean than on the median or the quartiles.)

What is the maximum length of a whisker in a modified box plot where the median = 120, Q1 = 100, Q3 = 150, the minimum = 20, and maximum = 270?

75
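
In a modified box plot, a whisker can't extend more than 1.5 × IQR beyond its quartile; anything farther out is plotted as an outlier. A sketch of the arithmetic with the values from the question:

```python
q1, q3 = 100, 150
minimum, maximum = 20, 270

iqr = q3 - q1                # 50
max_whisker = 1.5 * iqr      # longest a whisker can be
print(max_whisker)  # 75.0

# Fences: values beyond these are outliers, not part of a whisker.
lower_fence = q1 - max_whisker
upper_fence = q3 + max_whisker
print(lower_fence, upper_fence)  # 25.0 225.0

# Here both the minimum (20) and the maximum (270) fall outside the fences.
print(minimum < lower_fence, maximum > upper_fence)  # True True
```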

You have a distribution summarizing the number of days in the past two months (60 days) that an individual watched TV. The median number of days = 25, the lower quartile = 21, and upper quartile = 42. Given this information, which of the following stateme

there are no outliers in this distribution
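
Why there can't be any outliers: the 1.5 × IQR fences land outside the range of possible values (0 to 60 days). A sketch of that check, assuming the usual 1.5 × IQR rule:

```python
q1, q3 = 21, 42
possible_min, possible_max = 0, 60  # days watched out of 60

iqr = q3 - q1                    # 21
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
print(lower_fence, upper_fence)  # -10.5 73.5

# No possible value can fall below the lower fence or above the upper one.
outliers_possible = possible_min < lower_fence or possible_max > upper_fence
print(outliers_possible)  # False
```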

The following hypothetical data set shows the purchase prices (in thousands) for a sample of 3-bedroom, 2-bathroom homes in Essex County, MA, over the past year. Compute the five-number summary and create a modified box-and-whisker plot. How many outliers

In this distribution Q1 = 234, Q3 = 298, IQR = 64, and 1.5 × IQR = 96. Therefore, the threshold values for outliers are 138 and 394. You can see that three houses (129, 401, and 426) fall outside the threshold values.
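
The thresholds can be checked from the quartiles alone. The full data set isn't reproduced in these notes, so this sketch only tests the three prices named in the answer:

```python
q1, q3 = 234, 298  # in thousands of dollars, from the answer above

iqr = q3 - q1                    # 64
step = 1.5 * iqr                 # 96.0
lower_threshold = q1 - step
upper_threshold = q3 + step
print(lower_threshold, upper_threshold)  # 138.0 394.0

# The three prices the answer identifies as outliers.
named_prices = [129, 401, 426]
outliers = [p for p in named_prices
            if p < lower_threshold or p > upper_threshold]
print(outliers)  # [129, 401, 426]
```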

True or False: The shape and standard deviation of a population distribution of a variable (such as income) can be estimated with a distribution of a sample of sufficient size

true
(Just as sample statistics are used to estimate population parameters, distributions of samples can be used to estimate the shapes and standard deviations of population distributions.)

True or False: The sample size you need to estimate the population distribution should always be at least 10% of the population size.

false
(The sample size you need isn't dependent on population size. A sample of 1,000 to 1,500 observations is usually enough to give a reliable estimate of the distribution of a variable in the population, no matter how big the population is. With as few

From left to right (smallest to largest), what is the order of the different measures of central tendency in a negatively (left) skewed distribution?

Mean, median, mode
(The mode is the peak of the curve, left of that is the median, which is the middle value of the distribution, and furthest over on the left is the mean.)

Which measure of central tendency and which measure of variation should be used with a heavily skewed distribution?

The median and inter-quartile range
(The median and inter-quartile range are used because they're less likely to be influenced by outliers in a skewed distribution.)

True or False: The mean and standard deviation are usually not used together because of outliers.

false
(The mean and standard deviation should be used together since the standard deviation measures deviation from the mean. Keep in mind, however, that the mean is sensitive to the effect of outliers)

True or False: One reason to use a sample to estimate the shape of a population distribution is to determine which statistics are appropriate to use for that variable, since different statistics have different characteristics.

true
(There are several different statistics you could use to measure central tendency, but different ones work better with different shapes of distributions.)

True or False: A large randomly selected sample always gives a better estimate of the population than a small randomly selected sample.

false
(Remember though, smaller samples can still work very well if the sample is representative of the population. It's even possible for a very large simple random sample to give a less accurate estimate than a smaller simple random sample, since the sa

In a sample's distribution of income, the modal income (mode, or most frequently occurring observation of income) is $27,000 a year, the median income is $35,000 a year, and the mean income is $45,000 a year. Which statistic do you think is the best estim

$35,000, skewed right
(The median value, $35,000, is the best average to use in a skewed distribution. You can tell the distribution is skewed right because the order of the statistics from left to right is mode, median, and mean.)

True or False: For large populations, 1,000 is the best sample size.

false
(The best size for your sample depends on many factors, including the shape of the distribution and acceptable margin of error in your study. Sometimes you may need a sample size of fewer than 1,000, and sometimes you may need a size greater than 1,

True or False: For a symmetric, mound-shaped distribution, the mean, median, and mode are all the same.

true
(A symmetric, mound-shaped distribution will have its mean as the most common value. Half of the values will be above the mean and half will be below it.)

You've drawn a simple random sample from a population. The standard deviation of this sample is a(n):

statistic

A standard deviation calculated from data from an entire population would be called a(n):

parameter

Look at the data set below. How many possible values are there for this variable? What kind of variable is it?
red yellow
blue yellow
white red
green blue
green green
blue yellow
red white
yellow orange

6, categorical
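
Counting the distinct values is a quick way to confirm the 6. A sketch with the data set typed in row by row:

```python
# The 16 observations from the data set above, read row by row.
colors = [
    "red", "yellow", "blue", "yellow", "white", "red", "green", "blue",
    "green", "green", "blue", "yellow", "red", "white", "yellow", "orange",
]

# A set keeps one copy of each distinct value.
distinct_colors = set(colors)
print(len(distinct_colors))  # 6
print(sorted(distinct_colors))
# ['blue', 'green', 'orange', 'red', 'white', 'yellow']
```

The values are labels rather than numbers, which is what makes the variable categorical.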

For a mound-shaped and symmetrical distribution, what measure of center and measure of variation should be used?

the mean and the standard deviation

For a skewed distribution, what measure of center and measure of variation should be used?

the median and the IQR
(When the data are numeric and the distribution is skewed either to the right or the left, the median is usually the best choice for the measure of central tendency. This is because the mean is too heavily influenced by extreme obse