Most analysts focus on the cost of tuition as the way to measure the cost of a college education. But
incidentals, such as textbook costs, are rarely considered. A researcher at the University of Oklahoma
wishes to estimate the textbook costs of first-yea
All first-year University of Oklahoma students
The evening host of a dinner reached into a bowl, mixed all tickets around, and selected the ticket to award
the grand door prize. What kind of sample will be generated?
Simple random sample
A telemarketer set the company's computerized dialing system to contact every 25th person listed in the
local telephone directory. What kind of sample will be generated?
Systematic sample
The Dean of Students mailed a survey to a total of 400 students. The sample included 100 students
randomly selected from each of the freshman, sophomore, junior, and senior classes on campus last term. What
kind of sample will be generated?
Stratified sample
The width of each bar in a histogram corresponds to the
Difference between the boundaries of the class
In a perfectly symmetrical bell-shaped "normal" distribution
The arithmetic mean equals the median, the median equals the mode, and the arithmetic mean equals the mode.
In right-skewed distributions, which of the following is the correct statement?
The distance from Q1 to Q2 is less than the distance from Q2 to Q3
According to the empirical rule, if the data has a "bell-shaped" normal distribution, about _____________
percent of the observations will be contained within 2 standard deviations around the arithmetic mean.
95
Which of the following is NOT a measure of central tendency? (arithmetic mean, geometric mean, mode, interquartile range)
The interquartile range
You were told that the 1st, 2nd, and 3rd quartiles of female students' weight at the University of Oklahoma are 95
lbs, 125 lbs, and 138 lbs. What is the percentage of students who weigh more than 138 lbs?
25%
The rates of return for a stock over a three-year period are 0.527, 0.145, and 0.684. Which of the following
measures is the best measure of central tendency for these rates? (the arithmetic mean of return, the median return, the geometric mean rate of return)
The geometric mean rate of return
Which of the following descriptive measures can be used to identify the outliers in a data set?
The Z-score for each observation
Let's play a game. You may win $200 with a probability of about 33%, and you may lose $100 with
a probability of about 66%. What is the expected value of this game?
$0
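The arithmetic behind that answer can be sketched as a sum of each outcome times its probability. This is a minimal illustration, assuming "about 33%" and "about 66%" stand for exactly 1/3 and 2/3; the function name `expected_value` is a hypothetical helper, not something from the course material.

```python
from fractions import Fraction

def expected_value(outcomes, probabilities):
    """Return the sum of outcome * probability over all outcomes."""
    return sum(x * p for x, p in zip(outcomes, probabilities))

# Assumption: the rounded percentages in the question are 1/3 and 2/3.
game_outcomes = [200, -100]                       # win $200, lose $100
game_probs = [Fraction(1, 3), Fraction(2, 3)]

ev = expected_value(game_outcomes, game_probs)
print(ev)  # prints 0
```

With the rounded values 0.33 and 0.66 the result is also 0, since 200 × 0.33 = 100 × 0.66.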
The Central Limit Theorem implies that:
Regardless of the population distribution, the sampling distribution of the mean is
approximately normal when the sample size is large enough.
The Standardized Normal Distribution
is bell-shaped and symmetric, with its mean being equal to zero (0) and its standard deviation
being equal to one (1).
Let's assume that B1 and B2 are mutually exclusive and collectively exhaustive events. Also assume
that the joint probability of A and B1 and the joint probability of A and B2 are non-zero. Given these
assumptions, identify the wrong statement:
The probability of an event like A can be written as the product of the conditional probability
of A given B1 and the conditional probability of A given B2
Using ________________, one may make use of new information to update a conditional probability.
Bayes' Theorem
The historical data on the number of times that a given team in Major League Baseball has clinched its
division (i.e., has made it to the next round of the games) is available to almost everyone. You are
asked to report the probability that a given team c
Binomial Probability Distribution Function
The marketing department of a middle-sized manufacturer has 45 employees. 20 of them are female
and 25 of them are male. A group of 5 employees are randomly chosen to travel and meet with regional
sales departments. You are asked to compute the probabilit
Hypergeometric Probability Distribution Function
The historical data on the number of electric network outages per month are available for a local utility
provider. You are asked to compute the probability that more than one (1) outage occurs each month.
What kind of Probability Distribution Function wo
Poisson Probability Distribution Function
X is a continuous random variable (e.g. time required to download a music file), which is normally
distributed with the mean μ and standard deviation σ. Probability of X being less than XL is equal to PL.
Probability of X being more than XU is equal to PU.
P(XL ≤ X ≤ XU) = 1 - (PU + PL)
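The identity above can be checked numerically with Python's standard-library `statistics.NormalDist`. The values chosen for the mean, standard deviation, and the two cut-offs are hypothetical and serve only as an illustration.

```python
from statistics import NormalDist

# Hypothetical parameters and cut-offs (not taken from the question).
mu, sigma = 50.0, 10.0
x_lower, x_upper = 40.0, 65.0

dist = NormalDist(mu, sigma)
p_lower = dist.cdf(x_lower)          # PL = P(X < XL)
p_upper = 1 - dist.cdf(x_upper)      # PU = P(X > XU)

# Probability of falling between the two cut-offs, computed directly.
p_between = dist.cdf(x_upper) - dist.cdf(x_lower)

# The same probability via the complement of the two tails.
p_complement = 1 - (p_upper + p_lower)
print(p_between, p_complement)
```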
Let Y be a discrete random variable with a Poisson distribution. And let X be a continuous random
variable with a Normal distribution. Also, let C be a constant. Identify the correct statement.
P(X=C) is always equal to zero
The mean of the Sampling Distribution of the Means is an unbiased estimator of the population
mean...
when the sample size is large enough
Identify the correct statement:
***a. We make use of sample statistics (e.g. the sample mean) to estimate population parameters
(e.g. the population mean)***
b. We make use of population parameters (e.g. population mean) to estimate sample statistics
(e.g. the sample mean)
c. The samplin
A and B are two independent events. P(A|B) is equal to:
P(A)
X is a continuous random variable, which is distributed normally. From the X continuum, we choose a
given value, called X*. The Z-value of X* is equal to Z*. Probability of Z<=Z* is equal to α*. What is
the probability of X>X*?
P(X>X*) = 1 - α*
X is a continuous random variable. Z is the Z-value associated with the observations in X. We can say
that X is normally distributed when:
X is a linear function of Z
A population of interest is not distributed normally. A group of researchers repeatedly choose a number
of random samples from this population. As they choose more samples, they increase the sample sizes.
Complete the following statement:
As sample sizes
decreases
X is a continuous random variable with a Normal Probability Distribution. Z is the Z-value associated
with X. The Cumulative Standardized Normal Distribution table/function includes:
P(Z<=Z*)
DCOVA
DEFINE the variables that you want to study in order to solve a problem or meet an objective
COLLECT the data for those variables from appropriate sources
ORGANIZE the data collected by developing tables
VISUALIZE the data collected by developing charts
ANALYZE the data collected to reach conclusions and present those results
Categorical Variables
Have values that can only be placed into categories such as yes and no
Numerical Variables
Have values that represent a counted or measured quantity
Discrete variables
have numerical values that arise from a counting process.
EX: Number of items purchased
Continuous variables
have numerical values that arise from a measuring process.
EX: Time spent waiting in a checkout line
Nominal scale
classifies data into distinct categories in which no ranking is implied
Ordinal scale
classifies values into distinct categories in which ranking is implied
Interval scale
an ordered scale in which the difference between measurements is a meaningful quantity but does not involve a true zero point
Ratio scale
ordered scale in which the difference between the measurements involves a true zero point, as in height, age, or salary measurements.
Primary Data Source
Collect your own data
Secondary data source
Someone else collected the data you are using
Population
Consists of all items or individuals about which you want to reach conclusions
Sample
portion of the population selected for analysis
Structured data
data that follows some organizing principle or plan, typically a repeating pattern
Unstructured data
follows no repeating pattern
Mutually exclusive
The category definitions cause each data value to be placed in one and only one category
Collectively exhaustive
The set of categories you create for the new, recoded variables include all the data values being recoded
Simple random sample
Every item from a frame has the same chance of selection as every other item, and every sample of a fixed size has the same chance of selection as every other sample of that size.
Systematic sample
You partition the N items in the frame into n groups of k items, where k = N/n.
Round k to the nearest integer. To select a systematic sample, you choose the first item to be selected at random from the first k items in the frame. Then, you select the remain
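A minimal sketch of systematic selection follows. It assumes that, after the random start, every kth item is taken (as in the "every 25th person" telemarketer question); the function name `systematic_sample` and the 100-entry frame are hypothetical.

```python
import random

def systematic_sample(frame, n, rng=random):
    """Pick a random start among the first k items, then every kth item."""
    k = round(len(frame) / n)      # group size, rounded to the nearest integer
    start = rng.randrange(k)       # random position within the first k items
    # Note: when N/n is not an integer, rounding k can yield fewer than n items.
    return frame[start::k][:n]

# Hypothetical frame of 100 directory entries, sample of 4, so k = 25.
directory = list(range(1, 101))
sample = systematic_sample(directory, 4, rng=random.Random(0))
print(sample)
```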
Stratified sample
You first subdivide the N items in the frame into separate subpopulations, or strata. A stratum is defined by some common characteristic, such as gender or year in school. You select a simple random sample within each of the strata and combine the results.
Cluster sample
You divide the N items in the frame into clusters that contain several items. Clusters are often naturally occurring groups, such as counties. You then take a random sample of one or more clusters and study all items in each selected cluster.
Summary table
Tallies the values as frequencies or percentages for each category. Helps you see the differences among the categories by displaying the frequency, amount, or percentage of items in a set of categories in a separate column
Contingency table
cross-tabulates, or tallies jointly, the values of two or more categorical variables, allowing you to study patterns that may exist between the variables. Tallies can be shown as a frequency, percentage of the overall total, percentage of the row total,
Frequency distribution
Tallies the values of a numerical variable into a set of numerically ordered classes. Each class groups a mutually exclusive range of values, called a class interval. Each value can be assigned to only one class, and every value must be contained in one o
What are class intervals identified by?
Their class midpoints
Relative frequency distribution
presents the relative frequency, or proportion, of the total for each group that each class represents
Percentage distribution
presents the percentage of the total for each group that each class represents
= proportion × 100%
Proportion (or relative frequency)
is equal to the number of values in each class divided by the total number of values.
Cumulative percentage distribution
provides a way of presenting information about the percentage of values that are less than a specific amount. You use a percentage distribution as the basis to construct a cumulative percentage distribution.
Bar chart
visualizes a categorical variable as a series of bars, with each bar representing the tallies for a single category. The length of each bar represents either the frequency or percentage of values for a category and each bar is separated by a gap
Pareto chart
The tallies for each category are plotted as vertical bars in descending order, according to their frequencies, and are combined with a cumulative percentage line on the same chart. They get their name from the Pareto principle, the observation that in ma
Side-by-side bar chart
uses sets of bars to show the joint responses from 2 categorical variables
Histogram
visualizes data as a vertical bar chart in which each bar represents a class interval from a frequency or percentage distribution.
Percentage polygon
Used when using a categorical variable to divide the data of a numerical variable into 2 or more groups. This chart uses the midpoints of each class interval to represent the data of each class and then plots the midpoints, at their respective class perce
Cumulative percentage polygon (ogive)
uses the cumulative percentage distribution to plot the cumulative percentages along the Y axis. Unlike the percentage polygon, the lower boundaries of the class intervals for the numerical variable are plotted, at their respective class percentages, as poin
Multidimensional contingency table
used to tally the responses of 3 or more categorical variables.
Lurking variable
A variable that is affecting the results of the other variables
Central tendency
the extent to which the values of a numerical variable group around a typical, or central, value.
Variation
measures the amount of dispersion, or scattering, away from a central value that the values of a numerical variable show. The shape of a variable is the pattern of the distribution of values from the lowest value to the highest value
Arithmetic mean
typically referred to as the mean, is the most common measure of central tendency.
Median
Middle value in an ordered array of data that has been ranked from smallest to largest.
Position of the median in the ranked values = (n+1)/2
If you have an even number of values, average the 2 middle values.
Geometric mean
Used when you want to measure the rate of change over time
=the nth root of the product of n values
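The definition above translates directly into code. The second function is an assumption beyond this card: it applies the same nth-root idea to growth factors (1 + rate), which is the usual form of the geometric mean rate of return asked about in the stock question earlier. Both function names are hypothetical.

```python
import math

def geometric_mean(values):
    """nth root of the product of n positive values."""
    return math.prod(values) ** (1 / len(values))

def geometric_mean_rate_of_return(rates):
    """Assumed form: geometric mean of the growth factors (1 + r), minus 1."""
    growth_factors = [1 + r for r in rates]
    return geometric_mean(growth_factors) - 1

# Rates quoted in the stock question earlier in these notes.
stock_rates = [0.527, 0.145, 0.684]
print(geometric_mean_rate_of_return(stock_rates))
```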
Variance and Standard deviation
2 commonly used measures of variation that account for how all the values are distributed
How to hand compute sample variance
1. Compute the difference between each value and the mean
2. square each difference
3. sum the squared differences
4. divide this total by n-1 to compute sample variance
5. take the square root of the sample variance to compute sample standard deviation
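The five steps above can be sketched as follows. The data set is hypothetical, and the result is compared against the standard library's `statistics.variance` as a sanity check.

```python
import math
import statistics

def sample_variance(values):
    n = len(values)
    mean = sum(values) / n
    differences = [x - mean for x in values]      # step 1: value minus mean
    squared = [d ** 2 for d in differences]       # step 2: square each difference
    total = sum(squared)                          # step 3: sum the squares
    return total / (n - 1)                        # step 4: divide by n - 1

def sample_std_dev(values):
    return math.sqrt(sample_variance(values))     # step 5: square root

data = [2, 4, 4, 4, 5, 5, 7, 9]                   # hypothetical observations
print(sample_variance(data), sample_std_dev(data))
print(statistics.variance(data))                  # library value for comparison
```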
Coefficient of Variation
measures the scatter in the data relative to the mean.
= standard deviation / mean, usually expressed as a percentage (multiply by 100%)
z score
the difference between that value and the mean, divided by the standard deviation. A z score of 0 indicates that the value is the same as the mean. If it is a positive or negative number, it indicates whether value is above or below the mean and by how ma
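A short sketch of the z-score calculation, using the sample standard deviation. The cut-off of 3 for flagging outliers is a common rule of thumb and an assumption here, not something stated on this card; the data are hypothetical.

```python
from statistics import mean, stdev

def z_scores(values):
    """(value - mean) / standard deviation for every value."""
    m = mean(values)
    s = stdev(values)
    return [(x - m) / s for x in values]

def flag_outliers(values, threshold=3.0):
    """Assumed rule of thumb: flag values whose |z| exceeds the threshold."""
    return [x for x, z in zip(values, z_scores(values)) if abs(z) > threshold]

data = [2, 4, 4, 4, 5, 5, 7, 9]      # hypothetical observations, mean = 5
scores = z_scores(data)
print(scores)
print(flag_outliers(data))
```

A value equal to the mean (5 in this data set) gets a z score of 0.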
Left skewed
Most values are in upper portion
Right skewed
Most values are in the lower portion
Mean<Median
negative, or left-skewed distribution
Mean=Median
Symmetrical distribution with 0 skewness
Mean>Median
Positive, or right-skewed distribution
Kurtosis
Measures the extent to which values that are very different from the mean affect the shape of the distribution of a set of data. It affects the peakedness of the curve of the distribution, that is, how sharply the curve rises approaching the center of the
Leptokurtic
a kurtosis value that is greater than 0
platykurtic
a kurtosis value that is less than 0
Quartiles
split the values into 4 equal parts
First quartile
divides the smallest 25% of the values from the other 75% that are larger
Second quartile
the median; 50% of the values are smaller than or equal to the median, and 50% are larger than or equal to the median.
Third quartile
divides the smallest 75% of the values from the largest 25%
Percentiles
Split a variable into 100 equal parts
Interquartile range
measures the difference in the center of a distribution between the third and first quartiles
Resistant measures
Descriptive statistics such as the median, Q1, Q3, and the interquartile range, which are not influenced by extreme values
Population mean
Sum of the values in the population divided by the population size, N.
Empirical Rule
States that for population data that form a normal distribution, the following are true:
1. Approximately 68% of the values are within ±1 standard deviation from the mean
2. Approximately 95% of the values are within ±2 standard deviations from the mean
3. Approximately 99.7% of the values are within ±3 standard deviations from the mean
The Chebyshev Rule
States that for any data set, regardless of shape, the percentage of values that are found within distances of k standard deviations from the mean must be at least (1 - 1/k^2) × 100%. You can use this rule for any value of k greater than 1. Use this rule for heavi
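To compare the Chebyshev bound with the empirical rule, the sketch below evaluates the (1 - 1/k^2) × 100% formula next to the exact percentage for a normal distribution. Both function names are hypothetical; `NormalDist` comes from Python's standard library.

```python
from statistics import NormalDist

def chebyshev_lower_bound(k):
    """Minimum percentage of values within k standard deviations, any shape."""
    if k <= 1:
        raise ValueError("the rule applies only for k greater than 1")
    return (1 - 1 / k ** 2) * 100

def normal_percentage_within(k):
    """Percentage of a normal distribution within k standard deviations."""
    z = NormalDist()               # standardized normal: mean 0, std dev 1
    return (z.cdf(k) - z.cdf(-k)) * 100

for k in (2, 3):
    print(k, chebyshev_lower_bound(k), normal_percentage_within(k))
```

For k = 2 the Chebyshev bound is 75%, noticeably lower than the roughly 95% that holds for bell-shaped data.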
Covariance
measures the strength of the linear relationship between 2 numerical variables
Coefficient of Correlation
Measures the relative strength of a linear relationship between 2 numerical variables. Ranges from -1 for a perfect negative correlation to +1 for a perfect positive correlation.
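A minimal hand-rolled sketch of the sample covariance and the coefficient of correlation, assuming the usual n - 1 divisor for sample data. The function names and the paired data are hypothetical.

```python
import math

def sample_covariance(xs, ys):
    """Sum of products of paired deviations from the means, divided by n - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def coefficient_of_correlation(xs, ys):
    """Covariance divided by the product of the two sample standard deviations."""
    std_x = math.sqrt(sample_covariance(xs, xs))
    std_y = math.sqrt(sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / (std_x * std_y)

xs = [1, 2, 3, 4]                  # hypothetical paired observations
ys = [2, 4, 6, 8]                  # exactly linear in xs
print(sample_covariance(xs, ys), coefficient_of_correlation(xs, ys))
```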
Probability
Numerical value representing the chance, likelihood, or possibility that a particular event will occur.
A priori probability
Probability of an occurrence is based on prior knowledge of the process involved.
Empirical probability
Probabilities are based on observed data, not on prior knowledge of a process
Subjective probability
differs from person to person; usually based on a person's past experience, personal opinion, and analysis of a particular situation
Event
Each possible outcome of a variable
Simple event
Described by a single characteristic
Joint event
An event that has 2 or more characteristics
Sample space
collection of all possible events
Simple probability
probability of the occurrence of a simple event
Joint probability
probability of occurrence involving 2 or more events
Marginal probability
Consists of a set of joint probabilities (Add them all together)
General addition rule
P(A or B)= P(A)+P(B)-P(A and B)
Conditional Probability
refers to the probability of event A, given information about the occurrence of another event, B.
P(A|B) = P(A and B) / P(B)
Decision tree
alternative to a contingency table
Independence
When the outcome of one event does not affect the probability of occurrence of another event. 2 events are independent if P(A|B) = P(A)
General multiplication rule
P(A and B) = P(A|B)P(B)
Multiplication rule for independent events
P(A and B)= P(A)P(B)
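The addition, conditional probability, and multiplication rules above can be checked on a small contingency table. The counts below are hypothetical, and exact fractions are used so the equalities hold without rounding.

```python
from fractions import Fraction

# Hypothetical 2x2 contingency table of counts for events A and B.
counts = {
    ("A", "B"): 30, ("A", "not B"): 20,
    ("not A", "B"): 10, ("not A", "not B"): 40,
}
total = sum(counts.values())

p_a_and_b = Fraction(counts[("A", "B")], total)                   # joint
p_a = Fraction(counts[("A", "B")] + counts[("A", "not B")], total)  # marginal
p_b = Fraction(counts[("A", "B")] + counts[("not A", "B")], total)  # marginal

p_a_or_b = p_a + p_b - p_a_and_b       # general addition rule
p_a_given_b = p_a_and_b / p_b          # conditional probability
independent = p_a_given_b == p_a       # independence check

print(p_a_or_b, p_a_given_b, independent)
```

In this table P(A|B) = 3/4 differs from P(A) = 1/2, so A and B are not independent and P(A and B) must be computed with the general multiplication rule.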
Bayes' theorem
used to revise previously calculated probabilities based on new information
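A minimal sketch of the revision step for two mutually exclusive and collectively exhaustive events B1 and B2, assuming the standard two-event form of Bayes' theorem. The function name and the numbers are hypothetical.

```python
def bayes_posterior(prior_b1, p_a_given_b1, p_a_given_b2):
    """Return P(B1 | A) for events B1 and B2 that partition the sample space."""
    prior_b2 = 1 - prior_b1                       # B2 is the complement of B1
    # Total probability of A: sum of the two joint probabilities.
    p_a = p_a_given_b1 * prior_b1 + p_a_given_b2 * prior_b2
    return p_a_given_b1 * prior_b1 / p_a

# Hypothetical inputs: prior P(B1) = 0.01, P(A|B1) = 0.90, P(A|B2) = 0.05.
posterior = bayes_posterior(0.01, 0.90, 0.05)
print(posterior)
```

The denominator also shows why P(A) is a sum, not a product, of terms involving the two conditional probabilities.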
Probability distribution for a discrete variable
mutually exclusive list of all the possible numerical outcomes along with the probability of occurrence of each outcome
Expected value
the mean of the probability distribution
Covariance of a probability distribution
measures the strength of the relationship between 2 variables
Mathematical model
mathematical expression that represents a variable of interest.
Probability distribution function
math model for discrete random variables
Binomial distribution
Used when the discrete variable is the number of events of interest in a sample of n observations; has 4 important properties:
1. The sample consists of a fixed number of observations, n.
2. Each observation is classified into one of 2 mutually exclusive
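As an illustration of the binomial model, the sketch below evaluates the standard binomial probability formula, C(n, x) p^x (1 - p)^(n - x). The function name and the example values are hypothetical.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x events of interest in n observations."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# Hypothetical example: 4 observations, probability 0.5 for each.
n, p = 4, 0.5
probabilities = [binomial_pmf(x, n, p) for x in range(n + 1)]
print(probabilities)
```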
Poisson Distribution
Used to calculate probabilities in situations such as these if the following properties hold:
1. You are interested in counting the number of times a particular event occurs in a given area of opportunity. The area of opportunity is defined by time, lengt
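A short sketch of the Poisson probability formula, e^(-λ) λ^x / x!, applied to a question like the outage example earlier in these notes. The mean of 2 outages per month is a hypothetical value chosen for illustration.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """Probability of exactly x events when the mean number of events is lam."""
    return exp(-lam) * lam ** x / factorial(x)

lam = 2.0                               # hypothetical mean outages per month
# P(more than one outage) = 1 - P(0 outages) - P(1 outage)
p_more_than_one = 1 - poisson_pmf(0, lam) - poisson_pmf(1, lam)
print(p_more_than_one)
```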
Hypergeometric Distribution
The sample data are selected without replacement from a finite population, thus the result of one observation is dependent on the results of the previous observations.
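The sketch below applies the standard hypergeometric formula to the marketing-department setup from the earlier question (45 employees, 20 female, 5 chosen). Since that question is cut off, the specific probability computed here, the chance of x women in the group, is an assumption made for illustration.

```python
from math import comb

def hypergeometric_pmf(x, population, successes, sample_size):
    """Probability of x items of interest when sampling without replacement."""
    return (comb(successes, x)
            * comb(population - successes, sample_size - x)
            / comb(population, sample_size))

# 45 employees, 20 female, 5 selected without replacement.
probabilities = [hypergeometric_pmf(x, 45, 20, 5) for x in range(6)]
print(probabilities)
```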
Normal distribution
the most common continuous distribution used in statistics. It is vitally important in statistics for 3 main reasons:
1. Numerous continuous variables common in business have distributions that closely resemble the normal distribution
2. The normal distri
Important theoretical properties of the normal distribution
1. It is symmetrical, and its mean and median are therefore equal
2. It is bell-shaped in appearance
3. Its interquartile range is equal to 1.33 standard deviations. Thus, the middle 50% of the values are contained within an interval of two-thirds of a sta
Normal probability plot
a visual display that helps you evaluate whether the data are normally distributed
Sampling distribution of the mean
The distribution of all possible sample means if you select all possible samples of a given size
Central Limit theorem
As the sample size gets large enough, the sampling distribution of the mean is approx. normally distributed. This is true regardless of the shape of the distribution of the individual values in the population
Conclusions of the central limit theorem
1. For most distributions, regardless of the shape of the population, the sampling distribution of the mean is approx. normally distributed if samples of at least size 30 are selected
2. If the distribution of the population is fairly symmetrical, the sam
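A small simulation can make the theorem concrete. This sketch assumes a right-skewed exponential population with mean 1 and standard deviation 1, draws many samples of size 30, and looks at the distribution of their means. The population choice, the seed, and the number of samples are all hypothetical.

```python
import random
from statistics import mean, stdev

rng = random.Random(0)                  # fixed seed so the run is repeatable

def sample_means(sample_size, num_samples):
    """Means of repeated samples from an exponential population (mean 1)."""
    return [
        mean([rng.expovariate(1.0) for _ in range(sample_size)])
        for _ in range(num_samples)
    ]

means = sample_means(sample_size=30, num_samples=2000)

# The sample means should centre on the population mean (1.0), with a
# spread close to sigma / sqrt(n) = 1 / sqrt(30).
print(mean(means), stdev(means), 1 / 30 ** 0.5)
```

Plotting a histogram of `means` would show a roughly bell-shaped curve even though the population itself is strongly skewed.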
Sampling Error
the variation that occurs due to selecting a single sample from the population. The size of the sampling error is primarily based on the amount of variation in the population and on the sample size. Large samples have less sampling error than small sample
t distribution
very similar in appearance to the standardized normal distribution. The t distribution has more area in the tails and less in the center than does the standardized normal distribution.
What 3 quantities do you need to compute the sample size
1. The desired confidence level, which determines the value of the critical value from the standardized normal distribution
2. The acceptable sampling error
3. The standard deviation
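These three quantities combine in the usual sample-size formula for estimating a mean, n = (Z × σ / e)^2, rounded up to the next whole number. The sketch assumes that formula; the function name and the inputs (95% confidence, sampling error of 5, standard deviation of 25) are hypothetical.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_mean(confidence_level, sampling_error, sigma):
    """Sample size needed to estimate a mean, rounded up."""
    # Critical value from the standardized normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    return ceil((z * sigma / sampling_error) ** 2)

n_required = sample_size_for_mean(0.95, sampling_error=5, sigma=25)
print(n_required)  # prints 97
```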
Null Hypothesis
The hypothesis that the population parameter is equal to the company specification.
Alternative hypothesis
the conclusion reached by rejecting the null hypothesis
Summary of null and alternative hypothesis
1. The null hypothesis represents the current belief in the situation
2. The alternative hypothesis is the opposite of the null hypothesis and represents a research claim or specific inference you would like to prove
3. If you reject the null hypothesis,
Critical Value
The first thing you determine to make a decision concerning the null hypothesis; it divides the nonrejection region from the rejection region. The size of the rejection region is directly related to the risks involved in using only sample evidence to make
Type 1 Error
occurs if you reject the null hypothesis when it is true and should not be rejected; known as a "false alarm"
Type 2 error
occurs if you do not reject the null hypothesis when it is false and should be rejected; known as a "missed opportunity" to take some corrective action
Level of significance
probability of committing a type 1 error
β risk
probability of committing a type 2 error
Confidence coefficient
the complement of the probability of a type 1 error; the probability that you will not reject the null hypothesis when it is true and should not be rejected.
Power of a statistical test
The complement of the probability of a type 2 error; the probability that you will reject the null hypothesis when it is false and should be rejected
p-value
the probability of getting a test statistic equal to or more extreme than the sample result, given that the null hypothesis is true; known as the observed level of significance. Using the p-value to determine rejection and nonrejection is another approach
The decision rules for rejecting the null hypothesis in the p-value approach are
1. If the p-value is greater than or equal to α, do not reject the null hypothesis
2. If the p-value is less than α, reject the null hypothesis
IF THE P-VALUE IS LOW, THE NULL HYPOTHESIS MUST GO
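The decision rule can be sketched in a few lines. The p-value helper assumes a two-tail Z test, which is only one of the tests the rule applies to; the function names and the test statistics are hypothetical.

```python
from statistics import NormalDist

def two_tail_p_value(z_stat):
    """Two-tail p-value for a Z test statistic (assumed test type)."""
    return 2 * (1 - NormalDist().cdf(abs(z_stat)))

def decide(p_value, alpha=0.05):
    """Reject H0 only when the p-value is less than alpha."""
    return "reject H0" if p_value < alpha else "do not reject H0"

for z_stat in (2.5, 1.0):               # hypothetical test statistics
    p = two_tail_p_value(z_stat)
    print(z_stat, round(p, 4), decide(p))
```

A p-value exactly equal to α falls on the "do not reject" side of the rule.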
Robust test
t test is an example; it does not lose power if the shape of the population departs somewhat from a normal distribution, particularly when the sample size is large enough to enable the test statistic to follow the t distribution.
Summary of the null and alternative hypotheses for one-tail tests
1. The null hypothesis represents the status quo or the current belief in a situation
2. The alternative hypothesis is the opposite of the null hypothesis and represents a research claim or specific inference you would like to prove
3. If you reject the n
pooled-variance t test
Can be used if you assume that the random samples are independently selected from 2 populations and that the populations are normally distributed and have equal variances to determine whether there is a significant difference between the means
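A minimal sketch of the pooled-variance t statistic, assuming the standard formula: the two sample variances are pooled with weights n1 - 1 and n2 - 1, and the statistic has n1 + n2 - 2 degrees of freedom. The function name and the two samples are hypothetical, and the critical value still has to come from a t table or a statistics library.

```python
from math import sqrt
from statistics import mean, variance

def pooled_variance_t(sample1, sample2):
    """Return (t statistic, degrees of freedom) for two independent samples."""
    n1, n2 = len(sample1), len(sample2)
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / df
    t_stat = (mean(sample1) - mean(sample2)) / sqrt(pooled * (1 / n1 + 1 / n2))
    return t_stat, df

group1 = [1, 2, 3, 4, 5]                # hypothetical samples
group2 = [2, 3, 4, 5, 6]
t_stat, df = pooled_variance_t(group1, group2)
print(t_stat, df)
```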
When do you reject the null hypothesis in a two tail test?
if the computed test statistic is greater than the upper-tail critical value from the t distribution or if the computed test statistic is less than the lower tail critical value from the t distribution
Separate-variance t test
Used if you can assume that the 2 independent populations are normally distributed but cannot assume that they have equal variances. In that case you cannot pool the two sample variances into a common estimate and therefore cannot use the pooled-variance t test.
paired t test
Can use if you assume that the difference scores are randomly and independently selected from a population that is normally distributed in order to determine whether there is a significant population mean difference
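A minimal sketch of the paired t statistic, assuming the standard formula: the mean of the difference scores divided by their standard error, with n - 1 degrees of freedom. The function name and the before/after measurements are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(first, second):
    """Return (t statistic, degrees of freedom) for paired measurements."""
    differences = [a - b for a, b in zip(first, second)]
    n = len(differences)
    standard_error = stdev(differences) / sqrt(n)
    return mean(differences) / standard_error, n - 1

before = [10, 12, 9, 11]                # hypothetical paired measurements
after = [8, 11, 7, 10]
t_stat, df = paired_t(before, after)
print(t_stat, df)
```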