Business Statistics Test #1

What are descriptive statistics?

Collecting, summarizing, visualizing, presenting, and analyzing data.
- Sample mean, standard deviation, and proportion
- Graphs and tables

What are Inferential Statistics?

Using data collected from a small group to draw conclusions about a larger group.
- Confidence intervals

Big Data

a term that describes the large volume of data - both structured and unstructured - that inundates a business on a day-to-day basis

What are Categorical variables?

Categorical (qualitative) variables take categories as their values, such as "yes", "no", or "blue", "brown", "green".

What are numerical variables?

Numerical (quantitative) variables have values that represent a counted or measured quantity.

What are discrete variables?

variables that arise from a counting process

What are continuous variables?

variables that arise from a measuring process

What is a population?

All the items or individuals about which you want to draw a conclusion. The population is the "large group."

What is a sample?

The portion of a population selected for analysis. The sample is the "small group."

What is a parameter?

A numerical measure that describes a characteristic of a population.

What is a statistic?

A numerical measure that describes a characteristic of a sample.

What is the symbol for population parameter?

μ (Population Mean)

What is the symbol for sample statistics?

x̄, read "x-bar" (Sample Mean)

What are the characteristics of the mean?

the sum of values divided by the number of values. Affected by extreme values (outliers)

What are the characteristics of the median?

the "middle" number (50% above, 50% below) of a rank-ordered data set; the 50th percentile. Less sensitive to extreme values
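
As a rough illustration of how one extreme value pulls the mean but barely moves the median, here is a minimal Python sketch. The salary figures are made up for the example and are not from the course material.

```python
from statistics import mean, median

# Hypothetical salaries in thousands of dollars (illustrative values only).
salaries = [40, 42, 45, 47, 50]
with_outlier = salaries + [500]  # add one extreme value

# Without the outlier the mean and median are close together.
print(mean(salaries), median(salaries))

# With the outlier the mean jumps, while the median shifts only slightly.
print(mean(with_outlier), median(with_outlier))
```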

What is an outlier?

an extreme value located far away from the mean

Variance

Average (approximately) of squared deviations of values from the mean. A necessary measure of variation for computational purposes, but not practical since the measurement is in units squared

Standard deviation

Most commonly used measure of variation. Shows variation about the mean. Is the square root of the variance. More practical measurement of variation since it is in the same units as the original data
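
The two definitions above can be sketched in a few lines of Python. This assumes the sample versions of both measures, which divide by n - 1 rather than n (one reading of the "approximately" in the variance card). The data values are hypothetical.

```python
from statistics import mean, stdev, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements
x_bar = mean(data)

# Sample variance by hand: sum of squared deviations divided by n - 1.
s2_manual = sum((x - x_bar) ** 2 for x in data) / (len(data) - 1)

s2 = variance(data)  # sample variance, in squared units
s = stdev(data)      # sample standard deviation, in the original units

print(s2_manual, s2, s)
```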

How are the variance, standard deviation, and coefficient of variation used to determine the degree of variability, dispersion, or consistency?

The more the data are spread out, the greater the range, variance, and standard deviation. The more the data are concentrated, the smaller the range, variance, and standard deviation. If the values are all the same (no variation), all these measures will be zero. These measures are NEVER negative.

What is Coefficient of Variation and how is it interpreted?

Measures relative variation. Always in percentage (%). Shows variation relative to mean. Can be used to compare the variability of two or more sets of data measured in different units
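
A minimal sketch of the calculation, assuming the usual definition CV = (standard deviation / mean) × 100%. The function name and both data sets are hypothetical. The example compares heights in centimeters with weights in kilograms, which the standard deviation alone cannot do.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Return the sample standard deviation as a percentage of the mean."""
    return stdev(values) / mean(values) * 100

heights_cm = [160, 165, 170, 175, 180]  # hypothetical data
weights_kg = [55, 60, 70, 80, 95]       # hypothetical data

cv_heights = coefficient_of_variation(heights_cm)
cv_weights = coefficient_of_variation(weights_kg)

# The set with the larger CV varies more relative to its own mean.
print(round(cv_heights, 2), round(cv_weights, 2))
```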

What are the characteristics of a Z-Score and how is it interpreted to determine extreme outliers?

The Z-score is the number of standard deviations a data value is from the mean. A data value is considered an extreme outlier if its Z-score is less than -3.0 or greater than +3.0. The larger the absolute value of the Z-score, the farther the data value is from the mean.
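
The rule above can be sketched as follows, assuming the sample mean and sample standard deviation are used in the Z-score. The function names and data are hypothetical.

```python
from statistics import mean, stdev

def z_scores(values):
    """Return the Z-score of each value: (value - mean) / standard deviation."""
    x_bar = mean(values)
    s = stdev(values)
    return [(x - x_bar) / s for x in values]

def extreme_outliers(values, cutoff=3.0):
    """Return the values whose Z-score is below -cutoff or above +cutoff."""
    return [x for x, z in zip(values, z_scores(values)) if abs(z) > cutoff]

# Twenty hypothetical values near 50 plus one value far from the rest.
data = [48, 49, 50, 51, 52] * 4 + [120]
print(extreme_outliers(data))
```

Note that in a very small sample no value can reach a Z-score of 3, because the extreme value itself inflates the standard deviation.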

How do you identify a left-skewed distribution shape?

Mean < Median, negative skewness

How do you identify a symmetric distribution shape?

Mean = Median, zero skewness

How do you identify a right-skewed distribution shape?

Median < Mean, positive skewness

How do you interpret the 1st, 2nd, and 3rd quartiles?

1st: the 25th percentile; value for which 25% of the observations are smaller and 75% are larger
2nd: the median; the 50th percentile (50% of the observations are smaller and 50% are larger)
3rd: the 75th percentile; value for which 75% of the observations are smaller and 25% are larger

What 5 statistics make up the 5 number summary?

Minimum
First Quartile (Q1)
Median (Q2)
Third Quartile (Q3)
Maximum
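
A small Python sketch of the five-number summary follows. Quartile conventions differ between textbooks and software; this example assumes the "inclusive" method of `statistics.quantiles`, so results may differ slightly from hand calculations done with another rule. The function name and data are hypothetical.

```python
from statistics import quantiles

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)

data = [9, 1, 5, 3, 7, 2, 8, 4, 6]  # hypothetical data, unsorted on purpose
minimum, q1, q2, q3, maximum = five_number_summary(data)

# The interquartile range is the spread of the middle 50% of the data.
iqr = q3 - q1
print(minimum, q1, q2, q3, maximum, iqr)
```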

What are the characteristics of quartiles and the interquartile range?

Quartiles split the ranked data into 4 segments with an equal number of values per segment. The IQR measures the spread in the middle 50% of the data. The IQR is also called the midspread because it covers the middle 50% of the data.

Where are the statistics of the 5 number summary graphed on a Boxplot?

Smallest - 25% - Q1
Q1 - 25% - Median
Median - 25% - Q3
Q3 - 25% - Largest

What are the 3 points of the Empirical rule?

Approximately 68% of the data in a bell-shaped distribution lies within one standard deviation of the mean, or μ ± 1σ
Approximately 95% of the data in a bell-shaped distribution lies within two standard deviations of the mean, or μ ± 2σ
Approximately 99.7% of the data in a bell-shaped distribution lies within three standard deviations of the mean, or μ ± 3σ
(It applies ONLY to the Normal Distribution.)
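
The three percentages can be checked against the normal distribution directly. This is a minimal sketch using Python's `statistics.NormalDist`; the helper name `share_within` is made up for the example.

```python
from statistics import NormalDist

def share_within(k, mu=0.0, sigma=1.0):
    """Return the share of a normal distribution within k standard deviations of the mean."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

for k in (1, 2, 3):
    # Prints the share as a percentage for mu ± 1σ, 2σ, and 3σ.
    print(k, round(share_within(k) * 100, 1))
```

The result does not depend on the particular mean and standard deviation chosen, which is why the rule can be stated in terms of μ and σ alone.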

What is meant by "statistical inference"?

obtaining information about a population from information contained in a sample

What is the difference between a parameter and a statistic?

a parameter is a numerical characteristic of a population and a statistic is a numerical characteristic of a sample

What are the 4 Types of Survey Errors and what is an example of each?

1. Coverage error or selection bias
Exists if some groups are excluded from the frame and have no chance of being selected.
Example: A researcher who obtains a sample by calling only households with landline phones would exclude any household that does not have a landline phone. This would create coverage error.
2. Nonresponse error or bias
People who do not respond may be different from those who do respond.
3. Sampling error
Variation from sample to sample will always exist; cannot be avoided.
4. Measurement error
Due to weaknesses in question design, respondent error, and interviewer's effects on the respondent.

What is the difference between Validity and Reliability of samples?

Validity: Are assessments accurate? Do they measure what they are supposed to measure?
Reliability: Are assessments consistent? (over time, across population, across respondents, etc.)

What is the Central Limit Theorem?

The sampling distribution of the sample mean, (x-bar), approaches a normal distribution as the sample size, (n), increases, regardless of the population from which it came.

What is considered a large sample in order for the Central Limit Theorem to apply?

For most distributions, n ≥ 30 will give a sampling distribution that is nearly normal
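
A small simulation can make the theorem concrete. The sketch below assumes a right-skewed population (exponential with mean 1 and standard deviation 1) chosen only for illustration, and compares sample means for n = 2 and n = 30. The function name, seed, and number of trials are arbitrary choices.

```python
import random
from statistics import mean, stdev

random.seed(0)  # fixed seed so that repeated runs give the same numbers

def sample_means(n, trials=2000):
    """Draw `trials` samples of size n from an exponential population
    with mean 1 and return the mean of each sample."""
    return [mean(random.expovariate(1.0) for _ in range(n)) for _ in range(trials)]

for n in (2, 30):
    means = sample_means(n)
    # The center stays near the population mean of 1, while the spread
    # shrinks toward sigma / sqrt(n) as n grows.
    print(n, round(mean(means), 3), round(stdev(means), 3))
```

A histogram of the n = 30 means would look far closer to a bell shape than the population itself does.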

What does extrapolating from small samples tell us?

Sample means converge to the population mean as the sample size increases. Thus, you will see more extreme values in small samples.

What are the characteristics of the Normal Distribution?

Bell Shaped
Symmetrical
Mean, Median and Mode are Equal
Location is determined by the mean, μ
Spread is determined by the standard deviation, σ

What are the characteristics of the STANDARD Normal Distribution (z)?

Also known as the "Z" distribution
Mean is equal to 0
Standard Deviation is equal to 1

What is the margin of error (or the sampling error) in interval estimation?

a statistic expressing the amount of random sampling error in the results of a survey. The larger the margin of error, the less confidence one should have that a poll result would reflect the result of a survey of the entire population. The margin of error, or sampling error, is the value that is added to and subtracted from the sample statistic to create the interval
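
As a worked sketch, the interval for a mean can be built as sample mean ± margin of error. The example assumes the population standard deviation σ is known, so that a Z value applies, with margin = z × σ / √n. The function name and input numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def mean_confidence_interval(x_bar, sigma, n, confidence=0.95):
    """Return (lower, upper, margin) for a Z-based interval around a sample mean."""
    alpha = 1 - confidence                   # total area in both tails
    z = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha / 2 per tail
    margin = z * sigma / sqrt(n)             # margin of error
    return x_bar - margin, x_bar + margin, margin

# Hypothetical study: sample mean 50, sigma 10, sample size 100.
lower, upper, margin = mean_confidence_interval(x_bar=50, sigma=10, n=100)
print(round(lower, 2), round(upper, 2), round(margin, 2))
```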

What is "alpha" (α) in confidence interval estimation?

the level of significance. In confidence interval estimation, alpha (α) will be equal to the sum total of the area in both tails of a normal probability distribution curve

What is meant by the "level of significance" of your study in confidence interval estimation?

that you are ---% certain that the true value of your population mean lies between --- and --- values

What is the most common confidence level used by national polling companies and statisticians?

95%

When can you use z vs. t when creating confidence intervals?

If the sample sizes are large, that is, both n1 and n2 are greater than 30, use the z-table. If either sample size is less than 30, use the t-table.

What is the difference between "accuracy" and "precision" in interval estimation?

ACCURACY: How confident? A higher confidence level will produce a more accurate interval estimate than a lower confidence level.
PRECISION: How close? "Exactness" or "specificity". A higher confidence level will produce a less precise interval estimate than a lower confidence level. The interval becomes wider.

What happens to the width of the interval and the precision level of a confidence interval as the confidence level changes, the sample size changes, or the value for the standard deviation changes?

As the confidence level is raised, the estimate becomes more accurate, but the interval becomes wider and less precise.
As the sample size increases, the accuracy level is not affected, but the interval becomes narrower and more precise.
As the observations of the sample become more varied, the standard deviation increases and the interval becomes wider. Therefore, the interval will be less precise.
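
The three effects can be seen by changing one input at a time. This sketch assumes a Z-based interval for simplicity, with width = 2 × z × s / √n. A t-based interval behaves the same way in direction. The function name and baseline numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def interval_width(s, n, confidence):
    """Return the full width of a Z-based confidence interval for a mean."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return 2 * z * s / sqrt(n)

base = interval_width(s=10, n=100, confidence=0.95)
higher_confidence = interval_width(s=10, n=100, confidence=0.99)  # wider
larger_sample = interval_width(s=10, n=400, confidence=0.95)      # narrower
more_varied = interval_width(s=20, n=100, confidence=0.95)        # wider

print(round(base, 2), round(higher_confidence, 2),
      round(larger_sample, 2), round(more_varied, 2))
```

Quadrupling the sample size halves the width, because the width shrinks with the square root of n.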

How do you identify Z-scores for confidence intervals at abnormal confidence levels?

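Z = (X - X̄) / S gives the Z-score of a data value. For a confidence interval, the critical Z value is the Z-score that leaves an area of α/2 in the upper tail, where α = 1 - confidence level.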
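
For a confidence level that is not printed in a standard table, one possible approach is to compute the critical value from the inverse of the standard normal cumulative distribution. This is a sketch of that lookup and is not necessarily the method the course expects. The function name `critical_z` is made up for the example.

```python
from statistics import NormalDist

def critical_z(confidence):
    """Return the Z value that leaves alpha / 2 in each tail of the standard normal curve."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

# Three common levels plus one less common level (97%).
for level in (0.90, 0.95, 0.99, 0.97):
    print(level, round(critical_z(level), 3))
```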

What are the characteristics of the t-distribution?

The mean of the distribution is equal to 0. The variance is equal to v / (v - 2), where v is the degrees of freedom and v > 2. ... With infinite degrees of freedom, the t distribution is the same as the standard normal distribution.

What is a pilot sample or a pilot study?

a preliminary sample taken to calculate a sample standard deviation, s, to be used as the planning value for s

When determining sample sizes for studies of the proportion, how would you determine the MOST CONSERVATIVE estimate for the sample size?

the most CONSERVATIVE estimate for the sample size can be determined by using 0.5 as an estimate for π
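
The reason 0.5 is the most conservative choice is that π(1 - π) is largest at π = 0.5. A minimal sketch follows, assuming the common formula n = z² × π(1 - π) / e², where e is the desired margin of error, rounded up to a whole number. The function name and the 3% margin are hypothetical.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_proportion(margin, confidence=0.95, pi=0.5):
    """Return the sample size needed to estimate a proportion within ±margin."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(z ** 2 * pi * (1 - pi) / margin ** 2)

# With no prior estimate of the proportion, use pi = 0.5.
print(sample_size_for_proportion(0.03))

# Any other planning value for pi gives a smaller required sample.
print(sample_size_for_proportion(0.03, pi=0.3))
```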