Continuous data with a true (absolute) zero point.
Ratio
Ordered data with equal distances between values, but no true zero.
Interval
Place qualitative objects in some kind of order.
Ordinal
Identify, group, or categorize.
Nominal
There are two types of statistics (Analytics)
descriptive and inferential
Descriptive statistics are used to ______
Inform / Explanatory
Inferential statistics are used to ______
Predict / Trend
Name the 4 levels of measurement
NOIR:
Nominal
Ordinal
Interval
Ratio
Levels of Measurement
1. Nominal
2. Ordinal
3. Interval
4. Ratio
Types of Errors
1. Out-of-Range
2. Random Error - No Correlation
3. Omission Error - Distorted Results
4. Systematic Error - Skewed Results
5. Reduce/Minimize Errors
Outliers create this type of error
out of range
Unpredictable error
Random Error - No correlation
Error may occur from missing data.
(Example: Space not filled in)
Omission Error - Distorted results
This error repeats itself
Systematic Error - Skewed results
What is the process of quality control?
Reduce/minimize errors
Types of Studies
- Experimental
- Observational
- Blind
- Treatments
- Double Blind
All variable measurements and manipulations are under the researcher's control
Experimental study
Participants are not told if they are in the treatment group or control group
Blind Study
Used when impractical or impossible to control the conditions of the study
Observational study
The procedure the researcher applies to each subject
Treatments
Neither the treatment allocator nor the participants know who is in the treatment group or control group
Double blind study
Types of Bias
- Information
- Expected Monetary Value (EMV) Analysis
- Outliers (can be included or excluded)
- Measurement
- Conscious
Questions favor an outcome, or the interviewer asks questions that favor an outcome.
Information
The average outcome (payoff) when the future includes scenarios that may or may not happen
Expected Monetary Value (EMV) Analysis
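A minimal sketch of the EMV calculation, assuming a hypothetical project with two scenarios; the probabilities and dollar amounts are invented for illustration. EMV is the probability-weighted sum of the payoffs, with threats entered as negative values.

```python
# Hypothetical scenarios: (probability, payoff in dollars).
# Threats (losses) are negative payoffs; probabilities should sum to 1.
scenarios = [
    (0.6, 50_000),   # opportunity: project succeeds
    (0.4, -20_000),  # threat: project runs over budget
]

def expected_monetary_value(scenarios):
    """Return the sum of probability * payoff over all scenarios."""
    total_prob = sum(p for p, _ in scenarios)
    if abs(total_prob - 1.0) > 1e-9:
        raise ValueError("scenario probabilities must sum to 1")
    return sum(p * payoff for p, payoff in scenarios)

emv = expected_monetary_value(scenarios)
print(f"EMV = ${emv:,.2f}")  # EMV = $22,000.00
```

A positive EMV suggests the opportunities outweigh the threats on average, even though no single scenario actually pays $22,000.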
Observation points that are distant from other observations.
Outliers
Note: Can be included in or excluded from the analysis (outliers cause skewness)
Bias that occurs from not selecting a random sample
Measurement bias
Bias introduced because respondents believe it will be beneficial if selected.
Conscious bias
Descriptive Statistic Measurements
- Median
- Z-Score = (Value-Mean)/Std Deviation
- Variance (Standard Deviation)
- Standard Deviation
- Mean
Middle score for a set of data
Median
Note: Skewness has little effect on the median (it resists extreme values)
Tells us the number of standard deviations a data point is from the mean.
Z-score = (Value - Mean) / Std Deviation
If the average is the same for two groups, what will determine their difference?
Variance (Standard Deviation)
The spread of data in a sample. How far the data points are from the mean.
Standard deviation
Measure of central tendency that is influenced by the size of every value in a data set, including extreme values.
Mean
Note: Skewness does affect the mean
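The measurements above can be sketched with Python's standard `statistics` module. The data set is hypothetical, and the sketch assumes population (not sample) variance and standard deviation; `statistics.variance` and `statistics.stdev` would give the sample versions instead.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data set

mean = statistics.mean(data)           # 5
median = statistics.median(data)       # 4.5 (average of the two middle values)
variance = statistics.pvariance(data)  # population variance: 4
std_dev = statistics.pstdev(data)      # population standard deviation: 2.0

def z_score(value, mean, std_dev):
    """Number of standard deviations `value` lies from the mean."""
    return (value - mean) / std_dev

print(mean, median, variance, std_dev)
print(z_score(9, mean, std_dev))  # 2.0

# One extreme value pulls the mean far more than the median.
skewed = data + [45]
print(statistics.mean(skewed), statistics.median(skewed))  # roughly 9.44 and 5
```

Adding the single value 45 moves the mean from 5 to about 9.44 while the median only moves from 4.5 to 5, which is why the median is preferred for skewed data.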
Percentile Scores
- Quartiles
- IQR: Inter-quartile range
- Box Plot
Each of the four equal groups into which an ordered data set (or population) can be divided
Quartiles
Measures the difference between the third and first quartile
IQR: Inter-quartile range
Note: Data must be ordered from lowest to highest value
Used to study the composition of a data set and examine the distribution
Box Plot
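A sketch of the quartiles, IQR, and the five-number summary behind a box plot, using a hypothetical data set. Two assumptions: `statistics.quantiles` uses its default "exclusive" method, which is only one of several textbook conventions (hand calculations may differ slightly), and outliers are flagged with the common 1.5 × IQR fence rule.

```python
import statistics

data = sorted([9, 3, 40, 7, 13, 5, 11, 8])  # order from lowest to highest

# quantiles(n=4) returns the three cut points Q1, Q2 (median), Q3.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# Common box-plot convention (assumed here): points more than 1.5 * IQR
# beyond the quartiles are drawn as individual outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

# Five-number summary used to draw a box plot
print("min, Q1, median, Q3, max:", data[0], q1, q2, q3, data[-1])
print("IQR:", iqr, "outliers:", outliers)
```

Here Q1 = 5.5 and Q3 = 12.5, so the IQR is 7.0 and the value 40 falls beyond the upper fence of 23.0.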
Rules of Probability
1. The probability of any event must be between 0 and 1, inclusive: 0 ≤ P(E) ≤ 1.
2. The sum of the probabilities of all outcomes must equal 1.
3. If E and F are disjoint events, then P(E or F) = P(E) + P(F). If E and F are not disjoint events, then P(E or F) = P(E) + P(F) - P(E and F).
There are six toll booths to enter the highway. What probability does each toll booth worker have of getting the next customer?
1 customer and 6 booths = 1/6 or 16.7%
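The toll booth answer and the three rules of probability can be checked with exact fractions. The die example for rule 3 is a hypothetical illustration added here; it uses the general addition rule P(E or F) = P(E) + P(F) - P(E and F) for events that overlap.

```python
from fractions import Fraction

# Toll booth example: one next customer, six equally likely booths.
booths = 6
p_each = Fraction(1, booths)
print(p_each, float(p_each))  # 1/6, about 0.167

# Rule 1: every probability lies between 0 and 1.
assert 0 <= p_each <= 1
# Rule 2: probabilities of all outcomes sum to 1.
assert sum([p_each] * booths) == 1

# Rule 3 on a fair six-sided die (hypothetical example).
outcomes = set(range(1, 7))

def prob(event):
    """Classical probability: favorable outcomes / total outcomes."""
    return Fraction(len(event), len(outcomes))

even = {2, 4, 6}
greater_than_4 = {5, 6}  # overlaps `even` at 6, so the events are not disjoint
p_or = prob(even) + prob(greater_than_4) - prob(even & greater_than_4)
assert p_or == prob(even | greater_than_4)  # both give 2/3
print(p_or)
```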
The order in which you pick your sample does not matter
Combination
Picking employees for a shift. Order doesn't matter.
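A small sketch of the shift example, assuming a hypothetical staff of 10 employees with 3 needed per shift. `math.comb` counts combinations (order ignored); `math.perm` is shown only for contrast, since it counts ordered arrangements.

```python
import math

employees = 10   # hypothetical staff size
shift_size = 3

# Combination: order does not matter (a team of A, B, C is one choice)
teams = math.comb(employees, shift_size)    # 10! / (3! * 7!) = 120
# Permutation for contrast: order matters (A, B, C differs from C, B, A)
ordered = math.perm(employees, shift_size)  # 10! / 7! = 720
print(teams, ordered, ordered // math.factorial(shift_size))  # 120 720 120
```

Dividing the permutation count by 3! removes the orderings of each team, which recovers the combination count.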
When given P(A|B) (the probability of A given B), you can use this to find P(B|A)
Bayes' Theorem
You must know P(A), P(B), and P(A|B)
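A minimal sketch of Bayes' Theorem, P(B|A) = P(A|B) × P(B) / P(A), using hypothetical inspection numbers: 2% of parts are defective (B), inspection flags 95% of defective parts (A given B), and 5% of all parts are flagged (A).

```python
def bayes(p_a_given_b, p_b, p_a):
    """Return P(B|A) = P(A|B) * P(B) / P(A)."""
    if not 0 < p_a <= 1:
        raise ValueError("P(A) must be in (0, 1]")
    return p_a_given_b * p_b / p_a

p_b = 0.02          # P(B): part is defective (hypothetical)
p_a_given_b = 0.95  # P(A|B): a defective part fails inspection (hypothetical)
p_a = 0.05          # P(A): any part fails inspection (hypothetical)

p_b_given_a = bayes(p_a_given_b, p_b, p_a)
print(f"P(defective | flagged) = {p_b_given_a:.2f}")  # 0.38
```

Even with a 95% detection rate, only about 38% of flagged parts are actually defective in this example, because defects are rare to begin with.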
Apply this rule when looking for two events occurring (AND)
Multiplication
Use this rule when looking for one or the other event happening (OR)
Addition
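A sketch of the multiplication (AND) and addition (OR) rules with exact fractions. The machine and batch numbers are hypothetical. For dependent events the multiplication rule uses the conditional probability, P(A and B) = P(A) × P(B|A).

```python
from fractions import Fraction

# Independent events: P(A and B) = P(A) * P(B)
p_machine1_down = Fraction(1, 10)  # hypothetical
p_machine2_down = Fraction(1, 5)   # hypothetical
p_both_down = p_machine1_down * p_machine2_down  # 1/50

# Dependent events: P(A and B) = P(A) * P(B | A)
# Drawing 2 items without replacement from a batch of 10 with 3 defective.
p_first_defective = Fraction(3, 10)
p_second_given_first = Fraction(2, 9)  # one defective item already removed
p_two_defective = p_first_defective * p_second_given_first  # 6/90 = 1/15

# Addition rule (OR) for the machines, which are not mutually exclusive
p_either_down = p_machine1_down + p_machine2_down - p_both_down  # 7/25
print(p_both_down, p_two_defective, p_either_down)  # 1/50 1/15 7/25
```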
Statistical Tests
- Linear programming
- Linear regression
- Multiple regression
- Correlation coefficient
- R2 (R-Square)
A technique for minimizing total cost or maximizing profit subject to constraints
Linear programming
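A minimal sketch of the corner-point (graphical) method for a two-variable linear program. The profit function and constraints are a hypothetical textbook-style example. The method relies on the fact that, for a bounded, non-empty feasible region, an optimum occurs at a corner point; problems with more variables would normally use the simplex method or a solver library instead.

```python
from fractions import Fraction
from itertools import combinations

# Each constraint is (a, b, c), meaning a*x + b*y <= c (hypothetical problem)
constraints = [
    (1, 0, 4),    # x <= 4
    (0, 2, 12),   # 2y <= 12
    (3, 2, 18),   # 3x + 2y <= 18
    (-1, 0, 0),   # x >= 0
    (0, -1, 0),   # y >= 0
]

def profit(x, y):
    return 3 * x + 5 * y

def corner_points(cons):
    """Intersect every pair of boundary lines and keep the feasible points."""
    points = []
    for (a1, b1, c1), (a2, b2, c2) in combinations(cons, 2):
        det = a1 * b2 - a2 * b1
        if det == 0:
            continue  # parallel boundary lines never intersect
        # Cramer's rule for a1*x + b1*y = c1 and a2*x + b2*y = c2
        x = Fraction(c1 * b2 - c2 * b1, det)
        y = Fraction(a1 * c2 - a2 * c1, det)
        if all(a * x + b * y <= c for a, b, c in cons):
            points.append((x, y))
    return points

best = max(corner_points(constraints), key=lambda p: profit(*p))
print(best, profit(*best))  # maximum profit of 36 at x = 2, y = 6
```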
A technique using a single independent variable to predict a single dependent variable
Linear regression
A technique using more than one independent variable to predict a single dependent variable
Multiple regression
Measures the strength of a linear relationship
Correlation coefficient
Measures the goodness of fit in a regression analysis
R2 (R-Square)
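Linear regression, the correlation coefficient, and R² can all be computed from the same sums of squares. This is a plain-Python sketch on hypothetical data, using ordinary least squares for the line y = m·x + b.

```python
import math

x = [1, 2, 3, 4, 5]   # independent variable (hypothetical)
y = [2, 4, 5, 4, 5]   # dependent variable (hypothetical)

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
s_xx = sum((xi - mean_x) ** 2 for xi in x)
s_yy = sum((yi - mean_y) ** 2 for yi in y)
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

# Least-squares line y = m*x + b
m = s_xy / s_xx
b = mean_y - m * mean_x

# Pearson correlation coefficient: strength and direction of the linear relationship
r = s_xy / math.sqrt(s_xx * s_yy)

# R-squared: share of the variation in y explained by the line
ss_res = sum((yi - (m * xi + b)) ** 2 for xi, yi in zip(x, y))
r_squared = 1 - ss_res / s_yy

print(f"y = {m:.2f}x + {b:.2f}, r = {r:.3f}, R^2 = {r_squared:.2f}")
```

For simple linear regression with an intercept, R² equals r², so r ≈ 0.775 here corresponds to R² = 0.60.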
Time Series
- Trend
- Irregularity
- Cyclicality
- Seasonality
A simple regression using time as the independent variable
Time series
A general slope upward or downward over a period of time
Trend
Unforeseen circumstances causing random deviations
irregularity
Repeating up-and-down patterns that typically span more than a single year
Cyclicality
Regular pattern within a single year
seasonality
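Since a time series trend is a simple regression with time as the independent variable, it can be sketched as a least-squares line over period numbers. The quarterly sales figures are hypothetical; seasonality, cyclicality, and irregularity would remain in the differences between the actual values and this line.

```python
# Hypothetical quarterly sales; time (period number) is the independent variable.
sales = [10, 12, 13, 15, 16, 18]
t = list(range(1, len(sales) + 1))

n = len(sales)
mean_t = sum(t) / n
mean_s = sum(sales) / n
slope = (sum((ti - mean_t) * (si - mean_s) for ti, si in zip(t, sales))
         / sum((ti - mean_t) ** 2 for ti in t))
intercept = mean_s - slope * mean_t

# Positive slope = upward trend; negative slope = downward trend
direction = "upward" if slope > 0 else "downward" if slope < 0 else "flat"
next_period = n + 1
forecast = intercept + slope * next_period
print(f"trend: {direction}, {slope:.2f} per period; "
      f"forecast for period {next_period}: {forecast:.1f}")
```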
Scatterplots and Correlation
Two most common ways of describing and quantifying association
Distributions and confidence levels
- Cumulative distribution
- Probability Distribution
- Normal Distribution
Represents the probability that a variable falls within a certain range (at or below a given value)
Cumulative distribution
A list of all the different probabilities of each outcome that can occur
Probability Distribution
Z-score for 99% level of confidence
2.576
Z-score for 95% level of confidence
1.960
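The critical z-scores above come from the inverse of the standard normal cumulative distribution. A short sketch with `statistics.NormalDist`, assuming two-sided confidence intervals (half of the leftover probability in each tail):

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def critical_z(confidence):
    """Two-sided critical z: leaves (1 - confidence) / 2 in each tail."""
    alpha = 1 - confidence
    return standard_normal.inv_cdf(1 - alpha / 2)

for level in (0.95, 0.99):
    print(f"{level:.0%}: z = {critical_z(level):.3f}")  # 1.960 and 2.576

# Cumulative distribution: P(Z <= 1.96) is about 0.975
print(round(standard_normal.cdf(1.96), 3))
```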
Measures of central tendency are approximately equal (Mean and Median)
Normal Distribution
Analysis Techniques
- ANOVA
- F-value
(must be higher than critical value to reject the null)
- T-value
(absolute value must be higher than the critical value to reject the null)
- Zero
- 1 or -1
Used to compare the mean of three or more groups
ANOVA
ANOVA uses this test statistic
F-Value (must be higher than the critical value to reject the null)
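A plain-Python sketch of the one-way ANOVA F-value on three hypothetical groups: F is the between-group mean square divided by the within-group mean square. The result would then be compared with a critical F from a table (roughly 5.14 for 2 and 6 degrees of freedom at α = 0.05).

```python
# Hypothetical measurements from three groups
groups = [
    [4, 5, 6],
    [6, 7, 8],
    [9, 10, 11],
]

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    k, n = len(groups), len(all_values)

    # Between-group variation: how far each group mean is from the grand mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group variation: spread of values around their own group mean
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)

    df_between, df_within = k - 1, n - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within

f_value, df1, df2 = one_way_anova_f(groups)
print(f"F({df1}, {df2}) = {f_value:.2f}")  # F(2, 6) = 19.00
```

Because 19.00 exceeds the critical value, this example would reject the null hypothesis that all three group means are equal.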
T-test uses this test statistic
T-Value (absolute value must be higher than the critical value to reject the null)
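A sketch of a two-sample t-value, assuming the pooled (equal-variance) form of the test and hypothetical samples. The absolute value of t is compared with the critical value; for 8 degrees of freedom at α = 0.05 (two-tailed) a t table gives about 2.306.

```python
import math

# Hypothetical samples from two groups
a = [5, 6, 7, 8, 9]
b = [8, 9, 10, 11, 12]

def sample_variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def pooled_t(a, b):
    """Two-sample t statistic assuming equal variances (pooled)."""
    na, nb = len(a), len(b)
    pooled_var = (((na - 1) * sample_variance(a) + (nb - 1) * sample_variance(b))
                  / (na + nb - 2))
    standard_error = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (sum(a) / na - sum(b) / nb) / standard_error, na + nb - 2

t_value, df = pooled_t(a, b)
print(f"t({df}) = {t_value:.2f}")  # t(8) = -3.00
```

Here |t| = 3.00 is larger than 2.306, so the difference in means would be judged significant at the 5% level.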
A correlation is weak if the coefficient is close to ____
Zero
A correlation is strong if the coefficient is close to ____
1 or -1
Seven Basic Quality Tools
These seven tools are used in quality planning and quality control. Their keywords:
- Run Chart (Over time)
- Control Chart (In limits)
- Cause and Effect (Process identification [why])
- Flow Chart (Process identification [Where])
- Check Sheet (Co
Illustrates performance measurements over a period of time
Run Chart
Illustrates limits or constraints a process should not exceed
Control Chart
Assists in brainstorming issues that are causing a problem
Cause and Effect Diagram
Not measurements!
Visual tool to understand a process
flowchart
Easy tool to collect data to create other charts
check sheet
Graphical display of a data set with one bar for each category
Histogram and Pareto
Graphical display of a data set's distribution, showing where the data are centered
histogram
Graphical display of a data set with categories sorted from highest to lowest frequency
Pareto
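A sketch linking a check sheet to a Pareto chart: tally hypothetical defect records with `collections.Counter`, then sort categories from highest to lowest count and add a cumulative percentage (the line usually drawn over the bars).

```python
from collections import Counter

# Check sheet: hypothetical tally of defect types recorded during inspection
defects = ["scratch", "dent", "scratch", "crack", "scratch",
           "dent", "scratch", "misalign", "scratch", "dent"]
tally = Counter(defects)

# Pareto: categories sorted from highest to lowest count, with cumulative %
total = sum(tally.values())
running = 0
pareto = []
for category, count in tally.most_common():
    running += count
    pareto.append((category, count, 100 * running / total))

for category, count, cumulative_pct in pareto:
    print(f"{category:<10}{count:>3}{cumulative_pct:>7.1f}%")
```

In this example the top two categories account for 80% of all recorded defects, which is the kind of concentration a Pareto chart is meant to expose.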
Used for potential relationships and correlation between variables
Scatter diagram
Can the seven quality tools be used independently?
Yes
What percent of quality problems does Ishikawa claim the seven tools can solve?
90% - 95%
Diagram demonstrating all of the elements that can influence a process before it starts.
SIPOC (Supplier - Input - Process - Output - Customer)
Manufacturing approach to improving processes.
Six Sigma
In manufacturing, statistics is used for:
quality control
Plan - Do - Study - Act
Which step is a response to analytical results?
ACT
PDSA
Plan, Do, Study, Act
Shows whether a result meets a requirement or not
attribute
Shows how well a result meets the requirement
variable
Variations accepted as the normal part of the process
Ex:
On a control chart, the data points stay within the allowed range (control limits).
Common cause variation
Variation from an abnormality causing large discrepancy in results
Ex:
Outside of the allowed range on a chart.
Special cause variation
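A simplified control-chart sketch for separating common cause from special cause variation. It assumes hypothetical baseline data and conventional 3-sigma limits computed from the baseline's population standard deviation; production charts (for example X-bar and R, or individuals charts) often estimate sigma differently, such as from ranges.

```python
import statistics

# Hypothetical baseline measurements from a stable process
baseline = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 10.2, 9.8]
center = statistics.mean(baseline)
sigma = statistics.pstdev(baseline)

# Conventional 3-sigma control limits
ucl = center + 3 * sigma  # upper control limit
lcl = center - 3 * sigma  # lower control limit

new_points = [10.1, 9.9, 10.8, 10.0]
for x in new_points:
    kind = "common cause" if lcl <= x <= ucl else "special cause"
    print(f"{x:5.1f}: {kind}")

special = [x for x in new_points if not lcl <= x <= ucl]
```

With limits of about 9.55 and 10.45, only the reading 10.8 falls outside the allowed range and would be investigated as special cause variation.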
Model of designing, analyzing, and scoring tests
IRT: Item Response Theory
How does government cost-benefit analysis differ from private-sector cost-benefit analysis?
Government benefits aren't always monetary; they could be flood prevention or public welfare.
Compares one individual's performance to other individuals
Norm Referenced
Compares an individual's performance to a standard score (Example: Cut Score 64%)
Criterion referenced
Management strategy that uses results as the central measurement of performance
RBM: Results Based Management
Big Data & Health Care Performance Measures
- Very large data sets
- Prevalence
- Incidence
- Criterion referenced
- Cost-benefit analysis
What is Big Data?
Very large data sets
Used to count ALL of the existing cases of a disease.
Prevalence
Used to count only the NEW cases of a disease.
Incidence (Incident rate)
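A sketch contrasting prevalence and incidence, using hypothetical community counts. It assumes no recoveries or deaths during the year, and follows the usual convention of dividing new cases by the population at risk (people who did not already have the disease).

```python
# Hypothetical community data for one year
population = 10_200
existing_cases_at_start = 200  # people who already have the disease on Jan 1
new_cases_during_year = 50     # people diagnosed for the first time during the year

# Prevalence counts ALL existing cases at a point in time (here: year end,
# assuming no recoveries or deaths during the year).
cases_at_year_end = existing_cases_at_start + new_cases_during_year
prevalence_per_1000 = cases_at_year_end / population * 1000

# Incidence counts only NEW cases, divided by the population at risk.
population_at_risk = population - existing_cases_at_start
incidence_per_1000 = new_cases_during_year / population_at_risk * 1000

print(f"prevalence: {prevalence_per_1000:.1f} per 1,000")
print(f"incidence:  {incidence_per_1000:.1f} per 1,000 per year")
```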
Compares an individual's performance to a standard score (Example: Cut Score 64%)
Criterion referenced
Used to analyze whether a project's outcome is worth its funding
Cost-benefit analysis
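A minimal cost-benefit sketch for a hypothetical project with one up-front cost and equal yearly benefits. The 5% discount rate is an assumption used to convert future benefits into today's dollars; the decision rule shown (benefit-cost ratio above 1) is one common convention.

```python
# Hypothetical project: one up-front cost, equal benefits each year.
cost = 100_000
annual_benefit = 30_000
years = 5
discount_rate = 0.05  # assumed; converts future benefits to today's dollars

# Present value of each year's benefit: benefit / (1 + r)^year
pv_benefits = sum(annual_benefit / (1 + discount_rate) ** year
                  for year in range(1, years + 1))
net_benefit = pv_benefits - cost
benefit_cost_ratio = pv_benefits / cost

print(f"PV of benefits: ${pv_benefits:,.0f}")
print(f"Net benefit: ${net_benefit:,.0f}, benefit-cost ratio: {benefit_cost_ratio:.2f}")
# A ratio above 1 (net benefit above 0) suggests the benefits outweigh the cost.
```

For a government project, the benefit figures might first have to be estimated in dollar terms (for example, flood damage avoided) before a calculation like this applies.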
Performance Measures
- KPI - Key performance indicator
- Balanced Scorecard
Performance measure for one specific goal
KPI - Key performance indicator
Multiple KPIs are displayed for the big picture
KPI dashboard
More than one chart is needed
What does a balanced scorecard measure?
CLIF - (customer, learning, internal process, financial performance) Are we meeting the strategy?
Advantage or disadvantage of a balanced scorecard? Requires time and effort to establish a meaningful scorecard
Disadvantage
Advantage or disadvantage of a balanced scorecard? Improves internal and external communication
Advantage
Balanced Scorecard: Difficult to maintain momentum
Disadvantage
Balanced Scorecard: Improves organizational alignment
Advantage
Balanced Scorecard: Links strategy to organizational results
Advantage
KPI: Data driven results make it easier to quantify performance
Advantage
KPI: Difficult to change once set up
Disadvantage
Homoscedasticity
A regression in which the variances in y for the values of x are equal or close to equal.
Variance is consistent.
Heteroscedasticity
A regression in which the variances in y for the values of x are not equal.
Variance is inconsistent.
Regression Analysis
A method of predicting sales based on finding a relationship between past sales and one or more independent variables, such as population or income.
Y = MX + B
Y = Dependent variable
M = Slope
X = Independent variable
B = Y-intercept
Multiple Regression: Autocorrelation
In a longitudinal design, the correlation of one variable with itself, measured at two different times.
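A sketch of lag-1 autocorrelation: the Pearson correlation between a series and the same series shifted by one period. The two series are hypothetical; in regression work the same check is usually applied to the model's residuals over time.

```python
import math

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def lag1_autocorrelation(series):
    """Correlate each value with the value one period later."""
    return pearson(series[:-1], series[1:])

steady_growth = [1, 2, 3, 4, 5, 6]  # each value follows the previous one upward
alternating = [1, 3, 1, 3, 1, 3]    # each high value is followed by a low one

print(round(lag1_autocorrelation(steady_growth), 3))  # 1.0
print(round(lag1_autocorrelation(alternating), 3))    # -1.0
```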
Cluster Analysis
A technique used to divide a data set into mutually exclusive groups such that the members of each group are as close together as possible to one another and the different groups are as far apart as possible.
Use in marketing to target customer gr