C207 Study PP

Continuous data with a unique (true) zero point.

Ratio

Orders data with equal distances between values (no true zero point).

Interval

Place qualitative objects in some kind of order.

Ordinal

Identify, group, or categorize.

Nominal

There are two types of statistics (Analytics)

descriptive and inferential

Descriptive statistics are used to ______

Inform / Explanatory

Inferential statistics are used to ______

Predict / Trend

Name the 4 levels of measurement

NOIR:
Nominal
Ordinal
Interval
Ratio

Levels of Measurement

1. Nominal
2. Ordinal
3. Interval
4. Ratio

Types of Errors

1. Out-of-Range
2. Random Error - No Correlation
3. Omission Error - Distorted Results
4. Systematic Error - Skewed Results
5. Reduce/Minimize Errors

Outliers create this type of error

Out-of-Range

Unpredictable error

Random Error - No correlation

Error may occur from missing data.
(Example: Space not filled in)

Omission Error - Distorted results

This error repeats itself

Systematic Error - Skewed results

What is the process of quality control?

Reduce/ minimize errors

Types of Studies

- Experimental
- Observational
- Blind
- Treatments
- Double Blind

All variable measurements and manipulations are under the researcher's control

Experimental study

Participants are not told if they are in the treatment group or control group

Blind Study

Used when impractical or impossible to control the conditions of the study

Observational study

The procedure the researcher applies to each subject

Treatments

Neither the treatment allocator nor the participants know who is in the treatment group or control group

Double blind study

Types of Bias

- Information
- Expected Monetary Value (EMV) Analysis
- Outliers (can be included or excluded)
- Measurement
- Conscious

The questions favor an outcome, or the interviewer asks questions in a way that favors an outcome.

Information

The average outcome (payoff) when the future includes scenarios that may or may not happen

Expected Monetary Value (EMV) Analysis
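A minimal sketch of the EMV calculation: multiply each scenario's payoff by its probability and add the results. The scenario probabilities and payoffs below are hypothetical, chosen only for illustration.

```python
# Hypothetical scenarios: (probability, payoff in dollars). Not from the course material.
scenarios = [(0.5, 100_000), (0.3, 20_000), (0.2, -50_000)]


def expected_monetary_value(scenarios):
    """Return the sum of probability * payoff over all scenarios."""
    total_probability = sum(p for p, _ in scenarios)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("scenario probabilities must sum to 1")
    return sum(p * payoff for p, payoff in scenarios)


emv = expected_monetary_value(scenarios)
print(round(emv, 2))  # 50,000 + 6,000 - 10,000
```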

Observation points that are distant from other observations.

Outliers
Note: Can be included or excluded in analysis (causes skewness)

Bias that occurs from not selecting a random sample

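Measurement bias
Note: Bias from a non-random sample is more commonly called selection (sampling) bias; measurement bias usually refers to systematic error in how data are collected or recorded.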

Bias introduced because respondents believe that being selected will benefit them.

Conscious bias

Descriptive Statistic Measurements

- Median
- Z-Score = (Value-Mean)/Std Deviation
- Variance (Standard Deviation)
- Standard Deviation
- Mean

Middle score for a set of data

Median
Note: Skewness does not affect the median

Tells us the number of standard deviations a data point is from the mean.

Z-score = (Value - Mean) / Std Deviation
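The formula can be sketched in Python as follows. The data set is hypothetical, and the population standard deviation (`statistics.pstdev`) is an assumption; use `statistics.stdev` instead if the data are a sample.

```python
import statistics


def z_score(value, data):
    """Number of standard deviations that `value` lies from the mean of `data`."""
    mean = statistics.mean(data)
    std_dev = statistics.pstdev(data)  # population standard deviation (assumption)
    return (value - mean) / std_dev


scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data: mean 5, std deviation 2
z = z_score(9, scores)
print(z)  # (9 - 5) / 2
```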

If the average is the same for two groups, what will determine their difference?

Variance (Standard Deviation)

The spread of data in a sample. How far the data points are from the mean.

Standard deviation

Measure of central tendency that is influenced by the size of the values in a dataset.

Mean
Note: Skewness does affect the mean
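A small illustration of why skewness affects the mean but barely moves the median. The salary figures (in thousands) are hypothetical.

```python
import statistics

salaries = [40, 42, 45, 47, 50]   # hypothetical values
with_outlier = salaries + [500]   # add one extreme value

mean_before = statistics.mean(salaries)
median_before = statistics.median(salaries)
mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)

# The mean jumps sharply; the median shifts by only one unit.
print(mean_before, median_before)
print(round(mean_after, 2), median_after)
```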

Percentile Scores

- Quartiles
- IQR: Inter-quartile range
- Box Plot

Each of the four equal groups into which a population can be divided

Quartiles

Measures the difference between the third and first quartiles (Q3 - Q1)

IQR: Inter-quartile range
Note: Data must be ordered from lowest to highest value
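A minimal sketch of computing quartiles and the IQR with Python's standard library. The data are hypothetical, and the `inclusive` method is an assumption: different quartile conventions can give slightly different cut points.

```python
import statistics

data = [7, 1, 9, 3, 5, 2, 8, 4, 6]  # hypothetical, unordered on purpose

# statistics.quantiles sorts the data internally; n=4 returns the three quartile cut points.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

print(q1, q2, q3, iqr)
```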

Used to study the composition of a data set and examine the distribution

Box Plot

Rules of Probability

1. The probability of any event must be between 0 and 1, inclusive. 0 ≤ P(E) ≤ 1.
2. The sum of the probabilities of all outcomes must equal 1.
3. If E and F are disjoint events, then P(E or F) = P(E) + P(F). If E and F are not disjoint events, then P(E o

There are six toll booths to enter the highway. What probability does each toll booth worker have of getting the next customer?

1 customer and 6 booths = 1/6 or 16.7%

The order in which you pick your sample does not matter

Combination
Picking employees for a shift. Order doesn't matter.
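As an illustration, the number of ways to pick employees for a shift can be counted with `math.comb`. The staff size and shift size are hypothetical; `math.perm` is shown only to contrast the case where order does matter.

```python
import math

staff = 10      # hypothetical number of employees
shift_size = 3  # hypothetical number needed for the shift

combinations = math.comb(staff, shift_size)  # order does not matter
permutations = math.perm(staff, shift_size)  # order matters

print(combinations, permutations)  # 120 720
```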

When given P(A given B), you can use this to find P(B given A)

Bayes' Theorem
You must know P(A), P(B), and P(A given B)
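Bayes' theorem can be written as P(B given A) = P(A given B) × P(B) / P(A). A minimal sketch follows; the probabilities are hypothetical.

```python
def bayes(p_a_given_b, p_b, p_a):
    """Return P(B given A) from P(A given B), P(B), and P(A)."""
    if p_a == 0:
        raise ValueError("P(A) must be greater than 0")
    return p_a_given_b * p_b / p_a


# Hypothetical inputs, for illustration only.
p_b_given_a = bayes(p_a_given_b=0.3, p_b=0.2, p_a=0.1)
print(round(p_b_given_a, 3))  # 0.3 * 0.2 / 0.1
```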

Apply this rule when looking for two events occurring (AND)

Multiplication

Use this rule when looking for one or the other event happening (OR)

Addition
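A short sketch of both rules using exact fractions. The examples (a standard 52-card deck and two fair coin flips) are assumptions chosen for illustration. For events that are not disjoint, the overlap is subtracted once: P(E or F) = P(E) + P(F) - P(E and F). The simple product form of the multiplication rule assumes the events are independent.

```python
from fractions import Fraction

# Addition rule (OR): drawing a heart or a king from a 52-card deck.
p_heart = Fraction(13, 52)
p_king = Fraction(4, 52)
p_king_of_hearts = Fraction(1, 52)  # the overlap, counted in both events above
p_heart_or_king = p_heart + p_king - p_king_of_hearts

# Multiplication rule (AND): two independent fair coin flips both land heads.
p_heads = Fraction(1, 2)
p_two_heads = p_heads * p_heads

print(p_heart_or_king, p_two_heads)  # 4/13 1/4
```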

Statistical Tests

- Linear programming
- Linear regression
- Multiple regression
- Correlation coefficient
- R² (R-squared)

A technique for minimizing total cost or maximizing profit subject to constraints

Linear programming

A technique using a single independent variable to predict a single dependent variable

Linear regression

A technique using more than one independent variable to predict a single dependent variable

Multiple regression

Measures the strength of a linear relationship

Correlation coefficient

Measures the goodness of fit in a regression analysis

R² (R-squared)
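To tie linear regression, the correlation coefficient, and R-squared together, here is a minimal least-squares sketch written without external libraries. The data points are hypothetical. In simple (one-variable) linear regression, R-squared equals the square of the correlation coefficient.

```python
import math


def fit_line(x, y):
    """Least-squares fit of y = slope * x + intercept; also returns the correlation r."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


x = [1, 2, 3, 4, 5]  # hypothetical independent variable
y = [2, 4, 5, 4, 5]  # hypothetical dependent variable

slope, intercept, r = fit_line(x, y)
r_squared = r ** 2  # goodness of fit for the simple regression

print(round(slope, 3), round(intercept, 3), round(r, 3), round(r_squared, 3))
```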

Time Series

- Trend
- Irregularity
- Cyclicality
- Seasonality

A simple regression using time as the independent variable

Time series

A general slope upward or downward over a period of time

Trend

Unforeseen circumstances causing random deviations

Irregularity

Repetition in up and down patterns

Cyclicality

Regular pattern within a single year

Seasonality

Scatterplots and Correlation

Two most common ways of describing and quantifying association

Distributions and confidence levels

- Cumulative distribution
- Probability Distribution
- Normal Distribution

Represents the probability that a variable falls within a certain range

Cumulative distribution

A list of all the different probabilities of each outcome that can occur

Probability Distribution

Z-score for 99% level of confidence

2.576

Z-score for 95% level of confidence

1.960
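These critical values can be recovered from the standard normal distribution. A minimal sketch, assuming a two-sided confidence interval (so half of the leftover probability sits in each tail):

```python
from statistics import NormalDist


def z_critical(confidence):
    """Two-sided critical z-value for a confidence level such as 0.95."""
    tail = (1 - confidence) / 2
    return NormalDist().inv_cdf(1 - tail)


z95 = z_critical(0.95)
z99 = z_critical(0.99)
print(round(z95, 3), round(z99, 3))
```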

Measures of central tendency are approximately equal (Mean and Median)

Normal Distribution

Analysis Techniques

- ANOVA
- F-value
(must be higher than critical value to reject the null)
- T-value
(must be higher than critical value to reject the null)
- Zero
- 1 or -1

Used to compare the mean of three or more groups

ANOVA

ANOVA uses this test statistic

F-Value (must be higher than the critical value to reject the null)
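A minimal sketch of how a one-way ANOVA F-value is computed: the variance between group means divided by the variance within groups. The three groups are hypothetical, and the function name is illustrative only.

```python
def one_way_anova_f(groups):
    """Return the F-value for a one-way ANOVA over a list of numeric groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]

    # Between-group sum of squares: how far each group mean is from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group sum of squares: how far each value is from its own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]  # hypothetical data
f_value = one_way_anova_f(groups)
print(round(f_value, 2))
```

The resulting F-value would then be compared against the critical value for (2, 6) degrees of freedom, which is about 5.14 at a 0.05 significance level; an F-value above it rejects the null.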

T-test uses this test statistic

T-Value (must be higher than the critical value to reject the null)

A correlation is weak if the coefficient is close to ____

Zero

A correlation is strong if the coefficient is close to ____

1 or -1

Seven Basic Quality Tools

These seven tools are used in quality planning and in quality control and their keywords:
- Run Chart (Over time)
- Control Chart (In limits)
- Cause and Effect (Process identification [why])
- Flow Chart (Process identification [Where])
- Check Sheet (Co

Illustrates performance measurements over a period of time

Run Chart

Illustrates limits or constraints a process should not exceed

Control Chart

Assists in brainstorming issues that are causing a problem

Cause and Effect Diagram
Not measurements!

Visual tool to understand a process

flowchart

Easy tool to collect data to create other charts

check sheet

Graphical display of a data set with one bar for each category

Histogram and Pareto

Graphical display of data set centered

histogram

Graphical display of data set in highest to lowest order

Pareto

Used for potential relationships and correlation between variables

Scatter diagram

Can the seven quality tools be used independently?

Yes

What percent of quality problems does Ishikawa claim the seven tools can solve?

90% - 95%

Diagram demonstrating all of the elements that can influence a process before it starts.

SIPOC (Supplier - Input - Process - Output - Customer)

Manufacturing approach to improving processes.

Six Sigma

In manufacturing, statistics is used for:

quality control

Plan - Do - Study - Act
Which step is a response to analytical results?

ACT

PDSA

Plan, Do, Study, Act

Shows whether a result meets a requirement or not

attribute

Shows how well a result meets the requirement

variable

Variations accepted as the normal part of the process
Ex:
On a chart, you are allowed to be within a certain range.

Common cause variation

Variation from an abnormality causing large discrepancy in results
Ex:
Outside of the allowed range on a chart.

Special cause variation
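The distinction can be sketched as a range check against control limits. Setting the limits at the mean plus or minus three standard deviations of baseline data is a common convention and an assumption here, not something stated in these notes; the measurements are hypothetical.

```python
import statistics


def control_limits(baseline, sigmas=3):
    """Lower and upper limits as mean +/- `sigmas` standard deviations (assumed convention)."""
    center = statistics.mean(baseline)
    spread = statistics.pstdev(baseline)
    return center - sigmas * spread, center + sigmas * spread


def classify(points, lower, upper):
    """Label each point as common cause (inside limits) or special cause (outside)."""
    return ["common" if lower <= p <= upper else "special" for p in points]


baseline = [10, 11, 9, 10, 12, 10, 9, 11]  # hypothetical in-control measurements
lower, upper = control_limits(baseline)
labels = classify([10, 12, 15, 9], lower, upper)
print(labels)  # only the value 15 falls outside the limits
```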

Model of designing, analyzing, and scoring tests

IRT: Item Response Theory

How does government cost-benefit analysis differ from private-sector cost-benefit analysis?

Government benefits aren't always money. Could be flood prevention or welfare.

Compares one individual's performance to other individuals

Norm Referenced

Compares an individual's performance to a standard score (Example: Cut Score 64%)

Criterion referenced

Management strategy that uses results as the central measurement of performance

RBM: Results Based Management

Big Data & Health Care Performance Measures

- Very large data sets
- Prevalence
- Incidence
- Criterion referenced
- Cost-benefit analysis

What is Big Data?

Very large data sets

Used to count ALL of the existing cases of a disease.

Prevalence

Used to count only the NEW cases of a disease.

Incidence (Incident rate)
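A small arithmetic sketch of the difference. All counts are hypothetical, and the simplifications (no recoveries or deaths during the period, incidence measured against the people who did not already have the disease) are assumptions made for illustration.

```python
population = 10_000       # hypothetical community size
existing_cases = 500      # cases already present at the start of the year
new_cases = 50            # cases first diagnosed during the year

# Prevalence counts ALL cases (existing + new) relative to the whole population.
prevalence = (existing_cases + new_cases) / population

# Incidence counts only NEW cases, relative to those at risk at the start.
at_risk = population - existing_cases
incidence = new_cases / at_risk

print(round(prevalence, 4), round(incidence, 4))
```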

Compares an individual's performance to a standard score (Example: Cut Score 64%)

Criterion referenced

Used to analyze if funding is worth the outcome of a project

Cost-benefit analysis

Performance Measures

- KPI - Key performance indicator
- Balanced Scorecard

Performance measure for one specific goal

KPI - Key performance indicator

Multiple KPIs are displayed for the big picture

KPI dashboard
More than one chart is needed

What does a balanced scorecard measure?

CLIF - (customer, learning, internal process, financial performance) Are we meeting the strategy?

Advantage or disadvantage of a balanced scorecard? Requires time and effort to establish a meaningful scorecard

Disadvantage

Advantage or disadvantage of a balanced scorecard? Improves internal and external communication

Advantage

Balanced Scorecard: Difficult to maintain momentum

Disadvantage

Balanced Scorecard: Improves organizational alignment

Advantage

Balanced Scorecard: Links strategy to organizational results

Advantage

KPI: Data driven results make it easier to quantify performance

Advantage

KPI: Difficult to change once set up

Disadvantage

Homoscedasticity

A regression in which the variances in y for the values of x are equal or close to equal.
Variance is consistent.

Heteroscedasticity

A regression in which the variances in y for the values of x are not equal.
Variance is inconsistent.

Regression Analysis

A method of predicting sales based on finding a relationship between past sales and one or more independent variables, such as population or income.
Y = MX + B
Y = Dependent variable (e.g., sales)
M = Slope (coefficient)
X = Independent variable
B = Intercept

Multiple Regression: Autocorrelation

In a longitudinal design, the correlation of one variable with itself, measured at two different times.

Cluster Analysis

A technique used to divide an information set into mutually exclusive groups such that the members of each group are as close to one another as possible and the different groups are as far apart as possible.
Use in marketing to target customer gr