1. Introduction to Statistics

Data

Collections of observations, such as measurements, genders, or survey responses

Statistics

The science of planning studies and experiments, obtaining data, and then organizing, summarizing, presenting, analyzing, interpreting, and drawing conclusions based on the data

Population

Complete collection of all measurements or data that are being considered

Census

Collection of data from every member of a population

Sample

Subcollection of members selected from a population

Voluntary Response Sample

Respondents themselves decide whether to be included or not

Steps for statistical and critical thinking:

1. Prepare
2. Analyze
3. Conclude

Statistical Significance

Achieved in a study if the likelihood of an event occurring by chance is 5% or less
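
As a rough illustration of the 5% cutoff, the sketch below computes the chance of getting at least 15 heads in 20 tosses of a fair coin. The coin scenario, the numbers, and the helper name `prob_at_least` are assumptions made up for this example, not part of the definition.

```python
from math import comb

def prob_at_least(successes: int, trials: int, p: float = 0.5) -> float:
    """Probability of at least `successes` in `trials` by chance alone (binomial)."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )

# Hypothetical outcome: 15 heads in 20 tosses of a coin assumed to be fair.
chance = prob_at_least(15, 20)
significant = chance <= 0.05  # the 5% cutoff from the definition
print(f"P(at least 15 heads) = {chance:.4f}, significant: {significant}")
```

Because the chance comes out at about 2%, this hypothetical result would count as statistically significant under the 5% rule.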

Practical Significance

Common sense might suggest that the treatment or finding does not make enough of a difference to justify its use or to be practical.

Misleading Conclusions

When forming a conclusion based on a statistical analysis, we should make statements that are clear even to those who have no understanding of statistics and its terminology

Sample Data Reported Instead of Measured

When collecting data from people, it is better to take measurements yourself instead of asking subjects to report results.

Loaded Questions

If survey questions are not worded carefully, the results of a study can be misleading

6 Potential Pitfalls When Analyzing Data:

- Misleading conclusions
- Sample data reported instead of measured
- Loaded questions
- Order of questions
- Nonresponse
- Percentages

Order of Questions

Sometimes survey questions are unintentionally loaded by the order of the items being considered.

Nonresponse

When someone either refuses to respond or is unavailable

Percentages

Some studies cite misleading percentages

Parameter

Numerical measurement describing some characteristic of a population

Statistic

Numerical measurement describing some characteristic of a sample
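
A minimal sketch of the parameter/statistic distinction, assuming a made-up population of 1,000 ages: the mean of the whole list plays the role of a parameter, and the mean of a 50-member sample plays the role of a statistic.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: ages of all 1,000 members of a club.
population = [random.randint(18, 80) for _ in range(1000)]
sample = random.sample(population, 50)

parameter = mean(population)  # describes the whole population
statistic = mean(sample)      # describes only the sample
print(f"parameter = {parameter}, statistic = {statistic}")
```

The two values are usually close but not identical, since the statistic depends on which members happen to be in the sample.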

Quantitative Data

Consists of numbers representing counts or measurements

Categorical Data

Consists of names or labels, not numbers that represent counts or measurements

Working with Quantitative Data

Distinguishing between discrete and continuous types

Discrete Data

Result when the data values are quantitative and the number of values is finite or "countable"

Continuous Data

Result from infinitely many possible quantitative values, where the collection of values is not countable

4 Levels of Measurement:

- Nominal
- Ordinal
- Interval
- Ratio

Nominal Level

Data that consist of names, labels, or categories only; the data cannot be arranged in some order (such as low to high)

What is an example of a nominal level?
a) Course grade (A, B, C, D, F)
b) Yes, No, and Undecided
c) Years 1000, 2000, 1776, and 1492
d) Class times of 50 mins and 100 mins

b) Yes, No, and Undecided

Ordinal Level

Data arranged in some order but differences between data values either cannot be determined or are meaningless

What is an example of an ordinal level?
a) Course grade (A, B, C, D, F)
b) Yes, No, and Undecided
c) Years 1000, 2000, 1776, and 1492
d) Class times of 50 mins and 100 mins

a) Course grade (A, B, C, D, F)

Interval Level

Data arranged in order, and the differences between data values can be found and are meaningful. No natural zero starting point

What is an example of an interval level?
a) Course grade (A, B, C, D, F)
b) Yes, No, and Undecided
c) Years 1000, 2000, 1776, and 1492
d) Class times of 50 mins and 100 mins

c) Years 1000, 2000, 1776, and 1492

Ratio Level

Data arranged in order, differences can be found and are meaningful, and there is a natural zero starting point

What is an example of a ratio level?
a) Course grade (A, B, C, D, F)
b) Yes, No, and Undecided
c) Class times of 50 mins and 100 mins
d) Years 1000, 2000, 1776, and 1492

c) Class times of 50 mins and 100 mins

Big Data

Data sets so large & complex that their analysis is beyond the capabilities of traditional software tools

Data Science

Applications of statistics, computer science and software engineering, along with other relevant fields

2 Reasons for missing data:

- Missing completely at random: any data value is just as likely to be missing as any other data value
- Missing not at random: the missing value is related to the reason it's missing

Corrections for missing data:

- Delete Cases
- Impute Missing Values
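
The two corrections can be sketched on a small made-up list of heights, with `None` standing in for a missing response. Substituting the mean of the observed values is only one possible way to impute; it is an assumption chosen here for simplicity.

```python
from statistics import mean

# Hypothetical survey values; None marks a missing response.
heights_cm = [170.0, None, 165.0, 180.0, None, 175.0]

# Correction 1: delete cases that have a missing value.
complete_cases = [h for h in heights_cm if h is not None]

# Correction 2: impute, here by substituting the mean of the observed values.
fill = mean(complete_cases)
imputed = [h if h is not None else fill for h in heights_cm]

print(complete_cases)  # [170.0, 165.0, 180.0, 175.0]
print(imputed)         # [170.0, 172.5, 165.0, 180.0, 172.5, 175.0]
```

Deleting cases shrinks the data set from six values to four, while imputing keeps all six but fills the gaps with an estimate.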

What sources do we obtain data from?

- Observational Studies
- Experiments

Experiment

Apply some treatment & then proceed to observe its effects on the individuals

Observational Study

Observing and measuring specific characteristics without attempting to modify the subjects being studied

What are the 4 designs of experiments?

- Replication
- Blinding
- Double-Blind
- Randomization

Replication

Repetition of an experiment on more than one individual (needs large sample size)

Blinding

Subject doesn't know whether he or she is receiving a treatment or placebo

Double Blind

1. Subject doesn't know whether he or she is receiving the treatment or placebo
2. Experimenter doesn't know whether he or she is administering the treatment or placebo

Randomization

Subjects are assigned to different groups through a process of random selection
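
One way to picture randomization is to shuffle the subject list and split it in half. The subject labels, the two-group split, and the helper name `randomize` are assumptions for this sketch.

```python
import random

def randomize(subjects, seed=None):
    """Randomly split subjects into a treatment group and a placebo group."""
    rng = random.Random(seed)
    shuffled = list(subjects)  # copy so the original order is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical subjects S1..S10.
subjects = [f"S{i}" for i in range(1, 11)]
treatment, placebo = randomize(subjects, seed=42)
print("treatment:", treatment)
print("placebo:  ", placebo)
```

Every subject lands in exactly one group, and chance alone decides which one.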

Simple Random Sample

Every possible sample of the same size has the same chance of being selected
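
A small sketch of a simple random sample, assuming a made-up population of 20 ID numbers. Python's `random.sample` draws without replacement, and `math.comb` counts how many different samples of that size are possible.

```python
import random
from math import comb

population = list(range(1, 21))  # hypothetical ID numbers 1..20
n = 5

rng = random.Random(0)           # seeded only so the sketch is repeatable
srs = rng.sample(population, n)  # n distinct members, drawn without replacement

# Number of possible samples of size n; each has the same chance, 1 / possible.
possible = comb(len(population), n)
print(srs, possible)
```

With 20 members and a sample size of 5 there are 15,504 possible samples, and each one is equally likely to be the one drawn.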

Systematic Sampling

Select some starting point and then select every kth element in the population
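
Systematic sampling maps naturally onto list slicing. In this sketch the population of 100 ID numbers, the choice k = 10, and the helper name `systematic_sample` are assumptions; the starting point is picked at random among the first k elements.

```python
import random

def systematic_sample(population, k, seed=None):
    """Pick a random start among the first k elements, then take every kth element."""
    start = random.Random(seed).randrange(k)
    return population[start::k]

# Hypothetical population: ID numbers 1..100.
population = list(range(1, 101))
sample = systematic_sample(population, k=10, seed=3)
print(sample)
```

Whatever the starting point, the selected IDs are always exactly 10 apart.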

Convenience Sampling

Use data that are very easy to get

Stratified Sampling

Subdivide the population into at least two different subgroups that share the same characteristics, then draw a sample from each subgroup (or stratum).
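
A minimal sketch of stratified sampling, assuming a made-up list of students labeled by class year. The helper name `stratified_sample` and drawing the same number from each stratum are choices made for this illustration.

```python
import random
from collections import defaultdict

def stratified_sample(members, stratum_of, per_stratum, seed=None):
    """Group members into strata, then draw a random sample from every stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for member in members:
        strata[stratum_of(member)].append(member)
    return {name: rng.sample(group, per_stratum) for name, group in strata.items()}

# Hypothetical population of (id, class year) pairs.
students = [(i, "freshman" if i % 2 else "senior") for i in range(1, 41)]
picked = stratified_sample(students, stratum_of=lambda s: s[1], per_stratum=3, seed=7)
print(picked)
```

Every stratum is represented in the result, which is the point of stratifying before sampling.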

Cluster Sampling

Divide the population into sections (or clusters), then randomly select some of those clusters and choose all the members from those selected clusters
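
Cluster sampling differs from stratified sampling in that whole clusters are chosen at random and everyone inside a chosen cluster is kept. The classroom data and the helper name `cluster_sample` below are assumptions for this sketch.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters and keep every member of each chosen one."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical population: classrooms serve as the clusters.
classrooms = {
    "Room A": ["Ana", "Ben", "Cy"],
    "Room B": ["Dee", "Eli"],
    "Room C": ["Fay", "Gus", "Hal", "Ivy"],
    "Room D": ["Jon", "Kim"],
}
picked = cluster_sample(classrooms, n_clusters=2, seed=5)
print(picked)
```

Only two classrooms are selected, but no student within a selected classroom is left out.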

Multistage Sampling

Collect data by using some combination of the basic sampling methods

Observational Studies

Observe and measure but do not modify

What are the types of observational studies?

- Cross-sectional
- Retrospective (or case control)
- Prospective (or longitudinal or cohort)

Confounding

Experimenter is not able to distinguish between the effects of different factors

Completely Randomized Experimental Design

Assign subjects to different treatment groups through a process of random selection

Randomized Block Design

A block is a group of subjects that are similar, but blocks differ in ways that might affect the outcome of the experiment; treatments are randomly assigned to subjects within each block

Matched Pairs Design

Compare 2 treatment groups by using subjects matched in pairs that are somehow related or have similar characteristics

Rigorously Controlled Design

Carefully assign subjects to different treatment groups so that those given each treatment are similar in the ways that are important to the experiment

Sampling Error

Sample has been selected with a random method, but there is a discrepancy between a sample result and the true population result; the discrepancy results from chance sample fluctuations