STATS FINAL

what are the types of numeric variables?

- continuous: arise from a measurement e.g. age
- discrete: arise from counting e.g. number of students in a class

what are the types of categorical variables?

- nominal: have no natural order e.g. group membership
- ordinal: have a natural order e.g. education level

what are independent variables?

describe the treatments

what are dependent variables?

the variable of interest

why are comparative experiments desirable in RCTs?

they control for the placebo effect: both groups experience it, so any difference between groups can be attributed to the treatment

what is the purpose of randomisation?

helps remove possible bias in a comparative experiment

the choice of sample size depends on factors such as...

- the size of the effect you are trying to detect
- the amount of residual variability
- the design of the experiment, including how samples are selected

what are the advantages of sampling?

- more economical
- time efficient
- can be more accurate because there is greater control over the measurements and procedures used

how do you ensure you achieve a representative sample?

- clearly define the target population which is determined according to the goals of the study
- specify inclusion and exclusion criteria

when can sampling bias occur?

when members of a sample over- or under-represent attributes of the population that are related to the areas being studied

generalisation is done on the assumption that...

the sample mirrors the characteristics, heterogeneity and variations of the target population

what are the methods for probability sampling?

- random
- systematic
- stratified random
- disproportionate
- cluster

what are the methods for non-probability sampling?

- convenience sampling
- quota sampling
- purposive sampling
- snowball sampling

what is cluster sampling? (P)

- involves the successive random sampling of a series of units in a population
- convenient and efficient but may compound the effects of sampling bias

what is disproportionate sampling? (P)

- if the strata in a population are of substantially unequal size, then stratified random sampling may give inadequate sample sizes for comparison
- still probability sampling but the probabilities of subjects being selected are not equal

what is stratified random sampling? (P)

- the population is split into groups of similar individuals (strata) from which a random sample is drawn
- strata are chosen to correspond to subgroups thought to be important for proper representation of the pop

what is systematic sampling? (P)

similar to random sampling but every nth subject from a population list is chosen instead

what is random sampling? (P)

- purest form of probability sampling, with each member of the population having an equal chance of selection
- subjects are randomly selected from a population
- reduces risk of systematic bias
- not always practical to implement

what are the disadvantages of single subject designs?

- difficult to conclude that the treatment alone resulted in any of the differences as other factors may change over time
- external validity and ability to generalise is weak --> not that useful for making broader inferences about the effects of an intervention

what are the advantages of a single subject design?

useful for making decisions about particular patients

what are the advantages of surveys?

- efficiency and convenience
- can reduce bias
- respondents may be more candid in their responses

what is a disadvantage of questionnaires?

potential for misunderstanding questions

what are two important ways bias is assessed?

- pilot testing
- debriefing with respondents

for exploring the distribution of a single numeric variable we are interested in describing...

- location
- spread
- shape
- deviations from overall pattern

what is an advantage and disadvantage of a histogram?

- useful for visualising large numbers of observations
- difficult to compare more than two groups

what are the advantages of a box plot?

- gives a summary of location, spread and shape
- good for comparing multiple distributions

what do we want to describe in a scatter plot?

- direction
- linearity
- strength

the n deviations are not n independent pieces of information but only n-1 pieces of information, why?

- to measure the spread of our values, we can measure how far away each value is from the mean (i.e. value - mean)
- if we add all of these deviations together the total is always 0
- thus, given n-1 of the n deviations we can always deduce the value of the last one (quick check below)
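
A minimal Python sketch of this point (not from the notes; the data values are made up and numpy is assumed to be available):

```python
import numpy as np

x = np.array([4.0, 7.0, 9.0, 12.0])      # made-up sample values
deviations = x - x.mean()                # value - mean for each observation

print(deviations.sum())                  # always 0 (up to rounding error)
print(-deviations[:-1].sum())            # so the last deviation...
print(deviations[-1])                    # ...is fully determined by the first n-1
```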

what is the 68-95-99.7 rule?

in any normal distribution:
- within 1 standard deviation of the mean are 68% of values
- within 2 standard deviations of the mean are 95% of values
- within 3 standard deviations of the mean are 99.7% of values
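
These percentages can be checked directly from the normal distribution; a small sketch assuming scipy is available (not part of the notes):

```python
from scipy.stats import norm

# probability of landing within k standard deviations of the mean
for k in (1, 2, 3):
    p = norm.cdf(k) - norm.cdf(-k)
    print(f"within {k} sd: {p:.3f}")   # ~0.683, 0.954, 0.997
```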

we can view the sample mean of a particular data set as just one outcome of the process of taking the mean of a random sample. If we repeat this random process again and again we get a different outcome (mean) each time. What conclusions can be drawn from this?

- on average, the sample mean equals the population mean
- the variability in this estimate gets smaller as the sample size increases

The sample mean is an unbiased estimator of the population mean and the spread gets smaller as n increases, what does this imply?

the sample mean is a more precise estimator of the population mean for larger samples

the shape of our distribution of data depends on sample size, therefore if we have a larger sample we can expect the distribution of data to be...

more symmetric

what is the central limit theorem?

- if the population has a normal distribution then the sample mean is normal for any sample size
- if the population is not normal, then the sample mean is still approximately normal and gets more normal as the sample size increases
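
A small simulation illustrating this (not from the notes; numpy is assumed and the exponential population is just one example of a non-normal distribution):

```python
import numpy as np

rng = np.random.default_rng(1)
population = rng.exponential(scale=2.0, size=100_000)   # clearly non-normal (right-skewed)

for n in (2, 10, 50):
    # distribution of the sample mean across 5000 samples of size n
    means = rng.choice(population, size=(5_000, n)).mean(axis=1)
    print(n, round(means.mean(), 2), round(means.std(), 2))
# the sample means centre on the population mean (about 2) and their spread shrinks as n grows;
# a histogram of `means` looks increasingly symmetric and normal
```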

what is the 95% confidence interval for the population mean?

- we are 95% confident that the population mean lies within approximately 2 standard errors of the sample mean (i.e. sample mean ± t × standard error)
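
A minimal sketch of the calculation, assuming numpy/scipy and using made-up data (not from the notes):

```python
import numpy as np
from scipy import stats

x = np.array([5.1, 4.8, 6.0, 5.5, 4.9, 5.7, 5.3, 5.0])   # made-up sample
mean = x.mean()
se = x.std(ddof=1) / np.sqrt(len(x))                      # standard error of the mean
t_crit = stats.t.ppf(0.975, df=len(x) - 1)                # roughly 2 for moderate n

print(mean - t_crit * se, mean + t_crit * se)             # 95% confidence interval
```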

what is the standard error?

- in practice, we do not know the population standard deviation so we have to estimate it using the sample standard deviation
- the estimated standard deviation of a statistic is known as its standard error
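
The standard error of the mean is s / sqrt(n); a quick check assuming numpy/scipy, with the same made-up data as above:

```python
import numpy as np
from scipy import stats

x = np.array([5.1, 4.8, 6.0, 5.5, 4.9, 5.7, 5.3, 5.0])   # made-up sample
print(x.std(ddof=1) / np.sqrt(len(x)))                    # SE = s / sqrt(n)
print(stats.sem(x))                                       # same value via scipy
```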

why do we use a t distribution (like in SPSS) instead of a normal distribution?

- using the random sample standard deviation instead of the fixed population standard deviation adds more uncertainty to our estimates
- instead of a normal distribution, we use a t distribution based on the degrees of freedom of our estimates
- the t distribution has heavier tails than the normal distribution and approaches it as the degrees of freedom increase
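
This can be seen by comparing critical values; a quick sketch assuming scipy (not part of the notes):

```python
from scipy import stats

print(stats.norm.ppf(0.975))              # 1.96: the normal 95% critical value
for df in (5, 30, 1000):
    print(df, stats.t.ppf(0.975, df))     # larger for small df (heavier tails), approaches 1.96 as df grows
```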

what is the difference between standard deviation and standard error?

- standard deviation quantifies variability in the population
- standard error quantifies how precisely you know the population mean

what is a type I error?

if we reject the null hypothesis when it is in fact true

what is a type II error?

if we retain the null hypothesis when it is in fact false

how can power be improved?

- increasing effect size
- decreasing the variability
- increasing the sample size
- increasing the significance threshold

what is the power of an experiment?

- the probability of detecting an effect when there is indeed an effect
- it reflects the signal-to-noise ratio (the size of the effect relative to the variability)
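
A hedged sketch of a power calculation for an independent-samples t-test, assuming the statsmodels package is available (the effect size, alpha and group size are made-up illustration values):

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# power for two groups of 30 with a standardised effect of d = 0.5 at alpha = 0.05
print(analysis.power(effect_size=0.5, nobs1=30, alpha=0.05))

# sample size per group needed to reach 80% power for the same effect
print(analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8))
```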

how does an independent samples t-test work?

- takes the difference between the two sample means and compares it to the standard error of difference
- this gives the t statistic, the number of standard errors that the difference between our groups is away from a hypothesised difference of 0
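
A minimal sketch of this calculation with made-up data, assuming numpy/scipy (not part of the notes):

```python
import numpy as np
from scipy import stats

a = np.array([23.0, 25.1, 28.4, 22.8, 26.5, 24.9])   # made-up group A
b = np.array([27.3, 29.8, 26.1, 30.2, 28.7, 27.9])   # made-up group B

diff = a.mean() - b.mean()
# pooled (equal-variance) standard error of the difference
sp2 = ((len(a) - 1) * a.var(ddof=1) + (len(b) - 1) * b.var(ddof=1)) / (len(a) + len(b) - 2)
se_diff = np.sqrt(sp2 * (1 / len(a) + 1 / len(b)))
t = diff / se_diff                                   # SEs away from a hypothesised difference of 0

print(t)
print(stats.ttest_ind(a, b, equal_var=True))         # same t statistic, plus the p-value
```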

what is a parametric test?

it is based on estimating parameters, e.g. population means, from the sample

what is a non-parametric test?

looks at some other comparison between groups e.g. comparing the ranks of values instead of the values themselves

how does an ANOVA work?

- start by measuring the total variability in the response
- then look at the variability within each group
- if the within-group variability is less than the total variability, it suggests that knowing which group a person belonged to gives some information about their response, i.e. the group means differ
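
A minimal sketch of this decomposition with made-up groups, assuming numpy/scipy (not part of the notes):

```python
import numpy as np
from scipy import stats

# three made-up groups
g1 = np.array([4.1, 5.2, 4.8, 5.0])
g2 = np.array([6.3, 5.9, 6.8, 6.1])
g3 = np.array([5.0, 5.4, 4.6, 5.2])
all_vals = np.concatenate([g1, g2, g3])

ss_total = ((all_vals - all_vals.mean()) ** 2).sum()                  # total variability
ss_within = sum(((g - g.mean()) ** 2).sum() for g in (g1, g2, g3))    # within-group variability
ss_between = ss_total - ss_within                                     # explained by group membership

k, n = 3, len(all_vals)
f = (ss_between / (k - 1)) / (ss_within / (n - k))                    # the F statistic

print(f)
print(stats.f_oneway(g1, g2, g3))     # same F, plus the p-value
# R squared here is ss_between / ss_total
```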

what is the f statistic?

measures how different the groups are, relative to their variability

what is R squared?

variability explained by the model / total variability

why is the assumption of normal variability particularly important for small sample sizes?

if the population is not normally distributed then the type I error rate might be inflated (for larger samples the central limit theorem makes the sample mean approximately normal anyway)

what is the relationship between residuals and normality?

our assumption of normal variability is equivalently the assumption that the residuals have a normal distribution

what is the one-way ANOVA model?

there is a mean response for each group with constant normal variability about that mean

what is the linear regression model?

the mean response is given by a straight line with constant normal variability about that line

what if normality assumptions are not satisfied and the sample size is large?

- may not have much of an effect on the inference about the association
- will undermine predictions made from the linear model
- we should consider transforming the data in some way to improve the fit

what is the "within group" row in an ANOVA table?

- a measure of the variability within the k groups (the residual variability), with n - k degrees of freedom

what is the "between groups" row in an ANOVA table?

a measure of variability between the k group means, with k - 1 degrees of freedom

what is "R"?

the pearson correlation coefficient

how do regression models work?

use a straight line to estimate the mean response based on a predictor
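
A minimal simple-linear-regression sketch with made-up data, assuming scipy (not part of the notes); note that squaring the correlation gives R squared:

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])     # made-up predictor
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.3])    # made-up response

fit = stats.linregress(x, y)
print(fit.intercept, fit.slope)   # fitted line: mean response = intercept + slope * x
print(fit.rvalue)                 # R, the pearson correlation coefficient
print(fit.rvalue ** 2)            # R squared: proportion of variability explained by the line
print(fit.pvalue)                 # test of the null hypothesis of zero slope (no association)
```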

why is regression important?

it allows us to model the effects of more than one variable on a response

what is inter-rater reliability?

the degree to which ratings given by different observers agree

what is the intra-rater reliability?

the degree to which ratings given by the same observer on different occasions agree

what is cohen's kappa?

it adjusts the raw proportion of agreement for the rate of agreement expected by chance, i.e. it measures how much the raters agree beyond what chance alone would produce
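
A minimal sketch of the calculation, kappa = (p_observed - p_chance) / (1 - p_chance), using a made-up 2x2 agreement table and assuming numpy (not part of the notes):

```python
import numpy as np

# made-up agreement table: rows = rater A's ratings, columns = rater B's ratings
table = np.array([[20, 5],
                  [10, 15]])
n = table.sum()

p_observed = np.trace(table) / n                                    # raw proportion of agreement
p_chance = (table.sum(axis=1) * table.sum(axis=0)).sum() / n ** 2   # agreement expected by chance
kappa = (p_observed - p_chance) / (1 - p_chance)
print(kappa)
```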

kappa attains a value of 1 if....

there is perfect agreement

kappa can be negative, but a value of 0 roughly suggests....

no agreement

for any disagreements in ratings, we can additionally measure how much the scores differed, what is this called?

the weighted kappa statistic, where the weights indicate the seriousness of the disagreement

what does cronbach's alpha test?

whether items in a particular scale are internally consistent

what is cronbach's alpha based on?

the correlations between items in a scale
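
One common formula is alpha = k/(k-1) × (1 - sum of item variances / variance of the total score); a minimal sketch with made-up responses, assuming numpy (not part of the notes):

```python
import numpy as np

def cronbach_alpha(items):
    """items: 2-D array, rows = respondents, columns = scale items (illustrative helper)."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()   # sum of the individual item variances
    total_var = items.sum(axis=1).var(ddof=1)     # variance of the total score
    return (k / (k - 1)) * (1 - item_vars / total_var)

# made-up responses of 5 people to a 3-item scale
scale = [[4, 5, 4], [2, 3, 2], [5, 5, 4], [3, 3, 3], [1, 2, 2]]
print(cronbach_alpha(scale))   # close to 1 here because the items track each other
```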

if the items are completely independent we would expect cronbach's alpha to be...

0

what type of plot can you use to check equal variance?

residuals plot (a funnel/megaphone shape indicates unequal variance; an even band around zero suggests equal variance)
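
A minimal residuals-vs-fitted plot sketch with made-up data, assuming numpy, scipy and matplotlib (not part of the notes):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])      # made-up predictor
y = np.array([2.2, 3.8, 6.1, 7.9, 10.3, 11.8, 14.2, 15.9])  # made-up response

fit = stats.linregress(x, y)
fitted = fit.intercept + fit.slope * x
residuals = y - fitted

plt.scatter(fitted, residuals)      # look for an even band around zero
plt.axhline(0, linestyle="--")      # a funnel/megaphone shape suggests unequal variance
plt.xlabel("fitted values")
plt.ylabel("residuals")
plt.show()
```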

what is sampling variability?

- the fact that a statistic would give a different result each time if the random sampling process were repeated
- we thus need to account for sampling variability when making any conclusions from our data

what is the role of the central limit theorem on statistical inference?

- t-test and ANOVA are based on the assumptions of normality
- we can still use these methods even if the data is slightly skewed, particularly for large sample sizes

which is more powerful? Parametric or non-parametric? Why?

parametric tests are often more powerful than their non-parametric counterparts and provide direct estimates of effects