BVMS1


Types of statistics


Descriptive stats (summary of the data, represent data in an easily understandable way: graphs, charts, tables, average, range)
Inferential stats (sample vs. pop., experimental design, hypothesis testing)


Sample vs population


Samples should be:
representative and unbiased (males & females, all ages)
drawn so that every type of subject has the same chance of being included (random sampling)


Types of Data


Categorical
Continuous
Discrete


Categorical Data


Nominal (unordered), eg dog breeds
Ordinal (ordered), eg body condition score of cattle


Continuous data


Any value on a continuous scale is theoretically possible, not only whole numbers (any positive value for measurements such as weight)
eg: weight, height


Discrete data


Can only be integer values (whole numbers)
eg: number of piglets in a litter


Bar Charts


Good for frequency (categorical or discrete)


Scatterplot


Used to show the relationship between two variables (correlation and regression)


Descriptive Statistics


Data is collected so that we can obtain INFORMATION about a certain topic
When only a few observations are made, it might be easy to see a potential relationship
As more data is collected, it's more difficult to obtain an overall picture


Histogram


Shows the distribution of quantitative data


Stem and leaf


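(not often used) Shows the shape of the distribution while keeping the individual values: each value is split into a stem (leading digits) and a leaf (last digit)
Some software output also lists a running count of observations above or below the median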


Boxplot


A single boxplot describes one group; two side by side are used to compare groups
The line inside the box represents the median value


Numerical measures


Numerical measures are used to summarise the position, spread and shape of a data distribution
Used to describe the data so that we have a general idea of the data that we have and the population that it might have come from or represent


Measures of central tendency


"Averages" or the middle of the data
Mean = sum of all observations / number of observations
Median = middle observation (half smaller, half larger)
Mode = value that occurs most frequently


Measures of spread


Range: difference between largest and smallest observation
Standard deviation: measure of spread about the mean (SD = square root of variance)
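
The measures of central tendency and spread can be sketched with Python's standard library. The weights are made up for illustration, and the sample variance (dividing by n - 1) is an assumption, since the notes do not say which form is meant.

```python
import math
import statistics

# Hypothetical body weights (kg); not real data.
weights = [4.0, 5.0, 5.0, 6.0, 10.0]
n = len(weights)

# Central tendency
mean = sum(weights) / n                # sum of observations / number of observations
median = statistics.median(weights)    # middle observation of the sorted data
mode = statistics.mode(weights)        # value that occurs most frequently

# Spread
data_range = max(weights) - min(weights)                    # largest minus smallest
variance = sum((x - mean) ** 2 for x in weights) / (n - 1)  # sample variance (assumed n - 1)
sd = math.sqrt(variance)                                    # SD = square root of variance

print(mean, median, mode, data_range, variance, round(sd, 3))
```

In this made-up sample the mean (6.0) is larger than the median (5.0) because the single large value pulls the mean upwards.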


Shape (skewness/kurtosis/etc)


Various shape statistics exist:
Skewness (is it symmetrical or not)
Kurtosis (how concentrated is the data around the mean)
(and more)


Probability Theory


Generally very poorly understood
Describes outcomes that depend on chance
eg rolling a dice, tossing a coin, infected with disease, pups in a litter, etc.
Can almost never predict an outcome w/ total accuracy, but can describe what MIGHT happen, or the probability of different outcomes


Probability distributions


The probability of an outcome given that we know what happens in the 'system' (variability, predict the future)
What we believe about the 'system' given that we know the outcome (uncertainty, estimating the true population parameters)


Normal distribution


(Gaussian, Bell curve)
Described by mean and SD
Data can be any continuous value
Symmetrical distribution
Mean = median = mode
ex: birth weights, heights, live weight gains, body temperatures, serum biochemistry parameters
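
A minimal sketch of a normal distribution described only by its mean and SD, using `statistics.NormalDist` from the Python standard library. The birth weight figures are hypothetical, chosen only to illustrate the symmetry and the usual "mean plus or minus 1.96 SD" range.

```python
from statistics import NormalDist

# Hypothetical calf birth weights: mean 40 kg, SD 4 kg (illustrative values).
birth_weight = NormalDist(mu=40.0, sigma=4.0)

lower = birth_weight.mean - 1.96 * birth_weight.stdev
upper = birth_weight.mean + 1.96 * birth_weight.stdev

# Proportion of the population expected between the two limits.
within = birth_weight.cdf(upper) - birth_weight.cdf(lower)

# Symmetry: half of the population lies below the mean.
below_mean = birth_weight.cdf(birth_weight.mean)

print(round(within, 3), below_mean)
```

About 95% of values fall inside the two limits, leaving roughly 2.5% in each tail.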


Poisson distribution


Used for count data (integer)
Described by mean only
Asymmetrical distribution
Mean does not equal median does not equal mode
Examples: pups in a litter, cars on the street, earthquakes in a year
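
Because a Poisson distribution is described by its mean only, the probability of each count can be sketched from the standard formula P(k) = e^(-mean) * mean^k / k!. The average litter size used here is a made-up value.

```python
import math

def poisson_pmf(k, mean):
    """Probability of observing exactly k events when the average count is `mean`."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

mean_pups = 6.0  # hypothetical average number of pups per litter

for k in range(0, 13, 3):
    print(k, round(poisson_pmf(k, mean_pups), 4))

# The probabilities over all possible counts add up to 1
# (counts above 59 contribute a negligible amount here).
total = sum(poisson_pmf(k, mean_pups) for k in range(60))
```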


Binomial distribution


Used for binary outcomes (yes/no, pass/fail, m/f, dies/survives)
Described by the probability of a success at each trial, and the number of trials
ex: number of heads out of 10 tosses of a coin, number of female calves from sexed semen, number of you who will pass exams
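
The coin example can be worked through with the standard binomial formula, P(k successes) = C(n, k) * p^k * (1 - p)^(n - k). This is a sketch that assumes a fair coin (p = 0.5).

```python
import math

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n trials, each with success probability p."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

# Probability of exactly 5 heads in 10 tosses of a fair coin.
p_five_heads = binomial_pmf(5, 10, 0.5)

# Probabilities for 0 to 10 heads cover every possible outcome.
total = sum(binomial_pmf(k, 10, 0.5) for k in range(11))

print(p_five_heads, total)
```

Even the most likely outcome (5 heads) happens only about a quarter of the time, which is why a single outcome can almost never be predicted with total accuracy.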


Hypothesis testing


Used by research scientists, eg:
does drug A kill mice faster than drug B?
do a greater proportion of smokers than non-smokers get lung cancer?
(also relevant to vets)
5 Steps!!


5 steps to Hypothesis testing


1. Think of a question you want to ask
2. Put the question into a testable format
3. Collect the data
4. Apply the correct statistical test
5. Interpret the results of the test


Generating a hypothesis to test:


What do we want to find?
How many groups are we comparing?
Typically a simple question with a yes or no answer
ex: are these 2 groups of calves growing at the same rate? Did pyoderma cases given synulox recover at the same speed as those given ampicillin?


The 'NULL' hypothesis (and alternative hypothesis)


The null hypothesis: the baseline belief, there is NO difference in groups/drugs (denoted H0)
The alternative hypothesis: opposite of the baseline belief, there IS a difference in groups/drugs (denoted H1)


Hypothesis testing


Goal is to provide evidence that the 'Null' hypothesis is WRONG! (there is a difference between groups/drugs)
BUT! we have to account for outcomes being uncertain: we need to show that the difference between the groups/drugs is more than would be expected by chance


Rejecting the Null hypothesis


It is always possible that the difference between 2 sets of observations is entirely chance! (that the pops. are really the same even though the samples look diff.)
This becomes less and less likely as the magnitude of the difference increases and the number of observations increases


Confidence intervals


Use confidence intervals to look at the data in a more formal way
Do the confidence intervals for the parameter of interest in each group overlap? (95% CI: excludes 2.5% at the high end and 2.5% at the low end)
The more data we have, the smaller the confidence intervals become
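
A minimal sketch of a 95% confidence interval for a mean, computed as mean plus or minus t * SD / sqrt(n). The temperatures are hypothetical, and the critical t values (2.776 for 4 degrees of freedom, 2.093 for 19) are taken from a standard t table rather than computed. Repeating the same five values four times is artificial, but it isolates the effect of sample size.

```python
import math
import statistics

def mean_ci(sample, t_crit):
    """95% CI for the mean, given the two-sided critical t value for n - 1 df."""
    n = len(sample)
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)  # standard error of the mean
    return mean - t_crit * se, mean + t_crit * se

small = [38.2, 38.6, 38.4, 38.9, 38.4]  # hypothetical body temperatures (deg C), n = 5
large = small * 4                        # same values repeated, n = 20

ci_small = mean_ci(small, t_crit=2.776)  # t table, 4 degrees of freedom
ci_large = mean_ci(large, t_crit=2.093)  # t table, 19 degrees of freedom

width_small = ci_small[1] - ci_small[0]
width_large = ci_large[1] - ci_large[0]

print(ci_small, ci_large)
```

Both intervals are centred on the same mean, but the interval from 20 observations is much narrower than the one from 5.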


Rejecting the null hypothesis with Confidence intervals


The amount of overlap in confidence intervals reflects the probability (p-value) with which we reject the null hypothesis
If there is LITTLE overlap, we reject H0
How little is judged by the p-value (cut-off 0.05)
This makes no comment at all about the magnitude (or biological impact) of the difference


Failing to reject the null hypothesis


If there is not enough evidence to prove the groups are different, we cannot reject the null hypothesis
(this does not necessarily mean that there really is no difference, only that we couldn't find any difference in the samples obtained)


Statistical significance DOES NOT EQUAL biological relevance


Remember this!


Can compare means by?


Using a t-test


Comparison of means...?


Can compare one mean (with a fixed number), two means, or more than two means


Compare ONE mean (with a fixed number)


Confidence interval approach
Look at sample mean, size and SD
95% confidence interval... does the fixed number fall inside the confidence interval?
Null: population mean = XXX
Alt: population mean does not = XXX


Significance Testing


Looks at how far the observed sample mean is from the hypothesised population mean
If the P value is lower than 0.05 then it is significant (reject null)
If greater than 0.05 then it is NOT significant (fail to reject null)
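
A sketch of the one-sample test statistic, t = (sample mean - hypothesised mean) / (SD / sqrt(n)). The temperatures and the fixed value of 38.0 are hypothetical. Comparing |t| with the critical value from a t table (2.776 for 4 degrees of freedom, two-sided 5% level) is used here in place of computing the P value, because the Python standard library has no t distribution; the two decisions are equivalent.

```python
import math
import statistics

def one_sample_t(sample, hypothesised_mean):
    """t statistic: distance of the sample mean from the fixed value, in standard errors."""
    n = len(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    return (statistics.mean(sample) - hypothesised_mean) / se

temps = [38.2, 38.6, 38.4, 38.9, 38.4]  # hypothetical body temperatures (deg C)
t_stat = one_sample_t(temps, hypothesised_mean=38.0)

t_crit = 2.776                      # t table: 4 df, two-sided 5% level
reject_null = abs(t_stat) > t_crit  # same decision as P < 0.05

print(round(t_stat, 3), reject_null)
```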


comparison of TWO means (with each other)


T-test
95% CI for difference between means
Take mean of each group
Null: means are the same
Alt: means are not the same
P > 0.05: fail to reject null
P < 0.05: reject null
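
A sketch of a two-sample t-test on independent groups. It assumes equal variances in the two groups (the pooled form), which the notes do not specify. The growth rates are made up, and the critical value 2.306 (8 degrees of freedom) comes from a t table.

```python
import math
import statistics

def pooled_t(group_a, group_b):
    """Pooled two-sample t statistic, its degrees of freedom and the SE of the difference."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * statistics.variance(group_a)
                  + (n_b - 1) * statistics.variance(group_b)) / df
    se_diff = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    diff = statistics.mean(group_a) - statistics.mean(group_b)
    return diff / se_diff, df, se_diff

# Hypothetical daily growth rates (kg/day) for two groups of calves.
group_a = [0.9, 1.0, 1.1, 1.0, 1.0]
group_b = [1.2, 1.3, 1.1, 1.2, 1.2]

t_stat, df, se_diff = pooled_t(group_a, group_b)

t_crit = 2.306  # t table: 8 df, two-sided 5% level
diff = statistics.mean(group_a) - statistics.mean(group_b)
ci_diff = (diff - t_crit * se_diff, diff + t_crit * se_diff)  # 95% CI for the difference
reject_null = abs(t_stat) > t_crit

print(round(t_stat, 3), df, ci_diff, reject_null)
```

The 95% CI for the difference does not include 0 here, which matches the decision to reject the null.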


Paired Values


Pre and post treatment (somatic cell count: sub-clinical mastitis; creatine kinase: exertional rhabdomyolysis)
Before, on or after a certain date (hormone levels for oestrus detection)
Compare the same thing at 2 different times in the same animal


Paired T-tests


(example) weight before diet and weight at 3 months on diet
95% CI for mean difference
T-test of mean difference (= 0 versus not = 0)
The two measurements are NOT independent!!! Pairing removes the variation between animals, so the CI becomes tighter
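
A sketch of the paired approach: take the difference within each animal, then test whether the mean difference is 0. The weights are invented for illustration, and 2.776 is the t-table value for 4 degrees of freedom.

```python
import math
import statistics

# Hypothetical weights (kg) of the same five dogs before and after 3 months on a diet.
before = [32.0, 28.0, 35.0, 40.0, 25.0]
after = [30.0, 27.0, 32.0, 38.0, 23.0]

# One difference per animal; the test is done on these, not on the raw weights.
diffs = [a - b for a, b in zip(after, before)]

n = len(diffs)
mean_diff = statistics.mean(diffs)
se = statistics.stdev(diffs) / math.sqrt(n)

t_stat = mean_diff / se  # null: mean difference = 0
t_crit = 2.776           # t table: 4 df, two-sided 5% level
ci = (mean_diff - t_crit * se, mean_diff + t_crit * se)

print(diffs, round(t_stat, 3), ci)
```

The weights themselves vary a lot between dogs (25 to 40 kg), but the within-dog differences vary very little, which is what makes the interval tight.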


Comparison of means (more than 2 means)


Comparison of means may be extended to 3 or more groups
More complex: takes into account the variance between and w/in groups
ex: are the daily growth rates of pigs in 3 rearing units different?


ANOVA (analysis of Variance)


Null: all means are the same
Alt: at least one mean is different
F (variance ratio)
P < 0.05 = evidence of diff. between population means (but which ones!!??) Must compare each...
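
The variance ratio F can be sketched as the between-group mean square divided by the within-group mean square. The growth rates for the three rearing units are hypothetical, and the critical value 5.14 (2 and 6 degrees of freedom, 5% level) is taken from an F table.

```python
import statistics

def one_way_anova_f(groups):
    """F = (between-group mean square) / (within-group mean square)."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_values)
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical daily growth rates (kg/day) of pigs in 3 rearing units.
unit_a = [0.6, 0.7, 0.8]
unit_b = [0.7, 0.8, 0.9]
unit_c = [0.9, 1.0, 1.1]

f_stat = one_way_anova_f([unit_a, unit_b, unit_c])
f_crit = 5.14  # F table: 2 and 6 df, 5% level
reject_null = f_stat > f_crit

print(round(f_stat, 3), reject_null)
```

F comes out at about 7.0, so the null is rejected, but F alone does not say which unit differs from which.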


Compare Ranks


Use a non-parametric equivalent of a t-test (or similar)
Use if data is NOT normally distributed


Non-parametric tests


Compare ranks
The corresponding confidence interval is for the difference in population medians
Tests work by ranking the scores, then computing the average rank (or sum of ranks) for each group, and then testing:
Null: there is no diff. between sum of ranks of groups
Alt: a diff. exists between sum of ranks of groups
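
The ranking step can be sketched as follows: pool the two groups, rank every score (tied scores share the average of their ranks), then add up the ranks in each group. The scores are hypothetical, and this shows only the rank sums, not the full test or its P value.

```python
def average_ranks(values):
    """Rank values from 1 upwards; tied values all get the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

# Hypothetical body condition scores for two groups of cattle.
group_a = [1, 2, 2, 3]
group_b = [3, 4, 5, 5]

ranks = average_ranks(group_a + group_b)
rank_sum_a = sum(ranks[:len(group_a)])
rank_sum_b = sum(ranks[len(group_a):])

print(ranks, rank_sum_a, rank_sum_b)
```

If the groups came from the same population, the two rank sums would be expected to be similar for equal-sized groups; here they are 10.5 and 25.5.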


Parametric vs. non-parametric (equivalents)


Parametric ---------- Non-parametric
1-sample t-test ---------- 1-sample Wilcoxon signed rank test
2-sample t-test ---------- Mann-Whitney U test / Wilcoxon rank sum test
Paired t-test ---------- Wilcoxon signed rank test
One-way ANOVA ---------- Kruskal-Wallis test


Comparison of proportions


Use a Chi-square test (or equivalent)
These are used for categorical data
Null: proportion of infection is the same for draped and undraped cases OR Null: drape use and infection are independent
Alt: proportion of infection is DIFFERENT between draped and undraped cases
eg: 252 surgical colic cases
102 used a drape, 150 did not
73 post-op infections
Is there a difference in proportion of infection in those that used a drape compared to those who did not?
Chi-square test statistic is then based on the differences between observed and expected values
(remember this only shows an association/relationship; it DOES NOT SAY that leaving undraped WILL cause infection: bias....)
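
A sketch of how the expected values and the test statistic are built for the colic example. The totals (102 draped, 150 undraped, 73 infections out of 252) are from the example above, but the notes do not give how the 73 infections were split between the groups, so the split used here (20 draped, 53 undraped) is HYPOTHETICAL and only there to make the arithmetic runnable. The critical value 3.841 (1 degree of freedom, 5% level) is from a chi-square table.

```python
def chi_square_2x2(table):
    """Chi-square statistic and expected counts for a 2x2 table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Expected count under the null = row total * column total / grand total.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    statistic = sum(
        (table[i][j] - expected[i][j]) ** 2 / expected[i][j]
        for i in range(2)
        for j in range(2)
    )
    return statistic, expected

# Rows: draped, undraped. Columns: infected, not infected.
# Row and column totals match the example; the 20 / 53 split is HYPOTHETICAL.
observed = [[20, 82], [53, 97]]

statistic, expected = chi_square_2x2(observed)
chi_crit = 3.841  # chi-square table: 1 df, 5% level
reject_null = statistic > chi_crit

print([[round(e, 1) for e in row] for row in expected], round(statistic, 2), reject_null)
```

Under the null, about 29.5 infections would be expected among the 102 draped cases and about 43.5 among the 150 undraped cases, whatever the real split turns out to be.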