Statistics (Ch. 1-3)

Statistics

art and science of collecting, analyzing, presenting, and interpreting data

data

facts and figures collected, analyzed, and summarized for presentation and interpretation

Data set

all the data collected in a particular study

elements

entities on which data are collected

variable

characteristic of interest for the elements

observation

set of measurements obtained for a particular element

nominal scale

scale of measurement for a variable when the data are labels or names used to identify an attribute of an element. Nominal data may be nonnumeric or numeric.

Ordinal scale

scale of measurement for a variable if the data exhibit the properties of nominal data and the order or rank of the data is meaningful. Ordinal data may be nonnumeric or numeric.

interval scale

scale of measurement for a variable if the data demonstrate the properties of ordinal data and the interval between values is expressed in terms of a fixed unit of measure. Interval data are always numeric.

ratio scale

scale of measurement for a variable if the data demonstrate all the properties of interval data and the ratio of two values is meaningful. Ratio data are always numeric.

Categorical data

labels or names used to identify an attribute of each element. Categorical data use either the nominal or ordinal scale of measurement and may be nonnumeric or numeric.

Quantitative data

numeric values that indicate how much or how many of something. Quantitative data are obtained using either the interval or ratio scale of measurement.

categorical variable

variable with categorical data

quantitative variable

variable with quantitative data

cross-sectional data

data collected at the same or approximately the same point in time

Time series data

data collected over several time periods

descriptive statistics

tabular, graphical, and numerical summaries of data

population

set of all elements of interest in a particular study

sample

subset of the population

census

survey to collect data on the entire population

sample survey

survey to collect data on a sample

statistical inference

process of using data obtained from a sample to make estimates or test hypotheses about the characteristics of a population

data mining

process of using procedures from statistics and computer science to extract useful information from extremely large databases

frequency distribution

tabular summary of data showing the number of data values in each of several nonoverlapping classes

relative frequency distribution

tabular summary of data showing the fraction or proportion of data values in each of several nonoverlapping classes

percent frequency distribution

tabular summary of data showing the percentage of data values in each of several nonoverlapping classes
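The three distributions above differ only in how each class count is scaled. A minimal sketch, assuming a small made-up sample of drink purchases (the labels and counts are hypothetical, not from the text):

```python
from collections import Counter

# Hypothetical sample of 10 purchases; each label is one class
purchases = ["cola", "lemon", "cola", "orange", "cola",
             "lemon", "cola", "orange", "lemon", "cola"]

n = len(purchases)

# Frequency distribution: number of data values in each class
frequency = Counter(purchases)

# Relative frequency: fraction of data values in each class
relative_frequency = {label: count / n for label, count in frequency.items()}

# Percent frequency: relative frequency expressed as a percentage
percent_frequency = {label: 100 * count / n for label, count in frequency.items()}

for label in frequency:
    print(label, frequency[label], relative_frequency[label], percent_frequency[label])
```

The relative frequencies sum to 1 and the percent frequencies sum to 100, which is a quick check that the classes do not overlap and cover every data value.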

bar chart

graphical device for depicting categorical (qualitative) data that have been summarized in a frequency, relative frequency, or percent frequency distribution

pie chart

graphical device for presenting data summaries based on subdivision of a circle into sectors that correspond to the relative frequency for each class

class midpoint

value halfway between the lower and upper class limits

dot plot

graphical device that summarizes data by the number of dots above each data value on the horizontal axis

histogram

graphical presentation of a frequency distribution, relative frequency distribution, or percent frequency distribution of quantitative data constructed by placing the class intervals on the horizontal axis and the frequencies, relative frequencies, or percent frequencies on the vertical axis

cumulative frequency distribution

tabular summary of quantitative data showing the number of data values that are less than or equal to the upper class limit of each class

cumulative relative frequency distribution

tabular summary of quantitative data showing the fraction or proportion of data values that are less than or equal to the upper class limit of each class

cumulative percent frequency distribution

tabular summary of quantitative data showing the percent of data values that are less than or equal to the upper class limit of each class
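The cumulative distributions above are running totals of the class frequencies. A short sketch, assuming hypothetical class limits and frequencies chosen only for illustration:

```python
from itertools import accumulate

# Hypothetical classes as (lower limit, upper limit) with their frequencies
classes = [(10, 14), (15, 19), (20, 24), (25, 29)]
frequencies = [4, 8, 5, 3]

n = sum(frequencies)

# Number of data values less than or equal to each upper class limit
cumulative = list(accumulate(frequencies))

# The same running totals as proportions and as percentages
cumulative_relative = [c / n for c in cumulative]
cumulative_percent = [100 * c / n for c in cumulative]

for (lower, upper), c, rel, pct in zip(classes, cumulative,
                                       cumulative_relative, cumulative_percent):
    print(f"<= {upper}: {c} values, {rel:.2f}, {pct:.0f}%")
```

The last cumulative frequency always equals the number of observations, so the last cumulative relative frequency is 1 and the last cumulative percent frequency is 100.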

ogive

graph of a cumulative distribution

exploratory data analysis

methods that use simple arithmetic and easy-to-draw graphs to summarize data quickly

stem-and-leaf display

exploratory data analysis technique that simultaneously rank orders quantitative data and provides insight about the shape of the distribution
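A stem-and-leaf display can be built with very little code. The sketch below assumes nonnegative whole-number data, with the tens digits as the stem and the ones digit as the leaf; the function name and the scores are hypothetical:

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return {stem: [leaves]} with leaves in ascending order.

    Assumes nonnegative integers: stem = value // 10, leaf = value % 10.
    """
    display = defaultdict(list)
    for value in sorted(values):      # sorting rank orders the data
        stem, leaf = divmod(value, 10)
        display[stem].append(leaf)
    return dict(display)

# Hypothetical test scores
scores = [68, 72, 75, 81, 69, 92, 85, 73, 76, 97]

for stem, leaves in stem_and_leaf(scores).items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```

Each printed row lists one stem followed by its leaves, so the length of a row shows how many values fall in that range, much like a histogram turned on its side.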

crosstabulation

tabular summary of data for two variables. The classes for one variable are represented by the rows; the classes for the other variable are represented by the columns.
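A crosstabulation counts how many observations fall into each pairing of a row class with a column class. A minimal sketch, assuming hypothetical observations of a quality rating and a price level:

```python
from collections import Counter

# Hypothetical observations: (quality rating, price level) for eight restaurants
observations = [
    ("good", "low"), ("very good", "medium"), ("good", "low"),
    ("excellent", "high"), ("very good", "medium"), ("good", "medium"),
    ("excellent", "medium"), ("very good", "low"),
]

counts = Counter(observations)
row_classes = ["good", "very good", "excellent"]  # classes of the row variable
col_classes = ["low", "medium", "high"]           # classes of the column variable

# crosstab[row][col] = number of observations with that pair of classes
crosstab = {r: {c: counts[(r, c)] for c in col_classes} for r in row_classes}

for r in row_classes:
    print(f"{r:<10}", [crosstab[r][c] for c in col_classes])
```

Summing across a row gives the frequency distribution of the row variable, and summing down a column gives the frequency distribution of the column variable.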

Simpson's paradox

conclusions drawn from two or more separate crosstabulations that can be reversed when the data are aggregated into a single crosstabulation
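The reversal is easiest to see with numbers. In the sketch below, all counts are made up for illustration: treatment A has the higher success rate in each separate group, yet treatment B has the higher rate once the groups are aggregated, because the group sizes are very uneven.

```python
# Hypothetical (successes, trials) per group for two treatments, A and B
groups = {
    "group 1": {"A": (9, 10), "B": (80, 100)},
    "group 2": {"A": (30, 100), "B": (2, 10)},
}

def rate(successes, trials):
    """Proportion of trials that were successes."""
    return successes / trials

# Separate crosstabulations: compare A and B within each group
for name, table in groups.items():
    print(name, {t: round(rate(*cell), 3) for t, cell in table.items()})

# Aggregate the groups into a single crosstabulation
aggregated = {}
for table in groups.values():
    for treatment, (successes, trials) in table.items():
        total_s, total_n = aggregated.get(treatment, (0, 0))
        aggregated[treatment] = (total_s + successes, total_n + trials)

print("aggregated", {t: round(rate(*cell), 3) for t, cell in aggregated.items()})
```

Here A wins 0.9 to 0.8 in group 1 and 0.3 to 0.2 in group 2, but the aggregated rates are 39/110 for A and 82/110 for B, so the conclusion flips.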

scatter diagram

graphical presentation of the relationship between two quantitative variables. One variable is shown on the horizontal axis and the other variable is shown on the vertical axis

trendline

line that provides an approximation of the relationship between two variables.
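The definition does not say how the line is chosen. One common choice, assumed here, is the least squares line; the function name and data points are hypothetical:

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least squares line through the points."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Hypothetical data for a scatter diagram
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

slope, intercept = least_squares_line(xs, ys)
print(f"y = {intercept:.2f} + {slope:.2f}x")
```

A positive slope indicates that y tends to increase with x; a slope near zero suggests no apparent linear relationship.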

sample statistic

numerical value used as a summary measure for a sample

population parameter

numerical value used as a summary measure for a population

point estimator

sample statistic, such as x̄, s², and s, when used to estimate the corresponding population parameter
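The three point estimators named above can be computed with Python's standard `statistics` module. The sample values are hypothetical; `variance` and `stdev` use the sample (n - 1) divisor:

```python
import statistics

# Hypothetical sample of five observations
sample = [46, 54, 42, 46, 32]

x_bar = statistics.mean(sample)          # sample mean, estimates the population mean
s_squared = statistics.variance(sample)  # sample variance, divides by n - 1
s = statistics.stdev(sample)             # sample standard deviation, sqrt of the variance

print(x_bar, s_squared, s)
```

For this sample the squared deviations from the mean of 44 sum to 256, so the sample variance is 256 / 4 = 64 and the sample standard deviation is 8.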

mean

measure of central location computed by summing the data values and dividing by the number of observations

median

measure of central location provided by the value in the middle when the data are arranged in ascending order (with an even number of observations, the average of the two middle values)

mode

measure of location, defined as the value that occurs with greatest frequency
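The mean, median, and mode can give different answers for the same data. A quick sketch using the standard `statistics` module on a hypothetical sample:

```python
import statistics

# Hypothetical sample of five observations
sample = [46, 54, 42, 46, 32]

sample_mean = statistics.mean(sample)      # sum of the values divided by n
sample_median = statistics.median(sample)  # middle value of the sorted data
sample_mode = statistics.mode(sample)      # value that occurs most often

print(sorted(sample))
print(sample_mean, sample_median, sample_mode)
```

Here the low value 32 pulls the mean below the median, which illustrates why the median is often preferred when the data contain extreme values.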

percentile

value such that at least p percent of the observations are less than or equal to this value and at least (100-p) percent of the observations are greater than or equal to this value.

quartiles

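the 25th, 50th, and 75th percentiles, referred to as the first quartile, the second quartile (median), and the third quartile, respectively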
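Several procedures satisfy the percentile definition, and software packages differ in which one they use. The sketch below assumes one common textbook procedure: sort the data, compute the index i = (p/100)n, round up if i is not an integer, and average the values in positions i and i + 1 if it is. The function name and data are hypothetical.

```python
def percentile(values, p):
    """Return the p-th percentile for an integer p between 1 and 99.

    Index rule (an assumption, one of several in use): i = (p / 100) * n.
    If i is not an integer, take the value in the next higher position.
    If i is an integer, average the values in positions i and i + 1.
    """
    data = sorted(values)
    n = len(data)
    if (p * n) % 100 == 0:
        i = p * n // 100                 # 1-based position, a whole number
        return (data[i - 1] + data[i]) / 2
    i = p * n // 100 + 1                 # round the index up
    return data[i - 1]

# Hypothetical sample of eight observations
sample = [27, 15, 30, 25, 20, 34, 25, 28]

q1 = percentile(sample, 25)   # first quartile
q2 = percentile(sample, 50)   # second quartile (median)
q3 = percentile(sample, 75)   # third quartile
print(q1, q2, q3, percentile(sample, 85))
```

Because the quartiles are just the 25th, 50th, and 75th percentiles, the same function produces all three.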

range

the difference between the largest and smallest data values