Statistics
art and science of collecting, analyzing, presenting, and interpreting data
data
facts and figures collected, analyzed, and summarized for presentation and interpretation
Data set
all the data collected in a particular study
elements
entities on which data are collected
variable
characteristics of interest for the elements
observation
set of measurements obtained for a particular element
nominal scale
scale of measurement for a variable when the data are labels or names used to identify an attribute of an element. Nominal data may be nonnumeric or numeric.
Ordinal scale
scale of measurement for a variable if the data exhibit the properties of nominal data and the order or rank of the data is meaningful. Ordinal data may be nonnumeric or numeric.
interval scale
scale of measurement for a variable if the data demonstrate the properties of ordinal data and the interval between values is expressed in terms of a fixed unit of measure. Interval data are always numeric.
ratio scale
scale of measurement for a variable if the data demonstrate all the properties of interval data and the ratio of two values is meaningful. Ratio data are always numeric.
Categorical data
labels or names used to identify an attribute of each element. Categorical data use either the nominal or ordinal scale of measurement and may be nonnumeric or numeric.
Quantitative data
numeric values that indicate how much or how many of something. Quantitative data are obtained using either the interval or ratio scale of measurement.
categorical variable
variable with categorical data
quantitative variable
variable with quantitative data
cross-sectional data
data collected at the same or approximately the same point in time
Time series data
data collected over several time periods
descriptive statistics
tabular, graphical, and numerical summaries of data
population
set of all elements of interest in a particular study
sample
subset of the population
census
survey to collect data on the entire population
sample survey
survey to collect data on a sample
statistical inference
process of using data obtained from a sample to make estimates or test hypotheses about the characteristics of a population
data mining
process of using procedures from statistics and computer science to extract useful information from extremely large databases
frequency distribution
tabular summary of data showing the number of data values in each of several nonoverlapping classes
relative frequency distribution
tabular summary of data showing the fraction or proportion of data values in each of several nonoverlapping classes
percent frequency distribution
tabular summary of data showing the percentage of data values in each of several nonoverlapping classes
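The three frequency distributions above differ only in how each class count is expressed. A minimal sketch, assuming the data arrive as a plain Python list of labels (the brand names are hypothetical); the helper name frequency_tables is illustrative only:

```python
from collections import Counter

def frequency_tables(data):
    """Return (class, frequency, relative frequency, percent frequency) rows."""
    n = len(data)
    counts = Counter(data)
    rows = []
    for cls, freq in counts.most_common():  # most frequent class first
        rel = freq / n                      # fraction of all data values
        rows.append((cls, freq, rel, 100 * rel))
    return rows

# Hypothetical categorical data (brand purchased), for illustration only.
purchases = ["Coke", "Pepsi", "Coke", "Sprite", "Coke", "Pepsi", "Sprite", "Coke"]
for cls, freq, rel, pct in frequency_tables(purchases):
    print(f"{cls:<8}{freq:>3}{rel:>7.3f}{pct:>7.1f}%")
```

The frequencies always sum to n, the relative frequencies to 1, and the percent frequencies to 100.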
bar chart
graphical device for depicting qualitative data that have been summarized in a frequency, relative frequency, or percent frequency distribution
pie chart
graphical device for presenting data summaries based on subdivision of a circle into sectors that correspond to the relative frequency for each class
class midpoint
value halfway between the lower and upper class limits
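For quantitative data, a frequency distribution groups values into classes, and each class midpoint is the average of its lower and upper limits. A sketch assuming integer data and equal-width classes written like 10-14, 15-19, and so on (so upper limit = lower limit + width - 1); the function name and the data are hypothetical:

```python
def class_table(data, lower, width, num_classes):
    """Frequency distribution with equal-width classes and their midpoints.

    Assumes integer data and class limits such as 10-14, 15-19, ...
    """
    rows = []
    for k in range(num_classes):
        lo = lower + k * width
        hi = lo + width - 1
        freq = sum(lo <= x <= hi for x in data)  # count values in [lo, hi]
        midpoint = (lo + hi) / 2                 # halfway between the class limits
        rows.append((lo, hi, midpoint, freq))
    return rows

# Hypothetical sample of completion times in days, for illustration only.
times = [12, 14, 19, 18, 15, 15, 18, 17, 20, 27, 22, 23, 22, 21, 33, 28, 14, 18, 16, 13]
for lo, hi, mid, freq in class_table(times, lower=10, width=5, num_classes=5):
    print(f"{lo}-{hi}  midpoint {mid:>4}  frequency {freq}")
```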
dot plot
graphical device that summarizes data by the number of dots above each data value on the horizontal axis
histogram
graphical presentation of a frequency distribution, relative frequency distribution, or percent frequency distribution of quantitative data constructed by placing the class intervals on the horizontal axis and the frequencies, relative frequencies, or percent frequencies on the vertical axis
cumulative frequency distribution
tabular summary of quantitative data showing the number of data values that are less than or equal to the upper class limit of each class
cumulative relative frequency distribution
tabular summary of quantitative data showing the fraction or proportion of data values that are less than or equal to the upper class limit of each class
cumulative percent frequency distribution
tabular summary of quantitative data showing the percent of data values that are less than or equal to the upper class limit of each class
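Each cumulative distribution is a running total of the class frequencies, read against the upper class limit. A sketch assuming the class frequencies are already known (the limits and counts below are hypothetical); the helper name is illustrative:

```python
from itertools import accumulate

def cumulative_tables(upper_limits, frequencies):
    """Cumulative frequency, relative frequency, and percent frequency per class.

    Each cumulative value counts the data values <= that class's upper limit.
    """
    n = sum(frequencies)
    rows = []
    for upper, cum in zip(upper_limits, accumulate(frequencies)):
        rows.append((upper, cum, cum / n, 100 * cum / n))
    return rows

# Hypothetical classes with upper limits 14, 19, 24, 29, 34.
rows = cumulative_tables([14, 19, 24, 29, 34], [4, 8, 5, 2, 1])
for upper, cum, rel, pct in rows:
    print(f"<= {upper}: {cum:>2}  {rel:.2f}  {pct:.0f}%")
```

The last class always reaches n, 1, and 100%; plotting the (upper limit, cumulative value) pairs gives the ogive.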
ogive
graph of a cumulative distribution
exploratory data analysis
methods that use simple arithmetic and easy-to-draw graphs to summarize data quickly
stem-and-leaf display
exploratory data analysis technique that simultaneously rank orders quantitative data and provides insight about the shape of the distribution
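A text stem-and-leaf display can be built by sorting the data and splitting each value into a stem and a leaf. This sketch assumes non-negative integers, with the units digit as the leaf and the remaining digits as the stem; the scores are hypothetical:

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Return display lines; stem = value // 10, leaf = units digit.

    Assumes non-negative integer data.
    """
    groups = defaultdict(list)
    for x in sorted(data):                 # sorting rank orders the leaves
        groups[x // 10].append(x % 10)
    lines = []
    for stem in range(min(groups), max(groups) + 1):  # keep empty stems visible
        leaves = "".join(str(d) for d in groups.get(stem, []))
        lines.append(f"{stem:>3} | {leaves}")
    return lines

# Hypothetical test scores, for illustration only.
scores = [72, 69, 97, 107, 73, 92, 76, 86, 73, 68, 87, 81, 112, 85]
print("\n".join(stem_and_leaf(scores)))
```

Because the leaves are sorted, the display rank orders the data, and the length of each row shows the shape of the distribution.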
crosstabulation
tabular summary of data for two variables. The classes for one variable are represented by the rows; the classes for the other variable are represented by the columns.
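A crosstabulation counts each (row class, column class) pair. A sketch with hypothetical restaurant data, assuming each element's two observations are stored at the same index of two parallel lists; the function name is illustrative:

```python
from collections import Counter

def crosstab(rows_var, cols_var):
    """Count pairs: classes of rows_var become rows, classes of cols_var become columns."""
    counts = Counter(zip(rows_var, cols_var))
    row_classes = sorted(set(rows_var))
    col_classes = sorted(set(cols_var))
    table = {r: {c: counts[(r, c)] for c in col_classes} for r in row_classes}
    return table, row_classes, col_classes

# Hypothetical data: quality rating and meal price for six restaurants.
quality = ["Good", "Very Good", "Good", "Excellent", "Very Good", "Good"]
price = ["$10-19", "$20-29", "$10-19", "$30-39", "$10-19", "$20-29"]
table, row_classes, col_classes = crosstab(quality, price)

print("".ljust(12) + "".join(c.rjust(8) for c in col_classes) + "   Total")
for r in row_classes:
    cells = [table[r][c] for c in col_classes]
    print(r.ljust(12) + "".join(str(v).rjust(8) for v in cells) + str(sum(cells)).rjust(8))
```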
Simpson's paradox
conclusions drawn from two or more separate crosstabulations that can be reversed when the data are aggregated into a single crosstabulation
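The reversal can be checked numerically. The counts below are hypothetical and constructed for illustration: option X has the higher success rate in each subgroup, yet option Y has the higher rate once the subgroups are aggregated, because most of X's trials fall in the harder subgroup.

```python
from fractions import Fraction

# Hypothetical (successes, trials) for two options in two subgroups.
data = {
    "X": {"group 1": (9, 10), "group 2": (30, 100)},
    "Y": {"group 1": (80, 100), "group 2": (2, 10)},
}

def rate(successes, trials):
    """Exact success rate as a fraction."""
    return Fraction(successes, trials)

def aggregate(groups):
    """Combine subgroup counts into a single (successes, trials) total."""
    return (sum(s for s, _ in groups.values()), sum(t for _, t in groups.values()))

for group in ("group 1", "group 2"):
    x, y = rate(*data["X"][group]), rate(*data["Y"][group])
    print(f"{group}: X {float(x):.1%}  Y {float(y):.1%}")

x_all, y_all = rate(*aggregate(data["X"])), rate(*aggregate(data["Y"]))
print(f"aggregated: X {float(x_all):.1%}  Y {float(y_all):.1%}")
```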
scatter diagram
graphical presentation of the relationship between two quantitative variables. One variable is shown on the horizontal axis and the other variable is shown on the vertical axis
trendline
line that provides an approximation of the relationship between two variables.
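One common way to fit a trendline is the least squares method; assuming that choice, the slope is b1 = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and the intercept is b0 = ȳ - b1·x̄. A minimal sketch with hypothetical data (the function name is illustrative):

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least squares line y = b0 + b1 * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    b1 = sxy / sxx           # slope
    b0 = y_bar - b1 * x_bar  # the line passes through (x_bar, y_bar)
    return b1, b0

# Hypothetical data: number of ads (x) and sales (y) for ten weeks.
ads = [2, 5, 1, 3, 4, 1, 5, 3, 4, 2]
sales = [50, 57, 41, 54, 54, 38, 63, 48, 59, 46]
slope, intercept = least_squares_line(ads, sales)
print(f"trendline: y = {intercept:.2f} + {slope:.2f} x")
```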
sample statistic
numerical value used as a summary measure for a sample
population parameter
numerical value used as a summary measure for a population
point estimator
sample statistic, such as x̄, s², and s, when used to estimate the corresponding population parameter
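Python's standard statistics module computes the usual point estimators directly. In this sketch the sample values are hypothetical; note that statistics.variance and statistics.stdev use the n - 1 divisor of the sample formulas, while statistics.pvariance and statistics.pstdev divide by n.

```python
import statistics

# Hypothetical sample of monthly starting salaries (dollars).
sample = [5850, 5950, 6050, 5880, 5755, 5710, 5890, 6130, 5940, 6325]

x_bar = statistics.mean(sample)   # point estimator of the population mean
s2 = statistics.variance(sample)  # sample variance (divisor n - 1), estimates sigma^2
s = statistics.stdev(sample)      # sample standard deviation, estimates sigma

print(f"x_bar = {x_bar}, s^2 = {s2:.1f}, s = {s:.1f}")
```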
mean
measure of central location computed by summing the data values and dividing by the number of observations
median
measure of central location provided by the value in the middle when the data are arranged in ascending order
mode
measure of location, defined as the value that occurs with greatest frequency
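The three measures of location above can be written out directly. This sketch follows the definitions, adding the standard rule that the median of an even number of values is the average of the two middle values, and returning every mode because data can have more than one; the data are hypothetical:

```python
from collections import Counter

def mean(data):
    """Sum the data values and divide by the number of observations."""
    return sum(data) / len(data)

def median(data):
    """Middle value of the sorted data; average of the two middle values when n is even."""
    ordered = sorted(data)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def modes(data):
    """All values that occur with the greatest frequency."""
    counts = Counter(data)
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]

# Hypothetical class sizes, for illustration only.
sizes = [46, 54, 42, 46, 32]
print(mean(sizes), median(sizes), modes(sizes))
```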
percentile
value such that at least p percent of the observations are less than or equal to this value and at least (100-p) percent of the observations are greater than or equal to this value.
quartiles
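specific percentiles that divide the data into four parts: the first quartile Q1 (25th percentile), the second quartile Q2 (50th percentile, the median), and the third quartile Q3 (75th percentile)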
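Several conventions exist for computing percentiles. This sketch assumes one textbook convention consistent with the definition above: compute i = (p/100)·n on the sorted data; if i is not an integer, round up and take that position; if it is, average positions i and i + 1. Other conventions, such as the linear interpolation NumPy's percentile uses by default, can give slightly different values. The salaries are hypothetical.

```python
import math
from fractions import Fraction

def percentile(data, p):
    """p-th percentile (0 < p < 100) under the assumed textbook convention."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(data)
    if not ordered:
        raise ValueError("percentile is undefined for empty data")
    n = len(ordered)
    i = Fraction(p) * n / 100  # exact arithmetic avoids floating-point surprises
    if i.denominator != 1:
        return ordered[math.ceil(i) - 1]  # round up; positions are 1-based
    i = int(i)
    return (ordered[i - 1] + ordered[i]) / 2  # average positions i and i + 1

def quartiles(data):
    """Q1, Q2 (median), Q3: the 25th, 50th, and 75th percentiles."""
    return tuple(percentile(data, p) for p in (25, 50, 75))

# Hypothetical monthly starting salaries (n = 12), for illustration only.
salaries = [5850, 5950, 6050, 5880, 5755, 5710, 5890, 6130, 5940, 6325, 5920, 5880]
print(percentile(salaries, 85))  # i = 10.2, so position 11 of the sorted data
print(quartiles(salaries))
```

Under this convention the 50th percentile always equals the median.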
range
the difference between the largest and smallest data values
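The range needs only the two extreme values, which is also why a single unusually large or small value can dominate it. A short sketch with hypothetical data; the function is named data_range to avoid shadowing Python's built-in range:

```python
def data_range(data):
    """Largest data value minus smallest data value."""
    if not data:
        raise ValueError("range is undefined for empty data")
    return max(data) - min(data)

# Hypothetical monthly starting salaries, for illustration only.
salaries = [5850, 5950, 6050, 5880, 5755, 5710, 5890, 6130, 5940, 6325, 5920, 5880]
print(data_range(salaries))  # 6325 - 5710
```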