how to calculate the group width of a histogram
(upper limit of last group - lower limit of first group) / number of groups
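A minimal Python sketch of the group-width formula above. The limits (0 and 100), the five groups, the sample values, and the helper names `class_width` and `group_counts` are hypothetical, chosen only for illustration. It assumes equal-width groups that include their lower limit and exclude their upper limit, except that the last group also includes its upper limit.

```python
def class_width(lower_first: float, upper_last: float, num_groups: int) -> float:
    """(upper limit of last group - lower limit of first group) / number of groups."""
    if num_groups <= 0:
        raise ValueError("num_groups must be positive")
    return (upper_last - lower_first) / num_groups


def group_counts(data, lower_first, width, num_groups):
    """Count the values that fall in each group [lower, lower + width)."""
    counts = [0] * num_groups
    for x in data:
        idx = int((x - lower_first) // width)
        if idx == num_groups:  # a value equal to the last upper limit goes in the last group
            idx -= 1
        if 0 <= idx < num_groups:  # values outside the limits are ignored
            counts[idx] += 1
    return counts


width = class_width(0, 100, 5)
print(width)                                                 # 20.0
print(group_counts([5, 15, 25, 45, 99, 100], 0, width, 5))   # [2, 1, 1, 0, 2]
```

The counts are the column heights a histogram would draw for these groups.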
categorical variables involve
counting
a graphical depiction of a frequency distribution for numerical data in the form of a column chart is called a
histogram
the _______________ is the observation that occurs most frequently
mode
which is not a scatter plot characteristic?
-form
-direction
-strength
-angle
angle
which is not a difference between a histogram and a bar graph?
a) the frequencies on a bar graph represent the counts from categories while the frequencies on a histogram represent the counts of quantitative data values grouped in the classes
b) one axis
d
which is an example of a measure of dispersion?
a) median
b) mode
c) midrange
d) variance
d) variance
the difference between the first and third quartiles is referred to as the
interquartile range
________________ is an ordered sequence of values of a single quantitative variable measured at regular intervals over time
time series
which of the following is not a data ethics principle?
a) ownership
b) privacy
c) responsibility
d) consent
c) responsibility
which of the following describes standard deviation?
a) the average of the greatest and least values in the data set
b) the square root of the variance
c) the difference between the first and third quartiles of a data set
d) the average of the squared dev
b) the square root of the variance
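Variance, standard deviation, and range can be checked with Python's standard `statistics` module. The data values below are hypothetical. `pvariance`/`pstdev` treat the data as a whole population (dividing by n), while `variance`/`stdev` treat it as a sample (dividing by n - 1); in both cases the standard deviation is the square root of the variance.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values

pop_var = statistics.pvariance(data)    # population variance (divide by n)
pop_sd = statistics.pstdev(data)        # population standard deviation
sample_var = statistics.variance(data)  # sample variance (divide by n - 1)
sample_sd = statistics.stdev(data)      # sample standard deviation
data_range = max(data) - min(data)      # largest value - smallest value

# The standard deviation is the square root of the variance.
assert math.isclose(pop_sd, math.sqrt(pop_var))
assert math.isclose(sample_sd, math.sqrt(sample_var))

print(pop_var, pop_sd, sample_var, sample_sd, data_range)
```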
a z-score of 1 means that
the observation is 1.0 standard deviation to the right of (above) the mean
when the mean and median are equal or approximately equal, the distribution is described as
symmetrical
which is not true for a median?
a) when you add all the numbers it should equal the median
b) for an even number of observations, the median is the mean of the 2 middle numbers
c) for an odd number of observations, the median is the middle of the sorted n
a
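The median rules in options b and c can be sketched in a few lines of Python; the helper name `median` and the sample lists are illustrative only. The last line shows why option a is false: the sum of the values is generally not the median.

```python
def median(values):
    """Median of a non-empty list of numbers."""
    s = sorted(values)
    n = len(s)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        return s[mid]                 # odd count: the middle value of the sorted data
    return (s[mid - 1] + s[mid]) / 2  # even count: mean of the two middle values


print(median([3, 1, 2]))     # 2
print(median([4, 1, 3, 2]))  # 2.5
print(sum([3, 1, 2]))        # 6, which is not the median
```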
mode describes a dataset by its number of modes (which of the following is not one of these descriptions?)
a) bi-modal
b) multi-modal
c) unimodal
d) three modal
d) three modal
the standard deviation, variance, and range belong to
measures of variation
which of the following is not true for a normal distribution?
a) approximately 88% will lie within some standard deviations from the mean
b) approximately 98% will lie within some standard deviations from the mean
c) approximately 95% will lie within
a
________________ is the difference between the largest value and the smallest value
range
Q1=25, Q2=50, Q3=75, Q4=100
What is the value of the IQR?
Q3 - Q1
75-25 = 50
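A small Python sketch of the IQR: first from the given quartiles, then from raw data with `statistics.quantiles`. The data list is hypothetical. Note that `statistics.quantiles` defaults to its "exclusive" method, which is only one of several quartile conventions; Excel or a textbook may report slightly different quartiles for the same data.

```python
import statistics


def iqr_from_quartiles(q1, q3):
    """Interquartile range: Q3 - Q1."""
    return q3 - q1


print(iqr_from_quartiles(25, 75))  # 50

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]           # hypothetical values
q1, q2, q3 = statistics.quantiles(data, n=4)  # cut points that split the data into 4 groups
print(q1, q2, q3, iqr_from_quartiles(q1, q3))
```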
a quartile is a measure of relative standing in which a population is divided into ___________ equal groups of data values
4
___________________ are extremely high or low observations
outliers
which of the following is not a descriptive output?
a) range
b) analytics
c) standard error
d) mean
b) analytics
z-score =
(data value - mean) / standard deviation
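The z-score formula translates directly into Python. The helper name `z_score` is illustrative; the mean of 18.0 and standard deviation of 5.0 are reused from the P(x<18.6) card. A z-score of 1 means the value sits one standard deviation above the mean, and a negative z-score means it sits below the mean.

```python
def z_score(x, mean, sd):
    """(data value - mean) / standard deviation."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd


print(z_score(23.0, 18.0, 5.0))  # 1.0  (one SD above the mean)
print(z_score(13.0, 18.0, 5.0))  # -1.0 (one SD below the mean)
```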
a student took the exam and received a score in the 80th percentile. This percentile means that 80 percent
this percentile means that 80% of all students who took the test with that student scored lower than he or she did
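A sketch of a percentile rank in Python, assuming one common definition (the percent of scores strictly below the given score); other textbooks count ties differently. The helper name and the ten scores are hypothetical.

```python
def percentile_rank(scores, score):
    """Percent of scores strictly below `score`."""
    if not scores:
        raise ValueError("scores must not be empty")
    below = sum(1 for s in scores if s < score)
    return 100 * below / len(scores)


scores = [55, 60, 62, 70, 71, 75, 80, 84, 90, 95]  # hypothetical exam scores
print(percentile_rank(scores, 90))  # 80.0: 8 of the 10 students scored lower
```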
_______________ is the interpretation of historical data to better understand changes that have occurred in a business
descriptive analytics
____________________ displays the distribution of one categorical variable in rows and another categorical variable in columns and shows the relationship between 2 variables using frequencies or probabilities
contingency table
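A minimal Python sketch of building a contingency table with `collections.Counter`. The records (region by payment method) and the helper name `contingency_table` are hypothetical. Each cell shows a count and, in parentheses, the joint relative frequency (count / total), which is the "probabilities" form of the table.

```python
from collections import Counter

# Hypothetical records: (region, payment method) for each customer
records = [
    ("East", "Cash"), ("East", "Card"), ("East", "Cash"),
    ("West", "Card"), ("West", "Card"), ("West", "Card"), ("West", "Cash"),
]


def contingency_table(pairs):
    """Cross-tabulate (row category, column category) pairs into counts."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    table = {r: {c: counts[(r, c)] for c in cols} for r in rows}
    return rows, cols, table


rows, cols, table = contingency_table(records)
total = len(records)

print("\t" + "\t".join(cols))
for r in rows:
    cells = [f"{table[r][c]} ({table[r][c] / total:.2f})" for c in cols]
    print(r + "\t" + "\t".join(cells))
```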
a ____________________ is a 2-dimensional graph of 2 quantitative variables where one variable, usually the independent variable, is plotted on the horizontal axis and the dependent variable is on the vertical axis
scatter plot
A _________________ displays the distribution of a categorical variable, showing the counts for each category next to each other for comparison purposes
bar chart
A ________________ illustrates all the categories of a variable in a circle, with each category shown as a proportional sector of the circle
pie chart
measures to ensure a person's data is being protected and kept private
privacy
individuals providing data give informed and explicitly expressed consent to what personal data they provide and how the data will be collected, used, and reported
consent
when a dataset has extreme values, known as outliers, on one side of the distribution, the dataset is
skewed
what is the business analytics process
1. define the problem
2. manage the data
3. perform descriptive analytics
4. perform predictive analytics
5. perform prescriptive analytics
6. make recommendations
7. ask next question
has numerical value
has units
can be mathematically analyzed
quantitative data
quantitative data can be either
discrete or continuous
qualitative data can be either
structured or unstructured
can be categorized or counted
qualitative data
examples of structured data
phone number
zip code
address
date
examples of unstructured data
photos
videos
blogs
sensor data
using data to explain what has happened up to the present time
descriptive analytics
data accuracy, data integrity, and data completeness are examples of
data quality
data collected from a Fitbit is
sensor data
__________________ data is data collected from individuals' activity on Facebook, Twitter, and LinkedIn
social media
is the likelihood that an outcome occurs
probability
is the process that results in an outcome
experiment
which of the following characterizes a random variable having 2 possible outcomes, each with a constant probability of occurrence?
Bernoulli distribution
an experiment is a process that
results in an outcome
a trial is a collection of
independent experiments
the probability of an event is always between
0 and 1
probability is a number that measures
the likelihood that an event will occur
which of the following is not a property of a normal distribution?
a) symmetric
b) range of x is unbounded
c) mean = median = mode
d) range of y is unbounded
d
what are the 2 parameters of normal distribution?
mean and standard deviation
normal distribution is a _____________________ curve
bell-shaped
suppose x is normal with mean of 18.0 seconds and SD of 5.0 seconds. Find P(x<18.6)
0.5478
what is the probability that a standard normal value (z) lies between 0.84 and 1.95?
.1749
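Both normal-distribution answers above can be reproduced with Python's `statistics.NormalDist` (Python 3.8+). The second calculation assumes, as the answer implies, that 0.84 and 1.95 are z-values on the standard normal curve (mean 0, SD 1).

```python
from statistics import NormalDist

# x is normal with mean 18.0 seconds and SD 5.0 seconds
x = NormalDist(mu=18.0, sigma=5.0)
p_less = x.cdf(18.6)  # P(x < 18.6)
print(round(p_less, 4))  # 0.5478

# Standard normal: P(0.84 < z < 1.95) = P(z < 1.95) - P(z < 0.84)
z = NormalDist()
p_between = z.cdf(1.95) - z.cdf(0.84)
print(round(p_between, 4))  # 0.1749
```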
what is true of normal distributions
mean = median = mode
consider an event X comprised of one outcome whose probability is 35%. compute the probability of the complement of the event
.65
mode < median < mean
right skewed
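A rough Python check of skew direction by comparing the mean and median, as in the mode < median < mean card. This is a rule of thumb rather than a formal skewness measure; the helper name, tolerance, and sample lists are hypothetical.

```python
import statistics


def skew_direction(data, tol=1e-9):
    """Rule of thumb: mean > median suggests right skew, mean < median left skew."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    if abs(mean - median) <= tol:
        return "approximately symmetric"
    return "right skewed" if mean > median else "left skewed"


print(skew_direction([1, 2, 2, 3, 4, 5, 20]))    # mode 2 < median 3 < mean: right skewed
print(skew_direction([-20, 1, 2, 3, 4, 4, 5]))   # mean < median 3 < mode 4: left skewed
print(skew_direction([1, 2, 3, 4, 5]))           # mean = median: approximately symmetric
```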
how to find conditional probabilities
P(A and B) = P(A) × P(B|A), so P(B|A) = P(A and B) / P(A)
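A minimal Python sketch of the conditional-probability rule. The probabilities (40% of customers buy A, 10% buy both A and B) and the helper name `conditional` are hypothetical. The final check confirms the multiplication rule P(A and B) = P(A) × P(B|A).

```python
import math


def conditional(p_a_and_b, p_a):
    """P(B|A) = P(A and B) / P(A)."""
    if p_a <= 0:
        raise ValueError("P(A) must be positive")
    return p_a_and_b / p_a


p_a = 0.40        # hypothetical P(A)
p_a_and_b = 0.10  # hypothetical P(A and B)
p_b_given_a = conditional(p_a_and_b, p_a)
print(p_b_given_a)  # 0.25

# Multiplication rule check
assert math.isclose(p_a * p_b_given_a, p_a_and_b)
```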
when to use the poisson distribution?
to determine the probability that a specific number of events will occur over a specified interval of time
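A Python sketch of the Poisson probability P(X = k) = e^(-λ) λ^k / k!, where λ is the average number of events per interval. The rate of 3 arrivals per hour and the helper name `poisson_pmf` are hypothetical.

```python
import math


def poisson_pmf(k, lam):
    """P(X = k) for a Poisson variable with mean `lam` events per interval."""
    if k < 0 or lam <= 0:
        raise ValueError("k must be >= 0 and lam must be > 0")
    return math.exp(-lam) * lam**k / math.factorial(k)


lam = 3.0  # hypothetical: on average 3 customer arrivals per hour
for k in range(6):
    print(k, round(poisson_pmf(k, lam), 4))  # P(exactly k arrivals in one hour)
```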
R squared is called
coefficient of determination
the value of R squared is between
0 and 1
when the value of R squared is equal to 0, what kind of relationship is this?
no linear relationship
simple linear regression has how many independent variables?
1
the correlation coefficient is scaled between
-1 and 1
a correlation of 0 indicates that
the 2 variables have no linear relationship to each other
what is a positive linear relationship
as x increases, y increases
____________________ is a measure of the linear relationship between 2 variables, X and Y, which does not depend on units of measurement
correlation coefficient
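A Python sketch of the correlation coefficient and R squared for simple linear regression (one independent variable). The x/y data and helper names are hypothetical. It shows that r stays between -1 and 1, is unchanged when x is rescaled (unit-free), and that for simple linear regression R squared equals r squared.

```python
import math
from statistics import mean


def pearson_r(x, y):
    """Correlation coefficient between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simple_regression_r2(x, y):
    """R squared (coefficient of determination) of the least-squares line y = b0 + b1*x."""
    mx, my = mean(x), mean(y)
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    b0 = my - b1 * mx
    sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))  # unexplained variation
    sst = sum((b - my) ** 2 for b in y)                        # total variation
    return 1 - sse / sst


x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]  # hypothetical: y rises as x rises (positive relationship)
r = pearson_r(x, y)
r2 = simple_regression_r2(x, y)
print(round(r, 4), round(r2, 4))

# Changing the units of x (e.g. meters to centimeters) does not change r.
print(round(pearson_r([a * 100 for a in x], y), 4))
```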
_______________________ is a technique that identifies the strength of association between pairs of products purchased together and identifies patterns of co-occurrence
market basket analysis
which of the following is not a key metric for market basket analysis?
a) support
b) lift
c) shopping
d) confidence
c) shopping
market basket analysis creates
if-then rules
_________________ is a technique that identifies co-occurrence relationships among activities performed by specific individuals or groups
affinity analysis
a lift greater than 1
suggests that the purchase of product A has increased the probability that the purchase of product B will occur on this transaction
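A Python sketch of the three market basket metrics for an if-then rule "if A then B", using the usual definitions: support = P(A and B), confidence = P(B|A), lift = confidence / P(B). The five transactions and the helper name `basket_metrics` are hypothetical.

```python
def basket_metrics(transactions, a, b):
    """Support, confidence, and lift for the rule 'if a then b'."""
    n = len(transactions)
    n_a = sum(1 for t in transactions if a in t)
    n_b = sum(1 for t in transactions if b in t)
    n_ab = sum(1 for t in transactions if a in t and b in t)
    if n_a == 0 or n_b == 0:
        raise ValueError("both items must appear in at least one transaction")
    support = n_ab / n          # share of all transactions containing both items
    confidence = n_ab / n_a     # P(b | a)
    lift = confidence / (n_b / n)  # > 1: buying a raises the chance of buying b
    return support, confidence, lift


transactions = [  # hypothetical shopping baskets
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread"},
    {"milk"},
    {"butter", "milk"},
]
support, confidence, lift = basket_metrics(transactions, "bread", "butter")
print(round(support, 3), round(confidence, 3), round(lift, 3))  # 0.4 0.667 1.111
```

Here the lift is above 1, so in this toy data buying bread is associated with a higher chance of also buying butter.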
the binomial distribution
models n independent replications of a Bernoulli experiment, each with a probability p of success
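A Python sketch of the Bernoulli and binomial probabilities (Python 3.8+ for `math.comb`). The helper names and the example n and p are illustrative. A binomial variable counts successes in n independent Bernoulli trials, so with n = 1 it reduces to the Bernoulli distribution.

```python
import math


def bernoulli_pmf(x, p):
    """P(X = x) for a single trial with success probability p (x is 0 or 1)."""
    if x not in (0, 1):
        raise ValueError("x must be 0 or 1")
    return p if x == 1 else 1 - p


def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent Bernoulli(p) trials)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


print(binomial_pmf(5, 10, 0.5))  # 0.24609375, i.e. 252 / 1024
print(binomial_pmf(1, 1, 0.3), bernoulli_pmf(1, 0.3))  # n = 1 matches Bernoulli
```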