Business Analytics Final

how to calculate the group width of a histogram

(upper limit of last group - lower limit of first group) / number of groups
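
A minimal sketch of the width formula above. The class limits and group count are hypothetical values chosen only for illustration.

```python
# Hypothetical limits and group count, not taken from the course material.
lower_limit_first_group = 0
upper_limit_last_group = 100
number_of_groups = 5

# Width = (upper limit of last group - lower limit of first group) / number of groups
group_width = (upper_limit_last_group - lower_limit_first_group) / number_of_groups
print(group_width)  # 20.0
```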

categorical variables involve

counting

a graphical depiction of a frequency distribution for numerical data in the form of a column chart is called a

histogram

the _______________ is the observation that occurs most frequently

mode

which is not a scatter plot characteristic:
-form
-direction
-strength
-angle

angle

which is not a difference between a histogram and a bar graph?
a) the frequencies on a bar graph represent the counts from categories while the frequencies on a histogram represent the counts of quantitative data values grouped in the classes
b) one axis

d

which is an example of a measure of dispersion?
a) median
b) mode
c) midrange
d) variance

d. variance

the difference between the first and third quartiles is referred to as the

interquartile range

________________ is an ordered sequence of values of a single quantitative variable measured at regular intervals over time

time series

which of the following is not a data ethics principle?
a) ownership
b) privacy
c) responsibility
d) consent

c) responsibility

which of the following describes standard deviation?
a) the average of the greatest and least values in the data set
b) the square root of the variance
c) the difference between the first and third quartiles of a data set
d) the average of the squared dev

b) the square root of the variance
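
A short check that the standard deviation is the square root of the variance. The data values are hypothetical, and the population versions (pvariance, pstdev) are used here as an assumption; sample versions divide by n - 1 instead.

```python
import math
from statistics import pstdev, pvariance

# Hypothetical data set with mean 5.
values = [2, 4, 4, 4, 5, 5, 7, 9]

variance = pvariance(values)    # average of the squared deviations from the mean
std_dev = math.sqrt(variance)   # standard deviation = square root of the variance

print(variance, std_dev)
print(math.isclose(std_dev, pstdev(values)))  # True
```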

a z-score of 1 means that

the observation is 1.0 standard deviation to the right of the mean

when the mean and median are equal or approximately equal, the distribution is described as

symmetrical

which is not true for a median?
a) when you add all the numbers it should equal the median
b) for an even number of observations, the median is the mean of the 2 middle numbers
c) for an odd number of observations, the median is the middle of the sorted n

a
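
A small illustration of how the median behaves for odd and even counts, using hypothetical numbers.

```python
from statistics import median

# Hypothetical observations.
odd_count = [7, 1, 5]        # sorted: 1, 5, 7 -> the middle value
even_count = [7, 1, 5, 3]    # sorted: 1, 3, 5, 7 -> mean of the two middle values

median_odd = median(odd_count)
median_even = median(even_count)
print(median_odd, median_even)  # 5 4.0
```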

a dataset can be described by its mode as (which of the following does not belong?)
a) bi-modal
b) multi-modal
c) unimodal
d) three modal

d) three modal
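
A quick sketch of telling unimodal from bi-modal data with the standard library. The data sets are hypothetical.

```python
from statistics import multimode

# Hypothetical data sets.
unimodal = [1, 2, 2, 3]
bimodal = [1, 1, 2, 3, 3]

# multimode() returns every value tied for the highest frequency.
unimodal_modes = multimode(unimodal)
bimodal_modes = multimode(bimodal)
print(unimodal_modes)  # [2]
print(bimodal_modes)   # [1, 3]
```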

the standard deviation, variance, and range belong to

measures of variation

which of the following is not true for normal curve distribution?
a) approximately 88% will lie within some standard deviations from the mean
b) approximately 98% will lie within some standard deviations from the mean
c) approximately 95% will lie within

a

________________ is the difference between the largest value and the smallest value

range

Q1=25, Q2=50, Q3=75, Q4=100
What is the value of the IQR?

Q3 - Q1
75-25 = 50
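
The same subtraction in code, followed by a sketch that first estimates the quartiles from raw data. The raw data set is hypothetical, and statistics.quantiles uses its default "exclusive" method here; other tools may compute quartiles slightly differently.

```python
from statistics import quantiles

# Quartile values taken from the card above.
q1, q3 = 25, 75
iqr = q3 - q1
print(iqr)  # 50

# Hypothetical raw data; n=4 asks for the three quartile cut points.
data = [1, 2, 3, 4, 5, 6, 7]
data_q1, data_q2, data_q3 = quantiles(data, n=4)
data_iqr = data_q3 - data_q1
print(data_iqr)
```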

a quartile is a measure of relative standing in which a population is divided into ___________ equal groups of data values

4

___________________ are extremely high or low observations

outliers

which of the following is not a descriptive output?
a) range
b) analytics
c) standard error
d) mean

b) analytics

z-score =

(data value - mean) / standard deviation
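
A minimal sketch of the z-score formula. The function name z_score and the example numbers are assumptions made for illustration.

```python
def z_score(data_value, mean, standard_deviation):
    """Number of standard deviations the value lies from the mean."""
    return (data_value - mean) / standard_deviation

# Hypothetical exam score of 85 with mean 70 and standard deviation 10.
z = z_score(85, 70, 10)
print(z)  # 1.5
```

A positive result means the value is above the mean, a negative one means it is below.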

a student took the exam and received a score in the 80th percentile. the percentile means that 80 percent

this percentile means that 80% of all students who took the test scored lower than that student

_______________ is the interpretation of historical data to better understand changes that have occurred in a business

descriptive analytics

____________________ displays the distribution of one categorical variable in rows and another categorical variable in columns and shows the relationship between 2 variables using frequencies or probabilities

contingency table
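
One possible way to build a small contingency table of frequencies with the standard library. The records and category names are hypothetical.

```python
from collections import Counter

# Hypothetical (region, purchased) records: two categorical variables.
records = [
    ("north", "yes"),
    ("north", "no"),
    ("south", "yes"),
    ("north", "yes"),
]

# Count how often each (row category, column category) pair occurs.
table = Counter(records)
rows = sorted({region for region, _ in records})
cols = sorted({answer for _, answer in records})

print("region", cols)
for row in rows:
    print(row, [table[(row, col)] for col in cols])
```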

a ____________________ is a 2-dimensional graph of 2 quantitative variables where one variable, usually the independent variable, is plotted on the horizontal axis and the dependent variable is on the vertical axis

scatter plot

A _________________ displays the distribution of a categorical variable, showing the counts for each category next to each other for comparison purposes

bar chart

A ________________ illustrates all the categories of a variable in a circle, with each category shown as a proportional sector of the circle

pie chart

measures to ensure a person's data is being protected and kept private

privacy

individuals providing data provide informed and explicitly expressed consent of what personal data they provide and how the data will be collected, used, and reported

consent

when a dataset has extreme values, known as outliers, on one side of the distribution, the dataset is

skewed

what is the business analytics process

1. define the problem
2. manage the data
3. give descriptive analytics
4. give predictive analytics
5. give prescriptive analytics
6. make recommendations
7. ask next question

has numerical value
has units
can be mathematically analyzed

quantitative data

quantitative data can be either

discrete or continuous

qualitative data can be either

structured or unstructured

can be categorized or counted

qualitative data

examples of structured data

phone number
zip code
address
date

examples of unstructured data

photos
videos
blogs
sensor data

using data to explain what has happened up to the present time

descriptive analytics

data accuracy, data integrity, and data completeness are examples of

data quality

data collected from a fitbit is

sensor data

__________________ data is data collected from individual's activity on facebook, twitter, and linkedin

social media

is the likelihood that an outcome occurs

probability

is the process that results in an outcome

experiment

which of the following characterizes a random variable having 2 possible outcomes, each with a constant probability of occurrence

Bernoulli distribution

an experiment is a process that

results in an outcome

a trial is a collection of

independent experiments

the probability of an event is always between

0 and 1

probability is a number that measures

the likelihood that an event will occur

which of the following is not a property of a normal distribution?
a) symmetric
b) range of x is unbounded
c) mean = median = mode
d) range of y is unbounded

d

what are the 2 parameters of normal distribution?

mean and standard deviation

normal distribution is a _____________________ curve

bell-shaped

suppose x is normal with mean of 18.0 seconds and SD of 5.0 seconds. Find P(x<18.6)

0.5478
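
A sketch that reproduces this answer with the standard library instead of a z-table. The variable name x_dist is an assumption.

```python
from statistics import NormalDist

# Normal distribution with mean 18.0 seconds and SD 5.0 seconds, as on the card.
x_dist = NormalDist(mu=18.0, sigma=5.0)

# P(x < 18.6) is the cumulative probability up to 18.6 (z = 0.12).
p_less = x_dist.cdf(18.6)
print(round(p_less, 4))  # 0.5478
```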

what is the probability that z is between 0.84 and 1.95

.1749
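
A sketch of this lookup, assuming the two values are z-scores on the standard normal distribution (the card does not say so, but the answer matches that reading).

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist()

# Area between the two z-scores = cdf(upper) - cdf(lower).
p_between = standard_normal.cdf(1.95) - standard_normal.cdf(0.84)
print(round(p_between, 4))  # 0.1749
```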

what is true of normal distributions

mean = median = mode

consider an event X comprised of one outcome whose probability is 35%. compute the probability of the complement of the event

.65
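
The complement rule as a one-line calculation, using the 35% from the card.

```python
# Probability of the event, from the card above.
p_event = 0.35

# The event and its complement together cover every outcome, so they sum to 1.
p_complement = 1 - p_event
print(round(p_complement, 2))  # 0.65
```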

mode < median < mean

right skewed

how to find conditional probabilities

P(A and B) = P(A) * P(B|A), so P(B|A) = P(A and B) / P(A)
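
A minimal sketch of rearranging the multiplication rule to get a conditional probability. The probabilities are hypothetical.

```python
# Hypothetical probabilities.
p_a = 0.5          # P(A)
p_a_and_b = 0.2    # P(A and B)

# Rearranging P(A and B) = P(A) * P(B|A) gives P(B|A) = P(A and B) / P(A).
p_b_given_a = p_a_and_b / p_a
print(round(p_b_given_a, 2))  # 0.4
```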

when to use the poisson distribution?

to determine the probability that a given number of events will occur over a specified interval of time
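
A sketch of the Poisson probability formula, P(X = k) = rate^k * e^(-rate) / k!. The function name poisson_pmf and the average rate of 2 events per interval are assumptions for illustration.

```python
import math

def poisson_pmf(k, rate):
    """Probability of exactly k events in an interval with the given average rate."""
    return rate ** k * math.exp(-rate) / math.factorial(k)

# Hypothetical average of 2 events per interval.
p_zero_events = poisson_pmf(0, 2.0)
p_three_events = poisson_pmf(3, 2.0)
print(round(p_zero_events, 4), round(p_three_events, 4))
```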

R squared is called

coefficient of determination

the value of R squared is between

0 and 1

when the value of R squared is equal to 0, what kind of relationship is this?

no linear relationship

simple linear regression has how many independent variables?

1

the correlation coefficient is scaled between

-1 and 1

a correlation of 0 indicates that

the 2 variables have no linear relationship to each other

what is a positive linear relationship

as x increases, y increases

____________________ is a measure of the linear relationship between 2 variables, X and Y, which does not depend on units of measurement

correlation coefficient
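
A hand-rolled sketch of the Pearson correlation coefficient. The function name pearson_r and the data are hypothetical. In simple linear regression, R squared equals the square of this coefficient.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_deviation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sq_dev_x = sum((x - mean_x) ** 2 for x in xs)
    sq_dev_y = sum((y - mean_y) ** 2 for y in ys)
    return co_deviation / math.sqrt(sq_dev_x * sq_dev_y)

# Hypothetical data: y rises with x in one case and falls in the other.
xs = [1, 2, 3, 4]
r_positive = pearson_r(xs, [2, 4, 6, 8])
r_negative = pearson_r(xs, [8, 6, 4, 2])
r_squared = r_positive ** 2
print(r_positive, r_negative, r_squared)
```

Rescaling either variable (for example, converting units) leaves the result unchanged, which is why the coefficient does not depend on units of measurement.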

_______________________ is a technique which identifies the strength of association between pairs of products purchased together and identifies patterns of co-occurrence

market basket analysis

which of the following is not a key metric for market basket analysis?
a) support
b) lift
c) shopping
d) confidence

c) shopping

market basket analysis creates

if-then rules

_________________ is a technique that identifies co-occurrence relationships among activities performed by specific individuals or groups

affinity analysis

a lift greater than 1

suggests that the purchase of product A increases the probability that product B will be purchased in the same transaction
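
A small sketch of support, confidence, and lift for one if-then rule. The transactions, product names, and helper function names are all hypothetical.

```python
# Hypothetical transactions, each a set of purchased products.
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread"},
    {"milk"},
    {"butter", "milk"},
]

def support(items):
    """Share of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule 'if antecedent then consequent'."""
    return support(set(antecedent) | set(consequent)) / support(antecedent)

def lift(antecedent, consequent):
    """Confidence of the rule divided by the support of the consequent."""
    return confidence(antecedent, consequent) / support(consequent)

rule_lift = lift({"bread"}, {"butter"})
print(round(rule_lift, 2))  # 1.11
```

Here the lift is above 1, so in this toy data buying bread goes with a higher chance of buying butter in the same transaction.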

the binomial distribution

models n independent replications of a Bernoulli experiment, each with a probability p of success
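
A sketch of the binomial probability of exactly k successes in n Bernoulli trials, C(n, k) * p^k * (1 - p)^(n - k). The function name binomial_pmf and the example values are assumptions.

```python
import math

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials with success probability p."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

# Hypothetical example: exactly 2 successes in 4 trials with p = 0.5.
p_two_of_four = binomial_pmf(2, 4, 0.5)
print(p_two_of_four)  # 0.375
```

With n = 1 this reduces to the Bernoulli distribution.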