Intro to Data exam 2

correlation

a mutual relationship or connection between two or more things; in statistics, a measure of how strongly two variables change together
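Pearson's correlation coefficient r is one common way to put a number on correlation. Values near +1 or -1 indicate a strong linear relationship, and values near 0 indicate little linear relationship. The following is a minimal sketch. The function name pearson_r and the hours/scores data are made up for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient r for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2 values)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: how x and y deviate from their means together (co-variation)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: the spread of each variable on its own
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("r is undefined when one variable never changes")
    return cov / (spread_x * spread_y)

# Made-up example data: hours studied vs. exam score
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 78]
print(round(pearson_r(hours, scores), 3))
```

The result is close to +1, which indicates a strong positive relationship. It does not show that studying longer caused the higher scores. Python 3.10+ also ships statistics.correlation, which computes the same coefficient.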

correlation does not mean

causation

cross tabulation

groups variables (usually categorical) into a table of counts to understand the correlation between them. It also shows how correlations change from one variable grouping to another. It is commonly used in statistical analysis to find patterns, trends, and probabilities within raw data.
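The following is a minimal sketch of a cross tabulation that counts how often each pair of values appears together. The survey records, field names, and the crosstab helper are hypothetical.

```python
from collections import Counter

def crosstab(rows, row_key, col_key):
    """Count how often each (row value, column value) pair occurs."""
    counts = Counter((r[row_key], r[col_key]) for r in rows)
    row_vals = sorted({r[row_key] for r in rows})
    col_vals = sorted({r[col_key] for r in rows})
    # Nested dict: table[row value][column value] -> count (Counter gives 0 for unseen pairs)
    return {rv: {cv: counts[(rv, cv)] for cv in col_vals} for rv in row_vals}

# Hypothetical survey responses
survey = [
    {"class_year": "Freshman", "uses_social_media": "Yes"},
    {"class_year": "Freshman", "uses_social_media": "Yes"},
    {"class_year": "Freshman", "uses_social_media": "No"},
    {"class_year": "Senior", "uses_social_media": "Yes"},
    {"class_year": "Senior", "uses_social_media": "No"},
    {"class_year": "Senior", "uses_social_media": "No"},
]

table = crosstab(survey, "class_year", "uses_social_media")
for row_value, cells in table.items():
    print(row_value, cells)
```

Comparing the rows shows how the "Yes"/"No" split shifts from one grouping (Freshman) to another (Senior). In practice, pandas provides pandas.crosstab for the same job.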

Nearest Neighbor Search

A form of proximity search: the optimization problem of finding the point in a given set that is closest (most similar) to a given point
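The simplest version is a brute-force search that measures the distance from the query to every point and keeps the smallest. This sketch assumes Euclidean (straight-line) distance. Faster methods, such as k-d trees, avoid checking every point. The function name nearest_neighbor is illustrative.

```python
import math

def nearest_neighbor(query, points):
    """Brute-force search: return the point in `points` closest to `query`."""
    if not points:
        raise ValueError("points must not be empty")
    # math.dist returns the straight-line (Euclidean) distance between two points
    return min(points, key=lambda p: math.dist(query, p))

points = [(1, 1), (4, 5), (9, 2), (3, 3)]
print(nearest_neighbor((4, 4), points))  # (4, 5)
```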

Standard approach to nearest neighbor

Every point is of equal weight

weighted approach to nearest neighbor

Every point is given its own weight (preference); for example, closer points can count more than farther ones
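The difference between the two approaches shows up when the nearest neighbors vote on a label. In this sketch, the standard approach counts every neighbor's vote as 1. The weighted approach assumes inverse-distance weighting, one common choice; the notes do not say how preferences are assigned. The function knn_vote and the sample points are hypothetical.

```python
import math
from collections import defaultdict

def knn_vote(query, labeled_points, k, weighted=False):
    """Label `query` by a vote among its k nearest labeled points.

    Standard: every neighbor's vote counts as 1.
    Weighted (assumed inverse-distance): closer neighbors get bigger votes.
    """
    # Keep the k labeled points closest to the query
    neighbors = sorted(labeled_points, key=lambda item: math.dist(query, item[0]))[:k]
    votes = defaultdict(float)
    for point, label in neighbors:
        if weighted:
            # Small constant avoids dividing by zero if the query sits on a point
            votes[label] += 1.0 / (math.dist(query, point) + 1e-9)
        else:
            votes[label] += 1.0
    return max(votes, key=votes.get)

data = [((1, 0), "A"), ((3, 0), "B"), ((0, 3), "B"), ((10, 10), "A")]
print(knn_vote((0, 0), data, k=3))                 # B (two B votes beat one A vote)
print(knn_vote((0, 0), data, k=3, weighted=True))  # A (the single close A outweighs two farther Bs)
```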

Benefits of NN

Defines similarities, is typically very visual, and reduces the chance of irrelevant attributes/data

there is no mathematical formula to figure out what your nearest neighbor is

...

euclidean distance

the direct (straight-line) distance between two points
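For points p = (p1, ..., pn) and q = (q1, ..., qn), the straight-line distance is d(p, q) = sqrt((p1 - q1)^2 + ... + (pn - qn)^2). The following is a minimal sketch. The function name is illustrative, and Python's built-in math.dist gives the same result.

```python
import math

def euclidean_distance(p, q):
    """Straight-line distance: square root of the summed squared coordinate differences."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of coordinates")
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

print(euclidean_distance((1, 2), (4, 6)))  # 5.0 (a 3-4-5 right triangle)
```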

manhattan distance

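the distance between two points measured across plus up and down (along the axes, like city blocks), i.e., the sum of the absolute differences of their coordinates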
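Manhattan distance adds up how far you move along each axis, so d(p, q) = |p1 - q1| + ... + |pn - qn|. The following is a minimal sketch with an illustrative function name.

```python
def manhattan_distance(p, q):
    """'Across plus up/down' distance: sum of absolute coordinate differences."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of coordinates")
    return sum(abs(pi - qi) for pi, qi in zip(p, q))

print(manhattan_distance((1, 2), (4, 6)))  # 7 (3 across + 4 up)
```

For the same two points, the straight-line (Euclidean) distance is 5. Manhattan distance is never smaller than Euclidean distance, because it cannot cut diagonally.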

Cluster

a collection of data objects that are similar to each other (and dissimilar to objects in other clusters)

what is the most popular online activity

social networking

what is the social media analytics process

capture, understand, and present

capture

Identify conversations on social media platforms related to an organization's activities and interests

understand

Clean the data using statistical methods and other techniques from text and data mining, machine translation, and network analysis

present

Summarize and evaluate the findings

triggers

An action that runs automatically in response to certain activities on particular data

event trigger

an event occurs that causes an action to run; ex: your GPA

data trigger

a data condition occurs that causes an action to run; ex: your grade
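The following sketch shows both kinds of trigger on a hypothetical gradebook. An event trigger runs every time the event happens, which here is a grade being recorded. A data trigger runs only when the data meets a condition, which here is a grade below 70. The GradeBook class, its method names, and the threshold are all made up for illustration.

```python
class GradeBook:
    """Hypothetical gradebook that runs triggers automatically when grades are recorded."""

    def __init__(self):
        self.grades = {}
        self.event_triggers = []  # actions run every time a grade is recorded
        self.data_triggers = []   # (condition, action) pairs run only when the condition holds

    def on_grade_recorded(self, action):
        self.event_triggers.append(action)

    def when(self, condition, action):
        self.data_triggers.append((condition, action))

    def record_grade(self, student, grade):
        self.grades[student] = grade
        # Event triggers: fire because the event happened
        for action in self.event_triggers:
            action(student, grade)
        # Data triggers: fire only if the new data meets the condition
        for condition, action in self.data_triggers:
            if condition(grade):
                action(student, grade)

log = []
book = GradeBook()
book.on_grade_recorded(lambda s, g: log.append(f"recorded {s}: {g}"))
book.when(lambda g: g < 70, lambda s, g: log.append(f"alert: {s} is below 70"))
book.record_grade("Ana", 85)
book.record_grade("Ben", 62)
print(log)
```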

dashboards

a collection of widgets that give you an overview of the reports and metrics you care about most

Key Performance Indicator (KPI)

A measurable value that demonstrates how effectively a company is achieving key business objectives
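The following sketch tracks a KPI as a measured value compared against a target. The revenue KPI, the figures, and the kpi_progress helper are hypothetical.

```python
def kpi_progress(actual, target):
    """Return the measured value as a percentage of its target."""
    if target == 0:
        raise ValueError("target must be non-zero")
    return 100.0 * actual / target

# Hypothetical KPI: quarterly revenue against a quarterly target (made-up figures)
progress = kpi_progress(actual=450_000, target=500_000)
print(f"Revenue KPI: {progress:.1f}% of target")  # Revenue KPI: 90.0% of target
```

A dashboard would typically show a value like this as one widget, next to other KPIs.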