An organization's data architecture consists of:
-Data stores
-ETL processes
-Metadata
-Data access
Data stores
operational and analytical
ETL processes
to move and transform data from one data store to another
Metadata
describing data stores and relationships between them
Data access
analysis software and middleware for controlling and providing access to data for users
Issues with Integrating Data
-Inconsistent key structures in different systems
-Synonyms: different systems use different names for the same data
-Free-form vs. structured fields
-Inconsistent data values across systems
-Missing data
Techniques for Data Integration
-Consolidation
-Data federation
-Data propagation
Consolidation
move all data into a centralized database
Data federation
provides a virtual view of data without actually creating one centralized database
Data propagation
duplicates data across databases with near-real-time delay (replication)
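The three techniques can be contrasted with a minimal sketch. The two source systems (crm, billing) and all function names below are hypothetical, and plain dictionaries stand in for real databases.

```python
# Two hypothetical source systems, each keyed by customer id.
crm = {1: {"name": "Ada"}, 2: {"name": "Ben"}}
billing = {1: {"balance": 40.0}, 2: {"balance": 0.0}}

def consolidate(*sources):
    """Consolidation: physically copy all data into one central store."""
    central = {}
    for source in sources:
        for key, fields in source.items():
            central.setdefault(key, {}).update(fields)
    return central

def federated_view(key, *sources):
    """Federation: assemble the combined record on demand; nothing is copied ahead of time."""
    record = {}
    for source in sources:
        record.update(source.get(key, {}))
    return record

def propagate(source, replica, key):
    """Propagation: duplicate one record from a source database into a replica."""
    replica[key] = dict(source[key])

warehouse = consolidate(crm, billing)   # copy made now
crm[1]["name"] = "Ada L."               # later change in a source system
live = federated_view(1, crm, billing)  # sees the change immediately
replica = {}
propagate(crm, replica, 1)              # replica catches up when the change is pushed
```

In this sketch the consolidated copy goes stale until the next load, while the federated view always reads the sources directly.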
Static extract
capturing a snapshot of the source data at a point in time--take everything from desired columns
Incremental extract
capturing changes that have occurred since the last static extract--select rows based on update/insert dates
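A minimal sketch of the two extract types, using Python's built-in sqlite3 module. The customer table, its columns, and the updated_at timestamp are assumptions made for illustration.

```python
import sqlite3

# Hypothetical source table; names and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT, updated_at TEXT)"
)
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?, ?)",
    [
        (1, "Ada", "Austin", "2024-01-05"),
        (2, "Ben", "Boston", "2024-02-10"),
        (3, "Cy", "Chicago", "2024-03-15"),
    ],
)

def static_extract(conn):
    """Snapshot: take every row from the desired columns."""
    return conn.execute("SELECT id, name, city FROM customer ORDER BY id").fetchall()

def incremental_extract(conn, last_extract_date):
    """Changes only: select rows inserted or updated after the last extract."""
    return conn.execute(
        "SELECT id, name, city FROM customer WHERE updated_at > ? ORDER BY id",
        (last_extract_date,),
    ).fetchall()

snapshot = static_extract(conn)
changes = incremental_extract(conn, "2024-02-01")
```

The date comparison works on plain strings here only because the dates are stored in ISO format (YYYY-MM-DD), which sorts chronologically.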
Record-level
-selection: data partitioning
-joining: data combining
-aggregation: data summarization
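The three record-level functions can be sketched in plain Python. The order and customer records below are made-up sample data.

```python
from collections import defaultdict

# Hypothetical sample records.
orders = [
    {"order_id": 1, "cust_id": 10, "region": "East", "amount": 100.0},
    {"order_id": 2, "cust_id": 11, "region": "West", "amount": 250.0},
    {"order_id": 3, "cust_id": 10, "region": "East", "amount": 50.0},
]
customers = {10: "Ada", 11: "Ben"}

# Selection (data partitioning): keep only the rows that meet a condition.
east_orders = [o for o in orders if o["region"] == "East"]

# Joining (data combining): attach the customer name to each order by key.
joined = [{**o, "cust_name": customers[o["cust_id"]]} for o in orders]

# Aggregation (data summarization): total order amount per customer.
totals = defaultdict(float)
for o in orders:
    totals[o["cust_id"]] += o["amount"]
```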
Field-level
-single-field: from one field to one field
-multi-field: from many fields to one, or one field to many
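A short sketch of field-level transformations. The status code table and the function names are hypothetical examples, not part of any particular tool.

```python
# Hypothetical code table; a real one would come from the source system's metadata.
STATUS_CODES = {"A": "Active", "I": "Inactive"}

def decode_status(code):
    """Single-field: one source field maps to one target field."""
    return STATUS_CODES.get(code, "Unknown")

def full_name(first, last):
    """Multi-field, many to one: combine two source fields into one target field."""
    return f"{first.strip()} {last.strip()}"

def split_date(iso_date):
    """Multi-field, one to many: break one source field into several target fields."""
    year, month, day = iso_date.split("-")
    return {"year": int(year), "month": int(month), "day": int(day)}
```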
Refresh mode
bulk rewriting of target data at periodic intervals--drop indexes; load; reindex
Update mode
only changes in source data are written to data warehouse--maintain indexes
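The two load modes might look like this with Python's built-in sqlite3 module. The sales table, the index name, and the use of INSERT OR REPLACE to apply changes are assumptions for this sketch; a real warehouse loader would use its own bulk-load and upsert facilities.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical warehouse target table.
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")

def refresh_load(conn, rows):
    """Refresh mode: drop the index, bulk rewrite the target, then reindex."""
    conn.execute("DROP INDEX IF EXISTS idx_sales_region")
    conn.execute("DELETE FROM sales")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.execute("CREATE INDEX idx_sales_region ON sales (region)")
    conn.commit()

def update_load(conn, changed_rows):
    """Update mode: write only changed rows; the index stays in place and is maintained."""
    conn.executemany("INSERT OR REPLACE INTO sales VALUES (?, ?, ?)", changed_rows)
    conn.commit()

refresh_load(conn, [(1, "East", 100.0), (2, "West", 200.0)])
update_load(conn, [(2, "West", 250.0), (3, "East", 75.0)])
rows = conn.execute("SELECT id, region, amount FROM sales ORDER BY id").fetchall()
```

Dropping the index before a bulk rewrite avoids updating it row by row; in update mode the volume of changes is small enough that maintaining the index during the load is cheaper than rebuilding it.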
Accuracy
degree to which value matches reality
Uniqueness
degree to which an entity is represented only once in the system
Consistency
data representing same entity should have same value or a consistent value
Completeness
all required values are present
Timeliness
data is available when needed
Currency
data represents the current state of the entity
Conformance
data conforms to metadata rules
Referential integrity
data complies with referential integrity constraints
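Some of these dimensions can be checked mechanically. The sketch below covers uniqueness, completeness, and referential integrity only; the records and helper names are hypothetical, and dimensions such as accuracy usually need comparison against an external source of truth.

```python
# Hypothetical records with deliberate quality problems.
customers = [
    {"cust_id": 1, "email": "ada@example.com"},
    {"cust_id": 2, "email": None},
    {"cust_id": 2, "email": "ben@example.com"},
]
orders = [{"order_id": 100, "cust_id": 1}, {"order_id": 101, "cust_id": 9}]

def duplicate_keys(rows, key):
    """Uniqueness: return key values that appear more than once."""
    seen, dups = set(), set()
    for r in rows:
        if r[key] in seen:
            dups.add(r[key])
        else:
            seen.add(r[key])
    return dups

def missing_values(rows, field):
    """Completeness: return rows where a required field is empty."""
    return [r for r in rows if r[field] in (None, "")]

def orphans(child_rows, fk, parent_rows, pk):
    """Referential integrity: return child rows whose foreign key has no parent."""
    parent_keys = {p[pk] for p in parent_rows}
    return [c for c in child_rows if c[fk] not in parent_keys]
```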
Causes of poor data quality
-External data sources
-Redundant data storage and inconsistent metadata
-Data entry
-Lack of organizational commitment
Who is responsible for data quality
-Data governance
-Data steward
Data governance
high-level organizational groups and processes overseeing data stewardship across the organization
Data steward
A person responsible for ensuring that organizational applications properly support the organization's data quality goals
TQM Principles
-Defect prevention
-Continuous improvement
-Use of enterprise data standards
-Strong foundation of measurement
Master Data Management (MDM)
Disciplines, technologies, and methods to ensure the currency, meaning, and quality of reference data within and across various subject areas
Identity registry
master data remains in source systems; registry provides applications with location
Integration hub
data changes broadcast through central service to subscribing databases
Persistent
central "golden record" maintained; all applications have access
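The identity registry and integration hub approaches can be sketched as two small classes. The class names, method names, and sample keys are all hypothetical, and dictionaries stand in for subscribing databases.

```python
class IdentityRegistry:
    """Holds only locations; master data stays in the source systems."""

    def __init__(self):
        self._locations = {}

    def register(self, entity_id, system, local_key):
        self._locations.setdefault(entity_id, []).append((system, local_key))

    def locate(self, entity_id):
        # Tells an application where to find the entity's data.
        return list(self._locations.get(entity_id, []))


class IntegrationHub:
    """Broadcasts each data change to every subscribing database."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, database):
        self._subscribers.append(database)

    def publish(self, entity_id, fields):
        for db in self._subscribers:
            db.setdefault(entity_id, {}).update(fields)


registry = IdentityRegistry()
registry.register("cust-1", "crm", 501)
registry.register("cust-1", "billing", "B-77")

hub = IntegrationHub()
sales_db, support_db = {}, {}
hub.subscribe(sales_db)
hub.subscribe(support_db)
hub.publish("cust-1", {"name": "Ada"})
```

The persistent approach would differ from both by keeping the full record itself in one central store that all applications read.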