Monday, December 8, 2014

Data Warehouse - What is Data Integration

What is data integration?

Data integration is a process in which heterogeneous data are retrieved and combined as an incorporated form and structure. Data integration allows different data types (such as data sets, documents and tables) to be merged by users, organizations and applications, for use as personal or business processes and/or functions.

Data integration primarily supports the analytical processing of large data sets by aligning, combining and presenting each data set from organizational departments and external remote sources to fulfill integrator objectives. 


3 ETL steps-process
  • Extracts data from homogeneous or heterogeneous data sources
  • Transform the data for storing it in proper format or structure for querying and analysis purpose
  • Load it into the final target (database, more specifically, operational data store, data mart, or data warehouse)

Why ETL is important for Data Warehouse?

Because without ETL the process from finding the data to the data warehouse will become messy

What is Data Warehouse?

A data warehouse is a relational multidimensional database that is designed for query and analysis rather than for transaction processing. It usually contains historical data derived from transaction data, but it can include data from other sources. It separates analysis workload from transaction workload and enables an organization to consolidate data from several sources.

In addition to a relational database, a data warehouse environment includes an extraction, transportation, transformation, and loading (ETL) solution, an online analytical processing (OLAP) engine, client analysis tools, and other applications that manage the process of gathering data and delivering it to business users.

How is Data Warehouse different from Database?

Data Warehouse:        -Subject Oriented
                                  -Non volatile
                                  -Not normalized
                                   - For analytical
Database:                   -Object Events Oriented
                                  -Volatile
                                  -Normalized
                                  - For transactional

What is Operational Data Store (ODS)?

It is a database designed to integrate data from multiple sources for additional operations on the data. Unlike a master data store, the data is not passed back to operational systems. It may be passed for further operations and to the data warehouse for reporting.

The difference between Data Mart, ODS, and EDW
  • Data Mart: is a small part of the Data Warehouse, example: department "something" data mart
  • ODS: Is a database designed to integrate data from multiple sources for additional operations on the data. Unlike a master data store the data is not passed back to operational systems. It may be passed for further operations and to the data warehouse for reporting.
  • Enterprise Data Warehouse (EDW): it should be same like data warehouse definition

What is metadata?

Metadata is "data about data". There are two (metadata types), structural metadata, about the design and specification of data structures or "data about the containers of data"; and descriptive metadata about individual instances of application data or the data content.

No comments:

Post a Comment