The exam points in this chapter are mostly conceptual:
Data Orchestration
Traditional data warehouses generally use ETL.
Modern data lakes generally use ELT.
DII (Data Integration and Interoperability)
Data Integration: Consolidates data into consistent forms (physical or virtual). Two or more systems can share data.
Data Interoperability: Provides the ability for multiple systems to communicate; two or more systems remain unchanged and can work together.
Business drivers for DII:
1. When managing data movement efficiently is a primary concern
2. The need to simplify sharing transactional and operational data across the organization
3. Integration of various data stores and accommodating external applications
Data types:
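Data virtualization: many physical databases are brought together virtually, as if they were stored in one place.
ETL: not necessarily used only for data warehouses.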
E (Extract): selecting, extracting, and staging (physically or in memory).
T (Transform): Makes data compatible with the target. Data may be removed from the source, copied to multiple targets, or used to trigger events. Examples: format, structure, and semantic conversion; de-duping; re-ordering.
L(Load): Physically storing or presenting results in the target system in final or near-final form.
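A minimal sketch of the three ETL steps, to make the E/T/L split concrete. The record fields, the sample rows, and the in-memory list standing in for the target system are all illustrative assumptions, not part of any particular tool.

```python
from datetime import datetime

# Hypothetical source records (field names are assumptions for illustration).
SOURCE_ROWS = [
    {"id": "2", "name": " Bob ", "joined": "03/15/2021"},
    {"id": "1", "name": "Alice", "joined": "01/02/2020"},
    {"id": "2", "name": " Bob ", "joined": "03/15/2021"},  # duplicate record
]


def extract(rows):
    """E: select the source records and stage copies of them in memory."""
    return [dict(r) for r in rows]


def transform(staged):
    """T: format conversion, de-duping, and re-ordering before the load."""
    seen = set()
    out = []
    for r in staged:
        key = int(r["id"])  # format conversion: text id -> integer
        if key in seen:  # de-duping on id
            continue
        seen.add(key)
        out.append({
            "id": key,
            "name": r["name"].strip(),
            # format conversion: MM/DD/YYYY -> ISO 8601 date string
            "joined": datetime.strptime(r["joined"], "%m/%d/%Y").date().isoformat(),
        })
    return sorted(out, key=lambda r: r["id"])  # re-ordering by id


def load(rows, target):
    """L: store the transformed rows in the target (a list stands in for it)."""
    target.extend(rows)
    return target


target_store = load(transform(extract(SOURCE_ROWS)), [])
```

Note that the target only ever receives data that has already been transformed; the raw source rows never reach it.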
ELT: Common in big data environments, where ELT loads the data lake.
Transformations occur after load.
Allows source data to be in raw form in the target datastore.
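A minimal sketch of the ELT ordering, assuming an in-memory SQLite database stands in for the target datastore. The table names (raw_customers, customers) and sample rows are hypothetical. The raw data is loaded first, and the transformation then runs inside the target.

```python
import sqlite3

# Hypothetical raw rows, loaded exactly as they arrive from the source.
RAW_ROWS = [
    ("2", " Bob ", "2021-03-15"),
    ("1", "Alice", "2020-01-02"),
    ("2", " Bob ", "2021-03-15"),  # duplicate record
]

conn = sqlite3.connect(":memory:")

# L: load the raw form into the target datastore, untransformed.
conn.execute("CREATE TABLE raw_customers (id TEXT, name TEXT, joined TEXT)")
conn.executemany("INSERT INTO raw_customers VALUES (?, ?, ?)", RAW_ROWS)

# T: transform after the load, using the target's own SQL engine.
conn.execute(
    """
    CREATE TABLE customers AS
    SELECT DISTINCT CAST(id AS INTEGER) AS id, TRIM(name) AS name, joined
    FROM raw_customers
    """
)

raw_count = conn.execute("SELECT COUNT(*) FROM raw_customers").fetchone()[0]
clean_rows = conn.execute(
    "SELECT id, name, joined FROM customers ORDER BY id"
).fetchall()
```

Unlike the ETL case, the raw table stays available in the target after the transformation, so other transformations can be derived from it later.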
12 essential concepts: