00:04:38
34:33
2:35:27
1:05 continue
15:15 end
Starts at minute 37
Document & Content Management
Definition: Planning, implementation, and control activities for lifecycle management of data and information found in any form or medium, outside of relational databases.
Goals:
1:To comply with legal obligations and customer expectations regarding records management
2:To ensure effective and efficient storage, retrieval and use of documents and content.
3:To ensure integration capabilities between structured and unstructured content
Business Drivers:
Regulatory compliance:
Laws and regulations require that organizations maintain records of certain types of activities.
Ability to respond to litigation and e-discovery requests
ARMA International's 8 principles (GARP: Generally Accepted Recordkeeping Principles)
Important point:
Flat taxonomy: All categories are equal
Hierarchical Taxonomy: Tree structure
Polyhierarchy: Tree-like structure with more than one node relation rule. Child nodes may have multiple parents.
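A minimal sketch of how the three taxonomy shapes differ as data structures. The category names and the helper functions `path_to_root` and `ancestors` are hypothetical, chosen only for illustration: a flat taxonomy is a plain set, a hierarchy maps each child to one parent, and a polyhierarchy maps each child to a set of parents.

```python
# Flat taxonomy: all categories are equal (no parent/child links).
flat = {"Contracts", "Invoices", "Policies"}

# Hierarchical taxonomy: a tree, so each category has at most one parent.
hierarchical = {"Finance": None, "Invoices": "Finance", "Payroll": "Finance"}

# Polyhierarchy: a child node may have several parents.
polyhierarchy = {
    "Finance": set(),
    "Legal": set(),
    "Contracts": {"Finance", "Legal"},
}

def path_to_root(tree, node):
    """Walk a hierarchical taxonomy upward; the path is unique because each node has one parent."""
    path = [node]
    while tree[path[-1]] is not None:
        path.append(tree[path[-1]])
    return path

def ancestors(poly, node):
    """Collect every ancestor in a polyhierarchy; there can be several upward paths."""
    seen = set()
    stack = list(poly[node])
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(poly[parent])
    return seen

print(path_to_root(hierarchical, "Payroll"))          # ['Payroll', 'Finance']
print(sorted(ancestors(polyhierarchy, "Contracts")))  # ['Finance', 'Legal']
```

The polyhierarchy needs a set-based traversal because one document can be reached through more than one branch.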
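Exam questions in this chapter are mostly about concepts:
Data Orchestration
Traditional data warehouses all use ETL
Modern data lakes generally use ELT
DII (Data Integration and Interoperability)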
Data integration: Consolidates data into consistent forms (physical or virtual). Two or more systems can share data.
Data interoperability: Provides the ability for multiple systems to communicate: two or more systems remain unchanged and can work together.
DII business drivers:
1, The need to manage data movement efficiently is a primary driver
2, The need to simplify sharing transactional and operational data across the organization
3, Integration of various data stores and accommodating external applications
Data types:
Data virtualization: presents many physical databases as if they were stored together, without physically moving the data.
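A minimal sketch of data virtualization using Python's built-in `sqlite3`. Two in-memory databases stand in for two separate source systems (the `crm` schema, table names and rows are hypothetical). A temporary view joins them at query time, so consumers query one virtual object while the data stays where it is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")                   # stands in for source system A
conn.execute("ATTACH DATABASE ':memory:' AS crm")    # stands in for source system B

conn.execute("CREATE TABLE main.orders (customer_id INTEGER, amount REAL)")
conn.execute("CREATE TABLE crm.customers (customer_id INTEGER, name TEXT)")
conn.executemany("INSERT INTO main.orders VALUES (?, ?)", [(1, 10.0), (2, 5.0), (1, 2.5)])
conn.executemany("INSERT INTO crm.customers VALUES (?, ?)", [(1, "Alice"), (2, "Bob")])

# Virtual layer: a view evaluated at query time across both stores; no data is copied.
conn.execute("""
    CREATE TEMP VIEW customer_spend AS
    SELECT c.name, SUM(o.amount) AS total
    FROM main.orders AS o JOIN crm.customers AS c USING (customer_id)
    GROUP BY c.name
""")
print(conn.execute("SELECT * FROM customer_spend ORDER BY name").fetchall())
# [('Alice', 12.5), ('Bob', 5.0)]

# A change in a source system is visible through the view immediately.
conn.execute("INSERT INTO main.orders VALUES (2, 1.0)")
print(conn.execute("SELECT * FROM customer_spend ORDER BY name").fetchall())
# [('Alice', 12.5), ('Bob', 6.0)]
```

A real virtualization product federates separate servers. `ATTACH` is used here only to keep the sketch self-contained.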
ETL: not necessarily used only for data warehouses.
E (Extract): selecting, extracting, staging (physically or in memory)
T (Transform): Makes data compatible with the target structure; data may be removed, copied to multiple targets, or used to trigger events. Examples: format, structure, and semantic conversion, de-duping, re-ordering.
L(Load): Physically storing or presenting results in the target system in final or near-final form.
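A minimal ETL sketch in standard-library Python, assuming a CSV text extract as the source and an in-memory SQLite table as the target (the data and the `sales` table are hypothetical). Each function maps to one step in the notes. The transform does format conversion, de-duplication and re-ordering before anything is loaded.

```python
import csv
import io
import sqlite3

# Hypothetical source extract: CSV text with a duplicate row and inconsistent formats.
SOURCE_CSV = """id,name,amount
1, alice ,10.50
2,BOB,3
1, alice ,10.50
"""

def extract(text):
    """E: select and extract rows into an in-memory staging list."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """T: format conversion, de-duplication, re-ordering."""
    seen = set()
    out = []
    for r in rows:
        rec = (int(r["id"]), r["name"].strip().title(), float(r["amount"]))
        if rec not in seen:          # de-dupe
            seen.add(rec)
            out.append(rec)
    return sorted(out)               # re-order by id

def load(rows, conn):
    """L: physically store the final-form rows in the target system."""
    conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, name TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(SOURCE_CSV)), conn)
print(conn.execute("SELECT * FROM sales ORDER BY id").fetchall())
# [(1, 'Alice', 10.5), (2, 'Bob', 3.0)]
```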
ELT: Common in big data environments, where ELT loads the data lake.
Transformations occur after load.
Allows source data to be in raw form in the target datastore.
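For contrast, a minimal ELT sketch, again with hypothetical data and an in-memory SQLite database standing in for the target store. The rows are loaded untouched as raw text. The transformation then runs afterwards, inside the target, using its own SQL engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Extract + Load: land source records unchanged (all values kept as raw text).
raw_rows = [("alice", "10.50"), ("BOB ", "3"), ("alice", "4.50")]
conn.execute("CREATE TABLE raw_sales (customer TEXT, amount TEXT)")
conn.executemany("INSERT INTO raw_sales VALUES (?, ?)", raw_rows)

# Transform: runs after the load, inside the target store; the raw table is kept.
conn.execute("""
    CREATE TABLE customer_totals AS
    SELECT LOWER(TRIM(customer)) AS customer,
           SUM(CAST(amount AS REAL)) AS total
    FROM raw_sales
    GROUP BY LOWER(TRIM(customer))
""")
print(conn.execute("SELECT * FROM customer_totals ORDER BY customer").fetchall())
# [('alice', 15.0), ('bob', 3.0)]
```

Because `raw_sales` still holds the source data in raw form, a different transformation can be run later without re-extracting.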
12 essential concepts:
The definition of data security:
Data Security ensures that data privacy and confidentiality are maintained, that data is not breached, and that data is accessed appropriately.
The reason for data security:
Secure data is in the best interest of all stakeholders.
Data Security Business Drivers
1, Reduce risk
2, Business growth
Stakeholders are more willing to invest when data is secure.
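Privacy (pyramid model)
1: Privacy Awareness
2: Privacy Governance (contracts, tools, processes)
3: Privacy Engineering (management and security aspects)
Ultimately, implementation must always land in contracts, tools, and processes.
Privacy is not the same as security.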
CEP-CMR
Main data security activities:
4A model (the process for secure access): Access, Audit, Authentication, Authorization
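A minimal sketch of the four A's in one read path, assuming the DMBOK grouping of Access, Audit, Authentication and Authorization. The `SecureStore` class, its grants format and the audit tuple layout are hypothetical illustrations, not a real product API. Authentication checks a salted password hash with `hashlib.pbkdf2_hmac`. Authorization checks a grant. Access returns the data only if both pass. Audit records every attempt, whether it succeeds or fails.

```python
import hashlib
import hmac
import os
from dataclasses import dataclass, field
from datetime import datetime, timezone

def hash_password(password: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)

@dataclass
class SecureStore:                                   # hypothetical example class
    users: dict = field(default_factory=dict)        # user -> (salt, password hash)
    grants: dict = field(default_factory=dict)       # user -> {(dataset, action), ...}
    data: dict = field(default_factory=dict)         # dataset -> value
    audit_log: list = field(default_factory=list)    # (timestamp, user, dataset, action, allowed)

    def add_user(self, user, password):
        salt = os.urandom(16)
        self.users[user] = (salt, hash_password(password, salt))

    def authenticate(self, user, password) -> bool:  # Authentication: who are you?
        if user not in self.users:
            return False
        salt, expected = self.users[user]
        return hmac.compare_digest(expected, hash_password(password, salt))

    def authorize(self, user, dataset, action) -> bool:  # Authorization: what may you do?
        return (dataset, action) in self.grants.get(user, set())

    def read(self, user, password, dataset):         # Access: the actual data request
        ok = self.authenticate(user, password) and self.authorize(user, dataset, "read")
        # Audit: log every attempt, allowed or denied.
        self.audit_log.append((datetime.now(timezone.utc).isoformat(), user, dataset, "read", ok))
        if not ok:
            raise PermissionError(f"{user} may not read {dataset}")
        return self.data[dataset]

store = SecureStore()
store.add_user("alice", "s3cret")
store.grants["alice"] = {("customers", "read")}
store.data["customers"] = ["C1", "C2"]
print(store.read("alice", "s3cret", "customers"))    # ['C1', 'C2']
try:
    store.read("alice", "wrong-password", "customers")
except PermissionError as err:
    print(err)                                       # alice may not read customers
print([entry[-1] for entry in store.audit_log])      # [True, False]
```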
4 issues
What is data? Storage
- File system = collection of files (local, distributed, cloud)
- File stream (e.g. Flink, Kafka, etc.)
File: collection of records (can be structured, semi-structured, or unstructured)
(e.g. database, image, video, podcast, CSV, PDF...)
Stream: collection of records (can be structured, semi-structured, or unstructured)
(data in motion)
e.g. IoT monitoring, logs, audio (video)...
Record: collection of data types; fixed length vs. variable length (e.g. name, DOB)
Byte (1 byte = 8 bits)
Bit (binary digit): 0 or 1
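A minimal sketch contrasting fixed-length and variable-length records with Python's `struct` module. Both layouts are hypothetical: a 10-byte name plus a 4-byte DOB stored as an integer YYYYMMDD, versus a 1-byte length prefix, the name bytes, then the DOB. It also shows the byte/bit relationship.

```python
import struct

# Fixed-length layout (hypothetical): 10-byte name + 4-byte unsigned int DOB (YYYYMMDD).
# "<" = little-endian with no alignment padding, so every record is exactly 14 bytes.
FIXED = struct.Struct("<10sI")

def pack_fixed(name: str, dob: int) -> bytes:
    # "10s" null-pads short names and truncates long ones to 10 bytes.
    return FIXED.pack(name.encode("utf-8"), dob)

def pack_variable(name: str, dob: int) -> bytes:
    # Variable-length layout (hypothetical): 1-byte length prefix (names up to 255 bytes),
    # then the name bytes, then the 4-byte DOB.
    encoded = name.encode("utf-8")
    return struct.pack("<B", len(encoded)) + encoded + struct.pack("<I", dob)

for name in ("Ann", "Maximilian"):
    print(name, len(pack_fixed(name, 19900101)), len(pack_variable(name, 19900101)))
# Ann 14 8
# Maximilian 14 15

record = pack_fixed("Ann", 19900101)
print(len(record), "bytes =", len(record) * 8, "bits")   # 14 bytes = 112 bits
print(format(record[0], "08b"))                          # 01000001  ('A' is 65)
```

Fixed-length records waste space on short values but can be located by offset. Variable-length records are compact but must be parsed in sequence.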
Non-relational database
NoSQL = Not only SQL
Column-oriented database
Spatial database
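A minimal sketch of what "column-oriented" means for storage layout, using hypothetical sample rows. The same table is kept once as a list of row tuples and once as one list per column. An aggregate over a single column only needs that column's list in the column layout. In the row layout it must walk every full record.

```python
# The same table stored two ways (hypothetical sample data).
rows = [
    {"id": 1, "city": "Paris", "sales": 120},
    {"id": 2, "city": "Berlin", "sales": 80},
    {"id": 3, "city": "Paris", "sales": 50},
]

# Row-oriented layout: each record's values are stored together.
row_store = [tuple(r.values()) for r in rows]          # [(1, 'Paris', 120), ...]

# Column-oriented layout: each column's values are stored together.
column_store = {col: [r[col] for r in rows] for col in rows[0]}

# An aggregate over one column reads only that column in the column store...
total_sales = sum(column_store["sales"])
# ...but has to touch every full record in the row store.
total_sales_rows = sum(rec[2] for rec in row_store)

print(total_sales, total_sales_rows)   # 250 250
print(column_store["city"])            # ['Paris', 'Berlin', 'Paris']
```

Storing similar values next to each other is also why column stores tend to compress well.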
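Data modeling is often tied to data architecture.
A data model is a representation that helps people understand data better; it can also depict the underlying logical structure and its implications.
A good data model should be complete, non-redundant, enforce business rules, be reusable, stable, and flexible, and finally be elegant.
Building a data model starts with identifying entities (who, what, when, where, why, how).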
Normalization, in plain terms: 1, put the data into tabular form (removing repeating groups)
2, move duplicated data into separate tables
One fact one column!!!
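A minimal sketch of the two normalization steps above on hypothetical order data. Step 1 flattens the repeating `items` group into one row per order line. Step 2 moves the customer name, which is duplicated on every order, into its own table. After that, each fact lives in exactly one column.

```python
# Hypothetical unnormalized orders: "items" is a repeating group,
# and the customer name is repeated on every order.
orders = [
    {"order_id": 1, "customer_id": "C1", "customer_name": "Alice",
     "items": [("pen", 2), ("ink", 1)]},
    {"order_id": 2, "customer_id": "C1", "customer_name": "Alice",
     "items": [("paper", 5)]},
]

# Step 1: tabular form; one row per order line, repeating group removed.
order_lines = [
    (o["order_id"], product, qty)
    for o in orders
    for product, qty in o["items"]
]

# Step 2: move duplicated data into separate tables (one fact, one column).
customers = {o["customer_id"]: o["customer_name"] for o in orders}   # customer_id -> name
order_headers = [(o["order_id"], o["customer_id"]) for o in orders]  # order -> customer

print(order_lines)    # [(1, 'pen', 2), (1, 'ink', 1), (2, 'paper', 5)]
print(customers)      # {'C1': 'Alice'}
print(order_headers)  # [(1, 'C1'), (2, 'C1')]
```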
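Key: uniquely identifies a row in a table; it is often one of the table's columns.
The higher the normal form (NF), the more finely and concisely the tables are split.
Superkey: any set of columns that satisfies uniqueness is called a superkey
Keys not selected (as the primary key): alternate keys
Compound key = a set of two or more
Composite key = compound key + other keys
Structured key (not in DMBOK)
Foreign key: used for cross-referencing columns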
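A minimal sketch that finds superkeys, candidate keys and alternate keys from a small hypothetical table, then checks a foreign key. Two simplifications are assumed here. Uniqueness is tested only on the sample rows, whereas in practice keys come from business rules. Candidate keys are taken to be the minimal superkeys, with `emp_id` chosen as the primary key.

```python
from itertools import combinations

# Hypothetical sample tables.
employees = [
    {"emp_id": 1, "email": "a@x.com", "name": "Ann", "dept_id": 10},
    {"emp_id": 2, "email": "b@x.com", "name": "Bob", "dept_id": 20},
    {"emp_id": 3, "email": "c@x.com", "name": "Ann", "dept_id": 10},
]
departments = [{"dept_id": 10}, {"dept_id": 20}]

def is_unique(rows, cols):
    """True if the column combination identifies every row uniquely."""
    values = [tuple(r[c] for c in cols) for r in rows]
    return len(values) == len(set(values))

columns = list(employees[0])
# Superkeys: every column combination that is unique.
superkeys = [set(c) for n in range(1, len(columns) + 1)
             for c in combinations(columns, n) if is_unique(employees, c)]
# Candidate keys: minimal superkeys (no smaller superkey inside them).
candidate_keys = [k for k in superkeys if not any(other < k for other in superkeys)]
primary_key = {"emp_id"}
alternate_keys = [k for k in candidate_keys if k != primary_key]   # candidates not chosen

# Foreign key: every employees.dept_id must reference an existing departments.dept_id.
dept_ids = {d["dept_id"] for d in departments}
fk_ok = all(e["dept_id"] in dept_ids for e in employees)

print(len(superkeys), candidate_keys, alternate_keys, fk_ok)
# 12 [{'emp_id'}, {'email'}] [{'email'}] True
```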
example:
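Data modeling, 4 steps:
Data architecture is divided into conceptual, logical, and physical levels.
Data architecture necessarily involves data modeling and data design.
Data architecture should include: 1, a description of the current state of the systems. 2, the components the system has. 3, the principles used to design the system. 4, a beneficial target architecture for the future. 5, data architecture covers a wide range of work. 6, data architects do much more than this.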
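A complete enterprise architecture: BA (business), DA (data), AA (application), TA (technology)
A complete enterprise architecture lets the company: understand the current state of its data architecture, set targets, manage data, stay compliant, and manage the systems where data lives.
Business drivers of data architecture: 1, an enterprise has so much data that it is hard for one person to understand it. 2, data needs to be presented, and different people look at different data. 3, data architects create many artifacts that help people understand company data. 4, data architecture helps the company plan improvements. 5, managing interactions. 6, translating requirements. 7, keeping business and IT aligned and enabling the company to transform quickly.
Enterprise data model (EDM): all projects must follow the initial EDM.
Data management hierarchical levels
Business drivers of data management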
Definition of ethics
Seven examples of ethical
Four reasons why ethics is important
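3 core concepts (unethical things are done through these 3 points)
6 core activities for handling data ethically
Why data management ethics is needed
It is important that data management is a business requirement, not an IT requirement; after studying CDMP, you explain it to clients (the business), not to IT.
Data strategy derives from business strategy.
Lecture 1
1. Build a DAMA Wheel that fits your own organization
2. Overview of everything
Balance short-term and long-term goals
Data-driven === insight-driven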