What is data science, data analytics, data exploitation?
What is a database, data warehouse, data lake?
How is data produced, where is it produced, how is it ingested?
What is end-to-end data observability and monitoring?
What is Extract, Transform, Load (ETL)? What is traditional vs modern ETL?
What is Extract, Load, Transform (ELT)? How does it differ from ETL?
How mature is the organization in the collection, manipulation, and exploitation of data?
Where are the data silos in our organization and how did they come to be?
What are our business needs for data (e.g. latency, scale, security)?
What does the data tell us about our current business performance?
How can we improve our customer experience based on the data?
How can we design and implement a scalable data pipeline to ingest and process large volumes of structured and unstructured data from multiple sources?
What is a typical design of a cloud-native stack to derive business intelligence?
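To make the ETL vs ELT question above concrete, here is a minimal sketch, assuming an in-memory SQLite database stands in for the warehouse. The sample records, table names, and function names (`RAW_ORDERS`, `orders_etl`, `orders_raw`, `orders_elt`, `run_etl`, `run_elt`) are hypothetical and chosen only for illustration. It is not a recommended production design, just one way to show where the transform step sits in each approach.

```python
import sqlite3

# Hypothetical raw records; names and values are made up for illustration.
RAW_ORDERS = [
    ("o-1", " Alice ", "19.90"),
    ("o-2", "bob", "5.00"),
    ("o-3", "ALICE", "10.10"),
]


def clean(row):
    """Normalize the customer name and parse the amount as a float."""
    order_id, customer, amount = row
    return (order_id, customer.strip().lower(), float(amount))


def run_etl(conn):
    """ETL: transform in application code, then load only the cleaned rows."""
    conn.execute(
        "CREATE TABLE orders_etl (order_id TEXT, customer TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO orders_etl VALUES (?, ?, ?)",
        [clean(r) for r in RAW_ORDERS],
    )


def run_elt(conn):
    """ELT: load raw rows as-is, then transform inside the store with SQL."""
    conn.execute(
        "CREATE TABLE orders_raw (order_id TEXT, customer TEXT, amount TEXT)"
    )
    conn.executemany("INSERT INTO orders_raw VALUES (?, ?, ?)", RAW_ORDERS)
    conn.execute(
        """
        CREATE TABLE orders_elt AS
        SELECT order_id,
               lower(trim(customer)) AS customer,
               CAST(amount AS REAL) AS amount
        FROM orders_raw
        """
    )


def totals(conn, table):
    """Total amount per customer, rounded to cents."""
    # The table name is interpolated only because it comes from this script.
    return conn.execute(
        f"SELECT customer, ROUND(SUM(amount), 2) FROM {table} "
        "GROUP BY customer ORDER BY customer"
    ).fetchall()


conn = sqlite3.connect(":memory:")
run_etl(conn)
run_elt(conn)
etl_totals = totals(conn, "orders_etl")
elt_totals = totals(conn, "orders_elt")
print(etl_totals)
print(elt_totals)
```

Both paths end with the same per-customer totals. The difference is that the ELT path keeps the untouched `orders_raw` table in the store, so the transformation can be rerun or revised later without going back to the source.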
3. Objectives
Know where we truly stand amid the (self-created) hype around data science, transformation, and AI
Know what is and isn't possible in the organization when it comes to ingesting, transforming (ETL), and exploiting data
Understand the link between business needs, data collection, analysis, and storytelling, and how they lead to actionable insights
Learn about modern cloud-based stacks that let you build a scalable data pipeline
Get exposed to efficiency, cost optimization, and ROI when designing and implementing data architectures
Understand the need for, and the implementation of, data governance and data security
4. My Observations
Stop creating a new data lake to succeed previous data lakes and warehouses!