Course Outline: Agile Data Warehouse and ETL Training

Agile Data Warehouse and ETL Design taught by Joe Caserta

Dimensional modeling is the proven technique for developing understandable, high-performance data warehouses. This session offers dimensional modeling techniques and the practical detailed steps of the data warehouse ETL system including extracting, cleaning, conforming, and delivering data and its associated metadata. You'll learn to successfully design, obtain, prepare and publish data in a dimensional data warehouse. You'll also learn how to precisely design and build reusable processes to deliver the foundation for an efficient, reduced cost, successful data warehouse implementation.

Highlights:

  • Plan, design and incrementally develop agile data warehouse solutions
  • Model data requirements directly with stakeholders
  • Translate data analysis requirements into efficient, flexible dimensional models
  • Maximize the usability and performance of your data warehouse design

Intended Audience:

This course is appropriate for anyone involved or interested in learning the latest techniques for planning, designing, and managing dimensional data warehouses and ETL processes.

AGILE DATA WAREHOUSE COURSE OUTLINE

Dimensional Modeling Fundamentals

  • Modeling for measurement - the case for dimensional modeling
  • Fundamentals of stars, snowflakes, facts and dimensions
  • Slowly changing dimensions - accurately reflecting history, supporting current, historically correct and alternative views
  • Common dimensional modeling techniques - Time, multi-role and degenerate dimensions, surrogate keys, value chains and other common multi-star design patterns
  • The Data Warehouse Bus Architecture of conformed dimensions and facts, how data marts can enable incremental data warehouse development

Dimensional Analysis

  • Gathering Analytical Requirements - how to ask the right questions
  • Identifying business events and processes that must be measured
  • Identifying business dimensions by classification (Who, What, Where, When and Why or People, Things, Places, Timestamps and Reasons)
  • Identifying and documenting the relationships between business events and dimensions - the dimensional matrix (reloaded)
  • Identifying Key Performance Indicators (KPIs) and Metrics - aggregates, comparisons and exceptions
  • Defining granular facts - additive, semi-additive and non-additive measures
  • Identifying and classifying dimensional attributes and hierarchies - completeness checks

Advanced Dimensional Modeling

  • Rules for combining and separating dimensions
  • Dealing with Very Large Dimensions - B2C customers
  • Multi-valued dimensions - allocation problems, impact and correctly weighted analysis
  • Advanced slowly changing dimension techniques - multiple alternate realities, hybrid type 2/3 dimensions
  • Variable-depth hierarchies - organization structures, bill of materials recursive relationships, dynamic hierarchies, and generic hierarchy maps

ETL COURSE OUTLINE 

Functional Practices

  • Planning and designing your ETL system
  • Choosing the appropriate architecture
  • Managing the implementation
  • Managing the day to day operations
  • Building the development/test/production suite of ETL processes
  • Building a data cleaning subsystem
  • Understanding the tradeoffs of various staging data structures, including flat files, normalized schemas, XML, and dimensional schemas
  • Analyzing and extracting source data
  • Creating the logical data mapping

Technical Practices

  • Structuring the data into dimensional schemas for the most effective delivery to end users
  • Conforming heterogeneous data from multiple sources into standardized dimension tables and fact tables
  • Building ETL modules for handling the three distinct types of slowly changing dimensions (SCDs)
  • Building ETL modules for multi-valued dimensions and hierarchical dimensions
  • Running high-performance surrogate key pipelines
  • Loading the three fundamental fact table grains - transaction, periodic snapshot and accumulating snapshot
  • Handling late arriving dimensions and facts
  • Optimizing ETL processes to fit into highly constrained load windows
  • Structuring and presenting metadata
  • Converting batch and file-oriented processes into continuously streaming real-time ETL systems

Register Here


Participant Comments

Some of the feedback received from participating students in our courses:

Course content

  • Very informative, sufficient, high level, nicely detailed
  • Covered the relevant stressful topics
  • Presented in an excellent manner; Even I grasped it being new to data warehousing

Course material

  • Brilliant, summarized material and a textbook for reference
  • Easy to follow, very useful
  • Well balanced with the ETL Toolkit - thanks for another 'Bible'
  • Will be re-used in future; Well organized
  • Superb, ETL Book will prove very useful

Course instructor / presenter

  • Knowledgeable and experienced; Interesting great presenter
  • Instructor was great, sharing personal examples
  • Very knowledgeable, good sense of humor
  • Knows what he is talking about
  • Very helpful and keen - good examples
  • Can relate to personal experiences

Other comments

  • Really enjoyed the practical exercises
  • Very well presented
  • Very interesting, good interaction
  • The course was designed for me - thanks
Sign up for Email Updates: