Course subjects
Module 1: Introduction to Data Engineering
Explore the role of a data engineer
Analyse data engineering challenges
Introduction to BigQuery
Data lakes and data warehouses
Transactional databases versus data warehouses
Partner effectively with other data teams
Manage data access and governance
Build production-ready pipelines
Review Google Cloud customer case study
Lab: Using BigQuery to do Analysis
Module 2: Building a Data Lake
Introduction to data lakes
Data storage and ETL options on Google Cloud
Building a data lake using Cloud Storage
Securing Cloud Storage
Storing all sorts of data types
Cloud SQL as a relational data lake
Lab: Loading Taxi Data into Cloud SQL
Module 3: Building a Data Warehouse
The modern data warehouse
Introduction to BigQuery
Getting started with BigQuery
Loading data
Exploring schemas
Schema design
Nested and repeated fields
Optimising with partitioning and clustering
Lab: Loading Data into BigQuery
Lab: Working with JSON and Array Data in BigQuery
Module 4: Introduction to Building Batch Data Pipelines
Module 5: Executing Spark on Dataproc
Module 6: Serverless Data Processing with Dataflow
Introduction to Dataflow
Why customers value Dataflow
Dataflow pipelines
Aggregating with GroupByKey and Combine
Side inputs and windows
Dataflow templates
Dataflow SQL
Lab: A Simple Dataflow Pipeline (Python/Java)
Lab: MapReduce in Dataflow (Python/Java)
Lab: Side Inputs (Python/Java)
Module 7: Manage Data Pipelines with Cloud Data Fusion and Cloud Composer
Building batch data pipelines visually with Cloud Data Fusion
Components
UI overview
Building a pipeline
Exploring data using Wrangler
Orchestrating work between Google Cloud services with Cloud Composer
Apache Airflow environment
DAGs and operators
Workflow scheduling
Monitoring and logging
Lab: Building and Executing a Pipeline Graph in Data Fusion
Optional Lab: An Introduction to Cloud Composer
Module 8: Introduction to Processing Streaming Data
Module 9: Serverless Messaging with Pub/Sub
Module 10: Dataflow Streaming Features
Module 11: High-Throughput BigQuery and Bigtable Streaming Features
Streaming into BigQuery and visualising results
High-throughput streaming with Cloud Bigtable
Optimising Cloud Bigtable performance
Lab: Streaming Analytics and Dashboards
Lab: Streaming Data Pipelines into Bigtable
Module 12: Advanced BigQuery Functionality and Performance
Analytic window functions
Using WITH clauses
GIS functions
Performance considerations
Lab: Optimising Your BigQuery Queries for Performance
Optional Lab: Partitioned Tables in BigQuery
Module 13: Introduction to Analytics and AI
Module 14: Prebuilt ML Model APIs for Unstructured Data
Unstructured data is hard
ML APIs for enriching data
Lab: Using the Natural Language API to Classify Unstructured Text
Module 15: Big Data Analytics with Notebooks
Module 16: Production ML Pipelines
Module 17: Custom Model Building with SQL in BigQuery ML
BigQuery ML for quick model building
Supported models
Lab option 1: Predict Bike Trip Duration with a Regression Model in BigQuery ML
Lab option 2: Movie Recommendations in BigQuery ML
Module 18: Custom Model Building with AutoML
Why AutoML?
AutoML Vision
AutoML NLP
AutoML Tables