AWS Big Data Course

Course Curriculum: 

Lesson 01 – AWS in Big Data Introduction

Introduction to Cloud Computing 

Cloud Computing Deployment Models 

Amazon Web Services Cloud Platform 

The Cloud Computing Difference 

AWS Cloud Economics 

AWS Virtuous Cycle 

AWS Cloud Architecture Design Principles 

Why AWS for Big Data – Reasons 

Why AWS for Big Data – Challenges 

Databases in AWS 

Relational vs Non-Relational Databases 

Data Warehousing in AWS 

Services for Collecting, Processing, Storing, and Analyzing Big Data

1. Amazon Redshift 

2. Amazon Kinesis 

3. Amazon EMR 

4. Amazon DynamoDB 

5. Amazon Machine Learning 

6. AWS Lambda 

7. Amazon Elasticsearch Service 

8. Amazon EC2 (big data analytics software on EC2 instances)

Amazon Redshift 

Amazon Kinesis 

Amazon EMR 

Amazon DynamoDB 

Amazon Machine Learning 

AWS Lambda 

Amazon Elasticsearch Service 

Amazon EC2 (big data analytics software on EC2 instances)

Key Takeaway 

Knowledge Checks 

Lesson End Project

Lesson 02 – Collection 

Objectives 

Amazon Kinesis Fundamentals 

Loading Data into Kinesis Stream 

Kinesis Data Stream High-Level Architecture

Kinesis Stream Core Concepts 

Emitting Data from Kinesis Streams to AWS Services

Kinesis Connector Library 

Kinesis Firehose 

Transferring Data Using Lambda 

Amazon SQS 

IoT and Big Data 

IoT Framework 

AWS Data Pipeline 

AWS Data Pipeline Components 

Key Takeaway 

Knowledge Checks 

Lesson End Project 

Lesson 03 – Storage 

Objectives 

Introduction to AWS Big Data Storage Services

Amazon Glacier 

Glacier and Big Data 

DynamoDB Introduction 

The Architecture of the DynamoDB Table

DynamoDB in the AWS Ecosystem 

DynamoDB Partitions 

Data Distribution 

Local Secondary Index (LSI) 

Global Secondary Index (GSI) 

DynamoDB GSI vs LSI 

DynamoDB Streams 

Cross-Region Replication in DynamoDB

Partition Key Selection 

Snowball & AWS Big Data 

AWS DMS 

Amazon Aurora in Big Data 

Key Takeaway 

Knowledge Checks 

Lesson End Project

Lesson 04 – Processing I

Objectives 

Introduction to AWS Big Data Processing Services

Amazon Elastic MapReduce (EMR) 

Apache Hadoop 

EMR Architecture 

Storage Options 

EMR File Storage and Compression 

Supported File Formats and File Sizes 

Single-AZ Concept 

EMR Operations 

EMR Releases 

AWS Cluster 

Launching a Cluster 

Advanced EMR Setting Options 

Choosing Instance Type 

Number of Instances 

Monitoring EMR 

Resizing a Cluster 

Using Hue with EMR 

Setting Up Hue for LDAP 

Hive on EMR 

Hive Use Cases 

Key Takeaway 

Knowledge Checks 

Lesson End Project

Lesson 05 – Processing II 

HBase with EMR 

HBase Use Cases 

Comparison of HBase with Redshift and DynamoDB

HBase Architecture

HBase on S3 

HBase and EMRFS 

HBase Integration 

HCatalog 

Presto with EMR 

Advantages of Presto 

Presto Architecture 

Spark with EMR 

Spark Use Cases 

Spark Components 

Spark Integration with EMR 

AWS Lambda in the AWS Big Data Ecosystem

Limitations of Lambda 

Lambda and Kinesis Stream 

Lambda and Redshift 

Key Takeaway 

Knowledge Checks 

Lesson End Project 

Lesson 06 – Analysis I 

Objectives 

Introduction to AWS Big Data Analysis Services

Redshift 

Redshift Architecture 

Redshift in the AWS Ecosystem 

Columnar Databases 

Redshift Table Design 

Redshift Workload Management 

Loading Data into Redshift 

Redshift Maintenance and Operations 

Key Takeaway 

Knowledge Checks 

Lesson End Project

Lesson 07 – Analysis II 

Machine Learning 

Machine Learning – Use Cases 

Algorithms 

Amazon SageMaker 

Elasticsearch 

Amazon Elasticsearch Service 

Loading of Data into Elasticsearch 

Logstash 

Kibana 

RStudio 

Characteristics 

Athena 

Presto and Hive 

Integration with AWS Glue 

Comparison of Athena with Other AWS Services 

Lab: Run a Query on S3 Using Serverless Athena 

Key Takeaway 

Knowledge Checks 

Lesson End Project 

Lesson 08 – Visualization 

Objectives 

Introduction to AWS Big Data Visualization Services

Amazon QuickSight 

Amazon QuickSight – Use Cases 

Lab: Create an Analysis with a Single Visual Using Sample Data

Working with Data

Assisted Practice: TBD 

QuickSight Visualization 

Big Data Visualization 

Apache Zeppelin 

Jupyter Notebook 

Comparison Between Notebooks 

D3.js (Data-Driven Documents) 

MicroStrategy 

Key Takeaway 

Knowledge Checks 

Lesson End Project

Lesson 09 – Security 

Objectives 

Introduction to AWS Big Data Security Services

EMR Security 

Roles 

Private Subnet 

Encryption at Rest and in Transit 

Redshift Security 

KMS Overview  

CloudHSM 

Limit Data Access 

STS and Cross-Account Access 

CloudTrail 

Key Takeaway 

Knowledge Checks 

Lesson End Project