Courses

INFO-H 516 Cloud Computing for Data Science

3 credits

Prerequisite(s): CSCI 54100, LIS S511, INFO B512, or INFO B556; prior programming experience required
Delivery: On-Campus

Description

This course covers data science concepts, techniques, and tools to support big data analytics, including cloud computing, parallel algorithms, nonrelational databases, and high-level language support. The course applies the MapReduce programming model and virtual-machine utility computing environments to data-driven discovery and scalable data processing for scientific applications.
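As an informal illustration of the MapReduce programming model named above, the following single-machine sketch counts words in three phases: map, shuffle, and reduce. It is not taken from the course materials. The function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample records are hypothetical, and a real deployment would run these phases in parallel across a cluster using a framework such as Hadoop or Spark rather than in plain Python.

```python
from collections import defaultdict


def map_phase(record):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(records):
    """Run the three phases in sequence on a list of text records."""
    pairs = [pair for record in records for pair in map_phase(record)]
    groups = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in groups.items())


if __name__ == "__main__":
    # Hypothetical sample input, for illustration only.
    sample = ["big data in the cloud", "data parallel processing in the cloud"]
    print(word_count(sample))
```

Because each call to `map_phase` depends only on its own record, and each call to `reduce_phase` only on its own key, both phases can in principle be distributed across many machines, which is the property the model relies on for scalable data processing.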

Topics

Clouds with infrastructure, platform, and software as a service
Virtualization technologies and tools
MapReduce and data-parallel applications using Apache Spark
Apache Hadoop Distributed File System
YARN cluster resource management and Mesos distributed system kernel
Large-scale data storage: NoSQL databases (Google Bigtable and Apache HBase) and parallel query processing
Large-scale machine learning: Classification, regression, and clustering using MLlib
Spark Streaming
Amazon Web Services (EC2 and S3) and its applications
Exploring large spatiotemporal datasets

Learning Outcomes

Research the main concepts, models, technologies, and services of cloud computing, the reasons for the shift to this model, and its advantages and disadvantages.
Examine the technical capabilities and commercial benefits of hardware virtualization.
Analyze tradeoffs for data centers in performance, efficiency, cost, scalability, and flexibility.
Evaluate the core challenges of cloud computing deployments, including public, private, and community clouds, with respect to privacy, security, and interoperability.
Create cloud computing infrastructure models.
Demonstrate and compare the use of cloud storage vendor offerings.
Develop, install, and configure cloud-computing applications under software-as-a-service principles, employing cloud-computing frameworks and libraries.
Apply the MapReduce programming model to data analytics in informatics-related domains.
Enhance MapReduce performance by redesigning the system architecture (e.g., provisioning and cluster configurations).
Overcome difficulties in managing very large datasets, both structured and unstructured, using nonrelational data storage and retrieval (NoSQL), parallel algorithms, and cloud computing.
Apply the MapReduce programming model to data-driven discovery and scalable data processing for scientific applications.

Policies and Procedures

Please be aware of the following linked policies and procedures. Note that instructors may have additional stipulations specific to their individual courses.

Luddy School of Informatics, Computing, and Engineering