Spark/Scala MOOC + Capstone Project Now Live on Coursera!

Monday 13 March 2017

Heather Miller

We’re very excited to announce that our new MOOC “Big Data Analysis with Scala and Spark” and the Scala Specialization Capstone project are now live on Coursera!

Big Data Analysis with Scala and Spark

The Big Data Analysis with Scala and Spark course is 4 weeks long, and aims to teach those coming from Scala the basics of distributed computation using Apache Spark. It’s more than a whirlwind tour of Spark’s basic APIs: this course aims to teach you what is fundamentally going on when you run a distributed job, and it goes all the way from the basics of RDDs to intermediate/advanced topics such as tuning for data locality or writing your own aggregation logic.

The course features three auto-graded data analysis programming assignments, all based on real-life data sets. We hope that makes them even more fun, since you can explore the data and look for insights beyond the questions in the assignments!

You can sign up and participate in the course for free on Coursera (without a certificate of completion), as part of Coursera’s paid certificate program, or as part of the Scala Specialization (a Scala minidegree) on Coursera.

Topics that Big Data Analysis with Scala and Spark covers include:

  • Going from Data Parallel (shared memory) to Distributed Data Parallel
  • Basics of Spark’s distributed collections, RDDs
  • Pair RDDs, Spark’s distributed key-value pairs
  • Joins, Grouping, and Reduction Operations
  • Shuffling, and how to avoid it!
  • Relational Queries and Automatic Optimization with Spark SQL’s DataFrames
  • Unifying Performance and APIs with Spark SQL’s Datasets
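
To give a flavor of the pair RDD, reduction, and shuffling topics listed above, here is a minimal sketch. It is not taken from the course material. The object name `PairRddSketch`, the helper `totalsPerKey`, and the sample data are made up for illustration, and it assumes Spark's core library is on the classpath and that running in local mode is acceptable.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Illustrative sketch only; names and data are hypothetical.
object PairRddSketch {

  // Sums the values for each key of a pair RDD.
  // reduceByKey combines values within each partition before shuffling,
  // so less data crosses the network than with groupByKey followed by a sum.
  def totalsPerKey(pairs: RDD[(String, Int)]): RDD[(String, Int)] =
    pairs.reduceByKey(_ + _)

  def main(args: Array[String]): Unit = {
    // Local mode is assumed here so the sketch runs without a cluster.
    val conf = new SparkConf().setAppName("pair-rdd-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // A pair RDD is simply an RDD of key-value tuples.
      val purchases = sc.parallelize(Seq(
        ("alice", 10), ("bob", 5), ("alice", 7), ("bob", 1)
      ))
      // collect() brings the (small) result back to the driver.
      totalsPerKey(purchases).collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

The same totals could be computed with `groupByKey` followed by a sum, but that version ships every individual value across the network before adding anything up, which is the kind of avoidable shuffling the topic list refers to.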

Note that this is our first time running this course material as a MOOC, so your feedback on how we can improve the course is welcome! Also, please be patient with us as we work out any hiccups with our new graders.

[spark-course]

Scala Specialization Capstone

The Scala Specialization Capstone is 6 weeks long. It’s a project course, so rather than following lectures week to week, students are expected to apply the concepts they’ve learned in the Scala Specialization so far to a significant project meant to be undertaken over the full 6 weeks.

Students are challenged to implement, almost completely from scratch, a full application aimed at processing and visualizing several gigabytes of climate data. It covers not only data analysis, but also building ways for users to interactively visualize and explore this climate data. Students will combine everything that they’ve learned so far to process data and allow users to visualize it in their browsers, and to react to user input, using Scala.js.

Note that this is our first time running this course material as a MOOC, so your feedback on how we can improve the course is welcome! Also, please be patient with us as we work out any hiccups with our new graders.

[capstone]