Parallel Programming and MapReduce




Introduction

MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key.
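As a rough illustration of the map and reduce functions described above, the following sketch counts words across a few documents. It is a single-process toy, not the distributed implementation: the names `map_fn`, `reduce_fn`, and `map_reduce`, and the sample documents, are hypothetical and chosen only for this example. The grouping step in `map_reduce` stands in for the work a real framework does between the two phases.

```python
from collections import defaultdict


def map_fn(key, value):
    """Map: take (document name, document text), emit (word, 1) pairs."""
    return [(word.lower(), 1) for word in value.split()]


def reduce_fn(key, values):
    """Reduce: merge all counts emitted for one word into a single total."""
    return sum(values)


def map_reduce(inputs, mapper, reducer):
    """Sequential stand-in for a framework: map, group by key, then reduce."""
    groups = defaultdict(list)
    for key, value in inputs:
        for out_key, out_value in mapper(key, value):
            # Collect every intermediate value under its intermediate key.
            groups[out_key].append(out_value)
    return {key: reducer(key, values) for key, values in groups.items()}


# Hypothetical input: (document name, document text) pairs.
documents = [
    ("doc1", "the quick brown fox"),
    ("doc2", "the lazy dog and the fox"),
]
counts = map_reduce(documents, map_fn, reduce_fn)
```

Here `counts` maps each word to the number of times it appears across all documents. In a real deployment the map calls and the reduce calls would run in parallel on many machines; this sketch only shows the shape of the two user-supplied functions.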

Documents

• MapReduce HTML Slides
• MapReduce PDF Version

Lectures and Reading Material

• Introduction to Parallel Programming
• Advanced MapReduce Lecture #1
• Advanced MapReduce Lecture #2
• Advanced MapReduce Lecture #3

Quiz and Reading Questions

• Advanced MapReduce Quiz
• Advanced MapReduce Reading Questions

Coding Projects

• Advanced MapReduce Project #1
• Advanced MapReduce Project #2
• MapReduce Codelab
• Advanced MapReduce Group Exercises

University of Washington Lectures

• Lecture 1 - Introduction & Parallelization
• Lecture 2 - MapReduce: Theory and Implementation
• Lecture 3 - Networks and Distributed Systems
• Lecture 4 - Distributed File Systems
• Lecture 5 - Other Distributed Systems

University of Washington Labs

• Lab 1 - Introduction to MapReduce
• Lab 2 - A Simple Inverted Index
• Lab 3 - PageRank on the Wikipedia Corpus
• Lab 4 - Clustering the Netflix Movie Data

Links

• Open Source MapReduce: http://lucene.apache.org/hadoop/