Motivation: Large-Scale Data Processing
Many tasks: Process lots of data to produce other data
Want to use hundreds or thousands of CPUs
- ... but this needs to be easy
MapReduce provides:
- Automatic parallelization and distribution
- Fault tolerance
- I/O scheduling
- Status and monitoring
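
To make the "easy" part concrete, here is a minimal sketch of the division of labor, assuming a word-count job. The user writes only map_fn and reduce_fn; run_job is a hypothetical single-machine stand-in for the framework (its name, signature, and thread pool are illustrative assumptions, not the real MapReduce API). It shows only the parallel map and group-by-key steps, and does not attempt fault tolerance, I/O scheduling, or monitoring.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_fn(document):
    """User code: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def reduce_fn(word, counts):
    """User code: combine all values emitted for one key."""
    return word, sum(counts)


def run_job(inputs, mapper, reducer, workers=4):
    """Hypothetical stand-in for the framework (an assumption for illustration).

    Runs the map tasks on a thread pool, groups the emitted pairs by key,
    then calls the reducer once per key.
    """
    # Map phase: each input record is handed to a worker.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(mapper, inputs))

    # Shuffle phase: collect every value emitted for the same key.
    groups = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)

    # Reduce phase: one reducer call per key, in sorted key order.
    return dict(reducer(key, values) for key, values in sorted(groups.items()))


if __name__ == "__main__":
    docs = ["the quick brown fox", "the lazy dog", "The fox"]
    print(run_job(docs, map_fn, reduce_fn))
```

Note that the user-written functions contain no threading, grouping, or scheduling logic; all of that sits in run_job, which is the part a real framework would provide.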