MapReduce algorithmsPosted: July 19, 2012
MapReduce is a software framework for distributed computing on huge data sets. It was created by Google and is used internally by Amazon, eBay, Facebook, Google, IBM, Nokia, Yahoo!, and many other organizations.
- Pick a computational problem that can be parallelized.
- Design a MapReduce-style algorithm to solve the problem. You will probably be able to reuse ideas from the vast literature on parallel algorithms.
- Implement the algorithm and analyze its performance.
Here are some ideas for important problems that might be worth investigating. If you’re interested in a specific problem, you’ll need to do a brief literature review to confirm that the problem is neither impossible, nor trivially easy, to solve with MapReduce.
- Matrix multiplication using Strassen’s algorithm
- Binary space partition
- Minimum spanning trees, possibly with a variant of Boruvka’s algorithm
- 2D arrangements
- 2D convex hulls or 3D convex hulls
- Voronoi diagrams
- Factoring using the Quadratic Sieve or General Number Field Sieve
- Graph separators
- Linear programming