MapReduce algorithms

MapReduce is a software framework for distributed computing on huge data sets.  It was created by Google and is used internally by Amazon, eBay, Facebook, Google, IBM, Nokia, Yahoo!, and many other organizations.

DiscoHolumbus, and Apache Hadoop are open source implementations of MapReduce, for Python, Haskell, and Java, respectively.  Other implementations are available as well.

Project:

  1. Pick a computational problem that can be parallelized.
  2. Design a MapReduce-style algorithm to solve the problem. You will probably be able to reuse ideas from the vast literature on parallel algorithms.
  3. Implement the algorithm and analyze its performance.

Here are some ideas for important problems that might be worth investigating. If you’re interested in a specific problem, you’ll need to do a brief literature review to confirm that the problem is neither impossible, nor trivially easy, to solve with MapReduce.

  1. Matrix multiplication using Strassen’s algorithm
  2. Binary space partition
  3. Minimum spanning trees, possibly with a variant of Boruvka’s algorithm
  4. 2D arrangements
  5. 2D convex hulls or 3D convex hulls
  6. Voronoi diagrams
  7. Factoring using the Quadratic Sieve or General Number Field Sieve
  8. Graph separators
  9. Linear programming
Advertisements