MapReduce algorithms

MapReduce is a software framework for distributed computing on huge data sets.  It was created by Google and is used internally by Amazon, eBay, Facebook, Google, IBM, Nokia, Yahoo!, and many other organizations.

DiscoHolumbus, and Apache Hadoop are open source implementations of MapReduce, for Python, Haskell, and Java, respectively.  Other implementations are available as well.

Project:

  1. Pick a computational problem that can be parallelized.
  2. Design a MapReduce-style algorithm to solve the problem. You will probably be able to reuse ideas from the vast literature on parallel algorithms.
  3. Implement the algorithm and analyze its performance.

Here are some ideas for important problems that might be worth investigating. If you’re interested in a specific problem, you’ll need to do a brief literature review to confirm that the problem is neither impossible, nor trivially easy, to solve with MapReduce.

  1. Matrix multiplication using Strassen’s algorithm
  2. Binary space partition
  3. Minimum spanning trees, possibly with a variant of Boruvka’s algorithm
  4. 2D arrangements
  5. 2D convex hulls or 3D convex hulls
  6. Voronoi diagrams
  7. Factoring using the Quadratic Sieve or General Number Field Sieve
  8. Graph separators
  9. Linear programming
Advertisements


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s