Anthony Darmiento, a CSUF Kinesiology grad student, is looking to collaborate on a project. This could be a suitable MS Project/Thesis topic. Anthony writes:
My name is Anthony Darmiento and I am currently graduate student at CSU Fullerton studying Kinesiology. Per graduation requirements I am working on a project and require some collaborative help. The project involves creating a device related to sport performance and exercise. Specifically, the device measures kinematics of a barbell (weightlifting bar) in a way that no device out there is currently capable of matching. This device will have practical application related to Olympic Weightlifting, Powerlifting, Crossfit, Sport Performance, Strength and Conditioning and many other areas of fitness, exercise and training. If you have one or more of the skills desired below and are interested in collaborating on the design and creation of a state of the art device please contact Anthony Darmiento at Darmiento.firstname.lastname@example.org
Individuals with experience in one or more of the following areas are the best fit:
- Microcontroller programming (C language preferred)
- Digital/Analog Filter Design.
- Experience with wireless communication protocols (Wifi, Bluetooth…)
- PCB Design experience.
- Objective-C (iPhone app/iOS development)
- SQL or other querying language (database creation and integration)
I teach courses involving programming, but I don’t have a good way of collecting and evaluating student submissions. I think a web application would be a great way of doing this. I’m inspired by the Project Euler and UVa Online Judge systems, which allow users to submit short programs, which are then tested and scored automatically by the system. Those sites don’t work in the educational setting, though, where instructors need to evaluate submissions by their own criteria, enforce deadlines and plagiarism policies, and protect student privacy.
Implementing and supporting a complete system with every feature on my wishlist is far more than a one-semester project. I’d expect that a student could implement a prototype, with a minimal set of essential features, as a project. Then other students could add features and improvements in the future.
I think the minimal requirements for a one-semester prototype are
- Represent the following as first-class objects: user, class, class section, assignment, submission attempt, instructor assessment.
- Roles: site administrator, instructor, student.
- Instructors create, edit, delete, and administer classes, sections, and assignments.
- Students can associate with sections, view assignments, and submit assignment attempts.
- An assignment attempt is a single source code file for the C++ language.
- Assignments may have deadlines, after which submissions may not be made.
- The system validates the source code, to sanity check that it resembles C++ source, is of an appropriate size, and is not obviously malicious.
- Code that passes validation is compiled with g++.
- Code that compiles without errors is run, and tested for correctness against a set of instructor-provided input/output text files.
- Submitted code is run in some kind of sandbox to prevent attackers from doing something like deleting the entire filesystem. A chroot or FreeBSD jail is probably the easiest way of doing this.
- Code that runs too long must be killed to prevent a denial-of-service attack from infinite loops.
- The verdict of each submission is emailed to the student and visible to the student and instructor on the website. Possible verdicts are: upload error, failed validation, did not compile, exceeded timeout, incorrect output.
- Instructors may convert verdicts to numeric scores and export them to CSV files for the purposes of grading.
- All activity should be logged to a text file.
- Data should be stored in a form (e.g. database) that can be backed up, migrated to a different host, and exported to a human-readable form.
- Language must be something that I, and a critical mass of other CSUF students, know. Preferably Python, but I’m open to Java, C++, C#, Scheme, or Haskell if there’s a compelling reason. No PHP.
- Host code in a public source code control system (e.g. SourceForge or Google Code) under an open source license.
Future projects could add other important features:
- Support for Python, and possibly other, submission languages.
- More elaborate security precautions, such as running each submission in its own virtual machine, and reporting suspected malicious submissions to the site admin.
- Support multiple C++ compilation environments (Visual Studio, Xcode, *nix g++, *nix clang).
- Synchronizing user lists with Moodle.
- Synchronizing grades with Moodle.
- Automated plagiarism detection, perhaps using MOSS.
- Use CSUF usernames/passwords for authentication.
- More elaborate submission correctness tests. Perhaps incorporate ideas from unit testing. Or allow the instructor to provide a reference implementation.
- Fuzzy output correctness testing; ignore whitespace and capitalization differences, interpret “1” and “01” as the same thing, be robust to floating point rounding errors, etc.
- Runtime correctness testing; allow the instructor to provide a timeout value for specific queries. Even better, allow them to provide a big-O time bound.
- Group submissions.
- Additional roles: grader, guest instructor, guest student.
- A failure viewer that shows students the discrepancies in their program’s output that is causing it to fail. Incorporate ideas from diff viewers such as highlighting differences and collapsing identical sections.
- Multiple-file submissions.
- Support instructor-supplied data files that are moved into each submission’s working directory.
- Output file formats other than text; especially media such as images or sound files.
- Allow assignments to require supporting materials, such as a lab report in PDF format.
- Allow instructors to clone classes or assignments.
- Allow instructors to edit assignments offline.
There are a lot of ways to go with this. But right now I have nothing, so I’m more interested in getting the ball rolling than any of the particular requirements listed above.
UPDATE (10/16/2012): I’ve learned about two similar projects:
Rather than writing something from scratch, it may make more sense to build on one of those projects.
A Graphics Processing Unit (GPU) is a kind of computer processor originally designed to render 3D graphics for computer games. It turns out that the kinds of computations that GPUs do can be applied to many other general purpose computing problems, and in some cases GPU algorithms can be vastly faster than traditional CPU algorithms. Computational geometry is a branch of theoretical computer science concerned with algorithms and data structures for geometric objects.
Project: Implement a novel data computational geometry data structure and/or algorithm on the GPU. Measure its performance and confirm that it is faster than comparable CPU implementations.
Some ideas for worthwhile data structures or algorithms:
- A doubly connected edge list (DCEL) or comparable structure. This is an important one; many algorithms depend on using a DCEL structure.
- Low-dimension linear programming, convex programming, or LP-type programming.
- k-dimension convex hulls. 2D and 3D convex hulls have already been done.
- Voronoi diagrams.
- Binary space partitions.
- Range trees.
- Quad trees.
- Planar graph separators.
- Well separated pairs decomposition.
- Anything else implemented in CGAL.
- Problems on my list of MapReduce topics might apply too.
MapReduce is a software framework for distributed computing on huge data sets. It was created by Google and is used internally by Amazon, eBay, Facebook, Google, IBM, Nokia, Yahoo!, and many other organizations.
- Pick a computational problem that can be parallelized.
- Design a MapReduce-style algorithm to solve the problem. You will probably be able to reuse ideas from the vast literature on parallel algorithms.
- Implement the algorithm and analyze its performance.
Here are some ideas for important problems that might be worth investigating. If you’re interested in a specific problem, you’ll need to do a brief literature review to confirm that the problem is neither impossible, nor trivially easy, to solve with MapReduce.
- Matrix multiplication using Strassen’s algorithm
- Binary space partition
- Minimum spanning trees, possibly with a variant of Boruvka’s algorithm
- 2D arrangements
- 2D convex hulls or 3D convex hulls
- Voronoi diagrams
- Factoring using the Quadratic Sieve or General Number Field Sieve
- Graph separators
- Linear programming
A musician playing a guitar, or a related fretted instrument, forms musical chords by arranging up to four fingers on the instrument’s fretboard. The positions of the fingers dictate the musical pitch that is generated by each string. Strings can also be muted so that they do not produce any sound. For every common musical chord, there are several known frettings, or ways of playing that chord, and these frettings are available in databases such as chordbook.com. However, it would be interesting and useful to generate these frettings algorithmically.
This is a difficult computational problem. It amounts to choosing a subset of finger positions, and the number of subsets grows exponentially in the number of finger positions, of which there are hundreds. Fender Stratocasters typically have 6 strings, each of which may be left open, muted, or sounded in 21 different frets, for a total of 6*23 = 138 finger positions and 2^138 sets of finger positions.
However, there are also many constraints on which fingerings are valid. A fingering must only generate musical pitches that are actually in the desired chord, and in the Western musical temperament there are only 12 pitches to worry about. Also, many conceivable fingerings are impossible to play due to anatomical limitations of the human hand.
The problem of generating guitar chords algorithmically can be posed as a branch-and-bound problem, where each step involves adding one finger to a candidate fingering. This formulation limits the depth of the search to the number of fingers, which is typically four. The branching factor is the number of legal places to position each next finger, which, naively, is large. However musical and anatomical constraints make the effective branching factor much smaller.
Alternatively, using dynamic programming, integer linear programming, a constraint solver, or logic programming, may be feasible. I haven’t thought about these approaches as much.
Go is a classical strategic board game. It has very simple rules, with only two kinds of pieces, yet requires a depth of strategy similar to Chess. The computational problem of making optimal Go movies is computationally difficult. However, there is a simple randomized algorithm using Monte Carlo methods that seems to work well in practice:
- Enumerate all the liberties (legal moves)
- For each liberty, play out a random game where players take turns making completely random, but legal moves. Try k random trials, and keep track of the number of times w that the computer player wins.
- Choose the liberty with the highest win ratio w/k
This is a nice algorithm because it is simple, the player strength and running time can be tuned by adjusting the k parameter, it is embarrasingly parallel, and it seems to work well against human opponents. The basic algorithm has already been designed, implemented, and documented, but there is room for more work in this area.
Project: Implement a Monte-Carlo Go player. The project can be extended by parallelizing the algorithm, or by developing the player on a resource-constrained mobile device.
Wikipedia is the online encyclopedia that everyone can edit. It would be nice to be able to carry around an offline copy of Wikipedia for many reasons, such as reading on mobile devices when network service is unavailable, or as an archive in case of a prolonged Internet outage. Wikipedia stores a lot of data, so compressing it would be helpful. There is even a contest for solving this problem well.
General purpose text compression algorithms such as LZW (GZip) and BWT (bzip2) do a decent job, but it is possible to do even better. Standard text compression algorithms limit themselves to linear-time algorithms for obvious reasons. However, we might be able to do better with a relatively high-order polynomial-time algorithm. One approach would be to use a polynomial-time approximation scheme for a NP-complete problem that corresponds to text compression.
- Reduce hypertext compression to a known approximable NP-complete problem.
- Implement the approximation algorithm.
- Compare the compression performance to competitors.