Project Overview

All students are expected to select and complete a course project. This typically involves writing a paper exploring a topic covered in the course (or a related topic which we didn't cover). Projects can be done individually or in groups of two.

The main goal of the project is to allow you to customize the content of the course to your own interests. The goal is not to force you all to produce novel results in one semester. Course projects like this often lead to collaborations that eventually yield exciting research. In the hopefully-likely event that you end up enjoying your project, come see me about taking it further (say, to publication).

Deliverables

There are three deliverables associated with the course project:

Project Proposal — a one-page PDF. The proposal should explain what you expect to learn from the project (i.e., why is it interesting to you?) and should include a work schedule.
Project Status Update — a half-page PDF. The status update should enumerate what you have accomplished by that point. It is designed to prevent too much work from being put off until the end.
Project Final Paper — a five-page PDF in ACM SIGPLAN conference proceedings format. The final paper should include a bibliography and will generally resemble a submission for a peer-reviewed workshop. For example, it should include an introduction section, a conclusion section, and a bibliography, among other elements.

Project Scale

I do not expect very fancy projects.

You should learn as much from working on your project as you would from two or three lectures.
20 hours should suffice. My spider sense tells me that if you actually put in one hour of work on your project after every lecture you would end up with the Project Of Ultimate Mastery.
Notably, "one long weekend" will not suffice. Trust me.
You are welcome to tackle a more ambitious project. Such a project should have "stages" so that you have something to show at the end of the semester. I (or your advisor) can provide extra guidance on such projects.
If you have a great idea and need help, feel free to work as a pair. However, given the scope of most projects, I think they are better done alone. Notably, the grading for a two-person project will actually require "twice as much work" rather than the standard "1.5-times as much work". You should be able to split up the project paper (e.g., "I did section 1, Grace Hopper did section 2 and we shared section 3") and I should be able to compare your part of the project to individual projects.

Project Types

There are three main types of projects.

A survey of the work in some area of language design.
An implementation of a small language or relevant algorithm.
Research in language design or language implementation.

Survey Projects

Pick an interesting area. Read at least six papers thoroughly and at least six other papers "superficially". I may provide some initial leads but you should do most of the work in tracking the relevant papers. Write a paper on what you have learned:

What are the basic problems?
What are the basic approaches to solving them?
What are the main achievements to date?
What are the open problems?
What further research or experiments do you propose in this area? Why?

Keep the scope narrow enough so that you can say something interesting.

Implementation Projects

Pick a fragment of a language or a relevant algorithm. Implement it. There is no firm limit on the amount of code. I have in mind 1,000 - 10,000 lines of code.

An implementation project should feature "numbers" — controlled, experimental results that help to sway your audience in favor a a point you are making. While we're on that subject, the project should actually have a claim: "I implemented a register allocator" is not quite acceptable. You'll want something more like: "Graph-coloring register allocation yields fewer spills and thus smaller and faster code than greedy register allocation. On the X benchmarks for the Y architecture, replacing a greedy register allocator in the Z compiler with a graph-coloring one resulting in a A% decrease in code size and a B% decrease in average executing time."

Your implementation must be relevant to language design or analysis. It could also be relevant to language implementation, provided that it has sufficient conceptual content and is close enough to the course.

Research Projects

There are several sorts of research projects:

Design: invent a language, or a language fragment, for some particular purpose or with some particular characteristic
Modeling: try to formalize some interesting aspect of some existing language
Implementation: explore novel techniques for implementing a given language fragment

These projects are the hardest because they always involve some survey work and often involve some implementation. You can always start with a survey project and then turn it into a research one: if you have good ideas one-third of the way in, implement them. If not, read many more papers. In any event, you still have to write the paper.

Example Project Topics

I encourage you to define your own project. However, a number of starting suggestions are also provided.

Suggestions For Survey Projects

Survey Topic	Description
Constructs for Concurrency	See Chapter 14 in Winskel for leads. You must go significantly beyond the simple pi calculus (which we'll probably cover in class).
Formal PL Semantics and Databases	Start with Arasu and Widom's A Denotational Semantics for Continuous Queries over Streams and Relations or Widom's A Denotational Semantics for the Starburst Production Rule Language for a veiw from the DB side. Then check out Goldsmith et al.'s Relational Queries Over Program Traces for a PL perspective. You might well grow this into an implementation project that checks embedded SQL statements (in, e.g., Java programs) against a formal model of your own devising.
Axiomatic semantics for parallel languages	See Apt and Olderog's Verification of Sequential and Concurrent Programs.
Region-Based Memory Management	Is it possible to have a type-safe language where the programmer has some control over the memory deallocation? Regions offer one possible approach. Start with the seminal paper by Talpin and then take a look at Cyclone.
Refinement Types	A method to refine types for the purpose of allowing the type checker to perform more extensive tests. For example we might want to refine the type of lists into two subtypes, the list of even length and those of odd length (or non-empty lists and possibly-empty lists). With such information a compiler can unroll loops involving lists. Start here.
Implementation choices for polymoprhism	Care must be taken when implementing polymorphic languages so that the representation size of all types are the same. This is sub-optimal and interesting schemes have been devised. Start with the overview in Greg Morrisett's thesis Compiling with Types.
Monads	Monads are a discipline for simulating the effects of imperative programming in logic or purely functional programming languages. Don't forget Haskell!
Continuation-passing style semantics	Continuation passing style is a form of operational semantics using continuations. You can start with Duba et al. paper Typing First-Class Continuations in ML from POPL '91 and Filinski's Linear Continuations in POPL '92.
Semantics of finalizers	Java has the notion of a finalizer, a function that is invoked when an object is garbage collected. This has some interesting consequences, mostly having to do with the ordering of finalizers. See the relevant section from the Java manual or from Microsoft to get started.
Languages with multiple inheritance	Java has multiple inheritance of specifications and C++ has also multiple inheritance of implementation. Explore the implications. A starting point might be Cardelli's paper A Semantics of Multiple Inheritance in Lecture Notes in Computer Science, vol 173, 1984, pp 51--67.
Fault localization	Numerous techniques for fault localization exist in the literature. Start with the venerable Tarantula and other spectrum-based techniques, include some approaches based on natural language, and cover at least one human study that evaluates how useful these techniques are in practice.
Any implementation topic	Implementation project topics (see below) can also typically be adapted to serve as survey project topics.

Suggestions For Implementation Projects

Implementation Topic	Description
Automated program repair	Design and implement an algorithm that is better than, or comparable to, an existing automated repair algorithm — but possibly only for a particular problem domain. Reasonable baselines for comparison include GenProg, Angelix/SemFix, etc.
Receiver class analysis with abstract interpretation	Design and implement an analysis to infer the run-time subclass of a value statically declared to have an abstract class. This can be used in compilers for object-oriented languages to optimize the run-time method dispatch.
Natural language analysis	Start with Gable and Su's "A study of the uniqueness of source code", and follow up with Hindle et al.'s "On the naturalness of software". Implement and apply the sort of n-gram or entropy-based natural language analysis they describe to software.
Range analysis with abstract interpretation	Design and implement an analysis to infer ranges for integer variables. The results of such analyses can be used to optimize array-bounds checking in type-safe languages such as Java, Modula-3 and ML.
Implementation choices for exceptions	Read about the implementation of exceptions in C++, ML and Java. Try these implementations on a small fragment of lambda-calculus with exceptions.
Semantics for Dijkstra's guarded-command language	The guarded-command language is a very simple nondeterministic language. It is described in Winskel Chapter 14 and also in Dijkstra's book A Discipline of Programming. A verification condition generator for a GCL might be interesting.

Suggestions For Research Projects

If you are interested in a hearing about possible research projects on topics related to this class, please come speak to me in person. I have a number of them available.