The first step is to submit a project proposal. The proposal should explain what you expect to learn from the project (i.e., why is it interesting to you?) and should include a work schedule. Make sure to budget time for writing a short project paper (max 10 pages) describing the project and for preparing a short (15-20 minute) project presentation during the last week of classes. The project proposal should be one or two pages long. The main purpose of the proposal is for me to give you feedback on its feasibility (e.g., to warn you if you've decided to solve P=NP in one semester).
The main goal of the project is to allow you to customize the content of the course to your own interests. The goal is not to force you all to produce novel results in one semester. Course projects like this often lead to collaborations that eventually yield exciting research. In the hopefully-likely event that you end up enjoying your project, come see me about taking it further (say, to publication).
An implementation project should feature "numbers" -- controlled, experimental results that help to sway your audience in favor a a point you are making. While we're on that subject, you should actually have a point: "I implemented a register allocator" is not quite good enough. You'll want something more like: "Graph-coloring register allocation yields fewer spills and thus smaller and faster code than greedy register allocation. On the X benchmarks for the Y architecture, replacing a greedy register allocator in the Z compiler with a graph-coloring one resulting in a A% decrease in code size and a B% decrease in average executing time."
Your implementation must be relevant to language design or analysis. It could also be relevant to language implementation, provided that it has sufficient conceptual content and is close enough to the course. Graph-coloring register allocation wouldn't actually cut it. Write a paper on your work.
You will write your project paper as if you were submitting to the ACM SIGPLAN Conference on Programming Language Design and Implementation (aka "PLDI" -- at least the second best venue in this field). LaTeX class files can be found here. Turn in the complete 10-page PDF as well as your LaTeX source.
The 2005 accepted papers for PLDI can be found here. Aside from giving you a number of data points for how the paper should look graphically, reading the electronic editions might help you to find a topic. Extending previously-published work is usually not a bad start.
While preparing the talk keep in mind who your audience is. You will be presenting to colleagues who are eager to find out (1) about new exciting facets of programming languages and (2) how much fun you had. Plan to motivate the project (i.e., why is this important?) and to describe what you learned from it. Keep in mind that your colleagues have not read all the papers that you have read to do the project.
Check out the Speech Evaluation Form for more information. Your peers will use this form to rate your presentation and will turn in a copy to me.
Survey Topic | Description |
Constructs for Concurrency | See Chapter 14 in Winskel for leads. You must go significantly beyond the simple pi calculus (which we'll probably cover in class). |
Formal PL Semantics and Databases | Start with Arasu and Widom's A Denotational Semantics for Continuous Queries over Streams and Relations or Widom's A Denotational Semantics for the Starburst Production Rule Language for a veiw from the DB side. Then check out Goldsmith et al.'s Relational Queries Over Program Traces for a PL perspective. You might well grow this into an implementation project that checks embedded SQL statements (in, e.g., Java programs) against a formal model of your own devising. |
Axiomatic semantics for parallel languages | See Apt and Olderog's Verification of Sequential and Concurrent Programs. |
Region-Based Memory Management | Is it possible to have a type-safe language where the programmer has some control over the memory deallocation? Regions offer one possible approach. Start with the seminal paper by Talpin and then take a look at Cyclone. |
Constraint-based program analysis | Start with Alex Aiken's talk on the topic. The read more of the literature. |
Refinement Types | A method to refine types for the purpose of allowing the type checker to perform more extensive tests. For example we might want to refine the type of lists into two subtypes, the list of even length and those of odd length (or non-empty lists and possibly-empty lists). With such information a compiler can unroll loops involving lists. Start here. |
Implementation choices for polymoprhism | Care must be taken when implementing polymorphic languages so that the representation size of all types are the same. This is sub-optimal and interesting schemes have been devised. Start with the overview in Greg Morrisett's thesis Compiling with Types. |
Monads | Monads are a discipline for simulating the effects of imperative programming in logic or purely functional programming languages. Don't forget Haskell! |
Continuation-passing style semantics | Continuation passing style is a form of operational semantics using continuations. You can start with Duba et al. paper Typing First-Class Continuations in ML from POPL '91 and Filinski's Linear Continuations in POPL '92. |
Semantics of finalizers | Java has the notion of a finalizer, a function that is invoked when an object is garbage collected. This has some interesting consequences, mostly having to do with the ordering of finalizers. See the section from the Java Reference Manual: 12.6 Finalization of Class Instances. |
Languages with multiple inheritance | Java has multiple inheritance of specifications and C++ has also multiple inheritance of implementation. Explore the implications. A starting point might be Cardelli's paper A Semantics of Multiple Inheritance in Lecture Notes in Computer Science, vol 173, 1984, pp 51--67. |
Implementation Topic | Description |
Receiver class analysis with abstract interpretation | Design and implement an analysis to infer the run-time subclass of a value statically declared to have an abstract class. This can be used in compilers for object-oriented languages to optimize the run-time method dispatch. |
Range analysis with abstract interpretation | Design and implement an analysis to infer ranges for integer variables. The results of such analyses can be used to optimize array-bounds checking in type-safe languages such as Java, Modula-3 and ML. |
Implementation choices for exceptions | Read about the implementation of exceptions in C++, ML and Java. Try these implementations on a small fragment of lambda-calculus with exceptions. |
Semantics for Dijkstra's guarded-command language | The guarded-command language is a very simple nondeterministic language. It is described in Winskel Chapter 14 and also in Dijkstra's book A Discipline of Programming. A verification condition generator for a GCL might be interesting. |
Domain-Specific Language Bug-Finding | Choose a DSL for which a large number of "programs" are available (e.g., UnrealScript, Excel Spreadsheets, Infinity Engine Scripting). Troll around in the DSL communities until you find common mistakes or complaints. Write a bug-finding tool that locates mistakes in such programs. This can be as "simple" as FindBugs or as complex as BLAST. Aim for a data-flow analysis of some sort. |
Domain-Specific Language Modeling | Choose a DSL (as above) with non-trivial semantics. Common mistakes and complaints can point you toward areas of complexity. Come up with a formal type system or operational semantics (as appropriate) to capture the essence of some feature. For example, you might look at type qualifiers like "transient" and "travel" in UnrealScript and then build up an operational semantics model (possibly by extending or translating to the pi calculus) that describes them. You might use Mosterman and Vangheluwe's Towards and Executable Denotational Semantics for Causal Block Diagrams (which nominally centers on embedded control systems) as an entry-point into the literature. |
Language-Based Security Analysis | Using program analyses to spot certain classes of security errors is an approach that has been rapidly gaining in popularity. You might get started with David Wagner's work (e.g., type qualifiers and format string vulnerabilities, model checking linux for security violations), then talk to Dave Evans, then pick one of the top 20 security vulnerabilities, then write a program analysis for detecting it. Pay special attention to false positives and false negatives. |
TinyOS and nesC projects. | Quite a few such projects were suggested by David Gay. TinyOS is particularly appropriate if you like operating systems, embedded systems or networking. |
Custom Memory management. |
Have you ever wondered why type-safe languages let you almost
everything except manage your memory? Have you noticed that virtually
all large software systems (e.g. Apache, Linux, gcc) have their own
customized allocators? Apparently it pays off to have an allocator that
exploits the details of the particular strategy for allocation and
deallocation that your application uses. All of this is not available to
users of type-safe languages. Can we design a mechanism that allows
type-safe user-specified memory management?
There are two things that are unsafe about manual memory management: you have a buffer of elements of type char and you carve it out into pieces which are then treated as having some other types; and (much more serious) in the presence of manual deallocation you have the risk of deallocating too early and thus having dangling pointers. Deallocating too late is also bad but technically not unsafe. |
Memory management analysis. | Scott McPeak has implemented an extension to the Boehm-Weiser garbage collector to monitor the execution of a program. Periodically, the program is paused and all allocated blocks are scanned for pointers. Then a graph is constructed with the nodes being the blocks and the edges the pointers between them. You will see that there is a lot of structure in these graphs: you can recognize linked lists, doubly linked lists and even leaked structures. Build an analysis tool that examines this graph (which also contains allocation sites) and tries to predict structural invariants for the data structures at various points in the program. Ambitious: Then, take these invariants and try to verify statically that they are preserved on all executions. |
Type-safe polymorphism in C. | In the C programming language you must use void* with a lot of unsafe casting to implement polymorphic functions (functions that can operate on object of different types). In Java there are two mechanisms. Generics (in the most recent versions) allow you to implement parametric polymorphism (e.g., a list data structure in which all elements are known to be of the same type). Java also has subtype polymorphism (a function that can operate on Object, can be invoked with an object of any type). Parametric polymorphism has the advantage that it does not require run-time checking. This project would explore extensions of the C programming language with support for polymorphism. We want these extensions to be minimal and preferably to not break compatibility with existing programs (e.g., a module compiled with your extended compiler can still be linked with a library module that has been compiled with standard compilers). |
Specifying pointer-based data-structures. | Programmers have a simple but effective vocabulary to communicate invariants about many common data structures (an array of circular lists, a tree with the leaves linked in a circular list). Unfortunately, we do not know of a similarly effective mechanism to communicate these invariants to a tool that can enforce them. Types are probably too weak for this. Full-blown specifications are too hard to use. George Necula has some work in this direction but I think that a lot more can be done. |