Compilers Practicum - Programming Assignment 6

Programming Assignment 7 — The Code Optimizer

Project Overview

Programming assignments 2 through 4 involved the constructed of the front-end (lexer, parser) and gatekeeping (semantic analyzer) stages of a compiler. Programming assignment 6 involved the correct generation of assembly instructions. In this assignment you will produce efficient assembly instructions with respect to a fixed cost model.

You may do this assignment in OCaml, Python, Haskell, JavaScript or Ruby. Almost invariably this assignment is done by modifying your PA6 submission.

You may work in a team of two people for this assignment. You may work in a team for any or all subsequent programming assignments. You do not need to keep the same teammate (but it may be tricky to work things out if you switch here at the end). The course staff are not responsible for finding you a willing teammate.

Goal

For this assignment you will write an optimizing code generator. This typically involves transforming the abstract syntax tree into a control-flow graph and applying a number of dataflow analyses and optimizations. In addition, students typically pattern-matching optimizations at the AST level and peephole optimizations at the assembly level.

In all cases, the optimized assembly code you emit must produce exactly the same result as the reference interpreter. The first rule of optimization is "don't break the program".

You do not have to worry about "malformed input" because the semantic analyzer (from PA4) has already ruled out bad programs.

The Specification

You must create three artifacts:

A program that takes a single command-line argument (e.g., file.cl-type). That argument will be an ASCII text Cool class map, implementation map, parent map and annotated AST file (as described in PA4). Your program must emit either Cool Assembly Language (file.cl-asm) or x86-64 Assembly Language (file.s). Your program will consist of a number of OCaml files, a number of Python files, or a number of Ruby files.
- If you are targeting Cool Assembly Language, then executing file.cl-asm in a Cool CPU Simulator must produce the correct output for file.cl according to Cool's operational semantics.
- If you are targeting x86-64, then compiling file.s with gcc on a 64-bit x86 Linux system must produce an executable that, when run, produces the correct output for file.cl according to Cool's operational semantics.
- You will only be given .cl-type files from programs that pass the semantic analysis phase of the reference compiler. You are not responsible for correctly handling (1+"hello") programs.
Your optimizer must construct a control-flow graph representation, perform an intraprocedural data-flow analysis, and use the results to eliminate dead code. (You may be more precise, for example with an interprocedural analysis, if you like.) This single optimization is necessary but not sufficient — you must also do additional optimizations.
A plain ASCII text file called readme.txt describing your optimization design decisions and choice of benchmarks. You must also describe your approach to the control flow graph, dataflow analysis and dead code elimination requirement. You should describe any other optimizations you carried out. See the grading rubric. A few paragraphs should suffice.
Two benchmarks benchmark1.cl and benchmark2.cl. Each benchmark should be a valid Cool program that takes 50,000 instructions or fewer to execute (via the reference compiler and --asm and --profile). Optimizers will be graded on the average of their performance on a pool of programs: that pool contains some baseline Cool programs as well as all benchmarks submitted by participants. You can thus "stack the deck" in your favor by constructing programs that you are very good at optimizing but that others may not be able to handle.

Scoring

The Optimizer assignment is an explicit competition. You will receive more credit if your optimizer is "better" than that of other students.

The performance of a Cool Assembly Language program is evaluated using the CYCLES output of cool --profile. The performance of an x86-64 program is evaluated using the cycles output of perf stat -r 100.

Each optimizer will be evaluated on the same set of benchmark programs. Your optimizer's "score" for a program is the factor reduction it obtains over the reference non-optimizing compiler expressed as a percentage. For example, if cool --asm on benchmark1 produces a program that takes 722564 cycles and your optimizer on benchmark1 produces a program that takes 551855 cycles, your speedup on benchmark1 is 1.31x. Note that scores below 1.00x are possible, but unlikely.

Your final score over the benchmark suite is the average of your scores on the individual benchmarks.

If your optimizer "breaks" any benchmark program, your overall score immediately becomes zero. If we deem your README to be inadequate, your overall score is cut in half. If an inspection of your code reveals that you are hard-coding the benchmark files (e.g., by including pattern matching that works for the baseline Cool benchmark programs but few others), your overall score will be cut in half.

Baseline Benchmarks

A baseline set of indicative benchmarks is available here:

cool-optimizer-benchmarks.zip (as of Wed Nov 11 15:20:48 EST 2009)

Your program will be judged on the baseline set as well as programs submitted by you and other students. The baseline benchmarks will probably count for more than benchmarks submitted by students. The baseline benchmarks were chosen so that reasonable optimizations, such as those discussed in class, would yield non-trivial improvements. Some of them are "realistic" (e.g., matrix multiplication, calculating pi, ...) while others are clearly "synthetic".

You can test your optimizer's performance (and see its average speedup) by submitting it to our testing server.

Commentary and Suggestions

Different optimizations work best on different representations. You may want to consider:

Use your CFG for additional dataflow analyses, such as:
- Constant folding and propagation
- Common subexpression elimination
General AST pattern-matching optimizations, such as:
- Loop unrolling
- Loop-invariant code motion
- Inlining method or constructor calls
- Constant string merging
Object-oriented optimizations, such as:
- Constant constructors
- "Unboxing" Integers
- "Unboxing" Bools, Jump Threading for conditionals
- Removing unused methods and fields
Assembly-level optimizations, such as:
- Peephole optimization
- Strength reduction
- Code layout optimizations
Register optimizations, such as:
- Using additional registers (such as R4 to R7 in Cool Assembly Language) as the first few stack slots
- Making your own calling convention (e.g., for constructors)
- Full-blown register allocation

If you haven't heard of one of optimizations above, search around for it on the Internet.

What To Turn In For PA7

You must turn in a zip file containing these files:

readme.txt -- your README file
- Notably, between the README and your source code it must be blindingly obvious where your CFG and dead-code elimination handling take place
benchmark1.cl -- a testcase
benchmark2.cl -- a testcase
source_files -- your implementation, including
- main.rb or
- main.py or
- main.js or
- main.hs or
- main.ml

Your zip file may also contain:

team.txt -- an optional file listing your other team member (see below -- if you are not working in a team, do not include this file)

Submit the file as you did for PA1.

Working In Pairs

You may complete this project in a team of two. Same rules as before.

Autograding

We will use scripts to run your program on various testcases. Same rules as before.