Programming Assignment 7 — The Code Optimizer

Project Overview

Programming assignments 2 through 4 involved the constructed of the front-end (lexer, parser) and gatekeeping (semantic analyzer) stages of a compiler. Programming assignment 6 involved the correct generation of assembly instructions. In this assignment you will produce efficient assembly instructions with respect to a fixed cost model.

You may do this assignment in OCaml, Python, Haskell, JavaScript or Ruby. Almost invariably this assignment is done by modifying your PA6 submission.

You may work in a team of two people for this assignment. You may work in a team for any or all subsequent programming assignments. You do not need to keep the same teammate (but it may be tricky to work things out if you switch here at the end). The course staff are not responsible for finding you a willing teammate.

Goal

For this assignment you will write an optimizing code generator. This typically involves transforming the abstract syntax tree into a control-flow graph and applying a number of dataflow analyses and optimizations. In addition, students typically pattern-matching optimizations at the AST level and peephole optimizations at the assembly level.

In all cases, the optimized assembly code you emit must produce exactly the same result as the reference interpreter. The first rule of optimization is "don't break the program".

You do not have to worry about "malformed input" because the semantic analyzer (from PA4) has already ruled out bad programs.

The Specification

You must create three artifacts:
  1. A program that takes a single command-line argument (e.g., file.cl-type). That argument will be an ASCII text Cool class map, implementation map, parent map and annotated AST file (as described in PA4). Your program must emit either Cool Assembly Language (file.cl-asm) or x86-64 Assembly Language (file.s). Your program will consist of a number of OCaml files, a number of Python files, or a number of Ruby files.
  2. Your optimizer must construct a control-flow graph representation, perform an intraprocedural data-flow analysis, and use the results to eliminate dead code. (You may be more precise, for example with an interprocedural analysis, if you like.) This single optimization is necessary but not sufficient — you must also do additional optimizations.
  3. A plain ASCII text file called readme.txt describing your optimization design decisions and choice of benchmarks. You must also describe your approach to the control flow graph, dataflow analysis and dead code elimination requirement. You should describe any other optimizations you carried out. See the grading rubric. A few paragraphs should suffice.
  4. Two benchmarks benchmark1.cl and benchmark2.cl. Each benchmark should be a valid Cool program that takes 50,000 instructions or fewer to execute (via the reference compiler and --asm and --profile). Optimizers will be graded on the average of their performance on a pool of programs: that pool contains some baseline Cool programs as well as all benchmarks submitted by participants. You can thus "stack the deck" in your favor by constructing programs that you are very good at optimizing but that others may not be able to handle.

Scoring

The Optimizer assignment is an explicit competition. You will receive more credit if your optimizer is "better" than that of other students.

The performance of a Cool Assembly Language program is evaluated using the CYCLES output of cool --profile. The performance of an x86-64 program is evaluated using the cycles output of perf stat -r 100.

Each optimizer will be evaluated on the same set of benchmark programs. Your optimizer's "score" for a program is the factor reduction it obtains over the reference non-optimizing compiler expressed as a percentage. For example, if cool --asm on benchmark1 produces a program that takes 722564 cycles and your optimizer on benchmark1 produces a program that takes 551855 cycles, your speedup on benchmark1 is 1.31x. Note that scores below 1.00x are possible, but unlikely.

Your final score over the benchmark suite is the average of your scores on the individual benchmarks.

If your optimizer "breaks" any benchmark program, your overall score immediately becomes zero. If we deem your README to be inadequate, your overall score is cut in half. If an inspection of your code reveals that you are hard-coding the benchmark files (e.g., by including pattern matching that works for the baseline Cool benchmark programs but few others), your overall score will be cut in half.

Baseline Benchmarks

A baseline set of indicative benchmarks is available here: Your program will be judged on the baseline set as well as programs submitted by you and other students. The baseline benchmarks will probably count for more than benchmarks submitted by students. The baseline benchmarks were chosen so that reasonable optimizations, such as those discussed in class, would yield non-trivial improvements. Some of them are "realistic" (e.g., matrix multiplication, calculating pi, ...) while others are clearly "synthetic".

You can test your optimizer's performance (and see its average speedup) by submitting it to our testing server.

Commentary and Suggestions

Different optimizations work best on different representations. You may want to consider: If you haven't heard of one of optimizations above, search around for it on the Internet.

What To Turn In For PA7

You must turn in a zip file containing these files:
  1. readme.txt -- your README file
  2. benchmark1.cl -- a testcase
  3. benchmark2.cl -- a testcase
  4. source_files -- your implementation, including
Your zip file may also contain: Submit the file as you did for PA1.

Working In Pairs

You may complete this project in a team of two. Same rules as before.

Autograding

We will use scripts to run your program on various testcases. Same rules as before.