Programming Assignment 7 — The Code Optimizer
Project Overview
Programming assignments 2 through 4 involved the constructed of the
front-end (lexer, parser) and gatekeeping (semantic analyzer) stages of a
compiler. Programming assignment 6 involved the correct generation of
assembly instructions. In this assignment you will produce efficient
assembly instructions with respect to a fixed cost model.
You may do this assignment in OCaml, Python or Ruby. Almost invariably
this assignment is done by modifying your PA6 submission.
You may work in a team of two people for this assignment. You may work in a
team for any or all subsequent programming assignments. You do not need to
keep the same teammate (but it may be tricky to work things out if you
switch here at the end). The course staff are not responsible for finding
you a willing teammate.
Goal
For this assignment you will write an optimizing code generator.
This typically involves transforming the abstract syntax tree into a
control-flow graph and applying a number of dataflow analyses and
optimizations. In addition, students typically pattern-matching
optimizations at the AST level and peephole optimizations at the assembly
level.
In all cases, the optimized assembly code you emit must produce exactly
the same result as the reference interpreter. The first rule of
optimization is "don't break the program".
You do not have to worry about "malformed input" because the semantic
analyzer (from PA4) has already ruled out bad programs.
The Specification
You must create three artifacts:
- A program that takes a single command-line argument (e.g.,
file.cl-type). That argument will be an ASCII text
Cool class map, implementation map, parent map and annotated AST
file (as described in PA4). Your program must emit either Cool
Assembly Language (file.cl-asm) or x86-64 Assembly
Language (file.s).
Your program will consist of a number of OCaml
files, a number of Python files, or a number of Ruby files.
- If you are targeting Cool Assembly Language, then
executing file.cl-asm in a Cool CPU Simulator
must produce the correct output for file.cl according to Cool's
operational semantics.
-
If you are targeting x86-64, then compiling file.s with
gcc on a 64-bit x86 Linux system must produce an executable
that, when run, produces the correct output for file.cl
according to Cool's operational semantics.
- You will only be given .cl-type files from programs that
pass the semantic analysis phase of the reference compiler. You are
not responsible for correctly handling (1+"hello") programs.
- Your optimizer must construct a control-flow graph
representation, perform an intraprocedural data-flow analysis, and
use the results to eliminate dead code. (You may be more precise, for
example with an interprocedural analysis, if you like.) This single
optimization is necessary but not sufficient — you must also
do additional optimizations.
- A plain ASCII text file called readme.txt describing your
optimization design decisions and choice of benchmarks. You must also
describe your approach to the control flow graph, dataflow analysis and
dead code elimination requirement. You should describe any other
optimizations you carried out. See the grading rubric. A few paragraphs
should suffice.
- Two benchmarks benchmark1.cl and benchmark2.cl. Each benchmark
should be a valid Cool program that takes 50,000 instructions or fewer to
execute (via the reference compiler and --asm and
--profile). Optimizers will be graded on the average of their
performance on a pool of programs: that pool contains some baseline Cool
programs as well as all benchmarks submitted by participants. You can thus
"stack the deck" in your favor by constructing programs that you are very
good at optimizing but that others may not be able to handle.
Scoring
The Optimizer assignment is an explicit competition. You will receive more
credit if your optimizer is "better" than that of other students.
The performance of a Cool Assembly Language
program is evaluated using the CYCLES output of cool
--profile. The performance of an x86-64 program is
evaluated using the cycles output of perf stat -r 100.
Each optimizer will be evaluated on the same set of benchmark programs.
Your optimizer's "score" for a program is the factor reduction it
obtains over the reference non-optimizing compiler expressed as a
percentage. For example, if cool --asm on benchmark1 produces
a program that takes 722564 cycles and your optimizer on benchmark1
produces a program that takes 551855 cycles, your speedup on benchmark1 is
1.31x. Note that scores below 1.00x are possible, but unlikely.
Your final score over the benchmark suite is the average of your scores on
the individual benchmarks.
If your optimizer "breaks" any benchmark program, your overall score
immediately becomes zero. If we deem your README to be inadequate, your
overall score is cut in half. If an inspection of your code reveals
that you are hard-coding the benchmark files (e.g., by including pattern
matching that works for the baseline Cool benchmark programs but few
others), your overall score will be cut in half.
Baseline Benchmarks
A baseline set of indicative benchmarks is available here:
Your program will be judged on the baseline set as well as programs
submitted by you and other students. The baseline benchmarks will probably
count for more than benchmarks submitted by students. The baseline
benchmarks were chosen so that reasonable optimizations, such as those
discussed in class, would yield non-trivial improvements. Some of them are
"realistic" (e.g., matrix multiplication, calculating pi, ...) while others
are clearly "synthetic".
You can test your optimizer's performance (and see its average speedup)
by submitting it to our testing server.
Commentary and Suggestions
Different optimizations work best on different representations.
You may want to consider:
- Use your CFG for additional dataflow analyses, such as:
- Constant folding and propagation
- Common subexpression elimination
- General AST pattern-matching optimizations, such as:
- Loop unrolling
- Loop-invariant code motion
- Inlining method or constructor calls
- Constant string merging
- Object-oriented optimizations, such as:
- Constant constructors
- "Unboxing" Integers
- "Unboxing" Bools, Jump Threading for conditionals
- Removing unused methods and fields
- Assembly-level optimizations, such as:
- Peephole optimization
- Strength reduction
- Code layout optimizations
- Register optimizations, such as:
- Using additional registers (such as R4 to R7 in Cool Assembly
Language) as the first few stack slots
- Making your own calling convention (e.g., for constructors)
- Full-blown register allocation
If you haven't heard of one of optimizations above, search around for it on
the Internet.
What To Turn In For PA7
You must turn in a zip file containing these files:
- readme.txt -- your README file
- Notably, between the README and your source code it must be
blindingly obvious where your CFG and dead-code elimination
handling take place
- benchmark1.cl -- a testcase
- benchmark2.cl -- a testcase
- source_files -- your implementation, including
- main.rb or
- main.py or
- main.ml
Your zip file may also contain:
- team.txt -- an optional file listing your other team member
(see below -- if you are not working in a team, do not include this file)
Submit the file as you did for PA1.
Working In Pairs
You may complete this project in a team of two. Same rules as before.
Autograding
We will use scripts to run your program on various testcases. Same rules as
before.