Programming Assignment 6 - Extra Credit - The Code Optimizer
Project Overview
Programming assignments 2 through 4 involved the construction of the
front-end (lexer, parser) and gatekeeping (semantic analyzer) stages of a
compiler. Programming assignment 5 involved the correct generation of
assembly instructions. In this assignment you will produce efficient
assembly instructions with respect to a fixed cost model.
You may do this assignment in OCaml, Python or Ruby. Almost invariably
this assignment is done by modifying your PA5 submission.
You may work in a team of two people for this assignment. You may work in a
team for any or all subsequent programming assignments. You do not need to
keep the same teammate (but it may be tricky to work things out if you
switch between PA5 and PA6). The course staff are not responsible for
finding you a willing teammate.
Goal
For this assignment you will write an optimizing code generator.
This typically involves transforming the abstract syntax tree into a
control-flow graph and applying a number of dataflow analyses and
optimizations. In addition, students typically apply pattern-matching
optimizations at the AST level and peephole optimizations at the assembly
level.
In all cases, the optimized assembly code you emit must produce exactly
the same result as the reference interpreter. The first rule of
optimization is "don't break the program".
You do not have to worry about "malformed input" because the semantic
analyzer (from PA4) has already ruled out bad programs.
The Specification
You must create three artifacts:
- A program that takes a single command-line argument (e.g.,
file.cl-type). That argument will be an ASCII text
Cool class map, implementation map, parent map and annotated AST
file (as described in PA4).
Your program must emit file.cl-asm, a Cool Assembly Language
program. Executing file.cl-asm in a Cool CPU Simulator
must produce the correct output for file.cl according to Cool's
operational semantics. Your program will consist of a number of OCaml
files, a number of Python files, or a number of Ruby files.
- You will only be given .cl-type files from programs that
pass the semantic analysis phase of the reference compiler. You are
not responsible for correctly handling (1+"hello") programs.
- A plain ASCII text file called readme.txt describing your
optimization design decisions and choice of benchmarks. See the grading
rubric. A few paragraphs should suffice.
- Two benchmarks test1.cl and test2.cl. Each benchmark
should be a valid Cool program that takes 50,000 instructions or fewer to
execute (as measured via the reference compiler with --asm and
--profile). Optimizers will be graded on the average of their
performance on a pool of programs: that pool contains some baseline Cool
programs as well as all benchmarks submitted by participants. You can thus
"stack the deck" in your favor by constructing programs that you are very
good at optimizing but that others may not be able to handle.
Scoring
The Optimizer assignment is an explicit competition. You will receive more
extra credit if your optimizer is "better" than that of other students.
Each optimizer will be evaluated on the same set of benchmark programs.
Your optimizer's "score" for a program is the speedup factor it
obtains over the reference non-optimizing compiler. For example, if
cool --asm on benchmark1 produces
a program that takes 722564 cycles and your optimizer on benchmark1
produces a program that takes 551855 cycles, your speedup on benchmark1 is
1.31x. Note that scores below 1.00x are possible, but unlikely.
Your final score over the benchmark suite is the average of your scores on
the individual benchmarks.
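The scoring arithmetic above can be sketched in a few lines; this is an illustration only (the helper names are invented here, not part of any course tooling), using the benchmark1 cycle counts from the example:

```python
# Illustrative sketch of the scoring arithmetic described above.
# Cycle counts would come from Cool CPU Simulator runs with --profile.

def speedup(baseline_cycles, optimized_cycles):
    """Score on one benchmark: the factor reduction in cycle count."""
    return baseline_cycles / optimized_cycles

def suite_score(results):
    """Final score: the average speedup over all benchmarks.
    `results` maps benchmark name -> (baseline, optimized) cycle counts."""
    scores = [speedup(base, opt) for base, opt in results.values()]
    return sum(scores) / len(scores)

# The benchmark1 example from the text: 722564 / 551855 is about 1.31x.
print(round(speedup(722564, 551855), 2))
```

Note that a score below 1.00x on one benchmark drags down the whole average, which is why "don't break (or slow down) the program" comes first.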
If your optimizer "breaks" any benchmark program, your overall score
immediately becomes zero. If we deem your README to be inadequate, your
overall score is cut in half. If an inspection of your code reveals
that you are hard-coding the benchmark files (e.g., by including pattern
matching that works for the baseline Cool benchmark programs but few
others), your overall score will be cut in half.
Baseline Benchmarks
A baseline set of indicative benchmarks is available here:
Your program will be judged on the baseline set as well as programs
submitted by you and other students. The baseline benchmarks will probably
count for more than benchmarks submitted by students. The baseline
benchmarks were chosen so that reasonable optimizations, such as those
discussed in class, would yield non-trivial improvements. Some of them are
"realistic" (e.g., matrix multiplication, calculating pi, ...) while others
are clearly "synthetic".
You can test your optimizer's performance (and see its average speedup)
by submitting it to our testing server.
Proposed Extra Credit
This section is subject to change.
We currently propose that any optimizer that is at least twice as good as cool
--asm --opt will earn a 2% overall course grade improvement. When you
submit your optimizer for testing, we will show you the reference
optimizer's performance as a comparison.
We
currently propose that the best optimizer in the class (or best two
optimizers) will earn a 5% overall course grade improvement. Doing a
stellar job on the optimizer also makes it easier for me to write you a
strong letter of recommendation, since it is a largely independent project
where you must synthesize and apply concepts from class on your own. For
letters of recommendation, the complexity of the optimizations you attempt
matters more than the final numerical speedup.
Commentary and Suggestions
Different optimizations work best on different representations.
You may want to consider:
- Constructing a CFG and performing dataflow analyses, such as:
- Constant folding and propagation
- Dead code elimination
- Common subexpression elimination
- General AST pattern-matching optimizations, such as:
- Loop unrolling
- Loop-invariant code motion
- Inlining method or constructor calls
- Constant string merging
- Object-oriented optimizations, such as:
- Constant constructors
- "Unboxing" Integers
- "Unboxing" Bools
- Jump threading for conditionals
- Removing unused methods and fields
- Assembly-level optimizations, such as:
- Peephole optimization
- Strength reduction
- Code layout optimizations
- Register optimizations, such as:
- Using R4 through R7 as the first four stack slots
- Making your own calling convention (e.g., for constructors)
- Full-blown register allocation
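As a concrete illustration of the AST-level techniques above, constant folding can be a simple bottom-up rewrite. This sketch uses an invented tuple-based expression shape for clarity; a real PA5 AST carries line numbers, static types, and many more node kinds, and real Cool arithmetic wraps at 32 bits, which a careful folder must respect:

```python
# Minimal AST constant-folding sketch. Expressions are tuples:
#   ('int', n), ('plus', e1, e2), ('times', e1, e2).
# (Illustrative node shapes only; not any particular PA5 representation.)

def fold(expr):
    """Bottom-up constant folding: fold children first, then this node."""
    tag = expr[0]
    if tag == 'int':
        return expr
    if tag in ('plus', 'times'):
        lhs, rhs = fold(expr[1]), fold(expr[2])
        if lhs[0] == 'int' and rhs[0] == 'int':
            op = (lambda a, b: a + b) if tag == 'plus' else (lambda a, b: a * b)
            return ('int', op(lhs[1], rhs[1]))  # NB: should wrap at 32 bits for Cool
        return (tag, lhs, rhs)
    return expr  # leave other node kinds untouched

# (1 + 2) * 5 folds to the single constant 15.
print(fold(('times', ('plus', ('int', 1), ('int', 2)), ('int', 5))))
```

Folding a subexpression to a constant often exposes further opportunities (dead branches, loop bounds), which is why these rewrites pay off most when combined with propagation.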
If you haven't heard of an optimization, search around for it on the Internet.
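At the assembly level, a peephole pass just slides a small window over the emitted instructions and rewrites wasteful patterns. The sketch below uses generic push/pop mnemonics as stand-ins, not actual Cool Assembly Language syntax:

```python
# Peephole sketch: scan a 2-instruction window over the emitted assembly.
# Mnemonics here are generic illustrations, not Cool Assembly syntax.

def peephole(lines):
    out = []
    i = 0
    while i < len(lines):
        # Pattern: a push immediately followed by a pop of the same
        # register restores exactly what was saved, so the pair is a no-op.
        if (i + 1 < len(lines)
                and lines[i].startswith('push ')
                and lines[i + 1] == 'pop ' + lines[i][len('push '):]):
            i += 2
            continue
        out.append(lines[i])
        i += 1
    return out

print(peephole(['push r1', 'pop r1', 'add r2 <- r2 r3']))
```

Deleting one pair can bring another pattern into the window, so peephole passes are usually run repeatedly until a fixpoint is reached.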
What To Turn In For PA6
You must turn in a zip file containing these files:
- readme.txt -- your README file
- benchmark1.cl -- a testcase
- benchmark2.cl -- a testcase
- source_files -- your implementation, including
- main.rb or
- main.py or
- main.ml
Your zip file may also contain:
- team.txt -- an optional file listing your other team member
(see below -- if you are not working in a team, do not include this file)
Submit the file as you did for PA1.
Working In Pairs
You may complete this project in a team of two. Same rules as before.
Autograding
We will use scripts to run your program on various testcases. Same rules as
before.