Programming Assignment 6 — The Code Generator
Project Overview
Programming assignments 2 through 4 involved the constructed of the
front-end (lexer, parser) and gatekeeping (semantic analyzer) stages of a
compiler. In this assignment you will write a program that generates
assembly instructions.
You may do this assignment in OCaml, Python, JavaScript, Haskell or Ruby.
You may work in a team of two people for this assignment. You may work in a
team for any or all subsequent programming assignments. You do not need to
keep the same teammate. The course staff are not responsible for finding
you a willing teammate.
Goal
For this assignment you will write a code generator. Among other
things, this involves implementing the operational semantics specification
of Cool. You will track enough information to generate legitimate run-time
errors (e.g., dispatch on void). You do not have to worry about "malformed
input" because the semantic analyzer (from PA4) has already ruled out bad
programs.
You will also write additional code to unserialize the class map,
implementation map, parent map, and annotated AST produced by the semantic
analyzer.
The Specification
You must create three artifacts:
- A program that takes a single command-line argument (e.g.,
file.cl-type). That argument will be an ASCII text
Cool class map, implementation map, parent map and annotated AST
file (as described in PA4). Your program must emit either Cool
Assembly Language (file.cl-asm) or x86-64 Assembly
Language (file.s).
Your program will consist of a number of OCaml
files, a number of Python files, or a number of Ruby files.
- If you are targeting Cool Assembly Language, then
executing file.cl-asm in a Cool CPU Simulator
must produce the correct output for file.cl according to Cool's
operational semantics.
-
If you are targeting x86-64, then compiling file.s with
gcc on a 64-bit x86 Linux system must produce an executable
that, when run, produces the correct output for file.cl
according to Cool's operational semantics.
- You will only be given .cl-type files from programs that
pass the semantic analysis phase of the reference compiler. You are
not responsible for correctly handling (1+"hello") programs.
- A plain ASCII text file called readme.txt describing your
design decisions and choice of test cases. See the grading rubric. A few
paragraphs should suffice.
- Testcases test1.cl, test2.cl, test3.cl and
test4.cl. The testcases should exercise compiler and run-time
error corner cases.
Assembly Language
You have your choice of either Cool Assembly Language or x86-64
Assembly Language for this project. You may pick either one, or even switch
if you don't like your initial choice (but the deadlines remain unchanged).
Cool Assembly Language is a simplified RISC-style assembly language
that is similar to Java Bytecode.
- Pro: It is dynamically "well typed", so if you produce assembly code
that tries to add a string to a code label, the system will notice
immediately (rather than producing garbage and unhelpfully crashing
later).
- Pro: It is portable and works anywhere the Reference Compiler works.
- Pro: It abstracts away notions like system calls and string handling
so that you can focus on code generation and layout.
- Pro: It has just enough instructions to directly compile Cool
and no more — learning it is simple.
- Pro: It includes built-in debugging and tracing suppot.
- Con: Because it is a simualted bytecode, you cannot use GDB or
Visual Studio to debug your output.
- Con: Because it is a simulated bytecode, you cannot actually "run
the raw .exe" on your computer.
x86-64 Assembly Language is the real deal — it's what most
desktop machines use.
- Pro: You get a strong feeling of accomplishment for working close to
"the bare metal" and using a real-world assembly language.
- Pro: You can use GDB or Visual Studio to step through and debug your
resulting executable.
- Pro: For the subsequent optimization assignment, you can use any
instruction, power or trick that is x86-64 legal to squeeze every last
drop of performance out of your code.
- Con: There are many dialect and syntax options of "x86 assembly
language". This course only allows the 64-bit version
supported by gcc on linux. If you can't run that
on your development machine, you must use Cool Assembly.
- Con: You must handle all of the details, including laying out
strings in memory and emitting assembly code that interfaces with
standard library functions like malloc or scanf or
exit to implement Cool's operational semantics.
- Con: You'll have to use GDB (or something similar) because there's
no innate debugging support.
- Con: x86 assembly is infamous for being hideous. It's your funeral.
To find out if your development machine supports our flavor of x86-64 assembly
language, try this:
./cool --x86 hello-world.cl
gcc hello-world.s
./a.out
If there were no assembler or linker errors and you saw "Hello,
world.", your system can handle our x86-64 assembly. If not, you must
use Cool Assembly Language.
Although we're invoking gcc, that's just shorthand: it's calling
as and collect2 and whatnot to assemble the .s
file into an object file and link that object file into an executable. You
can see the individual steps with gcc -v.
Error Reporting
You are guaranteed that the file.cl-type input file will be
correctly formed and will correspond to a well-typed Cool program. Thus,
there will not be any direct static errors in the input. Those were all
caught by the semantic analyzer earlier.
However, you must generate file.cl-asm (or file.s) so
that it checks for and reports run-time errors. When your
file.{cl-asm,s}
program detects an error, it should use the Syscall IO.out_string
and Syscall exit assembly instructions to cause an error string
to be printed to the screen.
To report an error, write the string ERROR: line_number:
Exception: message (for example, using Syscall
IO.out_string)
and terminate the program with Syscall exit. You may generate
your file.{cl-asm,s} so that it writes whatever you want in the
message, but it should be fairly indicative. Example erroneous input:
class Main inherits IO {
my_void_io : IO ; -- no initializer => void value
main() : Object {
my_void_io.out_string("Hello, world.\n")
} ;
} ;
For such an input, you must generate a well-formed file.{cl-asm,s}
assmebly language file. However, when that file is executed
(either in a Cool CPU Simulator or on an x86-64 machine), it will produce
output such as:
ERROR: 4: Exception: dispatch on void
To put this another way, rather than actually checking for errors directly,
you must generate assembly code that will later check for and report
errors.
Line Number Error
Reporting
The typing rules do not directly specify the line numbers on which errors
are to be reported. As of v1.25, the Cool reference compiler uses
these guidelines:
- Stack overflow: undefined (not your responsibility)
- Division by zero: location of the division expression
- Substring out of range: location of the internal expression
(i.e., 0)
- Dispatch on void: location of dispatch expression
- case-related errors: location of case expression
Note that the reference interpreter uses different line
numbers in some cases; you must match the reference compiler.
Commentary
You will have to handle all of the internal functions (e.g.,
IO.out_string) that are defined in PA4.
You can do basic testing as follows:
Whitespace and newlines do not matter in your file.{cl-asm,s} assembly
code. However, whitespace and newlines do matter for your
simulated Cool CPU output. This is because you are specifically
being asked to implement IO and substring functions.
You should implement all
of the operational semantics rules in the Reference Manual.
You will also have to implement
all
of the built-in functions on the five Basic Classes.
What To Turn In For PA6c
PA6c is a checkpoint for PA6. A compiler is a large and complicated
program; you must start it early to have any hope of success.
For the checkpoint, you will only be tested on
hello-world.cl. If you can generate code for that, you pass the
checkpoint. (Yes, you can "trick" the checkpoint by hard-coding output for
that one test. Ultimately, however, you're only hurting yourself because
you'll have more work to do later. This is not an honor violation.)
You must turn in a zip file containing these files:
- source_files -- your implementation, including
- main.rb or
- main.py or
- main.js or
- main.hs or
- main.ml
Your zip file may also contain:
- team.txt -- an optional file listing your other team member
(see below -- if you are not working in a team, do not include this file)
Submit the file as you did for PA1.
What To Turn In For PA6
You must turn in a zip file containing these files:
- readme.txt -- your README file
- test1.cl -- a testcase
- test2.cl -- a testcase
- test3.cl -- a testcase
- test4.cl -- a testcase
- source_files -- your implementation, including
- main.rb or
- main.py or
- main.js or
- main.hs or
- main.ml
Your zip file may also contain:
- team.txt -- an optional file listing your other team member
(see below -- if you are not working in a team, do not include this file)
Submit the file as you did for PA1.
Working In Pairs
You may complete this project in a team of two. Teamwork imposes burdens
of communication and coordination, but has the benefits of more thoughtful
designs and cleaner programs. Team programming is also the norm in the
professional world.
Students on a team are expected to participate equally in the effort and to
be thoroughly familiar with all aspects of the joint work. Both members
bear full responsibility for the completion of assignments. Partners turn
in one solution for each programming assignment; each member receives the
same grade for the assignment. If a partnership is not going well, the
teaching assistants will help to negotiate new partnerships. Teams may not
be dissolved in the middle of an assignment.
If you are working in a team, exactly one team member should submit
a PA6 zipfile. That submission should include the file team.txt, a
one-line flat ASCII text file that contains exactly and only the
email address of your teammate. Don't include the @virgnia.edu
bit. Example: If ph4u and wrw6y are working together,
ph4u would submit ph4u-pa5.zip with a team.txt
file that contains the word wrw6y. Then ph4u and
wrw6y will both receive the same grade for that submission.
Autograding
We will use scripts to run your program on various testcases. The testcases
will come from the good.cl and bad.cl files you and your
classsmates submit as well as held-out testcases used only for grading.
Your programs cannot use any special libraries (aside from the OCaml
unix and str libraries, which are not necessary for this
assignment). We will use (loosely) the following commands to execute them:
- ghc --make -o a.out *.hs ; ./a.out testcase.cl-type >& testcase.out
- node main.js testcase.cl-type >& testcase.out
- ocamlc unix.cma str.cma *.ml ; ./a.out testcase.cl-type >& testcase.out
- python main.py testcase.cl-type >& testcase.out
- ruby main.rb testcase.cl-type >& testcase.out
You may thus have as many source files as you like (although two or three
should suffice) -- they will be passed to your
language compiler in alphabetical order (if it matters).
In each case we will then compare your output to the correct answer:
- diff testcase.out correct-answer.out
Note that this time we do not ignore newlines and whitespace since
we are explicitly testing your implementation of a string IO subsystem. You
must get every character correct in non-error instances.
If your answer is not the same as the reference answer you get 0
points for that testcase. Otherwise you get 1 point for that testcase.
For error messages and negative testcases we will compare your output
but not the particular error message. Basically, your
generated code need only correctly identify that there is an error on line
X. You do not have to faithfully duplicate our English error messages. Many
people choose to (because it makes testing easier) — but it's not
required.
We will perform the autograding on a 64-bit Linux system. However, your
submissions must officialy be platform-independent (not that hard
with a scripting language). You cannot depend on your compiler running on
any particular platform (although you can depend on the resulting assembly
code running on its associated platform).
There is more to your grade than autograder results. See the Programming
Assignment page for a point breakdown.
Your submission may not create any temporary files. Your submission may not
read or write any files beyond its input and output. We may test your
submission in a special "jail" or "sandbox".