CA2 — TAC Code Generator

Project Overview

CA2 is due 2/23 at 11PM.

Compilers Assignments 1 through 5 will direct you to design and build an optimizing compiler for Cool. Each assignment will cover an intermediate step along the path to a complete optimizing compiler.

You may do this assignment in OCaml, Haskell, JavaScript, Python or Ruby. (There are no language restrictions for Compilers Assignments.)

You may work in a team of two people for this assignment. You may work in a team for any or all subsequent Compilers Assignments. You do not need to keep the same teammate. The course staff are not responsible for finding you a willing teammate.

Goal

For this assignment, you will write a program that converts the abtract syntax tree for a method into three-address code.

The abstract syntax tree format (.cl-ast) is as defined in PA3. The three-address code format (.cl-tac) is as defined in CA1.

The Specification

You must create three artifacts:

A program that takes a single command-line argument (e.g., file.cl-ast). That argument will be an ASCII text Cool abstract syntax tree file (as described in below) corresponding to a single method. The cl-ast file will always be well-formed (i.e., there will be no syntax errors in the cl-ast file).
Your program must output three-address code to standard output. The output should represent the same program as the input but in three-address code format. Your program will consist of a number of source files, all in the same programming language.
A plain ASCII text file called readme.txt describing your design decisions and choice of test cases. See the grading rubric. A paragraph or two should suffice.
Testcases test1.cl and test2.cl. These tests should exercise corner cases of the conversion from abstract syntax trees to three-address code.

Abstract Syntax Tree Format

The serialized abstract syntax tree format is described in the programming assignment three handout. The same format is shared between the PL class and the Compilers practicum.

Expressions to Three Address Code

Three-address code is a simplified representation focusing on assignments to variables and unary or binary operators.

The traditional approach to converting expressions to three-address code involves a recursive descent traversal of the abstract syntax tree. The recursive descent traversal returns both a three-address code instruction as well as a list of additional instructions that should be prepended to the output.

Consider the following pseudocode:

        let rec convert (a : ast) : (tac_instr list * tac_expr) =
                match a with
                | AST_Variable(v) -> [], TAC_Variable(v)
                | AST_Int(i) -> 
                        let new_var = fresh_variable () in
                        [TAC_Assign_Int(new_var, i)], (TAC_Variable(new_var)
                | AST_Plus(a1,a2) ->
                        let i1, ta1 = convert a1 in 
                        let i2, ta2 = convert a2 in 
                        let new_var = fresh_variable () in
                        let to_output = TAC_Assign_Plus(new_var, ta1, ta2) in
                        (i1 @ i2 @ [to_output]), (TAC_Variable(new_var))
                | ...

On input (1+3)+5, this code first calls itself recursively on (1+3). To convert 1+3, it calls itself recursively on 1 and 3. To convert 1, it finds a fresh variable x, outputs x <- 1, and returns x. Similarly, to convert 3, it finds a fresh variable y, outputs y <- 3, and returns y. Now it can convert 1+3 by finding a fresh variable z and outputting z <- x + y, returning z. The final output would look like:

temp1 <- int 1
temp2 <- int 3
temp3 <- + temp1 temp2
temp4 <- int 5
temp5 <- + temp3 temp4

Control-Flow to Three Address Code

The traditional approach to converting control-flow statements to three-address code involves a recursive descent traversal of the abstract syntax tree. The recursive descent traversal returns a list of three-address code instructions.

For example, an instruction of the form if COND then THEN_BRANCH else ELSE_BRANCH typically becomes:

... code to evaluate COND
bt COND then_label
... code to evaluate ELSE_BRANCH
jmp end_label
label then_label
... code to evaluate THEN_BRACH
label end_label

You must consider other control-flow instructions (e.g., while loops, short-circuit boolean evaluation, etc.) and similarly convert them.

Commentary

You can use cool --parse to obtain a .cl-ast file.

You can use cool --opt --cfg to obtain control-flow graph images that can be processed by GraphViz.

You can use cool --tac to obtain three-address code associated with with a method.

What To Turn In For CA2

You must turn in a zip file containing these files:

readme.txt -- your README file
test1.cl -- a tricky testcase
test2.cl -- a tricky testcase
source_files -- including

main.rb or
main.py or
main.js or
main.hs or
main.ml

Your zip file may also contain:

team.txt -- an optional file listing your other team member (see below -- if you are not working in a team, do not include this file)

Submit the file as you did for CA1.

Working In Pairs

You may complete this project in a team of two. Teamwork imposes burdens of communication and coordination, but has the benefits of more thoughtful designs and cleaner programs. Team programming is also the norm in the professional world.

Students on a team are expected to participate equally in the effort and to be thoroughly familiar with all aspects of the joint work. Both members bear full responsibility for the completion of assignments. Partners turn in one solution for each programming assignment; each member receives the same grade for the assignment. If a partnership is not going well, the teaching assistants will help to negotiate new partnerships. Teams may not be dissolved in the middle of an assignment. If your partner drops the class at the last minute you are still responsible for the entire assignment.

If you are working in a team, exactly one team member should submit a CA2 zipfile. That submission should include the file team.txt, a one-line flat ASCII text file that contains the email address of your teammate. Don't include the @virgnia.edu bit. Example: If ph4u and wrw6y are working together, ph4u would submit ph4u-ca1.zip with a team.txt file that contains the word wrw6y. Then ph4u and wrw6y will both receive the same grade for that submission.

Autograding

We will use scripts to run your program on various testcases. The testcases will come from the test1.cl and test2.cl files you and your classsmates submit as well as held-out testcases used only for grading. Your programs cannot use any special libraries (aside from the OCaml unix and str libraries, which are not necessary for this assignment). We will use (loosely) the following commands to execute them:

ghc --make -o a.out *.hs ; ./a.out testcase.cl-ast node main.js testcase.cl-ast ocamlc unix.cma str.cma *.ml ; ./a.out testcase.cl-ast python main.py testcase.cl-ast ruby main.rb testcase.cl-ast

You may thus have as many source files as you like (although two or three plus your parser definition should suffice) -- they will be passed to your language compiler in alphabetical order (if it matters). Note that we will not run the parser generator for you -- you should run it and produce the appropriate ML, Python or Ruby file and submit that.

In each case we will then compare your output to the correct answer. If your translator changes the meaning of the program, you will get 0 points for that testcase.

Your submission may not create any temporary files. Your submission may not read or write any files beyond its input and output. We may test your submission in a special "jail" or "sandbox".

Grading Rubric

Grading (out of 25)

16 points — test cases

+.1 point for each successful test case (160 total)
0 point for each failed test case (i.e., occurrences of 'FAILED')

6 points — test1.cl and test2.cl test cases

+3 point each for valid, challenging test1.cl and test2.cl files that exercise code generation corner cases
+1.5 point each for valid test cases that are not challenging
0 points for test cases that show limited effort or are invalid

4 points — README.txt

+4 points — clear, detailed description of your design decisions
+2 points — vague or unclear description that omits details
0 points — little or no effort in description or if you submit a RTF/DOC/PDF instead of a plain TXT file.

If you are in a group: -5 points if you do not include a correct team.txt file