CA2 is due 2/23 at 11PM.
Compilers Assignments 1 through 5 will direct you to design and build an optimizing compiler for Cool. Each assignment will cover an intermediate step along the path to a complete optimizing compiler.
You may do this assignment in OCaml, Haskell, JavaScript, Python or Ruby. (There are no language restrictions for Compilers Assignments.)
You may work in a team of two people for this assignment. You may work in a team for any or all subsequent Compilers Assignments. You do not need to keep the same teammate. The course staff are not responsible for finding you a willing teammate.
For this assignment, you will write a program that converts the abtract syntax tree for a method into three-address code.
The abstract syntax tree format (.cl-ast) is as defined in PA3. The three-address code format (.cl-tac) is as defined in CA1.
You must create three artifacts:
Your program must output three-address code to standard output. The output should represent the same program as the input but in three-address code format. Your program will consist of a number of source files, all in the same programming language.
The serialized abstract syntax tree format is described in the programming assignment three handout. The same format is shared between the PL class and the Compilers practicum.
Three-address code is a simplified representation focusing on assignments to variables and unary or binary operators.
The traditional approach to converting expressions to three-address code involves a recursive descent traversal of the abstract syntax tree. The recursive descent traversal returns both a three-address code instruction as well as a list of additional instructions that should be prepended to the output.
Consider the following pseudocode:
let rec convert (a : ast) : (tac_instr list * tac_expr) = match a with | AST_Variable(v) -> [], TAC_Variable(v) | AST_Int(i) -> let new_var = fresh_variable () in [TAC_Assign_Int(new_var, i)], (TAC_Variable(new_var) | AST_Plus(a1,a2) -> let i1, ta1 = convert a1 in let i2, ta2 = convert a2 in let new_var = fresh_variable () in let to_output = TAC_Assign_Plus(new_var, ta1, ta2) in (i1 @ i2 @ [to_output]), (TAC_Variable(new_var)) | ...
On input (1+3)+5
, this code first calls itself recursively on (1+3)
. To
convert 1+3, it calls itself recursively on 1 and 3. To convert 1,
it finds a fresh variable x, outputs x <- 1
, and returns
x. Similarly, to convert 3, it finds a fresh variable y,
outputs y <- 3, and returns y. Now it can convert 1+3
by finding a fresh variable z and outputting z <- x + y,
returning z. The final output would look like:
temp1 <- int 1 temp2 <- int 3 temp3 <- + temp1 temp2 temp4 <- int 5 temp5 <- + temp3 temp4
The traditional approach to converting control-flow statements to three-address code involves a recursive descent traversal of the abstract syntax tree. The recursive descent traversal returns a list of three-address code instructions.
For example, an instruction of the form if COND then THEN_BRANCH else
ELSE_BRANCH
typically becomes:
... code to evaluate COND bt COND then_label ... code to evaluate ELSE_BRANCH jmp end_label label then_label ... code to evaluate THEN_BRACH label end_label
You must consider other control-flow instructions (e.g., while loops, short-circuit boolean evaluation, etc.) and similarly convert them.
You can use cool --parse to obtain a .cl-ast file.
You can use cool --opt --cfg to obtain control-flow graph images that can be processed by GraphViz.
You can use cool --tac to obtain three-address code associated with with a method.
You must turn in a zip file containing these files:
Your zip file may also contain:
Submit the file as you did for CA1.
You may complete this project in a team of two. Teamwork imposes burdens of communication and coordination, but has the benefits of more thoughtful designs and cleaner programs. Team programming is also the norm in the professional world.
Students on a team are expected to participate equally in the effort and to be thoroughly familiar with all aspects of the joint work. Both members bear full responsibility for the completion of assignments. Partners turn in one solution for each programming assignment; each member receives the same grade for the assignment. If a partnership is not going well, the teaching assistants will help to negotiate new partnerships. Teams may not be dissolved in the middle of an assignment. If your partner drops the class at the last minute you are still responsible for the entire assignment.
If you are working in a team, exactly one team member should submit a CA2 zipfile. That submission should include the file team.txt, a one-line flat ASCII text file that contains the email address of your teammate. Don't include the @virgnia.edu bit. Example: If ph4u and wrw6y are working together, ph4u would submit ph4u-ca1.zip with a team.txt file that contains the word wrw6y. Then ph4u and wrw6y will both receive the same grade for that submission.
We will use scripts to run your program on various testcases. The testcases will come from the test1.cl and test2.cl files you and your classsmates submit as well as held-out testcases used only for grading. Your programs cannot use any special libraries (aside from the OCaml unix and str libraries, which are not necessary for this assignment). We will use (loosely) the following commands to execute them:
You may thus have as many source files as you like (although two or three plus your parser definition should suffice) -- they will be passed to your language compiler in alphabetical order (if it matters). Note that we will not run the parser generator for you -- you should run it and produce the appropriate ML, Python or Ruby file and submit that.
In each case we will then compare your output to the correct answer. If your translator changes the meaning of the program, you will get 0 points for that testcase.
Your submission may not create any temporary files. Your submission may not read or write any files beyond its input and output. We may test your submission in a special "jail" or "sandbox".