You may do this assignment in OCaml, Haskell, JavaScript, Python or Ruby. (There are no language restrictions for Compilers Assignments.) (If you don't know what to do, OCaml and Haskell are likely the best languages for the compiler and optimizer. Python is a reasonable third choice.)
You may work in a team of two people for this assignment. You may work in a team for any or all subsequent Compilers Assignments. You do not need to keep the same teammate. The course staff are not responsible for finding you a willing teammate.
Your program must a revised, valid .cl-tac file to standard output. The output should be the same as the input but with dead code removed. Your program will consist of a number of source files, all in the same programming language.
The code below computes (1+3)*7 prints out the result, 28:
a <- int 1 b <- int 3 c <- + a b d <- int 7 e <- * c d retval <- call out_int e return retvalIn three address code the expression (1+3)*7 must be broken down so that the addition and multiplication operations are each handled separately. (The exact format is described in the next section.)
As a second example, the code below reads in an integer and computes its absolute value (by checking to see if it is negative and subtracting it from zero if so):
x <- call in_int z <- int 0 b <- < x z bt b is_negative output <- x jmp do_the_printing label is_negative output <- - z x label do_the_printing retval <- call out_int output return retvalNote the use of labels, conditional branches and unconditional jumps. You can use the cool reference compiler to evaluate .cl-tac files:
$ echo -88 | ./cool.exe absval.cl-tac 88
The format of .cl-tac files is designed to be easy to serialize and deserialize — you can process it in just a few lines, without needing special parsing tools.
A .cl-tac file is text-based and line-based. Your program must support Unix, Mac and Windows text files — regardless of where you, personally, developed your program. Note that lines can be delineated by carriage returns, new lines, or both (e.g., the regular expression [\r\n]+ may be helpful).
Possible line contents are:
Arithmetic (e.g., "x = y + z"):
Comparisons (e.g., "x = y < z"):
Constants:
Note that the string constant will appear on the next line. The string constant assignment is the only three-address form that takes two lines.
Boolean Negation:
Arithmetic Negation:
Object Allocation:
Relevant types are Int, String, Bool and Object. See the Cool Reference Manual.
Object Default Value:
Null Check:
Function Calls:
Branches, Labels and Control Flow:
The bt instruction branches if the variable x is True (x must be a Bool); otherwise control falls through to the next instruction.
while not end of file read one line if line = "" or line[0] = "#" or line[0] = ";" then skip // comment words = split line by whitespace if words[0] = "label" answer += TAC_Label(words[1]) else if words[0] = "jmp" answer += TAC_Jmp(words[1]) else if ... // "bt", "return", "comment", etc. // otherwise: x <- OP ... else if words[2] = "+" answer += TAC_Plus(words[0], words[2], words[3]) else if words[2] = "string" next_line = read one line answer += TAC_String(words[0], next_line) else if ... // all other cases
For example:
a <- int 1 b <- int 3 c <- + a b d <- int 7 retval <- call out_int c return retvalOn the fourth line, the assignment to d is dead because d is not subsequently used. Dead assignments can be removed. Often eliminating one piece of dead code can reveal additional dead code:
a <- int 1 b <- int 3 c <- + a b d <- int 7 e <- + d d retval <- call out_int c return retvalIn above example, the assignment to d is not dead (yet!) but the assignment to e is dead. Once the assignment to e is eliminated, the assignment to d becomes dead and can then be eliminated. Thus, live variable analysis and dead code elimination must be repeated until nothing further changes.
You will use the live variable analysis form of data-flow analysis to determine which assignments are dead code.
Function calls with I/O side effects should never be eliminated.
A local dataflow analysis operates on basic blocks.
As part of this assignment you will also implement a global dataflow analysis that operates on multiple basic blocks (that together form one entire method). A control-flow graph is a graph in which the nodes are basic blocks and the edges represent potential control flow.
$ cool --tac file.cl --out original $ cool --tac --opt file.cl --out ref_optimized $ my-dce original.cl-tac > my_optimized.cl-tac $ cool ref_optimized.cl-tac > optimized.output $ cool my_optimized.cl-tac > my.output $ diff optimized.output my.output $ cool --profile ref_optimized.cl-tac | grep STEPS $ cool --profile my_optimized.cl-tac | grep STEPS
Passing --opt and --tac to the reference compiler will cause the reference compiler to perform dead code elimination before emitting the .cl-tac file.
Students on a team are expected to participate equally in the effort and to be thoroughly familiar with all aspects of the joint work. Both members bear full responsibility for the completion of assignments. Partners turn in one solution for each programming assignment; each member receives the same grade for the assignment. If a partnership is not going well, the teaching assistants will help to negotiate new partnerships. Teams may not be dissolved in the middle of an assignment. If your partner drops the class at the last minute you are still responsible for the entire assignment.
If you are working in a team, exactly one team member should submit a CA1 zipfile. That submission should include the file team.txt, a one-line flat ASCII text file that contains the email address of your teammate. Don't include the @virgnia.edu bit. Example: If ph4u and wrw6y are working together, ph4u would submit ph4u-ca1.zip with a team.txt file that contains the word wrw6y. Then ph4u and wrw6y will both receive the same grade for that submission.
In each case we will then compare your output to the correct answer, and then compare the size of your output to a size threshhold. If your dead code eliminator changes the meaning of the program, you will get 0 points for that testcase. Otherwise, you get 1 point for that test case if you eliminate "enough" dead code to meet the threshhold.
We will perform the autograding on some unspecified test system. It is likely to be Solaris/UltraSPARC, Cygwin/x86 or Linux/x86. However, your submissions must officially be platform-independent (not that hard with a scripting language). You cannot depend on running on any particular platform.
There is more to your grade than autograder results. See the Compilers Assignment page for a point breakdown.
Your submission may not create any temporary files. Your submission may not read or write any files beyond its input and output. We may test your submission in a special "jail" or "sandbox".