CA1 is due 2/9 at 11PM.
Compilers Assignments 1 through 5 will direct you to design and build an optimizing compiler for Cool. Each assignment will cover an intermediate step along the path to a complete optimizing compiler.
You may do this assignment in OCaml, Haskell, JavaScript, Python, or Ruby. (There are no language restrictions for Compilers Assignments.) (If you don't know what to do, OCaml and Haskell are likely the best languages for the compiler and optimizer. Python is a reasonable third choice.)
You may work in a team of two people for this assignment. You may work in a team for any or all subsequent Compilers Assignments. You do not need to keep the same teammate. The course staff are not responsible for finding you a willing teammate.
For this assignment you will write a dead code eliminator for three-address code based on live variable dataflow analysis of single methods.
There are several high level goals for this assignment. You will create a program that:
You must create three artifacts:
Your program must ouput a revised, valid .cl-tac file to standard output. The output should be the same as the input but with dead code removed. Your program will consist of a number of source files, all in the same programming language.
Three-address code is a simplified representation focusing on assignments to variables and unary or binary operators.
The code below computes (1+3)*7 prints out the result, 28:
a <- int 1 b <- int 3 c <- + a b d <- int 7 e <- * c d retval <- call out_int e return retval
In three address code, the expression (1+3)*7
must be broken down so that
the addition and multiplication operations are each handled separately.
(The exact format is described in the next section.)
As a second example, the code below reads in an integer and computes its absolute value (by checking to see if it is negative and subtracting it from zero if so):
x <- call in_int z <- int 0 b <- < x z bt b is_negative output <- x jmp do_the_printing label is_negative output <- - z x label do_the_printing retval <- call out_int output return retval
Note the use of labels, conditional branches and unconditional jumps. You can use the cool reference compiler to evaluate .cl-tac files:
$ echo -88 | ./cool.exe absval.cl-tac 88
Your program should serialize and deserialize file.cl-tac files representing three-address code.
The format of .cl-tac files is designed to be easy to serialize and deserialize — you can process it in just a few lines, without needing special parsing tools.
A .cl-tac file is text-based and line-based. Your program must support Unix, Mac and Windows text files — regardless of where you, personally, developed your program. Note that lines can be delineated by carriage returns, new lines, or both (e.g., the regular expression [\r\n]+ may be helpful).
Possible line contents are:
Assignment (e.g., "x = y"):
Arithmetic (e.g., "x = y + z"):
Comparisons (e.g., "x = y < z"):
Constants:
Note that the string constant will appear on the next line. The string constant assignment is the only three-address form that takes two lines.
Boolean Negation:
Arithmetic Negation:
Object Allocation:
Relevant types are Int, String, Bool and Object. See the Cool Reference Manual.
Object Default Value:
Null Check:
Function Calls:
Branches, Labels and Control Flow:
The bt instruction branches if the variable x is True (x must be a Bool); otherwise control falls through to the next instruction.
You can use cool --tac file.cl to produce file.cl-tac for the first method in that Cool file. Thus you can use the Cool reference compiler to produce three address code for you automatically, given Cool source code. You should do this to test your CA1 code.
while not end of file read one line if line = "" or line[0] = "#" or line[0] = ";" then skip // comment words = split line by whitespace if words[0] = "label" answer += TAC_Label(words[1]) else if words[0] = "jmp" answer += TAC_Jmp(words[1]) else if ... // "bt", "return", "comment", etc. // otherwise: x <- OP ... else if words[2] = "+" answer += TAC_Plus(words[0], words[2], words[3]) else if words[2] = "string" next_line = read one line answer += TAC_String(words[0], next_line) else if ... // all other cases
A variable is live if it may be needed in the future. That is, an assignment to variable v at point p is live if there exists a path from p to the function exit along which v is read before it is overwritten.
For example:
a <- int 1 b <- int 3 c <- + a b d <- int 7 retval <- call out_int c return retval
On the fourth line, the assignment to d is dead because d is not subsequently used. Dead assignments can be removed. Often eliminating one piece of dead code can reveal additional dead code:
a <- int 1 b <- int 3 c <- + a b d <- int 7 e <- + d d retval <- call out_int c return retval
In above example, the assignment to d is not dead (yet!) but the assignment to e is dead. Once the assignment to e is eliminated, the assignment to d becomes dead and can then be eliminated. Thus, live variable analysis and dead code elimination must be repeated until nothing further changes.
You will use the live variable analysis form of data-flow analysis to determine which assignments are dead code.
Function calls with I/O side effects should never be eliminated.
A basic block is a sequence of instructions with only one control-flow entry and one control-flow exit. That is, once you start executing the instructions in a basic block, you must execute all of them — you cannot jump out early or jump in half-way through.
A local dataflow analysis operates on basic blocks.
As part of this assignment you will also implement a global dataflow analysis that operates on multiple basic blocks (that together form one entire method). A control-flow graph is a graph in which the nodes are basic blocks and the edges represent potential control flow.
Your assignment will have the following stages:
You can do basic testing as follows:
$ cool --tac file.cl --out original $ cool --tac --opt file.cl --out ref_optimized $ my-dce original.cl-tac > my_optimized.cl-tac $ cool ref_optimized.cl-tac > optimized.output $ cool my_optimized.cl-tac > my.output $ diff optimized.output my.output $ cool --profile ref_optimized.cl-tac | grep STEPS $ cool --profile my_optimized.cl-tac | grep STEPS
Passing --opt and --tac to the reference compiler will cause the reference compiler to perform dead code elimination before emitting the .cl-tac file.
A number of Video Guides are provided to help you get started on this assignment on your own. The Video Guides are walkthroughs in which the instructor manually completes and narrates, in real time, aspects of this assignment.
If you are still stuck, you can post on the forum, approach the TAs, or approach the professor. The use of online instructional content outside of class weakly approximates a flipped classroom model. Click on a video guide to begin, at which point you can watch it fullscreen or via Youtube if desired.
You must turn in a zip file containing these files:
Your zip file may also contain:
Submit the file to the course autograding website.
You may complete this project in a team of two. Teamwork imposes burdens of communication and coordination, but has the benefits of more thoughtful designs and cleaner programs. Team programming is also the norm in the professional world.
Students on a team are expected to participate equally in the effort and to be thoroughly familiar with all aspects of the joint work. Both members bear full responsibility for the completion of assignments. Partners turn in one solution for each programming assignment; each member receives the same grade for the assignment. If a partnership is not going well, the teaching assistants will help to negotiate new partnerships. Teams may not be dissolved in the middle of an assignment. If your partner drops the class at the last minute you are still responsible for the entire assignment.
If you are working in a team, exactly one team member should submit a CA1 zipfile. That submission should include the file team.txt, a one-line flat ASCII text file that contains the email address of your teammate. Don't include the @virgnia.edu bit. Example: If ph4u and wrw6y are working together, ph4u would submit ph4u-ca1.zip with a team.txt file that contains the word wrw6y. Then ph4u and wrw6y will both receive the same grade for that submission.