CA1 — Dead Code Eliminator

Project Overview

CA1 is due 2/9 at 11PM.

Compilers Assignments 1 through 5 will direct you to design and build an optimizing compiler for Cool. Each assignment will cover an intermediate step along the path to a complete optimizing compiler.

You may do this assignment in OCaml, Haskell, JavaScript, Python, or Ruby. (There are no language restrictions for Compilers Assignments.) (If you don't know what to do, OCaml and Haskell are likely the best languages for the compiler and optimizer. Python is a reasonable third choice.)

You may work in a team of two people for this assignment. You may work in a team for any or all subsequent Compilers Assignments. You do not need to keep the same teammate. The course staff are not responsible for finding you a willing teammate.

Goal

For this assignment you will write a dead code eliminator for three-address code based on live variable dataflow analysis of single methods.

There are several high level goals for this assignment. You will create a program that:

The Specification

You must create three artifacts:

  1. A program that takes a single command-line argument (e.g., file.cl-tac). That argument will be an ASCII text Cool Three Address Code file (as described in below) corresponding to a single method. The cl-tac file will always be well-formed (i.e., there will be no errors in the cl-tac file).

    Your program must ouput a revised, valid .cl-tac file to standard output. The output should be the same as the input but with dead code removed. Your program will consist of a number of source files, all in the same programming language.

  2. A plain ASCII text file called readme.txt describing your design decisions and choice of test cases. See the grading rubric. A paragraph or two should suffice.
  3. Test cases test1.cl-tac and test2.cl-tac. These should be samples of code where it is tricky to remove dead code.

Three Address Code Example

Three-address code is a simplified representation focusing on assignments to variables and unary or binary operators.

The code below computes (1+3)*7 prints out the result, 28:

a <- int 1
b <- int 3
c <- + a b 
d <- int 7
e <- * c d 
retval <- call out_int e 
return retval 

In three address code, the expression (1+3)*7 must be broken down so that the addition and multiplication operations are each handled separately. (The exact format is described in the next section.)

As a second example, the code below reads in an integer and computes its absolute value (by checking to see if it is negative and subtracting it from zero if so):

x <- call in_int
z <- int 0
b <- < x z 
bt b is_negative
output <- x
jmp do_the_printing
label is_negative
output <- - z x 
label do_the_printing
retval <- call out_int output 
return retval

Note the use of labels, conditional branches and unconditional jumps. You can use the cool reference compiler to evaluate .cl-tac files:

$ echo -88 | ./cool.exe absval.cl-tac
88

The .cl-tac File Format

Your program should serialize and deserialize file.cl-tac files representing three-address code.

The format of .cl-tac files is designed to be easy to serialize and deserialize — you can process it in just a few lines, without needing special parsing tools.

A .cl-tac file is text-based and line-based. Your program must support Unix, Mac and Windows text files — regardless of where you, personally, developed your program. Note that lines can be delineated by carriage returns, new lines, or both (e.g., the regular expression [\r\n]+ may be helpful).

Possible line contents are:

You can use cool --tac file.cl to produce file.cl-tac for the first method in that Cool file. Thus you can use the Cool reference compiler to produce three address code for you automatically, given Cool source code. You should do this to test your CA1 code.

Three-Address Code Deserialization

You should be able to use high-level code in the spirit of this example:
while not end of file
        read one line
        if line = "" or line[0] = "#" or line[0] = ";" then 
                skip    // comment
        words = split line by whitespace
        if words[0] = "label"
                answer += TAC_Label(words[1]) 
        else if words[0] = "jmp"
                answer += TAC_Jmp(words[1])
        else if ... // "bt", "return", "comment", etc. 
        // otherwise: x <- OP ... 
        else if words[2] = "+"
                answer += TAC_Plus(words[0], words[2], words[3])
        else if words[2] = "string"
                next_line = read one line
                answer += TAC_String(words[0], next_line) 
        else if ... // all other cases 

Dead Code Elimination

A variable is live if it may be needed in the future. That is, an assignment to variable v at point p is live if there exists a path from p to the function exit along which v is read before it is overwritten.

For example:

a <- int 1
b <- int 3
c <- + a b 
d <- int 7
retval <- call out_int c 
return retval 

On the fourth line, the assignment to d is dead because d is not subsequently used. Dead assignments can be removed. Often eliminating one piece of dead code can reveal additional dead code:

a <- int 1
b <- int 3
c <- + a b 
d <- int 7
e <- + d d 
retval <- call out_int c 
return retval 

In above example, the assignment to d is not dead (yet!) but the assignment to e is dead. Once the assignment to e is eliminated, the assignment to d becomes dead and can then be eliminated. Thus, live variable analysis and dead code elimination must be repeated until nothing further changes.

You will use the live variable analysis form of data-flow analysis to determine which assignments are dead code.

Function calls with I/O side effects should never be eliminated.

Basic Blocks and Control-Flow Graphs

A basic block is a sequence of instructions with only one control-flow entry and one control-flow exit. That is, once you start executing the instructions in a basic block, you must execute all of them — you cannot jump out early or jump in half-way through.

A local dataflow analysis operates on basic blocks.

As part of this assignment you will also implement a global dataflow analysis that operates on multiple basic blocks (that together form one entire method). A control-flow graph is a graph in which the nodes are basic blocks and the edges represent potential control flow.

Commentary

Your assignment will have the following stages:

  1. Deserialize the input .cl-tac file
  2. Identify basic blocks and construct the control-flow graph
  3. Repeat until nothing changes:
    1. Perform global live variable analysis on the control-flow graph
      • This involves local live variable analysis on individual statements or basic blocks
    2. Eliminate dead code
  4. Serialize the result to standard output in .cl-tac format

You can do basic testing as follows:

Passing --opt and --tac to the reference compiler will cause the reference compiler to perform dead code elimination before emitting the .cl-tac file.

Video Guides

A number of Video Guides are provided to help you get started on this assignment on your own. The Video Guides are walkthroughs in which the instructor manually completes and narrates, in real time, aspects of this assignment.

If you are still stuck, you can post on the forum, approach the TAs, or approach the professor. The use of online instructional content outside of class weakly approximates a flipped classroom model. Click on a video guide to begin, at which point you can watch it fullscreen or via Youtube if desired.

Review of TAC and CFGs
Python TAC to CFG Converter

What To Turn In For CA1

You must turn in a zip file containing these files:

  1. readme.txt — your README file
  2. test1.cl-tac and test2.cl-tac — novel test cases
  3. source_files — including
    • main.rb or
    • main.py or
    • main.js or
    • main.hs or
    • main.ml

Your zip file may also contain:

Submit the file to the course autograding website.

Working In Pairs

You may complete this project in a team of two. Teamwork imposes burdens of communication and coordination, but has the benefits of more thoughtful designs and cleaner programs. Team programming is also the norm in the professional world.

Students on a team are expected to participate equally in the effort and to be thoroughly familiar with all aspects of the joint work. Both members bear full responsibility for the completion of assignments. Partners turn in one solution for each programming assignment; each member receives the same grade for the assignment. If a partnership is not going well, the teaching assistants will help to negotiate new partnerships. Teams may not be dissolved in the middle of an assignment. If your partner drops the class at the last minute you are still responsible for the entire assignment.

If you are working in a team, exactly one team member should submit a CA1 zipfile. That submission should include the file team.txt, a one-line flat ASCII text file that contains the email address of your teammate. Don't include the @virgnia.edu bit. Example: If ph4u and wrw6y are working together, ph4u would submit ph4u-ca1.zip with a team.txt file that contains the word wrw6y. Then ph4u and wrw6y will both receive the same grade for that submission.

Grading Rubric