CA4 — Compiler

Project Overview

CA4 is due 4/7 at 11PM.

Compilers Assignments 1 through 5 will direct you to design and build an optimizing compiler for Cool. Each assignment will cover an intermediate step along the path to a complete optimizing compiler.

You may do this assignment in OCaml, Haskell, JavaScript, Python, or Ruby. (The language you choose need not match the one you used for other Compilers Assignments.) (If you are unsure which to choose, OCaml and Haskell are likely the best languages for the compiler and optimizer; Python is a reasonable third choice.)

You may work in a team of two people for this assignment. You may work in a team for any or all subsequent Compilers Assignments. You do not need to keep the same teammate. The course staff are not responsible for finding you a willing teammate.

Goal

For this assignment, you will write a program that takes a Cool abstract syntax tree and produces either Cool Assembly or x86-64 assembly. As with CA3, you will be given a Cool Annotated Abstract Syntax Tree (see the PA4 specification for details). From this annotated AST, you must generate assembly code.

You do not have to worry about malformed input because the semantic analyzer (from PA4) has already ruled out bad programs. You must, however, track enough information to generate legitimate run-time errors (e.g., dispatch on void). Among other things, this assignment involves implementing the operational semantics specification of Cool.

The Specification

You must create three artifacts:

A program that takes a single command-line argument (e.g., file.cl-type). That argument will be a Cool Annotated Abstract Syntax Tree file corresponding to one or more classes and methods. The cl-type file will always be well-formed (i.e., there will be no errors in the cl-type file). Your program must emit either Cool Assembly Language (file.cl-asm) or x86-64 Assembly Language (file.s). Your program may consist of a number of source files.

If you are targeting Cool Assembly Language, then executing file.cl-asm must produce the correct output for file.cl according to Cool's operational semantics.
If you are targeting x86-64, then compiling file.s with gcc on 64-bit Linux must produce an executable that, when run, produces the correct output for file.cl according to Cool's operational semantics.
You will only be given .cl-type files from programs that pass the semantic analysis phase of the reference compiler. You are not responsible for correctly handling ill-typed programs such as (1+"hello").

A plain ASCII text file called readme.txt describing your design decisions and choice of test cases. See the grading rubric. A few paragraphs should suffice.
Test cases test1.cl, test2.cl, test3.cl, and test4.cl. These test cases should exercise compiler and run-time error corner cases.

If you are working in a pair, you should also include a team.txt file that contains only the computing id of your partner.
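A minimal Python sketch of the driver described above, assuming the output file is written next to the input with the same base name (file.cl-type becomes file.cl-asm or file.s). `compile_program` is a hypothetical placeholder for your actual code generator, not part of any provided API.

```python
import sys
from pathlib import Path


def output_path(input_path: str, target: str = "cl-asm") -> str:
    """Map file.cl-type to file.cl-asm (Cool Assembly) or file.s (x86-64)."""
    suffix = ".cl-asm" if target == "cl-asm" else ".s"
    # Path("file.cl-type").suffix is ".cl-type", so with_suffix swaps it out.
    return str(Path(input_path).with_suffix(suffix))


def compile_program(cl_type_lines: list[str]) -> str:
    """Hypothetical placeholder: walk the annotated AST and return assembly text."""
    return ""


def main(argv: list[str]) -> int:
    if len(argv) != 2:
        print("usage: python main.py file.cl-type", file=sys.stderr)
        return 1
    with open(argv[1]) as f:
        # The annotated AST file is line-oriented, so read it as a list of lines.
        lines = f.read().splitlines()
    asm = compile_program(lines)
    with open(output_path(argv[1]), "w") as out:
        out.write(asm)
    return 0
```

A complete main.py would end with `sys.exit(main(sys.argv))` under an `if __name__ == "__main__":` guard, so that `python main.py file.cl-type` writes file.cl-asm.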

Error Reporting

You are guaranteed that the file.cl-type input file will be correctly formed and will correspond to a well-typed Cool program. Thus, there will not be any direct static errors in the input. Those were all caught by the semantic analyzer earlier.

However, you must generate file.cl-asm (or file.s) so that it checks for and reports run-time errors. When your file.{cl-asm,s} program detects an error, it should use the Syscall IO.out_string and Syscall exit assembly instructions (or equivalent code, if targeting x86-64) to print an error string and terminate.

To report an error, write the string ERROR: line_number: Exception: message (for example, using Syscall IO.out_string) and terminate the program with Syscall exit. The message text is up to you, but it should clearly indicate what went wrong. Example erroneous input:

class Main inherits IO {
  my_void_io : IO ; -- no initializer => void value
  main() : Object {
    my_void_io.out_string("Hello, world.\n")
  } ;
} ;

For such an input, you must generate a well-formed file.{cl-asm,s} assembly language file. However, when that file is executed (either in a Cool CPU Simulator or on an x86-64 machine), it will produce output such as:

ERROR: 4: Exception: dispatch on void

To put this another way, rather than checking for errors directly, your compiler must generate assembly code that checks for and reports errors when the program runs.
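The idea of emitting checks rather than performing them can be sketched in Python for the x86-64 target. This sketch assumes void is represented as a null (0) pointer, that the receiver object has already been evaluated into a register, and that your runtime provides a routine, here called `cool_runtime_error` (a hypothetical name), which prints the string whose address is in %rdi and then exits. Label names are likewise illustrative.

```python
import json


def runtime_error_message(line: int, message: str) -> str:
    """Build the required 'ERROR: line_number: Exception: message' text."""
    return f"ERROR: {line}: Exception: {message}"


def emit_void_check(receiver_reg: str, line: int, label_id: int) -> list[str]:
    """Return x86-64 (AT&T syntax) lines that abort if the receiver is void.

    Nothing is checked at compile time: the generated code compares the
    receiver against 0 when the compiled program runs.
    """
    ok_label = f".Ldispatch_ok_{label_id}"
    msg_label = f".Lerr_msg_{label_id}"
    text = runtime_error_message(line, "dispatch on void") + "\n"
    return [
        "\t.section .rodata",
        f"{msg_label}:",
        # json.dumps quotes the string and escapes the newline as \n,
        # which the GNU assembler's .string directive accepts.
        f"\t.string {json.dumps(text)}",
        "\t.text",
        f"\tcmpq $0, {receiver_reg}",
        f"\tjne {ok_label}",
        f"\tleaq {msg_label}(%rip), %rdi",
        "\tcall cool_runtime_error",  # hypothetical helper: print %rdi, then exit
        f"{ok_label}:",
    ]
```

For the example above, `emit_void_check("%rax", 4, 0)` guards the dispatch on line 4. A Cool Assembly target would presumably follow the same compare-and-branch pattern, with Syscall IO.out_string and Syscall exit on the error path.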

Line Number Error Reporting

The Cool specification does not directly specify the line numbers on which run-time errors are to be reported. As of v1.25, the Cool reference compiler uses these guidelines:

Stack overflow: undefined (not your responsibility)
Division by zero: location of the division expression
Substring out of range: location of the internal expression, i.e., line 0
Dispatch on void: location of dispatch expression
case-related errors: location of case expression

Note that the reference interpreter uses different line numbers in some cases; you must match the reference compiler.
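The guidelines above amount to a small lookup: most run-time errors report the line of the expression that raised them, while substring errors report line 0. A Python sketch, assuming each expression node in the annotated AST carries its own line number; the enum and function names are illustrative, and stack overflow is omitted because its line is undefined.

```python
from enum import Enum, auto


class CoolRuntimeError(Enum):
    DIVISION_BY_ZERO = auto()
    SUBSTRING_OUT_OF_RANGE = auto()
    DISPATCH_ON_VOID = auto()
    CASE_ON_VOID = auto()             # case-related
    CASE_NO_MATCHING_BRANCH = auto()  # case-related


def error_line(kind: CoolRuntimeError, expr_line: int) -> int:
    """Line number to print in 'ERROR: line: Exception: ...'.

    expr_line is the line of the division, dispatch, or case expression
    that raised the error.
    """
    if kind is CoolRuntimeError.SUBSTRING_OUT_OF_RANGE:
        # The internal expression inside substr is reported as line 0.
        return 0
    # Division, dispatch, and case errors use the offending expression's line.
    return expr_line
```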

Video Guides

A Video Guide is provided to help you get started on this assignment on your own. The Video Guide is a walkthrough in which the instructor manually completes and narrates, in real time, the first part of this assignment. It includes coding, testing and debugging elements.

If you are still stuck, you can post on the forum, approach the TAs, or approach the professor. The use of online instructional content outside of class weakly approximates a flipped classroom model. Click on a video guide to begin, at which point you can watch it fullscreen or via YouTube if desired.

CA4 — Code Generation

CA4 — Debugging

Reminder: You can watch YouTube videos at 1.5x speed with full audio.

What To Turn In For CA4

You must turn in a zip file containing these files:

readme.txt — your README file
source_files — including
- main.rb or
- main.py or
- main.js or
- main.hs or
- main.ml
4 test cases

test1.cl
test2.cl
test3.cl
test4.cl

Your zip file may also contain:

team.txt—an optional file listing your other team member (see below—if you are not working in a team, do not include this file)

Submit the file to the course autograding website.

Working In Pairs

You may complete this project in a team of two. Teamwork imposes burdens of communication and coordination, but has the benefits of more thoughtful designs and cleaner programs. Team programming is also the norm in the professional world.

Students on a team are expected to participate equally in the effort and to be thoroughly familiar with all aspects of the joint work. Both members bear full responsibility for the completion of assignments. Partners turn in one solution for each programming assignment; each member receives the same grade for the assignment. If a partnership is not going well, the teaching assistants will help to negotiate new partnerships. Teams may not be dissolved in the middle of an assignment. If your partner drops the class at the last minute, you are still responsible for the entire assignment.

If you are working in a team, exactly one team member should submit a CA4 zipfile. That submission should include the file team.txt, a one-line flat ASCII text file that contains the computing ID (email address without the @virginia.edu part) of your teammate. Example: If ph4u and wrw6y are working together, ph4u would submit ph4u-ca4.zip with a team.txt file that contains the word wrw6y. Then ph4u and wrw6y will both receive the same grade for that submission.

Autograding

We will use scripts to run your program on various test cases. The test cases will come from the test1.cl through test4.cl files you and your classmates submit, as well as held-out test cases used only for grading. Your programs cannot use any special libraries (aside from the OCaml unix and str libraries, which are not necessary for this assignment). We will use (loosely) the following commands to execute them:

ghc --make -o a.out *.hs ; ./a.out testcase.cl-type >& testcase.out
node main.js testcase.cl-type >& testcase.out
ocamlc unix.cma str.cma *.ml ; ./a.out testcase.cl-type >& testcase.out
python main.py testcase.cl-type >& testcase.out
ruby main.rb testcase.cl-type >& testcase.out

You may thus have as many source files as you like (although two or three should suffice); they will be passed to your language compiler in alphabetical order (if it matters). Note that we will not run any parser generator or other code-generation tool for you; if you use one, run it yourself and submit the generated ML, Python, or Ruby file.

In each case we will compare your output to the correct answer and then compare the size of your output to a size threshold. If your compiler changes the meaning of the program, you will get 0 points for that test case. Otherwise, you get 1 point for that test case.

There is more to your grade than autograder results. See the Compilers Assignment page for a point breakdown.

Your submission may not create any temporary files. Your submission may not read or write any files beyond its input and output. We may test your submission in a special "jail" or "sandbox".

Grading Rubric

Grading (max 107 out of 100)

85 points for autograder tests

+1 point for each passed good test case (77 total)
+1 point for each passed bad test case (8 total)
+0 points for each failed test case (i.e., occurrences of 'FAILED')

16 points for valid test1.cl, test2.cl, test3.cl, and test4.cl files

+4 points each — high quality tests
+2 points each — valid tests that do not stress challenges in compilation
+0 points — test cases showing no effort

6 points for readme.txt file

+6 points — a clear, thorough description of compiler design and test cases
+3 points — a vague or unclear description omitting details
+0 points — little effort or if you submit an RTF/DOC/PDF instead of a plain TXT file

If you are in a group: -10 points if you do not submit a correct team.txt file
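The maximum of 107 is 85 (autograder) + 16 (four tests at 4 points each) + 6 (README). A small Python calculator applying the rubric as written; the function name `ca4_score` is hypothetical, and it only range-checks each component rather than restricting scores to the listed values.

```python
def ca4_score(good_passed: int, bad_passed: int, test_points: list[int],
              readme_points: int, in_team: bool = False,
              team_txt_ok: bool = True) -> int:
    """Compute a CA4 score (out of 100, max 107) from the rubric above."""
    if not 0 <= good_passed <= 77:
        raise ValueError("there are 77 good test cases")
    if not 0 <= bad_passed <= 8:
        raise ValueError("there are 8 bad test cases")
    if len(test_points) != 4 or any(not 0 <= p <= 4 for p in test_points):
        raise ValueError("four submitted tests, each worth 0 to 4 points")
    if not 0 <= readme_points <= 6:
        raise ValueError("the README is worth 0 to 6 points")
    score = good_passed + bad_passed + sum(test_points) + readme_points
    if in_team and not team_txt_ok:
        score -= 10  # missing or incorrect team.txt
    return score
```

For example, passing every autograder test with four high-quality tests and a thorough README gives `ca4_score(77, 8, [4, 4, 4, 4], 6)`, the 107-point maximum.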