CS 4610 - Programming Languages - Programming Assignment 4

Programming Assignment 4 - The Semantic Analyzer

Project Overview

Programming assignments 2 through 5 will direct you to design and build a compiler for Cool. Each assignment will cover one component of the compiler: lexical analysis, parsing, semantic analysis, and operational semantics. Each assignment will ultimately result in a working compiler phase which can interface with the other phases.

You may do this assignment in OCaml, Python or Ruby. You must use each language at least once (over the course of PA2 - PA5); you will use one language (presumably your favorite) twice.

You may work in a team of two people for this assignment. You may work in a team for any or all subsequent programming assignments. You do not need to keep the same teammate. The course staff are not responsible for finding you a willing teammate. However, you must still satisfy the language breadth requirement (i.e., you must be graded on at least one OCaml program, at least one Ruby program, and at least one Python program).

Goal

For this assignment you will write a semantic analyzer. Among other things, this involves traversing the abstract syntax tree and the class hierarchy. You will reject all Cool programs that do not comply with the Cool type system.

You will also write additional code to unserialize the AST produced by the parser stage and to serialize the class map, implementation map, parent map, and annotated AST produced by your semantic analysis.

The Specification

You must create three artifacts:

A program that takes a single command-line argument (e.g., file.cl-ast). That argument will be an ASCII text Cool abstract syntax tree file (as described in PA3). Your program must either indicate that there is an error in the input (e.g., a type error) or emit file.cl-type, a serialized Cool abstract syntax tree, class map, implementation map, and parent map. If your program is called checker, invoking checker file.cl-ast should yield the same output as cool --type file.cl. Your program will consist of a number of OCaml files, a number of Python files, or a number of Ruby files.
A plain ASCII text file called readme.txt describing your design decisions and choice of test cases. See the grading rubric. A few paragraphs should suffice.
Testcases good.cl, bad1.cl, bad2.cl and bad3.cl. The first should pass the semantic analysis stage. The remaining three should yield semantic analysis errors.

Error Reporting

To report an error, write the string ERROR: line_number: Type-Check: message to standard output and terminate the program. You may write whatever you want in the message, but it should be fairly indicative. Example erroneous input:

class Main inherits IO {
  main() : Object {
    out_string("Hello, world.\n" + 16777216) -- adding string + int !?
  } ;
} ;

Example error report output:

ERROR: 3: Type-Check: arithmetic on String Int instead of Ints

Line Number Error Reporting

The typing rules do not directly specify the line numbers on which errors are to be reported. As of v1.11, the Cool reference compiler uses these guidelines (possibly surprising ones are italicized):

Errors related to parameter-less method main in class Main: always line 0
Inheritance cycle: always line 0
Other inheritance type problem: inherited type identifier location
self or SELF_TYPE used in wrong place: self (resp. SELF_TYPE) identifier (resp. type) location
Redefining a feature: (second) feature location
Redefining a formal or class: (second) identifier location
Other attribute problems: attribute location
Redefining a method and changing types: (second) type location
Other problems with redefining a method: method location
Method body type does not conform: method name identifier location
Attribute initializer does not conform: attribute name identifier location
Errors with types of arguments to relational/arithmetic operations: location of relational/arithmetic operation expression
Errors with types of while / if subexpression(s): location of (enclosing) while or if expression (not the location of the conditional expression)
Errors with case expression (e.g., lub): location of case expression
Errors with conformance in let: location of let expression (not location of initializer)
Errors in blocks: location of (beginning of) block expression
Errors in actual arguments: location of method invocation expression (not the location of any particular actual argument)
Assignment does not conform: assignment expression location (not right-hand-side location)
Unknown identifier: location of identifier
Unknown method: location of method name identifier
Unknown type: location of type

Remember that you do not have to match the English prose of the reference compiler's error messages at all. You just have to get the line number right.

Semantic checks are unordered — if a program contains two or more errors, you may indicate whichever you like. You can infer from this that all of our test cases will contain at most one error.

The `.cl-type` File Format

If there are no errors in file.cl-ast your program should create file.cl-type and serialize the class map, implementation map, parent map, and annotated AST to it. The class and implementation maps are described in the Cool Reference Manual.

A .cl-type file consists of four sections:

The class map.
The implementation map.
The parent map.
The annotated AST.

Simply output the four sections in order, one after the other.

We will now describe exactly what to output for the class and implementation maps. The general idea and notation (one string per line, recursive descent) are the same as in PA3.

The Class Map
- Output class_map \n.
- Output the number of classes and then \n.
- Output each class in turn (in ascending alphabetical order):
  - Output the name of the class and then \n.
  - Output the number of attributes and then \n.
  - Output each attribute in turn (in order of appearance, with inherited attributes from a superclass coming first):
    - Output no_initializer \n and then the attribute name \n and then the type name \n.
    - or Output initializer \n and then the attribute name \n and then the type name \n and then the initializer expression.
The Implementation Map
- Output implementation_map \n.
- Output the number of classes and then \n.
- Output each class in turn (in ascending alphabetical order):
  - Output the name of the class and then \n.
  - Output the number of methods for that class and then \n.
  - Output each method in turn (in order of apperance, with inherited methods from a superclass coming first):
    - Output the method name and then \n.
    - Output the number of formals and then \n.
    - Output each formal's name only:
      - Output the name and then \n
    - If this method is inherited from a parent class and not overriden, output the name of the ultimate parent class that defined the method body expression and then \n. Otherwise, output the name of the current class and then \n.
    - Output the method body expression.
The Parent Map
- Output parent_map \n.
- Output the number of parent-child inheritance relations and then \n. This number is equal to the number of classes minus one (since Object has no parent).
- Output each child class in turn (in ascending alphabetical order):
  - Output the name of the child class and then \n.
  - Output the name of the child class's parent and then \n.
The Annotated AST
- With two exceptions, the annotated AST format is identical to the normal AST from PA3.
- The first change involves expressions. To output an Expression:
  1. Output the line number of the expression and then a newline (as in PA3).
  2. Output the name of type associated with the expression and then a newline. For example, the expression 3+x is associated with the type Int. This is new to PA4. It should be done for PA4, but not for PA4c.
  3. Output the name of the expression and then a newline and then any subparts (as in PA3).
- The second change is a new kind of expression, internal, used to represent the bodies of predefined methods. Internal expressions are those that are handled by the run-time system -- you might think of them as part of the standard library. You output Internal Expressions as follows:
  - 0 \n internal \n Class.method \n
  The valid kinds of internal expressions (i.e., the values for Class.method) are:
  - Object.abort Object.type_name Object.copy IO.out_string IO.out_int IO.in_string IO.in_int String.length String.concat String.substr
  They are defined here (for Object), here (for IO), and here (for String).
Note that you must output information about all classes and methods defined in the program as well as all base classes (and their methods). Do not just print out "classes actually used" or "methods actually called" or something like that. Output all classes and methods -- no optimizations or shortcuts!

That's it for the output specification. Here's some example input:

class Main inherits IO {
  my_attribute : Int <- 5 ; 
  main() : Object { 
    out_string("Hello, world.\n") 
  } ;
} ;

Example .cl-type class map output -- with comments:

class_map 6 number of classes Bool note: includes predefined base classes 0 IO 0 Int 0 Main 1 our Main has 1 attribute initializer my_attribute named "my_attribute" Int with type Int 2 -- initializer expression line number Int -- initializer expression type (see above: this is an expression annotated with a type) -- do not emit types for the PA4c Checkpoint! integer -- initializer expression kind 5 -- which interger constant is it? Object 0 String 0

Example .cl-type implementation map output -- with comments:

implementation_map 6 six classes Bool first is Bool 3 - it has three methods abort - first is abort() 0 -- abort has 0 formal arguments Object -- name of parent class from which Bool inherits abort() 0 -- abort's body expression starts o line 0 Object -- abort's body expression has type Object internal -- abort's body is an internal kind of expression (i.e., a system call; see above) Object.abort -- extra detail on abort's body expression copy - second of Bool's three methods is copy() 0 -- copy has 0 formal arguments Object -- name of parent class from which Bool inherits copy() 0 -- copy's body expression starts on line 0 SELF_TYPE -- copy's body expression has type SELF_TYPE internal -- copy's body is an internal kind of expression (i.e., a system call; see above) Object.copy -- extra detail on copy's body expression ... many lines skipped Main another class is Main 8 - it has 8 methods ... many lines skipped ... main - one of Main's methods is main() 0 -- main has 0 formal arguments Main -- the name of the class where Main.main() is defined 4 -- the body expression of Main.main starts on line 4 SELF_TYPE -- the body expression of Main.main has type SELF_TYPE self_dispatch -- the body of Main.main() is a self_dispatch kind of expression ... many lines skipped ...

Example .cl-type parent map output -- with comments:

parent_map 5 there are five classes with parents (Object is the sixth class) Bool Bool's parent ... Object ... is Object. IO IO's parent ... Object ... is Object. Int Int's parent ... Object ... is Object. Main Main's parent ... IO ... is IO. String String's parent. .. Object ... is Object.

Writing the rote code to output a .cl-type text file given an AST may take a bit of time but it should not be difficult; our reference implementation does it in 35 lines and cleaves closely to the structure given above. Reading in the AST is similarly straightforward; our reference implementation does it in 171 lines.

Commentary

You can do basic testing as follows:

$ cool --parse file.cl  
$ cool --out reference --type file.cl 
$ my-checker file.cl-ast
$ diff -b -B -E -w file.cl-type reference.cl-type

When you attempt to submit your PA4c or PA4 assignment, the server will automatically run your program on some held-out test cases and report the results.

You should implement all of the typing rules in the Reference Manual. There are also a number of other rules and corner cases you have to check (e.g., no class can inherit from Int, you cannot redefine a class, you cannot have an attribute named self, etc.). They are sprinkled throughout the manual. Check everything you possibly can.

Because PA4 involves spitting out parts of the AST in a similar format to PA3, there is a slight synergy to doing PA3 and PA4 in the same language. It's not major, though, and the vast majority of your effort in PA4 will be centered on type checking (rather than output formatting).

What To Turn In For PA4c

PA4c is a checkpoint for PA4. The typechecker is a large project (and a large part of your grade), so it behooves you to start it early.

For PA4c you should turn in (electronically) an early version of PA4 that does the following:

Reads in the .cl-ast file given as a command-line argument.
- You do not need to use a parser generator to read in the .cl-ast file -- its format was specifically chosen to make it easy to read with just some mutually-recursive procedures. It should take you (much) less than 150 lines to read in the .cl-ast file.
Does every bit of typechecking and semantic analysis possible without typechecking expressions.
- This you should not annotate types in initializer expressions in the class map.
Prints out error messages as normal.
Outputs only the class map to .cl-type if there are no errors.
- You can use the --class-map command-line argument to get the reference compiler to spit out the class map after typechecking (for comparison).

Thus you should build the class hierarchy and check everything related to that. For example:

Check to see if a class inherits from Int (etc.).
Check to see if a class inherits from an undeclared class.
Check for cycles in the class hierarchy.
Check for duplicate method or attribute definitions in the same class.
Check for a child class that redefines a parent method but changes the parameters.
Check for a missing method main in class Main.
Check for self and SELF_TYPE mistakes in classes and methods.
This list is not exhaustive -- read the Cool Reference Manual carefully and find everything you might check for without typechecking expressions.
Basically, you'll look at classes, methods and attibutes (but not method bodies).

You must turn in a zip file containing these files:

source_files -- your implementation, including
- main.rb or
- main.py or
- main.ml

Your zip file may also contain:

team.txt -- an optional file listing your other team member (see below -- if you are not working in a team, do not include this file)
This is the only written assigment that allows explicit teams.

Submit the file electronically (as with PA1-PA3).

Q: What's the exact list of errors I have to check for in PA4c?
A: No such list is provided! Part of the assignment is thinking up all possible checks that do not involve expressions.

What To Turn In For PA4

You must turn in a zip file containing these files:

readme.txt -- your README file
good.cl -- a positive testcase
bad1.cl -- a negative testcase
bad2.cl -- a negative testcase
bad3.cl -- a negative testcase
source_files -- including
- main.rb or
- main.py or
- main.ml

Your zip file may also contain:

team.txt -- an optional file listing your other team member (see below -- if you are not working in a team, do not include this file)

Submit the file as you did for PA1.

Working In Pairs

You may complete this project in a team of two. Teamwork imposes burdens of communication and coordination, but has the benefits of more thoughtful designs and cleaner programs. Team programming is also the norm in the professional world.

Students on a team are expected to participate equally in the effort and to be thoroughly familiar with all aspects of the joint work. Both members bear full responsibility for the completion of assignments. Partners turn in one solution for each programming assignment; each member receives the same grade for the assignment. If a partnership is not going well, the teaching assistants will help to negotiate new partnerships. Teams may not be dissolved in the middle of an assignment.

If you are working in a team, exactly one team member should submit a PA4 zipfile. That submission should include the file team.txt, a one-line flat ASCII text file that contains exactly and only the email address of your teammate. Don't include the @virgnia.edu bit. Example: If ph4u and wrw6y are working together, ph4u would submit ph4u-pa4.zip with a team.txt file that contains the word wrw6y. Then ph4u and wrw6y will both receive the same grade for that submission.

Autograding

We will use scripts to run your program on various testcases. The testcases will come from the good.cl and bad.cl files you and your classsmates submit as well as held-out testcases used only for grading. Your programs cannot use any special libraries (aside from the OCaml unix and str libraries, which are not necessary for this assignment). We will use (loosely) the following commands to execute them: ocamlc unix.cma str.cma *.ml ; ./a.out testcase.cl-ast >& testcase.out python main.py testcase.cl-ast >& testcase.out ruby main.rb testcase.cl-ast >& testcase.out You may thus have as many source files as you like (although two or three should suffice) -- they will be passed to your language compiler in alphabetical order (if it matters).

In each case we will then compare your output to the correct answer:

diff -b -B -E -w testcase.cl-type correct-answer.cl-type

If your answer is not the same as the reference answer you get 0 points for that testcase. Otherwise you get 1 point for that testcase.

For error messages and negative testcases we will compare your output but not the particular error message. Basically, your semantic analyzer need only correctly identify that there is an error on line X. You do not have to faithfully duplicate our English error messages. Many people choose to (because it makes testing easier) -- but it's not required.

We will perform the autograding on some unspecified test system. It is likely to be Solaris/UltraSPARC, Cygwin/x86 or Linux/x86. However, your submissions must officialy be platform-independent (not that hard with a scripting language). You cannot depend on running on any particular platform.

There is more to your grade than autograder results. See the Programming Assignment page for a point breakdown.

Your submission may not create any temporary files. Your submission may not read or write any files beyond its input and output. We may test your submission in a special "jail" or "sandbox".

class_map
6	number of classes
Bool	note: includes predefined base classes
0
IO
0
Int
0
Main
1	our Main has 1 attribute
initializer
my_attribute	named "my_attribute"
Int	with type Int
2	-- initializer expression line number
Int	-- initializer expression type (see above: this is an expression annotated with a type) -- do not emit types for the PA4c Checkpoint!
integer	-- initializer expression kind
5	-- which interger constant is it?
Object
0
String
0

implementation_map
6	six classes
Bool	first is Bool
3	- it has three methods
abort	- first is abort()
0	-- abort has 0 formal arguments
Object	-- name of parent class from which Bool inherits abort()
0	-- abort's body expression starts o line 0
Object	-- abort's body expression has type Object
internal	-- abort's body is an internal kind of expression (i.e., a system call; see above)
Object.abort	-- extra detail on abort's body expression
copy	- second of Bool's three methods is copy()
0	-- copy has 0 formal arguments
Object	-- name of parent class from which Bool inherits copy()
0	-- copy's body expression starts on line 0
SELF_TYPE	-- copy's body expression has type SELF_TYPE
internal	-- copy's body is an internal kind of expression (i.e., a system call; see above)
Object.copy	-- extra detail on copy's body expression
... many lines skipped
Main	another class is Main
8	- it has 8 methods
... many lines skipped ...
main	- one of Main's methods is main()
0	-- main has 0 formal arguments
Main	-- the name of the class where Main.main() is defined
4	-- the body expression of Main.main starts on line 4
SELF_TYPE	-- the body expression of Main.main has type SELF_TYPE
self_dispatch	-- the body of Main.main() is a self_dispatch kind of expression
... many lines skipped ...

parent_map
5	there are five classes with parents (Object is the sixth class)
Bool	Bool's parent ...
Object	... is Object.
IO	IO's parent ...
Object	... is Object.
Int	Int's parent ...
Object	... is Object.
Main	Main's parent ...
IO	... is IO.
String	String's parent. ..
Object	... is Object.