Programming Assignment 4 - The Semantic Analyzer
Project Overview
Programming assignments 2 through 5 will direct you to design and build an
interpreter for Cool. Each assignment will cover one component of the
interpreter: lexical analysis, parsing, semantic analysis, and
operational semantics. Each assignment will ultimately result in a working
interpreter phase which can interface with the other phases.
You may do this assignment in OCaml, Python or Ruby. You must use each
language at least once (over the course of PA2 - PA5); you will use one
language (presumably your favorite) twice.
You may work in a team of two people for this assignment. You may work in a
team for any or all subsequent programming assignments. You do not need to
keep the same teammate. The course staff are not responsible for finding
you a willing teammate. However, you must still satisfy the language
breadth requirement (i.e., you must be graded on at least one OCaml
program, at least one Ruby program, and at least one Python program).
Goal
For this assignment you will write a semantic analyzer. Among other
things, this involves traversing the abstract syntax tree and the class
hierarchy. You will reject all Cool programs that do not comply with the
Cool type system.
You will also write additional code to unserialize the AST produced by
the parser stage and to serialize the class map and implementation map
produced by your semantic analysis.
The Specification
You must create three artifacts:
- A program that takes a single command-line argument (e.g.,
file.cl-ast). That argument will be an ASCII text
Cool abstract syntax tree file (as described in PA3).
Your program must either indicate that there is an error in the
input (e.g., a type error) or emit file.cl-type, a serialized Cool
abstract syntax tree, class map, and implementation map.
If your program is called checker, invoking checker
file.cl-ast should yield the same output as cool --type
file.cl. Your program will consist of a number of OCaml files, a
number of Python files, or a number of Ruby files.
- A plain ASCII text file called readme.txt describing your
design decisions and choice of test cases. See the grading rubric. A few
paragraphs should suffice.
- Testcases good.cl, bad1.cl, bad2.cl and
bad3.cl. The first should pass the semantic analysis stage. The
remaining three should yield semantic analysis errors.
Error Reporting
To report an error, write the string ERROR: line_number:
Type-Check: message to standard output and terminate the
program. You may
write whatever you want in the message, but it should be fairly indicative.
Example erroneous input:
class Main inherits IO {
main() : Object {
out_string("Hello, world.\n" + 16777216) -- adding string + int !?
} ;
} ;
Example error report output:
ERROR: 3: Type-Check: arithmetic on String Int instead of Ints
Line Number Error
Reporting
The typing rules do not directly specify the line numbers on which errors
are to be reported. As of v1.11, the Cool reference interpreter uses these
guidelines (possibly surprising ones are italicized):
- Errors related to parameter-less method main in class
Main: always line 0
- Inheritance cycle: always line 0
- Other inheritance type problem: inherited type identifier location
- self or SELF_TYPE used in wrong place: self
(resp. SELF_TYPE) identifier (resp. type) location
- Redefining a feature: (second) feature location
- Redefining a formal or class: (second) identifier location
- Other attribute problems: attribute location
- Redefining a method and changing types: (second) type location
- Other problems with redefining a method: method location
- Method body type does not conform: method name identifier
location
- Attribute initializer does not conform: attribute name
identifier location
- Errors with types of arguments to relational/arithmetic operations:
location of relational/arithmetic operation expression
- Errors with types of while / if subexpression(s):
location of (enclosing) while or if expression
(not the location of the conditional expression)
- Errors with case expression (e.g., lub): location of
case expression
- Errors with conformance in let: location of let
expression (not location of initializer)
- Errors in blocks: location of (beginning of) block expression
- Errors in actual arguments: location of method invocation expression
(not the location of any particular actual argument)
- Assignment does not comform: assignment expression location
(not right-hand-side location)
- Unknown identifier: location of identifier
- Unknown method: location of method name identifier
- Unknown type: location of type
Remember that you do not have to match the English prose of the reference
interpreter's error messages at all. You just have to get the line number
right.
The .cl-type File Format
If there are no errors in file.cl-ast your program should create
file.cl-type and serialize the abstract syntax tree, class map and
implementation map to it. The AST format is exactly the same as it was in PA3. The class
and implementation maps are described in the Cool Reference Manual.
A .cl-type file consists of three sections:
- The class map.
- The implementation map.
- The AST (just copy it from your input).
Simply output the three sections in order, one after the other.
We will now describe exactly what to output for the class and
implementation maps. The general idea and notation (one string per line,
recursive descent) are the same as in PA3.
- The Class Map
- Output class_map \n.
- Output the number of classes and then \n.
- Output each class in turn (in ascending alphabetical order):
- Output the name of the class and then \n.
- Output the number of attributes and then \n.
- Output each attribute in turn (in order of appearance, with
inherited attributes from a superclass coming first):
- Output no_initializer \n and then the attribute name
\n and then the type name \n.
- or Output initializer \n and then the
attribute name \n and then the type name \n and then the
initializer expression.
- The Implementation Map
- Output implementation_map \n.
- Output the number of classes and then \n.
- Output each class in turn (in ascending alphabetical order):
- Output the name of the class and then \n.
- Output the number of methods for that class and then \n.
- Output each method in turn (in ascending alphabetical order):
- Output the method name and then \n.
- Output the number of formals and then \n.
- Output each formal's name only:
- Output the name and then \n
- Output the method body expression.
- New Expression Kind:
We add a new kind of expression, internal, for internal method
calls. Internal method calls are those that are handled by the run-time
interpreter -- you might think of them as part of the standard library.
You output internal expression as follows:
0 \n internal \n Class.method \n
The valid kinds of internal expressions (i.e., the values for
Class.method) are:
- Object.abort Object.type_name Object.copy IO.out_string
IO.out_int IO.in_string IO.in_int String.length
String.concat String.substr
They are defined
here (for Object),
here
(for IO),
and
here
(for String).
That's it for the output specification. Here's some example input:
class Main inherits IO {
my_attribute : Int <- 5 ;
main() : Object {
out_string("Hello, world.\n") -- adding string + int !?
} ;
} ;
Example .cl-type class map output -- with comments:
class_map
6 -- number of classes
Bool -- note: includes predefined base classes
0
IO
0
Int
0
Main
1 -- our Main has 1 attribute
initializer
my_attribute -- named "my_attribute"
Int -- with type Int
2 -- initializer expression line number
integer -- initializer expression kind
5 -- what integer constant is it?
Object
0
String
0
Example .cl-type implementation map output -- with
comments:
implementation_map
6 -- 6 classes
Bool -- first is Bool
3 -- it has 3 methods
abort -- first is abort
0 -- abort has 0 formal arguments
0 -- abort is "defined" on line 0
internal -- kind of body expression
Object.abort -- what kind of internal method is it?
copy -- second is copy
0 -- copy has 0 formal arguments
0 -- copy is "defined" on line 0
internal -- kind of body expression
Object.copy -- what kind of internal method is it?
-- many lines skipped
Main -- another class is Main
8 -- it has 8 methods
-- many lines skipped
main -- one method is "main"
0 -- it has 0 formal arguments
4 -- the body expression starts on line 4
self_dispatch -- the body is a self_dispatch
-- many lines skipped
Writing the rote code to output a .cl-type text file given an AST
may take a bit of time but it should not be difficult; our reference
implementation does it in 35 lines and cleaves closely to the structure
given above. Reading in the AST is similarly straightforward; our reference
implementation does it in 171 lines.
Commentary
You can do basic testing as follows:
You should implement all
of the typing rules in the Reference Manual. There are also a number of
other rules and corner cases you have to check (e.g., no class can inherit
from Int, you cannot redefine a class, you cannot have an attribute named
self, etc.). They are sprinkled throughout the manual. Check everything you
possibly can.
Because PA4 involves spitting out parts of the AST in the same format as
PA3, there is a slight synergy to doing PA3 and PA4 in the same language.
It's not major, though, and the vast majority of your effort in PA4 will be
centered on type checking (rather than output formatting).
What To Turn In For PA4
You must turn in a zip file containing these files:
- readme.txt -- your README file
- good.cl -- a positive testcase
- bad1.cl -- a negative testcase
- bad2.cl -- a negative testcase
- bad3.cl -- a negative testcase
- source_files -- your implementation
Your zip file may also contain:
- team.txt -- an optional file listing your other team member
(see below -- if you are not working in a team, do not include this file)
Your zip file must be named your_email-pa4.zip. For
example, if your University email address is wrw6y you must
call your zip file wrw6y-pa4.zip. Do not use your gmail address or
whatnot -- we really want your university ID here.
Submit the file using Toolkit (as with PA1-PA3).
What To Turn In For WA4
Written Assignment 4 (WA4) is a milestone for PA4. The typechecker is a
large project (and a large part of your grade), so it behooves you to start
it early.
For WA4 you should turn in (electronically, via Toolkit) an early version
of PA4 that does the following:
- Reads in the .cl-ast file given as a command-line argument.
- You do not need to use a parser generator to read in the
.cl-ast file -- its format was specifically chosen to make it
easy to read with just some mutually-recursive procedures. It should
take you (much) less than 150 lines to read in the .cl-ast
file.
- Does every bit of typechecking and semantic analysis possible
without typechecking expressions.
- Prints out error messages as normal.
- Outputs only the class map to .cl-type if there are
no errors.
- You can use the --class-map command-line argument to
get the reference interpreter to spit out the class map after
typechecking (for comparison).
Thus you should build the class hierarchy and check everything related to
that. For example:
- Check to see if a class inherits from Int (etc.).
- Check to see if a class inherits from an undeclared class.
- Check for cycles in the class hierarchy.
- Check for duplicate method or attribute definitions in the same class.
- Check for a child class that redefines a parent method but changes
the parameters.
- Check for a missing method main in class Main.
- Check for self and SELF_TYPE mistakes in classes
and methods.
- This list is not exhaustive -- read the Cool Reference Manual
carefully and find everything you might check for without typechecking
expressions.
- Basically, you'll look at classes, methods and attibutes (but not
method bodies).
Despite the fact that this is an electronic submission of a program, it
counts only as a written assignment for grading purposes.
You must turn in a zip file containing these files:
- source_files -- your implementation
Your zip file may also contain:
- team.txt -- an optional file listing your other team member
(see below -- if you are not working in a team, do not include this file)
- This is the only written assigment that allows explicit teams.
You should electronically submit your file by Friday at 11:59pm.
Working In Pairs
You may complete this project in a team of two. Teamwork imposes burdens
of communication and coordination, but has the benefits of more thoughtful
designs and cleaner programs. Team programming is also the norm in the
professional world.
Students on a team are expected to participate equally in the effort and to
be thoroughly familiar with all aspects of the joint work. Both members
bear full responsibility for the completion of assignments. Partners turn
in one solution for each programming assignment; each member receives the
same grade for the assignment. If a partnership is not going well, the
teaching assistants will help to negotiate new partnerships. Teams may not
be dissolved in the middle of an assignment.
If you are working in a team, exactly one team member should submit
a PA4 zipfile. That submission should include the file team.txt, a
one-line flat ASCII text file that contains exactly and only the
email address of your teammate. Don't include the @virgnia.edu
bit. Example: If ph4u and wrw6y are working together,
ph4u would submit ph4u-pa4.zip with a team.txt
file that contains the word wrw6y. Then ph4u and
wrw6y will both receive the same grade for that submission.
Autograding
We will use scripts to run your program on various testcases. The testcases
will come from the good.cl and bad.cl files you and your
classsmates submit as well as held-out testcases used only for grading.
Your programs cannot use any special libraries (aside from the OCaml
unix and str libraries, which are not necessary for this
assignment). We will use (loosely) the following commands to execute them:
- ocaml unix.cma str.cma *.ml testcase.cl-ast >& testcase.out
- python main.py testcase.cl-ast >& testcase.out
- ruby main.rb testcase.cl-ast >& testcase.out
You may thus have as many source files as you like (although two or three
should suffice) -- they will be passed to your
language interpreter in alphabetical order (if it matters).
In each case we will then compare your output to the correct answer:
- diff -b -B -E -w testcase.cl-type correct-answer.cl-type
If your answer is not the same as the reference answer you get 0
points for that testcase. Otherwise you get 1 point for that testcase.
For error messages and negative testcases we will compare your output
but not the particular error message. Basically, your semantic
analyzer need only correctly identify that there is an error on line X. You
do not have to faithfully duplicate our English error messages. Many people
choose to (because it makes testing easier) -- but it's not required.
We will perform the autograding on some unspecified test system. It is
likely to be Solaris/UltraSPARC, Cygwin/x86 or Linux/x86. However, your
submissions must officialy be platform-independent (not that hard
with a scripting language). You cannot depend on running on any particular
platform.
There is more to your grade than autograder results. See the Programming
Assignment page for a point breakdown.