You may do this assignment in OCaml, Haskell, JavaScript, Python or Ruby. You must use a different language each time (over the course of PA2 - PA5).
You may work in a team of two people for this assignment. You may work in a team for any or all subsequent programming assignments. You do not need to keep the same teammate. The course staff are not responsible for finding you a willing teammate. However, you must still satisfy the language breadth requirement (i.e., you must be graded on a different language for each of PA2 - PA5).
Your program must either indicate that there is an error in the Cool program described by the cl-lex file (e.g., a parse error in the Cool file) or emit file.cl-ast, a serialized Cool abstract syntax tree. Your program's main parser component must be constructed by a parser generator. The "glue code" for processing command-line arguments, deserializing tokens and serializing the resulting abstract syntax tree should be written by hand. If your program is called parser, invoking parser file.cl-lex should yield the same output as cool --parse file.cl. Your program will consist of a number of OCaml files, a number of Haskell files, a number of JavaScript files, a number of Python files, or a number of Ruby files.
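As a small illustration of the hand-written glue code, the following Python sketch derives the output file name from the input file name. The function name ast_path_for is hypothetical, and rejecting any argument that does not end in .cl-lex is an assumption rather than a stated requirement.

```python
import os


def ast_path_for(lex_path):
    """Map 'file.cl-lex' to 'file.cl-ast' (hypothetical helper)."""
    root, ext = os.path.splitext(lex_path)
    if ext != ".cl-lex":
        # Assumption: treat anything else as a usage error.
        raise ValueError("expected a .cl-lex file, got: " + lex_path)
    return root + ".cl-ast"
```

For example, ast_path_for("file.cl-lex") returns "file.cl-ast". The function only computes a name; it does not open or create any file.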
(* Line 5 *) while x <=
(* Line 6 *)       99 loop
(* Line 7 *)   x <- x + 1
(* Line 8 *) pool

The while expression is on line 5, the x <= 99 expression is on line 5, the 99 expression is on line 6, and the x <- x + 1 and x + 1 expressions are on line 7. The line numbers for tokens are present in the serialized token .cl-lex file.
Your parser is responsible for keeping track of the line numbers (both for the output syntax tree and for error reporting).
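One reading consistent with the while example is that each expression takes the line number of its first token. The Python sketch below shows one way a parser action might propagate line numbers under that assumption. The Token and Expr classes, the helper functions, and the kind names ("le", "plus", and so on) are all illustrative and not taken from the specification.

```python
from dataclasses import dataclass


@dataclass
class Token:
    line: int    # line number read from the token file
    lexeme: str


@dataclass
class Expr:
    line: int    # line number to report for this expression
    kind: str    # illustrative kind name
    children: tuple = ()


def leaf(kind, tok):
    # A one-token expression sits on that token's line.
    return Expr(tok.line, kind)


def binary(kind, left, right):
    # Assumption: a binary expression starts where its left operand starts.
    return Expr(left.line, kind, (left, right))


def assign(target_tok, value):
    # The assignment starts at the identifier being assigned to.
    return Expr(target_tok.line, "assign", (value,))


def while_loop(while_tok, cond, body):
    # The loop starts at the 'while' keyword.
    return Expr(while_tok.line, "while", (cond, body))


# Rebuild the while example by hand, using the line numbers of its tokens.
cond = binary("le",
              leaf("identifier", Token(5, "x")),
              leaf("integer", Token(6, "99")))
body = assign(Token(7, "x"),
              binary("plus",
                     leaf("identifier", Token(7, "x")),
                     leaf("integer", Token(7, "1"))))
loop = while_loop(Token(5, "while"), cond, body)
```

With these definitions, loop and cond are on line 5, the 99 leaf is on line 6, and both body and the x + 1 expression inside it are on line 7, which agrees with the example.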
(* Line 70 *) class Cons inherits List + IO {
Example error report output:
ERROR: 70: Parser: syntax error near +
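A minimal Python sketch of producing a report in the shape shown above. The function name format_parse_error is hypothetical. It assumes that your parser generator's error hook gives you the offending token's line number and lexeme.

```python
def format_parse_error(line, lexeme):
    """Build an error report shaped like the example above."""
    return "ERROR: {}: Parser: syntax error near {}".format(line, lexeme)
```

For the example input, format_parse_error(70, "+") produces the report shown above.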
We will now describe exactly what to output for each kind of node. You can view this as specifying a set of mutually-recursive tree-walking functions. The notation "superclass:identifier" means "output the superclass using the rule (below) for outputting an identifier". The notation "\n" means "output a newline".
Example input:
(* Line 01 *)
(* Line 02 *)
(* Line 03 *) class List {
(* Line 04 *)   -- Define operations on lists.
(* Line 05 *)
(* Line 06 *)   cons(i : Int) : List {
(* Line 07 *)     (new Cons).init(i, self)
(* Line 08 *)   };
(* Line 09 *)
(* Line 10 *) };
Example .cl-ast output -- with comments.
1                  -- number of classes
3                  -- line number of class name identifier
List               -- class name identifier
no_inherits        -- does this class inherit?
1                  -- number of features
method             -- what kind of feature?
6                  -- line number of method name identifier
cons               -- method name identifier
1                  -- number of formal parameters
6                  -- line number of formal parameter identifier
i                  -- formal parameter identifier
6                  -- line number of formal parameter type identifier
Int                -- formal parameter type identifier
6                  -- line number of return type identifier
List               -- return type identifier
7                  -- line number of body expression
dynamic_dispatch   -- kind of body expression
7                  -- line number of dispatch receiver expression
new                -- kind of dispatch receiver expression
7                  -- line number of new-class identifier
Cons               -- new-class identifier
7                  -- line number of dispatch method identifier
init               -- dispatch method identifier
2                  -- number of arguments in dispatch
7                  -- line number of first argument expression
identifier         -- kind of first argument expression
7                  -- line number of the identifier
i                  -- what is the identifier?
7                  -- line number of second argument expression
identifier         -- kind of second argument expression
7                  -- line number of the identifier
self               -- what is the identifier?
The .cl-ast format is quite verbose, but it is particularly easy for later stages (e.g., the type checker) to read in again without having to go through all of the trouble of "actually parsing". It will also make it particularly easy for you to notice where things are going awry if your parser is not producing the correct output.
Writing the rote code to output a .cl-ast text file given an AST may take a bit of time but it should not be difficult; our reference implementation does it in 116 lines and cleaves closely to the structure given above.
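To give a feel for the mutually-recursive tree-walking functions, here is a Python sketch that reproduces the List example output (without the comments). The AST representation (tuples and dicts) and all function names are hypothetical. Only the node kinds that appear in the example are handled, and classes with a superclass are deliberately left unimplemented because the example does not show that case.

```python
def emit_identifier(ident, out):
    line, name = ident
    out.append(str(line))
    out.append(name)


def emit_list(items, emit_item, out):
    # A list is its length followed by each element in order.
    out.append(str(len(items)))
    for item in items:
        emit_item(item, out)


def emit_expression(expr, out):
    out.append(str(expr["line"]))
    out.append(expr["kind"])
    kind = expr["kind"]
    if kind == "dynamic_dispatch":
        emit_expression(expr["receiver"], out)
        emit_identifier(expr["method"], out)
        emit_list(expr["args"], emit_expression, out)
    elif kind == "new":
        emit_identifier(expr["class"], out)
    elif kind == "identifier":
        emit_identifier(expr["name"], out)
    else:
        raise NotImplementedError(kind)  # other kinds are not sketched here


def emit_formal(formal, out):
    name, type_name = formal
    emit_identifier(name, out)
    emit_identifier(type_name, out)


def emit_method(method, out):
    out.append("method")
    emit_identifier(method["name"], out)
    emit_list(method["formals"], emit_formal, out)
    emit_identifier(method["return_type"], out)
    emit_expression(method["body"], out)


def emit_class(cls, out):
    emit_identifier(cls["name"], out)
    if cls.get("superclass") is not None:
        raise NotImplementedError("inheriting classes are not sketched here")
    out.append("no_inherits")
    emit_list(cls["features"], emit_method, out)


def serialize_program(classes):
    out = []
    emit_list(classes, emit_class, out)
    return "\n".join(out) + "\n"


# Hand-built AST for the List example; identifiers are (line, name) pairs.
EXAMPLE = [{
    "name": (3, "List"),
    "features": [{
        "name": (6, "cons"),
        "formals": [((6, "i"), (6, "Int"))],
        "return_type": (6, "List"),
        "body": {
            "line": 7, "kind": "dynamic_dispatch",
            "receiver": {"line": 7, "kind": "new", "class": (7, "Cons")},
            "method": (7, "init"),
            "args": [
                {"line": 7, "kind": "identifier", "name": (7, "i")},
                {"line": 7, "kind": "identifier", "name": (7, "self")},
            ],
        },
    }],
}]
```

Calling serialize_program(EXAMPLE) returns a string with one item per line, beginning with 1, 3, List, no_inherits. Collecting lines in a list and joining them at the end keeps each emit function short and makes the output easy to compare against the reference.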
Haskell uses the Happy parser generator. You could also use a parser combinator library. Happy is part of the Haskell Platform.
A JavaScript parser generator called jison is available. You must download it yourself.
A Ruby parser generator called ruby-yacc is available, but you must download it yourself. Another one, racc, is also available.
A Python parser generator called ply is available, but you must download it yourself.
All of these parser generators are derived from yacc (or bison), the original parser generator for C. Thus you may find it handy to refer to the Yacc paper or the Bison manual. When you're reading, mentally translate the C code references into the language of your choice.
$ cool --lex file.cl
$ cool --out reference --parse file.cl
$ my-parser file.cl-lex
$ diff -b -B -E -w file.cl-ast reference.cl-ast
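If you want to run the same kind of comparison from inside a test script, the Python sketch below roughly mirrors the whitespace-insensitive diff above. It is an approximation, not a reimplementation of diff: it removes all whitespace within each line and drops blank lines before comparing. The function names are hypothetical.

```python
def normalize(text):
    """Strip all whitespace inside each line and drop empty lines."""
    stripped = ("".join(line.split()) for line in text.splitlines())
    return [line for line in stripped if line]


def same_output(mine, reference):
    """True if the two outputs differ only in whitespace or blank lines."""
    return normalize(mine) == normalize(reference)
```

The function takes the two outputs as strings, so the caller decides how they are obtained.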
You may find the reference compiler's --unparse option useful for debugging your .cl-ast files.
Students on a team are expected to participate equally in the effort and to be thoroughly familiar with all aspects of the joint work. Both members bear full responsibility for the completion of assignments. Partners turn in one solution for each programming assignment; each member receives the same grade for the assignment. If a partnership is not going well, the teaching assistants will help to negotiate new partnerships. Teams may not be dissolved in the middle of an assignment.
If you are working in a team, exactly one team member should submit a PA3 zipfile. That submission should include the file team.txt, a one-line flat ASCII text file that contains the email address of your teammate. Don't include the @virginia.edu bit. Example: If ph4u and wrw6y are working together, ph4u would submit ph4u-pa3.zip with a team.txt file that contains the word wrw6y. Then ph4u and wrw6y will both receive the same grade for that submission.
In each case we will then compare your output to the correct answer:
If your answer is not the same as the reference answer you get 0 points for that testcase. Otherwise you get 1 point for that testcase.
For error messages and negative testcases we will compare your output but not the particular error message. Basically, your parser need only correctly identify that there is an error on line X. You do not have to faithfully duplicate our English error messages. Many people choose to (because it makes testing easier) -- but it's not required.
We will perform the autograding on some unspecified test system. It is likely to be Solaris/UltraSPARC, Cygwin/x86 or Linux/x86. However, your submissions must officially be platform-independent (not that hard with a scripting language). You cannot depend on running on any particular platform.
There is more to your grade than autograder results. See the Programming Assignment page for a point breakdown.
Your submission may not create any temporary files. Your submission may not read or write any files beyond its input and output. We may test your submission in a special "jail" or "sandbox".