CS301 Fall 1998

Project Part #3: The Semantic Analyzer

Out: Tuesday November 24
Due: Thursday, December 10

 

We move on to the final part of the project, namely the semantic analyzer for C-. As usual, the directions below are meant to help you.

  1. Create a new directory to include your Semantic Analyzer. Copy over your files from the Parser directory into this new directory, since you will be using them to continue the construction of your parser. (If you would rather use my sample solutions instead, let me know and will email them to you.) Also, copy any files you may wish to use from TINY's Analyzer, in case you would like to use them as templates. Modify the makefile to help you with compilations.
  2. Meet with your teammate to discuss the structure of your program. The discussion we had in class on the Analyzer and the notes at the end of this handout should be helpful here.
  3. In the same meeting, discuss with your teammate whether you would like to work together through the implementation of this phase or if you would rather split the implementation into two pieces of roughly equal length and difficulty. This negotiation you should carry seriously and frankly. It is important that both of you understand what you agreed upon and that you feel it is fair.
  4. Implement and test your code thoroughly. Write sample C- code to test your parser on. Create files that catch every semantic error you can think of. For testing the Analyzer, the C- programs need to be semantically correct. This is the end of the front-end of a compiler, only the code generation remains. Remember that the C- source code file should have the .cm extension.
  5. Make sure you have appropriate comments in your files including header comments describing the name(s) of the file creator/modifier, date of modifications and summary of the file contents and use. When needed, have in-line comments to explain elaborate coding that your instructor or your colleagues might have troubles understand.
  6. Hand in the implementaton decision paragraph and printouts of your code and your tests. You should have traceScan and traceParse set to FALSE, but echoSource and traceAnalyze set to TRUE.
  7. Party! (once more!)

Happy programming!

Recommendations and Comments about the C-Minus Semantic Analyzer

I. Augmenting to the syntax tree.

There are three new fields necessary in the syntax tree:

int nestLevel;

TreeNode * defptr;

CMType datatype;

We describe the use of each of these three fields.

I.1. The nestLevel field.

The nestLevel field is needed to separate nested scopes. In the semantic analyzer, this will be used to check whether a redeclaration of an identifier occurs within the same scope, since this is a semantic error:

void f(int x)
{ int y;
int y[10]; /* error! */
}

Redeclarations across scopes do not represent errors:

void f(int x)
{ int y;
{ int y[10]; /* no error here */
}
}

You can check this by recording the nestLevel of each declaration. In the first example above f is at nestLevel 0, x is at nestLevel 1 and y is at nestLevel 2. In the second example, the second y is at nestLevel 3. This double increment of nestLevel is due to a quirk of C that wants parameters to be viewed as being at a different nestLevel than the function name and the local variables. For example, the following code is not an error in C:

void f(int x)
{ int x; /* no error here */
}

Thus, you should increment nestLevel twice when entering a function: once before processing the parameters, and once before processing the body block. Then you can use nestLevel alone to determine a redeclaration error.The best way to record the nestLevel is to use a static external variable within the analyze.c file, initialized to 0, and incremented/decremented accordingly.

I.2. The defptr field.

This field is used for any tree node that stores an identifier name. It points to the definition/declaration of the function/parameter/variable that it refers to. This allows type checking to occur after the symbol table has been disposed of.

I.3. The datatype field.

This field records the type of the node during type checking. In C-Minus there are essentially five types to be checked, and we can define these in an enum declaration:

Typedef enum {IntType, ArrayType, IntFuncType, VoidFuncType, VoidType } CMType;

All DefNodes and ExpNodes in the syntax tree will have types. No StmtNode will have type. Type errors and call nodes to functions of type VoidFuncType will both have VoidType.

 

II. The structure of the semantic analyzer.

I recommend that you use the same basic structure as with the TINY semantic analyzer, namely, that you use the traverse function for your basic tree traversal, and that you build the symbol table in the first pass, and perform type checking in the second pass. Thus, the nestLevel and defptr fields will get filled in during the symbol table pass, while the datatype fields will get filled in during the type checking pass.

The preorder and postorder functions will be somewhat more complicated in the C-Minus case, and you may want to write separate procedures for defnodes, stmtnodes, and expnodes, which can be called by these functions. Though the symbol table pass is essentially a preorder traversal, there are a couple of operations that need to be performed in postorder: calling exitScope() and decrementing nestLevel when exiting a block or function definition. Thus, postProc, though simple, is not the nullProc.

 

III. The symbol table.

I propose keeping the same basic structure as in TINY, except of course that provision for nested scopes must be added. I suggest you add pointers to separate tables to the internal structure of the table, as shown in Figure 6.17 on page 305 of the text. You create a new table for a new scope using a new symbol table primitive enterScope(), and you delete the new table using a new primitive exitScope(). There should be no other new function declarations necessary in symtab.h. Also, the printTable() primitive should be removed (see the comments on tracing information below). With this design, the symbol table can remain hidden from the rest of the compiler, which is a good feature.

One essential question: what should the symbol table store? The most appropriate and versatile way is to store the TreeNode * value of the nodes containing the names in the table. These node will contain all the important information, like the nestLevel, the lineno, and even the name string itself.

There is one additional problem, namely the existence of the predefined procedures input() and output() in C-Minus. The way to handle these is to build syntax tree nodes for them by hand, and insert them into the symbol table as part of the symbol table initialization step.

 

IV. Type checking in C-Minus.

There is one place where postorder constructions of types does not quite work: in recursive functions. Before entering a function body, the CMType of the function must be filled in, in case it is called within its body. This can be accomplished by using a very simple preorder function (but not nullProc) as a parameter to the traverse function.

There is one other preorder operation that needs to be performed during the type checking pass: assigning the currentFunc pointer.

V. Printing out tracing information to the listing file.

This is a topic we did not discuss in class. What should our semantic analyzer print if traceAnalyze is set to TRUE? The TINY compiler just calls printTable(), which we have said should not exist for C-Minus. Here is my recommendation: