Type Inference FAQ

Introduction

This programming assignment, like the garbage collector, is to be completed in pairs with all the same rules and guidelines as before. This program may or may not be codewalked. Additionally, you must work with a different person on this assignment than you did on the last one. This is to ensure that no one is taking unfair advantage of anyone else and to help expose you to more ways of viewing the material. If you would like to be matched up with a partner, please let the ta staff know via email.

Part I: Generating Type Constraints

Following the 2003-11-05 notes, derive type constraints for this language:

   <expr> ::= <num>
            | true
            | false
            | {+ <expr> <expr>}
            | {- <expr> <expr>}
            | {* <expr> <expr>}
            | {iszero <expr>}
            | {bif <expr> <expr> <expr>}

            | <id>
            | {with {<id> <expr>} <expr>}
            | {rec {<id> <expr>} <expr>}
            | {fun {<id>} <expr>}
            | {<expr> <expr>}

            | tempty
            | {tcons <expr> <expr>}
            | {tempty? <expr>}
            | {tfirst <expr>}
            | {trest <expr>}

The only novelty of this language is that the list operations are now polymorphic; that is, you can create lists of values of any type.

Note: The right hand side of the rec binding does not have to be a syntactic function. However, you may assume that the rec-bound identifier only appears under a {fun ...} in the right hand side of the binding. In other words, the following expressions are legal:

     {rec {f {fun {x} {f x}}}
       ...}

     {rec {f {with {y 4}
               {fun {x} {f y}}}}
       ...}

while the following are not legal:

     {rec {f f}
       ...}

     {rec {f {+ 1 f}}
       ...}

Then, write a function which consumes an expression of this language, and returns a list of constraints (of the type defined in Part II).The correspondence between type constraints and the terms in Part II is as follows:

The constants are the base types: they should contain either the symbol 'number or 'boolean.
The variables correspond to the expressions and identifiers whose types you wish to learn. Thus, they should contain either an expression, or a symbol representing an identifier. You may assume all identifiers are bound exactly once; i.e., there are no unbound identifiers, and no name is ever bound more than once.
The constructors are the type constructors. They should contain either:
- the symbol '-> and a list of two arguments, or
- the symbol 'listof and a list of one argument.

In some cases, you may need to a fresh identifier when defining constraints. The Scheme function gensym returns a unique identifier on every call.

Part II: Unification

Implement the unification algorithm from the 2003-11-10 notes. The algorithm should work for a generic term representation, as defined below.

A term is either:

an constant, which contains a symbol,
a variable, which can contain any value, or
a constructor of the form C(t₁, ..., t_n), where C is a symbol and the t_i are a list of sub-terms.

In addition, you will need data types for representing a constraint (a pair of terms) and substitution (a variable and a term). The unification algorithm will consume a list of constraints (as defined in Part I) and produce either a list of substitutions or an error string.

Errors can arise from two situations: when the unification of two terms is impossible, or when the occurs check fails. In both cases, you should return a string with an appropriate error message.

Finally, when comparing variables for equality, use Scheme's built-in eq? function. For symbols, it behaves exactly as symbol=?; for other values, it compares them for identity (like Java's == comparison). We will rely on identical variables being deemed equivalent by eq? when solving the constraints generated in the following section.

Part III: Inferring Types

To infer the type of a program, first parse it, then generate constraints, and finally unify the constraints using the functions written for parts I & II as appropriate. The result will be a list of substitutions; by looking up the subsitution for the entire expression, you can access its type.

To implement this, your code needs to define a function, infer-type, which consumes a concrete representation of the program (as given above), and produces either an error string or a representation of the inferred type. Represent types concretely as:

   <type> ::= number
            | boolean
            | (listof <type>)
            | (<type> -> <type>)
            | <string>

where strings are used to represent type variables. For example, the type of length would be:

   ((listof "a") -> number)

Extra Credit

For a very small amount of extra credit, write a program in this language for which your algorithm infers the type ("a" -> "b"). You shouldn't attempt this problem until you've fully completed the assignment.

What Not To Do

You do not need to implement an interpreter for this language.

You do not need to implement let-based polymorphism.

FAQ

Nothing yet...