Consider the problem of defining a signature for priority queues, which support a remove_min
operation that removes the "least" element of the queue. Clearly priority
queues make sense only for ordered types: those for which we have a notion of "less
than" with which to define the "least" element of the queue. Since the
element type does not determine the appropriate notion of comparison between elements, the
compiler cannot be expected to guess which ordering we mean. We must instead specify
the ordering together with the type. This is achieved using a substructure,
a structure-within-structure, as follows:
signature PRIO_QUEUE = sig
    structure Elt : ORDERED
    type prio_queue
    exception Empty
    val empty : prio_queue
    val insert : Elt.t * prio_queue -> prio_queue
    val remove : prio_queue -> Elt.t * prio_queue
end
Notice that the type prio_queue is a type, and not a type constructor (as
it was in the case of "plain" queues). This is a reflection of the fact
that the operations on a priority queue are not independent of the type of elements (as
they are with "plain" queues), but rely on the comparison operations that are
provided with the Elt structure.
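The ORDERED signature itself is not shown here; a minimal version sufficient for this example might look as follows (the component names t, lt, and eq are assumptions, not fixed by the text):

```sml
(* Hypothetical ORDERED signature: an abstract type t together
   with comparison operations on values of that type. *)
signature ORDERED = sig
    type t
    val lt : t * t -> bool    (* strict "less than" *)
    val eq : t * t -> bool    (* equality of elements *)
end
```

Any structure matching this signature packages a type together with the ordering needed to define "least".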
To build, say, a priority queue containing integers ordered by the ordinary "less
than" relation, we might use a declaration such as the following:
signature INT_PRIO_QUEUE = PRIO_QUEUE where type Elt.t = int

structure IntPQ :> INT_PRIO_QUEUE = struct
    structure Elt = IntLT
    type prio_queue = ... representation type ...
    exception Empty
    val empty = ...
    val insert = ...
    val remove = ...
end
Notice that we must indicate the type of elements in the signature of the
priority queue structure; otherwise the queue is useless since there would be no way to
produce values of the element type. We assume that IntLT is the name of
the structure with signature ORDERED that orders integers by the ordinary
"less than" relation.
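A plausible definition of IntLT, under the assumed form of ORDERED sketched earlier, might be:

```sml
(* Hypothetical IntLT: integers ordered by the built-in < relation. *)
structure IntLT : ORDERED = struct
    type t = int
    fun lt (m, n) = m < n
    fun eq (m, n) = m = n
end

(* Thanks to the where type constraint in INT_PRIO_QUEUE, clients
   may insert ordinary integers directly: *)
val q = IntPQ.insert (3, IntPQ.insert (1, IntPQ.empty))
val (least, q') = IntPQ.remove q
```

Without the where type constraint, the type IntPQ.Elt.t would be abstract and there would be no way to build a value to insert.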
To build a priority queue containing strings, we proceed similarly:
signature STRING_PRIO_QUEUE = PRIO_QUEUE where type Elt.t = string

structure StringPQ :> STRING_PRIO_QUEUE = struct
    structure Elt = StringLT
    type prio_queue = ... representation type ...
    exception Empty
    val empty = ...
    val insert = ...
    val remove = ...
end
We assume that StringLT is the name of a structure with signature ORDERED
that orders the type string lexicographically.
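Under the same assumed form of ORDERED, StringLT might be defined as follows, using the Basis Library comparison on strings:

```sml
(* Hypothetical StringLT: strings ordered lexicographically,
   using String.< from the Standard ML Basis Library. *)
structure StringLT : ORDERED = struct
    type t = string
    fun lt (s, t) = String.< (s, t)
    fun eq (s, t) = s = t
end
```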
An important point about these examples is that the signature is self-contained
in the sense that the specifications make no use of any identifiers (other than the
built-in primitives of ML) not declared in the signature itself. In particular, the
signature contains within it the specification of the element type and the operations on
it. This is an important principle of modularity: the signature of a component
should make sense independently of any other component of a system. Of course it is
difficult to make hard and fast rules, but this signature closure rule is a
useful rule-of-thumb for structuring a system.
It is quite obvious from the preceding examples that it is burdensome to define several
priority queues with different element types (or the same types but with different
comparisons). The examples suggest that we must repeat the code for the priority
queue for each use, possibly making transcription errors or propagating an error to each
use. This is not so, but the mechanisms to define a generic implementation of
priority queues will only be introduced in the next chapter.
Another, more pedestrian, use of substructures in ML is to organize the components of a
large system into a hierarchy of components, culminating in the "main" structure
of the system. This may seem like a very straightforward application of the
substructure mechanism, but we must watch out for a subtle problem, called the diamond
import problem, that arises when combining hierarchies and abstraction. Consider the following sketch of the structure of a component
of a compiler:
signature SYMBOL = sig
    type symbol
    val create : string -> symbol
    ...
end

signature LEXER = sig
    structure Symbol : SYMBOL
    type token
    val lex : char stream -> token stream
    ...
end

signature SYMTAB = sig
    structure Symbol : SYMBOL
    type symtab
    val lookup : symtab * Symbol.symbol -> bool
    ...
end

signature ABSSYN = sig
    structure Symbol : SYMBOL
    datatype ast = Identifier of Symbol.symbol | ...
    ...
end

signature PARSER = sig
    structure Lexer : LEXER
    structure SymTab : SYMTAB
    structure AbsSyn : ABSSYN
    val parse : Lexer.Symbol.symbol stream -> AbsSyn.ast stream
    ...
end
The idea is that the lexer, the symbol table, and the abstract syntax all rely on the
notion of a symbol. In order for the signatures of these modules to be
self-contained, it is necessary that they each include as a component a substructure that
provides the implementation of symbols. The parser makes use of all three
components: the lexer (to obtain a token), the symbol table (to record identifiers), and
the abstract syntax (to emit an ast). The important point here is that the parser
relies on the coherence of these three components of the compiler with respect to
the implementation of symbols: in order for the symbols generated by the lexer to be
acceptable as inputs to the symbol table and abstract syntax routines, they must all share
a common notion of symbol. But how can we ensure that these components in fact share
a common notion of symbol? Put another way, how can we avoid the error of using
incompatible components in a large system?
To see how this is done, let us examine one possible implementation of these modules,
as illustrated by the following code fragments:
structure Symbol : SYMBOL = struct
    type symbol = ...
    fun create s = ...
    ...
end

structure Lexer : LEXER = struct
    structure Symbol = Symbol
    type token = ...
    fun lex cs = ...
    ...
end

structure SymTab : SYMTAB = struct
    structure Symbol = Symbol
    type symtab = ...
    fun lookup (s, st) = ...
    ...
end

structure AbsSyn : ABSSYN = struct
    structure Symbol = Symbol
    datatype ast = Identifier of Symbol.symbol | ...
    ...
end

structure Parser : PARSER = struct
    structure Lexer : LEXER = Lexer
    structure SymTab : SYMTAB = SymTab
    structure AbsSyn : ABSSYN = AbsSyn
    fun parse st = ... Lexer.lex ... SymTab.lookup ...
    ...
end
Notice that the Lexer , SymTab , and AbsSyn
modules are all constructed from a common Symbol structure, and hence they
may be used by the parser in the manner outlined above. It is evident from the
construction of the system that
Lexer.Symbol.symbol = SymTab.Symbol.symbol = AbsSyn.Symbol.symbol = Symbol.symbol ,
which is all that is required for the proposed implementation to make sense.
Notice that this holds true even if the Symbol module were to hold the type symbol
abstract.
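Concretely, a symbol created through one component is then accepted by another; the following sketch assumes a hypothetical symbol table value st built with the SymTab operations:

```sml
(* Because Lexer.Symbol.symbol, SymTab.Symbol.symbol, and
   Symbol.symbol are all manifestly the same type, a symbol
   made in one component may be passed to another. *)
val s : Lexer.Symbol.symbol = Symbol.create "x"
val found = SymTab.lookup (st, s)    (* type-correct *)
```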
Now suppose that we decided to treat the Lexer , SymTab , and AbsSyn
modules as abstractions, using opaque signature ascription to "hide" the
identity of the type components of these modules as follows:
structure Lexer :> LEXER = ...
structure SymTab :> SYMTAB = ...
structure AbsSyn :> ABSSYN = ...
This is good software engineering practice since the parser (for example) should not be
sensitive to the implementation details of the Lexer , SymTab , or
AbsSyn modules.
We now face a serious difficulty in the construction of the parser module.
Specifically, it is no longer evident that the Lexer , SymTab ,
and AbsSyn modules are built on a common notion of symbol !
By holding the type components of the modules abstract, we have obscured the fact
that the types Lexer.Symbol.symbol , SymTab.Symbol.symbol , and AbsSyn.Symbol.symbol
all coincide with the type Symbol.symbol , which is necessary if the parser
code is to be type-correct.
To avoid obscuring the underlying coherence of these modules while at the same time
enforcing abstraction we must explicitly propagate the requisite type sharing
information as follows:
structure Lexer :> LEXER where type Symbol.symbol = Symbol.symbol = ...
structure AbsSyn :> ABSSYN where type Symbol.symbol = Symbol.symbol = ...
structure SymTab :> SYMTAB where type Symbol.symbol = Symbol.symbol = ...
No doubt this modification looks strange at first glance! Keep in mind that the
left-hand side of the where type clause refers to the components of the
signature being modified, whereas the right-hand side refers to the surrounding
declaration of Symbol . The effect of these declarations is to make
explicit in the signatures that the type Symbol.symbol included in each
component is in fact the type Symbol.symbol defined at the top of the
program.
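With these constraints in place, the cross-module uses sketched earlier type-check again, even though everything else about the modules remains abstract:

```sml
(* After the where type constraints, the equations
     Lexer.Symbol.symbol = SymTab.Symbol.symbol
                         = AbsSyn.Symbol.symbol = Symbol.symbol
   are publicly visible, while the token, symtab, and queue
   representations stay hidden. *)
val s : SymTab.Symbol.symbol = Symbol.create "y"
val node = AbsSyn.Identifier (s : AbsSyn.Symbol.symbol)
```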
This example exposes a fundamental tension between the desire to limit the
propagation of type information between components of a system and the desire to combine
these components into a coherent whole. The tension is resolved by explicitly
propagating type equations into signatures using where type clauses
to ensure that the appropriate type sharing properties are publicly visible, even in the
presence of an opaque signature ascription. In fact, transparent ascription is
really just a use of opaque ascription in which all type information is
propagated to the signature using where type clauses supplied by the
compiler. Transparent ascription is often more convenient to use than opaque
ascription, for precisely this reason, but it is important to keep in mind that the
concision afforded by transparent ascription is bought at the expense of abstraction.
One bit of concision provided by the SML/NJ compiler does not come at the expense of
abstraction. Rather than writing "where type Symbol.symbol =
Symbol.symbol " in each of the signatures above, we may instead simply write
"where Symbol = Symbol ", where as before the left-hand side refers
to the Symbol component of the modified signature, and the right-hand side
refers to the outer declaration of Symbol . Unfortunately, this notation
is not (yet) standard, so in the interest of portability, it is advisable to eschew it.
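For comparison only, the SML/NJ-specific form of the three declarations above would read as follows; this is a non-standard extension, as just noted:

```sml
(* SML/NJ extension: structure-level "where", equating the whole
   Symbol substructure rather than just its type component. *)
structure Lexer  :> LEXER  where Symbol = Symbol = ...
structure SymTab :> SYMTAB where Symbol = Symbol = ...
structure AbsSyn :> ABSSYN where Symbol = Symbol = ...
```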
Let us now extend the compiler example a bit further, and imagine that the parser is
itself included as a component of a yet larger module, say, the front-end of the
compiler. In order to minimize the dependency of the front-end on the implementation
details of the parser, it would make sense to use opaque ascription when constructing the
parser as follows:
structure Parser :> PARSER = ...
But now notice that in doing so we have once again obscured the coherence of the
constituents of the parser. In particular, it is no longer apparent that the types Parser.Lexer.Symbol.symbol
and Parser.SymTab.Symbol.symbol coincide. Worse, since the parser
module does not itself include the Symbol module, it is not obvious how to
specify the coherence of these components without tying them to the single global Symbol
module. That is, we might change the specification for Lexer inside
the signature PARSER to
structure Lexer : LEXER where type Symbol.symbol = Symbol.symbol
But then we are making use of an external identifier in the signature PARSER ,
a violation of the signature closure principle. Put another way, we are artificially
limiting any implementation of the PARSER signature to make use of
the single, fixed Symbol module. This is overly restrictive, since all
we are interested in expressing is the coherence of the components of the parser
with respect to their notion of symbol; we need not pin them down to any specific
implementation of symbols.
The solution to this dilemma in Standard ML is to impose an explicit coherence
requirement, called a sharing specification, that ensures that the required type
equations hold true of the underlying implementation of a collection of modules.
Here we impose the requirement that the lexer, symbol table, and abstract syntax modules
all share a common (though unspecified) notion of symbol so that they may be coherently
combined by the parser. This is achieved as follows:
signature PARSER = sig
    structure Lexer : LEXER
    structure SymTab : SYMTAB
    structure AbsSyn : ABSSYN
    sharing type Lexer.Symbol.symbol = SymTab.Symbol.symbol = AbsSyn.Symbol.symbol
    val parse : Lexer.Symbol.symbol stream -> AbsSyn.ast stream
    ...
end
The sharing specification "ties together" the lexer, symbol table, and
abstract syntax modules. This ensures that they make use of the same notion of symbol
without having to reveal what that type is. Now if Parser is any
implementation of the signature PARSER , we are assured (for example) that
Parser.Lexer.Symbol.symbol = Parser.SymTab.Symbol.symbol ,
without having to expose the exact underlying implementation of symbols in these
components.
We may achieve the same result using where type clauses:
signature PARSER = sig
    structure Lexer : LEXER
    structure SymTab : SYMTAB where type Symbol.symbol = Lexer.Symbol.symbol
    structure AbsSyn : ABSSYN where type Symbol.symbol = Lexer.Symbol.symbol
    val parse : Lexer.Symbol.symbol stream -> AbsSyn.ast stream
    ...
end
The idea is to pick a representative of the type equivalence class determined by the sharing
specification (here, Lexer.Symbol.symbol ) and to use where type
to express the requirement that the type Symbol.symbol in each of the
remaining components be equivalent to the representative.
Whether to use a sharing specification or a where type clause
is largely a matter of taste. The advantage of the sharing specification is that it
does not break the symmetry of the situation: we merely specify that the three types in
question must coincide, without choosing one to serve as the "representative" of
the equivalence class, as we are doing when we use a where type clause.
In fact sharing specifications are re-written by the compiler into a
series of where type clauses obtained by arbitrarily breaking the symmetry in
the manner suggested here. As a rule of thumb it is preferable to have the compiler
break the symmetry as a matter of internal representation of a signature, rather than to
choose an asymmetric representation yourself. However, the decision is fundamentally
a matter of taste.