Consider the problem of defining a signature for priority queues, which support a remove_min
operation that removes the "least" element of the queue. Clearly priority
queues make sense only for ordered types: those for which we have a notion of "less
than" with which to define the "least" element of the queue. Since the
element type does not determine the appropriate notion of comparison between elements, the
compiler cannot be expected to guess which ordering we mean. We must instead specify
the ordering together with the type. This is achieved using a substructure,
a structure-within-structure, as follows:
signature PRIO_QUEUE = sig
    structure Elt : ORDERED
    type prio_queue
    exception Empty
    val empty : prio_queue
    val insert : Elt.t * prio_queue -> prio_queue
    val remove : prio_queue -> Elt.t * prio_queue
end
Notice that the type prio_queue is a type, and not a type constructor (as
it was in the case of "plain" queues). This is a reflection of the fact
that the operations on a priority queue are not independent of the type of elements (as
they are with "plain" queues), but rely on the comparison operations that are
provided with the Elt structure.
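The ORDERED signature itself is not shown here; a minimal version sufficient for this example might look as follows (the component names t, lt, and eq are assumptions, not fixed by the text):

```sml
(* Hypothetical ORDERED signature: an abstract type t together
   with comparison operations on values of that type. *)
signature ORDERED = sig
    type t
    val lt : t * t -> bool    (* strict "less than" *)
    val eq : t * t -> bool    (* equality of elements *)
end
```

Any structure matching this signature packages a type together with the ordering needed to define "least".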
To build, say, a priority queue containing integers ordered by the ordinary "less
than" relation, we might use a declaration such as the following:
signature INT_PRIO_QUEUE = PRIO_QUEUE where type Elt.t = int

structure IntPQ :> INT_PRIO_QUEUE = struct
    structure Elt = IntLT
    type prio_queue = ... representation type ...
    exception Empty
    val empty = ...
    val insert = ...
    val remove = ...
end
Notice that we must indicate the type of elements in the signature of the
priority queue structure; otherwise the queue is useless since there would be no way to
produce values of the element type. We assume that IntLT is the name of
the structure with signature ORDERED that orders integers by the ordinary
"less than" relation.
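A plausible definition of IntLT, under the assumed form of ORDERED sketched earlier, might be:

```sml
(* Hypothetical IntLT: integers ordered by the built-in < relation. *)
structure IntLT : ORDERED = struct
    type t = int
    fun lt (m, n) = m < n
    fun eq (m, n) = m = n
end

(* Thanks to the where type constraint in INT_PRIO_QUEUE, clients
   may insert ordinary integers directly: *)
val q = IntPQ.insert (3, IntPQ.insert (1, IntPQ.empty))
val (least, q') = IntPQ.remove q
```

Without the where type constraint, the type IntPQ.Elt.t would be abstract and there would be no way to build a value to insert.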
To build a priority queue containing strings, we proceed similarly:
signature STRING_PRIO_QUEUE = PRIO_QUEUE where type Elt.t = string

structure StringPQ :> STRING_PRIO_QUEUE = struct
    structure Elt = StringLT
    type prio_queue = ... representation type ...
    exception Empty
    val empty = ...
    val insert = ...
    val remove = ...
end
We assume that StringLT is the name of a structure with signature ORDERED
that orders the type string lexicographically.
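Under the same assumed form of ORDERED, StringLT might be defined as follows, using the Basis Library comparison on strings:

```sml
(* Hypothetical StringLT: strings ordered lexicographically,
   using String.< from the Standard ML Basis Library. *)
structure StringLT : ORDERED = struct
    type t = string
    fun lt (s, t) = String.< (s, t)
    fun eq (s, t) = s = t
end
```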
An important point about these examples is that the signature is self-contained
in the sense that the specifications make no use of any identifiers (other than the
built-in primitives of ML) not declared in the signature itself. In particular, the
signature contains within it the specification of the element type and the operations on
it. This is an important principle of modularity: the signature of a component
should make sense independently of any other component of a system. Of course it is
difficult to make hard and fast rules, but this signature closure rule is a
useful rule-of-thumb for structuring a system.
It is quite obvious from the preceding examples that it is burdensome to define several
priority queues with different element types (or the same types but with different
comparisons). The examples suggest that we must repeat the code for the priority
queue for each use, possibly making transcription errors or propagating an error to each
use. This is not so, but the mechanisms to define a generic implementation of
priority queues will only be introduced in the next chapter.
Another, more pedestrian, use of substructures in ML is to organize the components of a
large system into a hierarchy of components, culminating in the "main" structure
of the system. This may seem like a very straightforward application of the
substructure mechanism, but we must watch out for a subtle problem, called the diamond
import problem, that arises when combining hierarchies and abstraction. Consider the following sketch of the structure of a component
of a compiler:
signature SYMBOL = sig
    type symbol
    val create : string -> symbol
    ...
end

signature LEXER = sig
    structure Symbol : SYMBOL
    type token
    val lex : char stream -> token stream
    ...
end

signature SYMTAB = sig
    structure Symbol : SYMBOL
    type symtab
    val lookup : symtab * Symbol.symbol -> bool
    ...
end

signature ABSSYN = sig
    structure Symbol : SYMBOL
    datatype ast = Identifier of Symbol.symbol | ...
    ...
end

signature PARSER = sig
    structure Lexer : LEXER
    structure SymTab : SYMTAB
    structure AbsSyn : ABSSYN
    val parse : Lexer.Symbol.symbol stream -> AbsSyn.ast stream
    ...
end
The idea is that the lexer, the symbol table, and the abstract syntax all rely on the
notion of a symbol. In order for the signatures of these modules to be
self-contained, it is necessary that they each include as a component a substructure that
provides the implementation of symbols. The parser makes use of all three
components: the lexer (to obtain a token), the symbol table (to record identifiers), and
the abstract syntax (to emit an ast). The important point here is that the parser
relies on the coherence of these three components of the compiler with respect to
the implementation of symbols: in order for the symbols generated by the lexer to be
acceptable as inputs to the symbol table and abstract syntax routines, they must all share
a common notion of symbol. But how can we ensure that these components in fact share
a common notion of symbol? Put another way, how can we avoid the error of using
incompatible components in a large system?
To see how this is done, let us examine one possible implementation of these modules,
as illustrated by the following code fragments:
structure Symbol : SYMBOL = struct
    type symbol = ...
    fun create s = ...
    ...
end

structure Lexer : LEXER = struct
    structure Symbol = Symbol
    type token = ...
    fun lex cs = ...
    ...
end

structure SymTab : SYMTAB = struct
    structure Symbol = Symbol
    type symtab = ...
    fun lookup (s, st) = ...
    ...
end

structure AbsSyn : ABSSYN = struct
    structure Symbol = Symbol
    datatype ast = Identifier of Symbol.symbol | ...
    ...
end

structure Parser : PARSER = struct
    structure Lexer : LEXER = Lexer
    structure SymTab : SYMTAB = SymTab
    structure AbsSyn : ABSSYN = AbsSyn
    fun parse st = ... Lexer.lex ... SymTab.lookup ...
    ...
end
Notice that the Lexer , SymTab , and AbsSyn
modules are all constructed from a common Symbol structure, and hence they
may be used by the parser in the manner outlined above. It is evident from the
construction of the system that
Lexer.Symbol.symbol = SymTab.Symbol.symbol = AbsSyn.Symbol.symbol = Symbol.symbol ,
which is all that is required for the proposed implementation to make sense.
Notice that this holds true even if the Symbol module were to hold the type symbol
abstract.
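Concretely, a symbol created through one component is then accepted by another; the following sketch assumes a hypothetical symbol table value st built with the SymTab operations:

```sml
(* Because Lexer.Symbol.symbol, SymTab.Symbol.symbol, and
   Symbol.symbol are all manifestly the same type, a symbol
   made in one component may be passed to another. *)
val s : Lexer.Symbol.symbol = Symbol.create "x"
val found = SymTab.lookup (st, s)    (* type-correct *)
```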
Now suppose that we decided to treat the Lexer , SymTab , and AbsSyn
modules as abstractions, using opaque signature ascription to "hide" the
identity of the type components of these modules as follows:
structure Lexer :> LEXER = ...
structure SymTab :> SYMTAB = ...
structure AbsSyn :> ABSSYN = ...
This is good software engineering practice since the parser (for example) should not be
sensitive to the implementation details of the Lexer , SymTab , or
AbsSyn modules.
We now face a serious difficulty in the construction of the parser module.
Specifically, it is no longer evident that the Lexer , SymTab ,
and AbsSyn modules are built on a common notion of symbol !
By holding the type components of the modules abstract, we have obscured the fact
that the types Lexer.Symbol.symbol , SymTab.Symbol.symbol , and AbsSyn.Symbol.symbol
all coincide with the type Symbol.symbol , which is necessary if the parser
code is to be type-correct.
To avoid obscuring the underlying coherence of these modules while at the same time
enforcing abstraction we must explicitly propagate the requisite type sharing
information as follows:
structure Lexer :> LEXER where type Symbol.symbol = Symbol.symbol = ...
structure AbsSyn :> ABSSYN where type Symbol.symbol = Symbol.symbol = ...
structure SymTab :> SYMTAB where type Symbol.symbol = Symbol.symbol = ...
No doubt this modification looks strange at first glance! Keep in mind that the
left-hand side of the where type clause refers to the components of the
signature being modified, whereas the right-hand side refers to the surrounding
declaration of Symbol . The effect of these declarations is to make
explicit in the signatures that the type Symbol.symbol included in each
component is in fact the type Symbol.symbol defined at the top of the
program.
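With these constraints in place, the cross-module uses sketched earlier type-check again, even though everything else about the modules remains abstract:

```sml
(* After the where type constraints, the equations
     Lexer.Symbol.symbol = SymTab.Symbol.symbol
                         = AbsSyn.Symbol.symbol = Symbol.symbol
   are publicly visible, while the token, symtab, and queue
   representations stay hidden. *)
val s : SymTab.Symbol.symbol = Symbol.create "y"
val node = AbsSyn.Identifier (s : AbsSyn.Symbol.symbol)
```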
This example exposes a fundamental tension between the desire to limit the
propagation of type information between components of a system and the desire to combine
these components into a coherent whole. The tension is resolved by explicitly
propagating type equations into signatures using where type clauses
to ensure that the appropriate type sharing properties are publicly visible, even in the
presence of an opaque signature ascription. In fact, transparent ascription is
really just a use of opaque ascription in which all type information is
propagated to the signature using where type clauses supplied by the
compiler. Transparent ascription is often more convenient to use than opaque
ascription, for precisely this reason, but it is important to keep in mind that the
concision afforded by transparent ascription is bought at the expense of abstraction.
One bit of concision provided by the SML/NJ compiler does not come at the expense of
abstraction. Rather than writing "where type Symbol.symbol =
Symbol.symbol " in each of the signatures above, we may instead simply write
"where Symbol = Symbol ", where as before the left-hand side refers
to the Symbol component of the modified signature, and the right-hand side
refers to the outer declaration of Symbol . Unfortunately, this notation
is not (yet) standard, so in the interest of portability, it is advisable to eschew it.
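For comparison only, the SML/NJ-specific form of the three declarations above would read as follows; this is a non-standard extension, as just noted:

```sml
(* SML/NJ extension: structure-level "where", equating the whole
   Symbol substructure rather than just its type component. *)
structure Lexer  :> LEXER  where Symbol = Symbol = ...
structure SymTab :> SYMTAB where Symbol = Symbol = ...
structure AbsSyn :> ABSSYN where Symbol = Symbol = ...
```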
Let us now extend the compiler example a bit further, and imagine that the parser is
itself included as a component of a yet larger module, say, the front-end of the
compiler. In order to minimize the dependency of the front-end on the implementation
details of the parser, it would make sense to use opaque ascription when constructing the
parser as follows:
structure Parser :> PARSER = ...
But now notice that in doing so we have once again obscured the coherence of the
constituents of the parser. In particular, it is no longer apparent that the types Parser.Lexer.Symbol.symbol
and Parser.SymTab.Symbol.symbol coincide. Worse, since the parser
module does not itself include the Symbol module, it is not obvious how to
specify the coherence of these components without tying them to the single global Symbol
module. That is, we might change the specification for Lexer inside
the signature PARSER to
structure Lexer : LEXER where type Symbol.symbol = Symbol.symbol
But then we are making use of an external identifier in the signature PARSER ,
a violation of the signature closure principle. Put another way, we are artificially
limiting any implementation of the PARSER signature to make use of
the single, fixed Symbol module. This is overly restrictive, since all
we are interested in expressing is the coherence of the components of the parser
with respect to their notion of symbol; we need not pin them down to any specific
implementation of symbols.
The solution to this dilemma in Standard ML is to impose an explicit coherence
requirement, called a sharing specification, that ensures that the required type
equations hold true of the underlying implementation of a collection of modules.
Here we impose the requirement that the lexer, symbol table, and abstract syntax modules
all share a common (though unspecified) notion of symbol so that they may be coherently
combined by the parser. This is achieved as follows:
signature PARSER = sig
    structure Lexer : LEXER
    structure SymTab : SYMTAB
    structure AbsSyn : ABSSYN
    sharing type Lexer.Symbol.symbol = SymTab.Symbol.symbol = AbsSyn.Symbol.symbol
    val parse : Lexer.Symbol.symbol stream -> AbsSyn.ast stream
    ...
end
The sharing specification "ties together" the lexer, symbol table, and
abstract syntax modules. This ensures that they make use of the same notion of symbol
without having to reveal what that type is. Now if Parser is any
implementation of the signature PARSER , we are assured (for example) that
Parser.Lexer.Symbol.symbol = Parser.SymTab.Symbol.symbol ,
without having to expose the exact underlying implementation of symbols in these
components.
We may achieve the same result using where type clauses:
signature PARSER = sig
    structure Lexer : LEXER
    structure SymTab : SYMTAB where type Symbol.symbol = Lexer.Symbol.symbol
    structure AbsSyn : ABSSYN where type Symbol.symbol = Lexer.Symbol.symbol
    val parse : Lexer.Symbol.symbol stream -> AbsSyn.ast stream
    ...
end
The idea is to pick a representative of the type equivalence class determined by the sharing
specification (here, Lexer.Symbol.symbol ) and to use where type
to express the requirement that the type Symbol.symbol in each of the
remaining components be equivalent to the representative.
Whether to use a sharing specification or a where type clause
is largely a matter of taste. The advantage of the sharing specification is that it
does not break the symmetry of the situation: we merely specify that the three types in
question must coincide, without choosing one to serve as the "representative" of
the equivalence class, as we are doing when we use a where type clause.
In fact sharing specifications are re-written by the compiler into a
series of where type clauses obtained by arbitrarily breaking the symmetry in
the manner suggested here. As a rule of thumb it is preferable to have the compiler
break the symmetry as a matter of internal representation of a signature, rather than to
choose an asymmetric representation yourself. However, the decision is fundamentally
a matter of taste.