A suitable language for the standard interpreter

Requirements

What are the requirements for a language for the standard interpreter? Within the requirements, we can devise a variety of possible languages, but we will aim for something that is simple, and expressive for a wide range of programs, with particular emphasis on interpreters.

Computational completeness

The first requirement is that the language must be able to express all Turing-computable functions. This is usually satisfied through common-sense in designing the language. It is possible to devise computationally complete languages that are poorly expressive, for example the Turing Machine [Turing 37]. However, we will provide a range of operators more expressive than the Turing Machine (and working on a basic type system which supports mapping of abstract types to it rather better than does the Turing Machine Tape). Here are some kinds of operators that we expect to find in a serious programming language (and even in a small example language):

Provision of operators of all these kinds is sufficient for the implementation of a wide variety of programming languages, through interpreters written in a reasonably expressive style. Support for algorithmic and functional styles is particularly good, and implementation of other styles of language, while not so succinct, is not particularly cumbersome. Declarative and pattern-matching languages are the worst fit to our model, and require pre-processing into a suitable procedure-based representation, as explained in section 8.8.

I have presented the base language as a form of Lisp, partly because, having simple syntax, it is a convenient notation for simple new procedural or functional languages, and partly because the semantics of the base language, and the operators available, are generally Lisp-like. This follows from Lisp's origins and its evolution toward language and other symbolic processing.

Reflectiveness

Our other major requirement for the base language is that it must support reification and reflection. This needs only two operators:

An implementation for each of these is given in section 9.4.

Other reifiers and reflectors can be built on top of these, as explained in section 9.4, through use of the type system and the function application operator, for example the main operations on meta-continuations as described (under different names) in section 9.4:

Completeness of operators

As mentioned in section 5.4, the language for a closure must provide all the operators used in the expression of that closure. This is no new requirement introduced by the tower. The tower of levels with languages has only made it possible for this condition not to be satisfied. In a conventional language, all the operators in the language are always there. Only in a system where the language can be reflected into can this condition be broken.

Therefore, the base language must include at least all the operators needed to implement the standard evaluator. This is an extension of the idea of structural integrity, but applied to interpretation, rather than to the handling, of reified values. To be useful, it should also include a variety of operators typically useful in implementing operators of other languages, such as structure field accessors for the data types returned by reifiers.

It is not necessary for all operators of the base language to be shadowed by the meta-evaluator, but in practice (for efficiency) they all are. (Operators not included in the base language may also be shadowed. The only link between inclusion in the base language and being shadowed is that a computationally complete subset of the base language, and enough of the reflective operators to reify and reflect the entire tower state, must be shadowed.) [Smith And des Rivières 84b] contains a description of how to derive a minimal set of grounded operators.

The implementation of the base language

Like those in other languages, many of the operators in the base language will need to evaluate all their arguments, but do not need to specify in what order to evaluate them. Arithmetic operators are a good example of this. In the explanation below, we will use

((lambda (a b c) (+ a b c)) 1 2 3)
as an example.

The most concise way to implement such a family of operators is to split each operator into three parts:

In the experimental implementation Platypus (see section pl89sect) the base language and its shadows are set up by a group of Lisp macros platypus-defprim, platypus-def-control-prim, platypus-def-lispy-prim, and platypus-def-lispy-expr-prim, which both define the code to be interpreted within the tower, and name (or even define) the Lisp function to be used as the shadow outside the tower.

Since the operators structured in this way call more rudimentary operators such as dyadic-add, these more rudimentary ones may be provided as operators in their own right; they may be used directly for implementing some languages. Since they do not evaluate their arguments, the arguments must be fed to them in a fixed manner---non-evaluation of arguments means they cannot even be given through varying local variable indices. The values to be processed must be placed at the end of the values list---the top of the stack---and the operator called. It removes its arguments from the values list by treating it as a stack and popping them from it, performs its essential action, and puts any results onto the end of the values list by pushing them as onto a stack. This form of calling makes these operators suitable for use directly in a FORTH-like language such as PostScript---this is done in the implementation of PostScript used here.

The real set of primitive operators

In practice, many more operators than strictly necessary may be provided as primitive (shadowed), for efficiency and to make better use of the richness of the substrate system. The operators provided in Platypus include flow control, evaluation, arithmetic, data structure manipulation, reification, reflection, and input and output.

Operators for reification and reflection

In this section, we look at adding reflective operators to a language, taking Lisp as our example language. All of this applies to any other language used in our experimental system; Lisp is the most convenient example language.

The same reflective operators may be provided in any language. Furthermore, since we make all languages equivalent and transparently cross-callable, and interpretable by the same interpreter, by having common formats for program, language definition, and state, a program Pa written in language La may reify a sub-program Pb (perhaps a library routine called from Pa) written in Lb, and will receive it in the same form as that it would receive the representation of itself from a reifier. The operators used will be different, but if Pa does any analysis of the program, it may use the definitions of the operators of the language Lb (which is built into the closure of Pb) to find what each operator does.

Adding reflective facilities to an existing language

Since Lisp's calls are the same as our non-reflective calls, a common way to add reflective features to a Lisp system is to add a special form that does a reflective call, in which a procedure is applied to some otherwise hidden part of the Lisp system (such as the environment or the stack) along with any other arguments which are given as normal in the calling form.

This gives us reifiers such as

call-with-environment(procedure args env)
which is called as
(call-with-environment procedure arg1 arg2 ... argn)

but as a callee has an environment and a list of the original arguments supplied as its arguments, with the interpreter interposing the extra information. In effect, the interpreter executes

(call-with-environment procedure arg1 arg2 ... argn)
as if it were
(apply
       procedure
       *current-environment*
       (list arg1 arg2 ... argn))

where *current-environment* is a reifying variable---a variable handled specially by the evaluator, holding part of the information used by the evaluator in evaluating the program---holding the current environment.

Here is a sample piece using this style of reifier:

The general case of such a reifier is call-with-level proc (level args), which is called as (call-with-level proc arg1 arg2 ... argn) but as callee receives the level at which it is running and the original argument list, as if it were called as: (call-with-level *current-level* proc args).

Since the call frame contains a level, the call frame (activation record) of such a reflective call is in effect a tower level, into which information has been reified from the calling level.

When using activation records as tower levels, the link to the lower levels is an ordinary local variable/parameter in the stack frame. The next lower level for an interpreter must be held in variables of the interpreter anyway, even in a non-reflective system, as the interpreter must store somewhere the program it is interpreting. In our experimental implementation, the link to the lower levels is always in the same place in the level: it is in the second argument position in the value list of the interpreter, just after the interpreter's evaluand.

In this form of reflective interpretation, every call (transfer of meaning) to a lower level must pass that lower level to itself as part of its parameter list. This is easily accomplished, as the calling interpreter has that information available for its own use already, although possibly in a different form. All that it needs to do is include the data in the stack frame that it builds, so that it appears as if given as an argument to the call.

Passing Code Between Levels

For reflection, one of a second range of special forms, complementary to the first one, and characterized by the form {\tt(eval-with- } ) (where is whichever part of the context we wish to reflect), calls a procedure given as one of its arguments, with parts of the context surrounding it being taken from its other arguments. For example, we could have reflective operators such as (eval-with-environment form environment), which evaluates form using environment to provide any free variables needed by form, and (eval-with-arglistform arglist) called funcall in Lisp, or, to be as general as possible, (eval-with-level form level). For example, this function

(defun another-level-cons (d e)
   (eval-with-level
      '(cons d e)
      (construct-funny-level)))

returns a cons cell which was constructed by the cons operator of a different level.

Although these additions to Lisp can provide full reflective facilities, they present no organized model of non-reflective and reflective calls. Also, they often (although not necessarily) work in terms of the internals of a normal Lisp system, which, meta-circularity notwithstanding, is not particularly suited to manipulating language elements as data values. For example, to represent environments we might conveniently use a-lists, but an abstract type for environment, with appropriate operators (bind, unbind, lookup, assign, boundp) would be more appropriate. This is a consequence of the poor support for data abstraction (that is, just cons cells!) that classical Lisp provides. It seems appropriate to find a model of computation, and a type system to support it, designed more specifically for reflective evaluation, and not built around the facilities of one particular language. Such a system would have a data type representing the state of a computation, in which type grand reifiers would always return their results, and in which grand reflectors would always take their arguments. Each field of this type would also be of a particular type, and these types would be few and of simple well-defined characteristics.

The type we will use to represent the state of a computation is the interpretive closure, as described in clochap, and the types of its fields, as described in typechap are interpretive closure, environment, expression (or parse tree) and values list.

Jumpy and pushy reflection and reification

Instead of the two special calls eval-with-level and call-with-level, which always start new levels of interpretation, we can use alternative, and simpler, forms of reifier and reflector. In these operators, the state is represented as a tower. The reifier, which is a procedure taking no arguments, returns the state of the system as its result, and the reflector takes one argument, a tower, which replaces the current tower.

These jumpy operators differ from the reflective operations described in section 9.4 in that they assign to the state rather than bind the state. The pushy reflective operators may be implemented on top of the jumpy ones with the addition of a stack.

The grand pushy reflector may be implemented as the following macro, which takes something to evaluate and a level in which to evaluate it, and performs the evaluation in that context. The variable *tower* belongs to the level of the current level's evaluator. The action of reading it is a grand jumpy reifier, and writing it, a grand jumpy reflector.

(defmacro eval-with-level (form level)
  `(let ((old-levels (tower-levels *tower*)))
     (setf (tower-levels *tower*)
           (cons ,level old-levels))

     (eval ',form) ; this is implicitly in
                   ; the context of *tower*

     (setf (tower-levels *tower*)
           old-levels)))

A matching reifier is implemented by the following macro, which takes a procedure to call and an argument list, and splices onto the start of the argument list the level within which it will call that procedure:

(defmacro call-with-level (function &rest args)
  `(funcall ,function
             (car (tower-levels *tower*))
             ,@args))

As demonstrated by the procedures above, when used on a conventional architecture (that does not provide stack pushing as a primitive) these assigning (jumpy) reflective operations are more primitive than the binding (pushy) ones, in the sense that they may be used as part of the internals of the pushy operators. However, given an interpreter outside the tower, the pushy operators may also be implemented as primitively, in that they are simply functions that produce or consume an extra argument before calling a given function, and where the meta-evaluator handles that argument. (However, on a conventional computer system, all reflective operators are actually jumpy at the exact point of reflection, as there is a point when control switches from one context to another, regardless of whether the old context has been pushed onto some kind of stack, or otherwise stored.)

Each of these two approaches to handling reified execution state uses one of two kinds of meta-continuations, as explained in sections terminology and in [Danvy And Malmkjær 88]: pushy meta-continuations, which bind the tower state, and jumpy meta-continuations, which assign to the state. In activating a pushy continuation, no information is lost, as the old state is kept on a stack of continuations; while a jumpy continuation discards information, simply replacing the old state with the new one. These terms may be used to describe both control flow within one level (where pushy continuations are procedures to be called, and jumpy continuations are labels to be jumped to) and flow of meaning between levels, where pushy and jumpy meta-continuations are the two forms of reflection that we have just described.

Pushy and jumpy meta-continuations may be mixed within one tower, and even within one level. Viewed as tower-constructing devices, they have rather different forms. A jumpy meta-continuation destroys tower levels, by replacing a string of them destructively with a shorter string, or creates them, by replacing one level with a string of several levels.

A pushy meta-continuation cannot destroy or create levels in the same tower in this sense, but starts a new tower, orthogonal to the first. (This is as described in section 4.5.) The analogy for this in (non-meta) continuations is that a jumpy continuation modifies a sequence of actions within a procedure, while a pushy continuation starts a fresh sequence in another procedure. Just as tail-recursive pushy procedure continuations can be transformed into iterative continuations, tail-recursive pushy meta-continuations can be transformed into iterative metacontinuations. Functions for these transformations are given in [Danvy And Malmkjær 88].

From here on, we will use more systematic naming for reflective operators. The reifier called eval-with-level above will now be called grand-pushy-reifier, and the reflector call-with-level will be called grand-pushy-reflector. The jumpy reflective operations, not given as named procedures above, but used implicitly in-line, will be called grand-jumpy-reifier and grand-jumpy-reflector. Here is the code for the two grand jumpy reflective operators:

(proclaim '(special *tower*))

(defun grand-jumpy-reifier ()
  *tower*)

(defun grand-jumpy-reflector (new-tower)
  (setq *tower* new-tower))

Reifying and reflecting specific parts of a level

The grand jumpy or pushy reifier and reflector are the only reflective operations that we need. However, most of our reflective actions will copy the current tower, make some change, and make the changed tower become the current one. To make this kind of activity more convenient, and also to avoid unnecessary work, we provide some more specialized jumpy reflectors, for changing individual components of the current tower.

Changing one part of a level's state usually maps to one operation in a typical programming language. For example, alterations to the value list can implement assignment to variables or binding of variables; changing the continuation expression implements a jump or a call. Assignment to the interpreter has no equivalent in conventional programming languages, although there are commonly statements to make new procedure definitions (usually statically) and to implement non-local flow control (long jumps).

Providing separate reflectors for each part of the state stored in the closure thereby models cleanly the actions that a typical interpreter must implement. We also provide a collection of procedures for handling the reified values. These not only make handling these values easier, but also help to maintain consistency in the tower---not a theoretical consideration, but helpful in avoiding an easy way of crashing the system.

The reflectors (and, less importantly, reifiers) which affect only part of the state are as follows:

The reifiers above may be built on top of the grand-jumpy-reifier, and select a field of the the resulting record. The reflectors are different: they must find the part of the structure that the corresponding reifier would return, and modify just that part. In terms of conventional programming language technology, this is finding the left hand side value (abstract address) of the reifier's result and then through it assigning the new right hand side value.

This is more efficient than finding the whole reified object by grand-jumpy-reifier, changing just one part and reflecting the whole thing back in with grand-jumpy-reflector. In some forms of Lisp [Steele 84][section 7.2], efficient code for these operations can be generated automatically through the use of defsetf, producing an interface presented as setf forms.

Integrity

A requirement met implicitly by providing a set of whole-tower reflective operators is that reflective integrity should be preserved in going through a level that uses this language. A procedure shifted up to execute into such a level can always return information back to its home level. Therefore, information may always be passed up and down the tower by reflecting procedures to run in different levels as long as the language at each level has at least the facilities of the minimal base language. This requirement must be met anyway for other reasons: were a level not to include facilities equivalent to those of the base language, it could not form part of a integral string of interpreters (this is a circular argument) and so the integrity of the tower would be lost at that point. This would make it into two orthogonal towers, as described in section 4.5. However, it would still have structural integrity for reification (as described later in this section), and so lower levels would be able to access levels which they are not interpreted by at all. Thus, such a tower would no longer be grounded, as its connection with the umbrella interpreter would no longer have integrity---it would be two towers for purposes of interpretation, but only one for reflection. kindsofshift.>

Structural integrity for reification, that is, for access to remote levels through reifiers in one's own level, is ensured through the structure of each level, rather than through the language.

Integrity of groundedness through this level is met through the computational completeness of the language. It is grounded because it can, in its own right, compute anything that is computable. Its groundedness does not depend on that of any other levels.

Inserting and removing levels of evaluation

The most important of the reflected data manipulation procedures are one to insert a new evaluator just above the base of the tower, and a complementary one to remove an evaluator from just above the base of the tower.

Since an evaluator is an ordinary closure, it is guaranteed that when inserted above an existing, interpretable (grounded) evaluator, it can be interpreted by the previous first evaluator, and also that it can interpret the old application. metaevaluator.)>

Here are the procedures for inserting and deleting evaluators from the end of the tower:

(defun add-evaluator (evaluator)
  (let* ((our-tower (grand-jumpy-reifier))
         (new-level
          (make-evaluator-interpretation-level
           (car (level-call-record-stack our-tower))
           evaluator)))
    (push new-level
        (level-call-record-stack our-tower))
    (grand-jumpy-reflector our-tower)))

(defun remove-evaluator ()
  (pop (level-call-record-stack (grand-jumpy-reifier))))

An operation on the tower that preserves its integrity is one that replaces a string of levels by another string of levels that also has integrity. For the string to have integrity each interpreter must be able to interpret its lower neighbour, and so the new sequence must fit the interpreters above and below it correctly:

Inserting Strings

which we can guarantee by taking those end interfaces, labelled Algol Language and Algol Program entirely into the operation, replacing not only the link between them but those levels themselves. This way, we never try to link levels which will be incompatible, but can always insert an extra level in between as a buffer, with the appropriate language definitions for the previous evaluator. (The ability to do this depends on an evaluator being provided written in the new language.) Unfortunately, this may add more levels of interpretation, and so perhaps should be avoided in practice, for efficiency.

In practice, common tower manipulations change only one level, adding or removing an interpreter between two that remain unchanged:

Inserting a level between two that are unchanged

... an operation which is referentially transparent to all lower levels. Although the intensional meaning has been changed, the extensional meaning is still the same, because a transformation that preserves integrity is one that does not destroy the correctness of the previous interpretation.

An example of the usefulness of this is adding and removing tracing of evaluation, by adding and removing an evaluator that traces what is being processed.

Handling collections of closures

To change en masse how a group of closures (such as those implementing the operators of a language) are interpreted, the closures may be grouped by giving them the same evaluator. This makes them share a tower level together. Then, all that group of closures can have their processing affected by changing that one evaluator.

For example, all the operator closures of the language of a closure could be given the same evaluator, and then all activity in that language could be traced by tracing that evaluator. (To trace an evaluator i1 which is interpreted by an evaluator i2, we insert an evaluator it, which traces what it processes, between i1 and i2.)

Grouping closures in this manner can cause harmless dimensional anomalies in the tower structure. The closures that have been grouped share a tower level because they have the same evaluator; yet they may also be at distinct tower levels for other reasons, such as one being part of the interpreter of another. Thus, an evaluator may exist at more than one level, and so the same tower can have more than one number of levels between the base and the umbrella, as shown in the following diagram:

Ambiguous Level Indices

The same reification and reflection operators may be made into a part of any language; their action is always the same. Some languages will require some amount of packaging around the bare reflective operators, as they may not provide means for handling such data directly (they might have to present it in terms of a procedure library for handling reified data), or they may prefer to present the data in some form that matches the language's natural model of computation more closely.

Unrolling the infinite tower

As mentioned in sections rollup and unroll-circular-tower, the boring section of the tower is kept as a circular reference, which could in principle be followed indefinitely. If, however, we were to allow a program to do so via the reifiers, it would be hard to detect how many levels of the tower it had climbed before eventually changing something. So, to simplify the tracking of realized tower levels, we make the shadow versions of the the operators reifying and manipulating tower components to do some extra work that is invisible from within the tower (unless the program within the tower calls for reification of the meta-tower, as described in section 4.5. These are the extra actions needed:

Reflective operators that reach the substrate

There are a few operators that reflect down (or up!) to the substrate language. The main one of these is eval-in-cl, which takes an argument form that it passes to the evaluator of the substrate language, which is the eval procedure of Common Lisp. The result of the evaluation is then passed back as the result of this Platypus operator. This operator was provided so that the benchmark suite for Platypus could also run the same tests in Common Lisp automatically, for comparisons of the speeds. It could also be used for a form of reflection, right through to the substrate, to ask for compilation of Lisp forms that could then be installed in the shadow map to make new primitives. (This is discussed further in section 13.2.)

There is also a break operator, for use as a breakpoint, that runs a read-eval-print loop in the substrate language. When the user quits from the loop, possibly having reflected some changes into the system after reifying and displaying some information, this operator returns.

The time-now and input-output procedures are also in some sense substrate system reflective operations.


Summary of the base language

A language for use with the standard interpreter in the boring section of the tower must be powerful enough to support both the standard interpreter and the procedures that will run on it, which will typically be operators for other languages.

The implementation of the base language has two parts: the operators themselves, and their shadows, which are run at the next meta-level in the tower. (The last meta-tower is run in the substrate language on which the whole reflective system is built, and it is there that all operator definitions are eventually evaluated.)

The language should provide operations typically needed by interpreters, and those needed for reification and reflection. It is also desirable that the base language be reasonably expressive.

As well as the fundamental reifiers and reflectors, it is convenient to provide some jumpy reflectors that assign only part of the state; these are not only more efficient, but also more expressive of many common language features that they may be used to model.

In practice, we provide many more operators in the base language than are strictly necessary. ([Smith and des Rivières] explains how to work out which operators are necessary.)

Summary of reflective operators

Operators for reflection may be added to an existing language. With our model for mixed-language interpretation, the same operators will work for any language.

Reflective operators (reifiers and reflectors) are of two kinds, jumpy which move data between program-as-agent and program-as-subject without automatically creating new levels of interpretation, and pushy which create new levels either providing data from the program-as-subject or using it to create a new (or modified) program-as-subject. Jumpy operators are more primitive than pushy operators, in that (on a conventional architecture) they may be used to implement pushy operators, whereas, within one level of interpretation, pushy operators may not be used as the primitive on which jumpy operators may be built (other than by considerable wasted work).

One form of reflective operator is the grand reflective operator which reifies or reflects the entire state of the system. However, it is more efficient, as well as often more convenient, to reflect into just the part of the state required, and so reflectors that set only specific parts of the state are also worth providing in a practical system.

Reification of programs is homogenous between languages. The same reifiers (and reflectors) may be used in any language, and the values returned have closed into them all the linguistic information needed to understand the value in any way that might be required.


A fool hath no delight in understanding, but that his heart may discover itself.

Proverbs 18:2



Submitted, Examined And Accepted: 1991
Contact me