We believe that more can be done to reduce the storage requirements of Lisp programs. Although in many instances, the Lisp programmer can reduce the storage requirements of his program by deleting unneeded pointers as soon as possible, there is nothing he can do about systematic inefficiencies of the Lisp interpreter. This paper addresses itself to two sources of inefficiency in Lisp's variable binding mechanism--one of which is easy to detect--which prevent storage from being garbage collected long after its last reference. Implementations of Lisp which eliminate these situations should result in more economical execution for almost all Lisp programs which use a lot of storage, due to a lighter marking load on the garbage collector.
This paper discusses ways to reduce the net space of a program and hence its maximal net space. This is important because the garbage collect mark phase time for a single garbage collection is proportional to the net space at that point in the program. Reducing the average net space of a program will reduce the total garbage collect time; reducing the maximal net space will allow larger programs to be run in a limited address space.
We assume a paradigm Lisp interpreter similar to that found in the appendix of [McCarthy65] for the purposes of discussion. This is because most Lisp interpreters are virtually isomorphic to this one (with the possible exception of shallow-binding, which does not affect our arguments), and this interpreter is already familiar to more people than any other.
lookup and rebind can be defined relatively easily:
(lookup x e) = let e'=search[x, e] in
if e'=NULL-ENVIRONMENT then UNDEFINED
else value[bind[e']]
(rebind x z e) = let e'=search[x, e] in
if e'=NULL-ENVIRONMENT then ERROR
else value[bind[e']]:=z;
e
search[x, e] = if e=NULL-ENVIRONMENT then e
else if x=variable[bind[e]] then e
else search[x, parent[e]]
If an environment e' can be reached from an environment
e by following a finite series of parent pointers, then
we say that e' is an ancestor of e and
that e is a descendant of e'. We make
the reasonable assumption that the null environment is an ancestor of every
environment (excepting the null environment, itself). This constraint is
automatically satisfied by any finite program, given only our primitives for
interacting with environments. With these assumptions, a group of environments
forms a tree having the null environment as its root. The unique path
from an environment to the root we will call its search path.
We have defined the tree representation for environments; we now investigate its properties. First of all, lookup and rebind terminate for every environment due to the null-environment-ancestor condition. Secondly, lookup and rebind find only the first occurrence of a variable in the search path from a particular node; any other binding of that variable in an ancestor environment will not be seen. Therefore, the tree representation could lead to the "buried binding problem": a binding of a variable x in a node e cannot be referenced because 1) the only direct pointers to e are parent pointers from other environments; and 2) all paths of pointers to e from outside the environment tree go through environment nodes in which x is previously bound. Thus, search can never be called with the arguments [x, e].
Buried bindings can be a problem if space is at a premium because the garbage collector will refuse to collect the value of the buried variable even though it cannot be referenced by lookup or rebind. This is because the garbage collector cannot distinguish between environment and other structures and it must assume that the parent pointer chains can be followed. There are two possible solutions to this problem: modify the garbage collector, or change the representation. We will investigate both of these possibilities.
A natural question arises before we delve into the details of possible modifications. Do buried bindings actually occur in real programming situations, such as in Lisp? The answer is paradoxical in that Lisp interpreters[4] which have been optimized in certain ways[5] exhibit more buried bindings than unoptimized interpreters!
In order to identify buried bindings in the mark phase, the garbage collector must know when it is marking an environment node, and give it special treatment. Upon encountering an unmarked environment node e by chasing other than a parent pointer, the garbage collector must first run along its parent path from the node to the root, accumulating a defined variable set for e which indicates which variables occur in the path. If the garbage collector tries to add a variable which is already in the defined variable set, it knows that the binding is buried as far as the environment e is concerned. However, it might be accessible through some other pointer. Therefore, the best thing the garbage collector can do is mark the bindings which are certainly accessible. If at the end of the mark phase of garbage collection, a binding is still not marked, it is either totally inaccessible (would not be marked by an unmodified garbage collector) or it represents a buried binding and should be reclaimed.
In order to accumulate the defined variable set for an environment during the scanning of the parent path for an environment e, one bit per atom (variable) is allocated which is normally in the off state. As each binding is encountered on the scan, the variable's bit is tested. If it is off, the variable is not yet in the set and must be added. The bit is set to on, that binding is marked as accessible, and its value is put onto the stack for further marking. If, on the other hand, the bit was on, then nothing is done to that binding. In either case, the scan then proceeds with the parent of the current node. When the root is reached, a variable's bit will be on if and only if it is in the defined variable set for e. However, since the bits must all be off before the next environment node is marked, we must make one more pass and turn all of the bits off. Finally, the environment node e itself is marked.
This scheme is not as efficient as one might hope because environments close to the root will be scanned many times in the course of garbage collection, whereas in normal garbage collection, a cell can be marked (and its pointers followed) only once. It remains to be seen if more clever methods of garbage collection can solve this problem.
Buried bindings, once identified, can be either nullified or spliced out of the environment tree. Nullifying the binding at node e means simply calling (rebind variable[e] nil e); i.e., rebinding the variable at e to something harmless, so that the old value can be collected. This method does not require another phase in the garbage collector but does have the disadvantage that the storage used by the node e itself cannot be reclaimed. However, lookup could be modified to splice out nullified bindings whenever they are encountered, so that they can be picked up during the next garbage collection. Thus, this additional storage use is only temporary.
Splicing out buried bindings during the garbage collection in which they are identified might seem to require another mark-type phase. However, Guy Steele [Steele76] has suggested a clever method whereby the buried bindings can be spliced out during the collect phase. Every marked environment node encountered is checked to see if its parent is unmarked. If it is, the parent pointer is reset to the parent's parent. That node itself is checked for a marking, and so on, until the parent pointer points to a marked environment or nil. The collector then goes onto the next node in address order. If the node is unmarked, it is put onto the free node list. This simple scheme will work, provided only that the free list link cell is not the same as the parent cell in the environment node. This is because nodes may be placed onto the free list before their parent pointers have been used and we wouldn't want the parent pointer to be clobbered prematurely. Since this state of affairs can easily be arranged (e.g., link the free lists by CAR's in our prototype), this method is very nearly as efficient as a normal collect phase.
We now present such a non-shared environment representation which we call the functional representation. In this representation, an environment is simply a set of binding posts with no more than one binding post per variable. In other words, an environment is a set of ordered pairs which forms a single-valued relation, i.e., a function. The definitions of each of the primitives dealing with environments is trivial. (lookup x e) finds the binding post for x in e and returns the associated value, or UNDEFINED, if none exists. NULL-ENVIRONMENT is the empty set. (rebind x z e) finds the binding post for x in e and replaces the associated value with z. A big change between the representations is in bind. (bind x y e) = e' has two cases: 1) if x is undefined in e, then e' is e union <x,y>; 2) if x is bound to w in e, then e' is (e - <x,w>) union <x,y>. Thus, bind preserves the functionality of environments.
There are several possible implementations of the functional representation. The set of binding posts can be stored as a list, an array, or something more complex like a binary search tree or trie. It seems that all of these representations require O(|S|) time and space to perform bind, while for some of them, lookup or rebind can be done in time O(log|S|), where |S| is the cardinality of the set. For certain restricted sets of variables, the time for lookup is O(1), i.e., constant. This is to be contrasted with the tree representation, where bind requires constant time and space, but lookup can require time much greater than O(|S|)! This is because some trees bind the same variable many times in a search path, which slows down accesses to other variables.
This discussion must also be tempered by the knowledge that the tree representation can be augmented with value cells to give a shallow-binding representation [Baker76] which can reduce lookup time to a constant at the expense of yet another time--context switching time, which is constant for either the unadorned tree representation or the functional representation. We note that the tree representation has a buried binding problem if and only if the shallow-binding tree representation has it; therefore, we will not discuss shallow-binding any further.
The functional representation cannot exhibit the buried binding problem because any environment binds a variable to only one value. However, it has serious efficiency problems. First, it may not be any faster to access than the tree representation. Second, creating new environments is no longer a trivial operation, as it is in the tree representation. Finally, the storage for the sets themselves is not insignificant, and since it is not shared, as it is in the tree representation, it may take up more room in some cases than the tree representation.
The $64,000[7] question is: can a representation for environments be found which has good creation and access times and which solves the buried-binding problem?
To insure the inviolability of environments, we advocate the creation of a new Lisp datatype, the environment. The Lisp programmer would have access to environments only through the functions: null-environment, bind, rebind, and eval. (null-environment) would produce an environment with no bindings; (bind x y e) would produce a new environment which had the same bindings as e except that the atom x would be bound to y; (rebind x z e) would change the binding of x in the environment e to the value z using a side-effect, and would return the changed environment e; (eval x) would evaluate the expression x in the current environment and (eval x e) would evaluate the expression x in the environment e.
A trivial example of a dead binding is that of a subroutine having one parameter which never references it. In this case, the binding of the parameter to the argument need never occur, since it would be immediately dead. A less trivial case would be a subroutine of one parameter, which parameter was used to determine one of a set of alternative bodies for that subroutine. If none of the alternative bodies referenced the parameter, the parameter binding would become dead immediately after the choice of alternatives had been made. Dead bindings tie up a lot of space by holding onto storage that could otherwise be reclaimed by the garbage collector. If the bindings were nullified or spliced out as soon as they were no longer needed, rather than at the end of interpretation of the form in which they were bound, the space savings could be significant.
The primary producer of dead bindings, however, is the functional value or upward funarg. A functional value is a pair consisting of a lambda-expression and an environment. Functional values are created by evaluating a (function ... ) form in Lisp. The environment of the functional values is the environment at the time the form was evaluated. Intuitively, this environment is to be used for finding the values of the free variables in the lambda-expression. Usually, however, there are only a few free variables, whereas the environment structure can define a large number.
The more trivial examples of dead bindings discussed earlier do not cause as much trouble as functional values because the binding usually becomes dead only a short time before it is released, at subroutine return time, when it can be garbage-collected. Functional values, on the other hand, can be created at large return-stack depths, yet can be passed back all the way to be invoked at a very shallow stack depth. In this way, extremely large environment structures can be in effect at very shallow return point stack depths. In most cases, only a very small part of these large environment structures will ever be referenced.
A worst-case situation for dead bindings produced by functional values can be
imagined. It is well known that Lisp's CONS cells can be exactly simulated by
functional values; actually doing so in Lisp 1.5 will cause any by the most
trivial program to quickly run out of storage. This is not because the number
of Lisp cells used to implement functional values is excessive; in fact, the
simulation below requires just 7 CONS cells (in our paradigm interpreter,
anyway) for one simulated CONS cell. What is wrong is that the functional
values keep pointers not only to the simulated cell's CAR and CDR, but also to
the value of every other variable which existed in the context in which the
cell was created! It is this storage inefficiency caused by dead bindings
rather than the constant factor inefficiency which makes functional values so
useless in Lisp that some Lisp's do not even support them.
(defun cons (x y)
(function
(lambda (fn new me)
(cond ((eq fn 'car) x)
((eq fn 'cdr) y)
((eq fn 'rplaca) (prog2 (setq x new) me))
((eq fn 'rplacd) (prog2 (setq y new) me))))))
(defun car (x) (x 'car nil x))
(defun cdr (x) (x 'cdr nil x))
(defun rplaca (x y) (x 'rplaca y x))
(defun rplacd (x y) (x 'rplacd y x))
Simulation of Lisp's CONS cell by a FUNARG
The difficulty with dead bindings is in realizing when they have become dead.
In lexically-scoped languages like Algol or PL/I, the free/bound status of a
variable in an expression gives a good fail-safe method for proving that a
variable binding is dead. If there are no more free occurrences of that
variable in the rest of the expression being evaluated, any binding of that
variable in the current environment is dead. On the other hand, a variable
binding may be dead and yet the variable may still occur free in the
expression--e.g., within a conditional arm that will not be executed.
The whole concept of "free/bound" loses its meaning in a dynamically-scoped language like Lisp, because the scope of a variable binding changes with every different invocation of a subroutine. No purely syntactic test can determine whether a variable is in the scope of a binding or not.
In general, the deadness of a binding at a point in the interpretation of an expression is undecidable, regardless of whether lexical or dynamic scoping is used. Therefore, heuristics have to be developed which eliminate as many dead bindings as possible (without, of course, eliminating non-dead ones!).
The most dramatic use of tail recursion is in certain recursive programs which call themselves as the last part of their body. In this case, no return-point stack space is used at all--the program iterates instead of recursing.
Consider the following trivial Lisp 1.5 program:
((label[9] f
(lambda (x)
(cond ((zerop x) x)
(t (f (sub1 x))))))
n)
This program recursively decrements x from n to zero and
returns the value 0. A tail-recursing interpreter would use only a
bounded (independent of n) amount of return stack space in executing
this program, because when f recurses, the value returned by the lower
level will be that returned by the upper levels, so only one return address is
ever pushed onto the stack.
The size of the environment used in the interpretation is another matter. At the bottom level of the recursion, when x is equal to 0, the tree environment will have n and f bound and n+1 bindings of x, each one burying the previous one. Thus, doing tail recursion does not save environment space, only return-point stack space.
However, the tail-recursing interpreter has still helped us solve the problem of the extra dead bindings of x in the previous problem piling up. Whereas the normal interpreter would put pointers to all the intermediate environments onto the return-point stack, the tail-recursing interpreter builds the environment without keeping pointers to all the ancestor environments. In this way, the extra bindings of x are converted from being simply dead to being buried. Since buried bindings are easier to detect than other kinds of dead bindings, the technique of tail recursion has helped us make progress towards a solution in this case.
In general, the techniques of tail-recursion and buried-binding reclamation, consistently applied, will allow "iterative style" programs to be interpreted with a bounded net use of storage. In other words, these techniques allow iterative-style programs to recurse to an unbounded depth while holding onto only a bounded number of free storage cells. (This statement is actually false as it stands, because a terminating program which runs for n steps must use O(log n) space. However we are assuming that data types like integers take only one free storage cell--a good approximation for most Lisp programs.)
It is well known that recursion is a stronger programming technique than iteration; any iterative program can be rewritten in a uniform way to use recursion instead, while the reverse is not true. However, we would like a stronger reducibility than simple expressibility--we would like to say that recursion requires no more than a constant factor more in running time or net space than iteration. This is why tail recursion and the elimination of buried bindings is so important; they allow us to claim this stronger reducibility.
Tail recursion and dead binding elimination would allow a Lisp program which has been converted by Fischer's algorithm [Fischer72] to a continuation-passing style to use no more than a constant factor more maximal net space than the original program. This would strengthen the expressibility equivalence that Fischer claims to a maximal net space as well as running time equivalence. It is not clear what would happen to garbage collect time.
eval is not the only construct which prevents us from determining free variables and therefore dead bindings. apply with two arguments, i.e., with an implied environment, has the same troubles. This is because (apply f a) is essentially the same as (eval (cons f a)). However, there is one more construct which can cause the problem. This is the implied eval that takes place when the CAR of a form cannot be recognized as either a primitive function, a lambda-expression, a label-expression, or a closure. In these cases, (f a1 a2 ... an) reduces to ((eval f) a1 a2 ... an). Here, the expression f can compute a lambda-expression having any free variables, whatsoever.
We do not know the reasons why the architects of Lisp chose dynamic rather than lexical variable scoping. If it was to save time or the storage required to hold the environment needed for functional arguments or values, these small savings are gained at the likely expense of many dead bindings and the large time lost in tracing over them at garbage collect time.
The side-stacking feature, together with the ease of identifying free variables, would seem to make lexically-scoped languages far superior to dynamically-scoped languages. There are several advantages of dynamic scoping which have been overlooked, however. Dynamic scoping allows an extremely flexible coupling of independent modules at execution time. Flipping the coin over, lexical scoping ignores the problems of hooking together independent modules. The simplest lexical scoping model assumes one large, complete program text; linking is required to resolve external names of multiple modules before execution can proceed. The problem of correctly associating mutually recursive functions is not straight-forward in these languages. These requirements lead to a read-resolve-evaluate-print style of interpretation rather than Lisp's read-evaluate-print style.
The resolution phase is not a problem in systems with a clean break between program text and data. However, many artificial intelligence programs attach pieces of program to data and these pieces are dynamically linked together at run time by Lisp's dynamic variable scoping.
Thus, for systems requiring the utmost in linking flexibility, dynamic scoping can be very valuable. It is for these systems that solutions to the dead binding problem are important.
For people who like this sort of thing, This is the sort of thing they will like.We would like to describe a variant of Lisp which is as close to Lisp 1.5 as possible for a lexically-scoped language. In other words, it makes the smallest deviation from Lisp 1.5 while being lexically-scoped.
People may complain that dynamic scoping is an inherent quality of Lisp; that Lisp wouldn't be Lisp without it. We believe that the essential qualities of Lisp are its trivial syntax, its S-expressions formed from CONS cells, its garbage collector, its atoms and property lists, and its representation of programs as S-expressions. A language with all of these features would be more Lisp-like than a language with only dynamic scoping (e.g., APL).
You may have thought that we would proceed to describe yet another language, complete with an exhaustive listings of functions, etc. However, that is not necessary--it has already been done. For XLisp is just our name for Scheme [Sussman75]![10]
The general dead binding problem will require a formalism to state more precisely the timing and mechanism of the binding of variables and their reclamation. What Reynolds did for the order of interpretation issue using continuations [Reynolds72] needs to be done for the precise timing of binding and reclamation. Either a new formalism for specifying bindings, or a stronger interpretation on previous formalisms, must be used to pin down more precisely the semantics of the meta-circular interpreter of Lisp. Once this is done, the language will be strong enough to distinguish between classical and tail-recursing interpreters; between interpreters with dead binding problems and those without. When the language is strong enough to state the problem, solutions will not be far away.[11]
[Baker76] Baker, H.G. "Shallow Binding in Lisp 1.5". Manuscript, April, 1976. Subsequently published in Comm. ACM 21,7 (July 1978),565-569.
Fischer, M.J. "Lambda Calculus Schemata". Proc. ACM Conf. on Proving Asserts. about Programs, Sigplan Not. (Jan. 1972).
Hewitt, C., and Baker, H.G. "Actors and continuous functionals". In Neuhold, E.J., ed. Proc. IFIP Working Conf. on Formal Desc. of Progr. Concepts, IFIP, Aug. 1977, 367-387.
McCarthy, John, et al. LISP 1.5 Programmer's Manual. MIT Press, Cambridge, 1965.
Reynolds, John C. "Definitional Interpreters for Higher Order Programming Languages". ACM Conference Proceedings, 1972.
Steele, G.L. Personal communication, June, 1976.
Steele, G.L. RABBIT: A Compiler for SCHEME. AI-TR-474, AI Lab., MIT, May 1978.
Sussman, G.J., and Steele, G.L. "SCHEME--An Interpreter for Extended Lambda Calculus". AI Memo 349, MIT AI Lab., Dec. 1975.
[1] "Stale" was used in the original paper, but "dead" is more consistent with modern usage [Aho86].
[2] The date is correct; this manuscript was never published due to the pressure of the author's thesis work. The present version has been formatted and edited, but not revised, except where noted.
[3] Author's current address: Nimble Computer Corporation, 16231 Meadow Ridge Way, Encino, CA 91436, (818) 501-4956, FAX: (818) 986-1360.
[4] Interpreted code is easier to analyze for storage utilization, but compiled code has analogous problems.
[5] E.g., through closure-sharing [Steele78].
[6] A modern term for this representation is "acquaintance list/vector", after its use in Hewitt's Actor model [Hewitt77].
[7] From a TV quiz show which was popular when circuits were still segregated.
[8] "Stale" was the term which appeared in the original paper; "dead" is more consistent with modern usage [Aho86].
[9]
In Common Lisp, this expression would be written as
follows:
(labels ((f (x)
(cond ((zerop x) x)
(t (f (1- x))))))
(f n))
[10]
But without reified continuations.
[11] Steele was aware of this paper when he wrote his Master's Thesis [Steele78]; indeed, he made valuable suggestions for its improvement [Steele76]. Nevertheless, Steele's closure-sharing analysis [Steele78,p.60-63] often creates buried bindings where none existed before; if the goal of minimum "maximal net space" is desired, closure-sharing can be a "pessimization" rather than an optimization.