We found that since the intermediate result of rewrite is unshared, the linear interpreted version is slower than the non-linear interpreted version, while the linear compiled version is slightly faster than the non-linear compiled version. When sharing is allowed, requiring reference counts for the linear versions, the linear shared versions are not competitive with the non-linear versions, except when "anchored pointers" are used. The anchored pointer reference count version, which reclaims storage, is still 1.25X slower than the non-linear version, which reclaims no storage.
Lisp implementors have long known that most list cells have a reference count of 1 [Clark77]--i.e., most cells are unshared. Unshared cells can have particularly efficient deallocation policies. Unfortunately, determining at compile time whether a cell is shared may be as expensive as ML type inference [Baker90], which is known to have exponential time complexity [Mairson90]. Linear logic [Girard87] [Lafont88] [Abramsky93] [Wadler91], on the other hand, exploits unshared cells by typing them as unshared, and the type system then preserves this property.
Since we are interested in understanding the advantages and disadvantages of the "linear" style of programming, we have tested this style on several different "real" programs [Baker93] [Baker94QS] [Baker94LLPS]. We were particularly drawn to the Boyer Benchmark ("Boyer"), because it is the most representative of all the Gabriel Lisp benchmarks [Gabriel85]. Boyer is a classic Lisp program, because it recursively examines and constructs lists at a great rate. Boyer proves that a formula is a tautology by continually rewriting it until it is in a normalized form, and then testing whether this normalized form is a tautology. This rewriting utilizes a database of rewrite rules (axioms), whose left-hand-sides are matched (unified) against the formula; when a match occurs, the formula is rewritten utilizing the right-hand-side as a pattern. This rewriting is analogous to the operation of a Prolog system in which the rules act as Prolog predicates and the initial formula acts as the initial form to be "evaluated". There is a major difference, however, in that (the standard) Boyer interprets these rules on-the-fly rather than compiling them.
Our present interest in a linear version of Boyer is driven not so much by speed or space per se, since a memoized Boyer is two orders of magnitude faster and smaller than the standard Boyer [Baker92]. We are more interested in the effect of copying versus sharing on total storage requirements of a non-memoized Boyer, and in whether more explicit storage management is onerous relative to automatic garbage collection.
Running Boyer on a modern Common Lisp requires a few compatibility definitions and bug fixes:

(defmacro assq (x y) `(assoc ,x ,y :test #'eq))

(defun falsep (x lst)
  (or (equal x '(f)) (member x lst :test #'equal)))

(defun truep (x lst)
  (or (equal x '(t)) (member x lst :test #'equal)))

The prog in the function test must be changed into a let to report the answer. Some Lisps--e.g., xlisp--don't provide nil with a property list, which Boyer requires. Rewrite rule #84 should have greatereqp instead of greatereqpr, although this rule is never used. Rule #101 should have (t) and (f) instead of t and f, respectively, in its right-hand-side, although this rule never matches.
The most serious problem is with rule #54--(equal (remainder y 1) (zero))--which requires a constant 1 as remainder's second argument. one-way-unify1 incorrectly treats the atom 1 like a variable, however, so that this rule matches any use of remainder, replacing two occurrences in the benchmark with (zero). After one-way-unify1 is fixed, the remainder rules--#54, #56, #67, #86--are called 2,003 times instead of only twice; consing is increased by 12%, and one-way-unify1 now calls equal 7,053 times instead of the previous 1,046 times. Fixing this bug increases the time on the standard Boyer Benchmark by 10-25%.[1] From now on, we use the term "standard Boyer" to refer to Boyer with this bug fixed, and "buggy Boyer" to refer to the code published in [Gabriel85].
(defun one-way-unify1 (term1 term2) ; With bug fixed.
  (cond ((atom term2)
         (cond ((setq temp-temp (assq term2 unify-subst))
                (equal term1 (cdr temp-temp)))
               ((numberp term2) (equal term1 term2)) ; This clause is new.
               (t (setq unify-subst (cons (cons term2 term1) unify-subst))
                  t)))
        ((atom term1) nil)
        ((eq (car term1) (car term2))
         (one-way-unify1-lst (cdr term1) (cdr term2)))
        (t nil)))
Essentially all of Boyer's time and space are used in rewriting; the resources required for tautology testing are relatively insignificant. The Boyer rewrite system includes 106 rewrite rules, each with a "left-hand-side" pattern to be matched, and a "right-hand-side" pattern which shows how a matching expression is to be rewritten. Of these 106 rules, 49 rules are tried in the standard benchmark, and only 8 rules ever succeed (#18, #21, #23, #25, #33, #34, #35, #36). Rule #23 (if) is attempted 21,491 times and succeeds 934 times; rules #25, #51 and #95 (plus) are each attempted 5,252 times, and rule #25 succeeds twice. The rules are indexed by their outermost function symbol on the property lists of 58 symbols; equal has the most rules (14), difference and lessp have 7 rules each, and 15 function names each have but one rule associated with them. The rules are tried in reverse order of their occurrence in the text, and the benchmark is sensitive to this ordering--one alternative ordering increased running time by 40%. Storage for the rules themselves accounts for 1,849 list cells (without sharing).
The intermediate result of rewriting in standard Boyer contains 49,747 cells, all of them unique! In other words, the classical reason for preferring non-linearity is to enable sharing, but rewrite produces completely unshared data structures! The reason is that rewrite makes a fresh copy of its argument even when no rules have been successfully applied. To determine the potential for sharing in rewrite, we tested code similar to the following:
(defun scons (a d c) ; c is a cell whose car might be a and whose cdr might be d.
  (if (and (eq (car c) a) (eq (cdr c) d)) c ; if car & cdr match, reuse c instead of a new cell.
      (cons a d))) ; scons should be inlined.

(defun rewrite (term) ; rewrite with sharing.
  (if (atom term) term
      (rewrite-with-lemmas (scons (car term) (rewrite-args (cdr term)) term) ; scons v. cons.
                           (get (car term) 'lemmas))))

(defun rewrite-args (lst) ; rewrite-args with sharing.
  (if (null lst) nil
      (scons (rewrite (car lst)) (rewrite-args (cdr lst)) lst))) ; scons v. cons.
The use of scons within rewrite and rewrite-args was extremely successful: it won 171,878 times out of 173,804 tries--a 98.9% "hit rate". This use of scons eliminated 68% of all Boyer consing and dramatically increased sharing: the result of rewrite now required only 6,362 distinct cells instead of 49,747 cells, although both structures are equal. This reduction in storage usage produced a 23% reduction in execution time.[2] We tried scons within apply-subst and apply-subst-lst and got a further sharing of only 23 cells (6,339 cells), although with a slightly increased running time. We thus found that the sharing implicit in non-linear styles of programming can be valuable for Boyer, but only when it is properly exploited!
A similar situation occurs elsewhere in Boyer. When the database of rewrite rules is consulted, the set of rules indexed by a particular function symbol must be "copied", even though none of the rules may match. Furthermore, once a matching rule has been found, the rest of the rules for that function symbol are explicitly discarded, since the new function symbol to be rewritten is most likely different from the previous function symbol. Since most rule matches fail, and fail before a significant fraction of the rule has been examined, it is most efficient to keep only one copy of each rule, which copy is decomposed during matching and then reconstructed for later use.
The non-functional style of updating the variable association list in one-way-unify1 was changed into a functional style. In particular, the tasks of one-way-unify1 were split into three functions: one-way-unify1, which simply matches shapes, one-way-unify2, which constructs an association list, and check-alist, which checks that duplicate variables have equal values. This new style is slower than the non-functional style, but only slightly, since shapes very seldom match, and when they do not, no association list need be constructed.
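As an illustration of the last of these tasks, a minimal sketch of check-alist follows; the function name comes from the text above, but the body is our reconstruction rather than the paper's actual code:

(defun check-alist (alist)
  ;; A variable may be bound more than once in the alist built by
  ;; one-way-unify2; the match succeeds only if all bindings of the
  ;; same variable carry equal values.
  (or (null alist)
      (let ((var (caar alist))
            (val (cdar alist)))
        (and (every #'(lambda (pair)
                        (or (not (eq (car pair) var))
                            (equal (cdr pair) val)))
                    (cdr alist))
             (check-alist (cdr alist))))))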
The strictly linear (unshared) style for Boyer was 1.58X slower than the non-linear Boyer, primarily because linear Boyer must recirculate many argument values as multiple result values, which can be quite expensive. Furthermore, the linear interpreted Boyer performed one excess copy for each substitution, because it could not tell that there were no more occurrences of the variable being substituted.
We then programmed a traditional reference count implementation "underneath" our linear Boyer, by having dup increment reference counts instead of copying, having kill decrement reference counts instead of recycling, and having dlet* adjust reference counts on cells it is dissolving. This simple reference count implementation is not competitive, being about 2.9X as slow as the non-linear Boyer, and 1.62X slower than the strictly linear Boyer. This slowdown is worse than expected, since the copying involved during substitution is replaced by a simple reference count increment. One reason for this difference is the change from the built-in cons cell to a defstruct cell; type testing for defstruct cells is substantially slower in Coral Common Lisp than testing for consp. Another reason is that most reference count updates involve a type test in addition to the increment itself. Additionally, the bulk of the work in Boyer is unsuccessful pattern matching, so the costs involved in manipulating reference counts on the patterns and values far exceed the slight savings during substitution. Finally, the next call to rewrite eliminates the sharing created by substitution.
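The following is a minimal sketch of this scheme under our own naming (rcell, rc-dup, and rc-kill are ours); the paper's dup and kill play these roles on its defstruct cells:

(defstruct (rcell (:constructor make-rcell (rcar rcdr)))
  rcar
  rcdr
  (refs 1)) ; cells are born unshared

(defun rc-dup (x) ; "copy" by bumping the count; note the type test.
  (when (rcell-p x)
    (incf (rcell-refs x)))
  (values x x))

(defun rc-kill (x) ; decrement; recycle the cell when the count reaches 0.
  (when (rcell-p x)
    (when (zerop (decf (rcell-refs x)))
      (rc-kill (rcell-rcar x))
      (rc-kill (rcell-rcdr x))
      ;; a real implementation would push the dead cell onto a freelist here
      ))
  (values))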
We then used the concept of "anchored pointers" [Baker94AP] to further optimize our linear reference counted Boyer. An anchored pointer is similar to a "derived" or "weak" pointer in that it does not protect its target from being reclaimed. However, if the target is already protected by an anchor, then the anchored pointer will always point to an accessible target. Anchored pointers can be more efficient than normal pointers for traversing data structures because reference counts do not have to be manipulated. Anchored pointers can be safely used only within a dynamic context in which the anchor holds, and any anchored pointer which escapes this context must first be normalized by incrementing the target's reference count.
Anchored pointers are perfect for most of the manipulations involved in interpreted Boyer. The rules themselves are anchored, and during the pattern match, the expression to be matched is anchored, as well. However, the argument and result of rewrite are normal pointers, because we want the argument to rewrite to be consumed in the production of its result, which is also normal. The arguments to the function apply-subst are anchored pointers, but its value is normal, because it is newly consed.
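To make the discipline concrete, here is a small sketch in terms of the rcell structure above (the function names are ours, not the paper's): traversal through anchored pointers touches no reference counts, and only a pointer that escapes its anchor's dynamic context pays for an increment.

(defun anchored-leftmost-atom (x)
  ;; Traverse via anchored pointers: the caller's anchor keeps x alive,
  ;; so no reference counts are touched along the way.
  (if (rcell-p x)
      (anchored-leftmost-atom (rcell-rcar x))
      x))

(defun normalize (x)
  ;; An anchored pointer escaping its dynamic context must be
  ;; normalized by incrementing its target's reference count.
  (when (rcell-p x)
    (incf (rcell-refs x)))
  x)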
The linear reference counted Boyer with anchored pointers was more than twice as fast as the simple reference counted Boyer, and 1.37X faster than the strictly linear Boyer without reference counts. The linear reference counted Boyer was 1.42X slower than the standard non-linear Boyer. When sharing (scons) was used, the linear reference counted Boyer was 1.28X slower than the standard non-linear Boyer using defstruct cells.
Our rule compiler compiles each rule separately, and does not therefore take advantage of important inter-rule optimizations. The compiler consists of two parts--a pattern match compiler, and a pattern-directed constructor, both classical. Our non-linear rule compiler uses sharing for subexpressions of the input which appear in the output, as well as for multiple copies in the output. Each compiled rule is a function with one argument--the expression to be rewritten--and 2 results--a truth value which is false if no match occurred, and the rewritten expression. Thus, when (funcall <rule> <exp>) fails, it returns nil,<exp>. The non-linear compiled rules are similar to the linear compiled rules below, except that upon failure the input expression is not reconstituted.
The rule compiler eliminates the consing required to construct association lists during pattern matching, which saves 69,208 conses, or about 27% of the total, and also runs 1.32X faster than the interpreted version. We also tested a "shared cons" (scons) compiled version, which eliminated another 171,878 conses, leaving 13,372 conses. This non-linear shared compiled version ran 1.34X faster than the non-scons compiled version.
Our linear rule compiler also compiles each rule separately. The compiler consists of three parts--a pattern match compiler, a pattern-directed constructor, and a copy analyzer which determines how many copies of a matched variable are kept or produced for the constructor. The pattern-directed constructor is classical, except that it is given a list of names of cells it is allowed to reuse during construction. The pattern match compiler is also classical, except that when a non-match occurs, the input expression must be explicitly re-constituted. The copy analyzer is novel in the linear style, which requires that any copying and deletion be made explicit. Our linear rule compiler is strictly linear, and does not use reference counts or sharing, so that all copying is "deep" copying.
A complete rule is a more convenient modularization for the linear style than is the separation of matching and construction found in the interpreted Boyer. Complete rules allow the compiler to more efficiently reuse cells reclaimed from the input expression in the construction of the output expression. Below is an example of the compiled version of the rule (equal (foo x y x) (bar y y)):
(defun rule-foo-1 (e) ; compiled from rule (equal (foo x y x) (bar y y))
  (if-atom e (values nil e)
    (dlet* (((fn . e1) e)) ; Acquire first cell of input list.
      (if-atom e1 (values nil `(,fn ,@e1)) ; These conses reuse the input cells.
        (if-neq 'foo fn (values nil `(,fn ,@e1)) ; These conses reuse the input cells.
          (dlet* (((x1 . e2) e1)) ; Acquire second cell of input list.
            (if-atom e2 (values nil `(,fn ,x1 ,@e2)) ; These conses reuse the input cells.
              (dlet* (((y1 . e3) e2)) ; Acquire third cell of input list.
                (if-atom e3 (values nil `(,fn ,x1 ,y1 ,@e3)) ; Reuse input cells.
                  (dlet* (((x2 . e4) e3)) ; Acquire fourth cell of input list.
                    (if-natom e4 (values nil `(,fn ,x1 ,y1 ,x2 ,@e4)) ; Reuse input cells.
                      (if-nequal x2 x1 (values nil `(,fn ,x1 ,y1 ,x2 ,@e4)) ; Reuse cells.
                        (let* ((y1 y2 (dup y1))) ; Acquire copy of input y.
                          (kill fn) (kill x1) (kill x2) ; Start freeing resources.
                          (values t `(bar ,y1 ,y2))))))))))))))) ; These conses are optimized.
This compiled code for a rewrite rule bears a close resemblance to concurrent programs which perform "two-phase locking" [Gray78]. In two-phase locking, a program's first phase acquires all the resources it needs, so that during its second phase, it will be guaranteed to succeed. In the code above, the rule first "acquires" the entire input expression that matches the pattern, followed by the "acquisition" of a duplicate of the y variable. If at any time during the first phase the "resources" cannot be acquired, then the rule can fail without "deadlocking". The second phase is guaranteed to succeed, because y has already been copied, and the number of cells required for the right-hand-side of the rule (3) is fewer than those appearing in the left-hand-side (4). If we make the analogy between these rewrite rules and chemical reactions which can happen in parallel [Berry92], then the two-phase locking analogy becomes very intuitive.
This compiled linear Boyer uses strict copying, so each call to dup causes its argument to be "deep" copied. Since most people expect this copying to be expensive, we collected statistics for this Boyer benchmark. There were 1,868 calls to dup on non-atoms, with a mean size of 24.5 cells, a max size of 1801 cells and a standard deviation of 64.1 cells. The total number of cells copied is 45,784, which accounts for most of the 49,747 cells which appear in the result of rewrite. The rest of the cells (3,963) come from incremental additions by the successful rules--e.g., the 934 successful applications of rule #23 add 934*4=3,736 additional cells.[3] Since the 8 successful rules cause neither deletions (kill's) nor incremental reductions in the cell count, every cell is accounted for.
A major speed difference between the standard (non-linear) Boyer and our linear Boyer is the speed of tautologyp. The non-linear tautologyp takes about 12 times longer than the linear tautologyp. The reason for this discrepancy is instructive. The linear tautologyp on an argument of 49,747 cells takes only 14% longer than kill on the same argument, and the absolute difference is only marginally greater than the time for non-linear tautologyp. Thus, the time for linear tautologyp is just the time to test a tautology, plus the time to reclaim all the cells of its argument.
NL -- non-linear (garbage collected) using standard cons cells.
NL3 -- non-linear (garbage collected) using defstruct cells.
L -- linear (deep copy) using standard cons cells.
LRC -- linear sharing using reference counts; uses defstruct cells.
LRCAP -- linear sharing using reference counts and anchored pointers; uses defstruct cells.
Note that compiled versions (C) have the bug fixed (BF), and strict linear (L) versions cannot represent sharing (S). Interpreted versions (I) do not count the number of cells used to represent the rules themselves (1,849 cells).
Memory      Interp (I)  Bug (B)      Unshar (US)  rw Time    Result   Total
Management  Comp (C)    Bg Fxd (BF)  Shar (S)     w/o GC     (cells)  (cells)
-----------------------------------------------------------------------------
NL          I           B            US           24.967 s.  48,139   226,436
NL          I           B            S            21.167 s.   6,362    58,566
NL          I           BF           US           28.050 s.  49,747   254,458
NL          I           BF           S            24.417 s.   6,337    82,580
NL          C           BF           US           21.250 s.  49,747   185,250
NL          C           BF           S            15.900 s.   6,337    13,372
NL3         I           BF           S            25.367 s.   6,337    82,580
NL3         C           BF           S            16.533 s.   6,337    13,372
L           I           BF           US           44.400 s.  49,747    50,204
L           C           BF           US           20.783 s.  49,747   ~52,000
LRC         I           B            US           72.000 s.  48,139    49,880
LRCAP       I           B            US           33.133 s.  48,139   ~66,000
LRCAP       I           B            S            27.333 s.   6,362    ~8,000
LRCAP       I           BF           US           39.767 s.  49,747    66,208
LRCAP       I           BF           S            32.400 s.   6,337     7,544
LRCAP       C           BF           S            20.700 s.   6,337     7,478

One of the more interesting comparisons is between NL3,C,BF,S at 16.533 secs. and LRCAP,C,BF,S at 20.7 secs. Both of these tests use exactly the same code, except that the reference count updating and cell recycling have been removed from the non-linear version. The reference counted version is 1.25X slower than the non-linear version, meaning that fully 1/5 of all of the work went into reference count updating and cell recycling. Recycling the cells themselves takes only about 1/4 of that difference, or about 1/20 of the total effort. The rest of the difference, or about 15% of the total effort, is devoted to reference count updating. This test performed 7,584 normalizations, of which 7,082 were non-atoms; this number is probably a minimum for the amount of sharing which actually occurs. Thus, although anchored pointers reduce reference count overhead, it is still a substantial fraction of the total effort.
The difference between NL3,C,BF,S at 16.533 secs. and NL,C,BF,S at 15.9 secs. is entirely due to the representation of the cells and the type testing. The NL3 version uses defstruct cells, while the NL version uses the more efficient built-in cons cells.
The lack of sharing and explicit copying was not as expensive as we had expected. The explicit-copying, compiled-rule Boyer is 1.02X faster than the unshared non-linear compiled Boyer, only 1.004X slower than the shared, reference-counted Boyer, and only 1.31X slower than the fastest shared, non-linear Boyer, which does no storage reclamation at all. There are several reasons for this efficiency in the strict linear (unshared) version. First, the standard Boyer does not preserve sharing, anyway. Second, the static, a priori knowledge that a cell is unshared means that when it is dissolved, the cell can be immediately recycled without any checking. Third, linearity allows for reallocation of a recycled cell by the rule compiler without it ever appearing on the freelist. Fourth, recycling during dissolution means that most cells are recycled by mutator traversal rather than requiring a separate collector traversal. Thus, data structures must be heavily shared before explicit copying becomes inefficient.
Of course, sharing can be remarkably efficient for Boyer, since a hash cons table implementation of Boyer uses only 146 cells to represent the intermediate result of rewriting, compared with 48,139 cells[4] for an unshared representation [Baker92]! However, our current results show that the sharing which is important for this compression is obtained by exploiting "may share" information (exploited by scons and hash consing) rather than "must share" information (traditional non-linear sharing) which utilizes 6,337 cells--almost an order of magnitude smaller than 49,747, but 1.6 orders of magnitude more than 146. It therefore appears that the scheme suggested in [Baker92LLL] which uses both reference-counted linearity and hash consing may be an optimum point for Boyer-like symbolic computations.
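For reference, hash consing can be sketched in a few lines (the table name and function here are our illustration, not [Baker92]'s code): structurally equal cells are interned, so equal structure becomes eq structure and "may share" sharing is maximized.

(defvar *hcons-table* (make-hash-table :test #'equal))

(defun hcons (a d)
  ;; Return the unique interned cell whose car is a and whose cdr is d.
  (let ((key (cons a d)))
    (or (gethash key *hcons-table*)
        (setf (gethash key *hcons-table*) key))))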
Our tests do not settle the question about which kind of storage management is best for Boyer. They do seem to indicate that unless reference counts can dramatically reduce cache misses, reference counting will need hardware help in order to be competitive with other techniques. We could not measure the "high water mark" storage requirements for the non-linear versions which rely on tracing garbage collection for cell reclamation, but we believe that linear and reference-counted versions show substantial reductions in the high water mark compared with non-linear versions.
Most traditional Lisp optimizations have focussed on minimizing execution time for both a computation and its garbage collector. With modern RISC architectures having one- and two-level caches, however, minimizing time requires minimizing space, because an "off-chip" access can cost upwards of 100 times as much as an on-chip access. The linear style does seem to minimize storage requirements, and should produce fewer cache misses. Our timings could not reflect this, however, since our experiments were performed on an architecture without a cache.
[Abramsky93] Abramsky, S. "Computational Interpretations of Linear Logic". Theor. Comp. Sci. 111 (1993), 3-57.
[Baker90] Baker, H.G. "Unify and Conquer (Garbage, Updating, Aliasing, ...) in Functional Languages". ACM Lisp & Funct. Progr. (1990), 218-226.
[Baker92BD] Baker, H.G. "The Buried Binding and Dead Binding Problems of Lisp 1.5: Sources of Incomparability in Garbage Collector Measurements". ACM Lisp Pointers V,2 (Apr.-June 1992), 11-19.
[Baker92] Baker, H.G. "The Boyer Benchmark at Warp Speed". ACM Lisp Pointers V,3 (July-Sept. 1992), 13-14.
[Baker92LLL] Baker, H.G. "Lively Linear Lisp -- 'Look Ma, No Garbage!'". ACM Sigplan Notices 27,8 (Aug. 1992), 89-98.
[Baker92NREV] Baker, H.G. "NREVERSAL of Fortune--The Thermodynamics of Garbage Collection". Int'l. W/S on Memory Mgmt., St Malo, France, Sept. 1992, Springer LNCS 637.
[Baker93] Baker, H.G. "Sparse Polynomials and Linear Logic". ACM Sigsam Bulletin 27,4 (Dec. 1993), 10-14.
[Baker94QS] Baker, H.G. "A 'Linear Logic' Quicksort". ACM Sigplan Notices 29,2 (Feb. 1994), 13-18.
[Baker94LLPS] Baker, H.G. "Linear Logic and Permutation Stacks--The Forth Shall Be First". ACM Sigarch Computer Architecture News 22,1 (Mar. 1994), 34-43.
[Baker94AP] Baker, H.G. "Minimizing Reference Count Updating with Deferred and Anchored Pointers". ACM Sigplan Notices 29, 9 (Sept. 1994), 38-43.
[Barth77] Barth, J. "Shifting garbage collection overhead to compile time". CACM 20,7 (July 1977), 513-518.
[Berry92] Berry, G., and Boudol, G. "The chemical abstract machine". Theor. Comp. Sci. 96 (1992), 217-248.
[Boyer86] Boyer, R. "Rewrite Rule Compilation". TR AI-194-86-P, M.C.C., Austin, TX 1986.
[Carriero89] Carriero, N., and Gelernter, D. "Linda in Context". CACM 32,4 (Apr. 1989), 444-458.
[Chirimar92] Chirimar, J., et al. "Proving Memory Management Invariants for a Language Based on Linear Logic". Proc. ACM Conf. Lisp & Funct. Progr., San Francisco, CA, June 1992, also ACM Lisp Pointers V,1 (Jan.-Mar. 1992), 139.
[Clark77] Clark, D.W., and Green, C.C. "An Empirical Study of List Structure in Lisp". CACM 20,2 (Feb. 1977), 78-87.
[Collins60] Collins, G.E. "A method for overlapping and erasure of lists". CACM 3,12 (Dec. 1960), 655-657.
[Francis83] Francis, R.S. "Containment Defines a Class of Recursive Data Structures". Sigplan Not. 18,4 (Apr. 1983), 58-64.
[Friedman78] Friedman, D.P., and Wise, D.S. "Aspects of applicative programming for parallel processing". IEEE Trans. Comput. C-27,4 (Apr. 1978), 289-296.
[Gabriel85] Gabriel, R.P. Performance and Evaluation of Lisp Systems. MIT Press, Camb., MA, 1985.
[Girard87] Girard, J.-Y. "Linear Logic". Theor. Comp. Sci. 50 (1987), 1-102.
[Gray78] Gray, J.N. "Notes on Database Operating Systems", in Bayer, R., et al, eds. Operating Systems. Springer, 1978.
[Harms91] Harms, D.E., and Weide, B.W. "Copying and Swapping: Influences on the Design of Reusable Software Components". IEEE Trans. SW Eng. 17,5 (May 1991), 424-435.
[Hesselink90] Hesselink, W.H. "Axioms and Models of Linear Logic". Formal Aspects of Comput. 2,2 (Apr.-June 1990), 139-166.
[Kieburtz76] Kieburtz, R.B. "Programming without pointer variables". Proc. Conf. on Data: Abstraction, Definition and Structure, Sigplan Not. 11 (special issue 1976), 95-107.
[Lafont88] Lafont, Y. "The Linear Abstract Machine". Theor. Comp. Sci. 59 (1988), 157-180.
[Mairson90] Mairson, H.G. "Deciding ML Typability is Complete for Deterministic Exponential Time". ACM POPL 17 (1990).
[Marti-Oliet91] Martí-Oliet, N., and Meseguer, J. "From Petri nets to linear logic". Math. Struct. in Comp. Sci. 1,1 (Mar. 1991).
[Shalit92] Shalit, A. Dylan(TM): An object-oriented dynamic language. Apple Computer, Camb., MA, 1992.
[Steele90] Steele, G.L. Common Lisp, The Language; 2nd Ed. Digital Press, Bedford, MA, 1990.
[Strom83] Strom, R.E. "Mechanisms for Compile-Time Enforcement of Security". Proc. ACM POPL 10, Jan. 1983.
[Strom86] Strom, R.E., and Yemini, S. "Typestate: A Programming Language Concept for Enhancing Software Reliability". IEEE Trans. SW Engrg. SE-12,1 (Jan. 1986), 157-171.
[Wadler91] Wadler, P. "Is there a use for linear logic?". Proc. ACM PEPM'91, New Haven, June 1991, 255-273.
[Wakeling91] Wakeling, D., and Runciman, C. "Linearity and Laziness". Proc. Funct. Progr. & Computer Arch., LNCS 523, Springer-Verlag, Aug. 1991, 215-240.
The identity function is already linear, but five must dispose of its argument before returning the value 5:

(defun identity (x) x)

(defun five (x) (kill x) 5) ; a true Linear Lisp would use "x" instead of "(kill x)"

The kill function, which returns no values, provides an appropriate "boundary condition" for the parameter x. The appearance of kill in five signifies non-linearity. (See Appendix for a definition of kill.)
The square function requires two occurrences of its argument, and is therefore also non-linear. A second copy can be obtained by use of the dup function, which accepts one argument and returns two values--i.e., two copies of its argument. (See Appendix for a definition of dup.) The square function follows:

(defun square (x)
  (let* ((x x-prime (dup x))) ; Use Dylan-style syntax for multiple values [Shalit92].
    (* x x-prime)))
Conditional expressions such as if-expressions require a bit of sophistication. Since only one "arm" of the conditional will be executed, we relax the "one-occurrence" linearity condition to allow a reference in both arms.[5] One should immediately see that linearity implies that there is an occurrence in one arm if and only if there is an occurrence in the other arm. (This condition is similar to that for typestates [Strom83].)
The boolean expression part of an if-expression requires more sophistication. Strict linearity requires that any name used in the boolean part of an if-expression be counted as an occurrence. However, many predicates are "shallow", in that they examine only a small (i.e., shallow) portion of their arguments (e.g., null, zerop), and therefore a modified policy is required. We have not yet found the best syntax to solve this problem, but provisionally use several new if-like expressions: if-atom, if-null, if-zerop, etc. These if-like expressions require that the boolean part be a simple name, which does not count towards the "occur-once" linearity condition. This modified rule allows for a shallow condition to be tested, and then the name can be reused within the arms of the conditional.[6]
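In a non-linear Common Lisp, these if-like forms can be simulated with trivial macros (the simulation is ours; a real linear compiler would also enforce the occurrence rules):

(defmacro if-atom (x then else) ; x must be a simple name; it is not counted
  `(if (atom ,x) ,then ,else))  ; as an occurrence, so both arms may reuse it.

(defmacro if-null (x then else)
  `(if (null ,x) ,then ,else))

(defmacro if-zerop (x then else)
  `(if (zerop ,x) ,then ,else))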
We require a mechanism to linearly extract both components of a Lisp cons cell, since a use of (car x) precludes the use of (cdr x), and vice versa, due to the requirement for a single occurrence of x. We therefore introduce a "destructuring let" operation dlet*, which takes a series of binding pairs and a body, and binds the names in the binding pairs before executing the body. Each binding pair consists of a pattern and an expression; the expression is evaluated to a value, and the result is matched to the pattern, which consists of list structure with embedded names. The list structure must match the value, and the names are then bound to the corresponding portions of the value as if the pattern had been unified with the value. Linearity requires that a name appear only once within a particular pattern. Linearity also requires that each name bound by a dlet* binding pair occur either within an expression in a succeeding binding pair, or within the body of the dlet* itself. Using these constructs, we can now program the append and factorial (fact) functions:
(defun lappend (x y) ; append for "linear" lists.
  (if-null x (progn (kill x) y) ; trivial kill.
    (dlet* (((carx . cdrx) x)) ; disassociate top-level cons.
      (cons carx (lappend cdrx y))))) ; this cons will be optimized to reuse input cell x.

(defun fact (n)
  (if-zerop n (progn (kill n) 1) ; trivial kill.
    (let* ((n n-prime (dup n))) ; Dylan-style multiple-value syntax.
      (* n (fact (1- n-prime))))))
(defun kill (x) ; Return no values.
  (if-atom x (kill-atom x)
    (dlet* (((carx . cdrx) x))
      (kill carx)
      (kill cdrx)
      (values))))

(defun dup (x) ; Return 2 values.
  (if-atom x (dup-atom x)
    (dlet* (((carx . cdrx) x))
      (let* ((carx carx-prime (dup carx))
             (cdrx cdrx-prime (dup cdrx)))
        (values (cons carx cdrx) ; reuse input cell.
                (cons carx-prime cdrx-prime))))))
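For readers who wish to run the examples, dlet* can be simulated non-linearly with a simple macro (our sketch; a true linear implementation would also recycle the dissolved cells), and the Dylan-style multiple-value let* can be read as multiple-value-bind:

(defmacro dlet* (bindings &body body)
  ;; Each binding pair is (pattern expression); expand into nested
  ;; destructuring-binds.  Linearity constraints are not checked here.
  (if (null bindings)
      `(progn ,@body)
      (destructuring-bind ((pattern expression) . rest) bindings
        `(destructuring-bind ,pattern ,expression
           (dlet* ,rest ,@body)))))

;; (let* ((x x-prime (dup x))) ...) can be read as:
;; (multiple-value-bind (x x-prime) (dup x) ...)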
[1] Since our rule compiler performs this check correctly, all of our "compiled" times are with this bug fixed.
[2] The scons function was inlined in every version that we timed.
[3] The statistics of Quicksort [Baker94QS] support the cell reuse optimization far better than these Boyer statistics do.
[4] This number is not 49,747 because [Baker92] did not include the one-way-unify1 fix discussed earlier in this paper.
[5] Any use of parallel or speculative execution of the arms of the conditional would require strict linearity, however.
[6] Although this rule seems a bit messy, it is equivalent to having the shallow predicate return two values: the predicate itself and its unmodified argument. This policy is completely consistent with linear semantics.