We have programmed several versions of Quicksort in a linear fragment of Common Lisp, and--contrary to the conclusions of [Wakeling91]--find that Linear Lisp produces a very fast Quicksort routine. In fact, a linear Quicksort routine for lists (which doesn't require garbage collection) is considerably faster than a non-linear Quicksort routine for lists (which does require garbage collection), while the linear Quicksort routine for vectors is only 3.5% slower than the non-linear Quicksort routine. We do not find the linear style to be particularly burdensome for programming Quicksort, again disputing the conclusions of [Wakeling91], especially when arrays are subdivided instead of indexed.
The identity function is linear, but five must dispose of its
argument before returning the value 5:
(defun identity (x) x)
(defun five (x) (kill x) 5) ; a true Linear Lisp would simply use "x" instead of "(kill x)".
The kill function, which returns no values, provides an
appropriate "boundary condition" for the parameter x. The appearance
of kill in five signifies non-linearity. (See
Appendix for a definition of kill).
The square function requires two occurrences of its argument,
and is therefore non-linear. The second copy may be obtained by using the
dup function, which accepts one argument and returns two
values--i.e., two copies of its argument. (See Appendix for a definition of
dup). The square function is as follows:
(defun square (x)
(let* ((x x-prime (dup x))) ; Use Dylan-style syntax for multiple values [Shalit92].
(* x x-prime)))
Conditional expressions--e.g., if-expressions--require sophistication.
Since only one of the arms of the conditional is executed (assuming
non-speculative computation), we can relax the linearity condition to allow an
occurrence in both arms. Linearity then implies that there exists an
occurrence in one arm if and only if there is also an occurrence in the
other arm. (This condition is similar to that for typestates
[Strom83]).
The boolean expression part of an if-expression requires more sophistication. Strict linearity requires that any name used in the boolean part of an if-expression be counted as an occurrence. However, many predicates are "shallow", in that they examine only a small (i.e., shallow) portion of their arguments (e.g., null, zerop), and therefore a modified policy is more efficient. We have not yet found the best syntax for shallow predicates, but provisionally use several new if-like expressions: if-atom, if-null, if-zerop, etc. These if-like expressions require that the boolean part be a simple name, which does not count towards the "occur-once" linearity condition. This modified rule allows for a shallow condition to be tested, and then the name can be reused within the arms of the conditional.[1]
We require a mechanism to linearly extract both components of a Lisp cons cell, since any use of (car x) precludes the use of (cdr x), and vice versa, due to the requirement for a single occurrence of x. We therefore introduce a "destructuring let" operation dlet*, which takes a series of binding pairs and a body, and binds the names in the binding pairs before executing the body. Each binding pair consists of a pattern and an expression; the expression is strictly evaluated to a value, and this value is matched to the pattern, which consists of list structure with embedded names. The list structure must match to the value, and the names are then bound to the portions of the list structure as if the pattern had been unified with the value. As expected, linearity demands that a name appear only once within a particular pattern. Linearity furthermore requires that each name bound by a dlet* binding pair must occur exactly once--either within an expression in a succeeding binding pair, or within the body of the dlet*.
Using these constructs, we can program the familiar append and
factorial (fact) functions:
(defun lappend (x y) ; append for linear lists.
(if-null x (progn (kill x) y) ; trivial kill.
(dlet* (((carx . cdrx) x))
(lcons carx (lappend cdrx y))))) ; lcons will be optimized.
(defun fact (n)
(if-zerop n (progn (kill n) 1) ; trivial kill.
(let* ((n n-prime (dup n)))
(* n (fact (1- n-prime))))))
(defun qs (x l) ; sort the list x onto the list l. (if (null x) l (let* ((i (car x)) (restx (cdr x)) (high low (highlow restx i nil nil))) (qs low (cons i (qs high l)))))) (defun highlow (x i h l) ; select the high and low elts of x onto h and l. (if (null x) (values h l) (let* ((firstx (car x)) (restx (cdr x))) (if (< firstx i) (highlow restx i h (cons firstx l)) (highlow restx i (cons firstx h) l)))))This applicative version of Quicksort is not only quite efficient, but almost in linear form.
(defun lqs (x l) (if-null x (progn (kill x) l) (dlet* (((i . restx) x)) (let* ((high low i (lhighlow restx i nil nil))) (lqs low (lcons i (lqs high l))))))) (defun lhighlow (x i h l) (if-null x (progn (kill x) (values h l i)) (dlet* (((firstx . restx) x)) (let* ((truth firstx i (l< firstx i))) (if truth (lhighlow restx i h (lcons firstx l)) (lhighlow restx i (lcons firstx h) l)))))This Quicksort routine is in linear form, and executes without producing any garbage, since lcons re-uses any list cells recycled by dlet*. A true Linear Lisp compiler would optimize further by deferring the memory cycles required to put the cells recycled by dlet* back onto the freelist until the scope of the dlet* expression is exited. Then if the compiler sees an lcons within this scope, it can use one of these recycled cells immediately without the cost of popping a cell from the freelist. For example, the cell recycled by the dlet* in lqs can be utilized in the later lcons. Similarly, the cell recycled by the dlet* in lhighlow can be utilized in both mutually exclusive arms of the conditional. Thus, in both these functions the linear compiler can avoid putting cells onto the freelist and taking them off the freelist. In other words, an optimized linear Quicksort never calls the real cons function! We can prototype this optimization by using a macro lcons whose third argument is such a recycled cell.
(defmacro lcons (a d c) `(rplacd (rplaca ,c ,a) ,d)) (defun lqs (x l) (if-null x (progn (kill x) l) (dlet* (((i . restx) x)) (let* ((high low i (lhighlow restx i nil nil))) (lqs low (lcons i (lqs high l) x)))))) (defun lhighlow (x i h l) (if-null x (progn (kill x) (values h l i)) (dlet* (((firstx . restx) x)) (let* ((truth firstx i (l< firstx i))) (if truth (lhighlow restx i h (lcons firstx l x)) (lhighlow restx i (lcons firstx h x) l))))))To preserve linearity, we must change slightly the semantics of dlet*. If dlet* swaps [Baker92LLL] the car and the cdr of x into the pattern variables, x will be left bound to a cell with an empty car and cdr. The later call to lcons swaps the new car and cdr into x and returns it as the consuming occurrence of x. At no time is there ever more than one reference outstanding to the cell or any of its components, so linearity is preserved.
We tested this version of Quicksort on Coral Common Lisp 1.2 on a Macintosh Plus with a 16MHz Radius 68020 accelerator card. We sorted lists of random fixnums of various lengths, and it took approximately 45*n*log2n usecs to sort a random list of n fixnums, or 12.73 secs. for 20,000 fixnums. This timing was 2.16X faster than the built-in sort function, although some of this difference can be attributed to the fact that the built-in sort does not know until run-time what function to use for comparison. Our linear Quicksort was also 1.84X faster than the built-in sort routine for a vector of 20,000 random fixnums.
We found that the Coral Common Lisp compiler did not do a good job of optimizing tail recursion. In fact, when we moved the lhighlow function inside the lqs function--normally an optimization--this sorting routine got much slower due to the creation of unnecessary closures. In order to estimate the speed with properly optimized tail recursion, we performed this optimization by hand by converting it into a Common Lisp prog form with labels and go's. This version was 1.24X faster with a timing of about 36*n*log2n usecs. With a generic comparison predicate, it then took 52.1*n*log2n usecs, or 1.81X faster than the built-in generic sort routine for lists.
We thus have produced an extremely efficient linear Quicksort algorithm. In fact, we are not aware of any further optimizations possible, outside of better register optimization, or perhaps more in-lining (loop unrolling). Thus, our linear Quicksort is not only competitive with the best non-linear algorithm, it is the best non-linear algorithm.
(defun vqs (v k m) ; Quicksort vector v from k up to m. (if (>= k m) v (let* ((x (aref v k)) ; Create a hole at k. (i (split1 v k (1- m) x))) ; Do partition. (setf (aref v i) x) ; Put x back into the hole. (vqs v k i) ; Quicksort v from k up to i. (vqs v (1+ i) m)))) ; Quicksort v from i+1 up to m. (defun split1 (v i j x) ; hole is at v(i). (if (= i j) i (let* ((vj (aref v j))) ; Copy elt. to vj. (if (< vj x) (progn (setf (aref v i) vj) (split2 v (1+ i) j x)) (split1 v i (1- j) x))))) (defun split2 (v i j x) ; hole is at v(j). (if (= i j) i (let* ((vi (aref v i))) ; Copy elt. to vi. (if (>= vi x) (progn (setf (aref v j) vi) (split1 v i (1- j) x)) (split2 v (1+ i) j x)))))
Swapping instead of copying elements from a vector can introduce
an inefficiency into linear Quicksort unless the partitioning loop is tuned to
take advantage of this behavior. In a nonswapping Quicksort, each vector
element is implicitly copied to a register and then compared to the test
element. If the vector element is already in its proper partition, it does not
have to be restored. However, if not, it is stored into the hole created by a
previous movement. In a linear Quicksort, each vector element is swapped into
a register and then compared to the test element. It is therefore better to
have the register hold an element value before the swap, so that the swap can
simultaneously perform both reading and writing. We can "prime the pump" by
having the register initialized with an element that should be stored in the
location of the next swap. Thus, the circuitry transferring data in both
directions is utilized.
(defun lvqs (v k m) ; Quicksort vector v from k up through m.
(let* ((truth k m (l>= k m)))
(if truth (progn (kill k) (kill m) v)
(let* ((x v k (laref v k 'hole)) ; Get test element.
(k1 (1+ k))
(ve v k1 (laref v k1 'hole)) ; Get lower element.
(ve x (sort2 ve x)) ; Conditionally exchange ve, x so that ve<=x.
(k1 k1-prime (dup k1)) ; Trivial dup.
(m m-prime (dup m)) ; Trivial dup.
(v i ve x (lsplitlow v k1-prime m-prime ve x)) ; Do partition.
(ve v k1 (laref v k1 ve)) ; Put back lower element.
(x v i (laref v i x)) ; Put in test element.
(k (1- k1))
(x v k (laref v k x)) ; Put back lower element.
(i i-prime (dup i))) ; Trivial dup.
(lvqs (lvqs v k (1- i-prime)) (1+ i) m)))))
(defun lsplitlow (v i j ve x) ; ve<=x, i<=j.
(let* ((truth i j (l= i j)))
(if truth (progn (kill j) (values v i ve x))
(let* ((i (1+ i))
(ve v i (laref v i ve)) ; Swap lower elt. with unknown elt.
(truth ve x (l<= ve x)))
(if truth (lsplitlow v i j ve x)
(lsplithigh v i j ve x))))))
(defun lsplithigh (v i j ve x) ; ve>x, i<=j.
(let* ((truth i j (l= i j)))
(if truth
(let* ((ve v i (laref v i ve))) ; Swap upper elt. with lower elt.
(kill j) (values v (1- i) ve x))
(let* ((ve v j (laref v j ve)) ; Swap upper elt. with unknown elt.
(j (1- j))
(truth ve x (l<= ve x)))
(if truth (lsplitlow v i j ve x)
(lsplithigh i j ve x))))))
The requirement for continually passing around the vector is not onerous.
However, the requirement to explicitly copy the index values is pedantic. A
type system allowing both linear and non-linear types, with the non-linear
types used for things like array indices, would allow for a more pleasant style
for this task. Even without such types, however, a compiler should find it
easy to optimize trivial duplications.
As in the list Quicksort case, our vector Quicksort was slow due to poor tail-recursion optimization by the Coral Common Lisp compiler. We performed appropriate tail-recursion optimizations by hand for both the non-linear and linear versions of our vector Quicksort, and got substantial speedups. Since Coral Common Lisp does not handle the returning of multiple values efficiently, we performed additional hand optimizations to understand their value in this context. We found that the efficient handling of multiple values is critical to achieving acceptable performance on our linear benchmark. In particular, we assumed that built-in linear versions of predicates which returned multiple values could be compiled as efficiently as in a normal applicative system where the additional values did not need to be returned. We also assumed that the built-in array reference swapping instructions would be similarly optimized.
The final linear generic[2] swapping Quicksort is only 3.5% slower than the non-linear generic non-swapping Quicksort. We could achieve parity if we could eliminate the redundant index calculations which result from the emulation of swapping on an assignment-oriented architecture. Furthermore, if a hardware swap operation takes the same amount of time as a hardware read operation (we have every reason to believe it can), then a swapping Quicksort can be strictly faster than a non-linear Quicksort.
(defun lvqs (v) ; Quicksort the linear vector v in-place. (if-empty v v ; If v is empty, we are done. (let* ((hole v (first&rest v)) ; Split vector into subvectors of lengths 1 and n-1. (x hole zero (laref hole 0 'empty)) ; Get test element from hole. (x lo hole hi (part1 x 0 'empty '#() hole v '#())) ; Partition vector. (hv hole zero (laref hole zero x)) ; Put test element back into hole. (lo (lvqs lo)) ; Sort low partition. (hi (lvqs hi))) ; Sort high partition. (kill zero) (kill hv) ; Trivial kills. (catenate lo hole hi)))) ; Return reconstructed vector. (defun part1 (x zero hv lo hole1 mid hi) ; Partition mid using x; v = lo:hole1:mid:hi. (if-empty mid (progn (kill zero) (kill hv) (kill mid) (values x lo hole1 hi)) (let* ((mid hole2 (rest&last mid)) ; Split off last element from mid. (y hole2 zero (laref hole2 zero hv)) ; Extract element y from hole2. (truth y x (l< y x))) ; Compare y with test element. (if truth (let* ((hv hole1 zero (laref hole1 zero y))) ; Put y into hole1. (part2 x zero hv (catenate lo hole1) mid hole2 hi)) ; Reuse lo,hole1 storage. (let* ((hv hole2 zero (laref hole2 zero y))) ; Put y into hole2. (part1 x zero hv lo hole1 mid (catenate hole2 hi))))))) ; Reuse hole2,hi storage. (defun part2 (x zero hv lo mid hole2 hi) ; Partition mid using x; v = lo:mid:hole2:hi. (if-empty mid (progn (kill zero) (kill hv) (kill mid) (values x lo hole2 hi)) (let* ((hole1 mid (first&rest mid)) ; Split off first element from mid. (y hole1 zero (laref hole1 zero hv)) ; Extract element y from hole1. (truth y x (l< y x))) ; Compare y with test element. (if truth (let* ((hv hole1 zero (laref hole1 zero y))) ; Put y into hole1. (part2 x zero hv (catenate lo hole1) mid hole2 hi)) ; Reuse lo,hole1 storage. (let* ((hv hole2 zero (laref hole2 zero y))) ; Put y into hole2. (part1 x zero hv lo hole1 mid (catenate hole2 hi))))))) ; Reuse hole2,hi storage.We tested this routine, but could not prototype an optimized version because Common Lisp does not trust the user to destroy and compose vectors in this fashion. However, we believe that this style of array manipulations can be quite efficient if linear vectors are implemented by means of separate headers. The recursions would manipulate only the headers, which by linearity could always be located in registers. With proper tail recursion optimizations, the number of parameters in the tail recursions would not affect performance, and the overall performance should be good. By additionally arranging swaps to utilize communication in both directions, as before, performance should be excellent.
A vector subdivision Quicksort is ideal for a parallel processing machine--whether using parallel functional units (VLIW) or parallel instruction streams--because the vector is subdivided into exclusively-owned partitions. These subvectors can then be sorted completely independently and in parallel. The linear style, in which all events are ordered solely by dataflow constraints, makes trivial the compiler's job of detecting interference. The linear style recognizes that every access to a register--e.g., a read used to perform a comparison--ties up the register and therefore requires that the register be passed as an argument and returned as a value to guarantee proper ordering. Yet the linearity constraint also allows the factoring of the binding environment so that subexpressions can easily be reordered or executed in parallel. Quicksort may not be the ideal parallel sorting routine, but this linear subdividing Quicksort automatically extracts as much parallelism as possible.
Many may find onerous the linearity requirement that all names be explicitly consumed. However, we have found that this linearity check is an excellent programming aid, because--for example--it makes sure that every parameter of a function is either used or given a reason why not.
Linearity in a language is closely related to the typestates [Strom83] of NIL/Hermes, and to islands [Hogg91].
Aho, A.V., et al. The Design and Analysis of Computer Algorithms. Addison-Wesley, Reading, MA, 1974.
Baase, S. Computer Algorithms: Introduction to Design and Analysis. Addison-Wesley, Reading, MA, 1978.
[Baker92BD] Baker, H.G. "The Buried Binding and Dead Binding Problems of Lisp 1.5: Sources of Incomparability in Garbage Collector Measurements". ACM Lisp Pointers V,2 (Apr.-June 1992), 11-19.
[Baker92LLL] Baker, H.G. "Lively Linear Lisp -- 'Look Ma, No Garbage!'". ACM Sigplan Notices 27,8 (Aug. 1992), 89-98.
[Baker92REV] Baker, H.G. "NREVERSAL of Fortune--The Thermodynamics of Garbage Collection". Int'l. W/S on Memory Mgmt., St Malo, France, Sept. 1992, Springer LNCS 637.
Chirimar, J., et al. "Proving Memory Management Invariants for a Language Based on Linear Logic". Proc. ACM Conf. Lisp & Funct. Prog., San Francisco, CA, June, 1992, also ACM Lisp Pointers V,1 (Jan.-Mar. 1992), 139.
Friedman, D.P., and Wise, D.S. "Aspects of applicative programming for parallel processing". IEEE Trans. Comput. C-27,4 (Apr. 1978), 289-296.
Girard, J.-Y. "Linear Logic". Theoretical Computer Sci. 50 (1987),1-102.
Harms, D.E., and Weide, B.W. "Copying and Swapping: Influences on the Design of Reusable Software Components". IEEE Trans. SW Eng. 17,5 (May 1991),424-435.
Hesselink, W.H. "Axioms and Models of Linear Logic". Formal Aspects of Comput. 2,2 (Apr-June 1990), 139-166.
Hogg, J. "Islands: Aliasing Protection in Object-Oriented Languages". Proc. OOPSLA'91, Sigplan Not. 26,11 (Nov. 1991), 271-285.
Knuth, D.E. The Art of Computer Programming, v. 3, Sorting and Searching. Addison-Wesley, Reading, MA 1973.
Lafont, Y. "The Linear Abstract Machine". Theor. Comp. Sci. 59 (1988), 157-180.
Martí-Oliet, N., and Meseguer, J. "From Petri nets to linear logic". Math. Struct. in Comp. Sci. 1,1 (Mar. 1991).
Mendelson, A. "A Single Cached Copy Data Coherence Scheme for Multiprocessor Systems". Comput. Arch. News 17,6 (Dec. 1989), 36-49.
Shalit, A. Dylan(TM): An object-oriented dynamic language. Apple Computer, Camb., MA, 1992.
Strom, R.E. "Mechanisms for Compile-Time Enforcement of Security". Proc. 10th ACM POPL, Jan. 1983.
Suzuki, N. "Analysis of Pointer 'Rotation'". CACM 25,5 (May 1982)330-335.
Wadler, P. "Is there a use for linear logic?". Proc. ACM PEPM'91, New Haven, June 1991, 255-273.
Wakeling, D., and Runciman, C. "Linearity and Laziness". Proc. Funct. Progr. & Computer Arch., LNCS 523, Springer-Verlag, Aug. 1991, 215-240.
(defun kill (x) ;;; Return no values. (if-atom x (kill-atom x) (dlet* (((carx . cdrx) x)) (kill carx) (kill cdrx) (values)))) (defun dup (x) ;;; Return 2 values. (if-atom x (dup-atom x) (dlet* (((carx . cdrx) x)) (let* ((carx carx-prime (dup carx)) (cdrx cdrx-prime (dup cdrx))) (values (lcons carx cdrx) (lcons carx-prime cdrx-prime))))))
[1] Although this rule seems a bit messy, it is equivalent to having the shallow predicate return multiple values: the predicate itself and its unmodified arguments. This policy is completely consistent with linear semantics.
[2]The comparison predicate passed to a linear generic Quicksort returns 3 values: a boolean value and the two unmodified arguments. Such a predicate allows the linear Quicksort to avoid all copying of array element values.