We show via a counterexample that the quoted statements above are essentially meaningless, because a meaningful discussion of age-based garbage collection schemes requires the presentation of significantly more data in the form of complete object lifetime probability density functions, rather than the one or two data points usually proffered.
Generational garbage collection schemes attempt to take advantage of the intuition that "most cells that die, die young" by separating the objects into different categories based on their age; these categories are called generations. If the intuition is correct, and if most tracing/marking can be focussed on the younger generations, then the categorical schemes can achieve a lower average amount of tracing/marking work per object allocated and thereby achieve a net performance advantage over a non-categorical collector [DeTreville77]. In addition to these first-order savings, one can also achieve second-order savings of increased locality in the memory hierarchy because the more localized tracing work is less likely to disrupt the residence of live objects in the faster memories/caches.
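As a rough illustration of this first-order saving, the following sketch (ours, with purely illustrative parameters, and not the algorithm of any system cited here) models a two-generation collector in which a small nursery is traced whenever it fills and only the survivors are promoted; the tracing work per allocation, i.e. the "mark/cons" ratio, is then simply the nursery survival fraction:

    import random

    def simulate(allocations=100_000, nursery_size=1_000, p_die_young=0.9):
        nursery, old_gen, traced = [], [], 0
        for obj in range(allocations):
            nursery.append(obj)                  # allocate a new object in the nursery
            if len(nursery) == nursery_size:     # nursery full: minor collection
                for o in nursery:
                    if random.random() > p_die_young:   # did the object survive infancy?
                        traced += 1                      # tracing/copying work
                        old_gen.append(o)                # promote the survivor
                nursery.clear()
        return traced / allocations              # average tracing work per allocation

    print(simulate())   # roughly 0.1 when 90% of nursery objects die young

If the intuition "most objects die young" holds, this ratio is much smaller than that of a non-generational collector, which must trace the entire live heap on every collection.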
To summarize, generational garbage collection has been advocated for at least three reasons: a lower average amount of tracing/marking work per object allocated (a better "mark/cons" ratio), increased locality of reference in the memory hierarchy, and better average interactive response.
Now consider such an application in which new particles are continually being added/created at the same rate at which existing particles decay. The average number of particles is then constant over time.[2] In this situation, the propensity of an object to die remains independent of its achieved lifetime. In such a radioactive-like system, one will indeed find that "most objects die young", and for any single measurement of the form "only 2% of the objects survived 1 second", it is trivial to compute a "half-life" consistent with that observation, namely tau = 177 msec.
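For concreteness, the 177 msec figure follows from the standard exponential survival law; the computation below is ours, added only for completeness:

    S(t) = 2^{-t/\tau}, \qquad S(1\,\mathrm{sec}) = 0.02
    \quad\Longrightarrow\quad
    \tau = \frac{1\,\mathrm{sec}}{\log_2(1/0.02)} = \frac{1\,\mathrm{sec}}{\log_2 50} \approx 0.177\,\mathrm{sec}.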
However, if one attempts to use a generational garbage collector for this application, one finds that, no matter how the generations are chosen, the decay rate of each generation is proportional to the size of the generation, not to its age; therefore the average mark/cons ratio is the same for every generation. For such an application, a generational garbage collector cannot improve the average mark/cons ratio, and the additional overheads of generational collection may cause the total amount of work to increase relative to a non-generational collector. On the other hand, a generational garbage collector may still show increased locality of reference and/or better average response, due to second-order effects. Thus, it is very important for papers that provide measurements of garbage collection algorithms to document the mark/cons ratios, locality, and other important parameters, in addition to total runtime and/or GC load.
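The following sketch (again ours, with illustrative parameters) makes this concrete: when death is memoryless, a generation of newly allocated objects and a generation of long-lived objects lose members at exactly the same rate, so no choice of generations can reduce the tracing work per object allocated:

    import random

    def step(generation, p_die):
        # One unit of time: every live object dies with probability p_die,
        # regardless of how long it has already lived (memoryless decay).
        return [age + 1 for age in generation if random.random() > p_die]

    young = [0] * 100_000      # objects allocated just now
    old   = [100] * 100_000    # objects that have already survived 100 time units

    p_die = 0.01
    for _ in range(10):
        young, old = step(young, p_die), step(old, p_die)

    # Both generations retain the same fraction of their members (about 0.90 here),
    # so a collector that traces only the survivors does the same work per object
    # whether it scans the young generation or the old one.
    print(len(young) / 100_000, len(old) / 100_000)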
In the newest generation of microprocessors, which incorporate significant amounts of cache memory and for which a cache "miss" may cost more than 100 instructions, the "second order" locality effects of generational garbage collection may be more important than the "first order" effect of an improved mark/cons ratio. For example, our stack-based "lazy allocation" generational scheme [Baker92] is highly integrated with the programming-language implementation: it shares most of its work with the existing parameter-passing and value-returning mechanisms, and therefore avoids gratuitous non-local behavior.
Our personal belief is that the object lifetimes found in garbage collection measurements cannot be adequately modelled by traditional probability/statistics models which assume that distributions have finite "standard deviations", but will require fractal models [Mandelbrot83], which can have infinite second moments.
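As one concrete example of the kind of distribution we have in mind (a standard heavy-tailed law, offered here purely as an illustration), a Pareto lifetime distribution has an infinite second moment whenever its exponent is at most 2:

    \Pr[L > t] = (t_0/t)^{\alpha}\ \ (t \ge t_0), \qquad
    E[L^2] = \int_{t_0}^{\infty} t^2\,\alpha\,t_0^{\alpha}\,t^{-\alpha-1}\,dt = \infty
    \quad \text{for } \alpha \le 2.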
[Appel89] Appel, A.W. "Simple Generational Garbage Collection and Fast Allocation". Soft. Prac. & Exper. 19,2 (Feb. 1989), 171-183.
[Baker78] Baker, H.G. "List Processing in Real Time on a Serial Computer". CACM 21,4 (April 1978), 280-294.
[Baker92] Baker, H.G. "CONS Should not CONS its Arguments, or, a Lazy Alloc is a Smart Alloc". ACM Sigplan Not. 27,3 (March 1992), 24-34.
[Bekkers92] Bekkers, Y., and Cohen, J., eds. Memory Management: Proceedings of the International Workshop on Memory Management, St. Malo, France. Springer LNCS 637, 1992.
[Clark77] Clark, D.W., and Green, C.C. "An Empirical Study of List Structure in LISP". CACM 20,2 (Feb. 1977), 78-87.
[DeTreville77] DeTreville, J. "Reducing the Cost of Garbage Collection". Unpublished manuscript, May 1977.
[Lieberman83] Lieberman, H., and Hewitt, C. "A Real-Time Garbage Collector Based on the Lifetimes of Objects". CACM 26,6 (June 1983), 419-429.
[Mandelbrot83] Mandelbrot, B. The Fractal Geometry of Nature. W.H. Freeman & Co., New York, 1983.
[Moon84] Moon, D. "Garbage Collection in a Large Lisp System". ACM Symp. on Lisp and Functional Prog., Austin, TX, 1984, 235-246.
[Ungar84] Ungar, D. "Generation Scavenging: A Non-disruptive High Performance Storage Reclamation Algorithm". ACM Soft. Eng. Symp. on Prac. Software Dev. Envs., Sigplan Not. 19,6 (June 1984), 157-167.
[Wilson91] Wilson, P.R. "Some Issues and Strategies in Heap Management and Memory Hierarchies". ACM Sigplan Not. 26,3 (March 1991), 45-52.
[1] Can't computer scientists come up with less grisly names for these concepts?
[2] We consider the simple situation in which a new particle is added/created exactly when one decays. The more likely situation, in which only the average rates are equal, leads to significant deviations from the constant number of live particles and to second-order effects, although the first-order effects are the same.