Ole Agesen, David L. Detlefs. 1997. “Finding References in Java Stacks”. Sun Labs. OOPSLA97 Workshop on Garbage Collection and Memory Management.
Exact garbage collection for the strongly-typed Java language may seem straightforward. Unfortunately, a single pair of bytecodes in the Java Virtual Machine instruction set presents an obstacle that has thus far not been discussed in the literature. We explain the problem, outline the space of possible solutions, and present a solution utilizing bytecode-preprocessing to enable exact garbage collection while maintaining compatibility with existing compiled Java class files.
Ole Agesen, David L. Detlefs, J. Eliot B. Moss. 1998. “Garbage Collection and Local Variable Type-precision and Liveness in Java Virtual Machines”. ACM. Proceedings of the ACM SIGPLAN ‘98 conference on Programming language design and implementation, pp. 269–279.
Full precision in garbage collection implies retaining only those heap allocated objects that will actually be used in the future. Since full precision is not computable in general, garbage collectors use safe (i.e., conservative) approximations such as reachability from a set of root references. Ambiguous roots collectors (commonly called “conservative”) can be overly conservative because they overestimate the root set, and thereby retain unexpectedly large amounts of garbage. We consider two more precise collection schemes for Java virtual machines (JVMs). One uses a type analysis to obtain a type-precise root set (only those variables that contain references); the other adds a live variable analysis to reduce the root set to only the live reference variables. Even with the Java programming language’s strong typing, it turns out that the JVM specification has a feature that makes type-precise root sets difficult to compute. We explain the problem and ways in which it can be solved.
Our experimental results include measurements of the costs of the type and liveness analyses at load time, of the incremental benefits at run time of the liveness analysis over the type-analysis alone, and of various map sixes and counts. We find that the liveness analysis often produces little or no improvement in heap size, sometimes modest improvements, and occasionally the improvement is dramatic. While further study is in order, we conclude that the main benefit of the liveness analysis is preventing bad surprises.
Andrew Appel, John R. Ellis, Kai Li. 1988. “Real-time Concurrent Collection on Stock Multiprocessors”. ACM, SIGPLAN. ACM PLDI 88, SIGPLAN Notices 23, 7 (July 88), pp. 11–20.
We’ve designed and implemented a copying garbage-collection algorithm that is efficient, real-time, concurrent, runs on commercial uniprocessors and shared-memory multiprocessors, and requires no change to compilers. The algorithm uses standard virtual-memory hardware to detect references to “from space” objects and to synchronize the collector and mutator threads. We’ve implemented and measured a prototype running on SRC’s 5-processor Firefly. It will be straightforward to merge our techniques with generational collection. An incremental, non-concurrent version could be implemented easily on many versions of Unix.
Apple Computer, Inc. 1994. Inside Macintosh: Memory. Addison-Wesley. ISBN 0-201-63240-3.
Inside Macintosh: Memory describes the parts of the Macintosh® Operating System that allow you to directly allocate, release, or otherwise manipulate memory. Everyone who programs Macintosh computers should read this book.
Inside Macintosh: Memory shows in detail how your application can manage the memory partition it is allocated and perform other memory-related operations. It also provides a complete technical reference for the Memory Manager, the Virtual Memory Manager, and other memory-related utilities provided by the system software.
Giuseppe Attardi & Tito Flagella. 1994. “A Customisable Memory Management Framework”. TR-94-010.
Memory management is a critical issue for many large object-oriented applications, but in C++ only explicit memory reclamation through the delete operator is generally available. We analyse different possibilities for memory management in C++ and present a dynamic memory management framework which can be customised to the need of specific applications. The framework allows full integration and coexistence of different memory management techniques. The Customisable Memory Management (CMM) is based on a primary collector which exploits an evolution of Bartlett’s mostly copying garbage collector. Specialised collectors can be built for separate memory heaps. A Heap class encapsulates the allocation strategy for each heap. We show how to emulate different garbage collection styles or user-specific memory management techniques. The CMM is implemented in C++ without any special support in the language or the compiler. The techniques used in the CMM are general enough to be applicable also to other languages.
Giuseppe Attardi, Tito Flagella, Pietro Iglio. 1998. “A customisable memory management framework for C++”. Software – Practice and Experience. 28(11), 1143–1183.
Automatic garbage collection relieves programmers from the burden of managing memory themselves and several techniques have been developed that make garbage collection feasible in many situations, including real time applications or within traditional programming languages. However optimal performance cannot always be achieved by a uniform general purpose solution. Sometimes an algorithm exhibits a predictable pattern of memory usage that could be better handled specifically, delaying as much as possible the intervention of the general purpose collector. This leads to the requirement for algorithm specific customisation of the collector strategies. We present a dynamic memory management framework which can be customised to the needs of an algorithm, while preserving the convenience of automatic collection in the normal case. The Customisable Memory Manager (CMM) organises memory in multiple heaps. Each heap is an instance of a C++ class which abstracts and encapsulates a particular storage discipline. The default heap for collectable objects uses the technique of mostly copying garbage collection, providing good performance and memory compaction. Customisation of the collector is achieved exploiting object orientation by defining specialised versions of the collector methods for each heap class. The object oriented interface to the collector enables coexistence and coordination among the various collectors as well as integration with traditional code unaware of garbage collection. The CMM is implemented in C++ without any special support in the language or the compiler. The techniques used in the CMM are general enough to be applicable also to other languages. The performance of the CMM is analysed and compared to other conservative collectors for C/C++ in various configurations.
Alain Azagury, Elliot K. Kolodner, Erez Petrank, Zvi Yehudai. 1998. “Combining Card Marking with Remembered Sets: How to Save Scanning Time”. ACM. ISMM‘98 pp. 10–19.
We consider the combination of card marking with remembered sets for generational garbage collection as suggested by Hosking and Moss. When more than two generations are used, a naive implementation may cause excessive and wasteful scanning of the cards and thus increase the collection time. We offer a simple data structure and a corresponding algorithm to keep track of which cards need be scanned for which generation. We then extend these ideas for the Train Algorithm of Hudson and Moss. Here, the solution is more involved, and allows tracking of which card should be scanned for which car-collection in the train.
Henry G. Baker, Carl Hewitt. 1977. “The Incremental Garbage Collection of Processes”. ACM. SIGPLAN Notices 12, 8 (August 1977), pp. 55–59.
This paper investigates some problems associated with an argument evaluation order that we call “future” order, which is different from both call-by-name and call-by-value. In call-by-future, each formal parameter of a function is bound to a separate process (called a “future”) dedicated to the evaluation of the corresponding argument. This mechanism allows the fully parallel evaluation of arguments to a function, and has been shown to augment the expressive power of a language.
We discuss an approach to a problem that arises in this context: futures which were thought to be relevant when they were created become irrelevant through being ignored in the body of the expression where they were bound. The problem of irrelevant processes also appears in multiprocessing problem-solving systems which start several processors working on the same problem but with different methods, and return with the solution which finishes first. This “parallel method strategy” has the drawback that the processes which are investigating the losing methods must be identified, stopped, and reassigned to more useful tasks.
The solution we propose is that of garbage collection. We propose that the goal structure of the solution plan be explicitly represented in memory as part of the graph memory (like Lisp’s heap) so that a garbage collection algorithm can discover which processes are performing useful work, and which can be recycled for a new task. An incremental algorithm for the unified garbage collection of storage and processes is described.
Henry G. Baker. 1978. “List Processing in Real Time on a Serial Computer”. ACM. Communications of the ACM 21, 4 (April 1978), pp. 280–294.
A real-time list processing system is one in which the time required by the elementary list operations (e.g. CONS, CAR, CDR, RPLACA, RPLACD, EQ, and ATOM in LISP) is bounded by a (small) constant. Classical implementations of list processing systems lack this property because allocating a list cell from the heap may cause a garbage collection, which process requires time proportional to the heap size to finish. A real-time list processing system is presented which continuously reclaims garbage, including directed cycles, while linearizing and compacting the accessible cells into contiguous locations to avoid fragmenting the free storage pool. The program is small and requires no time-sharing interrupts, making it suitable for microcode. Finally, the system requires the same average time, and not more than twice the space, of a classical implementation, and those space requirements can be reduced to approximately classical proportions by compact list representation. Arrays of different sizes, a program stack, and hash linking are simple extensions to our system, and reference counting is found to be inferior for many applications.
Henry G. Baker. 1979. “Optimizing Allocation and Garbage Collection of Spaces”. In Winston and Brown, eds. Artificial Intelligence: An MIT Perspective. MIT Press.
MACLISP, unlike some other implementations of LISP, allocates storage for different types of objects in noncontiguous areas called “spaces”. These spaces partition the active storage into disjoint areas, each of which holds a different type of object. For example, “list cells” are stored in one space, “full-word integers” reside in another space, “full-word floating point numbers” in another, and so on.
Allocating space in this manner has several advantages. An object’s type can easily be computed from a pointer to it, without any memory references to the object itself. Thus, the LISP primitive ATOM(x) can easily compute its result without even paging in x. Another advantage is that the type of an object does not require any storage within the object, so that arithmetic with hardware data types such as full-word integers can use hardware instructions directly.
There are problems associated with this method of storage and type management, however. When all data types are allocated from the same heap, there is no problem with varying demand for the different data types; all data types require storage from the same pool, so that only the total amount of storage is important. Once different data types must be allocated from different spaces, however, the relative sizes of the spaces becomes important.
Henry G. Baker. 1991. “Cache-Conscious Copying Collectors”. OOPSLA‘91/GC‘91 Workshop on Garbage Collection.
Garbage collectors must minimize the scarce resources of cache space and off-chip communications bandwidth to optimize performance on modern single-chip computer architectures. Strategies for achieving these goals in the context of copying garbage collection are discussed. A multi-processor mutator/collector system is analyzed. Finally, the Intel 80860XP architecture is studied.
Henry G. Baker. 1992. “Lively Linear Lisp – ‘Look Ma, No Garbage!’”. ACM. SIGPLAN Notices 27, 8 (August 1992), pp. 89–98.
Linear logic has been proposed as one solution to the problem of garbage collection and providing efficient “update-in-place” capabilities within a more functional language. Linear logic conserves accessibility, and hence provides a “mechanical metaphor” which is more appropriate for a distributed-memory parallel processor in which copying is explicit. However, linear logic’s lack of sharing may introduce significant inefficiencies of its own.
We show an efficient implementation of linear logic called “Linear Lisp” that runs within a constant factor of non-linear logic. This Linear Lisp allows RPLACX operations, and manages storage as safely as a non-linear Lisp, but does not need a garbage collector. Since it offers assignments but no sharing, it occupies a twilight zone between functional languages and imperative languages. Our Linear Lisp Machine offers many of the same capabilities as combinator/graph reduction machines, but without their copying and garbage collection problems.
Henry G. Baker. 1992. “The Treadmill: Real-Time Garbage Collection Without Motion Sickness”. ACM. SIGPLAN Notices 27, 3 (March 1992), pp. 66–70.
A simple real-time garbage collection algorithm is presented which does not copy, thereby avoiding some of the problems caused by the asynchronous motion of objects. This in-place “treadmill” garbage collection scheme has approximately the same complexity as other non-moving garbage collectors, thus making it usable in a high-level language implementation where some pointers cannot be traced. The treadmill is currently being used in a Lisp system built in Ada.
Henry G. Baker. 1992. “CONS Should not CONS its Arguments, or, a Lazy Alloc is a Smart Alloc”. ACM. SIGPLAN Notices 27, 3 (March 1992), 24–34.
“Lazy allocation” is a model for allocating objects on the execution stack of a high-level language which does not create dangling references. Our model provides safe transportation into the heap for objects that may survive the deallocation of the surrounding stack frame. Space for objects that do not survive the deallocation of the surrounding stack frame is reclaimed without additional effort when the stack is popped. Lazy allocation thus performs a first-level garbage collection, and if the language supports garbage collection of the heap, then our model can reduce the amortized cost of allocation in such a heap by filtering out the short-lived objects that can be more efficiently managed in LIFO order. A run-time mechanism called “result expectation” further filters out unneeded results from functions called only for their effects. In a shared-memory multi-processor environment, this filtering reduces contention for the allocation and management of global memory.
Our model performs simple local operations, and is therefore suitable for an interpreter or a hardware implementation. Its overheads for functional data are associated only with assignments, making lazy allocation attractive for “mostly functional” programming styles. Many existing stack allocation optimizations can be seen as instances of this generic model, in which some portion of these local operations have been optimized away through static analysis techniques.
Important applications of our model include the efficient allocation of temporary data structures that are passed as arguments to anonymous procedures which may or may not use these data structures in a stack-like fashion. The most important of these objects are functional arguments (funargs), which require some run-time allocation to preserve the local environment. Since a funarg is sometimes returned as a first-class value, its lifetime can survive the stack frame in which it was created. Arguments which are evaluated in a lazy fashion (Scheme “delays” or “suspensions”) are similarly handled. Variable-length argument “lists” themselves can be allocated in this fashion, allowing these objects to become “first-class”. Finally, lazy allocation correctly handles the allocation of a Scheme control stack, allowing Scheme continuations to become first-class values.
Henry G. Baker. 1992. “NREVERSAL of Fortune – The Thermodynamics of Garbage Collection”. Springer-Verlag. LNCS Vol. 637.
The need to reverse a computation arises in many contexts – debugging, editor undoing, optimistic concurrency undoing, speculative computation undoing, trace scheduling, exception handling undoing, database recovery, optimistic discrete event simulations, subjunctive computing, etc. The need to analyze a reversed computation arises in the context of static analysis – liveness analysis, strictness analysis, type inference, etc. Traditional means for restoring a computation to a previous state involve checkpoints; checkpoints require time to copy, as well as space to store, the copied material. Traditional reverse abstract interpretation produces relatively poor information due to its inability to guess the previous values of assigned-to variables.
We propose an abstract computer model and a programming language – Psi-Lisp – whose primitive operations are injective and hence reversible, thus allowing arbitrary undoing without the overheads of checkpointing. Such a computer can be built from reversible conservative logic circuits, with the serendipitous advantage of dissipating far less heat than traditional Boolean AND/OR/NOT circuits. Unlike functional languages, which have one “state” for all times, Psi-Lisp has at all times one “state”, with unique predecessor and successor states.
Compiling into a reversible pseudocode can have benefits even when targeting a traditional computer. Certain optimizations, e.g., update-in-place, and compile-time garbage collection may be more easily performed, because the information may be elicited without the difficult and time-consuming iterative abstract interpretation required for most non-reversible models.
In a reversible machine, garbage collection for recycling storage can always be performed by a reversed (sub)computation. While this “collection is reversed mutation” insight does not reduce space requirements when used for the computation as a whole, it does save space when used to recycle at finer scales. This insight also provides an explanation for the fundamental importance of the push-down stack both for recognizing palindromes and for managing storage.
Reversible computers are related to Prolog, linear logic and chemical abstract machines.
Henry G. Baker. 1993. “‘Infant Mortality’ and Generational Garbage Collection”. ACM. SIGPLAN Notices 28, 4 (April 1993), pp. 55–57.
Generation-based garbage collection has been advocated by appealing to the intuitive but vague notion that “young objects are more likely to die than old objects”. The intuition is, that if a generation-based garbage collection scheme focuses its effort on scanning recently created objects, then its scanning efforts will pay off more in the form of more recovered garbage, than if it scanned older objects. In this note, we show a counterexample of a system in which “infant mortality” is as high as you please, but for which generational garbage collection is ineffective for improving the average mark/cons ratio. Other benefits, such as better locality and a smaller number of large delays, may still make generational garbage collection attractive for such a system, however.
Henry G. Baker. 1993. “Equal Rights for Functional Objects or, The More Things Change, The More They Are the Same”. ACM. OOPS Messenger 4, 4 (October 1993), pp. 2–27.
We argue that intensional object identity in object-oriented programming languages and databases is best defined operationally by side-effect semantics. A corollary is that “functional” objects have extensional semantics. This model of object identity, which is analogous to the normal forms of relational algebra, provides cleaner semantics for the value-transmission operations and built-in primitive equality predicate of a programming language, and eliminates the confusion surrounding “call-by-value” and “call-by-reference” as well as the confusion of multiple equality predicates.
Implementation issues are discussed, and this model is shown to have significant performance advantages in persistent, parallel, distributed and multilingual processing environments. This model also provides insight into the “type equivalence” problem of Algol-68, Pascal and Ada.
Henry G. Baker. 1994. “Minimizing Reference Count Updating with Deferred and Anchored Pointers for Functional Data Structures”. ACM. SIGPLAN Notices 29, 9 (September 1994), pp. 38–43.
“Reference counting” can be an attractive form of dynamic storage management. It recovers storage promptly and (with a garbage stack instead of a free list) it can be made “real-time” – i.e., all accesses can be performed in constant time. Its major drawbacks are its inability to reclaim cycles, its count storage, and its count update overhead. Update overhead is especially irritating for functional (read-only) data where updates may dirty pristine cache lines and pages.
We show how reference count updating can be largely eliminated for functional data structures by using the “linear style” of programming that is inspired by Girard’s linear logic, and by distinguishing normal pointers from “anchored pointers”, which indicate not only the object itself, but also the depth of the stack frame that anchors the object. An “anchor” for a pointer is essentially an enclosing data structure that is temporarily locked from being collected for the duration of the anchored pointer’s existence by a deferred reference count. An “anchored pointer” thus implies a reference count increment that has been deferred until it is either cancelled or performed.
Anchored pointers are generalizations of “borrowed” pointers and “phantom” pointers. Anchored pointers can provide a solution to the “derived pointer problem” in garbage collection.
Henry G. Baker. 1994. “Thermodynamics and Garbage Collection”. ACM. SIGPLAN Notices 29, 4 (April 1994), pp. 58–63.
We discuss the principles of statistical thermodynamics and their application to storage management problems. We point out problems which result from imprecise usage of the terms “information”, “state”, “reversible”, “conservative”, etc.
Henry G. Baker. 1995. “‘Use-Once’ Variables and Linear Objects – Storage Management, Reflection and Multi-Threading”. ACM. SIGPLAN Notices 30, 1 (January 1995), pp. 45–52.
Programming languages should have ‘use-once’ variables in addition to the usual ‘multiple-use’ variables. ‘Use-once’ variables are bound to linear (unshared, unaliased, or singly-referenced) objects. Linear objects are cheap to access and manage, because they require no synchronization or tracing garbage collection. Linear objects can elegantly and efficiently solve otherwise difficult problems of functional/mostly-functional systems – e.g., in-place updating and the efficient initialization of functional objects. Use-once variables are ideal for directly manipulating resources which are inherently linear such as freelists and ‘engine ticks’ in reflective languages.
A ‘use-once’ variable must be dynamically referenced exactly once within its scope. Unreferenced use-once variables must be explicitly killed, and multiply-referenced use-once variables must be explicitly copied; this duplication and deletion is subject to the constraint that some linear datatypes do not support duplication and deletion methods. Use-once variables are bound only to linear objects, which may reference other linear or non-linear objects. Non-linear objects can reference other non-linear objects, but can reference a linear object only in a way that ensures mutual exclusion.
Although implementations have long had implicit use-once variables and linear objects, most languages do not provide the programmer any help for their utilization. For example, use-once variables allow for the safe/controlled use of reified language implementation objects like single-use continuations.
Linear objects and use-once variables map elegantly into dataflow models of concurrent computation, and the graphical representations of dataflow models make an appealing visual linear programming language.
Henry G. Baker. 1995. Memory Management: International Workshop IWMM‘95. Springer-Verlag. ISBN 3-540-60368-9.
From the Preface
The International Workshop on Memory Management 1995 (IWMM‘95) is a continuation of the excellent series started by Yves Bekkers and Jacques Cohen with IWMM‘92. The present volume assembles the refereed and invited technical papers which were presented during this year’s workshop.
Nick Barnes, Richard Brooksby, David Jones, Gavin Matthews, Pekka P. Pirinen, Nick Dalton, P. Tucker Withington. 1997. “A Proposal for a Standard Memory Management Interface”. OOPSLA97 Workshop on Garbage Collection and Memory Management.
From the notes
There is no well-defined memory-management library API which would allow programmers to easily choose the best memory management implementation for their application.
Some languages allow replacement of their memory management functions, but usually only the program API is specified, hence replacement of the entire program interface is required.
Few languages support multiple memory management policies within a single program. Those that do use proprietary memory management policies.
We believe that the design of an abstract program API is a prerequisite to the design of a “server” API and eventually an API that would permit multiple cooperating memory “servers”. If the interface is simple yet powerful enough to encompass most memory management systems, it stands a good chance of being widely adopted.
David A. Barrett, Benjamin Zorn. 1993. “Using Lifetime Predictors to Improve Memory Allocation Performance”. ACM. SIGPLAN‘93 Conference on Programming Language Design and Implementation, pp. 187–196.
Dynamic storage allocation is used heavily in many application areas including interpreters, simulators, optimizers, and translators. We describe research that can improve all aspects of the performance of dynamic storage allocation by predicting the lifetimes of short-lived objects when they are allocated. Using five significant, allocation-intensive C programs, we show that a great fraction of all bytes allocated are short-lived (> 90% in all cases). Furthermore, we describe an algorithm for lifetime prediction that accurately predicts the lifetimes of 42–99% of all objects allocated. We describe and simulate a storage allocator that takes advantage of lifetime prediction of short-lived objects and show that it can significantly improve a program’s memory overhead and reference locality, and even, at times, improve CPU performance as well.
David A. Barrett, Benjamin Zorn. 1995. “Garbage Collection using a Dynamic Threatening Boundary”. ACM. SIGPLAN‘95 Conference on Programming Language Design and Implementation, pp. 301–314.
Generational techniques have been very successful in reducing the impact of garbage collection algorithms upon the performance of programs. However, it is impossible for designers of collection algorithms to anticipate the memory allocation behavior of all applications in advance. Existing generational collectors rely upon the applications programmer to tune the behavior of the collector to achieve maximum performance for each application. Unfortunately, because the many tuning parameters require detailed knowledge of both the collection algorithm and the program allocation behavior in order to be used effectively, such tuning is difficult and error prone. We propose a new garbage collection algorithm that uses just two easily understood tuning parameters that directly reflect the maximum memory and pause time constraints familiar to application programmers and users.
Like generational collectors, ours divides memory into two spaces, one for short-lived, and another for long-lived objects. Unlike previous work, our collector dynamically adjusts the boundary between these two spaces in order to directly meet the resource constraints specified by the user. We describe two methods for adjusting this boundary, compare them with several existing algorithms, and show how effectively ours meets the specified constraints. Our pause time collector saved memory by holding median pause times closer to the constraint than the other pause time constrained algorithm and, when not over-constrained, our memory constrained collector exhibited the lowest CPU overhead of the algorithms we measured yet was capable of maintaining a maximum memory constraint.
Joel F. Bartlett. 1988. “Compacting Garbage Collection with Ambiguous Roots”. Digital Equipment Corporation.
This paper introduces a copying garbage collection algorithm which is able to compact most of the accessible storage in the heap without having an explicitly defined set of pointers that contain all the roots of all accessible storage. Using “hints” found in the processor’s registers and stack, the algorithm is able to divide heap allocated objects into two groups: those that might be referenced by a pointer in the stack or registers, and those that are not. The objects which might be referenced are left in place, and the other objects are copied into a more compact representation.
A Lisp compiler and runtime system which uses such a collector need not have complete control of the processor in order to force a certain discipline on the stack and registers. A Scheme implementation has been done for the Digital WRL Titan processor which uses a garbage collector based on this “mostly copying” algorithm. Like other languages for the Titan, it uses the Mahler intermediate language as its target. This simplifies the compiler and allows it to take advantage of the significant machine dependent optimizations provided by Mahler. The common intermediate language also simplifies call-outs from Scheme programs to functions written in other languages and call-backs from functions in other languages.
Measurements of the Scheme implementation show that the algorithm is efficient, as little unneeded storage is retained and only a very small fraction of the heap is left in place.
Simple pointer manipulation protocols also mean that compiler support is not needed in order to correctly handle pointers. Thus it is reasonable to provide garbage collected storage in languages such as C. A collector written in C which uses this algorithm is included in the Appendix.
Joel F. Bartlett. 1989. “Mostly-Copying Garbage Collection Picks Up Generations and C++”. Digital Equipment Corporation.
The “mostly-copying” garbage collection algorithm provides a way to perform compacting garbage collection in spite of the presence of ambiguous pointers in the root set. As originally defined, each collection required almost all accessible objects to be moved. While adequate for many applications, programs that retained a large amount of storage spent a significant amount of time garbage collecting. To improve performance of these applications, a generational version of the algorithm has been designed. This note reports on this extension of the algorithm, and its application in collectors for Scheme and C++.
Yves Bekkers & Jacques Cohen. 1992. “Memory Management, International Workshop IWMM 92”. Springer-Verlag. LNCS Vol. 637, ISBN 3-540-55940-X.
Emery D. Berger, Robert D. Blumofe. 1999. “Hoard: A Fast, Scalable, and Memory-Efficient Allocator for Shared-Memory Multiprocessors”. University of Texas at Austin. UTCS TR99-22.
In this paper, we present Hoard, a memory allocator for shared-memory multiprocessors. We prove that its worst-case memory fragmentation is asymptotically equivalent to that of an optimal uniprocessor allocator. We present experiments that demonstrate its speed and scalability.
Emery D. Berger, Benjamin G. Zorn, Kathryn S. McKinley. 2001. “Composing high-performance memory allocators” ACM SIGPLAN Conference on Programming Language Design and Implementation 2001, pp. 114–124.
Current general-purpose memory allocators do not provide sufficient speed or flexibility for modern high-performance applications. Highly-tuned general purpose allocators have per-operation costs around one hundred cycles, while the cost of an operation in a custom memory allocator can be just a handful of cycles. To achieve high performance, programmers often write custom memory allocators from scratch – a difficult and error-prone process.
In this paper, we present a flexible and efficient infrastructure for building memory allocators that is based on C++ templates and inheritance. This novel approach allows programmers to build custom and general-purpose allocators as “heap layers” that can be composed without incurring any additional runtime overhead or additional programming cost. We show that this infrastructure simplifies allocator construction and results in allocators that either match or improve the performance of heavily-tuned allocators written in C, including the Kingsley allocator and the GNU obstack library. We further show this infrastructure can be used to rapidly build a general-purpose allocator that has performance comparable to the Lea allocator, one of the best uniprocessor allocators available. We thus demonstrate a clean, easy-to-use allocator interface that seamlessly combines the power and efficiency of any number of general and custom allocators within a single application.
Hans-J. Boehm, Mark Weiser. 1988. “Garbage collection in an uncooperative environment”. Software – Practice and Experience. 18(9):807–820.
We describe a technique for storage allocation and garbage collection in the absence of significant co-operation from the code using the allocator. This limits garbage collection overhead to the time actually required for garbage collection. In particular, application programs that rarely or never make use of the collector no longer encounter a substantial performance penalty. This approach greatly simplifies the implementation of languages supporting garbage collection. It further allows conventional compilers to be used with a garbage collector, either as the primary means of storage reclamation, or as a debugging tool.
Hans-J. Boehm, Alan J. Demers, Scott Shenker. 1991. “Mostly Parallel Garbage Collection”. Xerox PARC. ACM PLDI 91, SIGPLAN Notices 26, 6 (June 1991), pp. 157–164.
We present a method for adapting garbage collectors designed to run sequentially with the client, so that they may run concurrently with it. We rely on virtual memory hardware to provide information about pages that have been updated or “dirtied” during a given period of time. This method has been used to construct a mostly parallel trace-and-sweep collector that exhibits very short pause times. Performance measurements are given.
Hans-J. Boehm, David Chase. 1992. “A Proposal for Garbage-Collector-Safe C Compilation”. Journal of C Language Translation. vol. 4, 2 (December 1992), pp. 126–141.
Conservative garbage collectors are commonly used in combination with conventional C programs. Empirically, this usually works well. However, there are no guarantees that this is safe in the presence of “improved” compiler optimization. We propose that C compilers provide a facility to suppress optimizations that are unsafe in the presence of conservative garbage collection. Such a facility can be added to an existing compiler at very minimal cost, provided the additional analysis is done in a machine-independent source-to-source prepass. Such a prepass may also check the source code for garbage-collector-safety.
Hans-J. Boehm. 1993. “Space Efficient Conservative Garbage Collection”. ACM, SIGPLAN. Proceedings of the ACM SIGPLAN ‘91 Conference on Programming Language Design and Implementation, SIGPLAN Notices 28, 6, pp 197–206.
We call a garbage collector conservative if it has only partial information about the location of pointers, and is thus forced to treat arbitrary bit patterns as though they might be pointers, in at least some cases. We show that some very inexpensive, but previously unused techniques can have dramatic impact on the effectiveness of conservative garbage collectors in reclaiming memory. Our most significant observation is that static data that appears to point to the heap should not result in misidentified reference to the heap. The garbage collector has enough information to allocate around such references. We also observe that programming style has a significantly impact on the amount of spuriously retained storage, typically even if the collector is not terribly conservative. Some fairly common C and C++ programming styles significantly decrease the effectiveness of any garbage collector. These observations suffice to explain some of the different assessments of conservative collection that have appeared in the literature.
Hans-J. Boehm. 2000. “Reducing Garbage Collector Cache Misses”. ACM. ISMM‘00 pp. 59–64.
Cache misses are currently a major factor in the cost of garbage collection, and we expect them to dominate in the future. Traditional garbage collection algorithms exhibit relatively litle temporal locality; each live object in the heap is likely to be touched exactly once during each garbage collection. We measure two techniques for dealing with this issue: prefetch-on-grey, and lazy sweeping. The first of these is new in this context. Lazy sweeping has been in common use for a decade. It was introduced as a mechanism for reducing paging and pause times; we argue that it is also crucial for eliminating cache misses during the sweep phase.
Our measurements are obtained in the context of a non-moving garbage collector. Fully copying garbage collection inherently requires more traffic through the cache, and thus probably also stands to benefit substantially from something like the prefetch-on-grey technique. Generational garbage collection may reduce the benefit of these techniques for some applications, but experiments with a non-moving generational collector suggest that they remain quite useful.
Hans-J. Boehm. 2001. “Bounding Space Usage of Conservative Garbage Collectors http://www.hpl.hp.com/techreports/2001/HPL-2001-251.html”. HP Labs technical report HPL-2001-251.
Conservative garbage collectors can automatically reclaim unused memory in the absence of precise pointer location information. If a location can possibly contain a pointer, it is treated by the collector as though it contained a pointer. Although it is commonly assumed that this can lead to unbounded space use due to misidentified pointers, such extreme space use is rarely observed in practice, and then generally only if the number of misidentified pointers is itself unbounded. We show that if the program manipulates only data structures satisfying a simple GC-robustness criterion, then a bounded number of misidentified pointers can result at most in increasing space usage by a constant factor. We argue that nearly all common data structures are already GC- robust, and it is typically easy to identify and replace those that are not. Thus it becomes feasible to prove space bounds on programs collected by mildly conservative garbage collectors, such as the one in Barabash et al. (2001). The worst-case space overhead introduced by such mild conservatism is comparable to the worst-case fragmentation overhead for inherent in any non-moving storage allocator. The same GC-robustness criterion also ensures the absence of temporary space leaks of the kind discussed in Rojemo (1995) for generational garbage collectors.
Hans-J. Boehm. 2002. “Destructors, Finalizers, and Synchronization”. HP Labs technical report HPL-2002-335.
We compare two different facilities for running cleanup actions for objects that are about to reach the end of their life. Destructors, such as we find in C++, are invoked synchronously when an object goes out of scope. They make it easier to implement cleanup actions for objects of well-known lifetime, especially in the presence of exceptions. Languages like Java, Modula-3, and C# provide a different kind of “finalization” facility: Cleanup methods may be run when the garbage collector discovers a heap object to be otherwise inaccessible. Unlike C++ destructors, such methods run in a separate thread at some much less well-defined time. We argue that these are fundamentally different, and potentially complementary, language facilities. We also try to resolve some common misunderstandings about finalization in the process. In particular: 1. The asynchronous nature of finalizers is not just an accident of implementation or a shortcoming of tracing collectors; it is necessary for correctness of client code, fundamentally affects how finalizers must be written, and how finalization facilities should be presented to the user. 2. An object may legitimately be finalized while one of its methods are still running. This should and can be addressed by the language specification and client code.
Robert S. Boyer and J. Strother Moore. 1977. “A Fast String Searching Algorithm”. Communications of the ACM 20(10):762–772.
An algorithm is presented that searches for the location, “i,” of the first occurrence of a character string, “pat,” in another string, “string.” During the search operation, the characters of pat are matched starting with the last character of pat. The information gained by starting the match at the end of the pattern often allows the algorithm to proceed in large jumps through the text being searched. Thus the algorithm has the unusual property that, in most cases, not all of the first i characters of string are inspected. The number of characters actually inspected (on the average) decreases as a function of the length of pat. For a random English pattern of length 5, the algorithm will typically inspect i/4 characters of string before finding a match at i. Furthermore, the algorithm has been implemented so that (on the average) fewer than i + patlen machine instructions are executed. These conclusions are supported with empirical evidence and a theoretical analysis of the average behavior of the algorithm. The worst case behavior of the algorithm is linear in i + patlen, assuming the availability of array space for tables linear in patlen plus the size of the alphabet.
P. Branquart, J. Lewi. 1972. “A scheme of storage allocation and garbage collection for ALGOL 68”. Elsevier/North-Holland. ALGOL 68 Implementation – Proceedings of the IFIP Working Conference on ALGOL 68 Implementation, July 1970.
Richard Brooksby. 2002. “The Memory Pool System: Thirty person-years of memory management development goes Open Source”. ISMM‘02.
The Memory Pool System (MPS) is a very general, adaptable, flexible, reliable, and efficient memory management system. It permits the flexible combination of memory management techniques, supporting manual and automatic memory management, in-line allocation, finalization, weakness, and multiple simultaneous co-operating incremental generational garbage collections. It also includes a library of memory pool classes implementing specialized memory management policies.
Between 1994 and 2001, Harlequin (now part of Global Graphics) invested about thirty person-years of effort developing the MPS. The system contained many innovative techniques and abstractions which were kept secret. In 1997 Richard Brooksby, the manager and chief architect of the project, and Nicholas Barnes, a senior developer, left Harlequin to form their own consultancy company, Ravenbrook, and in 2001, Ravenbrook acquired the MPS technology from Global Graphics. We are happy to announce that we are publishing the source code and documentation under an open source licence. This paper gives an overview of the system.
International Standard ISO/IEC 9899:1990. “Programming languages — C”.
International Standard ISO/IEC 9899:1999. “Programming languages — C”.
Brad Calder, Dirk Grunwald, Benjamin Zorn. 1994. “Quantifying Behavioral Differences Between C and C++ Programs”. Journal of Programming Languages. 2(4):313–351.
Improving the performance of C programs has been a topic of great interest for many years. Both hardware technology and compiler optimization research has been applied in an effort to make C programs execute faster. In many application domains, the C++ language is replacing C as the programming language of choice. In this paper, we measure the empirical behavior of a group of significant C and C++ programs and attempt to identify and quantify behavioral differences between them. Our goal is to determine whether optimization technology that has been successful for C programs will also be successful in C++ programs. We furthermore identify behavioral characteristics of C++ programs that suggest optimizations that should be applied in those programs. Our results show that C++ programs exhibit behavior that is significantly different than C programs. These results should be of interest to compiler writers and architecture designers who are designing systems to execute object-oriented programs.
Dante J. Cannarozzi, Michael P. Plezbert, Ron K. Cytron. 2000. “Contaminated garbage collection”. ACM. Proceedings of the ACM SIGPLAN ‘00 conference on on Programming language design and implementation, pp. 264–273.
We describe a new method for determining when an object can be garbage collected. The method does not require marking live objects. Instead, each object X is dynamically associated with a stack frame M, such that X is collectable when M pops. Because X could have been dead earlier, our method is conservative. Our results demonstrate that the methos nonetheless idenitifies a large percentage of collectable objects. The method has been implemented in Sun’s Java™ Virtual Machine interpreter, and results are presented based on this implementation.
Patrick J. Caudill, Allen Wirfs-Brock. 1986. “A Third-Generation Smalltalk-80 Implementation”. ACM. SIGPLAN Notices. 21(11), OOPSLA‘86 ACM Conference on Object-Oriented Systems, Languages and Applications.
A new, high performance Smalltalk-80™ implementation is described which builds directly upon two previous implementation efforts. This implementation supports a large object space while retaining compatibility with previous Smalltalk-80™ images. The implementation utilizes a interpreter which incorporates a generation based garbage collector and which does not have an object table. This paper describes the design decisions which lead to this implementation and reports preliminary performance results.
C. J. Cheney. 1970. “A non-recursive list compacting algorithm”. CACM. 13-11 pp. 677–678.
A simple nonrecursive list structure compacting scheme or garbage collector suitable for both compact and LISP-like list structures is presented. The algorithm avoids the need for recursion by using the partial structure as it is built up to keep track of those lists that have been copied.
Perry Cheng, Robert Harper, Peter Lee. 1998. “Generational stack collection and profile-driven pretenuring”. ACM. Proceedings of SIGPLAN‘98 Conference on Programming Language Design and Implementation, pp. 162–173.
This paper presents two techniques for improving garbage collection performance: generational stack collection and profile-driven pretenuring. The first is applicable to stack-based implementations of functional languages while the second is useful for any generational collector. We have implemented both techniques in a generational collector used by the TIL compiler, and have observed decreases in garbage collection times of as much as 70% and 30%, respectively.
Functional languages encourage the use of recursion which can lead to a long chain of activation records. When a collection occurs, these activation records must be scanned for roots. We show that scanning many activation records can take so long as to become the dominant cost of garbage collection. However, most deep stacks unwind very infrequently, so most of the root information obtained from the stack remains unchanged across successive garbage collections. Generational stack collection greatly reduces the stack scan cost by reusing information from previous scans.
Generational techniques have been successful in reducing the cost of garbage collection. Various complex heap arrangements and tenuring policies have been proposed to increase the effectiveness of generational techniques by reducing the cost and frequency of scanning and copying. In contrast, we show that by using profile information to make lifetime predictions, pretenuring can avoid copying data altogether. In essence, this technique uses a refinement of the generational hypothesis (most data die young) with a locality principle concerning the age of data: most allocations sites produce data that immediately dies, while a few allocation sites consistently produce data that survives many collections.
Trishul M. Chilimbi, James R. Larus. 1998. “Using Generational Garbage Collection To Implement Cache-Conscious Data Placement”. ACM. ISMM‘98 pp. 37–48.
Processor and memory technology trends show a continual increase in the cost of accessing main memory. Machine designers have tried to mitigate the effect of this trend through a variety of techniques that attempt to reduce or tolerate memory latency. These techniques, unfortunately, have only been partially successful for pointer-manipulating programs. Recent research has demonstrated that these programs can benefit greatly from the complementary approach of reorganizing pointer data structures to improve cache locality. This paper describes how a generational garbage collector can be used to achieve a cache-conscious data layout, in which objects with high temporal affinity are placed next to each other, so they are likely to reside in the same cache block. The paper demonstrates the feasibility of collecting low overhead, real-time profiling information about data access patterns for object-oriented languages, and describes a new copying algorithm that utilizes this information to produce a cache-conscious object layout. Preliminary results indicate that this technique reduces cache miss rates by 21-42%, and improves program performance by 14-37%.
William D Clinger & Lars T Hansen. 1997. “Generational Garbage Collection and the Radioactive Decay Model”. ACM. Proceedings of PLDI 1997.
If a fixed exponentially decreasing probability distribution function is used to model every object’s lifetime, then the age of an object gives no information about its future life expectancy. This radioactive decay model implies that there can be no rational basis for deciding which live objects should be promoted to another generation. Yet there remains a rational basis for deciding how many objects to promote, when to collect garbage, and which generations to collect.
Analysis of the model leads to a new kind of generational garbage collector whose effectiveness does not depend upon heuristics that predict which objects will live longer than others.
This result provides insight into the computational advantages of generational garbage collection, with implications for the management of objects whose life expectancies are difficult to predict.
Jacques Cohen. 1981. “Garbage collection of linked data structures”. Computing Surveys. Vol. 13, no. 3.
A concise and unified view of the numerous existing algorithms for performing garbage collection of linked data structures is presented. The emphasis is on garbage collection proper, rather than on storage allocation.
First, the classical garbage collection algorithms and their marking and collecting phases, with and without compacting, are discussed.
Algorithms describing these phases are classified according to the type of cells to be collected: those for collecting single-sized cells are simpler than those for varisized cells. Recently proposed algorithms are presented and compared with the classical ones. Special topics in garbage collection are also covered. A bibliography with topical annotations is included.
Dominique Colnet, Philippe Coucaud, Olivier Zendra. 1998. “Compiler Support to Customize the Mark and Sweep Algorithm”. ACM. ISMM‘98 pp. 154–165.
Mark and sweep garbage collectors (GC) are classical but still very efficient automatic memory management systems. Although challenged by other kinds of systems, such as copying collectors, mark and sweep collectors remain among the best in terms of performance.
This paper describes our implementation of an efficient mark and sweep garbage collector tailored to each program. Compiler support provides the type information required to statically and automatically generate this customized garbage collector. The segregation of object by type allows the production of a more efficient GC code. This technique, implemented in SmallEiffel, our compiler for the object-oriented language Eiffel, is applicable to other languages and other garbage collection algorithms, be they distributed or not.
We present the results obtained on programs featuring a variety of programming styles and compare our results to a well-known and high-quality garbage collector.
Jonathan E. Cook, Alexander L. Wolf, Benjamin Zorn. 1994. “Partition Selection Policies in Object Database Garbage Collection”. ACM. SIGMOD. International Conference on the Management of Data (SIGMOD‘94), pp. 371–382.
The automatic reclamation of storage for unreferenced objects is very important in object databases. Existing language system algorithms for automatic storage reclamation have been shown to be inappropriate. In this paper, we investigate methods to improve the performance of algorithms for automatic storage reclamation of object databases. These algorithms are based on a technique called partitioned garbage collection, in which a subset of the entire database is collected independently of the rest. Specifically, we investigate the policy that is used to select what partition in the database should be collected. The new partition selection policies that we propose and investigate are based on the intuition that the values of overwritten pointers provide good hints about where to find garbage. Using trace-driven simulation, we show that one of our policies requires less I/O to collect more garbage than any existing implementable policy and performs close to an impractical-to-implement but near-optimal policy over a wide range of database sizes and connectivities.
Jonathan E. Cook, Artur Klauser, Alexander L. Wolf, Benjamin Zorn. 1996. “Semi-automatic, Self-adaptive Control of Garbage Collection Rates in Object Databases”. ACM, SIGMOD. International Conference on the Management of Data (SIGMOD‘96), pp. 377–388.
A fundamental problem in automating object database storage reclamation is determining how often to perform garbage collection. We show that the choice of collection rate can have a significant impact on application performance and that the “best” rate depends on the dynamic behavior of the application, tempered by the particular performance goals of the user. We describe two semi-automatic, self-adaptive policies for controlling collection rate that we have developed to address the problem. Using trace-driven simulations, we evaluate the performance of the policies on a test database application that demonstrates two distinct reclustering behaviors. Our results show that the policies are effective at achieving user-specified levels of I/O operations and database garbage percentage. We also investigate the sensitivity of the policies over a range of object connectivities. The evaluation demonstrates that semi-automatic, self-adaptive policies are a practical means for flexibly controlling garbage collection rate.
Eric Cooper, Scott Nettles, Indira Subramanian. 1992. “Improving the Performance of SML Garbage Collection using Application-Specific Virtual Memory Management”. ACM Conference on LISP and Functional Programming, pp. 43–52.
We improved the performance of garbage collection in the Standard ML of New Jersey system by using the virtual memory facilities provided by the Mach kernel. We took advantage of Mach’s support for large sparse address spaces and user-defined paging servers. We decreased the elapsed time for realistic applications by as much as a factor of 4.
Michael C. Daconta. 1993. C Pointers and Dynamic Memory Management. Wiley. ISBN 0-471-56152-5.
Michael C. Daconta. 1995. C++ Pointers and Dynamic Memory Management. Wiley. ISBN 0-471-04998-0.
From the back cover
Using techniques developed in the classroom at America Online’s Programmer’s University, Michael Daconta deftly pilots programmers through the intricacies of the two most difficult aspects of C++ programming: pointers and dynamic memory management. Written by a programmer for programmers, this no-nonsense, nuts-and-bolts guide shows you how to fully exploit advanced C++ programming features, such as creating class-specific allocators, understanding references versus pointers, manipulating multidimensional arrays with pointers, and how pointers and dynamic memory are the core of object-oriented constructs like inheritance, name-mangling, and virtual functions.
O.-J. Dahl. 1963. “The SIMULA Storage Allocation Scheme”. Norsk Regnesentral. NCC Document no. 162.
P. J. Denning. 1968. “Thrashing: Its Causes and Prevention”. Proceedings AFIPS,1968 Fall Joint Computer Conference, vol. 33, pp. 915–922.
From the introduction
A particularly troublesome phenomenon, thrashing, may seriously interfere with the performance of paged memory systems, reducing computing giants (Multics, IBM System 360, and others not necessarily excepted) to computing dwarfs. The term thrashing denotes excessive overhead and severe performance degradation or collapse caused by too much paging. Thrashing inevitably turns a shortage of memory space into a surplus of processor time.
P. J. Denning. 1970. “Virtual Memory”. ACM. ACM Computing Surveys, vol. 2, no. 3, pp. 153–190, Sept. 1970.
The need for automatic storage allocation arises from desires for program modularity, machine independence, and resource sharing. Virtual memory is an elegant way of achieving these objectives. In a virtual memory, the addresses a program may use to identify information are distinguished from the addresses the memory system uses to identify physical storage sites, and program-generated addresses are translated automatically to the corresponding machine addresses. Two principal methods for implementing virtual memory, segmentation and paging, are compared and contrasted. Many contemporary implementations have experienced one or more of these problems: poor utilization of storage, thrashing, and high costs associated with loading information into memory. These and subsidiary problems are studied from a theoretic view, and are shown to be controllable by a proper combination of hardware and memory management policies.
P. J. Denning, S. C. Schwartz. 1972. “Properties of the Working-set Model”. CACM. vol. 15, no. 3, pp. 191–198.
A program’s working set W(t, T) at time t is the set of distinct pages among the T most recently referenced pages. Relations between the average working-set size, the missing-page rate, and the interreference-interval distribution may be derived both from time-average definitions and from ensemble-average (statistical) definitions. An efficient algorithm for estimating these quantities is given. The relation to LRU (least recently used) paging is characterized. The independent-reference model, in which page references are statistically independent, is used to assess the effects of interpage dependencies on working-set size observations. Under general assumptions, working-set size is shown to be normally distributed.
David L. Detlefs. 1992. “Garbage collection and runtime typing as a C++ library”. USENIX C++ Conference.
From the introduction
Automatic storage management, or garbage collection, is a feature that can ease program development and enhance program reliability. Many high-level languages other than C++ provide garbage collection. This paper proposes the use of “smart pointer” template classes as an interface for the use of garbage collection in C++. Template classes and operator overloading are techniques allowing language extension at the level of user code; I claim that using these techniques to create smart pointer classes provdes a syntax for manipulating garbage-collected storage safely and conveniently. Further, the use of a smart-pointer template class offers the possibility of implementing the collector at the user-level, without requiring support from the compiler. If such a compiler-independent implementation is possible with adequate performance, then programmers can start to write code using garbage collection without waiting for language and compiler modifications. If the use of such a garbage collection interface becomes widespread, then C++ compilation systems can be built to specially support tht garbage collection interface, thereby allowing the use of collection algorithms with enhanced performance.
David L. Detlefs, Al Dosser, Benjamin Zorn. 1994. “Memory Allocation Costs in Large C and C++ Programs”. Software – Practice and Experience. 24(6):527–542.
Dynamic storage allocation is an important part of a large class of computer programs written in C and C++. High-performance algorithms for dynamic storage allocation have been, and will continue to be, of considerable interest. This paper presents detailed measurements of the cost of dynamic storage allocation in 11 diverse C and C++ programs using five very different dynamic storage allocation implementations, including a conservative garbage collection algorithm. Four of the allocator implementations measured are publicly-available on the Internet. A number of the programs used in these measurements are also available on the Internet to facilitate further research in dynamic storage allocation. Finally, the data presented in this paper is an abbreviated version of more extensive statistics that are also publicly-available on the Internet.
L. Peter Deutsch, Daniel G. Bobrow. 1976. “An Efficient, Incremental, Automatic Garbage Collector”. CACM. vol. 19, no. 9, pp. 522–526.
This paper describes a new way of solving the storage reclamation problem for a system such as Lisp that allocates storage automatically from a heap, and does not require the programmer to give any indication that particular items are no longer useful or accessible. A reference count scheme for reclaiming non-self-referential structures, and a linearizing, compacting, copying scheme to reorganize all storage at the users discretion are proposed. The algorithms are designed to work well in systems which use multiple levels of storage, and large virtual address space. They depend on the fact that most cells are referenced exactly once, and that reference counts need only be accurate when storage is about to be reclaimed. A transaction file stores changes to reference counts, and a multiple reference table stores the count for items which are referenced more than once.
E. W. Dijkstra, Leslie Lamport, A. J. Martin, C. S. Scholten, E. F. M. Steffens. 1976. “On-the-fly Garbage Collection: An Exercise in Cooperation”. Springer-Verlag. Lecture Notes in Computer Science, Vol. 46.
As an example of cooperation between sequential processes with very little mutual interference despite frequent manipulations of a large shared data space, a technique is developed which allows nearly all of the activity needed for garbage detection and collection to be performed by an additional processor operating con- currently with the processor devoted to the computation proper. Exclusion and synchronization constraints have been kept as weak as could be achieved; the severe complexities engendered by doing so are illustrated.
Amer Diwan, Richard L. Hudson, J. Eliot B. Moss. 1992. “Compiler Support for Garbage Collection in a Statically Typed Language”. ACM. Proceedings of the 5th ACM SIGPLAN conference on Programming language design and implementation, pp. 273–282.
We consider the problem of supporting compacting garbage collection in the presence of modern compiler optimizations. Since our collector may move any heap object, it must accurately locate, follow, and update all pointers and values derived from pointers. To assist the collector, we extend the compiler to emit tables describing live pointers, and values derived from pointers, at each program location where collection may occur. Significant results include identification of a number of problems posed by optimizations, solutions to those problems, a working compiler, and experimental data concerning table sizes, table compression, and time overhead of decoding tables during collection. While gc support can affect the code produced, our sample programs show no significant changes, the table sizes are a modest fraction of the size of the optimized code, and stack tracing is a small fraction of total gc time. Since the compiler enhancements are also modest, we conclude that the approach is practical.
Amer Diwan, David Tarditi, J. Eliot B. Moss. 1993. “Memory Subsystem Performance of Programs with Intensive Heap Allocation”. Carnegie Mellon University. CMU-CS-93-227.
Heap allocation with copying garbage collection is a general storage management technique for modern programming languages. It is believed to have poor memory subsystem performance. To investigate this, we conducted an in-depth study of the memory subsystem performance of heap allocation for memory subsystems found on many machines. We studied the performance of mostly-functional Standard ML programs which made heavy use of heap allocation. We found that most machines support heap allocation poorly. However, with the appropriate memory subsystem organization, heap allocation can have good performance. The memory subsystem property crucial for achieving good performance was the ability to allocate and initialize a new object into the cache without a penalty. This can be achieved by having subblock placement with a subblock size of one word with a write allocate policy, along with fast page-mode writes or a write buffer. For caches with subblock placement, the data cache overhead was under 9% for a 64k or larger data cache; without subblock placement the overhead was often higher than 50%.
Amer Diwan, David Tarditi, J. Eliot B. Moss. 1994. “Memory Subsystem Performance of Programs Using Copying Garbage Collection”. ACM. CMU-CS-93-210, also in POPL ‘94.
Heap allocation with copying garbage collection is believed to have poor memory subsystem performance. We conducted a study of the memory subsystem performance of heap allocation for memory subsystems found on many machines. We found that many machines support heap allocation poorly. However, with the appropriate memory subsystem organization, heap allocation can have good memory subsystem performance.
Damien Doligez & Xavier Leroy. 1993. “A concurrent, generational garbage collector for a multithreaded implementation of ML”. ACM. POPL ‘93, 113–123.
This paper presents the design and implementation of a “quasi real-time” garbage collector for Concurrent Caml Light, an implementation of ML with threads. This two-generation system combines a fast, asynchronous copying collector on the young generation with a non-disruptive concurrent marking collector on the old generation. This design crucially relies on the ML compile-time distinction between mutable and immutable objects.
Damien Doligez & Georges Gonthier. 1994. “Portable, unobtrusive garbage collection for multiprocessor systems”. ACM. POPL ‘94, 70–83.
We describe and prove the correctness of a new concurrent mark-and-sweep garbage collection algorithm. This algorithm derives from the classical on-the-fly algorithm from Dijkstra et al. A distinguishing feature of our algorithm is that it supports multiprocessor environments where the registers of running processes are not readily accessible, without imposing any overhead on the elementary operations of loading a register or reading or initializing a field. Furthermore our collector never blocks running mutator processes except possibly on requests for free memory; in particular, updating a field or creating or marking or sweeping a heap object does not involve system-dependent synchronization primitives such as locks. We also provide support for process creation and deletion, and for managing an extensible heap of variable-sized objects.
R. Kent Dybvig, Carl Bruggeman, David Eby. 1993. “Guardians in a Generation-Based Garbage Collector”. SIGPLAN. Proceedings of the ACM SIGPLAN ‘93 Conference on Programming Language Design and Implementation, June 1993.
This paper describes a new language feature that allows dynamically allocated objects to be saved from deallocation by an automatic storage management system so that clean-up or other actions can be performed using the data stored within the objects. The program has full control over the timing of clean-up actions, which eliminates several potential problems and often eliminates the need for critical sections in code that interacts with clean-up actions. Our implementation is “generation-friendly” in the sense that the additional overhead within the mutator is proportional to the number of clean-up actions actually performed.
Daniel R. Edelson. 1992. “Smart pointers: They’re smart, but they’re not pointers”. USENIX C++ Conference.
From the introduction
This paper shows hhow the behaviour of smart pointers diverges from that of pointers in certain common C++ constructs. Given this, we conclude that the C++ programming language does not support seamless smart pointers: smart pointers cannot transparently replace raw pointers in all ways except declaration syntax. We show that this conclusion also applies to accessors.
Daniel R. Edelson. 1992. “Comparing Two Garbage Collectors for C++”. University of California at Santa Cruz. Technical Report UCSC-CRL-93-20.
Our research is concerned with compiler- independent, tag-free garbage collection for the C++ programming language. This paper presents a mark-and-sweep collector, and explains how it ameliorates shortcomings of a previous copy collector. The new collector, like the old, uses C++’s facilities for creating abstract data types to define a tracked reference type, called roots, at the level of the application program. A programmer wishing to utilize the garbage collection service uses these roots in place of normal, raw pointers. We present a detailed study of the cost of using roots, as compared to both normal pointers and reference counted pointers, in terms of instruction counts. We examine the efficiency of a small C++ application using roots, reference counting, manual reclamation, and conservative collection. Coding the application to use garbage collection, and analyzing the resulting efficiency, helped us identify a number of memory leaks and inefficiencies in the original, manually reclaimed version. We find that for this program, garbage collection using roots is much more efficient than reference counting, though less efficient than manual reclamation. It is hard to directly compare our collector to the conservative collector because of the differing efficiencies of their respective memory allocators.
Daniel J. Edwards. n.d. “Lisp II Garbage Collector”. MIT. AI Memo 19 (AIM-19).
(This short memo doesn’t have an abstract. Basically, it describes the plan for the LISP II Relocating Garbage Collector. It has four phases: marking, collection, relocation and moving. Marking is by recursive descent using a bit table. The remaining phases are linear sweeps through the bit table. The collection phase calculates how much everything needs to move, storing this information in the free blocks. The relocation phase updates all relocatable addresses. The moving phase moves the surviving objects into one contiguous block.)
John R. Ellis, David L. Detlefs. 1993. “Safe, Efficient Garbage Collection for C++”. Xerox PARC.
We propose adding safe, efficient garbage collection to C++, eliminating the possibility of storage-management bugs and making the design of complex, object-oriented systems much easier. This can be accomplished with almost no change to the language itself and only small changes to existing implementations, while retaining compatibility with existing class libraries.
Paulo Ferreira. 1996. “Larchant: garbage collection in a cached distributed shared store with persistence by reachability”. Université Paris VI. Thése de doctorat.
The model of Larchant is that of a Shared Address Space (spanning every site in a network including secondary storage) with Persistence By Reachability. To provide the illusion of a shared address space across the network, despite the fact that site memories are disjoint, Larchant implements a distributed shared memory mechanism. Reachability is accessed by tracing the pointer graph, starting from the persistent root, and reclaiming unreachable objects. This is the task of Garbage Collection (GC).
GC was until recently thought to be intractable in a large-scale system, due to problems of scale, incoherence, asynchrony, and performance. This thesis presents the solutions that Larchant proposes to these problems.
The GC algorithm in Larchant combines tracing and reference-listing. It traces whenever economically feasible, i.e., as long as the memory subset being collected remains local to a site, and counts references that would cost I/O traffic to trace. GC is orthogonal to coherence, i.e., makes progress even if only incoherent replicas are locally available. The garbage collector runs concurrently and asynchronously to applications. The reference-listing boundary changes dynamically and seamlessly, and independently at each site, in order to collect cycles of unreachable objects.
We prove formally that our GC algorithm is correct, i.e., it is safe and live. The performance results from our Larchant prototype show that our design goals (scalability, coherence orthogonality, and good performance) are fulfilled.
Paulo Ferreira & Marc Shapiro. 1998. “Modelling a Distributed Cached Store for Garbage Collection”. Springer-Verlag. Proceedings of 12th European Conference on Object-Oriented Programming, ECOOP98, LNCS 1445.
Caching and persistence support efficient, convenient and transparent distributed data sharing. The most natural model of persistence is persistence by reachability, managed automatically by a garbage collector (GC). We propose a very general model of such a system (based on distributed shared memory) and a scalable, asynchronous distributed GC algorithm. Within this model, we show sufficient and widely applicable correctness conditions for the interactions between applications, store, memory, coherence, and GC.
The GC runs as a set of processes (local to each participating machine) communicating by asynchronous messages. Collection does not interfere with applications by setting locks, polluting caches, or causing I/O; this requirement raised some novel and interesting challenges which we address in this article. The algorithm is safe and live; it is not complete, i.e. it collects some distributed cycles of garbage but not necessarily all.
Daniel P Friedman, David S. Wise. 1976. “Garbage collecting a heap which includes a scatter table”. Information Processing Letters. 5, 6 (December 1976): 161–164.
A new algorithm is introduced for garbage collecting a heap which contains shared data structures accessed from a scatter table. The scheme provides for the purging of useless entries from the scatter table with no traverslas beyond the two required by classic collection schemes. For languages which use scatter tables to sustain unique existence of complex structures, like natural variables of SNOBOL, it indirectly allows liberal use of a single scatter table by ensuring efficient deletion of useless entries. Since the scatter table is completely restructured during the course of execution, the hashing scheme itself is easily altered during garbage collection whenever skewed loading of the scatter table warrants abandonment of the old hashing. This procedure is applicable to the maintenance of dynamic structures such as those in information retrieval schemes or in languages like LISP and SNOBOL.
Daniel P Friedman, David S. Wise. 1977. “The One Bit Reference Count”. BIT. (17)3: 351–359.
Deutsch and Bobrow propose a storage reclamation scheme for a heap which is a hybrid of garbage collection and reference counting. The point of the hybrid scheme is to keep track of very low reference counts between necessary invocation of garbage collection so that nodes which are allocated and rather quickly abandoned can be returned to available space, delaying necessity for garbage collection. We show how such a scheme may be implemented using the mark bit already required in every node by the garbage collector. Between garbage collections that bit is used to distinguish nodes with a reference count known to be one. A significant feature of our scheme is a small cache of references to nodes whose implemented counts “ought to be higher” which prevents the loss of logical count information in simple manipulations of uniquely referenced structures.
Daniel P Friedman, David S. Wise. 1979. “Reference counting can manage the circular environments of mutual recursion”. Information Processing Letters. 8, 1 (January 1979): 41–45.
From the introduction
In this note we advance reference counting as a storage management technique viable for implementing recursive languages like ISWIM or pure LISP with the labels construct for implementing mutual recursion from SCHEME. Labels is derived from letrec and displaces the label operator, a version of the paradoxical Y-combinator. The key observation is that the requisite circular structure (which ordinarily cripples reference counts) occurs only within the language–rather than the user–structure, and that the references into this structure are well-controlled.
Dirk Grunwald, Benjamin Zorn, R. Henderson. 1993. “Improving the Cache Locality of Memory Allocation”. SIGPLAN. SIGPLAN ‘93, Conference on PLDI, June 1993, Albuquerque, New Mexico.
The allocation and disposal of memory is a ubiquitous operation in most programs. Rarely do programmers concern themselves with details of memory allocators; most assume that memory allocators provided by the system perform well. This paper presents a performance evaluation of the reference locality of dynamic storage allocation algorithms based on trace-driven simulation of five large allocation-intensive C programs. In this paper, we show how the design of a memory allocator can significantly affect the reference locality for various applications. Our measurements show that poor locality in sequential-fit algorithms reduces program performance, both by increasing paging and cache miss rates. While increased paging can be debilitating on any architecture, cache misses rates are also important for modern computer architectures. We show that algorithms attempting to be space-efficient, by coalescing adjacent free objects show poor reference locality, possibly negating the benefits of space efficiency. At the other extreme, algorithms can expend considerable effort to increase reference locality yet gain little in total execution performance. Our measurements suggest an allocator design that is both very fast and has good locality of reference.
Dirk Grunwald & Benjamin Zorn. 1993. “CustoMalloc: Efficient Synthesized Memory Allocators”. Software – Practice and Experience. 23(8):851–869.
The allocation and disposal of memory is a ubiquitous operation in most programs. Rarely do programmers concern themselves with details of memory allocators; most assume that memory allocators provided by the system perform well. Yet, in some applications, programmers use domain-specific knowledge in an attempt to improve the speed or memory utilization of memory allocators. In this paper, we describe a program (CustoMalloc) that synthesizes a memory allocator customized for a specific application. Our experiments show that the synthesized allocators are uniformly faster than the common binary-buddy (BSD) allocator, and are more space efficient. Constructing a custom allocator requires little programmer effort. The process can usually be accomplished in a few minutes, and yields results superior even to domain-specific allocators designed by programmers. Our measurements show the synthesized allocators are from two to ten times faster than widely used allocators.
David Gudeman. 1993. “Representing Type Information in Dynamically Typed Languages”. University of Arizona at Tucson. Technical Report TR 93-27.
This report is a discussion of various techniques for representing type information in dynamically typed languages, as implemented on general-purpose machines (and costs are discussed in terms of modern RISC machines). It is intended to make readily available a large body of knowledge that currently has to be absorbed piecemeal from the literature or re-invented by each language implementor. This discussion covers not only tagging schemes but other forms of representation as well, although the discussion is strictly limited to the representation of type information. It should also be noted that this report does not purport to contain a survey of the relevant literature. Instead, this report gathers together a body of folklore, organizes it into a logical structure, makes some generalizations, and then discusses the results in terms of modern hardware.
Timothy Harris. 1999. “Early storage reclamation in a tracing garbage collector”. ACM. ACM SIG-PLAN Notices 34:4, pp. 46–53.
This article presents a technique for allowing the early recovery of storage space occupied by garbage data. The idea is similar to that of generational garbage collection, except that the heap is partitioned based on a static analysis of data type definitions rather than on the approximate age of allocated objects. A prototype implementation is presented, along with initial results and ideas for future work.
Roger Henriksson. 1994. “Scheduling Real Time Garbage Collection”. Department of Computer Science at Lund University. LU-CS-TR:94-129.
This paper presents a new model for scheduling the work of an incremental garbage collector in a system with hard real time requirements. The method utilizes the fact that just some of the processes in the system have to meet hard real time requirements and that these processes typically run periodically, a fact that we can make use of when scheduling the garbage collection. The work of the collector is scheduled to be performed in the pauses between the critical processes and is suspended when the processes with hard real time requirements run. It is shown that this approach is feasible for many real time systems and that it leaves the time-critical parts of the system undisturbed from garbage collection induced delays.
Roger Henriksson. 1996. “Adaptive Scheduling of Incremental Copying Garbage Collection for Interactive Applications”. NWPER96.
Incremental algorithms are often used to interleave the work of a garbage collector with the execution of an application program, the intention being to avoid long pauses. However, overestimating the worst-case storage needs of the program often causes all the garbage collection work to be performed in the beginning of the garbage collection cycles, slowing down the application program to an unwanted degree. This paper explores an approach to distributing the work more evenly over the garbage collection cycle.
Roger Henriksson. 1998. “Scheduling Garbage Collection in Embedded Systems”. Department of Computer Science at Lund University. Ph.D. thesis.
The complexity of systems for automatic control and other safety-critical applications grows rapidly. Computer software represents an increasing part of the complexity. As larger systems are developed, we need to find scalable techniques to manage the complexity in order to guarantee high product quality. Memory management is a key quality factor for these systems. Automatic memory management, or garbage collection, is a technique that significantly reduces the complex problem of correct memory management. The risk of software errors decreases and development time is reduced.
Garbage collection techniques suitable for interactive and soft real-time systems exist, but few approaches are suitable for systems with hard real-time requirements, such as control systems (embedded systems). One part of the problem is solved by incremental garbage collection algorithms, which have been presented before. We focus on the scheduling problem which forms the second part of the problem, i.e. how the work of a garbage collector should be scheduled in order to disturb the application program as little as possible. It is studied how a priori scheduling analysis of systems with automatic memory management can be made. The field of garbage collection research is thus joined with the field of scheduling analysis in order to produce a practical synthesis of the two fields.
A scheduling strategy is presented that employs the properties of control systems to ensure that no garbage collection work is performed during the execution of critical processes. The hard real-time part of the system is thus never disturbed by garbage collection work. Existing incremental garbage collection algorithms are adapted to the presented strategy. Necessary modifications of the algorithms and the real-time kernel are discussed. A standard scheduling analysis technique, rate monotonic analysis, is extended in order to make a priori analysis of the schedulability of the garbage collector possible.
The scheduling algorithm has been implemented in an industrially relevant real-time environment in order to show that the strategy is feasible in practice. The experimental evaluation shows that predictable behaviour and sub-millisecond worst-case delays can be achieved on standard hardware even by a non-optimized prototype garbage collector.
Antony L. Hosking. 1991. “Main memory management for persistence”. ACM. Proceedings of the ACM OOPSLA‘91 Workshop on Garbage Collection.
Reachability-based persistence imposes new requirements for main memory management in general, and garbage collection in particular. After a brief introduction to the characteristics and requirements of reachability-based persistence, we present the design of a run-time storage manager for Persistent Smalltalk and Persistent Modula-3, which allows the reclamation of storage from both temporary objects and buffered persistent objects.
Antony L. Hosking, J. Eliot B. Moss, Darko Stefanovic. 1992. “A comparative performance evaluation of write barrier implementations”. ACM. OOPSLA‘92 Conference Proceedings, ACM SIGPLAN Notices 27(10), pp 92–109.
Generational garbage collectors are able to achieve very small pause times by concentrating on the youngest (most recently allocated) objects when collecting, since objects have been observed to die young in many systems. Generational collectors must keep track of all pointers from older to younger generations, by “monitoring” all stores into the heap. This write barrier has been implemented in a number of ways, varying essentially in the granularity of the information observed and stored. Here we examine a range of write barrier implementations and evaluate their relative performance within a generation scavenging garbage collector for Smalltalk.
Antony L. Hosking, Richard L. Hudson. 1993. “Remembered sets can also play cards”. ACM. Proceedings of the ACM OOPSLA‘93 Workshop on Memory Management and Garbage Collection.
Remembered sets and dirty bits have been proposed as alternative implementations of the write barrier for garbage collection. There are advantages to both approaches. Dirty bits can be efficiently maintained with minimal, bounded overhead per store operation, while remembered sets concisely, and accurately record the necessary information. Here we present evidence to show that hybrids can combine the virtues of both schemes and offer competitive performance. Moreover, we argue that a hybrid can better avoid the devils that are the downfall of the separate alternatives.
Antony L. Hosking, J. Eliot B. Moss. 1993. “Protection traps and alternatives for memory management of an object-oriented language”. ACM. Proceedings of the Fourteenth ACM Symposium on Operating Systems Principles, ACM Operating Systems Review 27(5), pp 106–119.
Many operating systems allow user programs to specify the protection level (inaccessible, read-only, read-write) of pages in their virtual memory address space, and to handle any protection violations that may occur. Such page-protection techniques have been exploited by several user-level algorithms for applications including generational garbage collection and persistent stores. Unfortunately, modern hardware has made efficient handling of page protection faults more difficult. Moreover, page-sized granularity may not match the natural granularity of a given application. In light of these problems, we reevaluate the usefulness of page-protection primitives in such applications, by comparing the performance of implementations that make use of the primitives with others that do not. Our results show that for certain applications software solutions outperform solutions that rely on page-protection or other related virtual memory primitives.
Richard L. Hudson, J. Eliot B. Moss, Amer Diwan, Christopher F. Weight. 1991. “A Language-Independent Garbage Collector Toolkit”. University of Massachusetts at Amherst. COINS Technical Report 91–47.
We describe a memory management toolkit for language implementors. It offers efficient and flexible generation scavenging garbage collection. In addition to providing a core of language-independent algorithms and data structures, the toolkit includes auxiliary components that ease implementation of garbage collection for programming languages. We have detailed designs for Smalltalk and Modula-3 and are confident the toolkit can be used with a wide variety of languages. The toolkit approach is itself novel, and our design includes a number of additional innovations in flexibility, efficiency, accuracy, and cooperation between the compiler and the collector.
Richard L. Hudson, J. Eliot B. Moss. 1992. “Incremental Collection of Mature Objects”. Springer-Verlag. LNCS #637 International Workshop on Memory Management, St. Malo, France, Sept. 1992, pp. 388–403.
We present a garbage collection algorithm that extends generational scavenging to collect large older generations (mature objects) non-disruptively. The algorithm’s approach is to process bounded-size pieces of mature object space at each collection; the subtleties lie in guaranteeing that it eventually collects any and all garbage. The algorithm does not assume any special hardware or operating system support, e.g., for forwarding pointers or protection traps. The algorithm copies objects, so it naturally supports compaction and reclustering.
Richard L. Hudson, Ron Morrison, J. Eliot B. Moss, David S. Munro. 1997. “Garbage Collecting the World: One Car at a Time”. ACM. Proc. OOPSLA 97, pp. 162–175.
A new garbage collection algorithm for distributed object systems, called DMOS (Distributed Mature Object Space), is presented. It is derived from two previous algorithms, MOS (Mature Object Space), sometimes called the train algorithm, and PMOS (Persistent Mature Object Space). The contribution of DMOS is that it provides the following unique combination of properties for a distributed collector: safety, completeness, non-disruptiveness, incrementality, and scalability. Furthermore, the DMOS collector is non-blocking and does not use global tracing.
Mark S. Johnstone. 1997. “Non-Compacting Memory Allocation and Real-Time Garbage Collection”. University of Texas at Austin.
Dynamic memory use has been widely recognized to have profound effects on program performance, and has been the topic of many research studies over the last forty years. In spite of years of research, there is considerable confusion about the effects of dynamic memory allocation. Worse, this confusion is often unrecognized, and memory allocators are widely thought to be fairly well understood.
In this research, we attempt to clarify many issues for both manual and automatic non-moving memory management. We show that the traditional approaches to studying dynamic memory allocation are unsound, and develop a sound methodology for studying this problem. We present experimental evidence that fragmentation costs are much lower than previously recognized for most programs, and develop a framework for understanding these results and enabling further research in this area. For a large class of programs using well-known allocation policies, we show that fragmentation costs are near zero. We also study the locality effects of memory allocation on programs, a research area that has been almost completely ignored. We show that these effects can be quite dramatic, and that the best allocation policies in terms of fragmentation are also among the best in terms of locality at both the cache and virtual memory levels of the memory hierarchy.
We extend these fragmentation and locality results to real-time garbage collection. We have developed a hard real-time, non-copying generational garbage collector which uses a write-barrier to coordinate collection work only with modifications of pointers, therefore making coordination costs cheaper and more predictable than previous approaches. We combine this write-barrier approach with implicit non-copying reclamation, which has most of the advantages of copying collection (notably avoiding both the sweep phase required by mark-sweep collectors, and the referencing of garbage objects when reclaiming their space), without the disadvantage of having to actually copy the objects. In addition, we present a model for non-copying implicit-reclamation garbage collection. We use this model to compare and contrast our work with that of others, and to discuss the tradeoffs that must be made when developing such a garbage collector.
Mark S. Johnstone, Paul R. Wilson. 1998. “The Memory Fragmentation Problem: Solved?”. ACM. ISMM‘98 pp. 26–36.
We show that for 8 real and varied C and C++ programs, several conventional dynamic storage allocators provide near-zero fragmentation, once overheads due to implementation details (headers, alignment, etc.) are properly accounted for. This substantially strengthens our previous results showing that the memory fragmentation problem has generally been misunderstood, and that good allocator policies can provide good memory usage for most programs. The new results indicate that for most programs, excellent allocator policies are readily available, and efficiency of implementation is the major challenge. While we believe that our experimental results are state-of-the-art and our methodology is superior to most previous work, more work should be done to identify and study unusual problematic program behaviors not represented in our sample.
Richard E. Jones. 1992. “Tail recursion without space leaks”. Journal of Functional Programming. 2(1):73–79.
The G-machine is a compiled graph reduction machine for lazy functional languages. The G-machine compiler contains many optimisations to improve performance. One set of such optimisations is designed to improve the performance of tail recursive functions. Unfortunately the abstract machine is subject to a space leak–objects are unnecessarily preserved by the garbage collector.
This paper analyses why a particular form of space leak occurs in the G-machine, and presents some ideas for fixing this problem. This phenomena in other abstract machines is also examined briefly.
Richard E. Jones, Rafael Lins. 1992. “Cyclic weighted reference counting without delay”. Computing Laboratory, The University of Kent at Canterbury. Technical Report 28-92.
Weighted Reference Counting is a low-communication distributed storage reclamation scheme for loosely-coupled multiprocessors. The algorithm we present herein extends weighted reference counting to allow the collection of cyclic data structures. To do so, the algorithm identifies candidate objects that may be part of cycles and performs a tricolour mark-scan on their subgraph in a lazy manner to discover whether the subgraph is still in use. The algorithm is concurrent in the sense that multiple useful computation processes and garbage collection processes can be performed simultaneously.
Richard E. Jones, Rafael Lins. 1996. “Garbage Collection: Algorithms for Automatic Dynamic Memory Management”. Wiley. ISBN 0-471-94148-4.
From the back cover
The memory storage requirements of complex programs are extremely difficult to manage correctly by hand. A single error may lead to indeterminate and inexplicable program crashes. Worse still, failures are often unrepeatable and may surface only long after the program has been delivered to the customer. The eradication of memory errors typically consumes a substantial amount of development time. And yet the answer is relatively easy – garbage collection; removing the clutter of memory management from module interfaces, which then frees the programmer to concentrate on the problem at hand rather than low-level book-keeping details. For this reason, most modern object-oriented languages such as Smalltalk, Eiffel, Java and Dylan, are supported by garbage collection. Garbage collecting libraries are even available for such uncooperative languages as C and C++.
This book considers how dynamic memory can be recycled automatically to guarantee error-free memory management. There is an abundant but disparate literature on the subject, largely confined to research papers. This book sets out to pool this experience in a single accessible and unified framework.
Each of the important algorithms is explained in detail, often with illustrations of its characteristic features and animations of its use. Techniques are described and compared for declarative and imperative programming styles, for sequential, concurrent and distributed architectures.
For professionals developing programs from simple software tools to complex systems, as well as for researchers and students working in compiler construction, functional, logic and object-oriented programming design, this book will provide not only a clear introduction but also a convenient reference source for modern garbage collection techniques.
Richard E. Jones. 1998. “ISMM‘98 International Symposium on Memory Management”. ACM. ISBN 1-58113-114-3.
From the Preface
The International Symposium on Memory Management is a forum for research in several related areas of memory management, especially garbage collectors and dynamic storage allocators. [...] The nineteen papers selected for publication in this volume cover a remarkably broad range of memory management topics from explicit malloc-style allocation to automatic memory management, from cache-conscious data layout to efficient management of distributed references, from conservative to type-accurate garbage collection, for applications ranging from user application to long-running servers, supporting languages as different as C, C++, Modula-3, Java, Eiffel, Erlang, Scheme, ML, Haskell and Prolog.
Richard E. Jones, Antony Hosking, and Eliot Moss. 2012. “The Garbage Collection Handbook”. Chapman & Hall.
Ian Joyner. 1996. “C++??: A Critique of C++.”.
The C++?? Critique is an analysis of some of the flaws of C++. It is by no means exhaustive, nor does it attempt to document every little niggle with C++, rather concentrating on main themes. The critique uses Java and Eiffel as comparisons to C++ to give a more concrete feel to the criticisms, viewing conceptual differences rather than syntactic ones as being more important. Some C++ authors realising there are glaring deficiencies in C++ have chosen to defend C++ by also being critical within their own work. Most notable are Bjarne Stroustup’s “Design and Evolution of C++,” and Scott Meyers’ “Effective” and “More Effective C++.” These warn of many traps and pitfalls, but reach the curious conclusion that since “good” C++ programmers are aware of these problems and know how to avoid them, C++ is alright.
The C++ critique makes many of the same criticisms, but comes to the different conclusion that these pitfalls are not acceptable, and should not be in a language used for modern large scale software engineering. Clean design is more important than after the fact warnings, and it is inconceivable that purchasers of end user software would tolerate this tactic on the part of vendors. The critique also takes a look at C, and concludes that many of the features of C should be left out of modern languages, and that C is a flawed base for a language.
Bob Kanefsky. 1989. “Recursive Memory Allocation”. Bob Kanefsky. Songworm 3, p.?.
Jin-Soo Kim, Xiaohan Qin, Yarsun Hsu. 1998. “Memory Characterization of a Parallel Data Mining Workload”. IEEE. Proc. Workload Characterization: Methodology and Case Studies, pp. .
This paper studies a representative of an important class of emerging applications, a parallel data mining workload. The application, extracted from the IBM Intelligent Miner, identifies groups of records that are mathematically similar based on a neural network model called self-organizing map. We examine and compare in details two implementations of the application: (1) temporal locality or working set sizes; (2) spatial locality and memory block utilization; (3) communication characteristics and scalability; and (4) TLB performance.
First, we find that the working set hierarchy of the application is governed by two parameters, namely the size of an input record and the size of prototype array; it is independent of the number of input records. Second, the application shows good spatial locality, with the implementation optimized for sparse data sets having slightly worse spatial locality. Third, due to the batch update scheme, the application bears very low communication. Finally, a 2-way set associative TLB may result in severely skewed TLB performance in a multiprocessor environment caused by the large discrepancy in the amount of conflict misses. Increasing the set associativity is more effective in mitigating the problem than increasing the TLB size.
Jin-Soo Kim & Yarsun Hsu. 2000. “Memory system behavior of Java programs: methodology and analysis”. ACM. Proc. International conference on measurements and modeling of computer systems, pp. 264–274.
This paper studies the memory system behavior of Java programs by analyzing memory reference traces of several SPECjvm98 applications running with a Just-In-Time (JIT) compiler. Trace information is collected by an exception-based tracing tool called JTRACE, without any instrumentation to the Java programs or the JIT compiler.First, we find that the overall cache miss ratio is increased due to garbage collection, which suffers from higher cache misses compared to the application. We also note that going beyond 2-way cache associativity improves the cache miss ratio marginally. Second, we observe that Java programs generate a substantial amount of short-lived objects. However, the size of frequently-referenced long-lived objects is more important to the cache performance, because it tends to determine the application’s working set size. Finally, we note that the default heap configuration which starts from a small initial heap size is very inefficient since it invokes a garbage collector frequently. Although the direct costs of garbage collection decrease as we increase the available heap size, there exists an optimal heap size which minimizes the total execution time due to the interaction with the virtual memory performance.
Elliot K. Kolodner. 1992. “Atomic Incremental Garbage Collection and Recovery for a Large Stable Heap”. Laboratory for Computer Science at MIT. MIT-LCS-TR-534.
A stable heap is a storage that is managed automatically using garbage collection, manipulated using atomic transactions, and accessed using a uniform storage model. These features enhance reliability and simplify programming by preventing errors due to explicit deallocation, by masking failures and concurrency using transactions, and by eliminating the distinction between accessing temporary storage and permanent storage. Stable heap management is useful for programming language for reliable distributed computing, programming languages with persistent storage, and object-oriented database systems. Many applications that could benefit from a stable heap (e.g., computer-aided design, computer-aided software engineering, and office information systems) require large amounts of storage, timely responses for transactions, and high availability. We present garbage collection and recovery algorithms for a stable heap implementation that meet these goals and are appropriate for stock hardware. The collector is incremental: it does not attempt to collect the whole heap at once. The collector is also atomic: it is coordinated with the recovery system to prevent problems when it moves and modifies objects . The time for recovery is independent of heap size, and can be shortened using checkpoints.
Per-Åke Larson & Murali Krishnan. 1998. “Memory Allocation for Long-Running Server Applications”. ACM. ISMM‘98 pp. 176–185.
Prior work on dynamic memory allocation has largely neglected long-running server applications, for example, web servers and mail servers. Their requirements differ from those of one-shot applications like compilers or text editors. We investigated how to build an allocator that is not only fast and memory efficient but also scales well on SMP machines. We found that it is not sufficient to focus on reducing lock contention. Only limited improvement can be achieved this way; higher speedups require a reduction in cache misses and cache invalidation traffic. We then designed and prototyped a new allocator, called Lkmalloc, targeted for both traditional applications and server applications. LKmalloc uses several subheaps, each one with a separate set of free lists and memory arena. A thread always allocates from the same subheap but can free a block belonging to any subheap. A thread is assigned to a subheap by hashing on its thread ID. We compared its performance with several other allocators on a server-like, simulated workload and found that it indeed scales well and is quite fast but could use memory more efficiently.
Henry Lieberman & Carl Hewitt. 1983. “A real-time garbage collector based on the lifetimes of objects”. ACM. 26(6):419–429.
In previous heap storage systems, the cost of creating objects and garbage collection is independent of the lifetime of the object. Since objects with short lifetimes account for a large portion of storage use, it is worth optimizing a garbage collector to reclaim storage for these objects more quickly. The garbage collector should spend proportionately less effort reclaiming objects with longer lifetimes. We present a garbage collection algorithm that (1) makes storage for short-lived objects cheaper than storage for long-lived objects, (2) that operates in real-time–object creation and access times are bounded, (3) increases locality of reference, for better virtual memory performance, (4) works well with multiple processors and a large address space.
J. McCarthy, M. L. Minsky. 1959. “Artificial Intelligence, Quarterly Progress Report no. 53”. Research Laboratory of Electronics at MIT.
J. McCarthy. 1960. “Recursive Functions of Symbolic Expressions and Their Computation by Machine”. CACM.
A programming system called LISP (for LISt Processor) has been developed for the IBM 704 computer by the Artificial Intelligence group at M.I.T. The system was designed to facilitate experiments with a proposed system called the Advice Taker, whereby a machine could be instructed to handle declarative as well as imperative sentences and could exhibit “common sense” in carrying out its instructions. The original proposal for the Advice Taker was made in November 1958. The main requirement was a programming system for manipulating expressions representing formalized declarative and imperative sentences so that the Advice Taker could make deductions.
In the course of its development the LISP system went through several stages of simplification and eventually came to be based on a scheme for representing the partial recursive functions of a certain class of symbolic expressions. This representation is independent of the IBM 704 computer, or of any other electronic computer, and it now seems expedient to expound the system by starting with the class of expressions called S-expressions and the functions called S-functions.
John McCarthy. 1979. “History of Lisp”. In History of programming languages I, pp. 173–185. ACM.
Veljko Milutinovic, Jelica Protic, Milo Tomasevic. 1997. “Distributed shared memory: concepts and systems”. IEEE Computer Society Press. ISBN 0-8186-7737-6.
From the publisher’s catalog
Presents a survey of both distributed shared memory (DSM) efforts and commercial DSM systems. The book discusses relevant issues that make the concept of DSM one of the most attractive approaches for building large-scale, high-performance multiprocessor systems. Its text provides a general introduction to the DSM field as well as a broad survey of the basic DSM concepts, mechanisms, design issues, and systems.
Distributed Shared Memory concentrates on basic DSM algorithms, their enhancements, and their performance evaluation. In addition, it details implementations that employ DSM solutions at the software and the hardware level. The book is a research and development reference that provides state-of-the art information that will be useful to architects, designers, and programmers of DSM systems.
M. L. Minsky. 1963. “A LISP Garbage Collector Algorithm Using Serial Secondary Storage”. MIT. Memorandum MAC-M-129, Artificial Intelligence Project, Memo 58 (revised).
This paper presents an algorithm for reclaiming unused free storage memory cells is LISP. It depends on availability of a fast secondary storage device, or a large block of available temporary storage. For this price, we get 1. Packing of free-storage into a solidly packed block. 2. Smooth packing of arbitrary linear blocks and arrays. 3. The collector will handle arbitrarily complex re-entrant list structure with no introduction of spurious copies. 4. The algorithm is quite efficient; the marking pass visits words at most twice and usually once, and the loading pass is linear. 5. The system is easily modified to allow for increase in size of already fixed consecutive blocks, provide one can afford to initiate a collection pass or use a modified array while waiting for such a pass to occur.
David Moon. 1984. “Garbage Collection in a Large Lisp System”. ACM. Symposium on Lisp and Functional Programming, August 1984.
This paper discusses garbage collection techniques used in a high-performance Lisp implementation with a large virtual memory, the Symbolics 3600. Particular attention is paid to practical issues and experience. In a large system problems of scale appear and the most straightforward garbage-collection techniques do not work well. Many of these problems involve the interaction of the garbage collector with demand-paged virtual memory. Some of the solutions adopted in the 3600 are presented, including incremental copying garbage collection, approximately depth-first copying, ephemeral objects, tagged architecture, and hardware assists. We discuss techniques for improving the efficiency of garbage collection by recognizing that objects in the Lisp world have a variety of lifetimes. The importance of designing the architecture and the hardware to facilitate garbage collection is stressed.
David Moon. 1985. “Architecture of the Symbolics 3600”. IEEE. 12th International Symposium on Computer Architecture, pp. 76–83.
David Moon. 1990. “Symbolics Architecture”. Wiley. Chapter 3 of Computers for Artificial Intelligence Processing, ISBN 0-471-84811-5.
David Moon. 1991. “Genera Retrospective”. IEEE. 1991 International Workshop on Object Orientation in Operating Systems, order #2265.
Ben-Ari Mordechai. 1984. “Algorithms for On-the-fly Garbage Collection”. TOPLAS 6(3): 333–344 (1984).
Luc Moreau. 1998. “Hierarchical Distributed Reference Counting”. ACM. ISMM‘98 pp. 57–67.
Massively distributed computing is a challenging problem for garbage collection algorithm designers as it raises the issue of scalability. The high number of hosts involved in a computation can require large tables for reference listing, whereas the lack of information sharing between hosts in a same locality can entail redundant GC traffic. In this paper, we argue that a conceptual hierarchical organisation of massive distributed computations can solve this problem. By conceptual hierarchical organisation, we mean that processors are still able to communicate in a peer to peer manner using their usual communication mechanism, but GC messages will be routed as if processors were organised in hierarchy. We present an extension of a distributed reference counting algorithm that uses such a hierarchical organisation. It allows us to bound table sizes by the number of hosts in a domain, and it allows us to share GC information between hosts in a same locality in order to reduce cross-network GC traffic.
Greg Morrisett, Matthias Felleisen, Robert Harper. 1995. “Abstract Models of Memory Management”. Carnegie Mellon University. CMU-CS-FOX-95-01.
Most specifications of garbage collectors concentrate on the low-level algorithmic details of how to find and preserve accessible objects. Often, they focus on bit-level manipulations such as “scanning stack frames,” “marking objects,” “tagging data,” etc. While these details are important in some contexts, they often obscure the more fundamental aspects of memory management: what objects are garbage and why?
We develop a series of calculi that are just low-level enough that we can express allocation and garbage collection, yet are sufficiently abstract that we may formally prove the correctness of various memory management strategies. By making the heap of a program syntactically apparent, we can specify memory actions as rewriting rules that allocate values on the heap and automatically dereference pointers to such objects when needed. This formulation permits the specification of garbage collection as a relation that removes portions of the heap without affecting the outcome of evaluation.
Our high-level approach allows us to specify in a compact manner a wide variety of memory management techniques, including standard trace-based garbage collection (i.e., the family of copying and mark/sweep collection algorithms), generational collection, and type-based, tag-free collection. Furthermore, since the definition of garbage is based on the semantics of the underlying language instead of the conservative approximation of inaccessibility, we are able to specify and prove the idea that type inference can be used to collect some objects that are accessible but never used.
David S. Munro, Alfred Brown, Ron Morrison, J. Eliot B. Moss. 1999. “Incremental Garbage Collection of a Persistent Object Store using PMOS”. Morgan Kaufmann. in Advances in Persistent Object Systems, pp. 78–91.
PMOS is an incremental garbage collector designed specifically to reclaim space in a persistent object store. It is complete in that it will, after a finite number of invocations, reclaim all unreachable storage. PMOS imposes minimum constraints on the order of collection and offers techniques to reduce the I/O traffic induced by the collector. Here we present the first implementation of the PMOS collector called PMOS#1. The collector has been incorporated into the stable heap layer of the generic persistent object store used to support a number of languages including Napier88. Our main design goals are to maintain the independence of the language from the store and to retain the existing store interface. The implementation has been completed and tested using a Napier88 system. The main results of this work show that the PMOS collector is implementable in a persistent store and that it can be built without requiring changes to the language interpreter. Initial performance measurements are reported. These results suggest however, that effective use of PMOS requires greater co-operation between language and store.
Scott Nettles, James O’Toole, David Pierce, Nickolas Haines. 1992. “Replication-Based Incremental Copying Collection”. IWMM‘92.
We introduce a new replication-based copying garbage collection technique. We have implemented one simple variation of this method to provide incremental garbage collection on stock hardware with no special operating system or virtual memory support. The performance of the prototype implementation is excellent: major garbage collection pauses are completely eliminated with only a slight increase in minor collection pause times.
Unlike the standard copying algorithm, the replication-based method does not destroy the original replica when a copy is created. Instead, multiple copies may exist, and various standard strategies for maintaining consistency may be applied. In our implementation for Standard ML of New Jersey, the mutator continues to use the from-space replicas until the collector has achieved a consistent replica of all live data in to-space.
We present a design for a concurrent garbage collector using the replication-based technique. We also expect replication-based GC methods to be useful in providing services for persistence and distribution, and briefly discuss these possibilities.
Scott Nettles. 1992. “A Larch Specification of Copying Garbage Collection”. Carnegie Mellon University. CMU-CS-92-219.
Garbage collection (GC) is an important part of many language implementations. One of the most important garbage collection techniques is copying GC. This paper consists of an informal but abstract description of copying collection, a formal specification of copying collection written in the Larch Shared Language and the Larch/C Interface Language, a simple implementation of a copying collector written in C, an informal proof that the implementation satisfies the specification, and a discussion of how the specification applies to other types of copying GC such as generational copying collectors. Limited familiarity with copying GC or Larch is needed to read the specification.
Scott Nettles & James O’Toole. 1993. “Implementing Orthogonal Persistence: A Simple Optimization Using Replicating Collection”. USENIX. IWOOOS‘93.
Orthogonal persistence provides a safe and convenient model of object persistence. We have implemented a transaction system which supports orthogonal persistence in a garbage-collected heap. In our system, replicating collection provides efficient concurrent garbage collection of the heap. In this paper, we show how replicating garbage collection can also be used to reduce commit operation latencies in our implementation.
We describe how our system implements transaction commit. We explain why the presence of non-persistent objects can add to the cost of this operation. We show how to eliminate these additional costs by using replicating garbage collection. The resulting implementation of orthogonal persistence should provide transaction performance that is independent of the quantity of non-persistent data in use. We expect efficient support for orthogonal persistence to be valuable in operating systems applications which use persistent data.
Scott Nettles & James O’Toole. 1993. “Real-Time Replication Garbage Collection”. ACM. PLDI‘93.
We have implemented the first copying garbage collector that permits continuous unimpeded mutator access to the original objects during copying. The garbage collector incrementally replicates all accessible objects and uses a mutation log to bring the replicas up-to-date with changes made by the mutator. An experimental implementation demonstrates that the costs of using our algorithm are small and that bounded pause times of 50 milliseconds can be readily achieved.
Norman R. Nielsen. 1977. “Dynamic Memory Allocation in Computer Simulation”. ACM. CACM 20:11.
This paper investigates the performance of 35 dynamic memory allocation algorithms when used to service simulation programs as represented by 18 test cases. Algorithm performance was measured in terms of processing time, memory usage, and external memory fragmentation. Algorithms maintaining separate free space lists for each size of memory block used tended to perform quite well compared with other algorithms. Simple algorithms operating on memory ordered lists (without any free list) performed surprisingly well. Algorithms employing power-of-two block sizes had favorable processing requirements but generally unfavorable memory usage. Algorithms employing LIFO, FIFO, or memory ordered free lists generally performed poorly compared with others.
James O’Toole. 1990. “Garbage Collecting Locally”.
Generational garbage collection is a simple technique for automatic partial memory reclamation. In this paper, I present the basic mechanics of generational collection and discuss its characteristics. I compare several published algorithms and argue that fundamental considerations of locality, as reflected in the changing relative speeds of processors, memories, and disks, strongly favor a focus on explicit optimization of I/O requirements during garbage collection. I show that this focus on I/O costs due to memory hierarchy debunks a well-known claim about the relative costs of garbage collection and stack allocation. I suggest two directions for future research in this area and discuss some simple architectural changes in virtual memory interfaces which may enable efficient garbage collector utilization of standard virtual memory hardware.
James O’Toole & Scott Nettles. 1994. “Concurrent Replicating Garbage Collection”. ACM. LFP‘94.
We have implemented a concurrent copying garbage collector that uses replicating garbage collection. In our design, the client can continuously access the heap during garbage collection. No low-level synchronization between the client and the garbage collector is required on individual object operations. The garbage collector replicates live heap objects and periodically synchronizes with the client to obtain the client’s current root set and mutation log. An experimental implementation using the Standard ML of New Jersey system on a shared-memory multiprocessor demonstrates excellent pause time performance and moderate execution time speedups.
Simon Peyton Jones, Norman Ramsey, Fermin Reig. 1999. “C–: a portable assembly language that supports garbage collection”. Springer-Verlag. International Conference on Principles and Practice of Declarative Programming 1999, LNCS 1702, pp. 1–28.
For a compiler writer, generating good machine code for a variety of platforms is hard work. One might try to reuse a retargetable code generator, but code generators are complex and difficult to use, and they limit one’s choice of implementation language. One might try to use C as a portable assembly language, but C limits the compiler writer’s flexibility and the performance of the resulting code. The wide use of C, despite these drawbacks, argues for a portable assembly language. C– is a new language designed expressly for this purpose. The use of a portable assembly language introduces new problems in the support of such high-level run-time services as garbage collection, exception handling, concurrency, profiling, and debugging. We address these problems by combining the C– language with a C– run-time interface. The combination is designed to allow the compiler writer a choice of source-language semantics and implementation techniques, while still providing good performance.
John S. Pieper. 1993. “Compiler Techniques for Managing Data Motion”. Carnegie Mellon University. Technical report number CMU-CS-93-217.
Software caching, automatic algorithm blocking, and data overlays are different names for the same problem: compiler management of data movement throughout the memory hierarchy. Modern high-performance architectures often omit hardware support for moving data between levels of the memory hierarchy: iWarp does not include a data cache, and Cray supercomputers do not have virtual memory. These systems have effectively traded a more complicated programming model for performance by replacing a hardware-controlled memory hierarchy with a simple fast memory. The simpler memories have less logic in the critical path, so the cycle time of the memories is improved.
For programs which fit in the resulting memory, the extra performance is great. Unfortunately, the driving force behind supercomputing today is a class of very large scientific problems, both in terms of computation time and in terms of the amount of data used. Many of these programs do not fit in the memory of the machines available. When architects trade hardware support for data migration to gain performance, control of the memory hierarchy is left to the programmer. Either the program size must be cut down to fit into the machine, or every loop which accesses more data than will fit into memory must be restructured by hand. This thesis describes how a compiler can relieve the programmer of this burden, and automate data motion throughout the memory hierarchy without direct hardware support.
This works develops a model of how data is accessed within a nested loop by typical scientific programs. It describes techniques which can be used by compilers faced with the task of managing data motion. The concentration is on nested loops which process large data arrays using linear array subscripts. Because the array subscripts are linear functions of the loop indices and the loop indices form an integer lattice, linear algebra can be applied to solve many compilation problems.
The approach it to tile the iteration space of the loop nest. Tiling allows the compiler to improve locality of reference. The tiling basis matrix is chosen from a set of candidate vectors which neatly divide the data set. The execution order of the tiles is selected to maximize locality between tiles. Finally, the tile sizes are chosen to minimize execution time.
The approach has been applied to several common scientific loop nests: matrix-matrix multiplication, QR-decomposition, and LU-decomposition. In addition, an illustrative example from the Livermore Loop benchmark set is examined. Although more compiler time can be required in some cases, this technique produces better code at no cost for most programs.
Pekka P. Pirinen. 1998. “Barrier techniques for incremental tracing”. ACM. ISMM‘98 pp. 20–25.
This paper presents a classification of barrier techniques for interleaving tracing with mutator operation during an incremental garbage collection. The two useful tricolour invariants are derived from more elementary considerations of graph traversal. Barrier techniques for maintaining these invariants are classified according to the action taken at the barrier (such as scanning an object or changing its colour), and it is shown that the algorithms described in the literature cover all the possibilities except one. Unfortunately, the new technique is impractical. Ways of combining barrier techniques are also discussed.
Tony Printezis. 1996. “Disk Garbage Collection Strategies for Persistent Java”. Proceedings of the First International Workshop on Persistence and Java.
This paper presents work currently in progress on Disk Garbage Collection issues for PJava, an orthogonally persistent version of Java. In particular, it concentrates on the initial Prototype of the Disk Garbage Collector of PJava0 which has already been implemented. This Prototype was designed to be very simple and modular in order to be easily changed, evolved, improved, and allow experimentation. Several experiments were performed in order to test possible optimisations; these experiments concentrated on the following four areas: a) efficient access to the store; b) page-replacement algorithms; c) efficient discovery of live objects during compaction; and d) dealing with forward references. The paper presents a description of the Prototype’s architecture, the results of these experiments and related discussion, and some future directions based on the experience gained from this work.
Tony Printezis & Quentin Cutts. 1996. “Measuring the Allocation Rate of Napier88”. Department of Computing Science at University of Glasgow. TR ?.
M. B. Reinhold. 1993. “Cache Performance of Garbage Collected Programming Languages”. Laboratory for Computer Science at MIT. MIT/LCS/TR-581.
As processor speeds continue to improve relative to main-memory access times, cache performance is becoming an increasingly important component of program performance. Prior work on the cache performance of garbage-collected programming languages has either assumed or argued that conventional garbage-collection methods will yield poor performance, and has therefore concentrated on new collection algorithms designed specifically to improve cache-level reference locality. This dissertation argues to the contrary: Many programs written in garbage-collected languages are naturally well-suited to the direct-mapped caches typically found in modern computer systems.
Using a trace-driven cache simulator and other analysis tools, five nontrivial, long-running Scheme programs are studied. A control experiment shows that the programs have excellent cache performance without any garbage collection at all. A second experiment indicates that the programs will perform well with a simple and infrequently-run generational compacting collector.
An analysis of the test programs’ memory usage patterns reveals that the mostly-functional programming style typically used in Scheme programs, in combination with simple linear storage allocation, causes most data objects to be dispersed in time and space so that references to them cause little cache interference. From this it follows that other Scheme programs, and programs written in similar styles in different languages, should perform well with a simple generational compacting collector; sophisticated collectors intended to improve cache performance are unlikely to be effective. The analysis also suggests that, as locality becomes ever more important to program performance, programs written in garbage-collected languages may turn out to have significant performance advantage over programs written in more conventional languages.
J. M. Robson. 1977. “Worst case fragmentation of first fit and best fit storage allocation strategies”. ACM. ACM Computer Journal, 20(3):242–244.
Gustavo Rodriguez-Rivera & Vince Russo. 1997. “Non-intrusive Cloning Garbage Collection with Stock Operating System Support”. Software – Practice and Experience. 27:8.
It is well accepted that automatic garbage collection simplifies programming, promotes modularity, and reduces development effort. However it is commonly believed that these advantages do not counteract the perceived price: excessive overheads, possible long pause times while garbage collections occur, and the need to modify existing code. Even though there are publically available garbage collector implementations that can be used in existing programs, they do not guarantee short pauses, and some modification of the application using them is still required. In this paper we describe a snapshot-at-beginning concurrent garbage collector algorithm and its implementation. This algorithm guarantees short pauses, and can be easily implemented on stock UNIX-like operating systems. Our results show that our collector performs comparable to other garbage collection implementations on uniprocessor machines and outperforms similar collectors on multiprocessor machines. We also show our collector to be competitive in performance with explicit deallocation. Our collector has the added advantage of being non-intrusive. Using a dynamic linking technique and effective root set inferencing, we have been able to successfully run our collector even in commercial programs where only the binary executable and no source code is available. In this paper we describe our algorithm, its implementation, and provide both an algorithmic and a performance comparison between our collector and other similar garbage collectors.
Niklas Röjemo. 1995. “Highlights from nhc – a space-efficient Haskell compiler”. Chalmers University of Technology.
Self-compiling implementations of Haskell, i.e., those written in Haskell, have been and, except one, are still space consuming monsters. Object code size for the compilers themselves are 3-8Mb, and they need 12-20Mb to recompile themselves. One reason for the huge demands for memory is that the main goal for these compilers is to produce fast code. However, the compiler described in this paper, called “nhc” for “Nearly a Haskell Compiler”, is the one above mentioned exception. This compiler concentrates on keeping memory usage down, even at a cost in time. The code produced is not fast, but nhc is usable, and the resulting programs can be run on computers with small memory.
This paper describes some of the implementation choices done, in the Haskell part of the source code, to reduce memory consumption in nhc. It is possible to use these also in other Haskell compilers with no, or very small, changes to their run-time systems.
Time is neither the main focus of nhc nor of this paper, but there is nevertheless a small section about the speed of nhc. The most notable observation concerning speed is that nhc spends approximately half the time processing interface files, which is much more than needed in the type checker. Processing interface files is also the most space consuming part of nhc in most cases. It is only when compiling source files with large sets of mutually recursive functions that more memory is needed to type check than to process interface files.
Niklas Röjemo. 1995. “Generational garbage collection for lazy functional languages without temporary space leaks”. Chalmers University of Technology.
Generational garbage collection is an established method for creating efficient garbage collectors. Even a simple implementation where all nodes that survive one garbage collection are tenured, i.e., moved to an old generation, works well in strict languages. In lazy languages, however, such an implementation can create severe temporary space leaks. The temporary space leaks appear in programs that traverse large lazily built data structures, e.g., a lazy list representing a large file, where only a small part is needed at any time. A simple generational garbage collector cannot reclaim the memory, used by the lazily built list, at minor collections. The reason is that at least one of the nodes in the list belongs to the old generation, after the first minor collection, and will hold on to the rest of the nodes in the list until the next major collection.
Niklas Röjemo & Colin Runciman. 1996. “Lag, drag, void and use – heap profiling and space-efficient compilation revisited”. ACM, SIGPLAN. ICFP‘96, ACM SIGPLAN Notices 31:6, ISBN 0-89791-770-7, pp. 34–41.
The context for this paper is functional computation by graph reduction. Our overall aim is more efficient use of memory. The specific topic is the detection of dormant cells in the live graph – those retained in heap memory though not actually playing a useful role in computation. We describe a profiler that can identify heap consumption by such ‘useless’ cells. Unlike heap profilers based on traversals of the live heap, this profiler works by examining cells post-mortem. The new profiler has revealed a surprisingly large proportion of ‘useless’ cells, even in some programs that previously seemed space-efficient such as the bootstrapping Haskell compiler “nhc”.
David J. Roth, David S. Wise. 1999. “One-bit counts between unique and sticky”. ACM. ISMM‘98, pp. 49–56.
Stoye’s one-bit reference tagging scheme can be extended to local counts of two or more via two strategies. The first, suited to pure register transactions, is a cache of referents to two shared references. The analog of Deutch’s and Bobrow’s multiple-reference table, this cache is sufficient to manage small counts across successive assignment statements. Thus, accurate reference counts above one can be tracked for short intervals, like that bridging one function’s environment to its successor’s.
The second, motivated by runtime stacks that duplicate references, avoids counting any references from the stack. It requires a local pointer-inversion protocol in the mutator, but one still local to the referent and the stack frame. Thus, an accurate reference count of one can be maintained regardless of references from the recursion stack.
Paul Rovner. 1985. “On Adding Garbage Collection and Runtime Types to a Strongly-Typed, Statically-Checked, Concurrent Language”. Xerox PARC. TR CSL-84-7.
Enough is known now about garbage collection, runtime types, strong-typing, static-checking and concurrency that it is possible to explore what happens when they are combined in a real programming system.
Storage management is one of a few central issues through which one can get a good view of the design of an entire system. Tensions between ease of integration and the need for protection; between generality, simplicity, flexibility, extensibility and efficiency are all manifest when assumptions and attitudes about managing storage are studied. And deep understanding follows best from the analysis of systems that people use to get real work done.
This paper is not for those who seek arguments pro or con about the need for these features in programming systems; such issues are for other papers. This one assumes these features to be good and describes how they combine and interact in Cedar, a programming language and environment designed to help programmers build moderate-sized experimental systems for moderate numbers of people to test and use.
Colin Runciman & David Wakeling. 1992. “Heap Profiling of Lazy Functional Programs”. University of York.
We describe the design, implementation, and use of a new kind of profiling tool that yields valuable information about the memory use of lazy functional programs. The tool has two parts: a modified functional language implementation which generated profiling implementation during the execution of programs, and a separate program which converts this information to graphical form. With the aid of profile graphs, one can make alterations to a functional program which dramatically reduce its space consumption. We demonstrate that this is the case of a genuine example – the first to which the tool has been applied – for which the results are strikingly successful.
Colin Runciman & Niklas Röjemo. 1994. “New dimensions in heap profiling”. University of York.
First-generation heap profilers for lazy functional languages have proved to be effective tools for locating some kinds of space faults, but in other cases they cannot provide sufficient information to solve the problem. This paper describes the design, implementation and use of a new profiler that goes beyond the two-dimensional “who produces what” view of heap cells to provide information about their more dynamic and structural attributes. Specifically, the new profiler can distinguish between cells according to their eventual lifetime, or on the basis of the closure retainers by virtue of which they remain part of the live heap. A bootstrapping Haskell compiler (nhc) hosts the implementation: among examples of the profiler’s use we include self-application to nhc. Another example is the original heap-profiling case study “clausify”, which now consumes even less memory and is much faster.
Colin Runciman & Niklas Röjemo. 1996. “Two-pass heap profiling: a matter of life and death”. Department of Computer Science, University of York.
A heap profile is a chart showing the contents of heap memory throughout a computation. Contents are depicted abstractly by showing how much space is occupied by memory cells in each of several classes. A good heap profiler can use a variety of attributes of memory cells to de-fine a classification. Effective profiling usually involves a combination of attributes. The ideal profiler gives full support for combination in two ways. First, a section of the heap of interest to the programmer can be specified by constraining the values of any combination of cell attributes. Secondly, no matter what attributes are used to specify such a section, a heap profile can be obtained for that section only, and any other attribute can be used to define the classification.
Achieving this ideal is not simple For some combinations of attributes. A heap profile is derived by interpolation of a series of censuses of heap contents at different stages. The obvious way to obtain census data is to traverse the live heap at intervals throughout the computation. This is fine for static attributes (e.g. What type of value does this memory cell represent?), and for dynamic attributes that can be determined for each cell by examining the heap at any given moment (e.g. From which function closures can this cell be reached?). But some attributes of cells can only be determined retrospectively by post-mortem inspection asa cell is overwritten or garbage-collected (e.g. Is this cell ever used again?). Now we see the problem: if a profiler supports both live and pose-mortem attributes, how can we implement the ideal of unrestricted combinations? That is the problem me solve in this paper. We give techniques for profiling a. heap section specified in terms of both live and post-mortem attributes. We show how to generate live-attribute profiles of a section of the heal, specified using post-mortem attributes, and vice versa.
Jacob Seligmann & Steffen Grarup. 1995. “Incremental Mature Garbage Collection Using the Train Algorithm”. Springer-Verlag. ECOOP‘95, Lecture Notes in Computer Science, Vol. 952, pp. 235–252, ISBN 3-540-60160-0.
We present an implementation of the Train Algorithm, an incremental collection scheme for reclamation of mature garbage in generation-based memory management systems. To the best of our knowledge, this is the first Train Algorithm implementation ever. Using the algorithm, the traditional mark-sweep garbage collector employed by the Mjølner run-time system for the object-oriented BETA programming language was replaced by a non-disruptive one, with only negligible time and storage overheads.
Manuel Serrano, Hans-J. Boehm. 2000. “Understanding memory allocation of Scheme programs”. ACM. Proceedings of International Conference on Functional Programming 2000.
Memory is the performance bottleneck of modern architectures. Keeping memory consumption as low as possible enables fast and unobtrusive applications. But it is not easy to estimate the memory use of programs implemented in functional languages, due to both the complex translations of some high level constructs, and the use of automatic memory managers. To help understand memory allocation behavior of Scheme programs, we have designed two complementary tools. The first one reports on frequency of allocation, heap configurations and on memory reclamation. The second tracks down memory leaks. We have applied these tools to our Scheme compiler, the largest Scheme program we have been developing. This has allowed us to drastically reduce the amount of memory consumed during its bootstrap process, without requiring much development time. Development tools will be neglected unless they are both conveniently accessible and easy to use. In order to avoid this pitfall, we have carefully designed the user interface of these two tools. Their integration into a real programming environment for Scheme is detailed in the paper.
Marc Shapiro & Paulo Ferreira. 1994. “Larchant-RDOSS: a distributed shared persistent memory and its garbage collector”. INRIA. INRIA Rapport de Recherche no. 2399; Cornell Computer Science TR94-1466.
Larchant-RDOSS is a distributed shared memory that persists on reliable storage across process lifetimes. Memory management is automatic: including consistent caching of data and of locks, collecting objects unreachable from the persistent root, writing reachable objects to disk, and reducing store fragmentation. Memory management is based on a novel garbage collection algorithm, that approximates a global trace by a series of local traces, with no induced I/O or locking traffic, and no synchronization between the collector and the application processes. This results in a simple programming model, and expected minimal added application latency. The algorithm is designed for the most unfavorable environment (uncontrolled programming language, reference by pointers, distributed system, non-coherent shared memory) and should work well also in more favorable settings.
Robert A. Shaw. 1987. “Improving Garbage Collector Performance in Virtual Memory”. Stanford University. CSL-TR-87-323.
Robert A. Shaw. 1988. “Empirical Analysis of a LISP System”. Stanford University. CSL-TR-88-351.
Vivek Singhal, Sheetal V. Kakkad, Paul R. Wilson. 1992. “Texas: An Efficient, Portable Persistent Store”. University of Texas at Austin.
Texas is a persistent storage system for C++, providing high performance while emphasizing simplicity, modularity and portability. A key component of the design is the use of pointer swizzling at page fault time, which exploits existing virtual memory features to implement large address spaces efficiently on stock hardware, with little or no change to existing compilers. Long pointers are used to implement an enormous address space, but are transparently converted to the hardware-supported pointer format when pages are loaded into virtual memory.
Runtime type descriptors and slightly modified heap allocation routines support pagewise pointer swizzling by allowing objects and their pointer fields to be identified within pages. If compiler support for runtime type identification is not available, a simple preprocessor can be used to generate type descriptors.
This address translation is largely independent of issues of data caching, sharing, and checkpointing; it employs operating systems’ existing virtual memories for caching, and a simple and flexible log-structured storage manager to improve checkpointing performance.
Pagewise virtual memory protections are also used to detect writes for logging purposes, without requiring any changes to compiled code. This may degrade checkpointing performance for small transactions with poor locality of writes, but page diffing and sub-page logging promise to keep performance competitive with finer-grained checkpointing schemes.
Texas presents a simple programming interface; an application creates persistent objects by simply allocating them on the persistent heap. In addition, the implementation is relatively small, and is easy to incorporate into existing applications. The log-structured storage module easily supports advanced extensions such as compressed storage, versioning, and adaptive reorganization.
P. G. Sobalvarro. 1988. “A Lifetime-based Garbage Collector for LISP Systems on General-Purpose Computers”. MIT. AITR-1417.
Garbage collector performance in LISP systems on custom hardware has been substantially improved by the adoption of lifetime-based garbage collection techniques. To date, however, successful lifetime-based garbage collectors have required special-purpose hardware, or at least privileged access to data structures maintained by the virtual memory system. I present here a lifetime-based garbage collector requiring no special-purpose hardware or virtual memory system support, and discuss its performance.
Guy L. Steele. 1975. “Multiprocessing Compactifying Garbage Collection”. CACM. 18:9 pp. 495–508.
Algorithms for a multiprocessing compactifying garbage collector are presented and discussed. The simple case of two processors, one performing LISP-like list operations and the other performing garbage collection continuously, is thoroughly examined. The necessary capabilities of each processor are defined, as well as interprocessor communication and interlocks. Complete procedures for garbage collection and for standard list processing primitives are presented and thoroughly explained. Particular attention is given to the problems of marking and relocating list cells while another processor may be operating on them. The primary aim throughout is to allow the list processor to run unimpeded while the other processor reclaims list storage The more complex case involving several list processors and one or more garbage collection processors are also briefly discussed.
Guy L. Steele. 1976. “Corrigendum: Multiprocessing Compactifying Garbage Collection”. CACM. 19:6 p.354.
Guy L. Steele. 1977. “Data Representation in PDP-10 MACLISP”. MIT. AI Memo 420.
The internal representations of the various MacLISP data types are presented and discussed. Certain implementation tradeoffs are considered. The ultimate decisions on these tradeoffs are discussed in the light of MacLISP’s prime objective of being an efficient high-level language for the implementation of large systems such as MACSYMA. The basic strategy of garbage collection is outlined, with reference to the specific representations involved. Certain “clever tricks” are explained and justified. The “address space crunch” is explained and some alternative solutions explored.
James M. Stichnoth, Guei-Yuan Lueh, Michal Cierniak. 1999. “Support for Garbage Collection at Every Instruction in a Java Compiler”. SIGPLAN. Proceedings of the 1999 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI). SIGPLAN Notices 34(5). pp. 118–127.
A high-performance implementation of a Java Virtual Machine requires a compiler to translate Java bytecodes into native instructions, as well as an advanced garbage collector (e.g., copying or generational). When the Java heap is exhausted and the garbage collector executes, the compiler must report to the garbage collector all live object references contained in physical registers and stack locations. Typical compilers only allow certain instructions (e.g., call instructions and backward branches) to be GC-safe; if GC happens at some other instruction, the compiler may need to advance execution to the next GC-safe point. Until now, no one has ever attempted to make every compiler-generated instruction GC-safe, due to the perception that recording this information would require too much space. This kind of support could improve the GC performance in multithreaded applications. We show how to use simple compression techniques to reduce the size of the GC map to about 20% of the generated code size, a result that is competitive with the best previously published results. In addition, we extend the work of Agesen, Detlefs, and Moss, regarding the so-called “JSR Problem” (the single exception to Java’s type safety property), in a way that eliminates the need for extra runtime overhead in the generated code.
Will R Stoye, T J W Clarke, Arthur C Norman. 1984. “Some Practical Methods for Rapid Combinator Reduction”. In LFP 1984, 159–166.
The SKIM II processor is a microcoded hardware machine for the rapid evaluation of functional languages. This paper gives details of some of the more novel methods employed by SKIM II, and resulting performance measurements. The authors conclude that combinator reduction can still form the basis for the efficient implementation of a functional language.
David Tarditi & Amer Diwan. 1995. “Measuring the Cost of Storage Management”. Carnegie Mellon University. CMU-CS-94-201.
We study the cost of storage management for garbage-collected programs compiled with the Standard ML of New Jersey compiler. We show that the cost of storage management is not the same as the time spent garbage collecting. For many of the programs, the time spent garbage collecting is less than the time spent doing other storage-management tasks.
Stephen Thomas, Richard E. Jones. 1994. “Garbage Collection for Shared Environment Closure Reducers”. Computing Laboratory, The University of Kent at Canterbury. Technical Report 31-94.
Shared environment closure reducers such as Fairbairn and Wray’s TIM incur a comparatively low cost when creating a suspension, and so provide an elegant method for implementing lazy functional evaluation. However, comparatively little attention has been given to the problems involved in identifying which portions of a shared environment are needed (and ignoring those which are not) during a garbage collection. Proper consideration of this issue has subtle consequences when implementing a storage manager in a TIM-like system. We describe the problem and illustrate the negative consequences of ignoring it.
We go on to describe a solution in which the compiler determines statically which portions of that code’s environment are required for each piece of code it generates, and emits information to assist the run-time storage manager to scavenge environments selectively. We also describe a technique for expressing this information directly as executable code, and demonstrate that a garbage collector implemented in this way can perform significantly better than an equivalent, table-driven interpretive collector.
Stephen Thomas. 1995. “Garbage Collection in Shared-Environment Closure Reducers: Space-Efficient Depth First Copying using a Tailored Approach”. Information Processing Letters. 56:1, pp. 1–7.
Implementations of abstract machines such as the OP-TIM and the PG-TIM need to use a tailored garbage collector which seems to require an auxiliary stack,with a potential maximum size that is directly proportional to the amount of live data in the heap. However, it turns out that it is possible to build a recursive copying collector that does not require additional space by reusing already-scavenged space. This paper is a description of this technique.
Mads Tofte & Jean-Pierre Talpin. 1997. “Region-Based Memory Management”. Information and Computation 132(2), pp. 109–176.
This paper describes a memory management discipline for programs that perform dynamic memory allocation and de-allocation. At runtime, all values are put into regions. The store consists of a stack of regions. All points of region allocation and de-allocation are inferred automatically, using a type and effect based program analysis. The scheme does not assume the presence of a garbage collector. The scheme was first presented in 1994 (M. Tofte and J.-P. Talpin, in Proceedings of the 21st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pp. 188–201); subsequently, it has been tested in the ML Kit with Regions, a region-based, garbage-collection free implementation of the Standard ML Core Language, which includes recursive datatypes, higher-order functions and updatable references (L. Birkedal, M. Tofte, and M. Vejlstrup, (1996), in Proceedings of the 23rd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pp. 171–183). This paper defines a region-based dynamic semantics for a skeletal programming language extracted from Standard ML. We present the inference system which specifies where regions can be allocated and de-allocated and a detailed proof that the system is sound with respect to a standard semantics. We conclude by giving some advice on how to write programs that run well on a stack of regions, based on practical experience with the ML Kit.
Dave Ungar. 1984. “Generation Scavenging: A Non-disruptive High Performance Storage Reclamation Algorithm”. ACM, SIGSOFT, SIGPLAN. Practical Programming Environments Conference.
Many interactive computing environments provide automatic storage reclamation and virtual memory to ease the burden of managing storage. Unfortunately, many storage reclamation algorithms impede interaction with distracting pauses. Generation Scavenging is a reclamation algorithm that has no noticeable pauses, eliminates page faults for transient objects, compacts objects without resorting to indirection, and reclaims circular structures, in one third the time of traditional approaches.
Dave Ungar & Frank Jackson. 1988. “Tenuring Policies for Generation-Based Storage Reclamation”. SIGPLAN. OOPSLA ‘88 Conference Proceedings, ACM SIGPLAN Notices, Vol. 23, No. 11, pp. 1–17.
One of the most promising automatic storage reclamation techniques, generation-based storage reclamation, suffers poor performance if many objects live for a fairly long time and then die. We have investigated the severity of the problem by simulating Generation Scavenging automatic storage reclamation from traces of actual four-hour sessions. There was a wide variation in the sample runs, with garbage-collection overhead ranging from insignificant, during interactive runs, to sever, during a single non-interactive run. All runs demonstrated that performance could be improved with two techniques: segregating large bitmaps and strings, and mediating tenuring with demographic feedback. These two improvements deserve consideration for any generation-based storage reclamation strategy.
Kiem-Phong Vo. 1996. “Vmalloc: A General and Efficient Memory Allocator”. Software – Practice and Experience. 26(3): 357–374 (1996).
On C/Unix systems, the malloc interface is standard for dynamic memory allocation. Despite its popularity, malloc’s shortcomings frequently cause programmers to code around it. The new library Vmalloc generalizes malloc to give programmers more control over memory allocation. Vmalloc introduces the idea of organizing memory into separate regions, each with a discipline to get raw memory and a method to manage allocation. Applications can write their own disciplines to manipulate arbitrary type of memory or just to better organize memory in a region by creating new regions out of its memory. The provided set of allocation methods include general purpose allocations, fast special cases and aids for memory debugging or profiling. A compatible malloc interface enables current applications to select allocation methods using environment variables so they can tune for performance or perform other tasks such as profiling memory usage, generating traces of allocation calls or debugging memory errors. A performance study comparing Vmalloc and currently popular malloc implementations shows that Vmalloc is competitive to the best of these allocators. Applications can gain further performance improvement by using the right mixture of regions with different Vmalloc methods.
Daniel C. Watson, David S. Wise. 1976. “Tuning Garwick’s algorithm for repacking sequential storage”. BIT. 16, 4 (December 1976): 442–450.
Garwick’s algorithm, for repacking LIFO lists stored in a contiguous block of memory, bases the allocation of remaining space upon both sharing and previous stack growth. A system whereby the weight applied to each method can be adjusted according to the current behaviour of the stacks is discussed.
We also investigate the problem of determining during memory repacking that the memory is used to saturation and the driving program should therefore be aborted. The tuning parameters studied here seem to offer no new grasp on this problem.
Paul R. Wilson, Michael S. Lam, Thomas G. Moher. 1992. “Caching Considerations for Generational Garbage Collection”. ACM. L&FP 92.
GC systems allocate and reuse memory cyclically; this imposes a cyclic pattern on memory accesses that has its own distinctive locality characteristics. The cyclic reuse of memory tends to defeat caching strategies if the reuse cycle is too large to fit in fast memory. Generational GCs allow a smaller amount of memory to be reused more often. This improves VM performance, because the frequently-reused area stays in main memory. The same principle can be applied at the level of high-speed cache memories, if the cache is larger than the youngest generation. Because of the repeated cycling through a fixed amount of memory, however, generational GC interacts with cache design in unusual ways, and modestly set-associative caches can significantly outperform direct-mapped caches.
While our measurements do not show very high miss rates for GCed systems, they indicate that performance problems are likely in faster next-generation systems, where second-level cache misses may cost scores of cycles. Software techniques can improve cache performance of garbage-collected systems, by decreasing the cache “footprint” of the youngest generation; compiler techniques that reduce the amount of heap allocation also improve locality. Still, garbage-collected systems with a high rate of heap allocation require somewhat more cache capacity and/or main memory bandwidth than conventional systems.
Paul R. Wilson, Sheetal V. Kakkad. 1992. “Pointer Swizzling at Page Fault Time”. University of Texas at Austin.
Pointer swizzling at page fault time is a novel address translation mechanism that exploits conventional address translation hardware. It can support huge address spaces efficiently without long hardware addresses; such large address spaces are attractive for persistent object stores, distributed shared memories, and shared address space operating systems. This swizzling scheme can be used to provide data compatibility across machines with different word sizes, and even to provide binary code compatibility across machines with different hardware address sizes.
Pointers are translated (“swizzled”) from a long format to a shorter hardware-supported format at page fault time. No extra hardware is required, and no continual software overhead is incurred by presence checks of indirection of pointers. This pagewise technique exploits temporal and spatial locality in much the same way as normal virtual memory; this gives it many desirable performance characteristics, especially given the trend toward larger main memories. It is easy to implement using common compilers and operating systems.
Paul R. Wilson. 1994. “Uniprocessor Garbage Collection Techniques”. University of Texas.
We survey basic garbage collection algorithms, and variations such as incremental and generational collection; we then discuss low-level implementation considerations and the relationships between storage management systems, languages, and compilers. Throughout, we attempt to present a unified view based on abstract traversal strategies, addressing issues of conservatism, opportunism, and immediacy of reclamation; we also point out a variety of implementation details that are likely to have a significant impact on performance.
Paul R. Wilson, Mark S. Johnstone, Michael Neely, David Boles. 1995. “Dynamic Storage Allocation: A Survey and Critical Review”. University of Texas at Austin.
Dynamic memory allocation has been a fundamental part of most computer systems since roughly 1960, and memory allocation is widely considered to be either a solved problem or an insoluble one. In this survey, we describe a variety of memory allocator designs and point out issues relevant to their design and evaluation. We then chronologically survey most of the literature on allocators between 1961 and 1995. (Scores of papers are discussed, in varying detail, and over 150 references are given.)
We argue that allocator designs have been unduly restricted by an emphasis on mechanism, rather than policy, while the latter is more important; higher-level strategic issues are still more important, but have not been given much attention.
Most theoretical analyses and empirical allocator evaluations to date have relied on very strong assumptions of randomness and independence, but real program behavior exhibits important regularities that must be exploited if allocators are to perform well in practice.
David S. Wise. 1978. “The double buddy system”. Department of Computer Science at Indiana University. Technical Report 79.
A new buddy system is described in which the region of storage being managed is partitioned into two sub-regions, each managed by a fairly standard “binary” buddy system. Like the weighted buddy systems of Shen and Peterson, the block sizes are of sizes 2n+1 or 3·2n, but unlike theirs there is no extra overhead for typing information or for buddy calculation, and an allocation which requires splitting an extant available block only rarely creates a block smaller than the one being allocated. Such smaller blocks are carved out only when the boundary between the two subregions floats; the most interesting property of this system is that the procedures for allocation and deallocation are designed to keep blocks immediately adjacent to the subregion boundary free, so that the boundary may be moved within a range of unused space without disturbing blocks in use. This option is attained with a minimum of extra computation beyond that of a binary buddy system, and provides this scheme with a new approach to the problem of external fragmentation.
David S. Wise. 1979. “Morris’s garbage compaction algorithm restores reference counts”. TOPLAS. 1, 1 (July 1979): 115–120.
The two-pass compaction algorithm of F.L. Morris, which follows upon the mark phase in a garbage collector, may be modified to recover reference counts for a hybrid storage management system. By counting the executions of two loops in that algorithm where upward and downward references, respectively, are forwarded to the relocation address of one node, we can initialize a count of active references and then update it but once. The reference count may share space with the mark bit in each node, but it may not share the additional space required in each pointer by Morris’s algorithm, space which remains unused outside the garbage collector.
David S. Wise. 1985. “Design for a multiprocessing heap with on-board reference counting”. Springer-Verlag. In J.-P. Jouannaud (ed.), Functional Programming Languages and Computer Architecture, Lecture Notes in Computer Science 201: 289–304.
A project to design a pair of memory chips with a modicum of intelligence is described. Together, the two allow simple fabrication of a small memory bank, a heap of binary (LISP-like) nodes that offers the following features: 64-bit nodes; two pointer fields per node up to 29 bits each; reference counts implicitly maintained on writes; 2 bits per node for marking (uncounted) circular references; 4 bits per node for conditional-store testing at the memory; provision for processor-driven, recounting garbage collection.
David S. Wise. 1993. “Stop-and-copy and one-bit reference counting”. Information Processing Letters. 46, 5 (July 1993): 243–249.
A stop-and-copy garbage collector updates one-bit reference counting with essentially no extra space and minimal memory cycles beyond the conventional collection algorithm. Any object that is uniquely referenced during a collection becomes a candidate for cheap recovery before the next one, or faster recopying then if it remains uniquely referenced. Since most objects stay uniquely referenced, subsequent collections run faster even if none are recycled between garbage collections. This algorithm extends to generation scavenging, it admits uncounted references from roots, and it corrects conservatively stuck counters, that result from earlier uncertainty whether references were unique.
David S. Wise, Joshua Walgenbach. 1996. “Static and Dynamic Partitioning of Pointers as Links and Threads”. SIGPLAN. Proc. 1996 ACM SIGPLAN Intl. Conf. on Functional Programming, SIGPLAN Not. 31, 6 (June 1996), pp. 42–49.
Identifying some pointers as invisible threads, for the purposes of storage management, is a generalization from several widely used programming conventions, like threaded trees. The necessary invariant is that nodes that are accessible (without threads) emit threads only to other accessible nodes. Dynamic tagging or static typing of threads ameliorates storage recycling both in functional and imperative languages.
We have seen the distinction between threads and links sharpen both hardware- and software-supported storage management in SCHEME, and also in C. Certainly, therefore, implementations of languages that already have abstract management and concrete typing, should detect and use this as a new static type.
David S. Wise, Brian Heck, Caleb Hess, Willie Hunt, Eric Ost. 1997. “Uniprocessor Performance of a Reference-Counting Hardware Heap”. LISP and Symbolic Computation. 10, 2 (July 1997), pp. 159–181.
A hardware self-managing heap memory (RCM) for languages like LISP, SMALLTALK, and JAVA has been designed, built, tested and benchmarked. On every pointer write from the processor, reference-counting transactions are performed in real time within this memory, and garbage cells are reused without processor cycles. A processor allocates new nodes simply by reading from a distinguished location in its address space. The memory hardware also incorporates support for off-line, multiprocessing, mark-sweep garbage collection.
Performance statistics are presented from a partial implementation of SCHEME over five different memory models and two garbage collection strategies, from main memory (no access to RCM) to a fully operational RCM installed on an external bus. The performance of the RCM memory is more than competitive with main memory.
P. Tucker Withington. 1991. “How Real is ‘Real-Time’ Garbage Collection?”. ACM. OOPSLA/ECOOP ‘91 Workshop on Garbage Collection in Object-Oriented Systems.
A group at Symbolics is developing a Lisp runtime kernel, derived from its Genera operating system, to support real-time control applications. The first candidate application has strict response-time requirements (so strict that it does not permit the use of paged virtual memory). Traditionally, Lisp’s automatic storage-management mechanism has made it unsuitable to real-time systems of this nature. A number of garbage collector designs and implementations exist (including the Genera garbage collector) that purport to be “real-time”, but which actually have only mitigated the impact of garbage collection sufficiently that it usually goes unnoticed by humans. Unfortunately, electro-mechanical systems are not so forgiving. This paper examines the limitations of existing real-time garbage collectors and describes the avenues that we are exploring in our work to develop a CLOS-based garbage collector that can meet the real-time requirements of real real-time systems.
G. May Yip. 1991. “Incremental, Generational Mostly-Copying Garbage Collection in Uncooperative Environments”. Digital Equipment Corporation.
The thesis of this project is that incremental collection can be done feasibly and efficiently in an architecture and compiler independent manner. The design and implementation of an incremental, generational mostly-copying garbage collector for C++ is presented. The collector achieves, simultaneously, real-time performance (from incremental collection), low total garbage collection delay (from generational collection), and the ability to function without hardware and compiler support (from mostly-copying collection).
The incremental collector runs on commercially-available uniprocessors, such as the DECStation 3100, without any special hardware support. It uses UNIX’s user controllable page protection facility (mprotect) to synchronize between the scanner (of the collector) and the mutator (of the application program). Its implementation does not require any modification to the C++ compiler. The maximum garbage collection pause is well within the 100-millisecond limit imposed by real-time applications executing on interactive workstations. Compared to its non-incremental version, the total execution time of the incremental collector is not adversely affected.
Taiichi Yuasa. 1990. “Real-Time Garbage Collection on General-Purpose Machines”. Journal of Software and Systems. 11:3 pp. 181–198.
An algorithm for real-time garbage collection is presented, proved correct, and evaluated. This algorithm is intended for list-processing systems on general-purpose machines, i.e., Von Neumann style serial computers with a single processor. On these machines, real-time garbage collection inevitably causes some overhead on the overall execution of the list-processing system, because some of the primitive list-processing operations must check the status of garbage collection. By removing such overhead from frequently used primitives such as pointer references (e.g., Lisp car and cdr) and stack manipulations, the presented algorithm reduces the execution overhead to a great extent. Although the algorithm does not support compaction of the whole data space, it efficiently supports partial compaction such as array relocation.
Benjamin Zorn & Paul Hilfinger. 1988. “A Memory Allocation Profiler for C and Lisp Programs”. USENIX. Proceedings for the Summer 1988 USENIX Conference, pp. 223–237.
This paper describes inprof, a tool used to study the memory allocation behavior of programs. mprof records the amount of memory each function allocates, breaks down allocation information by type and size, and displays a program’s dynamic cal graph so that functions indirectly responsible for memory allocation are easy to identify. mprof is a two-phase tool. The monitor phase is linked into executing programs and records information each time memory is allocated. The display phase reduces the data generated by the monitor and displays the information to the user in several tables. mprof has been implemented for C and Kyoto Common Lisp. Measurements of these implementations are presented.
Benjamin Zorn. 1989. “Comparative Performance Evaluation of Garbage Collection Algorithms”. Computer Science Division (EECS) of University of California at Berkeley. Technical Report UCB/CSD 89/544 and PhD thesis.
This thesis shows that object-level, trace-driven simulation can facilitate evaluation of language runtime systems and reaches new conclusions about the relative performance of important garbage collection algorithms. In particular, I reach the unexpected conclusion that mark-and-sweep garbage collection, when augmented with generations, shows comparable CPU performance and much better reference locality than the more widely used copying algorithms. In the past, evaluation of garbage collection algorithms has been limited by the high cost of implementing the algorithms. Substantially different algorithms have rarely been compared in a systematic way.
With the availability of high-performance, low-cost workstations, trace-driven performance evaluation of these algorithms is now economical. This thesis describes MARS, a runtime system simulator that is driven by operations on program objects, and not memory addresses. MARS has been attached to a commercial Common Lisp system and eight large Lisp applications are used in the thesis as test programs. To illustrate the advantages of the object-level tracing technique used by MARS, this thesis compares the relative performance of stop-and-copy, incremental, and mark-and-sweep collection algorithms, all organized with multiple generations. The comparative evaluation is based on several metrics: CPU overhead, reference locality, and interactive availability.
Mark-and-sweep collection shows slightly higher CPU overhead than stop-and-copy ability (5 percent), but requires significantly less physical memory to achieve the same page fault rate (30-40 percent). Incremental collection has very good interactive availability, but implementing the read barrier on stock hardware incurs a substantial CPU overhead (30-60 percent). In the future, I will use MARS to investigate other performance aspects of sophisticated runtime systems.
Benjamin Zorn. 1990. “Comparing Mark-and-sweep and Stop-and-copy Garbage Collection”. ACM. Conference on Lisp and Functional Programming, pp. 87–98.
Stop-and-copy garbage collection has been preferred to mark-and-sweep collection in the last decade because its collection time is proportional to the size of reachable data and not to the memory size. This paper compares the CPU overhead and the memory requirements of the two collection algorithms extended with generations, and finds that mark-and-sweep collection requires at most a small amount of additional CPU overhead (3-6%) but requires an average of 20% (and up to 40%) less memory to achieve the same page fault rate. The comparison is based on results obtained using trace-driven simulation with large Common Lisp programs.
Benjamin Zorn. 1990. “Barrier Methods for Garbage Collection”. University of Colorado at Boulder. Technical Report CU-CS-494-90.
Garbage collection algorithms have been enhanced in recent years with two methods: generation-based collection and Baker incremental copying collection. Generation-based collection requires special actions during certain store operations to implement the “write barrier”. Incremental collection requires special actions on certain load operations to implement the “read barrier”. This paper evaluates the performance of different implementations of the read and write barriers and reaches several important conclusions. First, the inlining of barrier checks results in surprisingly low overheads, both for the write barrier (2%-6%) and the read barrier (< 20%). Contrary to previous belief, these results suggest that a Baker-style read barrier can be implemented efficiently without hardware support. Second, the use of operating system traps to implement garbage collection methods results in extremely high overheads because the cost of trap handling is so high. Since this large overhead is completely unnecessary, operating system memory protection traps should be reimplemented to be as fast as possible. Finally, the performance of these approaches on several machine architectures is compared to show that the results are generally applicable.
Benjamin Zorn. 1991. “The Effect of Garbage Collection on Cache Performance”. University of Colorado at Boulder. Technical Report CU-CS-528-91.
Cache performance is an important part of total performance in modern computer systems. This paper describes the use of trace-driven simulation to estimate the effect of garbage collection algorithms on cache performance. Traces from four large Common Lisp programs have been collected and analyzed with an all-associativity cache simulator. While previous work has focused on the effect of garbage collection on page reference locality, this evaluation unambiguously shows that garbage collection algorithms can have a profound effect on cache performance as well. On processors with a direct-mapped cache, a generation stop-and-copy algorithm exhibits a miss rate up to four times higher than a comparable generation mark-and-sweep algorithm. Furthermore, two-way set-associative caches are shown to reduce the miss rate in stop-and-copy algorithms often by a factor of two and sometimes by a factor of almost five over direct-mapped caches. As processor speeds increase, cache performance will play an increasing role in total performance. These results suggest that garbage collection algorithms will play an important part in improving that performance.
Benjamin Zorn & Dirk Grunwald. 1992. “Empirical Measurements of Six Allocation-intensive C Programs”. ACM, SIGPLAN. SIGPLAN notices, 27(12):71–80.
Dynamic memory management is an important part of a large class of computer programs and high-performance algorithms for dynamic memory management have been, and will continue to be, of considerable interest. This paper presents empirical data from a collection of six allocation-intensive C programs. Extensive statistics about the allocation behavior of the programs measured, including the distributions of object sizes, lifetimes, and interarrival times, are presented. This data is valuable for the following reasons: first, the data from these programs can be used to design high-performance algorithms for dynamic memory management. Second, these programs can be used as a benchmark test suite for evaluating and comparing the performance of different dynamic memory management algorithms. Finally, the data presented gives readers greater insight into the storage allocation patterns of a broad range of programs. The data presented in this paper is an abbreviated version of more extensive statistics that are publicly available on the internet.
Benjamin Zorn. 1993. “The Measured Cost of Conservative Garbage Collection”. Software – Practice and Experience. 23(7):733–756.
Because dynamic memory management is an important part of a large class of computer programs, high-performance algorithms for dynamic memory management have been, and will continue to be, of considerable interest. Experience indicates that for many programs, dynamic storage allocation is so important that programmers feel compelled to write and use their own domain-specific allocators to avoid the overhead of system libraries. Conservative garbage collection has been suggested as an important algorithm for dynamic storage management in C programs. In this paper, I evaluate the costs of different dynamic storage management algorithms, including domain-specific allocators; widely-used general-purpose allocators; and a publicly available conservative garbage collection algorithm. Surprisingly, I find that programmer enhancements often have little effect on program performance. I also find that the true cost of conservative garbage collection is not the CPU overhead, but the memory system overhead of the algorithm. I conclude that conservative garbage collection is a promising alternative to explicit storage management and that the performance of conservative collection is likely to be improved in the future. C programmers should now seriously consider using conservative garbage collection instead of malloc/free in programs they write.
Benjamin Zorn & Dirk Grunwald. 1994. “Evaluating Models of Memory Allocation”. ACM. Transactions on Modeling and Computer Simulation 4(1):107–131.
Because dynamic memory management is an important part of a large class of computer programs, high-performance algorithms for dynamic memory management have been, and will continue to be, of considerable interest. We evaluate and compare models of the memory allocation behavior in actual programs and investigate how these models can be used to explore the performance of memory management algorithms. These models, if accurate enough, provide an attractive alternative to algorithm evaluation based on trace-driven simulation using actual traces. We explore a range of models of increasing complexity including models that have been used by other researchers. Based on our analysis, we draw three important conclusions. First, a very simple model, which generates a uniform distribution around the mean of observed values, is often quite accurate. Second, two new models we propose show greater accuracy than those previously described in the literature. Finally, none of the models investigated appear adequate for generating an operating system workload.