The Memory Management Reference
Articles
Memory management in various languages

 Contents | News | Glossary | FAQ | Articles | Bibliography | Links | Feedback

This is a survey of memory management in various programming languages. We can't hope to cover all the programming languages in the world, so we've chosen some that are widely-used or that have particularly interesting memory management features.

Each language and its memory management facilities are characterized briefly, with links to additional reference material, such as the glossary, the bibliography, and useful web sites. For a full explanation of any terms used, see the glossary. The bibliography's summary by language contains a complete list of work related in some way to particular languages.

Generally, it may be observed that most modern high-level languages have automatic memory management, with C++ as the notable exception. For object-oriented languages, like JavaTM, Common Lisp or Modula-3, it helps in designing systems with a high level of abstraction and low coupling between classes. For scripting languages, like Perl, it's a matter of convenience. And for interactive software development environments, it allows dynamic interaction with a running program, since neither the program nor the developer need to keep track of every block of memory.

It can also be seen that old languages, such as COBOL and Fortran, have grown over time to have more sophisticated memory management features included in the standard. The notable exception here is Lisp, which started off with a garbage collector already included.

Languages covered: ALGOL, BASIC, C, COBOL, Common Lisp, C++, DylanTM, Emacs Lisp, Fortran, JavaTM, JavaScriptTM, Lisp, ML, Modula-3, Pascal, Perl, PostScript®, Prolog, Scheme, Simula, Smalltalk.

ALGOL 

ALGOL, designed in 1958 for scientific computing, was the first block-structured language. It spawned a whole family of languages, and inspired many more, including Scheme, Simula and Pascal.

The block structure of ALGOL 60 induced a stack allocation discipline. It had limited dynamic arrays, but no general heap allocation. The substantially redesigned ALGOL 68 had both heap and stack allocation. It also had something like the modern pointer type, and required garbage collection for the heap. The new language was complex and difficult to implement, and it was never as successful as its predecessor.

Relevant publications:

BASIC 

BASIC is a simple and easily-learned programming language developed by T. E. Kurtz and J. G. Kemeny in 1963-4. The motivation was to make computers easily accessible to undergraduate students in all disciplines.

Most BASICs had quite powerful string handling operations that required a simple garbage collector. In many implementations, the garbage collector could be forced to run by running the mysterious expression FRE("").

BASIC is now old-fashioned, but survives as a scripting language, in particular in Visual BASIC®, which is an application development environment with a BASIC-like scripting language. These descendants invariably have automatic memory management as well.

C 

C is a systems programming language sometimes described as "a portable assembler" because it was intended to be sufficiently low-level to allow performance comparable to assembler or machine code, but sufficiently high-level to allow programs to be reused on other platforms with little or no modification.

Memory management is typically manual (the standard library functions for memory(2) management in C, malloc and free(2), have become almost synonymous with manual memory management), although with the Boehm-Weiser collector(1), it is now possible to use garbage collection.

The language is notorious for fostering large numbers of memory management bugs, including:

Related terms: automatic storage duration; static storage duration

Relevant publications:

Useful websites:

COBOL 

COBOL was designed by the CODASYL committee in 1959-60 to be a business programming language, and has been extended many times since. It is still the most widely-used programming language (in terms of lines of code in use).

COBOL has no heap allocation, and has done quite well in its domain without it. The next version of the ISO standard, slated for 2002, will have pointers and heap allocation through ALLOCATE and FREE, mainly in order to be able to use C-style interfaces. It will also support a high level of abstraction through object-oriented programming and garbage collection (probably including finalization).

Useful websites:

Common Lisp 

Common Lisp is the major dialect of the Lisp family. In addition to the usual Lisp features, it has an advanced object system, data types from hash tables to complex numbers, and a rich standard library.

Common Lisp is a garbage-collected language, and modern implementations, such as LispWorks® and Liquid Common Lisp®, include advanced features, such as finalization and weakness.

Useful websites:

C++ 

C++ is a (weakly) object-oriented language, extending the systems programming language C with a multiple-inheritance class mechanism and simple method dispatch.

The standard library functions for memory(2) management in C++ are new and delete. The higher abstraction level of C++ makes the bookkeeping required for manual memory management even harder. Although the standard library provides only manual memory management, with the Boehm-Weiser collector(1), it is now possible to use garbage collection. Smart pointers are another popular solution.

The language is notorious for fostering large numbers of memory management bugs, including:

Historical note: C++ was designed by Bjarne Stroustrup, as a minimal object-oriented extension to C. It has since grown to include some other modern programming language ideas. The first implementations were preprocessors that produced C code, but modern implementations are dedicated C++ compilers.

C programmers think memory management is too important to be left to the computer. Lisp programmers think memory management is too important to be left to the user.

(from Ellis and Stroustrup's The Annotated C++ Reference Manual)

Related terms: constructor(2); destructor(2)

Relevant publications:

Useful websites:

DylanTM 

Dylan is a modern programming language invented by Apple® around 1993 and developed by Harlequin and other partners. The language is a distillation of the best ideas in dynamic and object-oriented programming. Its ancestors include Lisp, Smalltalk, and C++. Dylan is aimed at building modular component software and delivering safe, compact applications. It also facilitates the rapid development and incremental refinement of prototype programs.

Dylan provides automatic memory management. The generic allocation function is called make. Most implementations provide finalization and weak(1) hash tables, although interfaces for these features have not yet been standardized. An object may be registered for finalization via the function finalize-when-unreachable, in which case there will be a call to the finalize function once the garbage collector has determined that the object is unreachable. Weak hash tables may have either weak keys or values, depending on a parameter supplied at allocation time. A hash table entry will be deleted once the garbage collector has determined that there are no strong references to the key or value of the entry, for weak key or value tables, respectively.

Useful websites:

Emacs Lisp 

Emacs Lisp or elisp is a dialect of Lisp used in the Emacs family of text editors, of which the most widely-used is GNU Emacs.

Like most Lisps, elisp requires garbage collection. GNU Emacs has a simple mark-sweep collector. It has been speculated that the non-incremental nature of this GC, combined with the fact that it prints a message whenever it collects, has given GC a bad name in programming circles.

Erik Naggum (erik@naggum.no) reports:

I have run some tests at the U of Oslo with about 100 users who generally agreed that Emacs had become faster in the latest Emacs pretest. All I had done was to remove the "Garbage collecting" message which people perceive as slowing Emacs down and tell them that it had been sped up. It is, somehow, permissible for a program to take a lot of time doing any other task than administrative duties like garbage collection.

Emacs was originally written in Teco, not in Lisp, but it still had a garbage collector, though this was heuristic and conservative in nature. Teco-based Emacs was capable of running for weeks at a time in a 256 kB address space.

Useful websites:

Fortran 

Fortran, created in 1957, was one of the first languages qualifying as a high-level language. It is popular among scientists and has substantial support in the form of numerical libraries. For a long time, it had static allocation only. The Fortran 90 standard added recursion with stack allocation (automatic arrays). It also added dynamic allocation using ALLOCATE with manual deallocation using DEALLOCATE.

Useful websites:

JavaTM 

A modern object-oriented language with a rich collection of useful features. The JavaTM language started as an attempt by the Java group at SunTM to overcome software engineering problems introduced by C++. Key reasons for the language's success were the security model and the portable execution environment, the Java Virtual Machine (JVM), which created a lot of interest for it as a platform for distributed computing on open networks.

Java is garbage-collected, as this facilitates object-oriented programming and is essential for security (which use after free would break). It had finalization from version 1.0 and three kinds of weakness from version 1.2 (confusingly, part of the JavaTM 2 Platform).

Early JVMs had simple collectors that didn't scale well for large programs, but the current crop is catching up to the state of the art.

Related terms: reference object; strong reference; soft reference; weak reference(2); phantom reference; strongly reachable; softly reachable; weakly reachable; phantom reachable

Useful websites:

JavaScriptTM 

JavaScriptTM is a scripting language used by some web browsers. The loose type system resembles other scripting languages, although the syntax follows C. There's a prototype-based object system. Note that JavaScript is not related to JavaTM in any way except name. There's a standard by ECMA, known as ECMAScript.

String management resembles BASIC: there's no way to manually deallocate strings created by string concatenation and string methods (one hopes they are garbage-collected, but some browsers have been known to leak them instead). Likewise, objects created by the C++-like new operator are automatically managed (the delete operator deletes properties (variables and fields), rather than values).

Useful websites:

Lisp 

Lisp is a family of computer languages combining functional and procedural features with automatic memory management.

Lisp was invented by John McCarthy around 1958 for the manipulation of symbolic expressions. As part of the original implementation of Lisp, he invented garbage collection. He noted:

This process, because it is entirely automatic, is more convenient for the programmer than a system in which he has to keep track of lists and erase unwanted lists.

Modern Lisp implementations, such as LispWorks® and Liquid Common Lisp®, have advanced garbage collectors.

Lisp is now used for all kinds of symbolic programming and other advanced software development. Major dialects today are Emacs Lisp, Common Lisp and Scheme. Most modern dialects and related languages, such as DylanTM, are object-oriented.

The Lisp Machines

Of particular interest in the history of memory management are the Lisp Machines, early workstation computers built around a custom processor designed to improve the execution speed of Lisp by implementing primitive Lisp operations in microcode. The Lisp Machine garbage collector is a generalization of the algorithm described in Baker's List Processing in Real Time on a Serial Computer and used a technique similar to that described in Ungar's Generation Scavenging: A Non-disruptive High Performance Storage Reclamation Algorithm, but utilizing hardware to improve performance.

A description of the garbage collector of one particular model is in Garbage Collection in a Large Lisp System. The features important for its performance were:

The remembered sets were based on a BIBOP division of the virtual address space. The Lisp Machine page-table, unlike virtually all modern virtual memory systems, was a flat, hash-based table (sometimes called an inverted page table), and thus insensitive to sparsely-populated virtual address spaces associated with BIBOP schemes.

These custom processors eventually lost out to rapidly advancing stock hardware. Many of the techniques pioneered on Lisp Machines are used in today's implementations, at a cost of a few more cycles.

Related terms: cons(1)

Relevant publications:

Useful websites:

ML 

ML is a family of strongly-typed functional languages, of which the principal members are Standard ML and Caml.

Like other functional languages, ML provides automatic memory management. Modern ML implementations usually have advanced garbage collectors. The combination of clean functional semantics and strong typing allows advanced techniques, such as region inference.

The Standard ML of New Jersey (SML/NJ) system, which implements a slight variant of Standard ML, has been important to memory management research for three reasons. Firstly, the source code is publicly available and widely ported, allowing experimentation with both the collector(2) and mutator. Secondly, the compiler generates code that does not use a control stack, but allocates function activation records on the heap instead. This means that the allocation rate is very high (up to one byte per instruction), and also that the collector has a very small root set. Thirdly, it uses a simple copying collector that is easy to modify.

Related terms: immutable

Relevant publications:

Useful websites:

Modula-3 

An object-oriented descendant of Pascal.

Modula-3 is mostly garbage-collected, although it is possible to use manual memory management in certain modules.

Useful websites:

Pascal 

An imperative language characterized by block structure and a relatively strong (for its time) static type system. Pascal was designed by Niklaus Wirth around 1970.

Pascal was popular as a teaching language due to its small size, but it lacked many features needed for applications programming. Now it's been largely supplanted by its more feature-rich descendants Modula-2, Modula-3, and Oberon, mainly surviving in the popular Delphi development tool.

Pascal uses manual memory management (with the operators NEW and DISPOSE). The descendants mentioned all offer automatic memory management.

Useful websites:

Perl 

Perl is a complex but powerful language that is an eclectic mixture of scripting languages and programming languages.

Perl programmers can work with strings, arrays, and associative arrays without having to worry about manual memory management. Perl is well-suited to complex text file manipulation, such as report generation, file format conversion, and web server CGI scripts. It is also useful for rapid prototyping, but large Perl scripts are often unmaintainable.

Perl's memory management is well-hidden, but is based on reference counts and garbage collection. It also has mortal variables, whose lifetimes are limited to the current context. It is possible to free(1) the memory(2) assigned to variables (including arrays) explicitly, by undef-ing the only reference to them.

Useful websites:

PostScript® 

The PostScript® language is an interpretive language with powerful graphics features, widely used as a page description language for printers and typesetters.

The Level 1 PostScript language has a simple stack-like memory management model, using save and restore operators to recycle memory. The Level 2 PostScript language adds garbage collection to this model.

Related terms: VM(2); composite object; simple object

Useful websites:

Prolog 

A logic programming language invented by Alain Colmerauer around 1970, Prolog is popular in the AI and symbolic computation community. It is special because it deals directly with relationships and inference rather than functions or commands.

Storage is usually managed using a garbage collector, but the complex control flow places special requirements on the collector.

Useful websites:

Scheme 

A small functional language blending influences from Lisp and Algol.

Key features of Scheme include symbol and list operations, heap allocation and garbage collection, lexical scoping with first-class function objects (implying closures), reliable tail-call elimination (allowing iterative procedures to be described tail-recursively), the ability to dynamically obtain the current continuation as a first-class object, and a language description that includes a formal semantics.

Scheme has been gaining popularity as an extension language; Project GNU's extension package of choice, Guile, is a Scheme interpreter. Garbage collection is an important part of the ease of use that is expected from an extension language.

Useful websites:

Simula 

Simula was designed as a language for simulation, but it expanded into a full general-purpose programming language and the first object-oriented language.

Simula I, designed in 1962-64 by Kristen Nygaard & Ole-Johan Dahl, was based on Algol 60, but the stack allocation discipline was replaced by a two-dimensional free-list.

It was Simula 67 that pioneered classes and inheritance to express behavior. This domain-oriented design was supported by garbage collection.

Relevant publications:

Smalltalk 

Smalltalk is an object-oriented language with single inheritance and message-passing.

Automatic memory management is an essential part of the Smalltalk philosophy. Many important techniques were first developed or implemented for Smalltalk.

Relevant publications:

Useful websites: