Richard Jones

with a chapter on Distributed Garbage Collection by Rafael Lins

John Wiley & Sons, Ltd, ISBN 0-471-94148-4
403 pages, hardback


Preface

This book is about garbage collection: the automatic reclamation of heap-allocated storage after its last use by a program. Memory is and has always been a scarce and precious resource. In the early days of computing, before the advent of very large scale integrated circuits, memory was expensive and even a time-sharing operating system like Unix was expected to run in just one 64-kilobyte segment. Today, although SIMM memory modules are comparatively cheap to buy and easy to install, programs are increasingly profligate in their consumption of this resource. Microsoft Windows'95, an operating system for a single-user personal computer, needs more than twelve megabytes of RAM to operate optimally. Thus main memory alone may account for half the cost of a PC. As with all precious resources, memory needs to be carefully managed and recycled when no longer needed.

The storage requirements of many computer programs are simple and predictable. Allocation and deallocation of memory for such programs can be handled by the programmer or the compiler, and indeed is best done so. Other programs have grown enormously in size and complexity. Languages like Lisp and Prolog typically manipulate large data structures with complex inter-dependencies. Functional and logic languages have complex patterns of execution. The result is that the useful lifetimes of many data structures can no longer be determined before execution, either by programmer or by compiler. Automatic storage reclamation is essential.

One reflection of the growing importance of garbage collection is the level of debate on this subject within the computer science community. As well as individual papers in journals and conferences, there have been workshops on garbage collection at the 1990, 1991 and 1993 Object-Oriented Systems, Languages and Applications OOPSLA conferences, as well as international workshops exclusively devoted to the topic in 1992 and 1995. Garbage collection is also a perennially popular subject for extended argument on Usenet news groups.

Object-orientation is the strongest growing area of interest in analysis, design and programming today. The key to good software engineering is the control of complexity. One of the ways that object-oriented design achieves this goal is the encapsulation of abstractions into objects that communicate through clearly defined interfaces. Programmer-controlled storage management inhibits this modularity. For this reason, most modern object-oriented languages, such as Smalltalk, Eiffel, Java and Dylan, are supported by garbage collection. Today, even languages used in part for systems programming, such as Modula-3 and Oberon, provide garbage collection for these sound but pragmatic reasons. Garbage collecting libraries are even available for such uncooperative languages as C and C++.

The audience

The literature on garbage collection is enormous. Well over a thousand journal articles, chapters in books, presentations to conferences, technical reports and postgraduate theses have been written on the subject. Despite this many myths about garbage collectors prevail. `They are only necessary for Lisp and functional languages; they can only be used with interpreters rather than for compiled code; they place an intolerable overhead on programs' - and doubtless they have cloven hooves and forked tails as well! Two corollaries follow. First, garbage-collected solutions are often ignored where they could profitably be applied. Second, where the complexities of the data structures involved demand garbage collection, the experience provided in the literature is often unfamiliar so a wheel is reinvented.

The aim of this book is to draw this wealth of experience together into a single, accessible and unified framework. State of the art techniques are described and compared for declarative and imperative programming styles, for sequential, concurrent and distributed architectures. Each of the most important algorithms is explained in detail, often with illustrations of its characteristic features and animations of its use. Its complexity, performance, applicability and relationship with other related algorithms is also discussed.

We believe that this survey should prove useful to postgraduate students and researchers working in Compiler Construction; Functional, Logic and Object-oriented Programming and Design; Software Engineering; and Operating Systems. The book should also be of interest to students taking advanced courses in these areas. We hope that professionals developing programs - from simple software tools to complex real-time systems - will find this book valuable. In particular, the rapid growth in popularity of object-oriented systems over the past few years makes a thorough understanding of garbage collection methods essential for any programmer in this area.

Organisation of the book

The first chapter begins with the evolution of computer memory management and the need for automatic storage reclamation. We then describe the representation of objects in our heap, and discuss the yardsticks by which different strategies of garbage collection may be measured. The chapter ends with a description of our pseudo-code notation.

Chapter 2 introduces the three `classical' techniques for garbage collection: reference counting, mark-sweep and copying collection. Readers with some experience of these techniques may wish to skip this chapter.

The next four chapters cover these styles of collection - and mark-compact collection - in more detail. Chapter 7 introduces generational garbage collection, a paradigm that has proved effective at reducing garbage collection pause times and overall costs in a wide range of applications, and Chapter 8 describes how garbage collection can be finely interleaved with the rest of a computation. Chapters 9 and 10 extend garbage collection to environments in which there is no support from the language compiler, C and C++ respectively. The next chapter of the book discusses a relatively new research area, the interaction of garbage collection with hardware data caches. Finally, Chapter 12 briefly surveys garbage collection for distributed systems.

We have included a summary of issues to consider at the end of each chapter. These summaries are intended to offer guidelines to the reader on the questions that should be answered about the collectors, the client program and the operating system and architecture before a garbage collector is chosen. These questions are designed as prompts to the reader. The summaries are certainly not a substitute for reading the appropriate chapter, and we have not attempted to provide `pat' solutions. Moreover, strategies of garbage collection (such as reference counting, mark-sweep or copying) introduced in earlier chapters are revisited in later ones. The characteristics and performance of naive implementations should not be mistaken for those of state of the art implementations of the same garbage collection strategy. Nevertheless we hope that these summaries will provide, rather than a `cook book', a focus for further analysis.

We should also declare what is missing from the book. The most effortless form of memory management is to do none at run-time. A considerable amount of research has gone into compile-time techniques to discover when objects can be discarded or reused. Most of this work has been theoretic and, as yet, we believe that there has been little evidence of substantial performance gains. We have omitted this material. Some techniques and tricks are language specific. While we have chosen to cover C++ because of its increasing popularity and growing realisation by many of its practitioners that garbage collection is sorely needed, we have concentrated in the main on generally applicable methods. Techniques that are specific to certain styles of programming, for example pure functional programming or logic programming, are only mentioned briefly.

Finally, energetic researchers who trawl through on-line bibliographic databases will discover papers on other garbage collection techniques and issues in their catch. We were intrigued by, but chose to ignore, burying garbage in landfills, incineration and dumping it at sea. The question of public health and garbage collection is also often raised - but language wars are another ball game altogether!

The Bibliography and the World-Wide Web

We mentioned earlier that over a thousand papers have been published on this topic. The bibliography at the end of this book is considerably shorter. However a comprehensive database is available electronically. This bibliography also contains some abstracts and URLs for electronically available papers. Richard Jones will endeavour to keep his bibliography up to date and would be most grateful to receive further entries (preferably in BibTeX format) as well as URLs for existing papers (and any corrections).

We have endeavoured to eradicate any errors from the code fragments presented in the book. While not having the courage to repeat Donald Knuth's offer of cash for errors reported, a list of any errors found will be maintained at this web site. Reports should be sent to Richard Jones either by email, or to

The Computing Laboratory,
University of Kent at Canterbury,
Canterbury,
Kent,
CT2 7NF,
UK.

Acknowledgements

This book would never have been completed with the encouragement, assistance and patience of many people. I would like to thank the Theoretical Computer Science research group at the University of Kent at Canterbury for cheering from the side-lines. In particular I am indebted to Simon Thompson for patiently reading and commenting on drafts of this book, and for all his encouragement. I would also like to thank Hans Boehm, Jacques Cohen, Keith Dimond, and David Turner for their comments and suggestions. I would also like to thank Martin Broom, Tim Hopkins, and Simon Thompson for their advice on how to wrestle LaTeX into submission. The two term study leave granted to me by the University of Kent and visits to the Federal University of Pernambuco, Recife, Brazil (and the caipirinha), funded by the British Council and CNPq-Brazil, were invaluable.

I am also grateful to Rafael Lins who originally conceived the idea for the book and wrote the chapter on Distributed garbage collection, as well as contributing to some other chapters. Acknowledgement must also be paid to all those - too numerous to mention - who have worked on garbage collection over the last thirty-six years.

Finally, and above all, I must thank Robbie, Helen, Kate and William without whose support and forbearance none of this would have been possible. For more than two years you have put up with me occupying the dining room claiming that it was my study; you have forgiven my bad temper; and you have graciously accepted that I could not come out to play. I thank you from the bottom of my heart.

Richard Jones
Canterbury
February 1996

I am most grateful to the many people who contributed in many different ways to make this book possible, and in particular to David Turner, Simon Thompson, Jon Salkild, Rosita Wachenchauzer, Alejandro Martinez and Marcia Correia. I am grateful to the Universidade Federal de Pernambuco, Recife, Brazil, for granting sabbatical leave, funded by CAPES-Brazil, and several visits to the University of Kent, funded by CNPq-Brazil and the British Council. The many friends at Kent made those visits ever so pleasant! I am also grateful for all the support and love I have received from Carmo, Gilka, Maria Teresa, Rilane and Silvia.

Rafael Lins
Recife
February 1996


Top of page the GC page Richard Jones' homepage
[ book ] [ GC page ] [ home ]

Copyright ©1996, Richard Jones