Ugawa, T., Ritson, C. and Jones, R. (2018). Transactional Sapphire: Lessons in High Performance, On-the-fly Garbage Collection. Transactions on Programming Languages and Systems[Online]40:15:1-15:56. Available at: http://dx.doi.org/10.1145/3226225.
Constructing a high-performance garbage collector is hard. Constructing a fully concurrent 'on-the-fly', compacting collector is much more so. We describe our experience of implementing the Sapphire algorithm as
the first on-the-fly, parallel, replication copying, garbage collector for the Jikes RVM Java virtual machine. In part, we explain our innovations such as copying with hardware and software transactions, on-the-fly management of Java's reference types and simple, yet correct, lock-free management of volatile fields in a
replicating collector. We fully evaluate, for the first time, and using realistic benchmarks, Sapphire's performance and suitability as a low latency collector. An important contribution of this work is a detailed description of our experience of building an on-the-fly copying collector for a complete JVM with some assurance that it is correct.
A key aspect of this is model checking of critical components of this complicated and highly concurrent system.
Blackburn, S. et al. (2016). The Truth, the Whole Truth, and Nothing but the Truth: A Pragmatic Guide to Assessing Empirical Evaluations. Transactions on Programming Languages and Systems (TOPLAS)38.
An unsound claim can misdirect a field, encouraging the pursuit of unworthy ideas and the abandonment of promising ideas. An inadequate description of a claim can make it difficult to reason about the claim, for example to determine whether the claim is sound. Many practitioners will acknowledge the threat of un- sound claims or inadequate descriptions of claims to their field. We believe that this situation is exacerbated and even encouraged by the lack of a systematic approach to exploring, exposing, and addressing the source of unsound claims and poor exposition.
This paper proposes a framework that identifies three sins of reasoning that lead to unsound claims and two sins of exposition that lead to poorly described claims. Sins of exposition obfuscate the objective of determining whether or not a claim is sound, while sins of reasoning lead directly to unsound claims.
Our framework provides practitioners with a principled way of critiquing the integrity of their own work and the work of others. We hope that this will help individuals conduct better science and encourage a cultural shift in our research community to identify and promulgate sound claims.
Jones, R. and Blackburn, S. (2008). International Symposium on Memory Management (ISMM 2008) summary. ACM SIGPLAN Notices[Online]43:12-14. Available at: http://dx.doi.org/10.1145/1416216.1416220.
The Java RMI collector is arguably the most widely used distributed garbage collector. Its distributed reference listing algorithm was introduced by Birrell in the context of Network Objects, where the description was informal and heavily biased toward implementation. In this paper, we formalise this algorithm in an implementation-independent manner, which allows us to clarify weaknesses of the initial presentation. In particular, we discover cases critical to the correctness of the algorithm that are not accounted for by Birrell. We use our formalisation to derive an invariant-based proof of correctness of the algorithm that avoids notoriously difficult temporal reasoning. Furthermore, we offer a novel graphical representation of the state transition diagram, which we use to provide intuitive explanations of the algorithm and to investigate its tolerance to faults in a systematic manner. Finally, we examine how the algorithm may be optimised, either by placing constraints on message channels or by tightening the coupling between application program and distributed garbage collector.
Hartel, P. et al. (1996). Benchmarking Implementations of Functional Languages with `Pseudoknot', a Float-Intensive Benchmark. Journal of Functional Programming6:621-655.
Over 25 implementations of different functional languages are benchmarked using the same program, a floating-point intensive application taken from molecular biology. The principal aspects studied are compile time and execution time for the various implementations that were benchmarked. An important consideration is how the program can be modified and tuned to obtain maximal performance on each language implementation. With few exceptions, the compilers take a significant amount of time to compile this program, though most compilers were faster than the then current GNU C compiler (GCC version 2.5.8). Compilers that generate C or Lisp are often slower than those that generate native code directly: the cost of compiling the intermediate form is normally a large fraction of the total compilation time. There is no clear distinction between the runtime performance of eager and lazy implementations when appropriate annotations are used: lazy implementations have clearly come of age when it comes to implementing largely strict applications, such as the Pseudoknot program. The speed of C can be approached by some implementations, but to achieve this performance, special measures such as strictness annotations are required by non-strict implementations. The benchmark results have to be interpreted with care. Firstly, a benchmark based on a single program cannot cover a wide spectrum of `typical' applications. Secondly, the compilers vary in the kind and level of optimisations offered, so the effort required to obtain an optimal version of the program is similarly varied.
Brown, P. and Jones, R. (1992). Marking EP coursework using electronic communication. Electronic Publishing: Origination, Dissemination and Design5.
Jones, R. (1995). Introducing More Systematic Quality Assurance Arrangements for Course Approval and Review. in:Middlehurst, R. ed.Managing for Quality: Stories and Strategies.Higher Education Quality Council, pp. 15-17.
Measuring performance and quantifying a performance change are core evaluation techniques in programming language and systems research. Out of 122 recent scientific papers published at PLDI, ASPLOS, ISMM, TOPLAS, and TACO, as many as 65 included experimental evaluation that quantified a performance change using a ratio of execution times. Unfortunately, few of these papers evaluated their results with the level of rigour that has come to be expected in other experimental sciences. The uncertainty of measured results was largely ignored. Scarcely any of the papers mentioned uncertainty in the ratio of the mean execution times, and most did not even mention uncertainty in the two means themselves. Furthermore, most of the papers failed to address the non-deterministic execution of computer programs (caused by factors such as memory placement, for example), and none addressed non-deterministic compilation (when a compiler creates different binaries from the same sources, which differ in performance, for example again because of impact on memory placement). It turns out that the statistical methods presented in the computer systems performance evaluation literature for the design and summary of experiments do not readily allow this either. This poses a hazard to the repeatability, reproducibility and even validity of quantitative results. Inspired by statistical methods used in other fields of science, and building on results in statistics that did not make it to introductory textbooks, we present a statistical model that allows us both to quantify uncertainty in the ratio of (execution time) means and to design experiments with a rigorous treatment of those multiple sources of non-determinism that might impact measured performance. Better still, under our framework summaries can be as simple as ''system A is faster than system B by 5.5 ± 2.5, with 95 confidence'', a more natural statement than those derived from typical current practice, which are often misinterpreted.
Hellyer, L., Jones, R. and Hosking, A. (2010). The Locality of Concurrent Write Barriers (extended version). Available at: http://www.cs.kent.ac.uk/pubs/2010/3011.
Concurrent and incremental collectors require barriers to ensure correct synchronisation between mutator and collector. The overheads imposed by particular barriers on particular systems have been widely studied. Somewhat fewer studies have also compared barriers in terms of their termination properties or the volume of floating garbage they generate. Until now, the consequences for locality of different barrier choices has not been studied, although locality will be of increasing importance for emerging architectures. This report expands upon our ISMM 2010 paper and provides a study of the locality of concurrent write barriers, independent of the processor architecture, virtual machine, compiler or garbage collection algorithm.
Jones, R. and King, A. (2004). Collecting the garbage without blocking the traffic. University of Kent. Available at: http://www.cs.kent.ac.uk/pubs/2004/1970/.
Java is an increasingly common platform for server-side applications. Such applications are usually long-running, heavily multi-threaded, require very large heaps, executed on multiprocessors, load classes dynamically and make stringent demands of garbage collector performance. Synchronisation of all application threads in order to perform a collection is shown to be a significant bottleneck but current methods fail to solve this issue.
We show how a combination of a new static analysis and novel garbage collector framework can address this issue by allowing independent collection of thread-local heaps. In contrast to previous work, our analysis can classify objects even in the presence of incomplete knowledge; our system is safe in the presence of dynamic class loading; it requires neither synchronisation for nor locks during thread-local collections; and it does not use a write-barrier that may do an unbounded work. Finally, our analysis is sufficiently fast to permit its integration into a high-performance, production-quality virtual machine.
Jones, R. (2003). DO garbage collection. Kent University. Available at: http://www.cs.kent.ac.uk/pubs/2003/1587.
GCspy is an architectural framework for the collection, transmission, storage and replay of memory management behaviour. It makes new contributions to the understanding of the dynamic memory behaviour of programming languages (and especially object-oriented languages that make heavy demands on the performance of memory managers). GCspy's architecture allows easy incorporation into any memory management system: it is not limited to garbage-collected languages. It requires only small changes to the system in which it is incorporated but provides a simple to use yet powerful data-gathering API. GCspy scales to allow very large heaps to be visualised effectively and efficiently. It allows already-running, local or remote, systems to be visualised and those systems to run at full speed outside the points at which data is gathered. GCspy's visualisation tool presents this information in a number of novel ways. Deep understanding of program behaviour is essential to the design of the next generation of garbage collectors and explicit allocators. Until now no satisfactory tools have been available to assist the implementer in gaining an understanding of heap behaviour. GCspy has been demonstrated to be a practical solution to this dilemma. It has been used to analyse production Java virtual machines running applications of realistic size. Its use has revealed important insights into the interaction between application program and JVM and has led to the development of better garbage collectors.
Rodrigues, H. and Jones, R. (1997). Cyclic Distributed Garbage Collection with Group Merger. University of Kent at Canterbury. Available at: http://www.cs.kent.ac.uk/pubs/1997/548.
This paper presents a new algorithm for distributed garbage collection and outlines its implementation within the Network Objects system. The algorithm is based on a reference listing scheme, which is augmented by partial tracing in order to collect distributed garbage cycles. Our collector is designed to be flexible, allowing efficiency, expediency and fault-tolerance to be traded against completeness. Processes may be dynamically organised into groups, according to appropriate heuristics, in order to reclaim distributed garbage cycles. Unlike previous group-based algorithms, multiple concurrent distributed garbage collections that span groups are supported: when two collection meet they may either merge, overlap or retreat. The algorithm places no overhead on local collectors and suspends local mutators only briefly. Partial tracing of the distributed graph involves only objects thought to be part of a garbage cycle: no collaboration with other processes is required.
Thomas, S. and Jones, R. (1994). Garbage Collecting Shared Environment Closure Reducers. Computing Laboratory, University of Kent at Canterbury. Available at: http://dx.doi.org/10.1016/0020-0190(95)00131-U.
Shared environment closure reducers such as Fairbairn and Wray's TIM incur a comparatively low cost when creating a suspension, and so provide an elegant method for implementing lazy functional evaluation. However, comparatively little attention has been given to the problems involved in identifying which portions of a shared environment are needed (and ignoring those which are not) during a garbage collection. Proper consideration of this issue has subtle consequences when implementing a storage manager in a TIM-like system. We describe the problem and illustrate the negative consequences of ignoring it. We go on to describe a solution in which the compiler determines statically which portions of that code's environment are required for each piece of code it generates, and emits information to assist the run-time storage manager to scavenge environments selectively. We also describe a technique for expressing this information directly as executable code, and demonstrate that a garbage collector implemented in this way can perform significantly better than an equivalent, table-driven interpretive collector.
Jones, R. and Utting, I. (1993). Teaching Electronic Publishing: Learning Software Engineering. University of Kent, Canterbury, UK. Available at: http://www.cs.kent.ac.uk/pubs/1993/98/index.html.
This paper describes an experience of teaching electronic publishing to computer scientists. It takes the novel position of using electronic publishing concepts to support computer science education, rather than the more usual reverse.
Jones, R. and Lins, R. (1992). Cyclic Weighted Reference Counting without Delay. UKC. Available at: http://www.cs.kent.ac.uk/pubs/1992/122/index.html.
Weighted Reference Counting is a low communication distributed storage reclamation scheme for loosely-couple multiprocessors. The algorithm we present herein extends weighted reference counting to allow the collection of cyclic data structures. To do so, the algorithm identifies candidate objects that may be part of cycles and performs a tricolour mark-scan on their subgraph in a lazy manner to discover whether the subgraph is still in use. The algorithm is concurrent in the sense that multiple useful computation processes and garbage collection processes can be performed simultaneously.
Lins, R. and Jones, R. (1991). Cyclic Weighted Reference Counting. UKC.
Mercier, D., Chawdhary, A. and Jones, R. (2017). dynStruct: An automatic reverse engineering tool for structure recovery and memory use analysis. in:IEEE 24th International Conference on Software Analysis, Evolution and Reengineering (SANER).Institute of Electrical and Electronics Engineers, pp. 497-501. Available at: http://dx.doi.org/10.1109/SANER.2017.7884661.
dynStruct is an open source structure recovery tool for ×86 binaries. It uses dynamic binary instrumentation to record information about memory accesses, which is then processed off-line to recover structures created and used by the binary. It provides a powerful web interface which not only displays the raw data and the recovered structures but also allows this information to be explored and manually edited. dynStruct is an effective tool for analyzing programs as complex as emacs.
Ritson, C., Ugawa, T. and Jones, R. (2014). Exploring Garbage Collection with Haswell Hardware Transactional Memory. in:ACM/SIGPLAN International Symposium on Memory Management (ISMM14).New York: ACM, pp. 105-115. Available at: http://dx.doi.org/10.1145/2602988.2602992.
Intel's latest processor microarchitecture, Haswell, adds support for a restricted form of transactional memory to the x86 programming model. We explore how this can be applied to three garbage collection scenarios in Jikes RVM: parallel copying, concurrent copying and bitmap marking. We demonstrate gains in concurrent copying speed over traditional synchronisation mechanisms of 48–101%. We also show how similar but portable performance gains can be achieved through software transactional memory techniques. We identify the architectural overhead of capturing sufficient work for transactional execution as a major stumbling block to the effective use of transactions in the other scenarios.
Ugawa, T., Jones, R. and Ritson, C. (2014). Reference Object Processing in On-The-Fly Garbage Collection. in:ACM/SIGPLAN International Symposium on Memory Management (ISMM14).New York: ACM, pp. 59-69. Available at: http://dx.doi.org/10.1145/2602988.2602991.
Most proposals for on-the-fly garbage collection ignore the ques- tion of Java's weak and other reference types. However, we show that reference types are heavily used in DaCapo benchmarks. Of the few collectors that do address this issue, most block mutators, either globally or individually, while processing reference types. We introduce a new framework for processing reference types on- the-fly in Jikes RVM. Our framework supports both insertion and deletion write barriers. We have model checked our algorithm and incorporated it in our new implementation of the Sapphire on-the- fly collector. Using a deletion barrier, we process references while mutators are running in less than three times the time that previous approaches take while mutators are halted; our overall execution times are no worse, and often better.
Ugawa, T., Jones, R. and Ritson, C. (2014). An On-The-Fly Copying Garbage Collection Framework for Jikes RVM. in:12th Asian Symposium on Programming Languages and Systems.
We propose a new, principled approach to adaptive heap sizing based on control theory. We review current state-of-the-art heap sizing mechanisms, as deployed in Jikes RVM and HotSpot. We then formulate heap sizing as a control problem, apply and tune a standard controller algorithm, and evaluate its performance on a set of well-known benchmarks. We find our controller adapts the heap size more responsively than existing mechanisms. This responsiveness allows tighter virtual machine memory footprints while preserving target application throughput, which is ideal for both embedded and utility computing domains. In short, we argue that formal, systematic approaches to memory management should be replacing ad-hoc heuristics as the discipline matures.
Kalibera, T. and Jones, R. (2013). Rigorous Benchmarking in Reasonable Time. in:ACM SIGPLAN International Symposium on Memory Management (ISMM 2013).New York: ACM, pp. 63-74.
Experimental evaluation is key to systems research. Because modern systems are complex and non-deterministic, good experimental methodology demands that researchers account for uncertainty. To obtain valid results, they are expected to run many iterations of benchmarks, invoke virtual machines (VMs) several times, or even rebuild VM or benchmark binaries more than once. All this repetition costs time to complete experiments. Currently, many evaluations give up on sufficient repetition or rigorous statistical methods, or even run benchmarks only in training sizes. The results reported often lack proper variation estimates and, when a small difference between two systems is reported, some are simply unreliable.
In contrast, we provide a statistically rigorous methodology for repetition and summarising results that makes efficient use of experimentation time. Time efficiency comes from two key observations. First, a given benchmark on a given platform is typically prone to much less non-determinism than the common worst-case of published corner-case studies. Second, repetition is most needed where most uncertainty arises (whether between builds, between executions or between iterations). We capture experimentation cost with a novel mathematical model, which we use to identify the number of repetitions at each level of an experiment necessary and sufficient to obtain a given level of precision.
We present our methodology as a cookbook that guides researchers on the number of repetitions they should run to obtain reliable results. We also show how to present results with an effect size confidence interval. As an example, we show how to use our methodology to conduct throughput experiments with the DaCapo and SPEC CPU benchmarks on three recent platforms.
NOTE: this version corrects the ISMM 2013 version
Kalibera, T. et al. (2012). A Black-box Approach to Understanding Concurrency in DaCapo. in:Dwyer, M. and Leavens, G. T. eds.Proceedings of the ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages and Applications (OOPSLA).Tucson, AZ, USA: ACM. Available at: http://www.cs.kent.ac.uk/pubs/2012/3246.
Increasing levels of hardware parallelism are one of the main challenges for programmers and implementers of managed runtimes. Any concurrency or scalability improvements must be evaluated experimentally. However, application benchmarks available today may not reflect the highly concurrent applications we anticipate in the future. They may also behave in ways that VM developers do not expect. We provide a set of platform independent concurrency-related metrics and an in-depth observational study of current state of the art benchmarks, discovering how concurrent they really are, how they scale the work and how they synchronise and communicate via shared memory.
Singer, J. and Jones, R. (2011). Economic Utility Theory for Memory Management Optimization. in:Rogers, I. ed.Proceedings of the workshop on Implementation, Compilation, Optimization of Object-Oriented Languages and Programming Systems.ACM, pp. 182-196. Available at: http://www.cs.kent.ac.uk/pubs/2011/3156.
In this position paper, we examine how economic theory can be applied to memory management. We observe the correspondence between the economic notion of a consumer and an instance of a virtual machine running a single program in an isolated heap. Economic resource consumption corresponds to the virtual machine requesting and receiving increased amounts of heap memory from the underlying operating system. As more memory is allocated to a virtual machine's heap, there is additional benefit (cf. economic utility) from the extra resource. We also discuss production and cost functions, which might assist in efficient memory allocation between multiple virtual machines that are competing for a fixed amount of shared system memory.
Kalibera, T. and Jones, R. (2011). Handles revisited: optimising performance and memory costs in a real-time collector. in:Proceedings of the 10th International Symposium on Memory Management (ISMM).ACM, pp. 182-196. Available at: http://www.cs.kent.ac.uk/pubs/2011/3122.
Compacting garbage collectors must update all references to objects they move. Updating is a lengthy operation but the updates must be transparent to the mutator. The consequence is that no space can be reclaimed until all references have been updated which, in a real-time collector, must be done incrementally. One solution is to replace direct references to objects with handles. Handles offer several advantages to a real-time collector. They eliminate the updating problem. They allow immediate reuse of the space used by evacuated objects. They incur no copy reserve overhead. However, the execution time overhead of handles has led to them being abandoned by most modern systems. We re-examine this decision in the context of real-time garbage collection, for which several systems with handles have appeared recently. We provide the first thorough study of the overheads of handles, based on an optimised implementation of different handle designs within Ovm's Minuteman real-time collector. We find that with a good set of optimisations handles are not very expensive. We obtained zero overhead over the widely used Brooks-style compacting collector (1.6 and 3.1 on two other platforms) and 9 increase in memory usage. Our optimisations are particularly applicable to mark-compact collectors, but may also be useful to other collectors.
Singer, J. and Jones, R. (2010). The Economics of Garbage Collection. in:Vitek, J. and Lea, D. eds.Proceedings of the 2010 International Symposium on Memory Management.Toronto, Canada: ACM, pp. 182-196. Available at: http://www.cs.kent.ac.uk/pubs/2010/3013.
This paper argues that economic theory can improve our understanding of memory management. We introduce the allocation curve, as an analogue of the demand curve from microeconomics. An allocation curve for a program characterises how the amount of garbage collection activity required during its execution varies in relation to the heap size associated with that program. The standard treatment of microeconomic demand curves (shifts and elasticity) can be applied directly and intuitively to our new allocation curves. As an application of this new theory, we show how allocation elasticity can be used to control the heap growth rate for variable sized heaps in Jikes RVM.
Hellyer, L., Jones, R. and Hosking, A. (2010). The Locality of Concurrent Write Barriers. in:Vitek, J. and Lea, D. eds.Proceedings of the 2010 International Symposium on Memory Management.Toronto, Canada: ACM, pp. 182-196. Available at: http://www.cs.kent.ac.uk/pubs/2010/3012.
Concurrent and incremental collectors require barriers to ensure correct synchronisation between mutator and collector. The overheads imposed by particular barriers on particular systems have been widely studied. Somewhat fewer studies have also compared barriers in terms of their termination properties or the volume of floating garbage they generate. Until now, the consequences for locality of different barrier choices has not been studied, although locality will be of increasing importance for emerging architectures. This paper provides a study of the locality of concurrent write barriers, independent of the processor architecture, virtual machine, compiler or garbage collection algorithm.
Singer, J. et al. (2008). An Information Theoretic Evaluation of Software Metrics for Object Lifetime Prediction. in:2nd Workshop on Statistical and Machine learning approaches to ARchitectures and compilaTion (SMART'08).Goteborg, Sweden.
Accurate object lifetime prediction can be exploited by allocators to improve the performance of generational garbage collection by placing immortal or long-lived objects directly into immortal or old generations. Object-oriented software metrics are emerging as viable indicators for object lifetime prediction. This paper studies the correlation of various metrics with object lifetimes. However, to date most studies have been empirical and have not provided any information theoretic underpinning. We use the information theoretic calculation of normalized mutual information to measure correlation. We assess which metrics are most useful for prediction and construct some simple yet accurate object lifetime predictors.
Jones, R. and Ryder, C. (2008). A Study of Java Demographics. in:Blackburn, S. ed.Proceedings of the 2008 International Symposium on Memory Management (ISMM'08).Tucson, AZ: ACM Press, pp. 121-130.
Researchers have long strived to exploit program behaviour in order to improve garbage collection efficiency. For example, by using a simple heuristic, generational GC manages short-lived objects well, although longer-lived objects will still be promoted to an older generation and may be processed repeatedly thereafter. In this paper, we provide a detailed study of Java object lifetimes which reveals a richer landscape than the generational view offers. Allocation site has been claimed to be a good predictor for object lifetime, but we show that object lifetime can be categorised more precisely than 'short-lived/long-lived/immortal'. We show that (i) sites allocate objects with lifetimes in only a small number of narrow ranges, and (ii) sites cluster strongly with respect to the lifetime distributions of the objects they allocate. Furthermore, (iii) these clusterings are robust against the size of the input given to the program and (iv) are likely to allocate objects that are live only in particular phases of the program's execution. Finally, we show that, in contrast to previous studies, (v) allocation site alone is not always sufficient as a predictor of object lifetime distribution but one further level of stack context suffices.
Jones, R. (2007). Dynamic Memory Management: Challenges for Today and Tomorrow. in:International Lisp Conference.Cambridge: Association of Lisp Users, pp. 115-124.
Garbage collection (GC) is a key component of almost all modern programming languages. The advent of conventional object-oriented languages supported by managed run-times (e.g. Java, C# and even Managed C++) has brought GC into the mainstream and, as memory manager performance is critical for many large applications, brought GC to the attention of programmers outside its traditional functional programming language community. In this talk, I shall start by reviewing how GC got to where it is today, why it is desirable, what performance you might reasonably expect and I shall outline the directions in which GC research is moving. In particular, I'll look at some of the challenges facing modern GC, in contexts ranging from GC for high-performance, multiprocessor systems to GC for real-time systems and limited devices, from better integrating with its operating environment to supporting specific applications. I shall speculate wildly on future directions for research.
Marion, S., Jones, R. and Ryder, C. (2007). Decrypting The Java Gene Pool: Predicting Objects' Lifetimes with Micro-patterns. in:International Symposium on Memory Management (ISMM07).Montreal, Canada: ACM, pp. 67-78.
Pretenuring long-lived and immortal objects into infrequently or never collected regions reduces garbage collection costs significantly. However, extant approaches either require computationally expensive, application-specific, off-line profiling, or consider only allocation sites common to all programs, i.e. invoked by the virtual machine rather than application programs. In contrast, we show how a simple program analysis, combined with an object lifetime knowledge bank, can be exploited to match both runtime system and application program structure with object lifetimes. The complexity of the analysis is linear in the size of the program, so need not be run ahead of time. We obtain performance gains between 6-77% in GC time against a generational copying collector for several SPEC jvm98 programs.
Jones, R. and Ryder, C. (2006). Garbage Collection Should Be Lifetime Aware. in:Zendra, O. ed.International Workshop on Implementation, Compilation, Optimization of Object-Oriented Languages, Programs and Systems (ICOOOLPS'2006).
We argue that garbage collection should be more closely tied to object demographics. We show that this behaviour is sufficiently distinctive to make exploitation feasible and describe a novel GC framework that exploits object lifetime analysis yet tolerates imprecision. We argue for future collectors based on combinations of approximate analyses and dynamic sampling.
Jones, R. and King, A. (2005). A fast analysis for thread-local garbage collection with dynamic class loading. in:Source Code Analysis and Manipulation, 2005.IEEE Computer Society, pp. 129-138. Available at: http://dx.doi.org/10.1109/SCAM.2005.1.
Long-running, heavily multi-threaded, Java server applications make stringent demands of garbage collector (GC) performance. Synchronisation of all application threads before garbage collection is a significant bottleneck for JVMs that use native threads. We present a new static analysis and a novel GC framework designed to address this issue by allowing independent collection of thread-local heaps. In contrast to previous work, our solution safely classifies objects even in the presence of dynamic class loading, requires neither write-barriers that may do unbounded work, nor synchronisation, nor locks during thread-local collections; our analysis is sufficiently fast to permit its integration into a high-performance, production-quality virtual machine.
Printezis, T. and Jones, R. (2002). GCspy: An Adaptable Heap Visualisation Framework. in:Proceedings of OOPSLA'02 ACM Conference on Object-Oriented Systems, Languages and Applications.Seattle, WA.: ACM Press, pp. 343-358. Available at: http://dx.doi.org/10.1145/582419.582451.
GCspy is an architectural framework for the collection, transmission, storage and replay of memory management behaviour. It makes new contributions to the understanding of the dynamic memory behaviour of programming languages (and especially object-oriented languages that make heavy demands on the performance of memory managers). GCspy's architecture allows easy incorporation into any memory management system: it is not limited to garbage-collected languages. It requires only small changes to the system in which it is incorporated but provides a simple to use yet powerful data-gathering API. GCspy scales to allow very large heaps to be visualised effectively and efficiently. It allows already-running, local or remote systems to be visualised and those systems to run at full speed outside the points at which data is gathered. GCspy's visualisation tool presents this information in a number of novel ways. Deep understanding of program behaviour is essential to the design of the next generation of garbage collectors and explicit allocators. Until now, no satisfactory tools have been available to assist the implementer in gaining an understanding of heap behaviour. GCspy has been demonstrated to be a practical solution to this dilemma. It has been used to analyse production Java virtual machines running applications of realistic sizes. Its use has revealed important insights into the interaction between application program and JVM and has led to the development of better garbage collectors.
Blackburn, S. et al. (2002). Beltway: Getting Around Garbage Collection Gridlock. in:Hendren, L. J. ed.Proceedings of PLDI'02 Programming Language Design and Implementation.Berlin: Association for Computing Machinery, pp. 153-164.
We present the design and implementation of a new garbage collection framework that significantly generalizes existing copying collectors. The Beltway framework exploits and separates object age and incrementality. It groups objects in one or more increments on queues called belts, collects belts independently, and collects increments on a belt in first-in-first-out order. We show that Beltway configurations, selected by command line options, act and perform the same as semi-space, generational, and older-first collectors, and encompass all previous copying collectors of which we are aware. The increasing reliance on garbage collected languages such as Java requires that the collector perform well. We show that the generality of Beltway enables us to design and implement new collectors that are robust to variations in heap size and improve total execution time over the best generational copying collectors of which we are aware by up to 40%, and on average by 5 to 10%, for small to moderate heap sizes. New garbage collection algorithms are rare, and yet we define not just one, but a new family of collectors that subsumes previous work. This generality enables us to explore a larger design space and build better collectors.
Jones, R. (2000). Memory Management Session Overview. in:Kirby, G. N. C., Dearle, A. and Sjoberg, D. I. K. eds.Persistent Object Systems: Design, Implementation, and Use.Lillehammer, Norway: Springer, pp. 84-86.
Chilimbi, T., Jones, R. and Zorn., B. (2000). Designing a Trace Format for Heap Allocation Events. in:Hosking, T. ed.ISMM2000 International Symposium on Memory Management.Minneapolis, MN: ACM Press., pp. 35-49.
Dynamic storage allocation continues to play an important role in the performance and correctness of systems ranging from user productivity software to high-performance servers. While algorithms for dynamic storage allocation have been studied for decades, much of the literature is based on measuring the performance of benchmark programs unrepresentative of many important allocation-intensive workloads. Furthermore, to date no standard has emerged or been proposed for publishing and exchanging representative allocation workloads. In this paper, we describe a preliminary design of a trace format for such workloads and investigate its effectiveness at representing large allocation traces. Our proposal allows for a flexible encoding of information in the trace to achieve greater compression. We evaluate our preliminary design in two dimensions. First, we measure how effective these encodings are at reducing trace size. Second we consider how a meta-level specification language could be used to describe such formats and to generate trace readers and writers automatically.
Jones, R., Beckett, D. and Fincher, S. (2000). Meeting Diverse User Needs: Implementation of a Departmental Information Strategy. in:Franklin, S. D. and Strenski, E. eds.IFIP TC3 WG3.2/3.6 International Working Conference on Building University Electronic Educational Environments.Kluwer, pp. 125-139.
Any higher education department must provide a variety of information to several disparate audiences, recognising their diverse needs. Student expectations of learning support structures, institutional needs for quality assurance and enhancement and the demands of accountability from funding agencies continue to grow. The widespread use of IT, both at home and in schools, has led students to expect a rich networked computing environment; the widened pattern of access to higher education has simultaneously increased the level of support needed and limited access to the campus for many of them. Managing these challenges and changes demands flexibility and auditability: we must be able both to respond to new demands and opportunities and also to account for our actions. At the same time, government-imposed efficiency gains [sic] reduce the resources available to implement change. Technology can provide the mechanism to meet these information needs, educational challenges and to manage changing social and political demands. In this paper we offer pointers abstracted and generalised from our experience for organisational change in similar environments. We explain the development of a large web-based information system (CSWeb) at the Department of Computer Science at the University of Kent at Canterbury designed to meet these challenges. Our goal has been not only to improve our site but also to embed it at the heart of the department's culture. We describe the technical innovations required to manage this large-scale site and that permit the flexibility needed to respond to change. Finally we suggest measures by which the success of enterprises of this kind may be measured. We analyse quantitative data that not only includes page hit counts but also reports users' patterns of activity through the site. Staff acceptance of the system is also measured by the degree to which they have provided further content to the site. Evidence of changes in student and staff expectations, behaviour and study patterns is presented.
Rodrigues, H. and Jones, R. (1998). Cyclic Distributed Garbage Collection with Group Merger. in:Jul, E. ed.Proceedings of 12th European Conference on Object-Oriented Programming, ECOOP98.Brussels: Springer, pp. 249-273.
This paper presents a new algorithm for distributed garbage collection and outlines its implementation within the Network Objects system. It supersedes the earlier version, /pubs/1997/548. The algorithm is based on a reference listing scheme augmented by partial tracing in order to collect distributed garbage cycles. Our collector is designed to be flexible thereby allowing efficiency, expediency and fault-tolerance to be traded against completeness. Processes may be dynamically organised into groups, according to appropriate heuristics, in order to reclaim distributed garbage cycles. Unlike previous group-based algorithms, multiple concurrent distributed garbage collections that span groups are supported: when two collections meet they may either merge, overlap or retreat. The algorithm places no overhead on local collectors and suspends local mutators only briefly. Partial tracing of the distributed graph involves only objects thought to be part of a garbage cycle: no collaboration with other processes is required.
Rodrigues, H. and Jones, R. (1996). A Cyclic Distributed Garbage Collector for Network Objects. in:Babaoglu, Ö. and Marzullo, K. eds.Tenth International Workshop on Distributed Algorithms WDAG'96.Bologna, Italy: Springer, pp. 123-140.
This paper presents an algorithm for distributed garbage collection and outlines its implementation within the Network Objects system. The algorithm is based on a reference listing scheme, which is augmented by partial tracing in order to collect distributed garbage cycles. Processes may be dynamically organised into groups, according to appropriate heuristics, to reclaim distributed garbage cycles. The algorithm places no overhead on local collectors and suspends local mutators only briefly. Partial tracing of the distributed graph involves only objects thought to be part of a garbage cycle: no collaboration with other processes is required. The algorithm offers considerable flexibility, allowing expediency and fault-tolerance to be traded against completeness.
Bowman, H., Derrick, J. and Jones, R. (1994). Modelling Garbage Collection Algorithms --- Extend abstract. in:Proceedings of Principles of Distributed Computing'94.
We show how abstract requirements of garbage collection can be captured using temporal logic. The temporal logic specification can then be used as a basis for process algebra specifications which can involve varying amounts of parallelism. We present two simple CCS specifications as an example, followed by a more complex specification of the cyclic reference counting algorithm. The verification of such algorithms is then briefly discussed.
Jones, R. and Utting, I. (1994). Teaching Electronic Publishing: Learning Software Engineering. in:Teaching Electronic Publishing.pp. 71-83.
Weighted Reference Counting is a low communication distributed storage reclamation scheme for loosely-couple multiprocessors. The algorithm we present herein extends weighted reference counting to allow the collection of cyclic data structures. To do so, the algorithm identifies candidate objects that may be part of cycles and performs a tricolour mark-scan on their subgraph in a lazy manner to discover whether the subgraph is still in use. The algorithm is concurrent in the sense that multiple useful computation processes and garbage collection processes can be performed simultaneously. Keywords: Memory management, Distributed memory, Reference counting, Garbage collection. Introduction Computation on distributed systems involving several processors is already a reality. In a distributed multiprocessor system each processor is responsible for allocating and reclaiming structures residing in its local memory; interprocessor communication is far less efficient than local memory access...
Jones, R. and Lins, R. (1993). Cyclic Weighted Reference Counting without Delay. in:Proceedings of PARLE'93.pp. 712-715.