Traces of all objects allocated

For each benchmark, I plot

  1. s/c: graphs comparing the space rental and volumes, cumulatively, of the top 32 sites and partitions, ordered by space rental.
  2. 3D plots of (time of death, lifetime, volume) of the immortal partition and the other top 15 partitions. The yellow "shadows" point back to the time at which the objects were allocated. The boxes are an attempt to group objects that die together.
  3. residency profile, showing the volume of live objects in each cluster at any time during the program's execution (see the sketch after this list). I suggest that it is noticeable how this reveals that
    1. clusters are active in different phases,
    2. phases are hierarchical, with some clusters only active in subphases.
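
The residency profile in item 3 can be computed directly from per-object trace records. Below is a minimal sketch in Python; the record layout (cluster, alloc_time, death_time, size) and all names are illustrative, not our actual trace format, and times are measured in bytes of total allocation.

    from collections import defaultdict

    def residency_profile(objects, num_samples=2000):
        """Volume of live objects per cluster, sampled at num_samples
        points in time. `objects` is a list of hypothetical
        (cluster, alloc_time, death_time, size) records."""
        end = max(death for _, _, death, _ in objects)
        step = end / num_samples
        profile = defaultdict(lambda: [0] * num_samples)
        for cluster, alloc, death, size in objects:
            # The object contributes its size at every sample point
            # falling within [alloc_time, death_time].
            first = min(int(alloc // step), num_samples - 1)
            last = min(int(death // step), num_samples - 1)
            for i in range(first, last + 1):
                profile[cluster][i] += size
        return profile

Plotting each cluster's series shows the phase structure described above: clusters whose residency rises and falls together share a phase, while a cluster active only inside another's active period suggests a subphase.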

Commentary

Partitioning at the natural sample size

We use the 'natural' sample size: since we have used 2000 buckets for our histograms, the sample size is 2000. For each of our benchmarks in the table of links below, the first line uses the natural sample size. The table below summarises the number of partitions and sites required to cover 95% of allocation by space rental and by volume. Only the larger sizes are shown (speed 100 for SPECjvm98, large for DaCapo, except hsqldb which uses the default size).

benchmark    s/c   space rental at 95+%   volume at 95+%       delta
                   (partitions/sites)     (partitions/sites)
compress      6         7 / 19                14 / 26          0.05
jess          7        20 / 62                 2 / 118         0.05
raytrace      7        10 / 68                 8 / 159         0.05
db            5         7 / 19                12 / 24          0.05
javac         5        40 / 97                74 / 485         0.05
mpegaudio     9        15 / 40                50 / 142         0.05
mtrt          7         9 / 65                 9 / 163         0.05
jack          7        18 / 54                18 / 124         0.05
antlr        20        13 / 47                18 / 291         0.09
bloat        11        10 / 48                 5 / 810         0.05
fop          21        11 / 91                26 / 708         0.20
jython       15        15 / 47                12 / 92          0.06
pmd           9        24 / 82                22 / 191         0.06
ps           23         7 / 36                 5 / 105         0.06
hsqldb       15         9 / 36                 6 / 124         0.07
  1. Note that the bucket size for larger benchmarks is larger than for the small ones; this may be a reason why they cluster better at the default 1%, 2000 bucket view.
  2. In all cases, the bucket size is less than 1MB (see the sketch after this list).
  3. We seem to get good coverage of space rental and volume (i.e. better than 95%) for most benchmarks, provided we have enough clusters (mostly fewer than 20, but javac needs 32/33).
  4. Looking at sites alone doesn't help, except for compress and db.
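
To make the bucket arithmetic in notes 1 and 2 concrete, here is a trivial sketch (the function name and example total are illustrative):

    MB = 1 << 20

    def natural_bucket_size(total_allocated, num_buckets=2000):
        # With a fixed 2000-bucket histogram, bucket size grows with the
        # total volume allocated, which is why the larger benchmarks have
        # larger buckets (note 1) while all of ours stay below 1MB (note 2).
        return total_allocated / num_buckets

    # e.g. a benchmark allocating 1000MB in total has 0.5MB buckets:
    print(natural_bucket_size(1000 * MB) / MB, "MB")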

Partitioning assuming a GC interval of no less than 1MB

In most cases, the effect of using 2000 buckets is that the interval between buckets is very small (the s/c graphs above specify the interval as "granularity"). No stop-the-world collector would operate at such a fine grain. For the second line of each benchmark in the table of links below, the sample assumes that the interval between GCs is at least 1MB. Unless the total allocated is very large (>2000MB), this means that we have a smaller sample size, and therefore that our clustering analysis is more tolerant of differences between lifetime distributions (made concrete in the sketch after the notes below). The effect, we believe, is to cluster looking through rather coarser-grained glasses (as if we had re-bucketed the data). For small inputs (e.g. SPECjvm98 speeds less than 100), the tolerance is much too high, but for speed 100 it seems to make sense. Here we only give s/c for the large input size, and include all partition/site numbers.

benchmark    s/c   space rental at 95+%   volume at 95+%       delta
                   (partitions/sites)     (partitions/sites)
compress     13         6 / 19                11 / 26          0.22
jess          9        12 / 62                 2 / 118         0.14
raytrace     13         8 / 68                 3 / 159         0.19
db           11         5 / 19                10 / 24          0.25
javac        11        13 / 97                22 / 485         0.16
jack         13        12 / 54                 6 / 124         0.15
antlr        20        13 / 47                18 / 291         0.09
bloat        n/a: default interval > 1MB
fop          21        11 / 91                26 / 708         0.20
hsqldb       15         9 / 36                 6 / 124         0.07
jython       15        15 / 47                12 / 92          0.06
pmd           9        24 / 82                22 / 191         0.06
ps           23         7 / 36                 5 / 105         0.06
  1. The change only affects the partition values: the site values are not affected.
  2. The site:cluster ratio is mostly higher, except for those benchmarks with such high allocation that the change in sample size was small, and hence delta is unchanged (i.e. most of the DaCapo benchmarks).
  3. There is some improvement in coverage (i.e. fewer partitions needed to cover most of the allocation).
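
The link between sample size and tolerance can be made concrete. The sketch below assumes a two-sample Kolmogorov-Smirnov-style criterion for deciding when two lifetime distributions differ; that particular test is an assumption made for illustration, not necessarily the one our analysis uses.

    import math

    MB = 1 << 20

    def effective_sample_size(total_allocated, min_gc_interval=1 * MB,
                              num_buckets=2000):
        # With at least min_gc_interval bytes allocated between GCs, the
        # sample size drops below the natural 2000 unless total allocation
        # exceeds num_buckets * min_gc_interval (>2000MB at 1MB intervals).
        return min(num_buckets, total_allocated // min_gc_interval)

    def tolerated_gap(n, m, coeff=1.36):
        # Critical maximum gap between two empirical CDFs built from n and
        # m samples (coeff=1.36 corresponds to the 5% significance level
        # of a two-sample Kolmogorov-Smirnov test). Smaller samples
        # tolerate a larger gap, i.e. the clustering becomes coarser.
        return coeff * math.sqrt((n + m) / (n * m))

Raising min_gc_interval to 4MB, as in the next section, shrinks the sample further and widens the tolerated gap again.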

Partitioning assuming a GC interval of no less than 4MB

For the third line of each benchmark in the table of links below, the sample assumes that the interval between GCs is at least 4MB. This is suitable only for the very largest benchmarks: if a benchmark is too small, the analysis would aggregate all non-immortal sites into the same partition. Thus, the only benchmarks worth examining at this granularity are those in the table below (chosen to have delta < 0.2).

benchmark     s/c   space rental at 95+%   volume at 95+%       delta
                    (partitions/sites)     (partitions/sites)
antlr_large   34         9 / 47                 9 / 291         0.18
bloat_large   14         7 / 48                 2 / 810         0.08
hsqldb        21         7 / 36                 7 / 124         0.14
jython_large  20        12 / 47                 7 / 92          0.11
pmd_large     14        15 / 82                14 / 191         0.12
ps            31         5 / 44                 3 / 101         0.19
ps_large      27         6 / 36                 5 / 105         0.11
  1. The change again only affects the partition values: the site values are not affected.
  2. The site:cluster ratio is again higher.
  3. There is improvement in coverage (i.e. fewer partitions needed to cover most of the allocation).

An appropriate question to ask about partitioning is: to what extent does more aggressive partitioning (i.e. a larger GC interval) aggregate into one partition sites whose behaviour is 'too' different? Judging by the graphs for the large inputs of antlr, bloat, hsqldb, jython, pmd and ps (and ps default), the answer seems to be that they are all either much the same (modulo some reordering of the partitions by space rental) or at least broadly similar (hsqldb, pmd and ps default). It does not seem to be the case that more aggressive clustering conflates different behaviours. One conclusion that might be drawn is that a delta (the gap between the cumulative distribution function curves) of less than 0.20 works well.
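
For reference, delta is just the maximum vertical gap between two cumulative distribution function curves. A minimal sketch, assuming both CDFs are sampled at the same bucket boundaries (argument names are illustrative):

    def delta(cdf_a, cdf_b):
        # Maximum vertical gap between two lifetime CDFs sampled at the
        # same bucket boundaries; each list is non-decreasing from 0 to 1.
        return max(abs(a - b) for a, b in zip(cdf_a, cdf_b))

    # Per the conclusion above, sites whose pairwise delta stays below
    # 0.20 can share a partition without conflating behaviours.
    print(delta([0.1, 0.5, 0.9, 1.0], [0.2, 0.6, 1.0, 1.0]))  # 0.1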


The plots

Here's how to read our 3D plots. Time of death is plotted horizontally (from right 0% to left 100%). Age is plotted from back 0% to front 100%. Volume that died is plotted vertically. Note that it is impossible for any point to fall SE of the green line (its age would be greater than its time of death). The plots have been annotated with coloured rectangles that group objects that seem to live and die together, i.e. with opposing corners at (phase_end - max_age, min_age) and (phase_end, max_age).
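
A minimal sketch of how such a rectangle can be derived from one group's points, assuming a hypothetical list of (time_of_death, age) pairs expressed as percentages of total allocation:

    def phase_box(points):
        """Opposing corners of the annotation rectangle for a group of
        objects that live and die together. Every point must satisfy
        age <= time_of_death (no point falls SE of the green line)."""
        phase_end = max(t for t, _ in points)
        ages = [a for _, a in points]
        # Corners at (phase_end - max_age, min_age) and (phase_end, max_age),
        # exactly as described above.
        return (phase_end - max(ages), min(ages)), (phase_end, max(ages))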

benchmark   clusters (0 = immortal)