Traces of all objects allocated

For each benchmark, I plot

  1. s/c: graphs comparing the space rental and volumes, cumulatively, of the top 32 sites and partitions, ordered by space rental.
  2. 3D plots of (time of death, lifetime, volume) of the immortal partition and the other top 15 partitions. The yellow "shadows" point back to the time at which the objects were allocated. The boxes are an attempt to group objects that die together.
  3. residency profile, showing the volume of live objects in each cluster at any time during the program's execution (see the sketch after this list). I suggest that it is noticeable how this reveals that
    1. clusters are active in different phases,
    2. phases are hierarchical, with some clusters only active in subphases.
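
The residency profile in item 3 can be computed directly from per-object trace records. Below is a minimal sketch in Python; the record layout (cluster, alloc_time, death_time, size) and all names are illustrative, not our actual trace format, and times are measured in bytes of total allocation.

    from collections import defaultdict

    def residency_profile(objects, num_samples=2000):
        """Volume of live objects per cluster, sampled at num_samples
        points in time. `objects` is a list of hypothetical
        (cluster, alloc_time, death_time, size) records."""
        end = max(death for _, _, death, _ in objects)
        step = end / num_samples
        profile = defaultdict(lambda: [0] * num_samples)
        for cluster, alloc, death, size in objects:
            # The object contributes its size at every sample point
            # falling within [alloc_time, death_time].
            first = min(int(alloc // step), num_samples - 1)
            last = min(int(death // step), num_samples - 1)
            for i in range(first, last + 1):
                profile[cluster][i] += size
        return profile

Plotting each cluster's series shows the phase structure described above: clusters whose residency rises and falls together share a phase, while a cluster active only inside another's active period suggests a subphase.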

Commentary

Partitioning at the natural sample size

We use the 'natural' sample size: since we have used 2000 buckets for our histograms, the sample size is 2000. For each of our benchmarks in the table of links below, the first line uses the natural sample size. The table below summarises the number of partitions and sites required to cover 95% of allocation by space rental and by volume. Only the larger sizes are shown (speed 100 for SPECjvm98, large for DaCapo, except hsqldb which uses the default size).

benchmark    s/c   space rental at 95+%   volume at 95+%       delta
                   (partitions/sites)     (partitions/sites)
compress      6         7 / 19                14 / 26          0.05
jess          7        20 / 62                 2 / 118         0.05
raytrace      7        10 / 68                 8 / 159         0.05
db            5         7 / 19                12 / 24          0.05
javac         5        40 / 97                74 / 485         0.05
mpegaudio     9        15 / 40                50 / 142         0.05
mtrt          7         9 / 65                 9 / 163         0.05
jack          7        18 / 54                18 / 124         0.05
antlr        20        13 / 47                18 / 291         0.09
bloat        11        10 / 48                 5 / 810         0.05
fop          21        11 / 91                26 / 708         0.20
jython       15        15 / 47                12 / 92          0.06
pmd           9        24 / 82                22 / 191         0.06
ps           23         7 / 36                 5 / 105         0.06
hsqldb       15         9 / 36                 6 / 124         0.07
  1. Note that the bucket size for larger benchmarks is larger than for the small ones; this may be a reason why they cluster better at the default 1%, 2000 bucket view.
  2. In all cases, the bucket size is less than 1MB (see the sketch after this list).
  3. We seem to get good coverage of space rental and volume (i.e. better than 95%) for most benchmarks, provided we have enough clusters (mostly fewer than 20, but javac needs 32/33).
  4. Looking at sites alone doesn't help, except for compress and db.
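
To make the bucket arithmetic in notes 1 and 2 concrete, here is a trivial sketch (the function name and example total are illustrative):

    MB = 1 << 20

    def natural_bucket_size(total_allocated, num_buckets=2000):
        # With a fixed 2000-bucket histogram, bucket size grows with the
        # total volume allocated, which is why the larger benchmarks have
        # larger buckets (note 1) while all of ours stay below 1MB (note 2).
        return total_allocated / num_buckets

    # e.g. a benchmark allocating 1000MB in total has 0.5MB buckets:
    print(natural_bucket_size(1000 * MB) / MB, "MB")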

Partitioning assuming a GC interval of no less than 1MB

In most cases, the effect of using 2000 buckets is that the interval between buckets is very small (the s/c graphs above specify the interval as "granularity"). No stop-the-world collector would operate at such a fine grain. For the second line of each benchmark in the table of links below, the sample assumes that the interval between GCs is at least 1MB. Unless the total allocated is very large (>2000MB), this means that we have a smaller sample size, and therefore that our clustering analysis is more tolerant of differences between lifetime distributions (made concrete in the sketch after the notes below). The effect, we believe, is to cluster looking through rather coarser-grained glasses (as if we had re-bucketed the data). For small inputs (e.g. SPECjvm98 speeds less than 100), the tolerance is much too high, but for speed 100 it seems to make sense. Here we only give s/c for the large input size, and include all partition/site numbers.

benchmark    s/c   space rental at 95+%   volume at 95+%       delta
                   (partitions/sites)     (partitions/sites)
compress     13         6 / 19                11 / 26          0.22
jess          9        12 / 62                 2 / 118         0.14
raytrace     13         8 / 68                 3 / 159         0.19
db           11         5 / 19                10 / 24          0.25
javac        11        13 / 97                22 / 485         0.16
jack         13        12 / 54                 6 / 124         0.15
antlr        20        13 / 47                18 / 291         0.09
bloat        n/a: default interval > 1MB
fop          21        11 / 91                26 / 708         0.20
hsqldb       15         9 / 36                 6 / 124         0.07
jython       15        15 / 47                12 / 92          0.06
pmd           9        24 / 82                22 / 191         0.06
ps           23         7 / 36                 5 / 105         0.06
  1. The change only affects the partition values: the site values are not affected.
  2. The site:cluster ratio is mostly higher, except for those benchmarks with such high allocation that the change in sample size was small, and hence delta is unchanged (i.e. most of the DaCapo benchmarks).
  3. There is some improvement in coverage (i.e. fewer partitions needed to cover most of the allocation).
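
The link between sample size and tolerance can be made concrete. The sketch below assumes a two-sample Kolmogorov-Smirnov-style criterion for deciding when two lifetime distributions differ; that particular test is an assumption made for illustration, not necessarily the one our analysis uses.

    import math

    MB = 1 << 20

    def effective_sample_size(total_allocated, min_gc_interval=1 * MB,
                              num_buckets=2000):
        # With at least min_gc_interval bytes allocated between GCs, the
        # sample size drops below the natural 2000 unless total allocation
        # exceeds num_buckets * min_gc_interval (>2000MB at 1MB intervals).
        return min(num_buckets, total_allocated // min_gc_interval)

    def tolerated_gap(n, m, coeff=1.36):
        # Critical maximum gap between two empirical CDFs built from n and
        # m samples (coeff=1.36 corresponds to the 5% significance level
        # of a two-sample Kolmogorov-Smirnov test). Smaller samples
        # tolerate a larger gap, i.e. the clustering becomes coarser.
        return coeff * math.sqrt((n + m) / (n * m))

Raising min_gc_interval to 4MB, as in the next section, shrinks the sample further and widens the tolerated gap again.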

Partitioning assuming a GC interval of no less than 4MB

For the third line of each benchmark in the table of links below, the sample assumes that the interval between GCs is at least 4MB. This is suitable only for the very largest benchmarks: if a benchmark is too small, the analysis would aggregate all non-immortal sites into the same partition. Thus, the only benchmarks worth examining at this granularity are those in the table below (chosen to have delta < 0.2).

benchmark     s/c   space rental at 95+%   volume at 95+%       delta
                    (partitions/sites)     (partitions/sites)
antlr_large   34         9 / 47                 9 / 291         0.18
bloat_large   14         7 / 48                 2 / 810         0.08
hsqldb        21         7 / 36                 7 / 124         0.14
jython_large  20        12 / 47                 7 / 92          0.11
pmd_large     14        15 / 82                14 / 191         0.12
ps            31         5 / 44                 3 / 101         0.19
ps_large      27         6 / 36                 5 / 105         0.11
  1. The change again only affects the partition values: the site values are not affected.
  2. The site:cluster ratio is again higher.
  3. There is improvement in coverage (i.e. fewer partitions needed to cover most of the allocation).

An appropriate question to ask about partitioning is: to what extent does more aggressive partitioning (i.e. a larger GC interval) aggregate into one partition sites whose behaviour is 'too' different? Judging by the graphs for the large inputs of antlr, bloat, hsqldb, jython, pmd and ps (and ps default), the answer seems to be that they are all either much the same (modulo some reordering of the partitions by space rental) or at least broadly similar (hsqldb, pmd and ps default). It does not seem to be the case that more aggressive clustering conflates different behaviours. One conclusion that might be drawn is that a delta (the gap between the cumulative distribution function curves) of less than 0.20 works well.
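
For reference, delta is just the maximum vertical gap between two cumulative distribution function curves. A minimal sketch, assuming both CDFs are sampled at the same bucket boundaries (argument names are illustrative):

    def delta(cdf_a, cdf_b):
        # Maximum vertical gap between two lifetime CDFs sampled at the
        # same bucket boundaries; each list is non-decreasing from 0 to 1.
        return max(abs(a - b) for a, b in zip(cdf_a, cdf_b))

    # Per the conclusion above, sites whose pairwise delta stays below
    # 0.20 can share a partition without conflating behaviours.
    print(delta([0.1, 0.5, 0.9, 1.0], [0.2, 0.6, 1.0, 1.0]))  # 0.1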


The plots

Here's how to read our 3D plots. Time of death is plotted horizontally (from right 0% to left 100%). Age is plotted from back 0% to front 100%. Volume that died is plotted vertically. Note that it is impossible for any point to fall SE of the green line (its age would be greater than its time of death). The plots have been annotated with coloured rectangles that group objects that seem to live and die together, i.e. with opposing corners at (phase_end - max_age, min_age) and (phase_end, max_age).
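
A minimal sketch of how such a rectangle can be derived from one group's points, assuming a hypothetical list of (time_of_death, age) pairs expressed as percentages of total allocation:

    def phase_box(points):
        """Opposing corners of the annotation rectangle for a group of
        objects that live and die together. Every point must satisfy
        age <= time_of_death (no point falls SE of the green line)."""
        phase_end = max(t for t, _ in points)
        ages = [a for _, a in points]
        # Corners at (phase_end - max_age, min_age) and (phase_end, max_age),
        # exactly as described above.
        return (phase_end - max(ages), min(ages)), (phase_end, max(ages))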

benchmark   clusters (0 = immortal)