This page describes the phases so far completed within the CXXR project to
refactor the R engine into C++. Each phase is placed within the Subversion
tags directory, with a name of the form 0.00-2.5.0,
where 0.00 indicates the phase, and 2.5.0 indicates
the R release to which that phase is intended to correspond.
0.00-2.5.0
In this phase all .c files within src/main are
renamed to .cpp, with the following exceptions:
complex.c: This file uses the C99 complex types, which are not (under the current C++ standard) understood by a C++ compiler;
gram.c: This file is automatically generated by yacc/bison;
regex.c: The source of this file is very insistent that it is C, not C++: it gives a #warning if you attempt to compile it with a C++ compiler.
(Subsequently, RNG.c was also reverted to C, to respect Knuth's copyright statement.)
The result of this phase does not build correctly; however, it is useful as a baseline for seeing the subsequent changes.
0.01-2.5.0
Make the changes to the result of Phase 0 needed to enable the .cpp
files to compile without warning using -Wall with
gcc-4.1.3, retaining C linkage conventions for everything defined
in .h files. Ensure that the whole of R will build correctly and
pass make check.
A desirable side effect of enforcing C linkage was that the linkage editor
picked up several instances where the source file implementing a function
failed to #include the appropriate header file, and consequently
generated a function with C++ linkage: see below.
This needed to address the following issues:
Rboolean is different from C++ bool.
Rboolean is an enumeration with elements FALSE=0
and TRUE=1; bool is a primitive type, with values
false and true. (Also, there are
#defines of FALSE to 0 and TRUE to 1 lurking
around in the R code, just to confuse matters.) In particular an
Rboolean is a different size from a bool. It was
necessary to introduce many explicit conversions from bool
(resulting in C++ from evaluating Boolean expressions) or integer types to
Rboolean.
In connection with this, a macro RBOOL(x) was defined within
Rinlinedfuns.h; it expands to x in C and to
Rboolean(x) in C++.
class, new,
private and this were used as identifiers; these
had to be renamed, e.g. class changed to
connclass.
In connections.cpp, a
void* was implicitly converted to another type of pointer.
These conversions were made explicit, and flagged
/*CCAST*/.
datetime.cpp and memory.cpp used statements of
the form i -= d; where i is of integer type and
d is an expression evaluating to a floating point type. This
was converted to the form i = int(i - (d)); to avoid a
compiler warning. This interpretation complies with sec. 6.5.16.2 of the
C99 standard ISO/IEC 9899:1999.
The structure NewDevDesc defined in
GraphicsDevice.h contains a number of pointers to functions as
members, and the types of these functions were specified without giving the
number and types of the function arguments. This was rectified. It was also
necessary to give this structure a tag (_NewDevDesc) because
most of these functions included a pointer to a NewDevDesc
among their arguments.
Material from R_ext/GraphicsEngine.h, in particular the definition of
R_GE_context, was moved into a new header file
R_ext/GraphicsContext.h, to avoid reciprocal dependencies
between GraphicsEngine.h and
GraphicsDevice.h.
The type CCODE, defined in
Defn.h, was redefined to make the number and type of its
arguments explicit, as follows:
typedef SEXP (*CCODE)(SEXP, SEXP, SEXP, SEXP);
If __MAIN__ is defined, libextern.h
#defined extern to the empty string, which could
play havoc with the extern "C" used in C++ to enforce C-style
linkage. This #define was commented out, and instead a new
macro extern1 was #defined within
Defn.h.
reinterpret_casts were needed in various
places in memory.cpp, scan.cpp,
serialize.cpp and vfonts.cpp. (In future it is
the intention to get rid of as many of these as possible, as well as
getting rid of all C-style casts.)
In Defn.h, the whole declaration extern FUNTAB
R_FunTab[]; was made #ifndef __R_Names__, not just the
word extern.
const pointer. We can expect much
more of this later, but this may have been premature.
sysutils.cpp (conditionally) contained an
extern declaration of environ; the compiler
considered this to have C++ linkage, conflicting with the C-linkage
definition in unistd.h (subsequently #included
into sysutils.cpp). This extern declaration has
been itself replaced by a (conditional) #include of
unistd.h.
Local function prototypes appeared in eval.cpp,
format.cpp, memory.cpp,
platform.cpp, printutils.cpp, and
library/methods/src/methods_list_dispatch.c; they were
commented out, and flagged with the comment "Use header files!". Needed
prototypes that didn't appear in any header file were generally placed at
the end of Defn.h.
A particularly obscure example of this kind concerns
R_CHAR. This is declared as a pointer to a function in
Rinternals.h, and implemented in memory.cpp. Now
memory.cpp does #include
Rinternals.h, but it does so with USE_RINTERNALS
defined, as a result of which the R_CHAR declaration in the
header file isn't seen by the compiler, and so the implemented function got
C++ linkage. I modified the header file by moving the R_CHAR
declaration outside the #ifndef USE_RINTERNALS.
The definitions in print.cpp of functions intended to be
called from FORTRAN needed to be surrounded by extern "C"{ ...
}.
deparse.cpp:1191 used & where
&& was surely intended; character.cpp:738
similarly used | instead of ||.
-Wall complains about attempts to compare signed with
unsigned. This required explicit conversions in numerous places. Generally
(but not always) I did this by converting unsigned to signed. In other
places it was clear that the same effect could be achieved without
deleterious side effect by changing the type of a variable.
In connection with this, the macro AGE_NODE in
memory.cpp had to be changed to make an__g__
unsigned.
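The RBOOL macro mentioned earlier in this phase can be sketched as follows. This is only an illustration: the Rboolean enumeration is reproduced here as a stand-in for the definition in R's own headers, and isPositive and checkRbool are hypothetical helpers that do not appear in the CXXR sources.

```cpp
#include <cassert>

// Stand-in for R's Rboolean enumeration (an assumption made so that the
// sketch is self-contained; the real definition lives in R's headers).
typedef enum { FALSE = 0, TRUE = 1 } Rboolean;

// In C the macro leaves its argument alone; in C++ it makes the
// conversion from bool (or an integer type) to Rboolean explicit.
#ifdef __cplusplus
#define RBOOL(x) Rboolean(x)
#else
#define RBOOL(x) (x)
#endif

// Hypothetical helper returning an Rboolean from a Boolean expression.
// In C++, 'return n > 0;' would not compile here, because there is no
// implicit conversion from bool to an enumeration type.
inline Rboolean isPositive(int n)
{
    return RBOOL(n > 0);
}

// Hypothetical self-check of the helper above.
inline void checkRbool()
{
    assert(isPositive(3) == TRUE);
    assert(isPositive(-3) == FALSE);
}
```

The same source line therefore compiles under both languages, which matters for code in headers shared between C and C++ files.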
0.02-2.5.0
In subsequent phases (possibly starting in Phase 3) it is our objective to
replace the SEXPREC union by a hierarchy of C++ classes. This phase prepares
for that by reorganising the material in the header files in
src/include. This involves creating a new subdirectory
src/include/CXXR, and within that creating a new header file
RObject.h (ultimately to include a base class RObject
for the new hierarchy), and further header files RClosure.h,
REnvironment.h, RInternalFunction.h,
RPairList.h, RPromise.h, RSymbol.h and
RVector.h, corresponding respectively to
closxp_struct, envsxp_struct,
primsxp_struct, listsxp_struct,
promsxp_struct, symsxp_struct and
vecsxp_struct, which will eventually be derived classes. The
material in these new headers comes predominantly from
Rinternals.h, but to some extent (in the case of
RInternalFunction.h) from Defn.h. All of the new
header files, with the exception of RInternalFunction.h, are also
installed in $(rincludedir)/CXXR.
Function prototypes moved into the new header files are documented using doxygen. Where it was clearly consistent with
the semantics, some of the argument types of the functions were changed, either
by adding const, or by converting int into
Rboolean (however, see the issues below regarding the latter).
The following are implementational details and issues that arose:
The definition of SEXPREC (though still the unchanged C
code) was made visible only to C++ programs. This is to get advance warning
of potential problems when the implementation is changed to
C++.
In CR, many operations were implemented as a macro if USE_RINTERNALS was defined, and otherwise as a function. It
has been the intention in this phase to replace the macros with C++ inline
functions: these would automatically also generate a non-inlined form, so
the separate definition (usually in memory.cpp) could be
dispensed with.
This was all very well where the function form was implemented in CR simply by invoking the macro; however in some cases the function form carried out some error checking before invoking the macro. Trying to convert the macro to an inline function would then result in two distinct functions with the same name, which the compiler and/or linker would certainly reject.
In the end it was decided to leave the macros in place for the time being: they'll have to be changed when the C++ implementation rolls out anyway.
I considered abolishing the USE_RINTERNALS compilation
conditions, but decided to retain them to mark out material (usually
currently in the form of macro definitions) that will in the future need
privileged access to a C++ class. Only memory.cpp now
#defines USE_RINTERNALS.
Rinternals.h contained many #defines of
function names to the same name prefixed by Rf_: this appears
to correspond in C++ terms to putting these functions in a namespace. I
split these #defines out into a separate header file
Rf_namespace.h, which is #included by
RObject.h (which is in turn included by the other new
headers). There are various similar #defines scattered around
other CR header files, which may need to be moved into
Rf_namespace.h in due course.
It was not obvious whether to name the header RInternalFunction.h or RPrimitiveFunction.h.
Usage in the CR code (e.g. primsxp_struct) suggests the
latter, and the R Internals document speaks of internal and primitive
functions as being mutually exclusive, but fails to give a more general
name covering any function handled via R_FunTab. But it seems
to be reasonable to regard primitive functions as a special case of an
internal function, hence the eventual choice of
RInternalFunction.h.
Rdynload.cpp and dotcode.cpp
each give compiler warnings under -pedantic because they
attempt to cast function pointers to void*. The source code of
the former already contains a comment saying that it's illegal even in C.
Not easy to fix, so leave for now.
It is arguable that logical vectors (LGLSXP) should
contain items of type Rboolean rather than of type
int, and consequently that the macro/function
LOGICAL(SEXP) should return Rboolean* rather than
int*. I made some attempt to do this, but backed out of it for
the following reasons:
the .C interface expects these vectors to contain
ints;
the size of an enumeration is implementation-dependent (though gcc happens to use
int for Rboolean).
MAYBE value in
the enumeration, perhaps Rboolean is best thought of as
'bool for C', rather than having any capability to handle
NAs.
Possible new policy: within functions visible from C, use
Rboolean as a substitute for C++ bool, possibly
constrained to be 32 bits long to avoid the enum
implementation dependencies noted above. However, R logical vectors will
continue to be represented using ints. (One day we might
define an Rlogical class - a wrapper round an int
- to handle logical vectors within C++, while C programs simply see
typedef int Rlogical;.)
0.03-2.5.0
The primary objective of this phase was to redefine R_NilValue
as a null (i.e. zero) pointer of type SEXP.
R_NilValue is widely used within CR as a stub, i.e. to signify
that something that might be present is absent, in much the same way that a
null pointer is used within C or C++. However, in CR it is actually implemented
in effect as an element of a pairlist (i.e. struct listsxp), whose
CAR, CDR, TAG and attributes all point to itself. This would cause difficulties
in CXXR when we reimplement the SEXPREC union as a type hierarchy,
because pairlist elements will need to be of a specific type within the
hierarchy. If R_NilValue were given this type, it would preclude
its use as a general-purpose stub. But zero is a possible value for a pointer
of any type, so if we equate R_NilValue to zero this will sidestep
the problem.
Another disadvantage of the CR definition of R_NilValue is that
it needlessly introduces a cyclic data structure.
The following are implementational details and issues that arose in carrying out this change:
CR code often applies CAR, CDR, TAG and
ATTRIB on a SEXP that may in fact be
R_NilValue, expecting in this case for each of these functions
to return R_NilValue. These functions were reimplemented to
preserve this behaviour: i.e. each of them returns a null pointer if passed
a null pointer. At the same time the macro forms were abolished: they are
now implemented as inline functions for C++, and ordinary functions if
called from C.
OBJECT and IS_S4_OBJECT
have been reimplemented to return FALSE if passed a zero
pointer. They too are now implemented as inline functions for C++, and
ordinary functions if called from C.
NAMED: the policy here is
that the calling code should be modified as necessary to prevent it being
invoked for a null pointer. Deal similarly with invocations of
SET_NAMED, PRINTNAME,
NODE_IS_MARKED, SET_ATTRIB,
SET_OBJECT, and LENGTH. (This last case is
interesting because LENGTH is meant to be applied to vector
objects, i.e. components of the SEXPREC union different from
struct listsxp.) The calling sites concerned were determined
by running make check at top-level: doubtless many have
slipped through the net!
Various macros in memory.cpp were replaced by inline functions.
A secondary objective of this phase was to get rid of C-style casts within the C++ code, wherever the appropriate remedy was reasonably obvious and straightforward. The following kinds of C-style casts were left in place pending further work:
DL_FUNC);
DevDesc and GEDevDesc);
(void*)(-1);
R_varloc_t;
Addendum 2007/08/06: although make check works with this
release, make check-devel doesn't.
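The null-tolerant accessors described in this phase can be sketched as below. This is a minimal illustration under stated assumptions, not the CXXR code: RObject is reduced to a three-field stand-in, R_NilValue is modelled as a constant null pointer, and listLength and checkNilAccessors are hypothetical helpers.

```cpp
#include <cassert>

// Drastically simplified stand-in for a pairlist node (assumption).
struct RObject {
    RObject* car;
    RObject* cdr;
    RObject* tag;
};
typedef RObject* SEXP;

// R_NilValue modelled as a null pointer of type SEXP.
static const SEXP R_NilValue = 0;

// Each accessor returns a null pointer when passed a null pointer,
// preserving the behaviour that CR code expects of R_NilValue.
inline SEXP CAR(SEXP e) { return e ? e->car : R_NilValue; }
inline SEXP CDR(SEXP e) { return e ? e->cdr : R_NilValue; }
inline SEXP TAG(SEXP e) { return e ? e->tag : R_NilValue; }

// Hypothetical helper: counts the elements of a pairlist. The walk ends
// at a null pointer rather than at a self-referential sentinel node.
inline int listLength(SEXP e)
{
    int n = 0;
    for (; e != R_NilValue; e = CDR(e))
        ++n;
    return n;
}

// Hypothetical self-check of the accessors above.
inline void checkNilAccessors()
{
    assert(CAR(R_NilValue) == R_NilValue);
    assert(CDR(R_NilValue) == R_NilValue);
    assert(listLength(R_NilValue) == 0);
}
```

Because a null pointer is valid for every pointer type, the same stub value remains usable once pairlist elements acquire a specific type within the class hierarchy.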
0.04-2.5.1
The primary objective of this phase was to update the program to parallel
release 2.5.1 of R. This proved to be straightforward, except that it was
necessary to install a later version of svn_load_dirs.pl to cope
with filenames containing @ signs. (However, I was surprised to
discover that svn merge doesn't track renames.)
Other changes were as follows:
Errors reported by make check-devel were fixed. In general
this was done by modifying certain functions to behave reasonably if passed
a null pointer, namely LENGTH (returns 0), NAMED
(returns 0) and SET_NAMED (does nothing). These changes
obviated some of the changes made leading up to svn revision 49 (see
Phase 3 above), and these changes were accordingly reversed. make
check-all also now works, but it was time-consuming to run and
revealed no bugs.
I got autoconf working properly, and accordingly
backed out of some configuration kludges I had made previously.
0.05-2.5.1
The aim of this phase was to create a branch entitled const, to
explore to what extent the R code is amenable to 'constifying': i.e. converting
pointers and C++ references wherever possible to const pointers.
Two preliminary steps, carried out in the trunk, were as follows:
Similar changes were made to the header files under
src/include: however, the pattern here was to convert a macro
to an inline function if the header file was #included into a
C++ file, and to an out-of-line call to the same function if the header
file was #included into a C file.
This macro conversion was contraindicated in the following circumstances:
#define INC(x) ++(x)
(Using C++ reference arguments to get round this is not as straightforward as it might seem.)
SEXPREC
was defined along the following lines:
typedef struct SEXPREC { ... } SEXPREC;
with the first occurrence of SEXPREC being what in C would
have been a structure tag. This has now been changed to:
typedef struct RObject { ... } SEXPREC;
exploiting the fact that in C++ RObject is a fully-fledged
class name. The header files in src/include/CXXR now generally
refer to RObject rather than SEXPREC.
Having established the const branch, constification was set in
train by the brute force measure of redefining SEXP to mean
const RObject* rather than simply RObject*; a new
typedef mapped vSEXP onto plain
RObject*. In the same spirit 'v' variants of many of
the accessor functions were introduced: for example now CAR takes
a SEXP argument and returns a SEXP, while
vCAR takes and returns a vSEXP. (Since these accessor
functions are required to be callable from C, we can't simply overload
CAR.)
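The typedefs and paired accessors just described might look roughly like this. The sketch assumes a one-field RObject and adds a hypothetical mutator, setCar, purely to show where a vSEXP becomes necessary; none of this is taken from the const branch itself.

```cpp
#include <cassert>

// One-field stand-in for the real class (assumption).
struct RObject {
    RObject* car;
};

typedef const RObject* SEXP;   // read-only view of an object
typedef RObject* vSEXP;        // 'v' variant: modification permitted

// The accessors must be callable from C, so they need distinct names:
// C linkage rules out overloading CAR on its argument type.
extern "C" {
    inline SEXP CAR(SEXP e) { return e ? e->car : 0; }
    inline vSEXP vCAR(vSEXP e) { return e ? e->car : 0; }
}

// Hypothetical mutator. It needs a vSEXP, so any caller that obtained
// its pointer through CAR rather than vCAR cannot use it.
inline void setCar(vSEXP e, vSEXP value)
{
    e->car = value;
}

// Hypothetical self-check of the accessors above.
inline void checkConstAccessors()
{
    assert(CAR(0) == 0);
    assert(vCAR(0) == 0);
}
```

Any function that eventually modifies an object has to receive it as a vSEXP, and so does every function that passes the pointer along to it.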
I then attempted to recompile various files, inserting 'v's
wherever the compiler demanded it. It quickly became apparent that these
'v's were highly contagious: for example, both
NA_STRING and R_EmptyEnv had to be declared as
vSEXPs rather than SEXPs. This led me to the
conclusion that it was premature to attempt constification until I understand
the evaluation process better.
At the time of tagging this release, the following files compile without
warnings in the const branch: memory.cpp,
envir.cpp and names.cpp. eval.cpp gives
one compilation error, when do_function attempts a non-const
operation on its op argument: fixing this would mean changing the
signature of all the do_ functions.
0.06-2.5.1
In CR, each SEXPREC has a node class in the range 0 to 7. Nodes
of non-vector SEXPTYPE (i.e. not of types CHARSXP,
LGLSXP, INTSXP, REALSXP,
CPLXSXP, STRSXP, VECSXP,
EXPRSXP, WEAKREFSXP or RAWSXP) are all
in class 0, and are 28 bytes long. Class 7 is used for vector nodes whose
vector data amount to more than 128 bytes; the remaining classes are used for
smaller vectors, classified according to their size. Nodes of class 7 are
allocated directly using malloc; nodes of the remaining classes
are allocated from 'pages' about 2 kB in size, with each node class having its
own pages. In CXXR it is intended to replace SEXPRECs with an
extensible class hierarchy (rooted at RObject), so it will not be
feasible to put a tight upper bound on the size of non-vector nodes.
Another feature of CR is that in vector nodes, a single block of memory
contains the data of the vector preceded by a SEXPREC and
information about the length of the vector. This is quite incompatible with the
design philosophy of C++, which is that the size of an object must be deducible
from its (C++) type: in particular ::operator delete relies on
this.
The purpose of Phase 6 was to circumvent these problems, and at the same time to endeavour to decouple the code for allocating memory from the code managing garbage collection. This comprised the following changes:
A class CXXR::Heap was created to handle allocation and
deallocation of blocks of memory. This parallels CR to the extent that
requests for large blocks are passed on directly to ::operator
new, while requests for small blocks are satisfied by allocating
fixed-sized cells carved out of 'superblocks'. However, this is an
implementational detail and is not visible to the remainder of CXXR: only
the total number of bytes and the total number of blocks allocated
via CXXR::Heap are visible (using static member
functions).
It is intended that CXXR::Heap will serve as a back-end to
implementations of operator new and to an STL-compatible Allocator class.
Note in particular that the blocks allocated from CXXR::Heap
are not exclusively used to create RObjects, but may be used
for any purpose where rapid allocation/deallocation of small blocks is
required.
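All nodes are now allocated and deallocated via CXXR::Heap. (CR
deallocates only large vector nodes.)
The data of a vector is held in a separate memory block allocated from CXXR::Heap; a data member m_data of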
RObject (in due course to be factored out into a derived
class) points to this block. For non-vector objects, and vectors of size
zero, m_data is a null pointer. (CR appears to allocate at
least 8 bytes of vector data even when the nominal size of the vector is
zero.)
CXXR::Heap, divided by 8.
I was strongly tempted to base GC exclusively on (b), and to ignore the number of nodes - after all, we're talking about a single resource here: memory. I'd welcome opinions about this.
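The allocation strategy described for CXXR::Heap in this phase (large requests passed to ::operator new, small requests served from fixed-size cells carved out of superblocks, with only the byte and block totals visible) can be sketched as follows. Everything here is an assumption made for illustration: the single cell size, the constants kCellSize and kCellsPerSuperblock, and the member names are hypothetical, and the real class is more elaborate.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

const std::size_t kCellSize = 64;            // hypothetical small-block limit
const std::size_t kCellsPerSuperblock = 32;  // hypothetical superblock capacity

// Simplified stand-in for CXXR::Heap with one cell size only.
class Heap {
public:
    static void* allocate(std::size_t bytes)
    {
        void* p;
        if (bytes > kCellSize) {
            p = ::operator new(bytes);       // large block: pass straight on
        } else {
            if (!s_free) refill();           // carve a new superblock
            p = s_free;
            s_free = s_free->next;
        }
        ++s_blocks;
        s_bytes += bytes;
        return p;
    }

    // The caller supplies the size originally requested (a design
    // assumption of this sketch).
    static void deallocate(void* p, std::size_t bytes)
    {
        if (!p) return;
        assert(s_blocks > 0 && s_bytes >= bytes);
        if (bytes > kCellSize) {
            ::operator delete(p);
        } else {
            Cell* c = static_cast<Cell*>(p); // return the cell to the free list
            c->next = s_free;
            s_free = c;
        }
        --s_blocks;
        s_bytes -= bytes;
    }

    // The only information visible to the rest of the program.
    static std::size_t bytesAllocated() { return s_bytes; }
    static std::size_t blocksAllocated() { return s_blocks; }

private:
    union Cell {
        Cell* next;                // link used while the cell is free
        char payload[kCellSize];   // storage used while it is allocated
    };

    // Superblocks are never returned to the system in this sketch.
    static void refill()
    {
        Cell* block = static_cast<Cell*>(
            ::operator new(kCellsPerSuperblock * sizeof(Cell)));
        for (std::size_t i = 0; i < kCellsPerSuperblock; ++i) {
            block[i].next = s_free;
            s_free = &block[i];
        }
    }

    static Cell* s_free;
    static std::size_t s_bytes;
    static std::size_t s_blocks;
};

Heap::Cell* Heap::s_free = 0;
std::size_t Heap::s_bytes = 0;
std::size_t Heap::s_blocks = 0;
```

Keeping the byte and block totals inside the allocator is what allows the garbage-collection policy to be decided elsewhere, using only those two figures.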
0.07-2.5.1
The purpose of this phase was to encapsulate all the garbage-collection
logic within C++ classes. Five such classes were introduced, namely
GCManager, GCNode, GCEdge,
GCRoot and WeakRef, as now described.
GCManager, as the name implies, carries out high-level
management of garbage collection. It has no non-static data or methods.
When CXXR::Heap indicates (via a callback) that it is
on the point of requesting additional memory from the operating system,
method GCManager::gc() decides whether to carry out a garbage
collection, and if so how many generations to collect. As contemplated at
tag 0.06-2.5.1, this decision is now based only on the total memory
allocated via CXXR::Heap, and not on the number of
nodes allocated. If GCManager decides to carry out a garbage
collection, this is carried out by calling GCNode::gc(),
specifying the number of generations to be collected.
GCNode is intended to be the base class for all
objects subject to garbage collection; RObject is now derived
from GCNode. All GCNodes are threaded on circular
doubly-linked lists according to their generation, managed via the
static private vector s_genpeg. Element 0 of this vector
represents the 'new' generation of nodes that have not yet been exposed to
the garbage collector; nodes that survive garbage collection are moved into
successively higher generations.
GCEdge<T>, where T
(defaulting to RObject*) is a pointer to a class type derived
from GCNode, represents a directed edge within the directed
graph whose nodes are the GCNodes. Whenever an object of a
type derived from GCNode wishes to refer to another such
object, it should do so by incorporating a GCEdge
encapsulating an appropriate pointer, rather than by incorporating the
pointer directly. The class provides for GCEdge<T> to be
implicitly converted to T in contexts which require this.
GCEdge contains the logic for ensuring that a node in a
higher generation never includes a reference to an object in a younger
generation. If any attempt is made to direct a GCEdge from an
older node to a younger node, that younger node is immediately promoted to
the generation of the older node, and this change is propagated through
the outgoing GCEdges of the younger node, and so on
recursively. (In other words, it implements the
EXPEL_OLD_TO_NEW logic that can be configured into CR (but is
not the default for CR).)
GCRoot<T>, where T
(defaulting to RObject*) is a pointer to a class type derived
from GCNode, is intended to protect GCNodes from
the garbage-collector. A GCNode pointed to by a
GCRoot will not be garbage collected for as long as the
GCRoot object exists. The constructor and destructor of this
class therefore perform similar functions to the
PROTECT/UNPROTECT macros of CR, but within a C++
idiom, in which the programmer is spared the need to check that
PROTECTs are balanced by UNPROTECTs. (However,
PROTECT and UNPROTECT continue, and will
continue, to be available within CXXR.) The class provides for
GCRoot<T> to be implicitly converted to T
in contexts which require this.
The implementation of GCRoot uses an internal stack, and
consequently requires (and checks) that GCRoots are destroyed
in the reverse order of their creation. This should cause no problem as
long as only variables with automatic or static storage duration are
declared as GCRoots.
Despite successful experiments, the deployment of this class has been
deferred, pending the replacement of
setjmp/longjmp within CXXR by C++ exceptions.
This is because destructors of C++ automatic variables are not called when
the stack is unwound by longjmp (see ISO14882:2003 sec. 18.7);
they are when the stack is unwound by a C++ exception.
WeakRef implements weak references (SEXPTYPE
WEAKREFSXP) in a way intended to be functionally identical to CR.
Each weak reference has a key and, optionally, a value and/or a finalizer.
The finalizer may either be a C/C++ function or an R object.
The garbage collector will consider the value and finalizer to be reachable provided the key is reachable. If, during a garbage collection, the key is found not to be reachable then the finalizer (if any) will be run, and the weak reference object will be 'tombstoned', so that subsequent calls to key() and value() will return null pointers. A weak reference object with a reachable key will not be garbage collected even if the weak reference object is not itself reachable.
Note that, in CXXR, weak references are not implemented as four-element
vectors, and the class has separate, appropriately typed fields for R and
C/C++ finalizers (though at most one of these fields may be used in any
particular WeakRef object).
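The behaviour described above for GCRoot (protection in the constructor, release in the destructor, an internal stack that enforces reverse-order destruction, and implicit conversion to T) can be sketched as follows. This is an illustration under stated assumptions rather than the CXXR implementation: GCNode and RObject are empty stand-ins, and rootStack and isProtected are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct GCNode {};              // minimal stand-ins for the real classes
struct RObject : GCNode {};

// Hypothetical helper: the single stack shared by every GCRoot.
inline std::vector<const GCNode*>& rootStack()
{
    static std::vector<const GCNode*> stack;
    return stack;
}

// Hypothetical helper: true if some live GCRoot points at node.
inline bool isProtected(const GCNode* node)
{
    for (std::size_t i = 0; i < rootStack().size(); ++i)
        if (rootStack()[i] == node) return true;
    return false;
}

template <class T = RObject*>
class GCRoot {
public:
    explicit GCRoot(T node) : m_node(node), m_index(rootStack().size())
    {
        rootStack().push_back(node);   // protect on construction
    }

    ~GCRoot()
    {
        // Roots must be destroyed in the reverse order of their creation.
        assert(m_index + 1 == rootStack().size());
        rootStack().pop_back();        // unprotect on destruction
    }

    // Implicit conversion, so a GCRoot<T> can be used where a T is needed.
    operator T() const { return m_node; }

private:
    GCRoot(const GCRoot&);             // not copyable
    GCRoot& operator=(const GCRoot&);

    T m_node;
    std::size_t m_index;
};
```

With automatic variables the reverse-order requirement is met by the language itself, since C++ destroys locals in the reverse order of their declaration.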
0.08-2.5.1
All uses of setjmp and longjmp (and
sigsetjmp and siglongjmp) within directory
main have been removed, and replaced by using
JMPException, a C++ exception class designed as far as
possible to be a drop-in replacement for
setjmp/longjmp. This is to ensure that the
destructors of C++ objects are invoked as the stack is unwound following an
exceptional condition.
Use of JMPException should be regarded as an interim
measure. Normal C++ coding practice is for throw simply to
report the exceptional condition that has arisen, rather than - as with
JMPException - in effect requesting a specific subsequent flow
of control.
The preferred way of protecting GCNodes from the
garbage collector is now to use the templated class GCRoot.
GCRoot's constructor will protect the GCNode in
question, and its destructor will unprotect it; there is therefore no need
for the programmer to remember to balance out the use of
PROTECT and UNPROTECT as in CR.
The facilities of CR's pointer protection stack (using e.g.
PROTECT and UNPROTECT) remain available, but the
underlying implementation has been rewritten in C++ as part of the
GCRootBase class. CXXR makes the additional requirement that
when UNPROTECT or REPROTECT is applied to a
pointer, this is carried out in the same context (RCNTXT) as
that in which the pointer was PROTECTed. This is to help pick
up mispairing between PROTECT and UNPROTECT.
Various CR header files, including Rinternals.h and
Defn.h, contain macro definitions of the form
#define func Rf_func
These serve to avoid name clashes (at least at the linker level) with
third-party packages; a similar purpose would be achieved in C++ by placing
the function func in a namespace Rf. (In Phase 2
these macros were generally shifted into a separate header file
Rf_namespace.h, but this change has now been reversed.) Using
the preprocessor to modify program tokens in this way is something that
many C++ programmers will shun, especially since some of the tokens concerned
(e.g. length) are likely to be widely used. However abolishing
these macros altogether would break much existing code. Nevertheless,
reliance on them is now deprecated within CXXR, and in particular all
header files within src/include have been modified as
necessary to include the Rf_ prefix explicitly where it is
needed.
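Returning to the replacement of setjmp/longjmp described at the start of this phase, the following sketch shows why an exception is preferable: the destructor of an automatic object runs while the stack unwinds. The JMPException shown here is a hypothetical stand-in with invented members (target and value); the real class is not reproduced in this page. Guard, inner, outer and checkUnwind are likewise illustrative names.

```cpp
#include <cassert>

// Hypothetical stand-in for JMPException (members are assumptions).
struct JMPException {
    int target;   // identifies the frame that should resume control
    int value;    // plays the role of longjmp's second argument
};

// Any C++ object with a destructor; here it counts its own destruction.
struct Guard {
    explicit Guard(int& counter) : m_counter(counter) {}
    ~Guard() { ++m_counter; }
    int& m_counter;
};

inline void inner(int& destroyed)
{
    Guard g(destroyed);
    JMPException e = {1, 42};
    throw e;                          // where CR would call longjmp
}

inline int outer(int& destroyed)
{
    try {                             // where CR would call setjmp
        inner(destroyed);
    } catch (const JMPException& e) {
        if (e.target != 1)
            throw;                    // not for this frame: keep unwinding
        return e.value;
    }
    return 0;
}

// Hypothetical self-check: the Guard in inner() is destroyed exactly once.
inline void checkUnwind()
{
    int destroyed = 0;
    int result = outer(destroyed);
    assert(result == 42);
    assert(destroyed == 1);
}
```

The check of e.target in the handler mirrors the point made above: the thrower is in effect requesting a specific subsequent flow of control, which is why this style is regarded as an interim measure.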
0.09-2.6.1
The primary objective of this phase was to update the program to parallel release 2.6.1 of R.
Other changes were as follows:
The previous policy was to move function prototypes from CR header files into headers under
include/CXXR, and at the same time to add doxygen
documentation. This has now been modified into a policy of copying
the prototypes into the relevant CXXR header file, and adding documentation
there, but leaving the prototype also in the CR header file. This will make
it easier to track changes in function signatures when we upgrade to future
releases of R. To this end a script allincludes.pl has been produced. This
generates an (otherwise trivial) C++ source file that
#includes all the header files under src/main and
src/include; compiling this file checks that the prototypes in
the CXXR header files are consistent with those in the CR headers.
In the light of this change, the policy regarding the Rf_
prefix described under Phase 8 has been modified. Whilst all header files
in the CXXR directory should use the Rf_ prefix
explicitly, header files derived from CR (e.g. Rinternals.h
and Defn.h) should normally omit the prefix if the
corresponding CR file does so.
CXXR directory.
SGN_DFL
are defined as macros in terms of C-style casts, so main.cpp
still gives warnings if compiled using gcc with
-Wold-style-cast.)
0.10-2.6.1
The primary objective of this phase was to reimplement all vector data types
as C++ classes derived (directly or indirectly) from RObject,
rather than using vecsxp_struct within the RObject::u
union. vecsxp_struct has not yet been eliminated entirely,
however, because of some straggling uses of truelength.
Other changes were as follows:
Memory blocks allocated by R_alloc and kindred functions
are no longer implemented as objects inheriting from RObject.
Instead these blocks are managed separately via a new class
RAllocStack. When the stack size is reduced using
vmaxset, the memory blocks are released immediately, rather
than being left to the garbage collector.
A GCNode is immune from garbage collection while it
is being constructed, leading to considerable simplification.
The class GCEdge was abolished: it was felt that
the advantage of encapsulating the write barrier within a single class was
outweighed by various knock-on obscurities.
HASHASH, SET_HASHASH and
SET_HASHVALUE were abolished: the new class
CXXR::String will compute and cache hash values automatically
on demand.
0.11-2.6.2
The primary objective of this phase was to update the program to parallel
release 2.6.2 of R. Errors and warnings given by make check-devel
were also corrected.
0.12-2.6.2
The primary objective of this phase was to eliminate the
RObject::u union completely, replacing its remaining elements with
classes derived from RObject. This entailed the creation of the
following classes: BuiltInFunction, ByteCode,
Closure, DottedArgs, Environment,
Expression, ExternalPointer, PairList,
Promise, SpecialSymbol and Symbol.
Several loose ends remain to be tied up, however; in particular, the remaining
data members of RObject ought all to be private.
Other changes were as follows:
CXXR::Heap has been renamed
CXXR::MemoryBank to avoid confusion with standard data
structures called heaps.
GCNode and GCRootBase are now
initialized using a Schwarz counter, thus enabling certain standard objects
(e.g. the 'not available' string, and the global environment) to be
declared as static class members: it is no longer necessary to wait until
InitMemory() has been called before creating them. This in
turn simplifies the implementation of the garbage collection algorithm,
which no longer has to treat these objects specially. Concomitant with this
change, the R interpreter now terminates by throwing an exception of class
ExitException, which ensures that all GCRoot
objects are destroyed in the reverse order of their creation.
String objects now belong to one of two subclasses,
CachedString and UncachedString, with the former
being the preferred implementation. At any time, at most one
CachedString with given text and encoding will exist; to
enforce this, the class constructor is private, and instead clients use the
static method obtain() (accessible from C via the function
mkChar()) to get a pointer to a CachedString
object with specified text and encoding. The implementation of the cache is
different from that used in CR, and is based on the C++ standard library;
it has the advantage that cached strings do not need any special handling
by the garbage collector. There are no facilities for modifying the text or
encoding of a CachedString once it has been created; in
particular the function CHAR_RW() can be used only on
UncachedString objects.
0.13-2.6.2
This phase was an attempt - less successful than was hoped! - to close the gap in speed between CR and CXXR. Principal changes were:
Memory cells are now allocated using a new class, CellHeap. CellHeap
differs from CellPool (used previously for this purpose) in
that whenever a memory block is requested from a CellHeap, the
allocated block will always be the one with the lowest address among the
available blocks. This is achieved using a skew heap data structure, and is
intended to increase the spatial localisation of successively allocated
blocks. Where the underlying OS provides posix_memalign(), the
superblocks from which memory blocks are allocated are aligned with memory
pages.
MemoryBank now uses CellHeaps with more closely
spaced block sizes than were used previously, to avoid wasting space in
cache lines.
When a GCNode object has its generation changed as a result
of write barrier enforcement or by being exposed to the garbage collector,
it is no longer immediately shifted to the list appropriate to its new
generation. Instead this is deferred until the sweep phase of a garbage
collection visits the node. This avoids pulling nodes into the processor
cache unnecessarily, and paves the way for the following change.GCNode manages garbage
collection are now singly-linked rather than doubly-linked. This and other
changes mean that the size of a PairList node (cons cell) has
been reduced (on 32-bit architecture) from 40Â bytes to 32Â bytes.
DumbVector nodes have been reduced in size by 12Â bytes.CellHeap works particularly efficiently if memory blocks are
released in decreasing address order.
The way in which GCNode objects are
exposed to the garbage collector has been simplified and streamlined to
avoid pulling nodes into the cache unnecessarily. First,
GCNode::expose() exposes only the node for which it is
invoked; it does not look for unexposed descendants of this node. Secondly,
protecting a node from the garbage collector (e.g. using
GCRoot<T> or PROTECT()) no longer
automatically exposes the node. (However, write barrier enforcement will
continue to expose nodes if an exposed node is modified to refer to an
unexposed node, and this exposure will propagate to descendants: this falls
out automatically from the write barrier enforcement algorithm.)
GCNode::operator new no longer zeroes the memory it
allocates.
0.14-2.7.1
The objective of this phase was to update CXXR to parallel release 2.7.1 of R. However, other changes are:
Removal of dynamic_cast from the
'glue layer' between code inherited from CR and new CXXR code.
(dynamic_cast can be surprisingly slow.)
SET_TYPEOF() has been abolished.
R_NilValue is now defined as a macro expanding to
NULL (which will in turn typically expand to
(void*)0 in C and simply to 0 in C++). Previously
it was defined as
SEXP R_NilValue = 0;
which entailed unnecessary memory fetches.
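The difference between the two definitions of R_NilValue can be sketched as follows. This is only an illustration: RObject here is an empty stand-in, and the names R_NilValue_variable, R_NilValue_macro, isNilViaVariable and isNilViaMacro are hypothetical, chosen so that both forms can coexist in one file.

```cpp
#include <cstddef>

struct RObject {};            // empty stand-in for CXXR's RObject
typedef RObject* SEXP;

// Former style: a global variable. A comparison against it generally has
// to fetch the variable's current value from memory.
SEXP R_NilValue_variable = 0;

// Style adopted in this phase: a macro expanding to NULL, so that the
// comparison is against a constant known at compile time.
#define R_NilValue_macro NULL

bool isNilViaVariable(SEXP s) { return s == R_NilValue_variable; }
bool isNilViaMacro(SEXP s) { return s == R_NilValue_macro; }
```

Both functions give the same answers; the point is only that the second needs no load of a global before the comparison.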
0.15-2.7.1
The objective of this phase was to tidy up the class hierarchy rooted at
RObject, and in particular to give RObject itself a
more distinctive class identity, i.e. for it to be less of a ragbag for things
that hadn't yet been accommodated elsewhere. Principal changes were:
RObject now controls attributes more closely. The
attributes (if present) must now be a PairList, each of whose
elements must have a distinct symbol as its tag. No attribute may have a
null value. The m_has_class field is automatically set
according to whether or not there is a class attribute; consequently
SET_OBJECT() has been abolished. However, the class interface
does not yet enforce all necessary consistency conditions on attributes;
these are still applied by the code in attrib.cpp.
The m_debug field of RObject has been
abolished. Instead the Closure and Environment
classes each contain a field controlling debugging.
The m_trace field of RObject has been moved to
a new class FunctionBase, from which the Closure
and BuiltinFunction classes are now derived.
The m_flags field of RObject, which replaced
the gp ('general purpose') field within
sxpinfo_struct, has been abolished. It has been replaced by
various special-purpose fields, placed as far down the class
hierarchy as is practical at present. A virtual function
packGPBits() is used to reconstitute the old gp ('levels')
word for the sole purpose of serialization; virtual function
unpackGPBits() is correspondingly used during deserialization.
(However, not all of the fields that have replaced m_flags
need to be serialized/deserialized.)
HandlerEntry, defined locally within
errors.cpp, is used to handle error handler entries, rather
than using a ListVector for this purpose. This avoids the
former use of the m_flags field here.
const pointers
to objects that really ought to be immutable, R_UnboundValue
for example. To counter this, RObject now has a Boolean field
m_frozen: non-const member functions in the
RObject hierarchy can now apply a run-time check that their
object has not been frozen. In particular, attempting to change the
attributes of a frozen object gives rise to an error.
String is now an abstract class. CachedString
objects are now frozen by the constructor. R_NaString is also
frozen.
SpecialSymbol has now been merged into
Symbol. Entities such as R_UnboundValue, which
were formerly implemented as SpecialSymbol objects, are now
implemented as frozen Symbols.
0.16-2.7.2
The objective of this phase was to update CXXR to parallel release 2.7.2 of R.
uncxxr.pl. Where a source file inherited from CR -
foo.c, say - has been adapted for CXXR (and changed into a C++
file foo.cpp in the process), this script endeavours as far as
possible to reverse systematic changes (e.g. the conversion of C-style
casts into C++ casts) to generate a quasi-C file foo.bakc. (We
say 'quasi-C' file because the resulting file may not be syntactically
correct C: it is intended for human eyes only.) Updating to a new release
of R is facilitated by using a 3-way visual diff between the release of
foo.c currently shadowed by CXXR, the new release of
foo.c, and foo.bakc. This helps to highlight
where the significant changes are in the new release of foo.c,
and where they might conflict with changes made in CXXR. (A similar 3-way
comparison using foo.cpp instead of foo.bakc
throws up too much 'noise'.)
uncxxr.pl.
However, this has so far only been done for C++ source files that needed to
be changed in any case as part of the upgrade to 2.7.2.
0.17-2.7.2
The primary purpose of this phase was to reimplement the functionality of
duplicate1() in duplicate.cpp using class copy
constructors and a virtual function RObject::clone(),
reimplemented as necessary in derived classes. The following changes were
associated with this:
GCNode::expose() is once again recursive in effect, thus
reversing a change made in Phase 13. Cloning a node often requires cloning
an entire subgraph of the node graph, via recursive calls of
clone() to copy subobjects. The approach taken is that while
the copy subgraph is under construction, none of its constituent nodes is
exposed to the garbage collector: in particular clone() itself
does not expose the objects it creates to the collector. Only when the copy
subgraph is complete is the whole subgraph exposed, and to do this the code
that called the 'topmost' clone() must then apply the
newly-recursive expose() function to the pointer that
clone() returned. (Trying to expose nodes individually as the
construction proceeded meant that they were at risk of being snatched away
by the garbage collector before the subgraph was complete: it is difficult
to work around this in a way that sits easily with C++ programming
idioms.)
GCNode::devolveAge(), used in enforcing the write barrier,
has been renamed propagateAge(), and this function remains
recursive in effect. However, at the time of call, propagateAge(const
GCNode* node) changes the generation number only of
node (if necessary); the recursive propagation of
this change is deferred until the start of the next garbage collection.
(Unfortunately the same technique cannot be applied to
expose() for a reason explained in its documentation.)
Not all classes inheriting from RObject are clonable, and for
unclonable types, clone() returns a null pointer. When a copy
constructor copies a pattern object containing a subobject of an unclonable
type, the object constructed will at the appropriate point simply contain a
pointer to the subobject of the pattern object, rather than to a clone of
that subobject. This copying logic is encapsulated in a templated 'smart
pointer' type RObject::Handle<T>, and for example the
'car' pointer of a PairList object is now a
Handle<RObject>. Similarly, the former templated class
EdgeVector<T> has been replaced by
HandleVector<T> which - as the name suggests - is
implemented using a
std::vector<CXXR::RObject::Handle<T> >.
0.18-2.8.1
The objective of this phase was to update CXXR to parallel release 2.8.1 of R.
The uncxxr.pl script (see Phase 16) has been somewhat
further developed, and a larger number of C++ files derived directly from
CR have been tweaked so that uncxxr.pl can back-convert them
more accurately to their CR form.
reinterpret_cast
has been replaced by static_cast wherever this is possible
without artifice. This has been facilitated by the introduction of a
function CXXR_alloc, which does the same job as
R_alloc, but - like malloc but unlike
R_alloc - returns void* rather than
char*. (uncxxr.pl converts
CXXR_alloc back to R_alloc.)
0.19-2.8.1
The primary purpose of this phase was to refactor environments, to pave the way for introducing provenance-tracking features into R. The following changes were associated with this:
The Symbol class now enforces the requirement that
(except for certain special Symbols), there is at most one
Symbol with a given name. (CR enforces a similar requirement,
but less comprehensively, using the install() function.) To
facilitate this, it is now a requirement that a Symbol's name
be a CachedString object, rather than any String
object.
In CR, SYMSXP objects contained a
pointer to an arbitrary object, which was considered to be the
Symbol's value within R's base environment and base namespace.
Objects of the C++ Symbol class no longer contain such a
pointer, and the base environment and base namespace are implemented in
exactly the same way as other Environment objects.
In CR, SYMSXP objects
contained a pointer to an R object of a function type, which was used when
the Symbol was used as the name of a function invoked via R's
.Internal() interface. Objects of the C++ Symbol
class no longer contain such a pointer; instead the relevant mapping is
defined by the C++ class DotInternalTable.
Environments on the search path has
been abolished, at least for the time being.
A new class Frame has been introduced, inheriting from
GCNode but not from RObject. A Frame
defines a mapping from Symbol objects to arbitrary
RObjects.
Each Environment object now contains a pointer to a
Frame object, which defines its 'local frame'. The base
environment and the base namespace have the same Frame.
Frame itself is an abstract class, allowing different
implementations along the lines provided by the RObjectTables package to
be achieved simply by class inheritance. In most cases, however, the
concrete class StdFrame is used, in which the mapping from
Symbols to RObjects is provided by a hash table,
implemented using class unordered_map from the TR1 extensions
to the C++ standard library. This implementational detail is not made
visible to R code.
MemoryBank::allocate() has been changed to
allow the caller to specify that the call shall not result in a garbage
collection. Class CXXR::Allocator uses this to ensure that
manipulations of standard containers using CXXR::Allocator do
not result in reentrant calls to the standard library code, which might
otherwise happen if the garbage collector attempted to delete objects
handled by the container.
0.20-2.8.1
The purpose of this phase was extensively to reengineer garbage collection.
This was to pave the way to experimentation with reference-counting approaches
to garbage collection; however, release 0.20-2.8.1 itself still
uses generational mark-sweep. A major change has been in the way of
implementing 'infant immunity', whereby nodes that are under construction are
not liable to garbage collection; the following is a summary of the way in
which this has evolved. The phrase 'infant nodes' means nodes that are either
under construction, or whose construction is complete but which have not yet
been exposed to garbage collection by calling GCNode::expose().
PairList copy
constructor, for example, the copied list was created working forwards
along the pattern list, but the whole structure of the copied list
would then need to be traversed again to expose its nodes to garbage
collection. (This was achieved by having GCNode::expose()
automatically recurse to subobjects.)
The second approach, tried while developing 0.20-2.8.1, was to regard infant nodes as reachable during
mark-sweep. So, during a mark-sweep garbage collection, all the infant
nodes and their descendants would automatically be marked. So the
PairList copy constructor can expose the second and subsequent
nodes of the copied list immediately it has created them, leaving only the
head of the list unexposed, and thus conferring immunity from garbage
collection on the whole structure. There is no longer any need for
expose() to recurse to subobjects. The snag with this approach
was that during the mark phase, the Marker visitor could
invoke the visitReferents() method of objects whose
construction is not yet complete, and which may therefore contain junk
pointers. Obviously, if a visitor was directed to a junk address, that
would probably crash the interpreter. The workaround for this was to have
GCNode::operator new zero out the memory it allocated for new
GCNode objects, so that instead of junk pointers, an object
under construction would contain null pointers, which
visitReferents() could readily detect. However, this zeroing
of memory was time consuming (and wouldn't immediately be portable to some
strange hardware architectures in which null pointers are not represented
by binary zero).
The approach now adopted is for GCNode to
keep a count of the number of infant nodes, and not to initiate a
mark-sweep garbage collection while any infant nodes exist. This has the
advantages of the second approach, but without the disadvantage:
visitReferents() will never be called for a node whose
construction is incomplete, and there is consequently no need for zeroing
memory. It also simplifies the handling of the case where an exception is
thrown within the constructor of an object derived from
GCNode.
Other changes are as follows:
The templated class GCEdge<T> (which was abolished at
Phase 10) has been reinstated, and encapsulates the write barrier.
RObject::Handle<T> now inherits from
GCEdge<T>.
The templated class GCRoot<T> has been renamed
GCStackRoot<T>, and its implementation simplified. These
objects remain subject to the restriction that they must be destroyed in
the reverse order of their creation, and are therefore best suited to
declaration as automatic variables (i.e. variables on the processor stack).
A new templated class GCRoot<T> has been introduced:
this does a similar job to GCStackRoot (i.e. it is a smart
pointer providing protection from garbage collection), but is not subject
to creation/destruction order restrictions. However, construction and
destruction of GCRoots is more time consuming than for
GCStackRoots, so the latter should be preferred where
possible. CR's 'precious list' has been reimplemented as part of the base
class of GCRoot. The ExitException class has been
abolished, since the new GCRoots make it unnecessary.
MemoryBank no longer contains any logic related to
garbage collection, and in particular there are no callbacks from
MemoryBank into the garbage-collection code. The decision
about whether to initiate a mark-sweep collection is now taken in
GCNode::operator new.
0.21-2.8.1
This phase changes the approach used for garbage collection. Previous phases
used a generational mark-sweep collector, like CR itself. As of Phase 21, the
principal method of garbage collection is reference counting. The principal
motivation for this is to make better use of the processor caches: with
reference counting, the memory occupied by objects that become garbage is
quickly recycled into productive use, very likely while this memory is still
mapped in cache.
To implement reference counting, each GCNode object contains a
one-byte reference count, which is automatically adjusted by the
GCEdge<T>, GCRoot<T> and
GCStackRoot<T> smart pointers, and by the traditional CR
PROTECT/UNPROTECT mechanism. (If a node's reference
count ever reaches 255, it sticks at that value, and that node can only be
garbage-collected by the mark-sweep mechanism.) When a GCNode's
reference count falls to zero, it is declared 'moribund'. When
GCNode::operator new is called upon to allocate memory for a new
GCNode object, it first looks through class GCNode's
internal list of moribund nodes. Any nodes on the list which still have a
reference count of zero are deleted; nodes whose reference count has risen back
above zero - accounting for about one in four of the nodes on the moribund list
- are returned to the 'live' list.
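The saturating count and the handling of moribund nodes described above can be sketched as follows. This is a simplified illustration under stated assumptions, not CXXR's code: the class Node, its member names and collectMoribund() are hypothetical, the moribund collection is held in a std::vector for brevity, and the sketch adds a flag so that a node is queued at most once.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for GCNode; all member names are illustrative.
class Node {
public:
    Node() : m_refcount(0), m_moribund(false) {}

    void incRef() {
        if (m_refcount < 255) ++m_refcount;   // the count sticks at 255
    }

    void decRef() {
        // A saturated count is never decremented; a zero count means the
        // caller's increments and decrements were unbalanced.
        if (m_refcount == 255 || m_refcount == 0) return;
        if (--m_refcount == 0 && !m_moribund) {
            m_moribund = true;                // queue the node once only
            s_moribund.push_back(this);
        }
    }

    unsigned refCount() const { return m_refcount; }

    // Intended to run before memory for a new node is allocated.
    // Returns the number of nodes deleted.
    static std::size_t collectMoribund() {
        std::vector<Node*> pending;
        pending.swap(s_moribund);
        std::size_t deleted = 0;
        for (Node* node : pending) {
            node->m_moribund = false;
            if (node->m_refcount == 0) {
                delete node;                  // still unreferenced: reclaim it
                ++deleted;
            }
            // Otherwise the count has risen again and the node stays live.
        }
        return deleted;
    }

private:
    unsigned char m_refcount;                 // one-byte reference count
    bool m_moribund;
    static std::vector<Node*> s_moribund;
};

std::vector<Node*> Node::s_moribund;
```

In this sketch a node whose count reached 255 is never queued, which mirrors the statement that such a node can be reclaimed only by the mark-sweep mechanism.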
To cope with cycles in the node graph (i.e. the directed graph whose nodes
are GCNodes and whose edges are GCEdges), this
reference counting scheme is backed up by a simple (i.e. non-generational)
mark-sweep scheme. However, this runs much more rarely than CR's garbage
collections, and uses a simpler logic to manipulate the threshold at which
mark-sweep collection takes place. Not having node generations means that there
is no longer a need to implement the 'write barrier'; this in turn means that
the GCEdge<T> templated class can have a C++ assignment
operator defined, which enables it to be more freely used in connection with
the container types in the C++ standard library.
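To illustrate why an assignment operator matters here, the following sketch shows a reference-counting edge type stored in a std::vector. It is a hypothetical simplification: Node and Edge are illustrative stand-ins rather than GCNode and GCEdge<T>, and the count is a plain integer with no saturation.

```cpp
#include <vector>

struct Node { int refs = 0; };   // illustrative stand-in for a GCNode

template <class T>
class Edge {                     // illustrative stand-in for GCEdge<T>
public:
    Edge(T* target = nullptr) : m_target(target) { retain(); }
    Edge(const Edge& other) : m_target(other.m_target) { retain(); }

    // With no write barrier to enforce, assignment only has to keep the
    // reference counts of the old and new targets in step.
    Edge& operator=(const Edge& other) {
        T* old = m_target;
        m_target = other.m_target;
        retain();                        // retain first: safe for self-assignment
        if (old) --old->refs;
        return *this;
    }

    ~Edge() { if (m_target) --m_target->refs; }

    T* get() const { return m_target; }

private:
    void retain() { if (m_target) ++m_target->refs; }
    T* m_target;
};
```

Because Edge is copy-constructible and assignable, operations such as std::vector::erase, which shuffle elements by assignment, leave the counts correct without any special handling.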
Weak reference (WeakRef) objects need special handling during
garbage collection, and consequently each WeakRef object now
includes a pointer to itself, to stop it being deleted by the reference
counting mechanism.
0.22-2.9.1
The purpose of this phase was to update CXXR to parallel release 2.9.1 of CR. (Unfortunately, it was overtaken by release 2.9.2 of CR.)
uncxxr.h now defines a macro CXXRconvert(type,
expr), which expands to type(expr), but which
uncxxr.pl replaces simply by expr. This macro is
now widely used in code inherited from CR in cases where C++ requires an
explicit type conversion but C does not.
0.23-2.9.2
The purpose of this phase was to update CXXR to parallel release 2.9.2 of CR. This proved straightforward.
0.24-2.9.2
This phase represented the first stage of refactoring the interpreter's evaluation logic into C++, and included the following principal changes:
A new class CXXR::Evaluator has been introduced to carry out
general services and housekeeping in support of evaluation.
Rf_eval() is now simply a wrapper round
Evaluator::evaluate().
RObject now defines a virtual function
evaluate(), which Evaluator::evaluate() uses to
evaluate a particular object. By default this simply returns a pointer to
the RObject for which it was invoked, but this behaviour is
overridden in various classes (e.g. Expression,
Symbol and Promise) to provide substantive
functionality.
FunctionBase now defines an abstract
virtual function apply(), which is invoked by
Expression::evaluate() to apply a function to a specific set
of actual arguments.
BuiltInFunction now has subclasses
OrdinaryBuiltInFunction (corresponding to
SEXPTYPE BUILTINSXP) and
SpecialBuiltInFunction (SPECIALSXP). (It is
possible that these classes will be abolished in the future, with their
respective functionalities - which differ only slightly - being moved into
BuiltInFunction.)
The flow of control from BuiltInFunction::apply(), through to
the invocation of the appropriate do_ function, is now fully
handled within the CXXR core. do_internal() has also been
absorbed into the CXXR core. For the time being, however,
Closure::apply() is simply a wrapper round CR's
Rf_applyClosure().
The function table, R_FunTab in CR, is now a private static
data member of class BuiltInFunction. This class now uses a
Schwarz counter, which automatically initialises the function table on
program start-up.
0.25-2.9.2
This phase continued with refactoring the interpreter's evaluation logic into C++, and comprised the following principal changes:
Closure::apply() has now been reimplemented within the CXXR
core, making use of a new class ArgMatcher to carry out
argument matching. For the time being the function
Rf_applyClosure() remains in existence, but it is now used
only in connection with method dispatch.
The classes OrdinaryBuiltInFunction and
SpecialBuiltInFunction have been abolished, and their
functionalities absorbed into BuiltInFunction.
A policy, documented with class RObject,
has been defined and put into practice regarding the use of const
T*, where T is RObject or a class
inheriting from it. This policy aims to resolve as far as possible an
inherent tension between the way CR is implemented and the
'const-correctness' that forms part of C++ programming style.
The handling of weak reference (WeakRef) objects has
been improved and tidied up in various ways. In particular, when the key
object of a WeakRef is found to be unreachable, it is now
guaranteed that the weak reference's finalizer (if any) will be run as part
of the same mark-sweep garbage collection that collects the key.
0.26-2.10.1
The purpose of this phase was to update CXXR to parallel release 2.10.1 of CR.
0.27-2.10.1
This phase comprised the following principal changes:
SET_ENCLOS() has been superseded by new mechanisms for
manipulating the enclosing relationships of Environments,
which ensure that acyclicity is preserved.
A cache of Symbol bindings found along the search
list has been introduced, similar to that used in CR.
R_isMissing() has been reimplemented as
CXXR::isMissingArgument(); unlike the previous CXXR
implementation, it no longer requires any memory allocations.
The GCNode class can now optionally include diagnostic code
to identify cycles within the GCNode/GCEdge
graph.
0.28-2.10.1
This phase was concerned with refactoring contexts (CR's
RCNTXT), and involved teasing apart the numerous distinct
functions that this struct plays in CR:
Evaluator::Context.
longjmp targets from
the destination to the point where longjmp is called. C's
setjmp and longjmp are incompatible with C++
exception handling, and were removed from CXXR at Phase 8. At that stage,
however, they were simply replaced by an exception class
JMPException, which was designed simply to ape the behaviour
previously achieved with longjmp. JMPException
has now itself been abolished, and replaced with three exception classes
LoopException (servicing R functions break and
next), ReturnException (which services the R
function return and various other indirect flows of control)
and CommandTerminated (raised in response to unhandled errors
or user interrupts). These new exception classes are used in a way
consistent as far as possible with C++ programming idioms; in particular,
the class Evaluator::Context plays no direct role in
controlling their propagation, and the CR function
findcontext() no longer exists.
longjmp). For the time being, this save/restore functionality
has been retained within the Evaluator::Context class, though
in some cases the functionality is achieved by incorporating an object of
some other class, such as ProtectStack::Scope or
RAllocStack::Scope, within an Evaluator::Context
object.
In all cases this save/restore functionality is now achieved, following
a standard C++ idiom, by the constructor of a stack-based object saving
state, and then its destructor restoring it. This automatically copes both
with the normal flow of control and with exceptions, so there is now no
need for CR's R_restore_globals() function.
In the future, it is likely that some of the save/restore functions now
carried out by the Evaluator::Context class will be factored
out into new classes with more specific responsibilities.
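The constructor-saves, destructor-restores idiom referred to above can be sketched as follows. This is a minimal illustration under stated assumptions: g_protect_stack, StackScope and evaluate() are hypothetical names, and the stack holds plain integers rather than pointers to R objects.

```cpp
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a pointer-protection stack.
std::vector<int> g_protect_stack;

// Records the stack size on construction and restores it on destruction,
// in the manner the text attributes to classes such as ProtectStack::Scope.
class StackScope {
public:
    StackScope() : m_saved_size(g_protect_stack.size()) {}
    ~StackScope() { g_protect_stack.resize(m_saved_size); }

    StackScope(const StackScope&) = delete;
    StackScope& operator=(const StackScope&) = delete;

private:
    std::vector<int>::size_type m_saved_size;
};

// Pushes entries and then either returns normally or throws; in both
// cases the destructor of 'scope' pops what was pushed.
void evaluate(bool fail) {
    StackScope scope;
    g_protect_stack.push_back(1);
    g_protect_stack.push_back(2);
    if (fail) throw std::runtime_error("evaluation failed");
}
```

Whether evaluate() returns or throws, the stack is back at its original size afterwards, which is why no separate restore function is needed.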
on.exit expressions. This
function is now also encapsulated within the
Evaluator::Context class. Any on.exit expressions
attached to a Context object are evaluated automatically by
the object's destructor. This automatically copes both with the normal flow
of control and with exceptions, so there is now no need for CR's
R_run_onexits() function.
break, next and return) are used
only in circumstances where there is an appropriate destination. In CXXR
this is now accomplished using the classes
Environment::LoopScope and
Environment::ReturnScope.
Browser.
Other changes in this phase were:
The Promise stack has been abolished, the necessary
functionality now being achieved with C++ try-catch logic.
R_RestartToken, R_ReturnedValue and
R_Toplevel. (CR's TOPLEVEL contexts have been
replaced by Evaluator objects.)
0.29-2.10.1
The primary purpose of this release was to define the baseline for the results on add-on packages reported at useR! 2010. The changes are mainly bugfixes, but with the following more substantive changes:
RObject hierarchy may evaluate R expressions. This has
entailed a change to the implementation of
PairList::construct(), which was previously not reentrant; in
the new implementation, this function never gives rise to garbage
collection.
The member functions of RObject concerned with setting and
examining attributes are all now either virtual or implemented via
calls to virtual functions. This means that classes within the
RObject hierarchy can apply their own consistency checks to
attribute settings, and also override or augment the way in which attribute
values are stored within the C++ object.
0.30-2.11.1
The primary purpose of this phase was to update CXXR to parallel release 2.11.1 of CR. This included the following corrections to significant preexisting bugs:
Each of the functions COMPLEX(), INTEGER(),
LOGICAL(), RAW(), REAL(),
R_CHAR(), STRING_ELT(),
SET_STRING_ELT(), VECTOR_ELT(),
SET_VECTOR_ELT(), XVECTOR_ELT() and
SET_XVECTOR_ELT() now verifies not only that its vector
argument is a pointer to an RObject of the correct type, but
also that this argument is not a null pointer.
SET_STRING_ELT() also now verifies that the pointer to the new
String value is not null. These changes bring the behaviour of
these functions back into line with CR. These non-null checks are applied
even if CXXR is built with the preprocessor variable
UNCHECKED_SEXP_DOWNCAST defined (which causes the type checks
to be elided).
Changes to ensure that do_browser() correctly
saves and restores the restart handler stack, and to ensure that the
browser can be invoked at top-level. (There is however still a problem that
typing Q into the browser does not work as described in the
manual page: it simply returns to the browser prompt.)
0.31-2.11.1
This phase included extensive changes:
The Evaluator::Context class is now the
root of a hierarchy of classes. A Context object of some kind is now
created for every R function invocation (this no longer depends on whether
profiling is in progress), but the intention is that these Context objects
are lightweight, and contain only information relevant to the particular
function invocation.
In CR, the return and break functions are handled by C
setjmp/longjmp. Since these are incompatible with
the orderly stack unwinding that C++ requires, at Phase 8 CXXR everywhere
replaced invocations of longjmp by throwing C++ exceptions.
Unfortunately the propagation of C++ exceptions incurs a considerable
overhead.
An R function such as return is now implemented so that it
creates an object of a class inheriting from Bailout. The
basic idea is that this object is then passed as a return value up the
chain from called function to caller, until it reaches the intended
destination of the indirect flow of control. However, this passing up the
call chain happens only if the caller has indicated, by wrapping its call
in a BailoutContext, that it is able to propagate the
Bailout object correctly. If that is not the case, then the
called function will invoke the throwException() method of the
Bailout object, which - as the name suggests - will complete
the indirect flow of control by throwing a C++ exception.
This change has greatly reduced the number of C++ exceptions that are thrown, with corresponding benefits for performance.
ArgList. Rf_applyClosure() and
R_execClosure() have been abolished, their functionality now
being incorporated into the Closure class. However much
remains to be done.
Previously, memory allocation was instrumented (in classes MemoryBank and
CellPool) using Valgrind client requests. This instrumentation
was controlled by the preprocessor variable VALGRIND_LEVEL.
Unfortunately the instrumented CXXR ran under Valgrind with glacial
slowness, making it useless for practical purposes. Under the new approach,
VALGRIND_LEVEL has been abolished. Instead, when Valgrind
(+memcheck) is to be used, the file MemoryBank.cpp should be
recompiled with the preprocessor variable NO_CELLPOOLS
defined, and CXXR rebuilt. (Only this one file needs to be recompiled.)
When NO_CELLPOOLS is defined, class MemoryBank
routes all requests for memory blocks directly to ::operator
new (which no doubt in turn calls malloc()). This means
that Valgrind's internal malloc() substitute comes into play,
and the result runs at an entirely usable speed.
CXXR has also been changed to carry out a more thorough clean-up at
program exit; in particular all objects of a class derived from
GCNode are deleted, and the tables of Symbols and
CachedStrings are deleted. This suppresses a lot of the
'possibly lost' reports that Valgrind's leak check would otherwise
produce.
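The Bailout mechanism introduced in this phase can be sketched roughly as follows. This is a much simplified illustration, not CXXR's implementation: values are plain integers rather than R objects, Outcome, invokeReturn() and callFunction() are hypothetical names, and BailoutContext is reduced to a flag saved and restored by a stack-based object.

```cpp
// Simplified stand-in for the exception used as the fallback route.
struct ReturnException { int value; };

// A caller creates one of these to signal that it can pass a Bailout
// up the call chain as an ordinary return value.
class BailoutContext {
public:
    BailoutContext() : m_outer(s_active) { s_active = true; }
    ~BailoutContext() { s_active = m_outer; }
    static bool active() { return s_active; }
private:
    bool m_outer;
    static bool s_active;
};
bool BailoutContext::s_active = false;

class Bailout {
public:
    explicit Bailout(int value) : m_value(value) {}
    int value() const { return m_value; }
    // Completes the indirect flow of control the expensive way.
    void throwException() const { throw ReturnException{m_value}; }
private:
    int m_value;
};

// Hypothetical result type: either an ordinary value or a bailout.
struct Outcome { bool is_bailout; int value; };

// Plays the part of an R function such as return.
Outcome invokeReturn(int value) {
    Bailout bailout(value);
    if (!BailoutContext::active()) bailout.throwException();
    return Outcome{true, bailout.value()};   // cheap path: no exception
}

// Plays the part of a caller able to handle a Bailout itself.
int callFunction(int value) {
    BailoutContext context;
    Outcome outcome = invokeReturn(value);
    return outcome.value;
}
```

Calling invokeReturn() through callFunction() involves no exception at all, whereas calling it with no BailoutContext in force falls back to throwing.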
0.32-2.11.1
This phase consisted of changes to improve the speed of CXXR. The principal changes were as follows:
When the reference count of a GCNode falls to zero, it is
designated as 'moribund'. Previously moribund nodes were moved onto a
separate doubly-linked list of nodes (and moved back again if the reference
count was found subsequently to have risen). Now instead the
GCNode class maintains a vector of pointers to moribund nodes.
Also, the moribund flag within a GCNode object is now
incorporated into the same byte as the saturating reference count.
PairList objects have now been squeezed into 32 bytes (on
32-bit architecture) - with some resulting inelegances in encapsulation -
and Frame::Binding objects have been reduced to 16 bytes
(again on 32-bit architecture). Class CellPool now allocates
its 'superblocks' on 4096-byte boundaries. These changes make for better
utilisation of the processor caches.
A new class VectorFrame has been introduced, and used to
implement the local Environments of Closure calls
instead of the StdFrames used previously. As the name
suggests, VectorFrame is an implementation of the
Frame abstract type which holds its constituent
Frame::Bindings as a vector. Although look-up time is
asymptotically linear in the number of Bindings, as compared with the
logarithmic performance of StdFrame, it has a shorter
construction and destruction time than StdFrame, and is better
localised in memory. These factors make VectorFrame more
efficient in implementing small Frames with a short
lifetime.
0.33-2.12.1
The purpose of this phase was to update CXXR to parallel release 2.12.1 of
CR. In the course of this, the use of UncachedString objects was
largely replaced by the use of CachedString objects, a change that
has lagged behind the corresponding change in CR.
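Looking back at the VectorFrame introduced in Phase 32 (0.32-2.11.1), the trade-off between a vector of bindings and a hash table can be sketched as follows. The two classes below are hypothetical simplifications: symbols are std::string and values are int, whereas the real Frame classes map Symbol objects to RObjects.

```cpp
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Bindings held contiguously; look-up is a linear scan, but construction
// and destruction are cheap and the data is well localised in memory.
class VectorFrameSketch {
public:
    void bind(const std::string& symbol, int value) {
        for (auto& binding : m_bindings) {
            if (binding.first == symbol) { binding.second = value; return; }
        }
        m_bindings.push_back(std::make_pair(symbol, value));
    }
    const int* lookup(const std::string& symbol) const {
        for (const auto& binding : m_bindings) {
            if (binding.first == symbol) return &binding.second;
        }
        return nullptr;   // symbol is not bound in this frame
    }
private:
    std::vector<std::pair<std::string, int> > m_bindings;
};

// Hash-table-based frame in the spirit of the StdFrame of Phase 19.
class HashFrameSketch {
public:
    void bind(const std::string& symbol, int value) { m_bindings[symbol] = value; }
    const int* lookup(const std::string& symbol) const {
        auto it = m_bindings.find(symbol);
        return it == m_bindings.end() ? nullptr : &it->second;
    }
private:
    std::unordered_map<std::string, int> m_bindings;
};
```

For the handful of bindings typical of a closure call, the linear scan touches one small block of memory, which is the kind of case the text says VectorFrame handles more efficiently.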
0.34-2.12.1
This phase was marked by a wider use of C++ generic programming techniques, both to simplify the internal code, and to make this code available in a flexible form to add-on packages. In particular:
FixedVector.
Subscripting and associated
functions.)
VectorOps.
ElementTraits namespace.
0.35-2.12.1
This release is intended to clear the decks prior to an upgrade to R 2.13.1, and includes only small changes in the development trunk:
Subscripting has now been extended to cover
subassignment to matrices and arrays.
GCNode has been modified,
reducing its administrative data to a single byte.
(The main activity in the period leading up to this release has been the
introduction of the lazycopy branch, which is exploring methods
for managing object duplication automatically via the RHandle
smart pointer, and eliminating the need for NAMED() and
SET_NAMED(). Verdict so far is mixed: it basically works, but has
performance issues, and breaks somewhat more existing code than I'd like. A
plus point is that it better achieves C++ 'const correctness' than the
development trunk.)
0.36-2.13.1
The purpose of this phase was to upgrade CXXR to parallel release 2.13.1 of
CR. This includes making bytecode interpretation available in CXXR for the
first time, though not yet in the 'threaded code' implementation (which is the
CR default when using gcc).
The code also now builds correctly when configured with
--enable-memory-profiling. (Thanks to Doug Bates for pointing out
that previously it didn't.) However, the functionality of tracemem
and kindred R functions (untracemem and retracemem)
is currently unavailable in CXXR even when it is configured with memory
profiling enabled.
0.37-2.13.1
This release contains only minor changes:
The functionality of tracemem and kindred functions has been
reinstated.
The bytecode interpreter now uses the 'threaded code' implementation when built with gcc (as in CR).
0.38-2.13.1
This release clears the decks prior to an upgrade of CXXR to R 2.14.1.
The principal change regards garbage collection. The reference-counted approach to garbage collection primarily used by CXXR can bring speed advantages when dealing with large datasets, but the housekeeping involved in diddling reference counts up and down as required is surprisingly time-consuming, and this is a major contributor to the speed penalty of CXXR compared with CR when dealing with small datasets, a penalty that has grown greater with the advent of the bytecode interpreter. This release incorporates the following changes:
Previously, CXXR attempted to delete moribund nodes (by calling
GCNode::gclite()) on every call to GCNode::operator
new. This is still the case if CXXR is built with the preprocessor
variable AGGRESSIVE_GC defined (as is the case in the default
configuration), but otherwise gclite() is invoked only when
the number of bytes allocated has risen by a certain margin (currently
10,000) since the previous call of gclite().
Instances of the GCStackRoot class template are now
in either a non-protecting or protecting state, with newly created
GCStackRoots being non-protecting. Only if a
GCStackRoot is in the protecting state does it increment the
reference count of its target. GCNode::gclite() switches all
GCStackRoots into the protecting state before starting garbage
collection. Taken in conjunction with the first change, this means that
many GCStackRoot pointers will complete their lifecycle
without ever being switched into the protecting state.
[...] ProtectStack) and the bytecode
interpreter's node stack, both of which are now implemented using the new
class NodeStack.
A side effect of the above changes is that when AGGRESSIVE_GC
is defined, CXXR's garbage collection is even more aggressive than it
was in previous releases, and this has revealed a number of GC-protection gaps
(e.g. in code inherited from CR) that had previously 'slipped through the
net'.
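The interplay between the first two changes can be illustrated with a minimal sketch. This is not CXXR's code: Node, Heap, StackRoot and noteAllocation() are hypothetical stand-ins for GCNode, its allocator and GCStackRoot, the aggressive behaviour is modelled by a run-time flag rather than the AGGRESSIVE_GC preprocessor variable, and the actual reclamation of unreferenced nodes is omitted. The sketch also assumes the margin is counted in bytes allocated since the previous gclite() and that stack roots are destroyed in reverse order of creation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for GCNode: only the reference count matters here.
struct Node {
    int refcount = 0;
};

class Heap;

// Hypothetical stand-in for GCStackRoot: created non-protecting, and only
// touches its target's reference count once switched to protecting.
class StackRoot {
public:
    StackRoot(Heap& heap, Node* target);
    ~StackRoot();
    StackRoot(const StackRoot&) = delete;
    StackRoot& operator=(const StackRoot&) = delete;
    bool protecting() const { return m_protecting; }
private:
    friend class Heap;
    Heap& m_heap;
    Node* m_target;
    bool m_protecting;
};

class Heap {
public:
    // 'aggressive' models a build with AGGRESSIVE_GC defined (assumption:
    // a run-time flag is used here purely for ease of illustration).
    explicit Heap(bool aggressive = false, std::size_t margin = 10000)
        : m_aggressive(aggressive), m_margin(margin) {}

    // Called on every allocation; decides whether to run gclite().
    void noteAllocation(std::size_t bytes) {
        m_allocated += bytes;
        if (m_aggressive || m_allocated - m_at_last_gc >= m_margin)
            gclite();
    }

    void gclite() {
        // Switch every not-yet-protecting root into the protecting state.
        for (std::size_t i = m_protected; i < m_roots.size(); ++i) {
            StackRoot* r = m_roots[i];
            if (r->m_target) ++r->m_target->refcount;
            r->m_protecting = true;
        }
        m_protected = m_roots.size();
        // Reclaiming nodes whose reference count is zero would go here.
        m_at_last_gc = m_allocated;
        ++m_collections;
    }

    std::size_t collections() const { return m_collections; }
private:
    friend class StackRoot;
    bool m_aggressive;
    std::size_t m_margin;
    std::size_t m_allocated = 0;
    std::size_t m_at_last_gc = 0;
    std::size_t m_collections = 0;
    std::vector<StackRoot*> m_roots;  // live roots, oldest first
    std::size_t m_protected = 0;      // roots [0, m_protected) are protecting
};

inline StackRoot::StackRoot(Heap& heap, Node* target)
    : m_heap(heap), m_target(target), m_protecting(false) {
    m_heap.m_roots.push_back(this);  // no reference count traffic yet
}

inline StackRoot::~StackRoot() {
    assert(m_heap.m_roots.back() == this);  // stack discipline assumed
    if (m_protecting && m_target) --m_target->refcount;
    m_heap.m_roots.pop_back();
    if (m_heap.m_protected > m_heap.m_roots.size())
        m_heap.m_protected = m_heap.m_roots.size();
}
```

In this sketch a StackRoot that is created and destroyed between two calls of gclite() never increments or decrements a reference count at all, which is the saving the changes above are aiming for.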
Another significant change is that the CXXR distribution no longer holds the
'Recommended' packages in compressed tar form
(.tar.gz), but instead contains the untarred package directories
themselves. This will make it easier to carry forward any CXXR-specific tweaks
to these packages from one R release to the next. (Such tweaks are rare, and
often due to a latent GC-protection bug in the CR package code.)
0.39-2.14.1
The purpose of this phase was to upgrade CXXR to parallel release 2.14.1 of CR. This entailed substantial changes to the bytecode interpreter, both to track changes in CR and to correct errors in the previous CXXR implementation. In the course of preparing this release, numerous GC-protection gaps were discovered in the CR code (including the Recommended packages) and corrected within CXXR.
CXXR's bytecode interpreter does not yet implement the cache of symbol bindings used in CR.