Ideas for Refactorings, August 2003

Simon Thompson

Structural

Swap Arguments: Swap the position of two arguments of a function. Requires all argument lists to be swapped too. Can cause problems with partial applications.
It could be a special case of a Permute Arguments refactoring; on the other hand, all permutations can be given by a sequence of 2-cycles.
Permute Arguments: Permute the arguments of a function. Requires all argument lists to be permuted too. Can cause problems with partial applications.
Curry/uncurry Arguments: A subsequence of n arguments of a function is selected for conversion to an n-tuple. Will require the conversion of all argument lists. Uncurrying can cause problems with partial applications.
Constant becomes an argument: Transform the use of a constant within a definition into an argument, with the constant supplied as the value of the argument at all call sites (outside the definition itself). This is a variant of add argument.
Unfolding (beware!): It's not always possible to unfold in Haskell: concern about overloaded identifiers.
Convert between let and where: Right to left is easier than left to right.
Introduce pattern match over an argument position: For a function with a variable in a particular argument position in all its defining equations, replace the variable by an exhaustive set of patterns over the type of the variable.
Replace pattern matching equations by a single equation containing a case statement: To replace a multi-argument pattern match by a single case expression requires that the function is first uncurried; alternatively, a series of nested case statements, one per argument, will be required.
The reverse is possible only if the case switch is over a variable (or a collection of nested cases of the right form).
Replace guards with if ... then ... else ...: Requires there to be an otherwise case in the guard. No restriction in the reverse direction.
Coalesce equations: This can follow the introduction of field names for an algebraic type; common field names for fields in different summands allow similar behaviour in the different summands to be expressed in terms of the field names. (Example: all the binary constructors of a type of propositions have a left and right-hand argument; in calculating the size of the expression that's all we need to express the calculation.) Note: related to simple/layered data types.
Introduce class and instance: This can be done by identifying a type and a collection of functions over that type.
Introduce overloading: Identify a type, a class definition and a collection of functions over that type which are to become the body of the instance declaration. See also Rename functions as a class instance.
Rename functions as a class instance: Existing functions are identified as providing the instance of a class over a particular type. All their uses have to be modified to use the renamed functions, and the functions that use them will (potentially) have their types generalised, to work over the class whose instance has just been introduced. See also Introduce overloading.
Replace explicit recursion by application of a recursion combinator: This is an example of folding (in the Burstall/Darlington sense) against the application of a higher-order function.
Introduce composed function: The scenario is one in which f and g are defined separately, yet everywhere used in the combination f.g. A new function h = f.g is introduced; further transformations (see below) allow the definition of h directly, without using the definitions of f and g. The inverse is split definition below.
Function/Value level definition: Transform a function-level definition like h = f.g into one with paramter(s): h x = (f.g) x, or the reverse.
Unfold according to a definition: Given the scenario h x = ... f x ... it is possible to unfold the definition of h according to the definition of f, which might perform the pattern matches [] and (x:xs) on x giving the two equations h [] = ... and h (y:ys)@x = ...
Could this be achieved by transforming f's definition into one using case before unfolding the function application f x, and subsequently turning the case split back into a set of pattern matching equations? Answer: probably yes.
Name duplicated sub-expressions: (Composite refactoring) Give a name for the sub expression and identify each of its uses (verifying that they are all identical). This can be given by introduce definition followed by a sequence of foldings.
Introduce HOF: It's common to write a first-order definition and then to make it higher-order by lifting out a particular piece of functionality into a functional argument. An example is given by trans (x:xs) = x+2 : trans xs. Selecting the sub expression x+2, which has a single free variable, will cause the functional parameter to be unary, and to be instantiated to (+2) at all calls of the (original) trans. If the chosen sub-expression has n free variables, then the functional parameter is n-ary. See also Introduce a function with overloading.
Introduce a function with overloading: The common functionality in two definitions fi = ... gi ... is rendered in a single function f = ... g ... and two instance declarations g = gi for a class with g in its interface.
Note that this is an alternative to Introduce HOF where the g becomes a parameter.
Eliminate duplicate function definitions: Two or more functions are identical. Eliminate all but one, renaming all uses of the others to the function that remains.

Modules

Split module: A single module is split into two. Potentially control the interfaces of the two new modules (rather than adopting the default export and import options).
Move a declaration between modules: Will need to amend import and export statements.

Types

Name a type using type: Identify uses of a type in a particular way by making them instances of a type synonym. This has no semantic effect on the program.
Note that type synonyms cannot be the subject of instance declarations.
Name a type using data or newtype: Names introduced in this way change the meaning of the program, since the old and new types are different. The constructor of the type has to be introduced or removed to convert between the original and the new type.
Add/remove field names in a data type: Field names give names to selector functions. If field names are removed, then the corresponding selector functions have to be defined explicitly.
See also Introduce Abstract Data Type.
Add/remove discriminator functions for a data type.: Discriminator functions decide which part of a sum a value belongs to. A canonical naming scheme calls the discriminators isBlah where Blah is the corresponding constructor.
See also Introduce Abstract Data Type.
Add/remove explicit constructor functions for a data type.: That is, add a function mkBlah with the same signature as the constructor Blah.
See also Introduce Abstract Data Type.
Introduce/eliminate Pattern Matching.: Given a concrete data type, introduce field names and discriminator functions, as above.
It is then possible to eliminate pattern matching in favour of selectors and discriminators.
Introduce/eliminate an abstract data type (ADT).: First Eliminate Pattern Matching as above. Then introduce explicit constructor functions to replace constructors.
The interface of the ADT inlcudes selectors, discriminators and constructor functions. Other functions selected by the user; see Migrate Functionality below.
Migrate Functionality through Interface: A function is moved across an ADT interface. A function defined 'outside' the implementation capsule can be moved inside (and then reimplemented in a more efficient way); a function from inside the capsule can be moved outside, if its definition only uses functions in the ADT interface.
Replace function by constructor: A constructor (with a small 'c') function over an algebraic type is replaced by a Constructor. This allows pattern-matching definitions to take account of this particular case, but on the other hand forces all definitions to have this extra case.
Convert between algebraic and existential types.: This trades off pattern matching (on the elgebraic type) against extensibility, à la OOP. Will not work well with binary methods.
This can be seen as a sequence of simpler refactorings, including splitting the sum into its constituent parts, introducing an interface (class) for the functions over the types and so forth.
Simple vs layered data types: Best seen in the context of an example. In representing a type of arithmetic expressions, can either have one expression constructor per operator (simple), or an expression constructor which has an extra field from an enumerated type which represents operators.
Change implementation of an ADT: In the most general sense, this will require that a semantically equivalent implementation is produced. One might implement sets by lists, or other means.
A specific example comes from the memoisation of a value within the implementation of a data type: e.g. an extra field in a tree which records the depth of a tree. This sort of memoisation can be introduced automatically.
Replace a finite set of constants with enumerated type: Using a fixed set of constant values to represent a finite set of cases can be replaced with values from an enumerated type.
Change in type annotation: It is possible to give more specific or more general type annotations for existing definitions. Constraints will be placed by the context and use of the definition in question.
There's also the issue of moving between Int and Integer.
Type change: convert Maybe to List or to Either, or Bool to Maybe.: This conversion forces all definitions over the type in question to be modified.
Type change: convert between tuples and (one constructor) algebraic types; between tuples and (homogeneous) lists.: This conversion forces all definitions over the type in question to be modified.

More complex refactorings ... challenges?

Memoise: Memoise the values of a functions. This would be very neat. equivalent implementation is produced. One might implement sets by lists, or other means.
A specific example comes from the memoisation of a value within the implementation of a data type: e.g. an extra field in a tree which records the depth of a tree.
Deforestation: Remove intermediate lists from lazy functional programs. Note: Olaf Chitil has worked in this area.
Split definition: A function does two things: for instance, it calculates a value and formats it in some way. This refactoring should turn the original function into the composition of the two distinct operations.
Could the place to split be indicated by its type? Could this in some sense be seen as a reforestation transformation?
Introduce Monad: From an instance of computation within a particular monad, introduce an explicitly monadic computation.
Alternatively, it would be possible to introduce a sequentialised (monadic) computation of any particular expression.
Introduce exception handling: This can be introduced instead of using the error function, or it can be introduced ab initio.

Not quite refactorings

Function template

From a given definition of a function, extract a template for defining a new function.
Similarly, could build a template from the form of a data type or types.

Check invariants and pre-/post-conditions

Is this strictly a refactoring? Automatically add checks for certain conditions to every function call, or data invariance checks for an ADT.

Modify data type

Various possibilities here:

Add a constructor (and modify all definitions over the type).
Add a field to (a subset of) the constructors of a data type (and modify all definitions over the type).

Perhaps these are refactorings?

Last modified 9/8/03