Evaluation of student work  
 

 

The University of Glasgow assessment code requires that assessment items for which ‘true’ numerical marks cannot be given should be marked in “bands”. ‘True’ numerical assessment means that there is a clear achievement difference between consecutive marks, so that getting 45/50 means something different from getting 44/50 or 46/50 (examples include multiple choice questions, and programs assessed against a suite of test cases). Items where numerical marking is inappropriate include essays, design exercises and presentations: in these cases it is hard to say that there is a clear achievement difference between getting 44/50 and 45/50. These assessment items must, under the university guidelines, be marked in grades with secondary bands. There are 22 bands:

A1 A2 A3 A4 A5 B1 B2 B3 C1 C2 C3 D1 D2 D3 E1 E2 E3 F1 F2 F3 G1 G2 H

I welcomed this change when it was introduced five years ago, but it presented us with a new problem: how to create a marking scheme for an assessment item that is to be marked directly in bands. For numerical assessment, we can associate marks with every evident unit of achievement and add them all up at the end; we cannot do the same with bands. In addition, providing feedback on how numerical marks have been awarded is easy, as students can see where they have gained and lost marks. Simply giving students a single band for their work does not help them understand where it was deficient.

The attached ‘bundle’ shows the method I have used for such non-numerical assessment in HCI classes, through the use of a matrix of criteria and achievement levels. This method allows for the recording of students’ achievement in each of the important criteria, and gives students useful feedback on how their overall band has been determined. (See http://www.cs.kent.ac.uk/national/EPCOS/essay.pdf for a description of bundles and their motivation.)

Bundle Title: Who needs numbers anyway?

Problem Statement: It is difficult to mark a piece of assessment directly into grades and secondary bands.

The Bundle: Assessing a piece of assessment for which numerical marks are inappropriate.

The way it works is: A matrix is drawn up, with the important criteria for the assessment item as the column headings. If necessary, a proportional weight is associated with each criterion. There are eight rows, each associated with a grade, A through H. Each cell contains a description of the performance expected for the criterion in the column for the grade in the row; for example, a B level of achievement for the ‘Presentation Skills’ criterion might be “Fluent, easy to understand”. While marking the submitted assessment item, the marker places a tick in each column, indicating the extent of achievement for each criterion. Ticks may be placed at the bottom of the cell (to show that the work has only just reached that level of achievement), or at the top (to show that the student’s work nearly made it to the next grade). At the end of the assessment process, the marker can write general comments below the matrix. Once all criteria have been assessed, an overall band can be awarded by observation of the pattern of ticks. This is not a computational process: it is done by observation, consideration of criteria weighting, and the marker’s judgement.

It works better if: The students have been given the assessment matrix in advance.

It doesn't work if: There are more than eight or so important criteria, as this can make it difficult to come up with an overall judgement at the end. If there are many criteria, it might be possible to create a numerical marking scheme.

Solution Statement: Using achievement descriptors for each grade and for each criterion enables systematic non-numeric marking, and provides useful feedback.

 

An example of this bundle in practice is shown below, for an HCI design and evaluation assignment at the MSc level (with all criteria of equal weight):

 

| Grade | Evidence of iterative process | Design documents | Prototypes | Evaluation methods | Design process | Introduction, conclusion, reflective discussion |
| --- | --- | --- | --- | --- | --- | --- |
| A | At least three complete iterations, and all iterations completely documented | Sufficient for prototype development | Clearly adequate for the purposes of evaluation | Appropriate choice of methods, well conducted, results clearly stated and discussed | Appropriate, complete, and thoughtful use of evaluation results | Insightful, addressing several relevant issues |
| B | At least three complete iterations, and all iterations completely documented | Mostly sufficient for prototype development | Mostly adequate for the purposes of evaluation | Appropriate choice of methods, mostly well conducted, results clearly stated and discussed | Appropriate and complete use of evaluation results | Reasonable attempt, but omitting some important points |
| C | Two iterations, or three iterations incompletely documented | Almost sufficient for prototype development | Almost adequate for the purposes of evaluation | Mostly reasonable choice of methods, mostly well conducted, results unclear | Appropriate use of some of the evaluation results | Some relevant issues highlighted and discussed |
| D | Two iterations, and iterations incompletely documented | Not sufficient for prototype development | Not adequate for the purposes of evaluation | Mostly reasonable choice of methods, poorly conducted, results unclear | Inappropriate or incomplete use of evaluation results | Limited reflective discussion |
| E, F, G | Less than two iterations, and iterations incompletely documented | Not sufficient for prototype development | Not adequate for the purposes of evaluation | Inappropriate choice of methods, poorly conducted, results unclear | No obvious consideration of evaluation results | No appropriate reflective discussion |
| H | No iterations | No obvious effort | Wholly inadequate | No evaluation | No obvious effort | No reflective discussion |

 

Note that the requirements for A and B under "Evidence of iterative process" are identical; this is deliberate, and does not cause a problem. It means that no more than the three iterations that were requested in the assignment specification are expected for at least a B grade: more iterations will not increase the grade for this criterion, but better documentation will. In practice, it gives the marker a greater range with which to assess the extent of achievement with respect to documentation.
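For markers who keep their records electronically, the matrix and its ticks could be held in a small data structure. The Python below is only an illustrative sketch, not part of the method as I use it. The names `MATRIX`, `Tick` and `feedback_lines` are hypothetical, only two of the six criteria (and only grades A to D) from the example above are included, and the band "B3" in the usage lines is made up. The overall band is supplied by the marker and is deliberately not computed, in keeping with the point that awarding it is a matter of judgement.

```python
from dataclasses import dataclass

# Descriptors copied from two columns of the example matrix (grades A-D only).
MATRIX = {
    "Design documents": {
        "A": "Sufficient for prototype development",
        "B": "Mostly sufficient for prototype development",
        "C": "Almost sufficient for prototype development",
        "D": "Not sufficient for prototype development",
    },
    "Prototypes": {
        "A": "Clearly adequate for the purposes of evaluation",
        "B": "Mostly adequate for the purposes of evaluation",
        "C": "Almost adequate for the purposes of evaluation",
        "D": "Not adequate for the purposes of evaluation",
    },
}

# Where the tick sits within a cell: "top" means nearly the next grade up,
# "bottom" means the level was only just reached.
POSITIONS = ("top", "middle", "bottom")


@dataclass(frozen=True)
class Tick:
    """One tick placed by the marker: a criterion, a grade row, a position."""
    criterion: str
    grade: str
    position: str = "middle"

    def __post_init__(self):
        if self.criterion not in MATRIX:
            raise ValueError(f"unknown criterion: {self.criterion}")
        if self.grade not in MATRIX[self.criterion]:
            raise ValueError(f"unknown grade: {self.grade}")
        if self.position not in POSITIONS:
            raise ValueError(f"unknown position: {self.position}")


def feedback_lines(ticks, overall_band, comments=""):
    """Render the ticked cells as plain-text feedback.

    overall_band is entered by the marker after looking at the pattern
    of ticks; this function only records it and never derives it.
    """
    lines = []
    for t in ticks:
        descriptor = MATRIX[t.criterion][t.grade]
        lines.append(f"{t.criterion}: {t.grade} ({t.position} of cell) - {descriptor}")
    lines.append(f"Overall band (marker's judgement): {overall_band}")
    if comments:
        lines.append(f"Comments: {comments}")
    return lines


# Hypothetical usage: two ticks and a band chosen by the marker.
example = [Tick("Design documents", "B", "top"), Tick("Prototypes", "C")]
for line in feedback_lines(example, "B3"):
    print(line)
```

Recording the tick position alongside the grade preserves the "only just" and "nearly the next grade" information that the paper matrix conveys by where the tick is drawn.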

Below is an example of one of these matrices in use. In this case it was used for evaluating student presentations in a General Readings course, when the students chose their own topic for their presentation. [Note: although the copy quality is unfortunately poor, this example does help to illustrate the overall idea.]

Students typically welcome the feedback that is provided by these matrices, but they are often unsure how to use them in advance of submission. The matrices are very easy to use when marking. It can be slightly tricky to determine an overall band if the criteria weightings vary greatly, but this is not usually the case. This method has been adopted by several colleagues in my department.