Working Group Proposal – How Shall we Assess This?
The working group proposal outlined here addresses two of the three themes of the
ITiCSE 2003 conference: the use of technology in supporting computer science
teaching and learning, and the practice of teaching computer science.
We propose to investigate
assessment practices, particularly the use of automated assessment, in Computer
Science education. This includes determining the scope and usage of automated
assessment techniques in educational institutions worldwide, the educational
soundness of the techniques used, and the impact of these techniques on areas
including student achievement, plagiarism, and staff workloads.
With increasing class sizes in educational establishments worldwide, assessment is becoming problematic: larger numbers make it more difficult to assess student attainment. If assessments are graded manually, educators must either set fewer assessment tasks or resign themselves to a greatly increased marking load. To cope with increasing student numbers, automated assessment is becoming increasingly important in many courses. The number of papers on the topic presented at ITiCSE conferences in recent years [e.g. 7, 10, 11, 13, 19, 20] reflects this growing interest. Automated assessment saves time and human resources, but its adoption must be pedagogically sound. Current research suggests that students initially prefer to be taught by a human, finding a machine too impersonal and a disincentive to learning, but that once the initial stages are completed a machine is an acceptable teacher.
All assessments should follow sound educational principles, and the most widely adopted epistemology within the CS arena is constructivism. Constructivist principles of educational development suggest that:
· Students are active participants in the process of their own learning.
· All learning takes place within a context – usually the classroom – where shared meanings and understandings can be created.
· Students require time to reflect upon the work that they are doing.
· They require the space to be allowed to make mistakes and to learn from these mistakes.
Ben-Ari notes that learning should be active, not passive;
students are being called upon to build mental models of abstract conceptions
of how computers work, the nature of variables in programming, and so on.
Computer Science in particular is a deeply practical subject, and providing as
much opportunity for practical work as possible will presumably help to develop
students’ understanding of the principles behind the subject, as long as this
work is undertaken in concert with human assistance to overcome misconceptions
and refine mental models. The downside
of this, for educators teaching large numbers of students, is that each piece
of practical work needs to be marked.
Plagiarism is an increasing problem within all academic disciplines. A recent UK survey suggests that the incidence is actually much higher than many academics realize. Whilst interviewing all students to ensure that they can reproduce the work they submit may be practical for small numbers of students producing a few pieces of work, it does not scale to large numbers of students producing regular weekly solutions.
On-line plagiarism detectors such as JPLAG and MOSS already rely upon electronically submitted work, and it is logical to consider automated marking of such submissions. Another potential benefit of automated assessment techniques is that they can assist in avoiding plagiarism by presenting students with randomly chosen problem sets or individualized exercises. Although students can collaborate on solving the problems they have been set, they are no longer able to submit verbatim copies of solutions obtained by others.
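As an illustration of the individualized exercises mentioned above, the following is a minimal sketch of one way a system might pick a reproducible, per-student problem set. It is not taken from any of the systems cited; the function name `assign_problems`, the identifiers, and the question pool are all hypothetical.

```python
import hashlib
import random


def assign_problems(student_id, assignment_id, problem_pool, k):
    """Pick k distinct problems for one student, reproducibly.

    All names here are hypothetical; this is a sketch, not the method of
    any particular assessment system.
    """
    # Derive a stable seed from the assignment and student identifiers so
    # the same student always receives the same set for a given task.
    key = f"{assignment_id}:{student_id}".encode("utf-8")
    seed = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    rng = random.Random(seed)
    # sample() draws without replacement, so no problem is repeated.
    return rng.sample(problem_pool, k)


# Hypothetical pool of 20 questions and two hypothetical students.
pool = [f"Q{n}" for n in range(1, 21)]
alice = assign_problems("alice", "week3", pool, 5)
bob = assign_problems("bob", "week3", pool, 5)
```

Because the selection depends only on the identifiers, a marker can regenerate any student's set at grading time without storing it. Students can still discuss the problems, but a verbatim copy of another student's submission will usually answer the wrong questions.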
Computer Science as a subject area is well positioned to benefit from automated assessment. Those who teach the subject can often use their expertise to develop systems that help to reduce their workload without compromising student learning.
Several systems have been reported which deal with assessment on programming courses, for both coursework and formal examinations. Multiple-choice tests have been widely used; other work has included systems for assessing work in diagrammatic form, for automated assessment of web pages produced in a web design course, and some keyword-based free text marking systems. Other work has spanned the spectrum from formative to summative assessment systems, from fully-automated to partially-automated marking systems.
Although there are several commercially available systems that support some form of automated assessment (e.g. Blackboard and WebCT), these tend to be limited in scope to fairly simple multiple-choice type tests. Systems developed at educational institutions (e.g. CourseMaster and BOSS) are often more flexible in the range of assessment types they perform, but are not in such widespread use. There are also a number of in-house systems that are only used at the institution where they were developed and which have not reached the stage where they could be distributed more widely.
The four major stages involved in addressing the aims of the proposal are outlined here. We need to determine what is actually current practice regarding assessment techniques and locate it within educational theory. Once this is established we will be in a position to suggest a set of internationally applicable guidelines for educators setting automated CS assessments.
Any work in this area must be underpinned by educational theory and must aim to conform to, or further refine, what is deemed to be ‘best practice’ in the area of assessing students’ work. We will undertake a literature review of the current status of theory and practice. This will enable us to situate the remainder of the work within established theory.
A picture of current practice will be obtained on two levels. Firstly, we will build a detailed picture of the assessment systems currently used within our own institutions. This will encompass interviews with academics who set assessments; an analysis of which aspects of Bloom’s taxonomy are incorporated in the assessments that are set (recent work suggests that Bloom’s liberal arts taxonomy fits very badly with CS work); a statistical analysis of correlations between results obtained by students undertaking different courses which adopt different assessment approaches; and a study of the prevalence of plagiarism within courses adopting different practices [4, 6, 9].
Secondly, we will create a web-based questionnaire to collect similar but less in-depth data from CS academics at different institutions in different countries. The questionnaire will be publicized widely by the participants, for example via the LTSN-ICS mailing list in the UK.
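The statistical analysis of correlations proposed for the institutional survey could take several forms. As a minimal sketch, assuming the simplest case of a Pearson correlation between the marks the same students obtained on two courses, it might look like the following. The function name `pearson` and the mark lists are invented for illustration and are not survey data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length mark lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 marks")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when all marks are identical")
    return cov / (spread_x * spread_y)


# Invented marks for five students: one automatically marked course and
# one manually marked course.
auto_marked = [55, 62, 70, 48, 81]
manually_marked = [58, 60, 75, 50, 78]
r = pearson(auto_marked, manually_marked)
```

A coefficient near 1 would suggest that the two assessment approaches rank students similarly. On its own it says nothing about which approach is the more educationally sound, which is why the interviews and the taxonomy analysis are also needed.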
The analysis of the data collected in stage 3.2 combined with the principles suggested in 3.1 will form a basis for the principles that will be created here. It has been argued, for example, that the majority of university level courses offer a similar experience to each student taking them. This practice is complicated by the increasingly widely differentiated past experiences that students bring to university. Any assessment system should not disadvantage any identifiable portion of the cohort; it is well documented that male and female students pursue electronic debates in subtly different fashions, so designing an assessment that rewards typical male behavior and punishes (by lack of marks) typical female behavior is to be avoided.
The two major factors to consider when providing guidelines for automated assessment are previous experience and curricular issues:
· What areas of the curriculum are existing systems used for? What are they most suitable for? What are their
· What areas of existing curricula are not targets for automatic assessment at present, and could existing automatic assessment techniques help in these areas?
· To what extent does automated assessment help to deal with the problems of plagiarism in different areas of the curriculum?
Information relating to each of these points will emerge from the data collected in stage 3.2 and will form the basis for the guidelines, within the framework of sound educational practice.
The major output will be a survey of, and guidelines for the use of, automated assessment within CS teaching and learning, although several minor (but nevertheless important) findings will also emerge. The envisaged outputs are:
· A taxonomy of CS assessment
· Guidelines for assessing CS materials
· Guidelines for the use of automated assessment within CS
· The provision of sound pedagogic principles for the adoption of assessment methods appropriate to a given teaching scenario
· A picture of the correlation between a student’s understanding of the different topic areas taught within a CS degree program
These outputs will be presented within the working group report that will be produced at the conference and on a website dedicated to the work of the group. Appropriate links to currently available materials and sources will be provided. Visitors to the website will be encouraged to suggest examples of good or bad automated practice that they have encountered within the classroom to add to the current body of knowledge. It is envisaged that the website will eventually become a first port of call for any CS academic wishing to improve, alter or adopt automated assessment practices.