VOCABULARY ACQUISITION SOFTWARE: USER PREFERENCES AND TUTORIAL GUIDANCE

R.E. Cooley

The Computing Laboratory,

The University of Kent at Canterbury,

Canterbury, Kent CT5 1EH, UK

Abstract

What should be the role of AI in computer supported vocabulary acquisition? This paper presents a software system to aid the teaching and learning of vocabulary in the light of this question. It discusses the need to strike a balance between, on the one hand, providing tutorial guidance based on the knowledge and expertise of experienced language teachers, and on the other hand, providing facilities that a learner or teacher can use to control the acquisition process. The software system incorporates a tutoring module, and it has a user interface which is based on computer simulations of flash cards. The tutoring module is able to help either a teacher or learner specify various aspects of the selection, presentation and sequencing of lexical items. The flash card design is extended to allow the user a range of functions that can control the sequence of items and the amount of information that is presented.

Introduction

At an elementary level, learning the vocabulary of a foreign language can strike the learner as being both difficult and time consuming. A desire to ease the learning process has attracted much software development, some of a rather indifferent standard (Green, D and Meara, P 1995[1]). Moreover, language teachers have not consistently given vocabulary acquisition a prominent place in the syllabus[2]. This is perhaps because the traditional manner of teaching vocabulary by means of lists of words paired with their translations is associated with behaviourism, which now has few if any followers. However, drills, rote memorization and related techniques can still be seen as useful (Schmitt, N. 1997[3]). Work by Groot compared teaching vocabulary using bilingual lists with presenting words in appropriate contexts using a computer program called CAVOCA (Groot, P.J.M. 2000[4]). The design of this program is based on theories of first language vocabulary acquisition that recognise that several stages may be involved. Three are distinguished: 1) observation, 2) storage and linkage, 3) consolidation. It is assumed that it will benefit learners of foreign languages if they experience the same three-stage process. This is operationalised in CAVOCA so that the user spends some two minutes on average learning and using a word before passing on to the next. Groot’s experimental results are interpreted as supporting the staged theory of vocabulary acquisition. But Groot also suggests that a student learning a foreign language may be able to exploit their knowledge of conceptual categories of their own language when extending their foreign language vocabulary. He concludes that “a simple bilingual presentation followed by some rehearsal practice may be more efficient” (Groot, P.J.M. 2000).

Groot’s theory of learning and its implementation is not unique. Goodfellow’s system “Lexica” has the same underpinning (Goodfellow, R 1995[5]). The system described in this paper, in contrast to CAVOCA and Lexica, does not build on the idea of a staged learning process, and does not attempt to tackle the problem of conceptual differences between languages. A range of functions are provided by the system, which contains a database of lexical items and has an interface modelled on flash cards. There is a tutorial module, which is primarily concerned with the selection of items to be learnt and the pacing of subsequent learning.

Flash cards, made of light cardboard with, at a minimum, a single word from the target language on one side and its translation into the user’s first language on the other, are simple, cheap and widely used as an aid to rote learning. It is very easy to write a simple program to simulate flash cards on a computer, and it must be this sort of software that Mark Warschauer is criticising in his brief sketch of the history of Computer-Assisted Language Learning. He calls it behaviourist “drill and practice or …. drill and kill” software (Warschauer, 1996[6]). In this context of the first generation of CALL software, the word “tutor” is used to describe the role of the computer since it unflaggingly leads the learner through a drill. Of course, the computer is here only the delivery mechanism: it is simulating pieces of cardboard rather than a human tutor.

Though not the focus of professional attention, flash card software has evolved, albeit slowly, in recent years. This evolution can be traced in Goodfellow’s review (Goodfellow,T 1995[7]). The most significant innovation has been the addition of sound. It is no longer necessary for learners to struggle with the rewind button on tape recorders and CDs to hear repetitions of a word or phrase. They just have to click a button on the VDU screen. Voice input could also be included in flash card software. It is already found in a range of language tutoring software. This feature allows a comparison to be made between the learners’ efforts to pronounce a word and a model pronunciation. From an AI-ED perspective, perhaps the most interesting innovation has been the incorporation of automatic revision strategies within flash card programs (Zhao J. et al. 1998[8], Houser C. et al. 2000[9], Wozniak, P[10]). The strategies are not based on ideas of language teaching, but on early psychological studies of memory (Ebbinghaus, H 1911)[11].

The work described in this paper is an attempt to augment the flash card program with a tutoring module. It builds on both recent results in language learning and developments in the application of AI to education. It also reflects current thinking in the design of computer interfaces. Work by Shneiderman has shown the potential advantages of systems that allow direct manipulation of data as opposed to systems that change or adapt in use (Shneiderman, B. 1997[12]). The intensity of a learner’s motivation, which is a difficult factor to measure in experimental or indeed other settings, is more likely to be enhanced by a learning process which places the user in control than by one which, though adaptive, may well from time to time strike the user as inconsistent, perverse or unhelpful.

Direct Manipulation

User preferences can be interpreted as aspects of learning strategies. Language learning strategies have been usefully reviewed both in general and in connection with the development of “intelligent CALL (ICALL)” systems by Susan Bull (Bull, S. 1997[13]). But as well as facilitating users’ natural desires to control their own learning, it seems civilised to avoid arbitrary restrictions, and at a minimum allow the users as much manipulative freedom with computer mediated flash cards as they would have with a cardboard version. This view motivates:

· the need to permit sub-sets of a pack to be created,

· for the order of presentation of the target language word and the word in the users’ first language to be changed,

· for the audio content to be suppressed or activated,

· for the order of the cards within the pack to be changed by reversing it, or by shuffling the cards in a random way or for it to remain constant,

· for the cards to be annotated by the user.
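The manipulations listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the system's actual implementation: the class names Card and Pack, and all field and method names, are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class Card:
    target: str        # word in the target language
    translation: str   # word in the learner's first language
    note: str = ""     # free-text annotation added by the learner


@dataclass
class Pack:
    cards: list
    target_first: bool = True  # which side of a card is shown first
    audio_on: bool = True      # whether the audio content is played

    def subset(self, keep):
        """Return a new pack holding only the cards for which keep(card) is true."""
        chosen = [c for c in self.cards if keep(c)]
        return Pack(chosen, self.target_first, self.audio_on)

    def reverse(self):
        """Reverse the order of the cards in place."""
        self.cards.reverse()

    def shuffle(self, rng=None):
        """Shuffle the cards in place; pass a random.Random for repeatable order."""
        (rng or random).shuffle(self.cards)

    def faces(self, card):
        """Return (side shown first, side shown second) for a card."""
        pair = (card.target, card.translation)
        return pair if self.target_first else pair[::-1]
```

Leaving the pack untouched corresponds to the "remain constant" option; annotation is a plain assignment to the note field.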

Freedom to make textual annotations on cards is constrained by the format of the display. Pictorial annotations are currently not available. It would be in keeping with the design philosophy of the system to permit their inclusion, though concern with space limitations argues against the inclusion of arbitrary image files in the database.

These facilities succeed in allowing the users to implement those strategies that would be possible with cardboard flash cards. It seems natural to augment these features with those that might reasonably be expected in any PC application: a count of the number of cards, the “current” position in the pack, the number of cards whose words have been learnt, and those that have been viewed but which the learners feel the need to revise.

Stevick, though not an enthusiast for flash cards, makes two creative suggestions for their use (Stevick, E.W. 1982[14]). As well as recommending that learners annotate their cards, he suggests a strategy for dealing with words that learners find difficult. Rather than just replacing the card bearing the difficult word in its original place in the sequence, he suggests advancing its position in the pack so that it will be re-encountered after just a few intervening words. Although this could be implemented in a straightforward fashion, it is rejected in favour of another approach. The user may specify the number of items on which they wish to concentrate. The group of items so specified may be viewed repeatedly, perhaps until mastered. Then a further group of the same size becomes the focus of attention. A related facility is the option to record whether or not a user “knows” an item. When the translation is presented, the user may change this recorded assessment. At any stage the user may opt to be retested on those words recorded as being “unknown”. Using cardboard flash cards, the same effect is achieved by separating the known from the unknown cards. The main advantage of the computerised facility is that the information can be stored and accessed independently of the physical arrangement of the items.
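A minimal sketch of the two facilities just described, the fixed-size focus group and retesting on "unknown" items, is given here. The function names and the use of a dictionary for the recorded assessments are assumptions made for illustration only.

```python
def focus_groups(items, size):
    """Split items into consecutive groups of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


def unknown_items(items, known):
    """Return, in pack order, the items not recorded as known.

    `known` maps an item to True or False; an item with no record
    is treated as unknown.
    """
    return [item for item in items if not known.get(item, False)]
```

Because the assessments are held separately from the list of items, the pack can be reordered without losing them, which is the advantage over physically sorting cardboard cards.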

A similar feature that a user might also wish to control is the amount of information that they can view about individual items. Houser et al. (2000) stress the value of single word translations of the Kanji, but there are obvious linguistic objections to this practice. In many cases there would be a strong desire to record synonyms and phrasal examples on a flash card. This is the style that McNaughton and Li adopted in their book, which has a very close similarity to a pack of flash cards: the characters are printed on the left hand side of flash card sized panels, and the translation is given on the right (McNaughton and Li Ying, 1999[15]). The amount of information made available originally is determined by the author of the vocabulary. However, the results of Laufer and Hill’s study of the use of CALL dictionaries indicate that “different people have different lookup preferences and that the use of multiple dictionary information seems to reinforce retention” (Laufer, B. and Hill, M. 2000[16]). To cater for this, there is provision for a URL to be recorded with every lexical item. This can be used for several purposes including accessing dictionary entries and web pages that present a lexical item in context. The following screen dump illustrates the user interface. The buttons below the display area are used for learners’ self assessment both before and after the presentation of a translation.

Figure 1: The user interface

Tutorial strategies

For a single learning session, it is necessary to select the items to be learnt from some syllabus. In the absence of a reliable model of vocabulary acquisition (Meara, P 1997), the best that can be done is to enable teachers and learners to devise programmes that match their own needs. The selection may be defined by some external requirement, possibly by a teaching strategy or by the need to fit in with the strategy of an adopted text book. However, if the selection is not defined externally, it may be advisable to choose items for learning up to a predetermined workload. The general approach is to select items from externally specified thematic categories in accordance with specified quotas and ranking principles.

Item difficulty

The notion of a workload implies a weighting system that recognises that some items will be found to be more difficult than others. This is in accord with the work of de Groot, A.M.B. and Keijzer, R. (2000)[17], who found cognate and concrete words were both easier to learn and less likely to be forgotten than abstract and non-cognate words. They also found that word frequency (i.e. the prominence of words in frequency of occurrence lists) had no effect. Items in the flash card system are allocated a default category of either “easy”, “middling” or “difficult”. The person carrying out the classification might choose a category based on de Groot and Keijzer’s criteria or upon individual experience.
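One way to picture a workload built on these categories is to give each category a weight and to add items until the total would exceed the workload. The paper does not state the weights used, so the values below are assumed purely for illustration, as are the names WEIGHTS and fill_to_workload.

```python
# Assumed weights: the source names the categories but not their values.
WEIGHTS = {"easy": 1.0, "middling": 1.5, "difficult": 2.0}


def fill_to_workload(candidates, workload):
    """Take (word, category) pairs in order while the summed weight fits.

    An item that would push the total past `workload` is skipped, and
    later, lighter items may still be taken.
    """
    chosen, total = [], 0.0
    for word, category in candidates:
        weight = WEIGHTS[category]
        if total + weight <= workload:
            chosen.append(word)
            total += weight
    return chosen, total
```

With these assumed weights, a workload of 3.5 is equivalent to three and a half "easy" items.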

State of learning

The user interface of the system allows users to record their responses to an item. The response is either (a) that they “know” the item at the level of detail at which it is presented, or (b) that they do not “know” the item; items in the latter case are said to be “seen”. This information is recorded along with a time stamp. The user can specify that a selection of items is to contain a quota, specified as a percentage, of items which are “seen” and of items tagged “known” which are to be revised.
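The time-stamped record and the percentage quotas might be represented as follows. This is a sketch under assumptions: the record layout, the rounding down of quotas to whole items, and the names Response, latest_states and quota_counts are not taken from the system itself.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Response:
    item: str
    state: str       # "known" or "seen"
    when: datetime   # time stamp of the response


def latest_states(responses):
    """Map each item to the state given in its most recent response."""
    latest = {}
    for r in sorted(responses, key=lambda r: r.when):
        latest[r.item] = r.state
    return latest


def quota_counts(session_size, seen_pct, known_pct):
    """Turn percentage quotas into whole item counts (rounded down).

    Whatever is left over is available for items not yet presented.
    """
    if seen_pct < 0 or known_pct < 0 or seen_pct + known_pct > 100:
        raise ValueError("quotas must be non-negative and total at most 100")
    seen = session_size * seen_pct // 100
    known = session_size * known_pct // 100
    return {"seen": seen, "known": known, "new": session_size - seen - known}
```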

Topic and Utility

The selection of items may be prioritised on the basis of their expected utility to the learner. Items are classified as “essential”, “central” and “peripheral”, and they are also classified by thematic content. Although the centrality of a word might well depend on the theme of discourse under consideration, the size of the task of specifying a utility category for every word in every thematic area is too large to be contemplated. Moreover, it seems sensible to use the utility category as a proxy for frequency, but also to recognise that particular syllabus needs may make it desirable to study some items that have a very low frequency of occurrence.

Level of presentation

Learning lexical items is complicated, depending on the language, by a range of factors such as polysemy and morphological variation, not all of which are amenable to presentation on flash cards. The system has no specific mechanism for handling such complexities, but it does provide two general presentation devices which may be used. Firstly, as mentioned above, provision is made to enable URLs to be linked with lexical items. Secondly, items may be presented in a staged fashion. Currently, the system uses two levels, and items have to have the status “known” at the first level before they can be presented at the second. Second level presentations have their own “item difficulty” category, which can be used to control the priority of selection.

Collocation and Semantic Fields

Collocation, semantic fields, antonyms and similar relationships that exist between a lexical item and other items in the syllabus vocabulary are handled by a uniform mechanism. Any item may be paired with a list of other items, without any restriction. During selection, the inclusion of an item increases the desirability of including any other item from its associated list. Currently, this is implemented as a small fixed percentage increase. This percentage is chosen so that preference is only effective for items with the same utility category.
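The effect of a small fixed percentage increase can be illustrated with a scoring function. The base scores and the 5% figure below are assumptions chosen so that a boosted item never overtakes an item in a higher utility category; the paper does not give the actual values, and the names BASE, BOOST and desirability are hypothetical.

```python
# Assumed base scores per utility category and an assumed 5% boost.
BASE = {"essential": 3.0, "central": 2.0, "peripheral": 1.0}
BOOST = 0.05


def desirability(item, utility, associates, selected):
    """Score an item, boosted once if an already-selected item lists it.

    utility:    maps item -> utility category
    associates: maps item -> list of associated items
    selected:   items already chosen for the session
    """
    score = BASE[utility[item]]
    if any(item in associates.get(s, ()) for s in selected):
        score *= 1 + BOOST
    return score
```

With these numbers a boosted "central" item scores 2.1, which ranks it above other "central" items at 2.0 but still below every "essential" item at 3.0.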

Selection

The user may fully and explicitly determine the items which are presented for learning. However, in many cases users will benefit from tutorial guidance in selecting items to be studied; and in almost all cases will benefit from the pre-categorisation of the lexical items. To take advantage of this guidance, a user will need to specify a workload in terms of the equivalent number of lexical items with unit weight. The following selection constraints must be specified:

The topics to be represented,

The percentage of “known” items to be included,

The percentage of “seen” items to be included,

The percentage of easy, middling and difficult items to be included.

The selection algorithm enforces user specified priorities for “Level of difficulty” and “Utility”. The specification is a list such as the following:

“essential”, “easy”, “middling”, “central”, “peripheral”

This is interpreted as meaning:

Select all the “essential” words, and from what remains,

next select the “easy” words, and from what remains,

next select the “middling” words, etc.

At any of the stages the collocation weighting influences the process.
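The staged interpretation of the priority list can be sketched as a loop over its labels. This is a simplified illustration, not the system's algorithm: it counts items rather than weighting them, it ignores the quotas and the collocation weighting, and the function name select and the data layout are assumptions.

```python
def select(items, priority, limit):
    """Choose up to `limit` words by walking the priority list in order.

    items:    maps word -> (utility category, difficulty category)
    priority: labels such as "essential" or "easy", highest priority first
    """
    remaining = list(items)
    chosen = []
    for label in priority:
        # Words still unselected whose utility or difficulty matches the label.
        matches = [w for w in remaining if label in items[w]]
        for word in matches:
            if len(chosen) == limit:
                return chosen
            chosen.append(word)
            remaining.remove(word)
    return chosen
```

For example, with the priority list given above, an "essential" word is always taken before any "easy" word, and a "central" but "difficult" word is only reached once the "easy" and "middling" words have been exhausted.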

Database Preparation

The system is designed to be used with the collaboration of an experienced teacher who specifies the syllabus and is responsible for the categorisation of individual items. The teacher also has the task of specifying the priority list and recommending appropriate selection constraints. This enables the workload of every learner to match their own level of knowledge as well as complying with any collective strategy of the teacher. The detailed ordering of items within a study session is under the control of the learner.

Conclusion