Chapter 1 Introduction to English Language Testing
Whether we realize it or not, we test every day in virtually every cognitive effort we make. When we read a book, listen to the news on TV, or prepare a meal, we are testing hypotheses and making judgments. Any time we “try” something (a new recipe, a different tennis racquet, a new pair of shoes), we are testing: we are forming a judgment about something on the basis of a sample of behavior. The foreign language learner tests his newly acquired forms of language almost every time he speaks. He devises hypotheses about how the language forms are structured and how certain functions are expressed in forms, and on the basis of the feedback he receives, he makes judgments and decisions. Language teachers also test, informally and intuitively, in every contact with learners. As a learner speaks or writes or shows either listening or reading comprehension, the teacher makes a judgment about the performance and from that judgment infers a certain level of competence on the part of the learner. Informal classroom testing is an everyday and very common activity in which teachers engage almost intuitively.
1.1 TESTING AND TEACHING
A language test which seeks to find out what candidates can do with language provides a focus for purposeful, everyday communication activities. Such a test will have a more useful effect on the learning of a particular language than a mechanical test of structure. In the past, even good tests of grammar, translation or language manipulation had a negative and even harmful effect on teaching. A good communicative test of language, however, should have a much more positive effect on learning and teaching and should generally result in improved learning habits.
1.2 WHAT IS A TEST?
A test, in plain words, is a method of measuring
a person’s ability or knowledge in a given domain. The definition
captures the essential components of a test. A test is first a method.
It is a set of techniques, procedures, and items that constitute an
instrument of some sort that requires performance or activity on the
part of the test-taker (and sometimes on the part of the tester as
well). The method may be intuitive and informal, as in the case of a
holistic impression of the authenticity of someone’s pronunciation. Or it
may be quite explicit and structured, as in a multiple-choice technique
in which correct responses have already been specified by some
“objective” means.
Next, a test has the purpose of measuring. Some measurements are rather broad and inexact, while others are quantified in mathematically precise terms. The difference between formal and informal assessment lies to a great degree in the nature of the quantification of data.
A test measures a person’s ability
or knowledge. Care must be taken in any test to understand who the
test-takers are. What is their previous experience and background? Is
the test appropriate for them? How are scores to be interpreted for
individuals?
Finally, a test measures a given domain. In the case of a proficiency test, even though the actual performance on the test involves only a sampling of skills, the domain is overall proficiency in the language. Other tests may have more specific criteria: a test of pronunciation might well be a test only of a particular phonemic minimal pair in a language. One of the biggest obstacles to overcome in constructing adequate tests is to measure the desired criterion without inadvertently including other factors.
1.3 WHY TEST?
The function indicated in the preceding paragraph provides one answer to the question: why test? But it must be emphasized that the evaluation of student performance for purposes of comparison or selection is only one of the functions of a test. A good classroom test will also help to locate the precise areas of difficulty encountered by the class or by the individual student. Just as it is necessary for a doctor first to diagnose the patient’s illness, so it is equally necessary for the teacher to diagnose the student’s weaknesses and difficulties. Unless the teacher is able to identify and analyze the errors a student makes in handling the target language, he or she will be in no position to give the student appropriate help.
The test should also enable the teacher to ascertain which parts of the language program the class has found difficult. In this way the teacher can evaluate the effectiveness of the syllabus as well as the methods and materials he or she is using. The test results may indicate, for example, certain areas of the language syllabus which have not taken sufficient account of foreign learner difficulties or which, for some reason, have been glossed over. A test which sets out to measure students’ performance as fairly as possible without in any way setting traps for them can be used effectively to motivate them.
1.4 WHAT SHOULD BE TESTED AND TO WHAT STANDARD?
Before a test is constructed, it is important to question the standards which are being set. What standards should be demanded of learners of a foreign language? For example, should foreign language learners be expected, after a certain number of months or years, to communicate with the same ease and fluency as native speakers?
Examinations in the written language have in the past set artificial standards even for native speakers and have often demanded skills similar to those acquired by the great English essayists and critics. In imitating first-language examinations, examinations for foreign learners have proved far more unrealistic in their expectations of the performance of these learners, who have been required to rewrite some of the greatest literary masterpieces in their own words or to write original essays in language beyond their capacity.
1.5 TESTING THE LANGUAGE SKILLS
The four major skills in communicating through language are often broadly defined as listening, listening and speaking, reading and writing. In many situations, English is taught so that learners can use these skills to perform as many genuinely communicative tasks as possible. Where this is the case, it is important for the test writer to concentrate on those types of test items which appear directly relevant to the ability to use language for real-life communication, especially in oral interaction. Thus, questions which test the ability to understand and respond appropriately to polite requests, advice, instructions, etc. would be preferred to tests of reading aloud or telling stories. In the written section of a test, questions requiring students to write letters, memos, reports and messages would be used in place of many of the more traditional compositions used in the past. In listening and reading tests, questions in which students show their ability to extract specific information of a practical nature would be preferred to questions testing the comprehension of unimportant and irrelevant details.
Ways of assessing performance in the four major skills may take the form of tests of:
- listening (auditory) comprehension, in which short utterances, dialogues, talks and lectures are given to the testees;
- speaking ability, usually in the form of an interview, a picture description, role play, or a problem-solving task involving pair work or group work;
- reading comprehension, in which questions are set to test the students’ ability to understand the gist of a text and to extract key information on specific points in the text; and
- writing ability, usually in the form of letters, reports, memos, messages, instructions, and accounts of past events, etc.
It is the test constructor’s task to assess the relative importance of these skills at the various levels and to devise an accurate means of measuring the student’s success in developing these skills.
1.6 TESTING LANGUAGE AREAS
In an attempt to isolate the language areas learnt, a considerable number of tests include sections on:
- grammar and usage;
- vocabulary (concerned with word meanings, word formation and collocations);
- phonology (concerned with phonemes, stress and intonation).
1.6.1 Test of grammar and usage
These tests measure students’ ability to recognize appropriate grammatical forms and to manipulate structures.
Although it (1) …… quite warm now, (2) …… will change later today. By tomorrow morning, it (3) …… much colder and there may even be a little snow … (etc.)
(1) A. seems B. will seem C. seemed D. had seemed
(2) A. weather B. the weather C. a weather D. some weather
(3) A. is B. will go to be C. is going to be D. would be
Note that this particular type of question is called a multiple-choice item. The term multiple-choice item is used because the students are required to select the correct answer from a choice of several answers. The word item is used in preference to the word question because the latter word suggests the interrogative form; many test items are, in fact, written in the form of statements.
1.6.2 Test of vocabulary
A test of vocabulary measures students’ knowledge of the meaning of certain words as well as the patterns and collocations in which they occur. Such a test may test their active vocabulary (the words they should be able to use in speaking and in writing) or their passive vocabulary (the words they should be able to recognize and understand when they are listening to someone or when they are reading). Obviously, in this kind of test the method used to select the vocabulary items (= sampling) is of the utmost importance.
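Sampling can be made systematic rather than haphazard. The sketch below is a minimal illustration, assuming the taught vocabulary is available as word lists grouped by syllabus unit; it draws the same number of words at random from every unit so that no part of the course is over- or under-represented. The function name `sample_vocabulary` and the unit lists are hypothetical, and simple random selection is only one possible sampling method.

```python
import random

def sample_vocabulary(word_lists, per_list, seed=None):
    """Pick up to `per_list` words at random from each word list.

    word_lists: dict mapping a (hypothetical) syllabus unit to a list of words.
    seed: optional value so that the same sample can be reproduced later.
    """
    rng = random.Random(seed)
    sample = {}
    for unit, words in word_lists.items():
        # rng.sample draws without replacement, so no word is tested twice
        k = min(per_list, len(words))
        sample[unit] = rng.sample(words, k)
    return sample

# Hypothetical word lists for two units of a course
syllabus = {
    "Unit 1": ["borrow", "lend", "hire", "library", "return", "fine"],
    "Unit 2": ["recipe", "boil", "fry", "slice"],
}
print(sample_vocabulary(syllabus, per_list=3, seed=1))
```

Fixing the seed lets the same selection be reproduced if the test has to be rebuilt or checked later.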
In the following item students are instructed to circle the letter at the side of the word which best completes the sentence.
Did you …….. that book from the school library?
A. beg B. borrow C. hire D. lend E. ask
In another common type of vocabulary test, students are given a passage to read and are required to replace certain words listed at the end of the passage with their equivalents in the passage.
1.6.3 Test of Phonology
Test items designed to test phonology might attempt to assess the following skills: the ability to recognize and pronounce the significant sound contrasts of a language, the ability to recognize and use the stress patterns of a language, and the ability to hear and produce the melody or patterns of the tunes of a language (i.e. the rise and fall of the voice).
In the following item, students are required to indicate which of the three sentences they hear are the same:
Spoken:
Just look at that large ship over there.
Just look at that large sheep over there.
Just look at that large ship over there.
Although this item, which used to be popular in certain tests, is now very rarely included as a separate item in public examinations, it is sometimes appropriate for inclusion in a class progress or achievement test at an elementary level. Successful performance in this field, however, should not be regarded as necessarily indicating an ability to speak.
1.7 RECOGNITION AND PRODUCTION
Methods of testing the recognition of correct words and forms of language often take the following form in tests:
Choose the correct answer and write A, B, C or D.
I’ve been standing here ……… half an hour.
A. since B. during C. while D. for
This multiple-choice test item tests students’ ability to recognize the correct form: this ability is obviously not quite the same as the ability to produce and use the correct form in real-life situations. However, this type of item has the advantage of being easy to examine statistically.
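What “easy to examine statistically” can mean in practice is sketched below, under stated assumptions: each student’s chosen option for one item is recorded as a letter, and the item is summarized by its facility value (the proportion of students choosing the correct answer) together with a count of how often each option was chosen. The responses are invented for illustration, and `item_statistics` is a hypothetical helper, not part of any testing package.

```python
from collections import Counter

def item_statistics(responses, key, options="ABCD"):
    """Return (facility value, option counts) for one multiple-choice item.

    responses: list of option letters chosen by the students.
    key: the correct option letter.
    """
    counts = Counter(responses)
    # Include options nobody chose, so unused distractors show up as 0
    option_counts = {opt: counts[opt] for opt in options}
    facility = counts[key] / len(responses)
    return facility, option_counts

# Invented responses from ten students to the "since/during/while/for" item (key: D)
answers = ["D", "D", "A", "D", "B", "D", "C", "D", "A", "D"]
facility, option_counts = item_statistics(answers, key="D")
print(facility)       # 0.6
print(option_counts)  # {'A': 2, 'B': 1, 'C': 1, 'D': 6}
```

A distractor that almost nobody chooses is contributing little to the item and may be worth revising.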
If the four choices were omitted, the item would come closer to being a test of production:
Complete each blank with the correct word.
I’ve been standing here …….. half an hour.
Students would then be required to produce the correct answer (= for). In many cases, there would only be one possible correct answer, but production items do not always guarantee that students will deal with the specific matter the examiner had in mind (as most recognition items do).
A good language test may contain either recognition-type items or production-type items, or a combination of both. Each type has its unique functions, and these will be treated in detail later.
1.8 AVOIDING TRAPS FOR THE STUDENTS
A good test should never be constructed in such a way as to trap the students into giving an incorrect answer. When techniques of error analysis are used, the setting of deliberate traps or pitfalls for unwary students should be avoided. Many testers are themselves caught out by constructing test items which succeed only in trapping the more able students. Care should be taken to avoid trapping students by including grammatical and vocabulary items which have never been taught.
In the following example, students have to select the correct answer (C), but the whole item is constructed so as to trap them into making choice B or D. When this item actually appeared in a test, it was found that the more proficient students, in fact, chose B or D, as they had developed the correct habit of associating the tense forms have seen and have been seeing with since and for.
When I met Tim yesterday, it was the first time I ………. him since Christmas.
A. saw B. have seen C. had seen D. have been seeing
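A trap of this kind can often be detected once the test has been given. One common check, sketched here under stated assumptions, is a discrimination index: rank the students by total test score, take an upper and a lower group (27% of the candidates at each end is a conventional choice), and subtract the proportion of the lower group answering the item correctly from the proportion of the upper group doing so. A negative value means the weaker students did better on the item than the stronger ones, which is what happened with the item above. The scores below are invented, and `discrimination_index` is a hypothetical name.

```python
def discrimination_index(results, key, fraction=0.27):
    """Upper-group minus lower-group proportion correct for one item.

    results: list of (total_test_score, option_chosen) pairs, one per student.
    fraction: share of students in each of the upper and lower groups;
              keep it at or below 0.5 so that the two groups do not overlap.
    """
    ranked = sorted(results, key=lambda r: r[0], reverse=True)
    n = max(1, round(len(ranked) * fraction))
    upper, lower = ranked[:n], ranked[-n:]
    p_upper = sum(1 for _, choice in upper if choice == key) / n
    p_lower = sum(1 for _, choice in lower if choice == key) / n
    return p_upper - p_lower

# Invented data for the "Tim ... since Christmas" item (key: C)
results = [(48, "B"), (45, "D"), (44, "B"), (40, "C"), (35, "C"),
           (30, "B"), (25, "C"), (22, "C"), (20, "A"), (18, "C")]
print(discrimination_index(results, key="C"))  # about -0.67: a warning sign
```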
To summarize, all tests should be constructed primarily with the intention of finding out what students know, not of trapping them. By attempting to construct effective language tests, the teacher can gain a deeper insight into the language he or she is testing and the language learning process involved.
1.9 KINDS OF TEST AND TESTING
We use tests to obtain information. The information that we hope to obtain will, of course, vary from situation to situation. It is possible, nevertheless, to categorize tests according to a small number of kinds of information being sought. This categorization will prove useful both in deciding whether an existing test is suitable for a particular purpose and in writing appropriate new tests where these are necessary. The four types of test which we will discuss in the following sections are: proficiency tests, achievement tests, diagnostic tests, and placement tests.
1.9.1 Proficiency tests
Proficiency tests are designed to measure people’s ability in a language regardless of any training
they may have had in that language. The content of a proficiency test,
therefore, is not based on the content or objectives of language courses
which people taking the test may have followed.
In the case of some proficiency tests, ‘proficient’ means having sufficient command of the language for a particular purpose. An example of this would be a test designed to discover whether someone can function successfully as a United Nations translator. Another example would be a test used to determine whether a student’s English is good enough to follow a course of study at a British university. Such a test may even attempt to take into account the kind of English needed to follow courses in particular subject areas.
Despite differences in content and level of difficulty, all proficiency tests have in common the fact that they are not based on courses that candidates may have previously taken.
1.9.2 Achievement tests
Most teachers are unlikely to be responsible for proficiency tests. It is much more probable that they will be involved in the preparation and use of achievement tests. In contrast to proficiency tests, achievement tests are directly related to language courses, their purpose being to establish how successful individual students, groups of students, or the courses themselves have been in achieving objectives. They are of two kinds: final achievement tests and progress achievement tests.
Final achievement tests are those administered at the end of a course of study. They may be written and administered by ministries of education, official examining boards, or by members of teaching institutions. Clearly the content of these tests must be related to the courses with which they are concerned, but the nature of this relationship is a matter of disagreement amongst language testers.
Progress achievement tests, as their name suggests, are intended to measure the progress that students are making. Since ‘progress’ is towards the achievement of course objectives, these tests too should relate to objectives. But how? One way of measuring progress would be to administer final achievement tests repeatedly, the (hopefully) increasing scores indicating the progress made. This is not feasible, particularly in the early stages of a course. The alternative is to establish a series of well-defined short-term objectives. These should make a clear progression towards the final achievement test based on course objectives.
1.9.3 Diagnostic tests
Diagnostic tests are used to identify students’ strengths and weaknesses. They are intended primarily to ascertain what further teaching is necessary. At the level of broad language skills, this is reasonably straightforward. We can be fairly confident of our ability to create tests that will tell us that a student is particularly weak in, say, speaking as opposed to reading in a language. Indeed, existing proficiency tests may often prove adequate for this purpose.
We may be able to go further, analyzing samples of a student’s performance in writing or speaking in order to create profiles of the student’s ability with respect to such categories as ‘grammatical accuracy’ or ‘linguistic appropriacy’.
1.9.4 Placement tests
Placement tests, as their name suggests, are intended to provide information which will help to place students at the stage (or in the part) of the teaching program most appropriate to their abilities. Typically they are used to assign students to classes at different levels.
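Assigning students to levels usually comes down to comparing each score with a set of cut scores. The sketch below assumes that an institution has already fixed cut scores for its own program (the values used here are invented) and that a score exactly at a cut point moves the student up to the higher level; the function `place`, the cut scores and the level names are all hypothetical.

```python
from bisect import bisect_right

def place(score, cut_scores, levels):
    """Return the class level for a placement-test score.

    cut_scores: ascending list giving the lowest score for each level after the first.
    levels: level names from lowest to highest; one more than len(cut_scores).
    """
    if len(levels) != len(cut_scores) + 1:
        raise ValueError("levels must have exactly one more entry than cut_scores")
    # bisect_right counts how many cut scores are <= score
    return levels[bisect_right(cut_scores, score)]

cuts = [40, 70]  # invented cut scores out of 100
names = ["Elementary", "Intermediate", "Advanced"]
for s in (25, 40, 69, 70, 92):
    print(s, place(s, cuts, names))
```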
Placement tests can be bought, but this is not to be recommended unless
the institution concerned is quite sure that the test being considered
suits its particular teaching program. No one placement test will work
for every institution, and the initial assumption about any test that is
commercially available must be that it will not work well.
1.9.5 Direct versus indirect testing
Testing is said to be direct when it requires the candidate to perform precisely the skill which we wish to measure. If we want to know how well candidates can write compositions, we get them to write compositions. If we want to know how well they pronounce a language, we get them to speak. The tasks, and the texts which are used, should be as authentic as possible. The fact that candidates are aware that they are in a test situation means that the tasks cannot be truly authentic. Nevertheless, every effort is made to make them as realistic as possible.
Direct testing is easier to carry out when it is intended to measure the productive skills of speaking and writing. The very acts of speaking and writing provide us with information about the candidate’s ability. With listening and reading, however, it is necessary to get candidates not only to listen or read but also to demonstrate that they have done this successfully. The tester has to devise methods of eliciting such evidence accurately and without the method interfering with the performance of the skills in which he or she is interested.
Direct testing has a number of attractions. First, provided that we are clear about just what abilities we want to assess, it is relatively straightforward to create the conditions which will elicit the behavior on which to base our judgments. Secondly, at least in the case of the productive skills, the assessment and interpretation of students’ performance is also quite straightforward. Thirdly, since practice for the test involves practice of the very skills that we wish to foster, there is likely to be a helpful backwash effect.
Indirect testing attempts to measure the abilities which underlie the skills in which we are interested. One section of the TOEFL, for example, was developed as an indirect measure of writing ability. It contains items of the following kind:
At first the old woman seemed unwilling to accept anything that was offered her by my friend and I.
Here the candidate has to identify which of the underlined elements is erroneous or inappropriate in formal standard English. While the ability to respond to such items has been shown to be related statistically to the ability to write compositions (though the strength of the relationship was not particularly great), it is clearly not the same thing.
The main problem with indirect tests is that the relationship between performance on them and performance of the skills in which we are usually more interested tends to be rather weak in strength and uncertain in nature.
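The strength of the relationship between an indirect test and the skill it stands for is usually expressed as a correlation coefficient. The sketch below computes Pearson’s r for two sets of scores obtained by the same candidates, for example scores on an indirect writing section and marks for compositions; values near 1 indicate a strong positive relationship and values near 0 a weak one. The function name `pearson_r` and any score lists passed to it are hypothetical, and Pearson’s r is assumed here as the measure rather than taken from the text.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two lists of scores for the same candidates."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Sum of cross-products of each candidate's deviations from the two means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Square roots of the sums of squared deviations for each test
    norm_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    # Raises ZeroDivisionError if every candidate got the same score on either test
    return cov / (norm_x * norm_y)
```

Squaring r gives the proportion of score variance the two measures share, which is one way of seeing why a correlation that is “not particularly great” leaves much of the ability to write compositions unaccounted for.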
1.9.6 Discrete point versus integrative testing
Discrete point testing refers to the testing of one element at a time, item by item. This might involve, for example, a series of items, each testing a particular grammatical structure. Integrative testing, by contrast, requires the candidate to combine many language elements in the completion of a task. This might involve writing a composition, making notes while listening to a lecture, taking a dictation, or completing a cloze passage. Clearly this distinction is not unrelated to that between indirect and direct testing. Discrete point tests will almost always be indirect, while integrative tests will tend to be direct. However, some integrative testing methods, such as the cloze procedure, are indirect.
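A cloze passage can be illustrated with fixed-ratio deletion: the first few words are left intact to give context, and thereafter every n-th word is replaced by a numbered gap. The sketch below assumes whitespace-separated words (so punctuation attached to a deleted word is deleted with it), and the defaults of every 7th word after the first 10 words are illustrative choices, not values taken from the text; `make_cloze` is a hypothetical name.

```python
def make_cloze(text, n=7, keep=10):
    """Fixed-ratio cloze: delete every n-th word after the first `keep` words.

    Returns the gapped passage and the answer key (deleted words in order).
    """
    words = text.split()
    answers = []
    # The first deletion is the n-th word after the intact opening stretch
    for i in range(keep + n - 1, len(words), n):
        answers.append(words[i])
        words[i] = f"({len(answers)}) ______"
    return " ".join(words), answers

passage, key = make_cloze(
    "one two three four five six seven eight nine ten eleven twelve", n=3, keep=2
)
print(passage)  # one two three four (1) ______ six seven (2) ______ nine ten (3) ______ twelve
print(key)      # ['five', 'eight', 'eleven']
```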
1.9.7 Communicative language testing
Much has been written in recent years about ‘communicative language testing’. Discussions have centered on the desirability of measuring the ability to take part in acts of communication (including reading and listening) and on the best way to do this. It is assumed in this book that it is usually communicative ability which we want to test. As a result, what I believe to be the most significant points made in discussions of communicative testing are to be found throughout the book, and a recapitulation under a separate heading would therefore be redundant.