Friday, 28 January 2011

Language Testing


There is an essential difference between the traditional “grammar” test for native speakers of English and the kind of structure test appropriate for the foreign learner. Inasmuch as it can generally be assumed that the native speaker of the language has mastered a grammatical system largely or wholly acceptable for informal discourse, “grammar” tests, at least on the high school and college levels, have usually concentrated on matters of style and diction appropriate for rather formal written English. Structure tests for foreign students, on the other hand, will have as their purpose the testing of control of the basic grammatical patterns of the spoken language.

The preparation of a structure test should always begin with the setting up of a detailed outline of the proposed test content, including the percentage of items to be written around each problem. This outline may have to be modified somewhat on the basis of the results of pretesting, but great care must be taken to ensure that the final form of the test includes a broad range of relevant grammatical problems in proportions which reflect their relative importance.
Selection of the structures to be included in an achievement test is relatively easy inasmuch as the class text can and should be used as the basis for our test. As a rule, the test should include the full range of structures that were taught in the course and each structural type should receive about the same emphasis in the test that it receives in the classroom.

1.   Completion (multiple-choice). The most common type of multiple-choice structure item presents a context in which one or more words are missing, followed by several alternative completions.
2.   Sentence alternative (multiple-choice). Another item type does away with the item stem altogether and simply presents several sentences from which the examinee chooses the acceptable version.
3.   Sentence interpretation (multiple-choice). A third type of structure item presents a stimulus and then asks for an interpretation. This becomes a kind of reading comprehension task in which the crucial clues are structural.
4.   Scrambled sentence (multiple-choice). For the testing of word order, test writers sometimes use the device of the scrambled sentence, in which the examinee rearranges a jumbled series of elements so as to form an acceptable sentence.
As a classroom exercise or informal test on an elementary level, this device probably has some merit, younger students in particular being intrigued by its puzzle-solving aspects. On a more advanced level, however, this item type has several drawbacks. First, it is extremely difficult to compose items of just the right level of difficulty: the problems tend to be very easy unless the sentences are made rather long and complex, in which event the task may become more a test of intelligence than of simple structural control. Secondly, with all but the simplest sentences it is hard to avoid scrambled word groups that can be assembled in a variety of acceptable ways, making the scoring time-consuming when large numbers of papers are involved. Thirdly, in multiple-choice testing there is the problem of devising a clear and simple way for answers to be recorded on the answer sheet and scored. But more important than any of the above, it seems doubtful whether anything is really accomplished by the scrambled-sentence technique that cannot be more effectively and economically achieved by other methods.
5.   Completion (supply type). Returning to type 1, we may use the completion item type as a fill-in exercise.
This item type is extremely useful in informal classroom testing situations. Such items are much easier to prepare than the multiple-choice type, and they require a certain amount of composition on the part of the students. Their disadvantages for large-scale testing are the same as with all supply types: they are much more time-consuming to score than multiple-choice items, and there may be several possible correct answers to some of the items, so that different scorers might judge the same response differently.
6.   Conversion (supply type). Another popular type of short-answer structure test requires the examinees to convert or transform a series of sentences in a specified manner, for example by changing them from present to past tense, from active to passive voice, or from singular to plural. The comments given above for item type 5 apply to the conversion type as well.

1.      The language of the dialogue should read like spoken English. Contractions should be used wherever they would normally occur in speech; avoid constructions usually found only in formal writing.
2.      The second part of the dialogue should sound like a natural response to the first part. Avoid responses that sound like artificial classroom drills.
3.      All distracters should be definitely non-English; care must therefore be taken not to present regional or social variants of English as “wrong” answers.
4.      No distracters should include “errors” which would appear in writing but not in speech.


Item Types
1.      Word sets in isolation. In the simplest form of objective sound-discrimination test, the examiner pronounces pairs of words and asks the examinees to indicate whether the two words in each pair are the same or different.
2.      Words in context. The next step is to insert the minimal pair problems into complete sentences, that is, to use sentences which might be misunderstood because of the examinees’ failure to perceive one phonemic contrast. Sometimes these tests make use of pictures.

General Nature of the Tests
In foreign-language testing, auditory comprehension tests are designed to measure the accuracy with which the subjects are able to decode samples of speech in the target language. These samples may be one-sentence requests, questions, or statements of fact; they may be brief simulated conversations; or they may be extended stretches of expository discourse.
Use of recordings versus a live voice
In the preparation of formal tests of auditory comprehension, the test maker must decide whether the utterances will be recorded on tape or disc or be delivered “live” by the examiner.

Item Types
1.      Directions requiring action responses. In the testing of young children, an effective test can be constructed using a series of oral directions or instructions eliciting simple action responses.
The advantages of this type of test are, first, that it does not require responses which involve another language skill such as reading or speaking, and, second, that the simple listen-and-do formula is easily understood by children and does not call for elaborate general explanations or the manipulation of the usual testing apparatus of pencils and test booklets.
2.      Questions and statements (multiple-choice). For examinees who are able to read easy English sentences, an effective and relatively uncomplicated comprehension test can be constructed using verbal stimuli and printed alternatives. The stimuli consist of short questions and statements which the candidate hears (once) but cannot see. The question items are answered by selecting the one logical answer from among several printed in the test booklet. The statements are answered by selecting the one printed alternative which accurately paraphrases the statement heard.
3.      Dialogues (multiple-choice). Another type of auditory test item using oral stimuli and printed alternatives consists of a brief dialogue followed by a comprehension question asked in a third voice. The question is answered by the selection of the correct answer from among several printed in the test booklet.
4.      Lectures (multiple-choice). Advanced-level auditory comprehension tests have also been devised to test the second-language listening ability of college applicants. The value of such a measurement is obvious: foreign students who enroll in an institution where English is the medium of instruction will begin with a serious handicap if they cannot comprehend, and keep up with, lectures delivered in the more formal style of the academic lecturer and characterized by long stretches of uninterrupted discourse crammed with significant data. The lecture test attempts to reproduce the typical lecture situation as realistically as possible.
Suggestions for writing items
1.      Both the stimulus and the item choices should sound as much as possible like informal, spoken English (except, of course, in the simulated lectures).
2.      The oral stimulus should include only high-frequency lexical items.
3.      To minimize the reading factor, printed answer choices should be brief (preferably five to six words, as a rule not over eight to ten) and lexically and grammatically simple.
4.      When, as in dialogues, a sequence of utterances is being tested, the problem should hinge on an understanding of the relationship of the utterances to one another.


1.      Word counts are usually based on the written language only; therefore, many words that are extremely common in the oral language will receive low frequency ratings in the word lists.
2.      The word lists classify words according to relative frequencies rather than absolute difficulty.
3.      Word frequency in English does not serve as a good guide to the probable difficulty of lexical items for which there are cognate forms in the foreign student’s native language.
4.      Some of the word lists don’t differentiate among the various meanings of a word.
5.      Unless the word lists are based on very recent surveys of frequency, they are likely to contain items whose status is currently quite different from what it was at the time the data were collected.
6.      Some word lists are based on a sample of written materials quite unlike those which the typical foreign learner of English is likely to have read.

Item Types
1.      Definitions (multiple-choice). What might be called the “classic” type of vocabulary item consists of a test word followed by several possible definitions or synonyms.
2.      Completion (multiple-choice). A second item type places the problem words in a contextual setting, a procedure felt by many to provide a better measure of the candidate’s ability.
3.      Paraphrase (multiple-choice). A third method of testing vocabulary, combining elements of the two previously discussed devices, is to underline a word in context and provide several possible meanings.
4.      Paraphrase (supply types). A variation of type 3, requiring a structured short answer supplied by the examinee, is highly useful in informal classroom testing.
5.      Picture (objective). In the testing of children who have not yet reached the reading stage, vocabulary may be measured with pictures.

1.      The definitions should be expressed in simple words readily comprehensible to all examinees.
2.      All the alternatives should be on approximately the same level of difficulty.
3.      Whenever possible, all choices should be related to the same general area or kind of activity.
4.      The choices in each item should be approximately the same length or be paired by length. No single choice should attract attention merely because it looks unlike the others in the set.
5.      Items should be kept free of extraneous spelling problems. That is, no attempt should be made to mislead examinees with words that look or sound like possible right answers.

