A classic book-length survey of vocabulary testing: the research literature, design considerations, and its use in education. Many linguistic questions lie at the core of vocabulary testing and don’t have clear-cut answers, like what counts as a word family? (e.g., socialize and socializing should probably count as a single word, but socialism is quite different). What exactly does it mean to know a word? What do we do about multi-word expressions? Should we even be testing vocabulary at all, or communicative ability instead?
Much of the initial research in vocabulary testing originated in research on vocabulary acquisition before being adapted to educational settings, so the book starts there. An effective way to acquire vocabulary is through incidental exposure while reading or listening to the language, although many specific questions are difficult to answer in a controlled experiment. The meaning of a word can sometimes be inferred from context, but often there are not enough helpful clues. When learners encounter a lexical gap, they can employ meta-linguistic skills to work around it, so vocabulary knowledge is not the same as communicative ability.
Next, the author moves to research and design considerations for vocabulary tests. A test must decide which words to test and how to assess whether a word is known. Measuring depth of vocabulary knowledge is desirable but time-consuming, so in practice only a small number of words can be tested in depth. Tests are also constrained by how much effort is required to grade them, which is why multiple-choice formats are popular. The design of a test depends on whether the student cares about a high score: high-stakes tests need to be much more robust to guessing strategies, and educators need to consider whether the washback effects of studying for the test are beneficial. Finally, a good test should have evidence of validity, such as correlating with other measures of language skill.
The book covers the design of four English tests with different question types (Vocabulary Levels, EVST, VKS, and TOEFL). Checklists are good for testing many words quickly; to deter guessing, fake words are mixed in and checking them is penalized, but the penalty tends to fall too heavily on students who check liberally. Matching words to definitions is useful, but care is needed in writing the items, or students may be able to eliminate choices based on morphology alone. Cloze tests (fill in the blank) require the student to read a lot of context, thus testing reading comprehension in addition to vocabulary, which may or may not be desirable.
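To make the checklist scoring problem concrete, here is a minimal sketch of one common rate-based correction for guessing (the specific formula is a standard choice from the testing literature, not necessarily the one the book endorses):

```python
def corrected_score(hits, real_words, false_alarms, fake_words):
    """Estimate known-word proportion on a yes/no checklist test.

    hits: real words the student checked as known
    false_alarms: fake (invented) words the student checked as known
    Applies the correction p = (h - f) / (1 - f), where h and f are the
    hit rate and false-alarm rate. Under a simple model where the student
    guesses "yes" on a fixed fraction of unknown words, this recovers the
    true proportion known.
    """
    h = hits / real_words
    f = false_alarms / fake_words
    if f >= 1:
        return 0.0  # checked every fake word; no usable signal
    return max(0.0, (h - f) / (1 - f))

# Cautious student: checks 60/100 real words, 0/20 fake words.
print(corrected_score(60, 100, 0, 20))   # ≈ 0.6
# Liberal checker with the same knowledge, guessing on half the unknowns.
print(corrected_score(80, 100, 10, 20))  # ≈ 0.6 as well
```

Cruder penalties, such as subtracting a fixed number of points per fake word checked, do not adjust for the checking rate in this way, which is one source of the over-penalization mentioned above.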
The last section is about metrics for readability and for scoring students’ written and spoken output, which is closer to communicative ability than discrete vocabulary items. Automated metrics are hard to use because there is no good way to control for text length, and spoken output must first be transcribed, so these metrics appear more often in research settings than in tests. In tests, students are usually graded subjectively along several different dimensions.
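The length problem shows up clearly in the type-token ratio (TTR), a standard lexical-diversity measure: longer samples repeat words more, so the ratio drops even when the underlying vocabulary is unchanged. A small simulation (my own illustration, not from the book):

```python
import random

def ttr(tokens):
    """Type-token ratio: distinct word forms divided by total words."""
    return len(set(tokens)) / len(tokens)

# Simulate one speaker drawing every word uniformly from the same
# 500-word vocabulary, producing a short and a long sample.
random.seed(0)
vocab = [f"w{i}" for i in range(500)]
short_sample = [random.choice(vocab) for _ in range(100)]
long_sample = [random.choice(vocab) for _ in range(1000)]

print(ttr(short_sample))  # high: most of the 100 tokens are distinct
print(ttr(long_sample))   # much lower, despite identical vocabulary
```

This is why raw TTR comparisons between outputs of different lengths are misleading, and why researchers use length-corrected variants or fixed-size samples.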
Overall, a pretty comprehensive survey of issues related to vocabulary assessment. Some drawbacks of the book are that (1) it only considers English as the foreign language, (2) the writing is sometimes unfocused, making it hard to tell what point is being made, and (3) it hasn’t been updated in two decades, so I wonder whether any new research results have changed the conclusions. I suspect not, mostly because this field has more questions than answers, and a lot of subjective tradeoffs that have to be made.