Tests can be powerful instruments of educational policy; testing can therefore be seen as a political activity, entailing tactics, intrigue and manoeuvring within institutions whose stated purposes may be commercial, financial or educational rather than political.
Test development is a complex project bound up with a myriad of agendas and considerations. In language testing, the impression often given is that all we have to do is improve tests on the technical side: develop better measuring instruments, design appropriate specifications, commission suitable test tasks, devise sound procedures for piloting and analysis, train markers, and let the system get on with things. On this view, innovation in testing is an exclusively technical matter, not a social one.
Given the importance of tests in society and their role in educational policy, it is not surprising that one area of concern has been the standards of testing. One common meaning of standards is levels of proficiency: what LEVEL you have reached. Another meaning is the procedures followed for ensuring quality of content and delivery.
In order to establish common levels of proficiency, tests must be comparable in terms of quality and level, with common standards applied to their production. Mechanisms for monitoring, inspecting or enforcing such a code of practice do not yet exist. Until they do, teachers, test takers and parents have reason to be sceptical about the validity and reliability of the tests offered to them.
Testing should be viewed as contributing to and furthering language learning. We usually say that tests provide essential feedback to teachers on how their learners are progressing, but frankly, few tests do. Large-scale tests rarely report results in a form teachers can act on, while teacher-made tests are often poorly designed, provide little meaningful information, and serve a disciplinary function more than a diagnostic one.
At the centre of testing for learning purposes is the question: what CAN we diagnose? Diagnosis is essentially done for individuals, not groups. Given what we know or suspect about variation across individuals on tests, what confidence can we have in our knowledge of which ability or process underlies a test taker's response to an item? How do we know if we are accurate? We don't. But now, with the help of technology, we can at least obtain detailed item-level scores and responses.
The advantages of computer-based assessment are evident. Computer-based tests are not only more user-friendly but also more compatible with language pedagogy, and they remove the need for the fixed delivery dates and locations normally required by traditional paper-and-pencil testing.
Access to large databases of items means that test security can be greatly enhanced, since tests are created by randomly drawing items from the database, producing different combinations of items for each administration.
Authenticity of Tests
One more thorny issue in testing is authenticity. Authenticity is a long-standing concern in language testing as well as in teaching, with the oft-repeated mantra that if we wish to test and predict a candidate’s ability to communicate in the real world, then texts and tasks should be as similar to the real world as possible.
However, authenticity in tests is a debatable concept that is perceived quite differently across and between various groups of stakeholders.
Discrepancies in Levels and Scores
All of these issues could be explored and better understood if testing organisations revealed the criteria they apply when aligning their exams to the CEFR. Yes, they all say that their exams are aligned to the CEFR, but who checks? And how can discrepancies in test scores and pass rates be justified if the level is supposedly the same? Perhaps what such testing really achieves is a normal distribution. A normal distribution of what? Not just of test scores, but also of power, of income and wealth, of access and opportunities, of expertise and of responsibilities.
According to the Common European Framework of Reference for Languages (CEFR), developed by the Council of Europe, language users at the C2 proficiency level:
Can understand with ease virtually everything heard or read. Can summarise information from different spoken and written sources, reconstructing arguments and accounts in a coherent presentation. Can express [themselves] spontaneously, very fluently and precisely, differentiating finer shades of meaning even in more complex situations.
How many C2 candidates can actually do all of this?