Live Form Creation
The next stage of development is assembling items into exams and deploying them globally.
Each test has two forms, and each form holds 60-100 items chosen from all of the test objectives. If a candidate fails one form and retakes the exam, they receive the second form for the retake. If they fail again, they get the first form on their third attempt.
The VUE test engine randomly orders the questions of each form when someone takes the exam. If two candidates sat down next to each other at a VUE test center and both received Form 1, the order of questions would be random, and they would not see the questions in the same sequence.
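The retake rotation and in-exam shuffling described above can be sketched in a few lines. This is a minimal illustration, not the actual VUE engine; `form_for_attempt` and `present_form` are hypothetical helper names:

```python
import random

def form_for_attempt(attempt):
    """Alternate forms across retakes: attempt 1 gets Form 1,
    attempt 2 gets Form 2, attempt 3 gets Form 1 again, and so on."""
    return 1 if attempt % 2 == 1 else 2

def present_form(questions, seed=None):
    """Return the form's questions in a random order, so two candidates
    sitting the same form do not see the same sequence."""
    order = list(questions)
    random.Random(seed).shuffle(order)
    return order
```

Every candidate sees the same pool of items for a given form; only the presentation order differs.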
Initial Exam Publishing
Once the LPI psychometric staff has determined the composition of the forms, the exam must be converted from text-based items into the actual exam file format to be deployed globally through LPI's network of testing centers.
The exam now enters a period of initial testing where the goal is to determine if the questions are in fact measuring skills and competencies. Within the testing industry, this period is often referred to as the initial, pilot or research stage of testing. In IT certification, this period is known as the beta testing period.
During this time, candidates can register for tests and complete them at local testing centers. They receive credit for the exam, but they do not receive scores immediately afterward.
For the second release of Level 1 in August 2002, LPI used a new form of beta exam called a seeded beta. This type of beta allows a candidate to be scored on the questions that have completed a prior beta evaluation. New beta items are seeded into the exam to collect data on the performance of those specific questions. In a seeded beta, exams may have more than the usual 60-100 questions. However, additional time is usually allotted for the candidates to complete these additional test questions. The results of these questions do not affect the candidates' scores.
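The seeded-beta scoring rule — count only the previously validated items toward the candidate's score, while recording responses to the seeded items for later analysis — might look like this minimal sketch. The item and response structures here are assumptions for illustration, not LPI's actual data format:

```python
def score_seeded_exam(responses, items):
    """responses: dict of item_id -> candidate's answer.
    items: list of dicts with 'id', 'seeded' (bool), and 'answer' keys.
    Returns (proportion correct on scored items, seeded-item response data).
    Seeded (new beta) items never affect the candidate's score."""
    scored = [i for i in items if not i["seeded"]]
    correct = sum(1 for i in scored if responses.get(i["id"]) == i["answer"])
    seeded_data = {i["id"]: responses.get(i["id"]) for i in items if i["seeded"]}
    return correct / len(scored), seeded_data
```

The seeded responses feed the statistics that decide whether a new item graduates to a future scored form.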
Before any scores can be reported on an exam with new questions, the passing score needs to be set. Several simultaneous processes determine this cut score.
Obtaining enough exams
Before the passing score could be set, LPI had to accumulate an adequate number of exams taken by people who were similar to the target job description.
The target number was at least 100 results for each form of an exam. LPI publicized an incentive program, offered discounts, and used the test center at Linux Business Expo in spring 2000 to obtain the necessary number. As our support has grown, our target data numbers are considerably greater, helping to generate more accurate results.
As part of the beta exam process, we also collected demographics. How long had test takers worked with Linux? Did they do system administration daily? How much had they prepared for the test? Demographics are taken into account by psychometric staff when reviewing the validity of questions.
Reviewing the questions
As test results roll in, psychometric staff start to examine the data. Are there questions that everyone gets correct? Are there questions that everyone fails? (Both situations are indicators that something might be wrong with the question.) What are the comments being submitted by exam takers?
The exam included a mechanism for comments, and LPI received plenty.
LPI sifted through all the comments and addressed questions and concerns; the comments submitted by exam takers helped us find many of the errors. Despite the comprehensive review, some technical errors made it onto the beta exam, and a few questions had to be thrown out.
Modified Angoff study
While psychometric staff reviewed incoming data, a separate pool of subject-matter experts simultaneously participated in a modified Angoff study. Their goal: provide the psychometric staff with additional data to validate questions and assist in setting the passing score.
- The experts receive copies of the exam questions on each form.
- The experts look at each question, independently and in consultation with each other, and judge how likely it is that a minimally qualified person meeting the job requirements described in a specification sheet would answer the question correctly. In other words, the experts consider the question from the perspective of someone at the bottom of the competence scale for job performance.
- The experts rate each question with their estimate of what percentage of people will answer correctly, keeping in mind that on multiple-choice questions, some people will get it right by virtue of guessing.
To illustrate, let's say the experts estimate that candidates may get a question right 30 percent of the time. If exam data shows that 90 percent of candidates are getting the question right, then the question needs to be examined to see if the answer is being given away by the wording of the question, or perhaps the answer is provided in another question on the exam. Conversely, if the experts think all candidates should know a particular question, they might rate it at 95 percent. If exam data shows that only 10 percent of test takers are getting it correct, the item is reexamined to see if it is phrased poorly or has some other issue.
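The screening described above — comparing the experts' Angoff estimates against the observed proportion of correct answers and flagging items that diverge sharply — can be sketched as a simple pass over the data. The function name and the 0.4 discrepancy threshold are illustrative assumptions, not LPI's actual criteria:

```python
def flag_discrepant_items(angoff, observed, threshold=0.4):
    """angoff: dict of item_id -> experts' estimated proportion correct (0-1).
    observed: dict of item_id -> actual proportion correct from beta data.
    Flag items whose observed difficulty differs sharply from the Angoff
    estimate, e.g. an item rated 0.30 that 0.90 of candidates answer."""
    return [item for item in angoff
            if abs(angoff[item] - observed.get(item, 0.0)) >= threshold]
```

Flagged items are not automatically discarded; they go back to reviewers, who check for giveaway wording, cross-item leakage, or poor phrasing.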
Ideally, the results from the Angoff study should roughly parallel the actual results from the exams in the beta period.
Beyond validating item performance, the results of the Angoff study are also used to help establish the passing score for exams. As an example, let's say the Angoff study determined that all the questions were difficult for a given form, and the average percentage rating was 30 percent. This would suggest to the psychometric staff that the passing score needs to be set lower, because the exam questions are that much tougher.
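In the modified Angoff method, the cut score is commonly derived by summing the experts' per-item estimates, which is equivalent to multiplying the average rating by the number of items. This sketch assumes that convention; it is an illustration of the arithmetic, not LPI's exact procedure:

```python
def angoff_cut_score(ratings, num_items):
    """ratings: per-item mean expert estimates (0-1) of the probability that
    a minimally qualified candidate answers correctly. The cut score is the
    average rating scaled to the number of items on the form."""
    mean_p = sum(ratings) / len(ratings)
    return round(mean_p * num_items)
```

With the 30 percent average from the example above on a 60-item form, this yields a cut score of 18 — a low bar that reflects how hard the experts judged the questions to be.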
Distributing Score Results
After all of the data collection, the analysis, and the Angoff study, the psychometric staff set a passing score and distributed scores to exam takers.
After the beta work, the passing score has been set, bad items have been removed or fixed, and the exam is ready to be republished. This work involves significant review and can take a month or more to complete.