How Written Exam Questions Are Developed

In our first Demystifying the ARE post, we took a look at who develops ARE questions. This week, we’ll break down how a question makes its way onto the exam. It takes about two years for an item (also known as a question, for those who don’t speak exam development lingo) to move through the development process to become a scored item in a test center. In this post, we’ll look at the process for developing written items, including multiple choice, check-all-that-apply, and fill-in-the-blank questions.

Creating a written item:

  1. Deciding what items to write.
    Each division is broken down into content areas, which are outlined in every exam guide. Each year, we conduct an analysis of these content areas to determine which questions need refreshing. These areas are then assigned to our volunteers who will write new questions.
  2. Drafting initial questions.
    NCARB relies on hundreds of volunteers to help develop our programs. Volunteers who serve on our examination committees are trained on the standards for writing questions and are given a few content areas to focus on. Each item a volunteer develops contains the question statement along with the answer options. The writer must identify the correct answer (the key) and provide either a reference that supports the key or a written rationale that justifies the answer and explains why the other options are incorrect.
  3. First review of draft questions.
    The initial question is reviewed by a mentoring writer who has served on the item writing committee before. The mentor can work with the initial writer to tweak the item if needed before moving it on in the process. After the item passes initial review, it is forwarded to a professional item editor with Alpine Testing to ensure it meets grammatical and formatting standards.
  4. Group review of draft questions.
    After professional editing, the question is shared with the entire workgroup of individuals (generally 5-8 people) working on the division. All members of the workgroup can review and comment on the question so the group can reach consensus that it meets the necessary content area requirements. If the question survives this review, it is released for pretesting.
  5. Questions are pretested as part of regular exams.
    Released items are then included in the pool of exam questions the following year. Each question is exposed to several hundred candidates to measure how it performs. Pretest questions are mixed with scored questions for delivery, but do not impact the candidate’s pass/fail decision.
  6. Pretest results are analyzed.
    After an item has been tested by a significant number of candidates, the results are analyzed by psychometricians. The two primary classical test statistics are the P-value (the percentage of candidates answering the question correctly) and the point-biserial (a correlation between a candidate’s performance on the individual item and that candidate’s overall performance on the exam).
  7. Final item review.
    The question must perform within acceptable ranges on each parameter to become a scored item in the future. Questions that don’t meet performance standards are reviewed by the workgroup once again to see if some aspect of the question can be modified to improve performance. If the workgroup believes the question can be salvaged, it is modified and pretested again. Poorly performing questions that cannot be salvaged are simply deleted and never become operational.

The success rate of questions from initial draft to becoming operational can seem daunting to a first-time volunteer. Often, fewer than 90 percent of initially authored questions even make it through the first couple of committee reviews. Pretest items fare worse: only 70 percent survive statistical scrutiny and make it to operational status.

Finally, scored questions don’t remain part of the ARE item bank forever. Each one is monitored for its ongoing performance. As performance on a question changes and begins to move out of acceptable ranges, the question is retired. Also, a comprehensive review of the question pool is completed every few years to ensure each question’s content is still valid.
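
For readers curious about the two statistics named in step 6, here is a minimal Python sketch of how they could be computed from pretest responses. It is an illustration only, not the psychometricians’ actual tooling: the function names, the sample data, and the choice to correlate against the raw total exam score are all assumptions made for this example.

```python
from math import sqrt


def p_value(item_scores):
    """Proportion of candidates who answered the item correctly.

    item_scores: list of 0/1 values, one per candidate (1 = correct).
    """
    return sum(item_scores) / len(item_scores)


def point_biserial(item_scores, total_scores):
    """Pearson correlation between 0/1 item scores and total exam scores.

    Returns None when either list has no variation, because the
    correlation is undefined in that case.
    """
    n = len(item_scores)
    mean_item = sum(item_scores) / n
    mean_total = sum(total_scores) / n
    covariance = sum(
        (x - mean_item) * (y - mean_total)
        for x, y in zip(item_scores, total_scores)
    )
    var_item = sum((x - mean_item) ** 2 for x in item_scores)
    var_total = sum((y - mean_total) ** 2 for y in total_scores)
    if var_item == 0 or var_total == 0:
        return None
    return covariance / sqrt(var_item * var_total)


# Hypothetical pretest data for six candidates (invented for illustration).
item = [1, 1, 0, 1, 0, 0]           # 1 = answered this item correctly
totals = [82, 75, 60, 90, 55, 68]   # overall exam scores

print("P-value:", p_value(item))
print("Point-biserial:", round(point_biserial(item, totals), 3))
```

In this sketch, a point-biserial well above zero means that candidates who did well on the exam overall also tended to get the item right, which is the behavior a good item should show. The acceptable ranges for each statistic are not stated in this post, so none are encoded here.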

In the next Demystifying the ARE post, we’ll look at how vignettes are developed. Stay tuned!
