CTB/McGraw-Hill Research Ensures Your Assessments Are Valid and Reliable
CTB is a distinguished leader in educational assessment and reporting. With more than 80 years of proven experience in the field, CTB provides high-quality and innovative assessment solutions that help learners of all ages meet their potential.
CTB is guided by a clear mission—to help the teacher help the child—the same today as when the company was founded in 1926. This long-standing tradition of support and dedication to quality research means educators receive assessments that quickly provide the data needed to inform teaching and improve learning at every level—from the individual student to the classroom, school, and district.
To make sound decisions about student progress, educators need assurance that an assessment is valid and reliable. That's why technical excellence is the foundation of CTB.
Acuity is designed to meet high standards of validity and reliability. It is based on statistically sound measurement models and extensive research and development. With a staff of doctoral-level research scientists, psychometricians, statisticians, and educational measurement specialists, CTB applies the same best practices used in item development for high-stakes testing to the development of Acuity. In addition, CTB continually updates Acuity assessments to meet the changing needs of today’s classrooms and reflect current state standards.
High-Quality Assessment Items
All assessment items undergo a rigorous development and review process to ensure that Acuity Assessments comprise only the highest-quality items.
To create even a single item, meticulous planning and development are required. The process starts with careful construction by assessment and content experts, followed by an examination for bias and fairness, and an evaluation to ensure reliable inferences can be drawn from the item about student strengths and instructional needs. Empirical data from students in your state are used to select the best-functioning items for final Acuity forms. Numerous item reviews, specific to state content standards, ensure accurate skill assessment and measurement of student growth relative to state standards.
Predicting to State Accountability Assessments
The value of the Acuity predictive information resides in its ability to indicate student growth and progress toward meeting end-of-year goals relative to a given state’s accountability exam.
Acuity helps prepare students for state assessments by using item content that reflects the structure and format of state accountability exams: each predictive assessment mirrors the content and item formats found on the state exam.
In addition to providing students with valuable practice, Acuity Predictive Assessments deliver critically important predictions of how students will likely perform relative to the state exam. This information supports targeted classroom instruction—while there is still time to improve student achievement. It also preserves classroom instructional hours by helping to focus teaching where students most urgently need intervention.
A Strong Foundation of Research
CTB utilizes proven, industry-standard methods to develop, analyze, and score Acuity Assessments. These methods are the same scientific, research-based, and documented methods used for most state large-scale assessments.
Scaled Predictive Assessments
Acuity Predictive Assessments include three scaled predictive forms—A, B, and C—that reflect state accountability test blueprints for each content area and grade level. The forms are designed to be administered approximately six to eight weeks apart and prior to the state accountability assessment. Each form includes content that is developmentally appropriate for the specific time of the assessment. This helps ensure the assessment accurately measures what students truly know and can accomplish.
Acuity delivers powerful interpretations of student performance, providing predictive information in several important ways.
Acuity Assessments provide scale scores* to measure student longitudinal growth in each content area within and across grades. This scale can be applied to all students taking a given test, regardless of student characteristics or the time of year. Scale scores can be added, subtracted, and averaged across test levels. This makes it possible to compare scores on different forms over time, or to make direct comparisons among individual students or groups on the same form in a way that is statistically valid. This cannot be done with number-correct or raw scores, percentiles, or grade-level equivalents.
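The arithmetic described above can be sketched briefly. This is a minimal illustration with hypothetical scale scores (not real Acuity data) of why an equal-interval scale supports subtracting scores from different forms to measure growth and averaging them for groups:

```python
def growth(before: float, after: float) -> float:
    """Scale-score gain between two administrations (e.g., Form A to Form B)."""
    return after - before

def group_mean(scores: list[float]) -> float:
    """Average scale score for a classroom, school, or district."""
    return sum(scores) / len(scores)

# Hypothetical students with fall (Form A) and winter (Form B) scale scores.
fall   = {"student_1": 642.0, "student_2": 615.0, "student_3": 660.0}
winter = {"student_1": 655.0, "student_2": 638.0, "student_3": 664.0}

gains = {s: growth(fall[s], winter[s]) for s in fall}
print(gains)                               # per-student growth across forms
print(group_mean(list(winter.values())))   # classroom average on Form B
```

The same arithmetic applied to percentiles or grade-level equivalents would not be statistically meaningful, because those metrics are not equal-interval.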
Since Acuity Predictive Reports are based on a single common scale, educators can measure the longitudinal growth of students, classrooms, schools, and districts—at different times of the school year and across school years.
To provide predictions, Acuity assessment results are matched with student NCLB test data. This matching enables Acuity to provide predictive performance levels and confidence bands that indicate the expected level of achievement on the state NCLB test. The prediction includes the corresponding performance level on the state exam (e.g., Below Basic, Basic, Proficient, or Advanced, or whatever performance descriptors the state exam uses).
This prediction is displayed in a concordance table*. This table shows expected performance by indicating how students with similar Acuity scores in the baseline—or matching—year performed later on the state test. The concordance table provides educators with empirical predictions from a student’s current performance to the end-of-year criteria.
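A concordance lookup of this kind can be sketched as follows. The cut scores and level names here are entirely hypothetical; a real concordance table is derived empirically from matched student data in the baseline year:

```python
from bisect import bisect_right

# Hypothetical (lower-bound Acuity scale score, predicted state performance level).
CONCORDANCE = [
    (0,   "Below Basic"),
    (620, "Basic"),
    (650, "Proficient"),
    (690, "Advanced"),
]

def predicted_level(acuity_score: float) -> str:
    """Return the predicted state-exam performance level for a scale score."""
    bounds = [lo for lo, _ in CONCORDANCE]
    idx = bisect_right(bounds, acuity_score) - 1
    return CONCORDANCE[idx][1]

print(predicted_level(655))  # a score of 655 falls in the "Proficient" band
```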
* A representative sample of student data from the state is necessary to develop scales and predictive concordance to the state test.
To ensure scientifically valid assessments and reliable test results, the Acuity Predictive Assessment items undergo rigorous Classical Test Theory and Item Response Theory test and item analyses.
Classical Test Theory
CTB incorporates the use of Classical Test Theory to construct high-quality Acuity Assessments. Classical analyses—such as p-values, distractor analyses, point biserial correlations, Mantel-Haenszel differential item functioning (bias indices as sample sizes permit), and test reliability coefﬁcients—are applied to the creation of each Acuity Assessment to ensure the highest quality items and to support accurate measurement of each student relative to each content area.
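Three of the classical statistics named above can be sketched on a tiny hypothetical matrix of scored responses (1 = correct, 0 = incorrect; rows are students, columns are items). The data and thresholds here are illustrative only:

```python
from statistics import mean, pstdev

responses = [
    [1, 1, 0, 1],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 1, 0, 1],
    [1, 1, 1, 0],
]

def p_value(item: int) -> float:
    """Proportion of students answering the item correctly (item difficulty)."""
    return mean(row[item] for row in responses)

def point_biserial(item: int) -> float:
    """Correlation between an item score and the total test score."""
    totals = [sum(row) for row in responses]
    items = [row[item] for row in responses]
    mi, mt = mean(items), mean(totals)
    cov = mean((i - mi) * (t - mt) for i, t in zip(items, totals))
    return cov / (pstdev(items) * pstdev(totals))

def cronbach_alpha() -> float:
    """Internal-consistency reliability coefficient for the whole test."""
    k = len(responses[0])
    item_vars = [pstdev(row[i] for row in responses) ** 2 for i in range(k)]
    total_var = pstdev(sum(row) for row in responses) ** 2
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

print(p_value(0), point_biserial(0), cronbach_alpha())
```

In operational item analysis, items with very low point-biserial correlations or extreme p-values would be flagged for review before appearing on a final form.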
Item Response Theory
Item Response Theory (IRT) is a mathematical model that describes the relationship between a student’s level of achievement and the likelihood of a correct response on each test item. The use of IRT provides more accurate scoring and supports developmentally appropriate assessments by matching student abilities with the difficulty level of the assessment. IRT uses actual student data to determine empirically—based on statistical observation—how much information is provided by each item.
CTB was a pioneer in the practical application of IRT to large-scale assessment in the 1970s and continues to lead the industry today.
The IRT model used by Acuity provides information about each item, called item parameters, to provide more accurate measurement of student achievement. The three-parameter logistic (3PL) model characterizes a multiple-choice item in terms of three parameters:
- Item difficulty, which indicates how hard the item is for students to answer correctly
- Item discrimination, which reflects the strength of the relationship between the item response and ability
- Guessing, which indicates the likelihood of a correct response from a student simply by guessing
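The three parameters above combine in the 3PL item response function, sketched here with hypothetical parameter values (a = discrimination, b = difficulty, c = guessing, theta = student ability):

```python
import math

def p_correct(theta: float, a: float, b: float, c: float) -> float:
    """Probability of a correct response under the 3PL model."""
    # The constant 1.7 scales the logistic curve to approximate the normal
    # ogive; some parameterizations omit it.
    return c + (1.0 - c) / (1.0 + math.exp(-1.7 * a * (theta - b)))

# A guessing parameter of 0.25 reflects a four-option multiple-choice item:
# even very low-ability students have roughly a one-in-four chance.
for theta in (-2.0, 0.0, 2.0):
    print(round(p_correct(theta, a=1.0, b=0.0, c=0.25), 3))
```

Note how the probability never falls below the guessing floor c, rises through (1 + c) / 2 when ability equals item difficulty, and approaches 1 for high-ability students; the 1PL model, by contrast, fixes a and ignores c entirely.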
CTB was the first K–12 assessment publisher to use the 3PL IRT model—the most precise model available—for selected-response items.
In contrast, the Rasch or one-parameter logistic (1PL) model treats two of these parameters differently. While the 1PL model views item difficulty similarly to the 3PL model, it assumes the same item discrimination for all items and does not model guessing. Research conducted by CTB indicates that the 3PL model yields more accurate information than the 1PL model, which is why it is the default model used at CTB, and more specifically, for Acuity.
CTB uses the two-parameter partial credit (2PPC) model for developing, scaling, and scoring constructed-response items. The 2PPC model defines a constructed-response item in terms of two parameters:
1. Item difficulty
2. Item discrimination
As is true for the 1PL model used for selected-response items, the one-parameter partial credit (1PPC) model assumes equal item discrimination for all items. The advantages of the 3PL model over the 1PL model can be similarly applied to the comparison of the 2PPC and 1PPC models.