
25. Derivation of Classical Test Theory Equations and Coefficient α

David Andrich1 and Ida Marais1

(1) Graduate School of Education, The University of Western Australia, Crawley, WA, Australia

Keywords

CTT · True score · Reliability · Covariance · Standard error of measurement · Coefficient α

Formalization and Derivation of CTT Eqs. (3.1)–(3.5) in Chap. 3

In many traditional texts, the subscript for the person is taken for granted. However, though quicker to write, omitting it can contribute to confusion when it is not clear whether a summation is over persons, over items, or both. Therefore, in this book we continue to use the subscripts n and i for a person and an item, respectively. According to Eq. (3.1),
$$ y_{n} = t_{n} + e_{n} , $$
(25.1)
where $$ y_{n} $$ is the observed score on a test, that is, a person's total score on a set of items, and $$ t_{n} $$ and $$ e_{n} $$ are, respectively, the person's true and error scores. This is the fundamental equation of CTT; note again that it has no item parameters.

Thus, we see that the observed score is the sum of two other variables, the true and error scores. Both are unobserved and therefore latent scores. In addition, they are real numbers, whereas $$ y_{n} $$ is typically an integer.

If the error is not correlated with the true score, then from what we have just learned about the variance of the sum of two variables, it follows that the variance of the observed scores is the sum of the variances of the true scores and the error scores. That is,
$$ s_{y}^{2} = s_{t}^{2} + s_{e}^{2} $$
(25.2)

This is the second most relevant equation in CTT.

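As a concrete illustration, the following short simulation (an added sketch, not part of the original text; the normal distributions, their parameters and the sample size are arbitrary choices) generates true scores and independent errors and confirms Eq. (25.2) numerically.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000                        # number of persons

t = rng.normal(50, 10, N)          # true scores, SD 10
e = rng.normal(0, 5, N)            # errors: mean 0, SD 5, independent of t
y = t + e                          # observed scores, Eq. (25.1)

# Eq. (25.2): the observed variance is the sum of the true and error variances
print(np.var(y, ddof=1))                      # ~ 125
print(np.var(t, ddof=1) + np.var(e, ddof=1))  # ~ 125
```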
The next important concept developed in CTT is the formalization of reliability. It begins with the idea of two tests being administered to measure the same construct. Such tests are often termed parallel tests. In some situations there really are two tests, but we do not need two actual tests to develop the theory; having developed it, we can then see what other ways there are to calculate the reliability.

Each test will have its own error of measurement, but the true score for a person will be the same. Here we have to use double subscripts briefly.

Let the score of person n on test 1 be $$ y_{n1} $$ and on test 2 be $$ y_{n2} $$.

That is,
$$ y_{n1} = t_{n} + e_{n1} $$
(25.3a)
and
$$ y_{n2} = t_{n} + e_{n2} , $$
(25.3b)
where the true score $$ t_{n} $$ carries no test subscript because, as noted above, it is the same on both tests.

Because we are interested in how consistent the observed scores from the two tests are, we calculate the correlation between $$ y_{n1} $$ and $$ y_{n2} $$. We would want this correlation to be high. We begin with the calculation of the covariance between $$ y_{1} $$ and $$ y_{2} $$.

Derivation of Covariance

We have not derived these relationships in full elsewhere, and therefore for completeness we include them here. We could use random variable notation, but for simplicity we use the scores directly in the derivations. In calculating the covariance, we immediately estimate the population value and therefore use N − 1 rather than N in the divisor, where N is the number of persons in the sample. We also use the standard CTT assumption that the errors have mean zero, so that $$ \overline{e}_{1} = \overline{e}_{2} = 0 $$.
$$ \begin{aligned} c_{12} & = \frac{\sum\nolimits_{n = 1}^{N} (y_{n1} - \overline{y}_{1} )(y_{n2} - \overline{y}_{2} )}{N - 1} \\ & = \frac{\sum\nolimits_{n = 1}^{N} [(t_{n} + e_{n1} ) - (\overline{t} + \overline{e}_{1} )][(t_{n} + e_{n2} ) - (\overline{t} + \overline{e}_{2} )]}{N - 1} \\ & = \frac{\sum\nolimits_{n = 1}^{N} [(t_{n} + e_{n1} ) - \overline{t} ][(t_{n} + e_{n2} ) - \overline{t} ]}{N - 1} \\ & = \frac{\sum\nolimits_{n = 1}^{N} [t_{n}^{2} + t_{n} e_{n2} - t_{n} \overline{t} + e_{n1} t_{n} + e_{n1} e_{n2} - e_{n1} \overline{t} - \overline{t} t_{n} - \overline{t} e_{n2} + \overline{t}^{2} ]}{N - 1} \\ & = \frac{\sum\nolimits_{n = 1}^{N} t_{n}^{2} + \sum\nolimits_{n = 1}^{N} t_{n} e_{n2} - \sum\nolimits_{n = 1}^{N} t_{n} \overline{t} + \sum\nolimits_{n = 1}^{N} e_{n1} t_{n} + \sum\nolimits_{n = 1}^{N} e_{n1} e_{n2} - \sum\nolimits_{n = 1}^{N} e_{n1} \overline{t} - \sum\nolimits_{n = 1}^{N} \overline{t} t_{n} - \sum\nolimits_{n = 1}^{N} \overline{t} e_{n2} + \sum\nolimits_{n = 1}^{N} \overline{t}^{2} }{N - 1} \\ \end{aligned} $$

Now because the errors are assumed to have mean zero, to be uncorrelated with the true scores, and to be uncorrelated with each other across the two tests, every sum above that contains an error term is 0.

Therefore, the last expression above simplifies to
$$ \begin{aligned} c_{12} & = \frac{\sum\nolimits_{n = 1}^{N} t_{n}^{2} - \sum\nolimits_{n = 1}^{N} t_{n} \overline{t} - \sum\nolimits_{n = 1}^{N} \overline{t} t_{n} + \sum\nolimits_{n = 1}^{N} \overline{t}^{2} }{N - 1} \\ & = \frac{\sum\nolimits_{n = 1}^{N} t_{n}^{2} - 2\sum\nolimits_{n = 1}^{N} t_{n} \overline{t} + N\overline{t}^{2} }{N - 1} \\ & = \frac{\sum\nolimits_{n = 1}^{N} t_{n}^{2} - 2\left( \sum\nolimits_{n = 1}^{N} t_{n} \right)\frac{\sum\nolimits_{n = 1}^{N} t_{n} }{N} + N\left( \frac{\sum\nolimits_{n = 1}^{N} t_{n} }{N} \right)^{2} }{N - 1} \\ & = \frac{\sum\nolimits_{n = 1}^{N} t_{n}^{2} - \frac{2}{N}\left( \sum\nolimits_{n = 1}^{N} t_{n} \right)^{2} + \frac{N}{N^{2}}\left( \sum\nolimits_{n = 1}^{N} t_{n} \right)^{2} }{N - 1} \\ & = \frac{\sum\nolimits_{n = 1}^{N} t_{n}^{2} - \frac{2}{N}\left( \sum\nolimits_{n = 1}^{N} t_{n} \right)^{2} + \frac{1}{N}\left( \sum\nolimits_{n = 1}^{N} t_{n} \right)^{2} }{N - 1} \\ & = \frac{\sum\nolimits_{n = 1}^{N} t_{n}^{2} - \frac{1}{N}\left( \sum\nolimits_{n = 1}^{N} t_{n} \right)^{2} }{N - 1} \\ & = \frac{\sum\nolimits_{n = 1}^{N} (t_{n} - \overline{t} )^{2} }{N - 1} \\ & = \frac{SS_{t} }{N - 1} = s_{t}^{2} \\ \end{aligned} $$

The second-to-last step is proved as follows:

To show that
$$ \frac{{\sum\nolimits_{n = 1}^{N} {t_{n}^{2} - \frac{1}{N}\left( {\sum\nolimits_{n = 1}^{N} {t_{n} } } \right)^{2} } }}{N - 1} = \frac{{\sum\nolimits_{n = 1}^{N} {(t_{n} } - \overline{t} )^{2} }}{N - 1}, $$
we work in reverse and just take the numerator for convenience, that is, we show that
$$ \sum\limits_{n = 1}^{N} {(t_{n} } - \overline{t} )^{2} = \sum\limits_{n = 1}^{N} {t_{n}^{2} - \frac{1}{N}\left( {\sum\limits_{n = 1}^{N} {t_{n} } } \right)^{2} } $$
Proof
$$ \begin{aligned} \sum\limits_{n = 1}^{N} {(t_{n} - \overline{t} )^{2} } & = \sum\limits_{n = 1}^{N} {(t_{n}^{2} - 2\overline{t} t_{n} + (\overline{t} )^{2} )} \\ & = \sum\limits_{n = 1}^{N} {t_{n}^{2} } - 2\overline{t} \,\sum\limits_{n = 1}^{N} {t_{n} } + \sum\limits_{n = 1}^{N} {\left( {\frac{{\sum\nolimits_{n = 1}^{N} {t_{n} } }}{N}} \right)^{2} } \\ & = \sum\limits_{n = 1}^{N} {t_{n}^{2} } - 2\frac{{\sum\nolimits_{n = 1}^{N} {t_{n} } }}{N}\sum\limits_{n = 1}^{N} {t_{n} } + N\left( {\frac{{\sum\nolimits_{n = 1}^{N} {t_{n} } }}{N}} \right)^{2} \\ & = \sum\limits_{n = 1}^{N} {t_{n}^{2} } - \frac{2}{N}\left( {\sum\limits_{n = 1}^{N} {t_{n} } } \right)^{2} + \frac{N}{{N^{2} }}\left( {\sum\limits_{n = 1}^{N} {t_{n} } } \right)^{2} \\ & = \sum\limits_{n = 1}^{N} {t_{n}^{2} } - \frac{2}{N}\left( {\sum\limits_{n = 1}^{N} {t_{n} } } \right)^{2} + \frac{1}{N}\left( {\sum\limits_{n = 1}^{N} {t_{n} } } \right)^{2} \\ & = \sum\limits_{n = 1}^{N} {t_{n}^{2} } - \frac{1}{N}\left( {\sum\limits_{n = 1}^{N} {t_{n} } } \right)^{2} \\ \end{aligned} $$
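A quick numerical check of this identity, added here as a small illustration with arbitrary numbers:

```python
import numpy as np

t = np.array([3.0, 5.0, 6.0, 10.0])   # arbitrary "true scores"
N = len(t)

lhs = np.sum((t - t.mean()) ** 2)          # sum of squared deviations
rhs = np.sum(t ** 2) - (t.sum() ** 2) / N  # computational form
print(lhs, rhs)                            # both 26.0
```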
In summary, the covariance between two parallel tests is simply the variance of the true scores.
$$ c_{12} = s_{t}^{2}.$$
(25.4)

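As a quick numerical check of Eq. (25.4), the sketch below (an added illustration; the distributions and their parameters are arbitrary choices) simulates two parallel tests that share the same true scores and shows that their sample covariance approximates the true-score variance.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 100_000

t = rng.normal(50, 10, N)        # common true scores
y1 = t + rng.normal(0, 5, N)     # test 1, Eq. (25.3a)
y2 = t + rng.normal(0, 5, N)     # test 2, Eq. (25.3b), independent errors

c12 = np.cov(y1, y2, ddof=1)[0, 1]
print(c12)                       # ~ 100
print(np.var(t, ddof=1))         # ~ 100; Eq. (25.4): c12 = s_t^2
```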
From Eq. (25.4), we can derive another important relationship based on correlations.

In Statistics Review 4, it is shown that the covariance is standardized to a correlation by dividing it by the product of the two standard deviations.
$$ r_{12} = \frac{{c_{12} }}{{s_{1} s_{2} }}. $$
(25.5)

However, $$ s_{1}^{2} = s_{t}^{2} + s_{e}^{2} $$ and $$ s_{2}^{2} = s_{t}^{2} + s_{e}^{2} $$.

Note that here we assume that the error variance is the same on both tests; the true scores are, of course, the same. Therefore, the variance of the observed scores is the same on both tests. That is, if $$ s_{y}^{2} $$ is the variance of the observed scores y of any two parallel tests 1 and 2, then $$ s_{1}^{2} = s_{2}^{2} = s_{y}^{2} $$.

Therefore, Eq. (25.5) reduces to
$$ r_{yy} = \frac{{s_{t}^{2} }}{{s_{y} s_{y} }} = \frac{{s_{t}^{2} }}{{s_{y}^{2} }}, $$
(25.6)
where the correlation of a test with itself is denoted by $$ r_{yy} $$.

Equation (25.6) shows that the reliability $$ r_{yy} $$ of a test is the proportion of the total variance that is true score variance.
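Continuing the simulated parallel tests above (this sketch assumes the variables t, y1 and y2 from that block are still defined), the correlation between the two tests estimates the same proportion:

```python
# Reliability two ways, using t, y1 and y2 from the previous sketch:
r12 = np.corrcoef(y1, y2)[0, 1]                 # correlation of parallel tests
ratio = np.var(t, ddof=1) / np.var(y1, ddof=1)  # s_t^2 / s_y^2, Eq. (25.6)
print(r12, ratio)                               # both ~ 100/125 = 0.8
```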

Equation (25.6) can be further rearranged to give
$$ r_{yy} = \frac{{s_{t}^{2} }}{{s_{t}^{2} + s_{e}^{2} }} $$
(25.7)
and
$$ r_{yy} = \frac{{s_{y}^{2} - s_{e}^{2} }}{{s_{y}^{2} }} . $$
(25.8)
It is evident that because (i) $$ s_{e}^{2} \ge 0 $$, (ii) $$ s_{t}^{2} $$ cannot be greater than $$ s_{y}^{2} $$, and (iii) $$ s_{t}^{2} \ge 0 $$ and $$ s_{y}^{2} > 0 $$, $$ r_{yy} $$ has the range 0 to 1, that is
$$ 0 \le r_{yy} \le 1 . $$
(25.9)

Derivation of the Standard Error of Measurement

By rearranging Eq. (25.8), it is possible to write an equation for the error variance in terms of the reliability.
$$ \begin{aligned} s_{e}^{2} & = s_{y}^{2} - r_{yy} \,s_{y}^{2} \\ & = s_{y}^{2} (1 - r_{yy} ). \\ \end{aligned} $$
(25.10)

Now we need to appreciate what the error variance is: it is the variation of scores from test to test for persons with the same true score, or for the same person tested on more than one occasion.

The standard deviation of these scores is given by
$$ s_{e} = s_{y} \sqrt {(1 - r_{yy} )} . $$
(25.11)

Equation (25.11) is known as the standard error of measurement.

Notice that if the reliability is 1, that is, the scores on the two parallel tests agree perfectly, then the standard error is 0; if the reliability is 0, then the standard error is the standard deviation of the observed scores, which means that all of the variation is error variance.
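For example, with the values used in the simulations above (s_y = √125 ≈ 11.18 and r_yy = 0.8), Eq. (25.11) recovers the error standard deviation of 5 that generated the data. A one-line sketch:

```python
import math

s_y, r_yy = math.sqrt(125), 0.8
s_e = s_y * math.sqrt(1 - r_yy)  # Eq. (25.11): standard error of measurement
print(s_e)                       # ~ 5.0
```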

Derivation of the Equation for Predicting the True Score from the Observed Score

From the observed score $$ y_{n} $$ and the test's reliability, it is possible to estimate the true score $$ t_{n} $$ and the standard error of this estimate. The estimate is made using the regression equation.

It will be recalled from Statistics Review 4 that variable Y for person n can be predicted from variable X using the equation $$ \hat{Y}_{n} = b_{0} + b_{1} X_{n} $$. Note that we now use the subscript n for the person rather than i to be consistent with the presentation here. In this equation, $$ b_{1} = c_{xy} /s_{x}^{2} $$ and $$ b_{0} = \overline{Y} - b_{1} \overline{X} $$.

Now in the case where the predicted variable is the true score t and the predictor is the observed score y, we obtain
$$ \hat{t}_{n} = b_{0} + b_{1} y_{n} , $$
(25.12)
where $$ b_{1} = c_{yt} /s_{y}^{2} $$ and $$ b_{0} = \overline{t} - b_{1} \overline{y} $$.
In this special case, the covariance $$ c_{yt} $$ can be rearranged as follows. Again, you do not need to know this proof, but it is provided for completeness.
$$ \begin{aligned} c_{yt} & = \frac{1}{N - 1}\sum\limits_{n = 1}^{N} (y_{n} - \overline{y} )(t_{n} - \overline{t} ) \\ & = \frac{1}{N - 1}\sum\limits_{n = 1}^{N} (t_{n} + e_{n} - (\overline{t} + \overline{e} ))(t_{n} - \overline{t} ) \\ & = \frac{1}{N - 1}\sum\limits_{n = 1}^{N} (t_{n} + e_{n} - \overline{t} - \overline{e} )(t_{n} - \overline{t} ) \\ & = \frac{1}{N - 1}\sum\limits_{n = 1}^{N} (t_{n}^{2} - t_{n} \overline{t} + e_{n} t_{n} - e_{n} \overline{t} - \overline{t} t_{n} + \overline{t}^{2} - \overline{e} t_{n} + \overline{e} \, \overline{t} ) \\ & = \frac{1}{N - 1}\left( \sum\limits_{n = 1}^{N} t_{n}^{2} - \sum\limits_{n = 1}^{N} t_{n} \overline{t} + \sum\limits_{n = 1}^{N} e_{n} t_{n} - \sum\limits_{n = 1}^{N} e_{n} \overline{t} - \sum\limits_{n = 1}^{N} \overline{t} t_{n} + \sum\limits_{n = 1}^{N} \overline{t}^{2} - \sum\limits_{n = 1}^{N} \overline{e} t_{n} + \sum\limits_{n = 1}^{N} \overline{e} \, \overline{t} \right) \\ & = \frac{1}{N - 1}\left( \sum\limits_{n = 1}^{N} t_{n}^{2} - \sum\limits_{n = 1}^{N} t_{n} \overline{t} - \sum\limits_{n = 1}^{N} \overline{t} t_{n} + \sum\limits_{n = 1}^{N} \overline{t}^{2} \right) \\ & = \frac{1}{N - 1}\left( \sum\limits_{n = 1}^{N} t_{n}^{2} - 2\sum\limits_{n = 1}^{N} t_{n} \overline{t} + N\overline{t}^{2} \right) \\ & = \frac{1}{N - 1}\left( \sum\limits_{n = 1}^{N} t_{n}^{2} - 2\overline{t} \sum\limits_{n = 1}^{N} t_{n} + N\left( \frac{\sum\nolimits_{n = 1}^{N} t_{n} }{N} \right)^{2} \right) \\ & = \frac{1}{N - 1}\left( \sum\limits_{n = 1}^{N} t_{n}^{2} - 2\frac{\sum\nolimits_{n = 1}^{N} t_{n} }{N}\sum\limits_{n = 1}^{N} t_{n} + \frac{1}{N}\left( \sum\limits_{n = 1}^{N} t_{n} \right)^{2} \right) \\ & = \frac{1}{N - 1}\left( \sum\limits_{n = 1}^{N} t_{n}^{2} - \frac{2}{N}\left( \sum\limits_{n = 1}^{N} t_{n} \right)^{2} + \frac{1}{N}\left( \sum\limits_{n = 1}^{N} t_{n} \right)^{2} \right) \\ & = \frac{1}{N - 1}\left( \sum\limits_{n = 1}^{N} t_{n}^{2} - \frac{1}{N}\left( \sum\limits_{n = 1}^{N} t_{n} \right)^{2} \right) \\ & = \frac{1}{N - 1}\sum\limits_{n = 1}^{N} (t_{n} - \overline{t} )^{2} \\ & = s_{t}^{2} \\ \end{aligned} $$
Therefore
$$ \begin{aligned} b_{1} & = \frac{{c_{yt} }}{{s_{y}^{2} }} = \frac{{s_{t}^{2} }}{{s_{y}^{2} }} = r_{yy} \\ b_{0} & = \bar{t} - r_{yy} \bar{y} \\ \end{aligned} $$
Therefore
$$ \begin{aligned} \hat{t}_{n} & = \bar{t} - r_{yy} \bar{y} + r_{yy} y_{n} \\ & = \bar{t} + r_{yy} (y_{n} - \bar{y}) \\ \end{aligned} $$

In this equation, we do not know the value of $$ \bar{t} $$.

However, in the population $$ E[\bar{y}] = \bar{t} $$. Therefore, we substitute $$ \bar{y} $$ as an estimate of $$ \bar{t} $$.

Therefore, finally we can write
$$ \hat{t}_{n} = \bar{y} + r_{yy} (y_{n} - \bar{y}) . $$
(25.13)

Thus, using Eq. (25.13), we can predict the true score from the observed score, and from Eq. (25.11) we can estimate the error in this prediction.
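The sketch below (an added illustration using the same arbitrary simulation settings as before) shows the effect of Eq. (25.13): each observed score is shrunk towards the group mean by the factor r_yy, and the resulting estimates are on average closer to the true scores than the raw observed scores are.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 100_000
t = rng.normal(50, 10, N)                  # true scores
y = t + rng.normal(0, 5, N)                # observed scores

r_yy = 100 / 125                           # reliability implied by this simulation
t_hat = y.mean() + r_yy * (y - y.mean())   # Eq. (25.13)

# Root mean square error of raw scores versus regressed estimates:
print(np.sqrt(np.mean((y - t) ** 2)))      # ~ 5.00
print(np.sqrt(np.mean((t_hat - t) ** 2)))  # ~ 4.47
```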

Derivation of Coefficient α

To derive the calculation of coefficient $$ \alpha $$, we need to bring in the number of items. Therefore, we begin with the relationships among the variances at the item level. From Eq. (3.2) in Chap. 3,
$$ s_{i}^{2} = s_{t}^{2} + s_{e}^{2} , $$
(25.14)
where $$ s_{i}^{2} $$ is the variance of the observed scores of any item, $$ s_{t}^{2} $$ is the variance of true scores and $$ s_{e}^{2} $$ is the error variance relative to an item.
Then
$$ \begin{aligned} \sum\limits_{i = 1}^{I} {s_{i}^{2} } & = \sum\limits_{i = 1}^{I} {(s_{t}^{2} + s_{e}^{2} )} \\ \sum\limits_{i = 1}^{I} {s_{i}^{2} } & = Is_{t}^{2} + Is_{e}^{2} \\ \end{aligned} $$
(25.15)
Therefore, subtracting Eq. (25.15) from Eq. (4.3) in Chap. 4, that is, from $$ s_{y}^{2} = I^{2} s_{t}^{2} + Is_{e}^{2} $$, gives
$$ \begin{aligned} s_{y}^{2} - \sum\limits_{i = 1}^{I} {s_{i}^{2} } & = I^{2} s_{t}^{2} + Is_{e}^{2} - (Is_{t}^{2} + Is_{e}^{2} ) \\ & = I^{2} s_{t}^{2} - Is_{t}^{2} \\ & = I(I - 1)s_{t}^{2} \\ \end{aligned} $$
(25.16)
Therefore, dividing Eq. (25.16) by Eq. (4.3) in Chap. 4 gives
$$ \begin{aligned} \frac{{s_{y}^{2} - \sum\nolimits_{i = 1}^{I} {s_{i}^{2} } }}{{s_{y}^{2} }} & = \frac{{I(I - 1)s_{t}^{2} }}{{I^{2} s_{t}^{2} + Is_{e}^{2} }} \\ & = \frac{{I(I - 1)s_{t}^{2} }}{{I^{2} (s_{t}^{2} + s_{e}^{2} /I)}} \\ & = \frac{{(I - 1)s_{t}^{2} }}{{I(s_{t}^{2} + s_{e}^{2} /I)}} \\ \end{aligned} $$
(25.17)
from which
$$ \begin{aligned} \frac{I}{I - 1}\left( \frac{s_{y}^{2} - \sum\nolimits_{i = 1}^{I} s_{i}^{2} }{s_{y}^{2} } \right) & = \frac{I}{I - 1} \cdot \frac{(I - 1)s_{t}^{2} }{I(s_{t}^{2} + s_{e}^{2} /I)} \\ & = \frac{s_{t}^{2} }{s_{t}^{2} + s_{e}^{2} /I} \\ & = r_{yy} \\ \end{aligned} $$
(25.18)

This is the expression for reliability in Eq. (4.5).

Thus, to calculate an estimate of reliability according to coefficient $$ \alpha $$, we can write
$$ \alpha = \frac{I}{I - 1}\left( {\frac{{s_{y}^{2} - \sum\nolimits_{i = 1}^{I} {s_{i}^{2} } }}{{s_{y}^{2} }}} \right) . $$
(25.19)

The variances of the total scores and of the item scores, $$ s_{y}^{2} $$ and $$ s_{i}^{2} $$, are calculated simply as $$ s_{y}^{2} = \frac{1}{N - 1}\sum\nolimits_{n = 1}^{N} {(y_{n} - \bar{y})^{2} } $$ and $$ s_{i}^{2} = \frac{1}{N - 1}\sum\nolimits_{n = 1}^{N} {(x_{ni} - \bar{x}_{i} )^{2} } , $$ where $$ x_{ni} $$ is the score of person n on item i and $$ N $$ is the number of persons involved.
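Coefficient α in Eq. (25.19) is straightforward to compute from a persons-by-items score matrix. The sketch below is a minimal illustration added here; the small dichotomous response matrix is invented for demonstration.

```python
import numpy as np

def coefficient_alpha(X):
    """Coefficient alpha, Eq. (25.19), from an N x I matrix of item scores."""
    N, I = X.shape
    item_vars = X.var(axis=0, ddof=1)      # s_i^2 for each item
    total_var = X.sum(axis=1).var(ddof=1)  # s_y^2 of the total scores y_n
    return (I / (I - 1)) * (1 - item_vars.sum() / total_var)

# Invented responses of five persons to four dichotomous items:
X = np.array([[1, 1, 1, 0],
              [1, 0, 1, 1],
              [0, 0, 1, 0],
              [1, 1, 1, 1],
              [0, 0, 0, 0]])
print(coefficient_alpha(X))  # ~ 0.79
```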