Statistics Review 10: Bernoulli and Binomial random variables.
We continue with the dichotomous Rasch model and with the context of the assessment of proficiency. In this chapter, we use the set of responses of persons to items to estimate their proficiencies, given the estimates of item difficulties. Where the previous chapter was concerned with item calibration, this chapter is concerned with person measurement.
In theory, by conditioning on the total scores of items, we can estimate the person parameters independently of all item parameters. However, that has only recently been made operational and it is not yet practical. Instead, in estimating the person parameters it is assumed that the item parameters are known. This can be assumed if they have been estimated as described in the previous chapter.
The model of the previous chapters gives the probability $P_{ni}$ that person $n$ answers item $i$ correctly as

$$ P_{ni} = \Pr\{x_{ni} = 1\} = \frac{e^{\beta_n - \delta_i}}{1 + e^{\beta_n - \delta_i}}. \quad (10.1) $$

Make sure you understand the probability of a Bernoulli random variable in Statistics Review 10; it helps in understanding the use and meaning of $P_{ni}$.
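The probability above can be computed directly. The following is a minimal sketch in Python (the function name is ours, not from the text):

```python
import math

def rasch_probability(beta, delta):
    # Probability of a correct response for a person with proficiency
    # beta on an item with difficulty delta (dichotomous Rasch model).
    return math.exp(beta - delta) / (1 + math.exp(beta - delta))
```

When proficiency equals difficulty the probability is 0.5, and it increases towards 1 as $\beta_n - \delta_i$ grows.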
This section could be in a statistics review. However, because we consider that the way the items are formalized, how the probability statements are interpreted, and how the scores on items can be summed are integral to the interpretation of statistical analyses of assessments, we have included it in the main part of the book.
Solution Equations in the Rasch Model
Recall from previous chapters that the total score is a sufficient statistic for its parameter, in this case, the proficiency of the person. Thus, the total person score $r_n = \sum_{i=1}^{I} x_{ni}$ is the sufficient statistic for the estimate of the person proficiency $\beta_n$, where $I$ is the number of items responded to by person $n$. That is, all of the information about $\beta_n$ resides in the total score $r_n$.
In the previous chapters, we used the sufficiency of the total score to show how the person parameter can be eliminated to produce equations for estimating the item difficulties without knowledge of the person proficiencies. We now use the total score $r_n$ for a second purpose, to estimate the proficiency of person $n$, given that we have the estimates of the item difficulties. We can now relate this estimation to Statistics Review 10.
We build the equation for the estimation of the person parameter by analogy. We then write out the formal equation and show how it can be derived using the idea of maximum likelihood introduced in the last chapter.
Table 10.1 Responses of person n to 10 items

| Random variables | $x_{n1}$ | $x_{n2}$ | $x_{n3}$ | $x_{n4}$ | $x_{n5}$ | $x_{n6}$ | $x_{n7}$ | $x_{n8}$ | $x_{n9}$ | $x_{n10}$ | Total score $r_n$ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Value | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 6 |
Now, rather than each response being a replication of the same person answering the same item, each item is different and will have a different difficulty from every other item. As a result, the items are not replications of each other as in the case of replicated Bernoulli variables (Binomial experiment).
Therefore, we need to imagine the mean score of each item in a different way. We take two steps to build up this imagined set-up. First, imagine that each item is given many times to the person, and consider the estimate of the probability that the person answers each item correctly. This would be the mean number of times, over the many replications, that the person answers the item correctly. However, we recognize that it is not reasonable to ask the same person to answer the same item many times. Therefore, second, imagine that it is not the identical item that is administered on more than one occasion, but that there are many different items of exactly the same difficulty that are administered to the person. In this case, the theoretical mean number of correct responses will be the estimate of the probability that the person will answer correctly any one of these items with the same difficulty. The distinction in the second last sentence above between the identity of an item and the difficulty of an item will appear throughout this book.
Table 10.2 Probabilities of responses of a person to 10 items: the proficiency of person n is $\beta_n$ and the difficulties of the items are $\delta_1, \delta_2, \ldots, \delta_{10}$

| Random variables | $x_{n1}$ | $x_{n2}$ | $x_{n3}$ | $x_{n4}$ | $x_{n5}$ | $x_{n6}$ | $x_{n7}$ | $x_{n8}$ | $x_{n9}$ | $x_{n10}$ | Total score $r_n$ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Observed value | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 6 |
| Average | 0.95 | 0.92 | 0.88 | 0.82 | 0.73 | 0.50 | 0.38 | 0.37 | 0.27 | 0.18 | 6.0 |
| Probability | 0.95 | 0.92 | 0.88 | 0.82 | 0.73 | 0.50 | 0.38 | 0.37 | 0.27 | 0.18 | 6.0 |
The sum of the probabilities (theoretical means) of the number of times each item is answered correctly is equal to the number of items that are answered correctly. Thus, the sum of each row in Table 10.2 is 6.
To think about this, it might help if you imagine first that 10 items of exactly the same difficulty were answered by a person. If the probability of being correct on these items is 0.6, then one would expect that the number of times a correct answer would be given is 6.
![$$ 0.6 + 0.6 + 0.6 + 0.6 + 0.6 + 0.6 + 0.6 + 0.6 + 0.6 + 0.6 = 10\left( {0.6} \right) = 6 $$](/epubstore/A/D-Andrich/A-Course-In-Rasch-Measurement-Theory/OEBPS/images/470896_1_En_10_Chapter/470896_1_En_10_Chapter_TeX_Equa.png)
The case in Table 10.2 is analogous, except that every item has a different probability (theoretical mean number) of correct responses. For example, starting from the left, the items are successively more difficult for the person. Nevertheless, the sum of all of these probabilities should equal the number of correct responses, in this case 6.
The Solution Equation for the Estimate of Person Proficiency
The above rationale permits setting up equations to estimate the proficiency of each person, given the estimates of the difficulty for each item.
$$ r_n = \sum_{i=1}^{10} x_{ni} = \sum_{i=1}^{10} P_{ni}, \quad (10.2) $$

where

$$ P_{ni} = \frac{e^{\beta_n - \delta_i}}{1 + e^{\beta_n - \delta_i}}. \quad (10.3) $$

Written out in full,

$$ r_n = \sum_{i=1}^{10} \frac{e^{\beta_n - \delta_i}}{1 + e^{\beta_n - \delta_i}} = \frac{e^{\beta_n - \delta_1}}{1 + e^{\beta_n - \delta_1}} + \frac{e^{\beta_n - \delta_2}}{1 + e^{\beta_n - \delta_2}} + \cdots + \frac{e^{\beta_n - \delta_i}}{1 + e^{\beta_n - \delta_i}} + \cdots + \frac{e^{\beta_n - \delta_{10}}}{1 + e^{\beta_n - \delta_{10}}}. \quad (10.4) $$

In general, for $I$ items,

$$ r_n = \sum_{i=1}^{I} \frac{e^{\beta_n - \delta_i}}{1 + e^{\beta_n - \delta_i}}. \quad (10.5) $$
Thus, given that the difficulties of the
items are known, for example, estimated using the procedures from
the last chapter, the one unknown value in Eq. (10.5) can be
calculated.
Solving the Equation by Iteration
This equation cannot be solved explicitly, and we rely on computers to solve it. The equation is solved by systematic trial and error, that is, iteratively. In particular, we begin with an initial value $\beta_n^{(0)}$, and the probabilities are calculated and summed. If this sum is greater than $r_n$, that indicates that our first estimate of $\beta_n$ is too large and that we should reduce it a little. On the other hand, if it is less than $r_n$, it indicates that our estimate is too small and that we should increase it a little. That is one iteration. The same procedure is continued with the new value, and this is the second iteration. When the sum of the probabilities is close enough to $r_n$ according to some criterion that is set, for example, only 0.001 different from $r_n$, the iterations are stopped and it is said that the iterations have converged on a solution to the chosen criterion of accuracy. The chosen criterion is called the convergence criterion. You do not have to carry out these calculations, but it helps to have an idea of how they are done.
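The procedure just described can be sketched in a few lines of Python. This is an illustration only, not the code of any particular program: the difficulties below are hypothetical, and instead of adjusting the estimate by a fixed "little" amount at each iteration, the sketch uses a Newton-Raphson step, which adjusts by an amount proportional to the discrepancy:

```python
import math

def expected_score(beta, difficulties):
    # Sum of the model probabilities: the expected total score of Eq. (10.5).
    return sum(math.exp(beta - d) / (1 + math.exp(beta - d))
               for d in difficulties)

def estimate_proficiency(r, difficulties, tol=0.001, max_iter=50):
    # Adjust beta until the expected score is within tol of the
    # observed total score r.  The derivative of the expected score
    # with respect to beta is the sum of P(1 - P) over items, so a
    # Newton-Raphson step divides the discrepancy by that slope.
    beta = 0.0  # starting value
    for _ in range(max_iter):
        s = expected_score(beta, difficulties)
        if abs(s - r) < tol:
            break
        slope = 0.0
        for d in difficulties:
            p = math.exp(beta - d) / (1 + math.exp(beta - d))
            slope += p * (1 - p)
        beta += (r - s) / slope  # sum too small -> increase beta
    return beta
```

With ten hypothetical difficulties spread, say, from -2.5 to 2.5, `estimate_proficiency(6, difficulties)` returns the value of beta whose expected score is 6 to within the convergence criterion.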
For example, in the above case of 10 items, suppose we knew the items to have the difficulties shown in Table 10.2, and that we know the person's total score was 6, as above. Our first estimate for the proficiency might be $\beta_n^{(0)} = 0.25$ (based on experience).
$$ \begin{aligned} \sum_{i=1}^{10} \frac{e^{\beta_n - \delta_i}}{1 + e^{\beta_n - \delta_i}} &= \frac{e^{\beta_n^{(0)} - \delta_1}}{1 + e^{\beta_n^{(0)} - \delta_1}} + \frac{e^{\beta_n^{(0)} - \delta_2}}{1 + e^{\beta_n^{(0)} - \delta_2}} + \cdots + \frac{e^{\beta_n^{(0)} - \delta_{10}}}{1 + e^{\beta_n^{(0)} - \delta_{10}}} \\ &= \frac{e^{0.25 + 2.5}}{1 + e^{0.25 + 2.5}} + \frac{e^{0.25 + 2.0}}{1 + e^{0.25 + 2.0}} + \cdots + \frac{e^{0.25 - 2.0}}{1 + e^{0.25 - 2.0}} \\ &= 0.94 + 0.90 + 0.85 + 0.78 + 0.68 + 0.44 + 0.32 + 0.31 + 0.22 + 0.15 = 5.59. \end{aligned} $$
This means that with a proficiency of $\beta_n^{(0)} = 0.25$, the person would be expected to obtain a score of 5.59. However, the person has a score of 6.0; therefore, the proficiency estimate should be a little greater.
Suppose the second estimate is $\beta_n^{(1)} = 0.40$. Then
$$ \begin{aligned} \sum_{i=1}^{10} \frac{e^{\beta_n - \delta_i}}{1 + e^{\beta_n - \delta_i}} &= \frac{e^{\beta_n^{(1)} - \delta_1}}{1 + e^{\beta_n^{(1)} - \delta_1}} + \frac{e^{\beta_n^{(1)} - \delta_2}}{1 + e^{\beta_n^{(1)} - \delta_2}} + \cdots + \frac{e^{\beta_n^{(1)} - \delta_{10}}}{1 + e^{\beta_n^{(1)} - \delta_{10}}} \\ &= \frac{e^{0.40 + 2.5}}{1 + e^{0.40 + 2.5}} + \frac{e^{0.40 + 2.0}}{1 + e^{0.40 + 2.0}} + \cdots + \frac{e^{0.40 - 2.0}}{1 + e^{0.40 - 2.0}} \\ &= 0.95 + 0.92 + 0.87 + 0.80 + 0.71 + 0.48 + 0.35 + 0.34 + 0.25 + 0.17 = 5.84. \end{aligned} $$
The value of $\beta_n$ must be a little greater than 0.40, and so we could try 0.45. By this successive process, we would reach the estimate $\hat{\beta}_n$ correct to two decimal places.
It sometimes happens that the process does not converge to a solution. However, this is rare in the Rasch model and if it occurs, there are mechanisms to make the algorithm a little more sophisticated and to obtain convergence. Most computer programs have this sophistication built into them. If the Rasch model equation really does not converge, then this is a property of the data and not the model. Again, this is rare, but it is possible. Fischer (1981) describes such a case.
Initial Estimates
A good initial value for the proficiency $\beta_n$ of person $n$ can be obtained by ignoring the differences among the item difficulties, that is, by setting all $\delta_i = 0$ in Eq. (10.5). Then

$$ r_n = \sum_{i=1}^{I} \frac{e^{\beta_n}}{1 + e^{\beta_n}} = I \frac{e^{\beta_n}}{1 + e^{\beta_n}}, \quad \text{so that} \quad \frac{r_n}{I} = \frac{e^{\beta_n}}{1 + e^{\beta_n}}, \quad (10.6) $$

and

$$ 1 - \frac{r_n}{I} = 1 - \frac{e^{\beta_n}}{1 + e^{\beta_n}}, \quad \text{i.e.} \quad \frac{I - r_n}{I} = \frac{1}{1 + e^{\beta_n}}. $$

Inverting,

$$ \frac{I}{I - r_n} = 1 + e^{\beta_n}, \quad (10.7) $$

so that $e^{\beta_n} = r_n/(I - r_n)$ and the initial estimate is $\beta_n^{(0)} = \ln\{r_n/(I - r_n)\}$, the logarithm of the odds of a correct response.
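The initial value is thus simply the log-odds of the person's score. As a sketch (the function name is ours):

```python
import math

def initial_estimate(r, I):
    # Initial value from setting all item difficulties to zero:
    # r/I = e^beta / (1 + e^beta), hence beta = ln(r / (I - r)).
    # Valid only for 0 < r < I; zero and perfect scores have no
    # finite estimate.
    return math.log(r / (I - r))
```

For example, a score of 6 out of 10 gives ln(6/4), about 0.41, close to the solution reached iteratively above.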
Proficiency Estimates for Each Person
Table 10.3 Proficiency estimates for the dichotomous items of Table 5.3; persons are ordered by proficiency and items by difficulty

| Person | Responses | Total score $r_n$ | Location | SE |
|---|---|---|---|---|
| 38 | 101101001010000000 | 6 | −0.920 | 0.550 |
| 2 | 101101110100000000 | 7 | −0.625 | 0.537 |
| 40 | 010111110000101000 | 8 | −0.340 | 0.531 |
| 42 | 110011111110010000 | 10 | 0.227 | 0.538 |
| 41 | 111101111101000000 | 10 | 0.227 | 0.538 |
| 44 | 111101111111000000 | 11 | 0.523 | 0.551 |
| 8 | 111110110101110000 | 11 | 0.523 | 0.551 |
| 35 | 111111011101100000 | 11 | 0.523 | 0.551 |
| 11 | 101111111110011000 | 12 | 0.837 | 0.572 |
| 9 | 110111111011011000 | 12 | 0.837 | 0.572 |
| 46 | 111011011011011010 | 12 | 0.837 | 0.572 |
| 29 | 111101111011110000 | 12 | 0.837 | 0.572 |
| 25 | 111110101111110000 | 12 | 0.837 | 0.572 |
| 27 | 011101111101100111 | 13 | 1.181 | 0.602 |
| 18 | 110111110111101001 | 13 | 1.181 | 0.602 |
| 36 | 111011110111111000 | 13 | 1.181 | 0.602 |
| 37 | 111101111111101000 | 13 | 1.181 | 0.602 |
| 20 | 111110011111101100 | 13 | 1.181 | 0.602 |
| 48 | 111110101111111000 | 13 | 1.181 | 0.602 |
| 13 | 111111011011110100 | 13 | 1.181 | 0.602 |
| 34 | 111111011100111100 | 13 | 1.181 | 0.602 |
| 32 | 111111101011111000 | 13 | 1.181 | 0.602 |
| 22 | 111111101101101001 | 13 | 1.181 | 0.602 |
| 43 | 111111111101110000 | 13 | 1.181 | 0.602 |
| 14 | 111110101111011110 | 14 | 1.570 | 0.647 |
| 12 | 111111100101011111 | 14 | 1.570 | 0.647 |
| 15 | 111111100110111110 | 14 | 1.570 | 0.647 |
| 21 | 111111111000111110 | 14 | 1.570 | 0.647 |
| 5 | 111111111010011110 | 14 | 1.570 | 0.647 |
| 4 | 111111111110011100 | 14 | 1.570 | 0.647 |
| 16 | 111111111110111000 | 14 | 1.570 | 0.647 |
| 45 | 111111111111000011 | 14 | 1.570 | 0.647 |
| 17 | 111111111111100100 | 14 | 1.570 | 0.647 |
| 7 | 111111110111111100 | 15 | 2.030 | 0.714 |
| 50 | 111111111110011110 | 15 | 2.030 | 0.714 |
| 49 | 111111111110111100 | 15 | 2.030 | 0.714 |
| 23 | 111111111111110001 | 15 | 2.030 | 0.714 |
| 6 | 111111111111111000 | 15 | 2.030 | 0.714 |
| 24 | 111111111111111000 | 15 | 2.030 | 0.714 |
| 33 | 111110111111111101 | 16 | 2.615 | 0.826 |
| 26 | 111111011111111101 | 16 | 2.615 | 0.826 |
| 10 | 111111111110110111 | 16 | 2.615 | 0.826 |
| 31 | 111111111111101110 | 16 | 2.615 | 0.826 |
| 19 | 111111111111111010 | 16 | 2.615 | 0.826 |
| 30 | 101111111111111111 | 17 | 3.481 | 1.081 |
| 28 | 111011111111111111 | 17 | 3.481 | 1.081 |
| 1 | 111111111111101111 | 17 | 3.481 | 1.081 |
| 39 | 111111111111111101 | 17 | 3.481 | 1.081 |
| 47 | 111111111111111101 | 17 | 3.481 | 1.081 |
| 3 | 111111111111111111 | 18 | +∞ (4.762) | ∞ (1.658) |
For Responses to the Same Items, the Same Total Score Leads to the Same Person Estimate
Where students respond to the same items, then irrespective of the pattern of responses, the same total score leads to the same proficiency estimate. This is evident from Eq. (10.5), in which there is no information about the actual responses; only the total score is used. It is a manifestation of the sufficiency of the total score for the person parameter $\beta_n$.
Estimate for a Score of 0 or Maximum Score
For a person with the maximum total score of 18, the proficiency estimate is infinite (+∞). This is because the person's proficiency is above the limit of the difficulty of the test, and the probability of a correct response must be 1.00 for all the items. It is as if an adult stood on a weighing machine for babies, and the indicator hit the top of the available scale, in which case the person's weight is unknown. Clearly, the person is beyond the limit measurable by the particular machine. In that case, it would be necessary to use a weighing machine that measures greater weights. In the example with items, the person would need to take more difficult items to establish a finite estimate of proficiency. Thus, the person is not thought to actually have infinite proficiency; it is just that a finite estimate cannot be obtained from these particular items. Likewise, if a person answered all the items incorrectly, then the person would have a proficiency estimate of $-\infty$. In order to get a finite estimate of proficiency for such a person, easier items should be used.
Sometimes groups of people need to be compared, say for relative improvement or for baseline data. For example, we may have responses from boys and girls and we might have assessed them on some proficiency before some program of teaching is in place. If different numbers of boys and girls obtain a 0 score or maximum score, it would bias the comparison of the boys and girls if they were left out. Although they are left out in the item calibration, they cannot be left out of the group comparisons. Therefore, there is a need to provide an estimate for a person with a maximum score (or minimum score of 0). Different methods have been devised for this purpose, and they all involve some extra assumption or reasoning that goes beyond the model itself. Some methods make it explicit that the person with a maximum or minimum score belongs to the population of persons. In RUMM2030 (Andrich, Sheridan, & Luo, 2018), a value is extrapolated by observing that the differences between successive values at the extremes increase. Thus, in this example, the successive differences between the estimates for scores of 15 and 14, 16 and 15, and 17 and 16 are 0.46, 0.59 and 0.87, showing successive increases. The procedure in RUMM2030 specifically uses the geometric mean of the differences of the three scores before the maximum score. As a result, the extrapolated value for a score of 18 is 4.762. The same principle is used to extrapolate the value for a score of 0, shown later for this example.
The Standard Error of Measurement of a Person
There is an extra column in Table 10.3 giving the standard error of measurement for each person. We do not derive this equation in this book, but it also arises directly from maximum likelihood theory.
$$ \sigma_{\hat{\beta}} = \frac{1}{\sqrt{\sum_{i=1}^{I} P_{ni}(1 - P_{ni})}} \quad (10.9) $$
Unlike CTT, the standard error is not the same for all persons. Compare Eq. (10.9) with Eq. (3.5) of Chap. 3, where the CTT standard error is a function of test reliability and variance. In the dichotomous RM the standard error of measurement depends on the total score; if a person has answered few or very many items correctly, then the standard error is greater than if the person has answered a moderate number of items correctly. If a person has answered all the items correctly, then the standard error is infinitely large. This is consistent with not having a finite estimate of the person's proficiency. Again, RUMM2030 provides a value using Eq. (10.9) for the extrapolated value. As expected, it is large, larger than the standard error for the score one less than the maximum.
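Eq. (10.9) translates directly into code. The following sketch (function name ours) makes clear why the error is smallest for a well-targeted person: each term $P(1 - P)$ is largest when $P = 0.5$:

```python
import math

def standard_error(beta, difficulties):
    # Eq. (10.9): the standard error of a person estimate is the
    # reciprocal square root of the sum of P(1 - P) over the items.
    info = 0.0
    for d in difficulties:
        p = math.exp(beta - d) / (1 + math.exp(beta - d))
        info += p * (1 - p)
    return 1.0 / math.sqrt(info)
```

As beta moves towards the extremes, every probability approaches 0 or 1, the sum shrinks, and the standard error grows, consistent with the pattern of the SE column in Table 10.3.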
Proficiency Estimate for Each Total Score When All Persons Respond to the Same Items
Table 10.4 Total scores, frequencies, proficiency estimates and standard errors

| Raw score | Frequency | Location (MLE) | Std error |
|---|---|---|---|
| 0 | 0 | −∞ (−4.543) | ∞ (1.679) |
| 1 | 0 | −3.316 | 1.050 |
| 2 | 0 | −2.515 | 0.784 |
| 3 | 0 | −1.993 | 0.672 |
| 4 | 0 | −1.584 | 0.611 |
| 5 | 0 | −1.235 | 0.573 |
| 6 | 1 | −0.920 | 0.550 |
| 7 | 1 | −0.625 | 0.537 |
| 8 | 1 | −0.340 | 0.531 |
| 9 | 0 | −0.059 | 0.531 |
| 10 | 2 | 0.227 | 0.538 |
| 11 | 3 | 0.523 | 0.551 |
| 12 | 5 | 0.837 | 0.572 |
| 13 | 11 | 1.181 | 0.602 |
| 14 | 9 | 1.570 | 0.647 |
| 15 | 6 | 2.030 | 0.714 |
| 16 | 5 | 2.615 | 0.826 |
| 17 | 5 | 3.481 | 1.081 |
| 18 | 1 | +∞ (4.762) | ∞ (1.658) |
Two special features of Table 10.4 are noted. First, total scores with zero frequency (e.g. a score of 9) have proficiency estimates, and second, the transformation from a total score to an estimate is non-linear.
Estimates for Every Total Score
There are some scores in Table 10.4 which no one has achieved. For example, there is no one with a score of 1, 2, 3, 4, 5 and 9. Nevertheless, there is a proficiency estimate associated with these scores. This is because, given the difficulty of the items, the proficiency for each total score can be estimated from Eq. (10.5). Likewise, the standard error for these scores can be estimated from Eq. (10.9).
For example, for a total score of 9, the estimate is the solution of

$$ 9 = \sum_{i=1}^{18} \frac{e^{\beta_n - \delta_i}}{1 + e^{\beta_n - \delta_i}}. $$
Non-linear Transformation from Raw Score to Person Estimate
Although the distance between all successive total scores is 1, the distances between the proficiency estimates for successive total scores are different. For example, the proficiency difference between scores of 4 and 5 is $-1.584 - (-1.235) = -0.349$, while the difference between scores of 10 and 11 is $0.227 - 0.523 = -0.296$. These differences reflect the non-linear transformation of the raw scores to the estimates. This non-linear transformation is an attempt to undo the implicit effects of the constrained, finite, minimum and maximum scores. When there are many maximum scores because there are not enough items of greater difficulty, and when there are many scores of 0 because there are not enough easier items, it is said that there is a ceiling effect and a floor effect, respectively.
Fig. 10.1 Non-linear transformation of the total score to an estimate
Fig. 10.2 Standard errors as a function of the estimates
Displaying Person and Item Estimates on the Same Continuum
Fig. 10.3 Item and person estimates on the latent continuum
CTT Reliability Calculated from Rasch Person Parameter Estimates
The calculation of a reliability index has not been very common in modern test theory. However, it is possible to construct an index of reliability which is analogous in calculation and interpretation, and generally in value, using Rasch measurement theory. We demonstrate its construction first and then comment on its interpretation.
Derivation of $r_\beta$
Given the estimates of proficiency and the standard error of these estimates, it is possible to calculate a reliability index in a simple way. Each estimate $\hat{\beta}_n$ can be written as the sum of the person's location and an error,

$$ \hat{\beta}_n = \beta_n + \varepsilon_n. \quad (10.11) $$

Analogous to the observed score variance $s_x^2$ in CTT, the variance of the estimates, $\hat{\sigma}_{\hat{\beta}}^{\,2}$, is

$$ \hat{\sigma}_{\hat{\beta}}^{\,2} = \frac{\sum_{n=1}^{N} \left( \hat{\beta}_n - \bar{\hat{\beta}} \right)^2}{N - 1}, \quad (10.12) $$

where $\bar{\hat{\beta}}$ is the mean of the estimates. The average error variance, $\hat{\sigma}_{\hat{\varepsilon}}^{\,2}$, is

$$ \hat{\sigma}_{\hat{\varepsilon}}^{\,2} = \frac{\sum_{n=1}^{N} \hat{\sigma}_n^2}{N}. \quad (10.13) $$
The key feature of reliability in CTT is that it indicates the degree to which there is systematic variance among the persons relative to the error variance—it is the ratio of the estimated true variance relative to the true variance plus the error variance. In CTT , the reliability index can give the impression that it is a property of the test , when it is a property of the persons as identified by the test . The same test administered to people of a similar class of persons, but with a smaller true variance would be shown to have a lower reliability . Thus, the index needs to be interpreted with the distribution of the persons in mind.
The index $r_\beta$ is then defined as

$$ r_\beta = \frac{\hat{\sigma}_\beta^{\,2} - \hat{\sigma}_\varepsilon^{\,2}}{\hat{\sigma}_\beta^{\,2}}. \quad (10.14) $$
In the case of person/item distributions that are standard and to which the CTT reliability is correctly applied, the values of coefficient $\alpha$ and those obtained from Eq. (10.14) are very similar (Andrich, 1982). However, in cases where coefficient $\alpha$ should not really be interpreted, the values might differ. This situation can occur when there is an artificially skewed distribution of scores in which there are floor or ceiling effects in the responses. Then the assumption that the sum of the item scores is effectively unbounded is grossly violated, and coefficient $\alpha$ becomes inflated. It is inflated effectively because the scores of each person on the items may be more similar than they would be if there were no floor or ceiling effect. On the other hand, the error in the Rasch model is larger at the low and high scores, and therefore $\alpha$ will be larger than $r_\beta$. In cases where every person responds to every item, both can be calculated and compared. If they are very different, then the person/item distribution should be reexamined before either is interpreted. In any case, the interpretation of this index requires an examination of the person/item distribution, and if there are floor and ceiling effects these should be noted. There are of course other factors that can affect the interpretation of this index; this is just one of them. Floor and ceiling effects will generate different values of $r_\beta$ compared to $\alpha$.
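The index of Eq. (10.14) is straightforward to compute from the person estimates and their standard errors. A minimal sketch (function name ours):

```python
def rasch_reliability(estimates, standard_errors):
    # Eq. (10.12): variance of the proficiency estimates.
    n = len(estimates)
    mean = sum(estimates) / n
    var_est = sum((b - mean) ** 2 for b in estimates) / (n - 1)
    # Eq. (10.13): average error variance (squared standard errors).
    var_err = sum(se ** 2 for se in standard_errors) / n
    # Eq. (10.14): proportion of estimate variance that is not error.
    return (var_est - var_err) / var_est
```

As the sketch makes plain, the index depends on the spread of the persons as much as on the precision of the test: the same standard errors with a smaller variance of estimates give a lower value.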
Example 2
For the data including graded responses that are analyzed in Part III of this book, $\hat{\sigma}_{\hat{\beta}}^{\,2}$ and $\hat{\sigma}_{\hat{\varepsilon}}^{\,2}$ were calculated, and therefore $r_\beta$ follows from Eq. (10.14).
These are very moderate values and are explained by the fact that the test was a little easy and persons were grouped at the top end of the range, and that the test is short. That this index is smaller when the data are grouped indicates that the responses within the items that are combined have some dependencies, and that the dichotomous data gave an artificially high reliability.
In addition to providing the same kind of information as the index $\alpha$, this index is readily calculated if there are missing data, without any extra assumptions needing to be made. Missing data can occur either with some people missing some items at random, or when there are structural missing data, for example, when different groups of persons are not all given the same items. This case is considered in the chapter where the linking of tests with common items is discussed.
In addition, as is considered in Part II of this book, the index $r_\beta$ is relevant to the power to detect misfit of the responses to the Rasch model.
Principle of Maximum Likelihood
We now take the opportunity to show more explicitly the idea of maximum likelihood, which is central to estimation in the Rasch model and statistics in general. Again, some of this material could be a statistics review, but because it is central to the Rasch model we have retained it in the main section of the book.
The probabilities of the two possible responses are

$$ \Pr\{x_{ni} = 1\} = \frac{e^{\beta_n - \delta_i}}{1 + e^{\beta_n - \delta_i}}; \quad \Pr\{x_{ni} = 0\} = \frac{1}{1 + e^{\beta_n - \delta_i}}, \quad (10.15) $$

which can be written as the single expression

$$ \Pr\{x_{ni}\} = \frac{e^{x_{ni}(\beta_n - \delta_i)}}{1 + e^{\beta_n - \delta_i}}. \quad (10.16) $$

Then the probability of the first person's whole response pattern across the 18 items is the product

$$ \Pr\{(x_{1i})\} = \prod_{i=1}^{18} \frac{e^{x_{1i}(\beta_1 - \delta_i)}}{1 + e^{\beta_1 - \delta_i}} = \frac{e^{1(\beta_1 - \delta_1)}}{1 + e^{\beta_1 - \delta_1}} \frac{e^{0(\beta_1 - \delta_2)}}{1 + e^{\beta_1 - \delta_2}} \frac{e^{1(\beta_1 - \delta_3)}}{1 + e^{\beta_1 - \delta_3}} \ldots \frac{e^{0(\beta_1 - \delta_{18})}}{1 + e^{\beta_1 - \delta_{18}}}. \quad (10.17) $$
This equation can be simplified by summing the exponents of the numerators, and by multiplying the denominators, which take exactly the same form in every term.
$$ \Pr\{(x_{1i})\} = \prod_{i=1}^{18} \frac{e^{x_{1i}(\beta_1 - \delta_i)}}{1 + e^{\beta_1 - \delta_i}} = \frac{e^{6\beta_1 - \sum_{i=1}^{18} x_{1i}\delta_i}}{\prod_{i=1}^{18} (1 + e^{\beta_1 - \delta_i})}. \quad (10.18) $$
Notice that the coefficient of the person proficiency in the numerator is the person's total score, the sufficient statistic. The other term in the numerator is simply the sum of the parameters of the items that the person has answered correctly. We will see that this term plays no role in the final equation.
$$ \begin{aligned} L = \Pr\{(x_{ni})\} &= \prod_{n=1}^{N} \prod_{i=1}^{I} \frac{e^{x_{ni}(\beta_n - \delta_i)}}{1 + e^{\beta_n - \delta_i}} \\ &= \frac{e^{1(\beta_1 - \delta_1)}}{1 + e^{\beta_1 - \delta_1}} \frac{e^{0(\beta_1 - \delta_2)}}{1 + e^{\beta_1 - \delta_2}} \frac{e^{1(\beta_1 - \delta_3)}}{1 + e^{\beta_1 - \delta_3}} \cdots \frac{e^{0(\beta_1 - \delta_{18})}}{1 + e^{\beta_1 - \delta_{18}}} \cdots \frac{e^{1(\beta_{49} - \delta_1)}}{1 + e^{\beta_{49} - \delta_1}} \frac{e^{1(\beta_{49} - \delta_2)}}{1 + e^{\beta_{49} - \delta_2}} \frac{e^{1(\beta_{49} - \delta_3)}}{1 + e^{\beta_{49} - \delta_3}} \cdots \frac{e^{1(\beta_{49} - \delta_{18})}}{1 + e^{\beta_{49} - \delta_{18}}} \\ &= e^{6\beta_1 + \cdots + 17\beta_{49} - \sum_{n=1}^{N} \sum_{i=1}^{I} x_{ni}\delta_i} \prod_{n=1}^{N} \prod_{i=1}^{I} \frac{1}{1 + e^{\beta_n - \delta_i}} \end{aligned} \quad (10.19) $$
The last person (Person 3) is not included in Eq. (10.19) because that person has the maximum score of 18 and the person’s theoretical estimate is +∞. This person’s estimate is extrapolated, not estimated.
L in front of this equation stands for the Likelihood of the responses, which is the joint probability of the matrix of all responses across persons and items.
$$ \ln L = 6\beta_1 + \cdots + 17\beta_{49} - \sum_{n=1}^{N} \sum_{i=1}^{I} x_{ni}\delta_i - \sum_{n=1}^{N} \sum_{i=1}^{I} \ln\left( 1 + e^{\beta_n - \delta_i} \right) \quad (10.20) $$
Now the task is to find the value of $\beta_n$ for each person that gives the maximum value of Eq. (10.20), which is the same value that maximizes the likelihood of Eq. (10.19). For example, we could try different values as we did above in obtaining a person's estimate. To obtain the equation for the maximum value requires calculus. It involves differentiating Eq. (10.20) successively with respect to each person's parameter $\beta_n$. It turns out that this equation is exactly Eq. (10.5), which we used above:
$$ r_n = \sum_{i=1}^{I} \frac{e^{\beta_n - \delta_i}}{1 + e^{\beta_n - \delta_i}}. \quad (10.21) $$
As indicated in Chap. 9, maximum likelihood estimation is analogous to, but a different principle from, that used in regression, described in the statistics reviews. In regression, the criterion is that the parameter estimates of the model are such that the residuals between the model and the data are minimized. In maximum likelihood, the parameter estimates of the model are such that the likelihood of the data is a maximum. Often, but not always, minimizing residuals and maximizing the likelihood give the same estimates.
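The principle can be checked numerically: the terms of Eq. (10.20) for a single person define that person's log-likelihood, and it is largest at the proficiency that solves Eq. (10.5). A sketch with made-up responses and difficulties (all names and values ours):

```python
import math

def person_log_likelihood(beta, responses, difficulties):
    # One person's terms of Eq. (10.20):
    # r*beta - sum(x_i * delta_i) - sum(ln(1 + e^(beta - delta_i))).
    r = sum(responses)
    return (r * beta
            - sum(x * d for x, d in zip(responses, difficulties))
            - sum(math.log(1 + math.exp(beta - d)) for d in difficulties))
```

Scanning beta over a grid shows a single maximum, located exactly where the sum of the model probabilities equals the observed total score.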
Bias in the Estimate
The estimates of person parameters in the Rasch model are biased in the sense that with a fixed number of items, the person parameters at extremes are a little more extreme than they should be. The probabilities that are estimated are not biased, but the non-linear relationship between the person parameters and the probabilities creates a bias. This bias tends to 0 as the number of items is increased, and tends to 0 more quickly if the person and item distributions are well aligned. Various software packages for the person estimation have modifications to the maximum likelihood estimates which shrink the extreme values; RUMM2030 is one of these.