This chapter revises and goes beyond Chap. 20 in understanding the polytomous Rasch model. It revises the model for more than two ordered categories as a direct generalization of Rasch’s simple logistic model for dichotomous responses. Examples of such a response format include the familiar Likert-style attitude or personality items in questionnaires, as well as some partial credit structures in assessing proficiency.
Key features of the model are that (i) the successive categories are scored with successive integers, as is done in Classical Test Theory (CTT); (ii) nevertheless, distances between the thresholds which define the categories are estimated, and no assumptions about equal sizes of the categories need be made; (iii) the model retains the distinctive properties of Rasch’s model in terms of the separation of person and item parameters; (iv) the threshold estimates do not have to be in the natural order. The ordering of threshold estimates is a property of the data, and if the estimates are not in their natural order, this shows that the categories are not working as required.
Andrich (1978) gives the original derivation of the model, in which the scoring of the categories by integers and the interpretation of the more general parameters derived by Rasch and Andersen were clarified. Andrich (2010a, b) provides recent summaries of the model.
Rated Data in the Social Sciences
- (a) According to Dawes (1972), 60% of studies have only rated dependent variables. Since then, because of an increase in performance assessment and applications in health outcomes, this proportion is likely to have increased.
- (b) Rating is often used in place of measurement, but the measurement analogy is retained, e.g. grading of essays as in the structure below.
- (c) Often more than one sub-criterion is used, though in the end the scores on these criteria are aggregated, e.g. for the grading of essays:
| Criterion | Rating | Rating for person n |
|---|---|---|
| (i) Organization | 0–3 | |
| (ii) Content | 0–4 | |
| (iii) Grammar | 0–5 | |
| Total rating | | |
Successive categories are scored with successive integers, irrespective of threshold distances between categories. Note that different criteria can have different numbers of categories.
The Partial Credit and Rating Scale Specifications
Often when the number of categories is the same for all items, and the format for all items is the same, it is possible to estimate only one set of thresholds for all items. In that case, the model has been called the rating scale model. When the items have different numbers of categories, or where all the items have the same number of categories but the thresholds are estimated for each of the items, the model has been called the partial credit model (Masters, 2016). However, these are modifications only to the number of parameters estimated—the response structure for the response of a person to an item is identical. Therefore, unless specialized, we refer to the model as the Polytomous Rasch Model (PRM), and refer to the former as the rating scale parameterization and the latter as the partial credit parameterization. We write this difference formally in a subsequent section.
The Generalization to Three Ordered Categories
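For a person $n$ with location $\beta_n$ responding to an item $i$ with difficulty $\delta_i$ and thresholds $\tau_{1i}$ and $\tau_{2i}$, the probabilities of the three ordered responses are
$$\Pr\{X_{ni}=0\} = \frac{1}{\gamma_{ni}}, \quad \Pr\{X_{ni}=1\} = \frac{\exp\left[-\tau_{1i} + (\beta_n-\delta_i)\right]}{\gamma_{ni}}, \quad \Pr\{X_{ni}=2\} = \frac{\exp\left[-\tau_{1i}-\tau_{2i} + 2(\beta_n-\delta_i)\right]}{\gamma_{ni}},$$
where $\gamma_{ni} = 1 + \exp\left[-\tau_{1i} + (\beta_n-\delta_i)\right] + \exp\left[-\tau_{1i}-\tau_{2i} + 2(\beta_n-\delta_i)\right]$.
Notice that the denominator is still the sum of all the terms of the numerator.
The three probabilities can be written as a single expression:
$$\Pr\{X_{ni}=x\} = \frac{\exp\left[-\sum_{k=0}^{x}\tau_{ki} + x(\beta_n-\delta_i)\right]}{\gamma_{ni}}, \quad x \in \{0, 1, 2\},$$
where $\tau_{0i} \equiv 0$ and $\boldsymbol{\tau}_i = (\tau_{1i}, \tau_{2i})$ is the vector of thresholds of the item.
Notice that in the above formulation we make the thresholds sum to zero: $\tau_{1i} + \tau_{2i} = 0$.
This means that the parameter $\delta_i$ is the difficulty of the item and that the thresholds $\tau_{1i}$ and $\tau_{2i}$ are located around the item, with the item’s difficulty in the middle of the thresholds.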
Note that the successive categories are scored by successive integers beginning with 0, where 0, 1, 2 is just an extension of the 0, 1 scoring for two ordered categories. In addition, even though the successive categories are scored with successive integers, the thresholds are estimated. They do not have to be equidistant.
In early discussions of more formal models for ordered categories there was a belief that integer scoring required equal distances. You can read in the literature the reason the integer scoring appears, but it is not because the distances between categories are somehow equal. In any case, with three categories, there are just two thresholds and there is only one distance as such between them, not three.
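The three-category probabilities can be illustrated numerically. The following is a minimal sketch, assuming the standard form of the model in which the numerator for score x is the exponential of x times the person-item difference minus the sum of the thresholds up to x. The function name `prm_three_category_probs` and the parameter values are hypothetical and chosen only for illustration.

```python
import math


def prm_three_category_probs(beta, delta, tau1, tau2):
    """Probabilities of scores 0, 1, 2 for one person and one item.

    beta: person location; delta: item difficulty;
    tau1, tau2: thresholds expressed relative to delta (illustrative values).
    """
    # Numerator for score x: exp(-(sum of thresholds up to x) + x * (beta - delta)).
    numerators = [
        1.0,
        math.exp(-tau1 + (beta - delta)),
        math.exp(-tau1 - tau2 + 2.0 * (beta - delta)),
    ]
    # The denominator is the sum of all the numerators.
    gamma = sum(numerators)
    return [numerator / gamma for numerator in numerators]


# Hypothetical person and item; the thresholds sum to zero.
probs = prm_three_category_probs(beta=0.5, delta=0.0, tau1=-1.2, tau2=1.2)
print([round(p, 3) for p in probs])
```

The scores 0, 1, 2 enter only as multipliers of the person-item difference. The threshold values are free inputs, so nothing in the calculation requires them to be equally spaced.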
The Expected Value Curve
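We saw in Statistics Review 14 that, for the dichotomous Bernoulli variable, the expected value E[X] and the probability of a response of 1 are the same: $E[X] = \Pr\{X=1\}$. With more than two categories, the expected value is the sum of the scores weighted by their probabilities, $E[X] = \sum_{x=0}^{m} x \Pr\{X=x\}$, where $m$ is the maximum score and
- $\Pr\{X=x\}$ is the probability of a response with score x.
We notice that there is a slope parameter associated with this item. It is the rate of change of the expected value at the location of the item, $\delta_i$. The slope is considered further later in the chapter.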
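The expected value can be computed directly from the category probabilities. The sketch below assumes the standard form of the PRM with thresholds expressed relative to the item difficulty. The helper names `prm_probs` and `expected_score` and all numerical values are hypothetical.

```python
import math


def prm_probs(beta, delta, taus):
    """Category probabilities for scores 0..m, given thresholds relative to delta."""
    numerators = [1.0]
    cumulative_tau = 0.0
    for x, tau in enumerate(taus, start=1):
        cumulative_tau += tau
        numerators.append(math.exp(-cumulative_tau + x * (beta - delta)))
    gamma = sum(numerators)
    return [numerator / gamma for numerator in numerators]


def expected_score(beta, delta, taus):
    """E[X]: each score x weighted by its probability."""
    return sum(x * p for x, p in enumerate(prm_probs(beta, delta, taus)))


# Trace a few points on the expected value curve of a hypothetical item.
for beta in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(beta, round(expected_score(beta, 0.0, [-1.0, 1.0]), 3))
```

With a single threshold set to zero, `expected_score` reduces to the probability of a response of 1 in the dichotomous model, which matches the Bernoulli case described above.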
The Structure of the PRM
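The structure of the PRM, and its relation to the dichotomous Rasch model at the thresholds, can be seen by forming the probability of a response in the higher of two adjacent categories, given that the response is in one of these two categories. For categories 1 and 2 of an item with three ordered categories, person location $\beta_n$, item difficulty $\delta_i$ and second threshold $\tau_{2i}$,
$$\Pr\{X_{ni}=2 \mid X_{ni}=1 \text{ or } X_{ni}=2\} = \frac{\Pr\{X_{ni}=2\}}{\Pr\{X_{ni}=1\} + \Pr\{X_{ni}=2\}} = \frac{\exp(\beta_n - \delta_{2i})}{1 + \exp(\beta_n - \delta_{2i})},$$
where $\delta_{2i} = \delta_i + \tau_{2i}$. We can see that this is just the dichotomous Rasch model with the difficulty of the item in the dichotomous model replaced by the difficulty of threshold 2, where the difficulty of the threshold relative to the item’s difficulty $\delta_i$, namely $\tau_{2i}$, is added to this item’s difficulty.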
Thus, the structure of the polytomous item is that the thresholds are simply characterized by the dichotomous Rasch model. These curves are parallel as in the usual case of dichotomous items which form a single set of items.
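This relation between adjacent categories and the dichotomous model can be checked numerically. The sketch below assumes the standard PRM form with thresholds relative to the item difficulty. It compares the conditional probability of the higher of two adjacent categories with a dichotomous Rasch probability whose difficulty is the item difficulty plus the threshold. The function names and parameter values are hypothetical.

```python
import math


def prm_probs(beta, delta, taus):
    """Category probabilities for scores 0..m, given thresholds relative to delta."""
    numerators = [1.0]
    cumulative_tau = 0.0
    for x, tau in enumerate(taus, start=1):
        cumulative_tau += tau
        numerators.append(math.exp(-cumulative_tau + x * (beta - delta)))
    gamma = sum(numerators)
    return [numerator / gamma for numerator in numerators]


def rasch_dichotomous(beta, difficulty):
    """Probability of a response of 1 in the dichotomous Rasch model."""
    return math.exp(beta - difficulty) / (1.0 + math.exp(beta - difficulty))


beta, delta, taus = 0.7, 0.2, [-0.9, 0.9]  # illustrative values only
probs = prm_probs(beta, delta, taus)
for k, tau in enumerate(taus, start=1):
    # Probability of category k, given the response is in category k - 1 or k.
    conditional = probs[k] / (probs[k - 1] + probs[k])
    dichotomous = rasch_dichotomous(beta, delta + tau)
    print(k, round(conditional, 6), round(dichotomous, 6))
```

The two printed columns should agree at every threshold, which is the sense in which each threshold is characterized by the dichotomous Rasch model.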
The Generalization to Any Number of Categories m + 1
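For an item with maximum score $m$, that is, with $m + 1$ ordered categories, the model generalizes to
$$\Pr\{X_{ni}=x\} = \frac{\exp\left[-\sum_{k=0}^{x}\tau_{ki} + x(\beta_n-\delta_i)\right]}{\gamma_{ni}}, \quad \gamma_{ni} = \sum_{x'=0}^{m}\exp\left[-\sum_{k=0}^{x'}\tau_{ki} + x'(\beta_n-\delta_i)\right],$$
where $\tau_{0i} \equiv 0$ and $\sum_{k=1}^{m}\tau_{ki} = 0$.
The model can also be written as
$$\Pr\{X_{ni}=x\} = \frac{\exp\left[\kappa_{xi} + x(\beta_n-\delta_i)\right]}{\gamma_{ni}},$$
where $\kappa_{xi} = -\sum_{k=0}^{x}\tau_{ki}$ and the $\kappa_{xi}$ are known as category coefficients.
In the simple specialization of Eq. (21.1) to the dichotomous case when m = 1, $\kappa_{0i} = \kappa_{1i} = 0$.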
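The category coefficient form can be sketched in a few lines. The example assumes that each category coefficient is the negative cumulative sum of the thresholds up to that category, with the thresholds summing to zero. The function names `category_coefficients` and `prm_probs_from_kappas` and the threshold values are hypothetical.

```python
import math


def category_coefficients(taus):
    """kappa_x = -(tau_1 + ... + tau_x), with kappa_0 = 0."""
    kappas = [0.0]
    for tau in taus:
        kappas.append(kappas[-1] - tau)
    return kappas


def prm_probs_from_kappas(beta, delta, kappas):
    """Category probabilities written in terms of category coefficients."""
    numerators = [
        math.exp(kappa + x * (beta - delta)) for x, kappa in enumerate(kappas)
    ]
    gamma = sum(numerators)
    return [numerator / gamma for numerator in numerators]


# Five categories (m = 4) with illustrative thresholds that sum to zero.
kappas = category_coefficients([-1.5, -0.2, 0.4, 1.3])
print([round(kappa, 3) for kappa in kappas])
print([round(p, 3) for p in prm_probs_from_kappas(0.5, 0.0, kappas)])

# Dichotomous case (m = 1): the single centred threshold is zero.
print(prm_probs_from_kappas(0.5, 0.0, category_coefficients([0.0])))
```

Because the thresholds sum to zero, the first and last category coefficients are both zero up to floating-point rounding. In the dichotomous case the function returns the two probabilities of the simple logistic model.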
The Slope of E[X]
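The rate of change of the expected value with respect to the person location can be examined numerically. The sketch below assumes the standard PRM form with thresholds relative to the item difficulty. It compares a central finite-difference slope of E[X] with the variance of X, which is the analytic slope for a model of this exponential form. The function names, step size, and threshold values are hypothetical.

```python
import math


def prm_probs(beta, delta, taus):
    """Category probabilities for scores 0..m, given thresholds relative to delta."""
    numerators = [1.0]
    cumulative_tau = 0.0
    for x, tau in enumerate(taus, start=1):
        cumulative_tau += tau
        numerators.append(math.exp(-cumulative_tau + x * (beta - delta)))
    gamma = sum(numerators)
    return [numerator / gamma for numerator in numerators]


def expected_score(beta, delta, taus):
    return sum(x * p for x, p in enumerate(prm_probs(beta, delta, taus)))


def score_variance(beta, delta, taus):
    """Variance of X, the analytic slope of E[X] with respect to beta."""
    probs = prm_probs(beta, delta, taus)
    mean = sum(x * p for x, p in enumerate(probs))
    return sum((x - mean) ** 2 * p for x, p in enumerate(probs))


def numerical_slope(beta, delta, taus, step=1e-5):
    """Central finite-difference approximation to dE[X]/d(beta)."""
    upper = expected_score(beta + step, delta, taus)
    lower = expected_score(beta - step, delta, taus)
    return (upper - lower) / (2.0 * step)


# Slope at the item location for thresholds close together and far apart.
for taus in ([-0.5, 0.5], [-2.0, 2.0]):
    print(taus, round(numerical_slope(0.0, 0.0, taus), 4),
          round(score_variance(0.0, 0.0, taus), 4))
```

In this illustration the item with thresholds closer together has the steeper expected value curve at its location.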
Latent Threshold Curves
The threshold curves are shown in dotted lines because they are not observed. They are part of the latent structure of the PRM. The threshold characteristic curve, the conditional probability of the higher of two categories given that the response is in one of the two categories, is inferred—there is no dichotomous response at a threshold.
Diagnosing Problems with the Functioning of the Categories
The Rasch model for ordered categories has two unusual properties. First, in the model, and this was known to Rasch (1966), it is not an arbitrary matter to collapse and combine categories. If three categories are functioning well and two of them are then combined, the resulting two-category structure will not fit the data as well. Second, the thresholds that define the categories on the continuum do not need to be correctly ordered. If the estimates from the data are not correctly ordered, then this means that the categories are not functioning as intended.
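One consequence of disordered thresholds can be shown with a small calculation. The sketch below assumes the standard PRM form with thresholds relative to the item difficulty. It records which category has the highest probability at each point on a grid of person locations, first for ordered and then for reversed thresholds. The function names, the grid, and the threshold values are hypothetical.

```python
import math


def prm_probs(beta, delta, taus):
    """Category probabilities for scores 0..m, given thresholds relative to delta."""
    numerators = [1.0]
    cumulative_tau = 0.0
    for x, tau in enumerate(taus, start=1):
        cumulative_tau += tau
        numerators.append(math.exp(-cumulative_tau + x * (beta - delta)))
    gamma = sum(numerators)
    return [numerator / gamma for numerator in numerators]


def modal_categories(delta, taus, betas):
    """Set of categories that have the highest probability somewhere on the grid."""
    modes = set()
    for beta in betas:
        probs = prm_probs(beta, delta, taus)
        modes.add(max(range(len(probs)), key=probs.__getitem__))
    return modes


betas = [b / 10.0 for b in range(-60, 61)]  # grid from -6 to 6 logits
print("ordered: ", sorted(modal_categories(0.0, [-1.0, 1.0], betas)))
print("reversed:", sorted(modal_categories(0.0, [1.0, -1.0], betas)))
```

With the ordered thresholds every category is the most probable response somewhere on the continuum. With the reversed thresholds the middle category is never the most probable, which is one way of seeing that it is not functioning as intended.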
Frequencies of responses in each category highlighting item 10
| Seq | Code | Cat 1 | Cat 2 | Cat 3 | Cat 4 |
|---|---|---|---|---|---|
| 1 | I0001 | 82 | 65 | 2 | 0 |
| 2 | I0002 | 8 | 34 | 87 | 20 |
| 3 | I0003 | 19 | 85 | 44 | 1 |
| 4 | I0004 | 12 | 36 | 61 | 39 |
| 5 | I0005 | 39 | 89 | 17 | 4 |
| 6 | I0006 | 2 | 57 | 73 | 17 |
| 7 | I0007 | 22 | 81 | 39 | 7 |
| 8 | I0008 | 26 | 93 | 26 | 4 |
| 9 | I0009 | 46 | 87 | 12 | 4 |
| 10 | I0010 | 33 | 113 | 2 | 1 |
| 11 | I0011 | 30 | 70 | 42 | 7 |
| 12 | I0012 | 11 | 133 | 5 | 0 |
| 13 | I0013 | 104 | 36 | 6 | 3 |
| 14 | I0014 | 27 | 119 | 3 | 0 |
| 15 | I0015 | 22 | 59 | 66 | 2 |
| 16 | I0016 | 15 | 75 | 42 | 17 |
| 17 | I0017 | 76 | 69 | 2 | 2 |
| 18 | I0018 | 64 | 75 | 10 | 0 |
| 19 | I0019 | 36 | 106 | 6 | 1 |
| 20 | I0020 | 52 | 81 | 14 | 2 |
| 21 | I0021 | 19 | 71 | 39 | 20 |
| 22 | I0022 | 13 | 94 | 36 | 6 |
| 23 | I0023 | 35 | 86 | 26 | 2 |
| 24 | I0024 | 20 | 59 | 49 | 21 |
| 25 | I0025 | 13 | 82 | 51 | 3 |
| 26 | I0026 | 62 | 60 | 23 | 4 |
| 27 | I0027 | 31 | 75 | 36 | 7 |
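Category frequencies such as those in the table can be screened for sparsely used categories. The sketch below uses a few rows copied from the table. The cut-off `min_count=5` and the function name `sparse_categories` are assumptions made for illustration, not a rule taken from the text.

```python
# A few rows copied from the frequency table (Cat 1 to Cat 4).
frequencies = {
    "I0001": [82, 65, 2, 0],
    "I0002": [8, 34, 87, 20],
    "I0010": [33, 113, 2, 1],
    "I0012": [11, 133, 5, 0],
    "I0024": [20, 59, 49, 21],
}


def sparse_categories(counts, min_count=5):
    """Return the 1-based category numbers whose frequency is below min_count.

    min_count is a hypothetical screening value chosen for this example.
    """
    return [
        category
        for category, count in enumerate(counts, start=1)
        if count < min_count
    ]


for code, counts in frequencies.items():
    flagged = sparse_categories(counts)
    if flagged:
        print(code, "sparse categories:", flagged)
```

A screen of this kind only identifies categories with few responses. It says nothing by itself about the order of the threshold estimates.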
However, it is possible to get reversed thresholds even if there are many people in the relevant categories. Andrich (2011) shows such an example. Examples in which there are structural explanations of reversed thresholds in the assessment of educational proficiency are given in van Wyke (2003). He considers the implications of evidence of reversed thresholds and shows the application of the model to construct a continuum of educational proficiency in mathematics.
The Partial Credit and Rating Parameterizations of the PRM
The Rating Scale Parameterization
| Strongly Disagree (SD) | Disagree (D) | Agree (A) | Strongly Agree (SA) |
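In the rating scale parameterization, the model for person $n$ with location $\beta_n$ and item $i$ with location $\delta_i$ is written as
$$\Pr\{X_{ni}=x\} = \frac{\exp\left[-\sum_{k=0}^{x}\tau_{k} + x(\beta_n-\delta_i)\right]}{\gamma_{ni}},$$
where $\tau_0 \equiv 0$ and the thresholds $\tau_k$ are not subscripted by the item, while the overall location $\delta_i$ is so subscripted.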
Of course, it is possible that the distances between the threshold estimates are not the same across items. In that case, there is an interaction between the threshold distances and the items.
Sometimes there are two kinds of items, for example those worded positively and negatively. In that case, it might be that the negatively worded items have similar threshold distances and the positively worded items have similar threshold distances, but that the positively and negatively worded items have different threshold distances. We may write the model differently, with the derivation below.
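Because $x\delta_i + \sum_{k=1}^{x}\tau_k = \sum_{k=1}^{x}(\delta_i + \tau_k)$, the model can be written as
$$\Pr\{X_{ni}=x\} = \frac{\exp\left[x\beta_n - \sum_{k=0}^{x}\delta_{ki}\right]}{\gamma_{ni}},$$
where now $\delta_{ki} = \delta_i + \tau_k$ for $k = 1, \ldots, m$ and $\delta_{0i} \equiv 0$.
The thresholds $\delta_{ki}$ for all items have the same origin and can be compared across items. The thresholds $\tau_k$ are mean deviated from each item’s difficulty. Both forms can be instructive depending on the context of interpretation. In RUMM2030 the $\tau_k$ are called centralized thresholds because they are centred about the item’s location $\delta_i$, and the $\delta_{ki}$ are called uncentralized thresholds.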
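Moving between the two forms is simple arithmetic. The sketch below assumes that the uncentralized thresholds are the item location plus the centralized thresholds, and that the centralized thresholds sum to zero, so that the item location is the mean of the uncentralized thresholds. The function names `uncentralize` and `centralize` and the numerical values are hypothetical.

```python
def uncentralize(delta, centralized):
    """Add the item location to each centralized (mean-deviated) threshold."""
    return [delta + tau for tau in centralized]


def centralize(uncentralized):
    """Recover the item location and the centralized thresholds.

    Assumes the centralized thresholds sum to zero, so the item location
    is the mean of the uncentralized thresholds.
    """
    delta = sum(uncentralized) / len(uncentralized)
    return delta, [threshold - delta for threshold in uncentralized]


# Illustrative item: location 0.5 with three centralized thresholds.
uncentralized = uncentralize(0.5, [-1.0, 0.25, 0.75])
print(uncentralized)
print(centralize(uncentralized))
```

The uncentralized values are the ones to use when comparing thresholds across items, since they share the origin of the item locations.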
The Partial Credit Parameterization
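In the partial credit parameterization, the thresholds are subscripted by the item, and the maximum score $m_i$ may differ from item to item:
$$\Pr\{X_{ni}=x\} = \frac{\exp\left[-\sum_{k=0}^{x}\tau_{ki} + x(\beta_n-\delta_i)\right]}{\gamma_{ni}}, \quad x \in \{0, 1, \ldots, m_i\},$$
where $\tau_{0i} \equiv 0$.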
In summary, the only difference between the rating and partial credit parameterizations is that in the former all items have the same number of thresholds and the thresholds $\tau_k$, which are deviations from the location of the items, are the same for all items, while in the latter the mean-deviated thresholds $\tau_{ki}$ are not the same. The rating and partial credit parameterizations can both be used when all items have the same number of categories. When there are different numbers of categories among items, the partial credit parameterization needs to be used. When the item location $\delta_i$ is added to the thresholds, the resulting thresholds are referenced to the same origin and can be compared across items.
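The contrast between the two parameterizations can be made concrete with a small sketch. It assumes that thresholds on the common scale are obtained by adding each item location to the mean-deviated thresholds. The function names, item labels, and all numerical values are hypothetical.

```python
def rating_scale_thresholds(item_locations, shared_taus):
    """Rating parameterization: one set of mean-deviated thresholds for all items."""
    return {
        item: [delta + tau for tau in shared_taus]
        for item, delta in item_locations.items()
    }


def partial_credit_thresholds(item_locations, item_taus):
    """Partial credit parameterization: each item has its own thresholds."""
    return {
        item: [delta + tau for tau in item_taus[item]]
        for item, delta in item_locations.items()
    }


def threshold_distances(thresholds):
    """Distances between adjacent thresholds, rounded for display."""
    return [round(b - a, 10) for a, b in zip(thresholds, thresholds[1:])]


item_locations = {"item_a": -0.5, "item_b": 0.8}  # illustrative locations
rating = rating_scale_thresholds(item_locations, [-1.0, 0.2, 0.8])
partial = partial_credit_thresholds(
    item_locations,
    {"item_a": [-1.0, 0.2, 0.8], "item_b": [-0.4, 0.4]},  # different numbers of categories
)

for label, result in (("rating", rating), ("partial credit", partial)):
    for item, thresholds in result.items():
        print(label, item, threshold_distances(thresholds))
```

Under the rating parameterization the distances between adjacent thresholds are identical for every item. Under the partial credit parameterization they can differ, and items need not have the same number of thresholds.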