Additive Kappa Can Be Increased by Combining Adjacent Categories

Weighted kappa is a measure that is commonly used for quantifying similarity between two ordinal variables with identical categories. Additive kappa is a special case of weighted kappa that allows the researcher to specify distances between adjacent categories. It is shown that additive kappa is a weighted average of the additive kappas of all collapsed tables of a specific size. It follows that, if the reliability of a categorical rating instrument is assessed with additive kappa, the reliability can be increased by combining adjacent categories.


Introduction
In pattern recognition, classification and data analysis, similarity measures are used to quantify the strength of a relationship between two variables. Commonly used examples are Pearson's product-moment correlation for measuring linear dependence between two numerical variables, the Jaccard coefficient for measuring co-occurrence of two species types, and the Hubert-Arabie adjusted Rand index for comparing partitions produced by two different clustering algorithms. A commonly used coefficient for measuring similarity between two ordinal variables with identical categories is the weighted kappa measure [1][2][3][4]. Moreover, weighted kappa is a standard tool for assessing the reliability of categorical rating instruments and scales in the social, behavioral and medical sciences [2,4].
Suppose the data consist of two variables X and Y with identical ordinal categories A_1, A_2, ..., A_c. For example, if the rating scale measures rigidity of a subject, the category labels could be Absent, Slight, Moderate and Severe. The variables contain the categorical scores of n subjects. For a population of subjects, let π_ij denote the proportion classified in category i on X and in category j on Y, where i, j ∈ {1, 2, ..., c}. Furthermore, define π_i+ := Σ_{j=1}^c π_ij and π_+i := Σ_{j=1}^c π_ji. The marginal probabilities π_i+ and π_+i reflect how many subjects are in category A_i of X and Y, respectively.
With ordered categories, dissimilarity between the variables on adjacent categories is usually of greater importance than dissimilarity on categories that are further apart. Weighted kappa allows the user to specify weights that describe the closeness between categories. If the weights are distances, pairs of categories that are further apart are assigned higher weights. Let w_ij ≥ 0 for i, j ∈ {1, 2, ..., c} be non-negative real numbers, with w_ii = 0. The weighted observed dissimilarity between the variables is defined as O := Σ_{i=1}^c Σ_{j=1}^c w_ij π_ij. Weighted kappa corrects for (dis)similarity due to chance. The expectation of the weighted dissimilarity under independence is given by E := Σ_{i=1}^c Σ_{j=1}^c w_ij π_i+ π_+j. Cohen's weighted kappa is then defined as [1][2][3][4]

    κ_w := 1 − O/E = (E − O)/E.    (1)

Measure (1) is a function from the set of all square contingency tables to the interval [−1, 1]. Its value is 1 when O = 0 (X = Y), 0 when O = E, and negative when O > E. The fraction on the right-hand side of (1) shows that the value of (1) is invariant under multiplication of the weights w_ij by a positive constant.
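As an illustration, (1) can be computed directly from a table of proportions and a weight matrix. The following sketch uses a hypothetical 3 × 3 table and hypothetical squared-distance weights, chosen only for illustration, and also exhibits the scale invariance of the weights:

```python
import numpy as np

def weighted_kappa(p, w):
    """Dissimilarity form of weighted kappa (1): kappa_w = 1 - O/E."""
    O = np.sum(w * p)                                        # observed dissimilarity
    E = np.sum(w * np.outer(p.sum(axis=1), p.sum(axis=0)))   # expected under independence
    return 1.0 - O / E

# hypothetical 3x3 table of joint proportions pi_ij (sums to 1)
p = np.array([[0.20, 0.05, 0.00],
              [0.05, 0.30, 0.05],
              [0.00, 0.05, 0.30]])
# hypothetical weights with zero diagonal (here: squared distances)
w = np.array([[0.0, 1.0, 4.0],
              [1.0, 0.0, 1.0],
              [4.0, 1.0, 0.0]])
k = weighted_kappa(p, w)
# multiplying the weights by a positive constant leaves (1) unchanged:
# weighted_kappa(p, 3.0 * w) agrees with k up to rounding
```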
Let d_1, d_2, ..., d_{c−1} ≥ 0 be distances between the c − 1 pairs of adjacent categories. The additive weights are defined as

    w_ij := d_i + d_{i+1} + ... + d_{j−1} = Σ_{k=i}^{j−1} d_k  for i < j,    (2)

together with w_ii = 0 and w_ji = w_ij. The weight in (2) can be seen as a distance between categories A_i and A_j on an underlying one-dimensional interval scale. If category A_1 is the origin, then the partial sums d_1 + ... + d_k for k ∈ {1, 2, ..., c − 1} indicate the relative locations of categories A_2, A_3, ..., A_c, respectively. The weights are called additive weights since additivity holds between these distances: w_ij = w_ik + w_kj for i < k < j.
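The construction in (2) amounts to placing the categories on an interval scale and taking absolute differences of locations. A minimal sketch, with hypothetical distances:

```python
def additive_weights(d):
    """Additive weight matrix (2) from adjacent-category distances d_1, ..., d_{c-1}:
    w_ij = d_i + ... + d_{j-1} for i < j, w_ii = 0, and w_ji = w_ij."""
    c = len(d) + 1
    pos = [0.0]                      # A_1 is placed at the origin
    for dk in d:
        pos.append(pos[-1] + dk)     # partial sums locate A_2, ..., A_c
    return [[abs(pos[i] - pos[j]) for j in range(c)] for i in range(c)]

w = additive_weights([1.0, 2.0, 0.5])   # hypothetical distances, c = 4
# additivity: w[0][3] = w[0][1] + w[1][3] = 1.0 + 2.5 = 3.5
```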
Substituting (2) into (1) we obtain additive kappa [3]

    κ_a := 1 − O_a/E_a,

where O_a and E_a denote O and E computed with the additive weights. If d_k = 1 for all k ∈ {1, 2, ..., c − 1}, the weights in (2) are identical to the so-called linear weights [1,4]. A limitation of linear kappa is that the ordered categories are assumed to be equidistant, which is an unreasonable assumption for many ordinal variables in real-life applications.
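A short sketch of additive kappa itself; the 4 × 4 table and the distances are hypothetical, and the two calls illustrate that the linear-weights choice d_k = 1 and an unequal-distance choice generally give different values:

```python
import numpy as np

def additive_kappa(p, d):
    """Additive kappa: kappa_a = 1 - O_a/E_a with additive weights built from d."""
    pos = np.concatenate([[0.0], np.cumsum(d)])   # locations of A_1, ..., A_c
    w = np.abs(pos[:, None] - pos[None, :])       # additive weights (2)
    O = np.sum(w * p)
    E = np.sum(w * np.outer(p.sum(axis=1), p.sum(axis=0)))
    return 1.0 - O / E

# hypothetical 4x4 table of proportions
p = np.array([[0.20, 0.05, 0.00, 0.00],
              [0.05, 0.25, 0.05, 0.00],
              [0.00, 0.05, 0.20, 0.05],
              [0.00, 0.00, 0.05, 0.05]])
k_linear = additive_kappa(p, [1.0, 1.0, 1.0])   # linear weights: equidistant categories
k_uneq = additive_kappa(p, [1.0, 3.0, 1.0])     # unequal distances change the value
```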

A weighted average
In reliability studies it is sometimes desirable to combine some of the categories and shorten the rating scale, for example, when two categories are easily confused [1]. With ordered categories it only makes sense to combine categories that are adjacent in the ordering. Theorem 2.1 below shows that the overall additive kappa is a weighted average of the additive kappas of all collapsed tables of a specific size.
If the agreement table has c categories, additive kappa requires the specification of the c − 1 distances d_1, d_2, ..., d_{c−1} for weighting scheme (2). If we combine categories, the collapsed table has fewer categories and we have to specify a new set of distances between the categories. We will use the following rule. If we combine the categories A_k and A_{k+1}, the distance d_k drops from the weighting scheme. The set of distances for the weighting scheme of the collapsed (c − 1) × (c − 1) table is then given by d_1, ..., d_{k−1}, d_{k+1}, ..., d_{c−1}. If we combine multiple categories at once, all distances between the associated categories are dropped from the weighting scheme.
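The collapsing rule can be sketched as follows; the helper `collapse` (a hypothetical name) merges two adjacent categories of the table and drops the corresponding distance:

```python
import numpy as np

def collapse(p, d, k):
    """Merge adjacent categories A_{k+1} and A_{k+2} (0-based index k) of a
    c x c table and drop the distance d_{k+1} from the weighting scheme."""
    p = np.asarray(p, dtype=float)
    rows = np.vstack([p[:k], p[k] + p[k + 1], p[k + 2:]])        # merge two rows
    q = np.hstack([rows[:, :k],
                   (rows[:, k] + rows[:, k + 1])[:, None],        # merge two columns
                   rows[:, k + 2:]])
    return q, list(d[:k]) + list(d[k + 1:])                       # drop d_{k+1}

# hypothetical 4x4 table of proportions and distances d_1, d_2, d_3
p = np.array([[0.20, 0.05, 0.00, 0.00],
              [0.05, 0.25, 0.05, 0.00],
              [0.00, 0.05, 0.20, 0.05],
              [0.00, 0.00, 0.05, 0.05]])
q, d_new = collapse(p, [1.0, 2.0, 0.5], 1)   # combine A_2 and A_3: d_2 drops
```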
Suppose the agreement table has c categories and let m ∈ {2, ..., c − 1} be fixed. The agreement table of size c × c becomes an m × m table if we combine c − m pairs of adjacent categories. Since we have c − 1 pairs, there are

    M(c, m) := C(c − 1, c − m)

ways to choose c − m from the c − 1 pairs, where C(n, k) denotes the binomial coefficient. Thus, the agreement table of size c × c can be collapsed into M(c, m) distinct tables of size m × m. With regard to Theorem 2.1 below, let O_ℓ and E_ℓ for ℓ ∈ {1, 2, ..., M} denote the observed and expected weighted dissimilarities of these M(c, m) smaller tables. Furthermore, define

    κ_ℓ := 1 − O_ℓ/E_ℓ.

The κ_ℓ denote the additive kappas of the M subtables. Theorem 2.1 shows that the overall additive kappa κ_a is a weighted average of the additive kappas κ_ℓ of the subtables. The weights are the denominators E_ℓ of the weighted kappas. Theorem 2.1 generalizes the main results in [1,3]. The proof below consists of new arguments and provides more insight than the technical proof used in [1].
Theorem 2.1. Consider an agreement table {π_ij} with c ≥ 3 categories and consider the M collapsed tables of size m × m. We have

    κ_a = (Σ_{ℓ=1}^M E_ℓ κ_ℓ) / (Σ_{ℓ=1}^M E_ℓ).

Proof. Let O_a and E_a denote, respectively, the weighted observed dissimilarity and the expectation of the weighted dissimilarity under independence of additive kappa κ_a. We first derive the identity

    Σ_{ℓ=1}^M O_ℓ = λ O_a,    (7)

where λ is the number of times each distance d_k is retained in the M weighting schemes. Consider an arbitrary element π_ij of {π_ij}. If i = j we have w_ii = 0. Therefore, assume i ≠ j. Since κ_a and the κ_ℓ are symmetric, the elements π_ij and π_ji have the same weights. Therefore, assume i < j. The weight of π_ij in O_a is the total distance between categories A_i and A_j, given in (2). If we combine two categories A_k and A_{k+1}, the distance d_k drops from the weighting scheme and is not used in the calculation of the weights. For an m × m table the weight of π_ij is thus smaller than w_ij. If we consider all M tables of size m × m, each distance d_k drops out of the weighting scheme the same number of times.
Hence, since we sum over all M tables in (7), it suffices to determine how often a specific distance d_k drops from the weighting scheme.
The number of times a distance d_k drops out of the weighting scheme of an m × m table is given by

    C(c − 2, c − m − 1),    (8)

which is the number of ways to choose the other c − m − 1 dropped distances from the remaining c − 2 distances. Since M(c, m) = C(c − 1, c − m) = C(c − 2, c − m) + C(c − 2, c − m − 1), the number of times a distance d_k is involved in the calculation of the weights of an m × m table is M(c, m) minus the quantity in (8), that is, λ = C(c − 2, c − m). Hence, if we sum over all M subtables of size m × m we obtain the identity in (7).
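The counting argument, and Theorem 2.1 itself, can be verified numerically for a small example. The sketch below (hypothetical 4 × 4 table and distances) enumerates all M(c, m) collapsed tables, checks the identities Σ O_ℓ = λ O_a and Σ E_ℓ = λ E_a with λ = C(c − 2, c − m), and confirms that κ_a equals the weighted average of the κ_ℓ:

```python
import itertools
from math import comb
import numpy as np

def kappa_parts(p, d):
    """O and E of additive kappa for table p with distances d."""
    pos = np.concatenate([[0.0], np.cumsum(d)])   # category locations
    w = np.abs(pos[:, None] - pos[None, :])       # additive weights (2)
    O = np.sum(w * p)
    E = np.sum(w * np.outer(p.sum(axis=1), p.sum(axis=0)))
    return O, E

def collapse_once(p, k):
    """Merge adjacent categories k and k+1 (0-based)."""
    q = np.vstack([p[:k], p[k] + p[k + 1], p[k + 2:]])
    return np.hstack([q[:, :k], (q[:, k] + q[:, k + 1])[:, None], q[:, k + 2:]])

# hypothetical 4x4 table and distances
p = np.array([[0.20, 0.05, 0.00, 0.00],
              [0.05, 0.25, 0.05, 0.00],
              [0.00, 0.05, 0.20, 0.05],
              [0.00, 0.00, 0.05, 0.05]])
d = [1.0, 2.0, 0.5]
c, m = 4, 3
O_a, E_a = kappa_parts(p, d)

O_sum = E_sum = num = 0.0
for drop in itertools.combinations(range(c - 1), c - m):   # dropped distances
    q, dd = p.copy(), list(d)
    for k in sorted(drop, reverse=True):                   # collapse right to left
        q = collapse_once(q, k)
        dd.pop(k)
    O, E = kappa_parts(q, dd)
    O_sum, E_sum, num = O_sum + O, E_sum + E, num + E * (1.0 - O / E)

lam = comb(c - 2, c - m)       # lambda: times each d_k is retained
kappa_a = 1.0 - O_a / E_a
# identities (7) and (11): O_sum == lam * O_a and E_sum == lam * E_a
# Theorem 2.1: kappa_a == num / E_sum
```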
Next, applying similar arguments to the c × c table {π_i+ π_+j} and the E_ℓ, we obtain the identity

    Σ_{ℓ=1}^M E_ℓ = λ E_a.    (11)

Finally, using (7) and (11), together with the identity O_ℓ = E_ℓ(1 − κ_ℓ), we have

    κ_a = 1 − O_a/E_a = 1 − (Σ_{ℓ=1}^M O_ℓ)/(Σ_{ℓ=1}^M E_ℓ) = (Σ_{ℓ=1}^M (E_ℓ − O_ℓ))/(Σ_{ℓ=1}^M E_ℓ) = (Σ_{ℓ=1}^M E_ℓ κ_ℓ)/(Σ_{ℓ=1}^M E_ℓ).


Conclusion

Theorem 2.1 shows that the overall additive kappa is a weighted average of the additive kappas of all collapsed tables of a specific size. In particular, the additive kappa of a c × c table is a weighted average of the additive kappas of all (c − 1) × (c − 1) tables that are obtained by combining two adjacent categories. If the data do not have a particular structure [5] then these additive kappas are all distinct. Since a weighted average of distinct values lies strictly between the smallest and largest of those values, in general there exist two adjacent categories such that, when combined, additive kappa increases. In addition, there exist two adjacent categories such that, when combined, additive kappa decreases. Theorem 2.1 thus implies an existence result. Moreover, if we measure inter-observer reliability in terms of additive kappa, the reliability can be increased by shortening the rating scale.