Comparing Rating Scales of Different Lengths: Equivalence of Scores from 5-Point and 7-Point Scales

Using a self-administered questionnaire, 227 respondents rated service elements associated with a restaurant, retail store, or public transport company on several 5-point and 7-point rating scales. Least-squares regression showed that linear equations for estimating 7-point from 5-point and 5-point from 7-point ratings explained about 85% of the variance and fitted the data almost as well as higher-order polynomials and power functions. In a cross-validation on a new data set the proportion of variance explained fell to about 76%. Functionally inverse versions of the derived linear equations were calculated for the convenience of researchers and psychometricians.

Circumstances sometimes occur in which researchers or applied psychologists have to compare scores derived from rating scales with different numbers of response categories. In longitudinal research designs in psychology, education, marketing, and many areas of social research, for example, a 5-point scale that has been used for some time may be replaced by a new 7-point scale, or vice versa, and researchers may wish to establish a basis for continuity to enable comparisons to be made between the old and new data. In other circumstances, researchers may wish to compare newly collected 5-point data with 7-point data already published in a journal article or report or to compare different published sets of 5-point and 7-point data.
A number of studies (reviewed by Cox, 1980) have been conducted to examine the effects of different numbers of response categories on the reliability and validity of rating scales and the response patterns generated by them (e.g., Cicchetti, Showalter, & Tyrer, 1985; Matell & Jacoby, 1971; Schutz & Rucker, 1975). In contemporary psychometric practice, the majority of rating scales, Likert scales, and other attitude and opinion measures contain either five or seven response categories (Bearden, Netemeyer, & Mobley, 1993; Shaw & Wright, 1967). Symonds (1924) was the first to suggest that reliability is optimized with seven response categories, and other early investigations tended to agree (see Ghiselli, 1955, for a comprehensive review of early research). In an influential review article, Miller (1956) argued that the human mind has a span of absolute judgment that can distinguish about seven distinct categories, a span of immediate memory for about seven items, and a span of attention that can encompass about six objects at a time, which suggested that any increase in number of response categories beyond six or seven might be futile. Odd numbers of response categories have generally been preferred to even numbers because they allow the middle category to be interpreted as a neutral point, and more recent research (e.g., Green & Rao, 1970; Neumann & Neumann, 1981) has tended to reinforce the general preference for 5-point or 7-point scales.

Preparation of this article was supported by research Grant M88 from BEM Research to Andrew M. Colman and Claire E. Norris. We are grateful to Dr. Chris Nicklin for his technical advice and assistance, and to Gareth G. Jones of BEM for drawing our attention to the problem discussed in this article.
We shall confine our attention to the comparability, equivalence, and estimation (in both directions) between 5-point and 7-point scales, although the mathematical and empirical methods may be generalized to rating scales that differ arbitrarily in numbers of response categories. We shall first discuss naive mathematical solutions, and we shall explain why these solutions are fundamentally untrustworthy. We shall then outline empirical solutions, based on the ratings given by respondents in a large-scale survey of attitudes towards services. Finally, we shall present some recommendations to researchers and practitioners who confront these problems.

NAIVE MATHEMATICAL SOLUTIONS

The easiest and most obvious method of estimation, and consequently the one that is probably most widely used, is a simple proportional transformation. This approach involves multiplying each 5-point score by the proportion 7/5 to scale it up to an equivalent 7-point score or multiplying each 7-point score by 5/7 to scale it down to an equivalent 5-point score. This method of solution can be visualized by imagining that an elastic ruler with five equidistant numerals is stretched evenly to fit alongside a longer ruler with seven numerals, or one with seven numerals is compressed to fit alongside a ruler with five.
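As a concrete illustration, the proportional transformation just described can be sketched in a few lines of Python (the function names are ours, for illustration only):

```python
# Naive proportional transformation: scale a 5-point score up by 7/5,
# or a 7-point score down by 5/7. This is the "elastic ruler" picture,
# not an empirically derived conversion.

def five_to_seven_proportional(x5):
    """Scale a 5-point rating to the 7-point range by multiplying by 7/5."""
    return x5 * 7 / 5

def seven_to_five_proportional(x7):
    """Scale a 7-point rating to the 5-point range by multiplying by 5/7."""
    return x7 * 5 / 7
```

Note that the low end of the scale is not preserved: a 5-point rating of 1 maps to 1.4 rather than 1, one symptom of the hidden assumptions this approach makes.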
An analogous mathematical solution entails transforming the original 5-point or 7-point scores to standard (z) scores and then treating them as fully equivalent and comparable. Transformation of raw scores to standard scores is achieved through the relation z = (x - M)/s, where x is the raw score, M is the mean of the raw scores, and s is the standard deviation of the raw scores. The standard deviation is the square root of the variance s2, an unbiased estimate of which is s2 = Σ(xi - M)2/(N - 1), where N is the number of raw scores xi and the summation is over i from i = 1 to i = N. Standard scores are widely used for comparing raw scores from different distributions, because they are dimensionless quantities with mean and standard deviation equal to 0 and 1, respectively, yet they retain the original shape or mathematical form of the raw score distributions from which they are derived. Standardization is an obvious and natural approach that has proved useful for evaluating empirical data (Rosenthal & Rosnow, 1991), designing experiments (Cohen, 1988), and integrating results from many studies (Hedges & Olkin, 1985), but it has certain drawbacks for comparing data from rating scales of unequal lengths. In particular, it can be used for converting scores only when the means and standard deviations for the scales are known, and this information is not always available in published and unpublished data that researchers may wish to convert.
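The z transformation just described is equally simple to compute; a minimal sketch using Python's standard statistics module (illustrative only):

```python
from statistics import mean, stdev

def to_z_scores(raw):
    """Convert raw ratings to standard scores z = (x - M) / s, where s is
    the unbiased (N - 1 denominator) sample standard deviation."""
    m = mean(raw)
    s = stdev(raw)  # statistics.stdev already uses the N - 1 denominator
    return [(x - m) / s for x in raw]
```

The resulting scores have mean 0 and standard deviation 1, but, as noted above, the transformation requires the mean and standard deviation of the raw scores, which published summaries do not always report.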
Although elementary mathematical solutions may be popular in practice, they are likely to yield inaccurate equivalences because they contain hidden assumptions about human information processing. A purely mathematical approach provides no basis for the choice of suitable parameters for the transformation equation; these can be established only through empirical research. The solution via standardization also rests on implicit assumptions about psychological equivalences between scales of different lengths. How people respond to rating scales with unequal numbers of response categories is a quintessentially psychological rather than a mathematical question, and the aim of this study is to derive the best solution by analyzing data from empirical research.

EMPIRICAL STUDY
We obtained responses on a variety of 5-point and 7-point rating scales from 227 respondents throughout England and Wales, 77 men and 150 women aged 20 years to "over 60" (in the over-60 range, exact ages were not recorded). The respondents were recruited by a form of snowball sampling with the help of students who volunteered to participate as respondents and to recruit additional respondents in return for course credits. The sample thus consisted of undergraduate students and their friends (some of whom were also undergraduate students) and relatives. Through a self-administered questionnaire, the respondents rated a retail store, restaurant, or public transport company with which they had recent experience. These service categories were chosen on the assumption that all respondents would have used a store, restaurant, or public transport in the recent past, and this turned out to be the case.
The respondents first rated over-all service quality ("How would you rate the over-all quality of the [store, restaurant, or public transport company]?") on a 7-point scale, and they then rated the quality of a key service element associated with that service provider on 5-point and 7-point scales. Different service elements were rated for different service categories: helpfulness of staff (store), competence of staff (restaurant), and availability of information (public transport). The rating scales were presented with the two extremes of the five or seven response categories anchored by either bipolar adjective pairs, e.g., in a scale to rate the helpfulness of staff, the anchors were not at all helpful at one end of the scale and extremely helpful at the other, or by comparisons with the level of service expected, e.g., in the scale relating helpfulness of staff to expectations, the anchors were considerably better than expected and considerably worse than expected. On two-thirds of the rating scales the response categories were displayed as a series of numerals from 1 to 5 or 7, and the respondents were asked to circle or tick an appropriate number. On all other scales the response categories were simply five or seven open bracket pairs, and the respondents were asked to place a tick in the appropriate space. Our aim was to include some commonly used presentation formats and a variety of subject matter with a reasonably representative sample of respondents in terms of sex, age, and geographical distribution.

RESULTS AND ANALYSIS
In our analysis, we made comparisons between responses to 5-point and 7-point scales that differed only in number of response categories. The correlation between the 5-point and 7-point scales was high (r = .92, p < .001).
The ratings were analyzed by least-squares regression to determine the best fit of linear, quadratic, third-order polynomial, and power function equations, which are the simplest equations that might reasonably be expected to explain the relationship between the 5-point and 7-point ratings. The results are summarized in Table 1. The following equations represent the least-squares best fitting linear, quadratic, third-order polynomial, and power functions. In these equations, x represents the observed 5-point or 7-point ratings, y7 and y5 represent the estimated 7-point and 5-point ratings, respectively, and the numerical estimates of the constants a, b, etc. are derived from the regression analysis and are shown together with the limits of their standard errors of estimate (the probability that an estimated score will fall within one standard error of its predicted value is approximately 68%).
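The four families of fit can be reproduced with standard least-squares tools; the following sketch uses the numpy library and made-up illustrative ratings, not the study's data:

```python
import numpy as np

# Made-up paired ratings for illustration (x: 5-point, y: 7-point).
x = np.array([1, 2, 2, 3, 3, 4, 4, 5, 5, 5], dtype=float)
y = np.array([1, 2, 3, 4, 5, 5, 6, 6, 7, 7], dtype=float)

linear = np.polyfit(x, y, 1)     # y = a*x + b
quadratic = np.polyfit(x, y, 2)  # y = a*x^2 + b*x + c
cubic = np.polyfit(x, y, 3)      # third-order polynomial

# Power function y = a * x**b, fitted by linear regression in log-log space.
b_pow, log_a = np.polyfit(np.log(x), np.log(y), 1)
a_pow = np.exp(log_a)

def r_squared(y_obs, y_fit):
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_obs - y_fit) ** 2)
    ss_tot = np.sum((y_obs - np.mean(y_obs)) ** 2)
    return 1 - ss_res / ss_tot

r2_linear = r_squared(y, np.polyval(linear, x))
r2_cubic = r_squared(y, np.polyval(cubic, x))
```

As the article notes for Equations 5 and 6, the cubic fit can never be worse than the linear fit on the same data, because its extra parameters absorb additional variance.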

Linear Equations (y =ax + b):
The coefficient of determination for Equation 1 is R2 = .848, and for Equation 2 it is also R2 = .848; in each case 84.8% of the variance in ratings is accounted for by the linear equation.

Quadratic Equations (y = ax2 + bx + c):
The coefficient of determination for Equation 3 is R2 = .848, and for Equation 4 it is R2 = .848, indicating that the least-squares fit is no better than that for the linear equations.

Third-Order Polynomial Equations (y = ax3 + bx2 + cx + d):
The coefficient of determination for Equation 5 is R2 = .851, and for Equation 6 it is R2 = .849. The slight increase over the coefficients for Equations 1 to 4 is inconsequential: higher-order polynomials necessarily provide better least-squares fits to virtually all data sets than lower-order polynomials, because they contain more terms and parameters.
Power Function (y = axb): The coefficient of determination for Equation 7 is R2 = .848, and for Equation 8 it is R2 = .846. These figures show that the power function equation accounts for about 85% of the variance in estimating 5-point ratings from 7-point ratings, and vice versa.
The regression Equations 1 to 8, together with the simple proportional and z score transformations, were cross-validated for goodness of fit with a new data set. These data were from 224 of the participants in the original study responding to questions about a different service element (promptness of service). The R2 values for the goodness of fit to this new set of cross-validation data are presented in parentheses in Table 1. As expected with cross-validation, the values of R2 are lower than for the original set of data (regression equations almost invariably fit the data from which they are derived better than cross-validation data). Also, the correlation between the 5-point and 7-point scales is slightly lower for these data (r = .88, p < .001).
However, the pattern of results is similar to the original data set, with all equations fitting reasonably well (accounting for 76.8% to 78.4% of the variance). The simple proportional transformation again fit more poorly than the other equations.
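The cross-validation step amounts to scoring a previously fitted equation on fresh data; a sketch with hypothetical numbers (not the study's data):

```python
import numpy as np

def r2_on_data(coeffs, x_data, y_data):
    """R^2 of a previously fitted polynomial evaluated on a data set."""
    y_pred = np.polyval(coeffs, x_data)
    ss_res = np.sum((y_data - y_pred) ** 2)
    ss_tot = np.sum((y_data - np.mean(y_data)) ** 2)
    return 1 - ss_res / ss_tot

# Fit a linear equation on one sample ...
x_fit = np.array([1, 2, 3, 4, 5], dtype=float)
y_fit = np.array([1, 3, 4, 6, 7], dtype=float)
coeffs = np.polyfit(x_fit, y_fit, 1)

# ... then evaluate it on a second, independent sample.
x_new = np.array([1, 2, 3, 4, 5], dtype=float)
y_new = np.array([2, 3, 4, 5, 7], dtype=float)
r2_original = r2_on_data(coeffs, x_fit, y_fit)
r2_crossval = r2_on_data(coeffs, x_new, y_new)
# r2_crossval is typically lower than r2_original, as observed above.
```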

INVERSE LINEAR EQUATIONS
The linear, quadratic, third-order polynomial, and power function equations generated estimates that did not differ meaningfully from one another in accuracy: the lowest coefficient of determination for the original set of data was R2 = .846 and the highest was R2 = .851. In the light of these findings, the most suitable method of estimation is probably best chosen with the help of Occam's razor. The simplest is the linear transformation, and it seems the most sensible choice for general use.
However, for practical applications it is desirable to have a pair of equations with an inverse functional relationship to each other, that is, an equation for estimating 7-point from 5-point ratings that is an inverse function of the equation for estimating 5-point from 7-point ratings. Equations 1 and 2 do not have this inverse relation to each other. For instance, if a 7-point rating is estimated using Equation 1 from a 5-point rating x and the resulting estimate is then inserted into Equation 2 (to estimate its 5-point equivalent), the result will not be identical to the value of the original 5-point rating x.
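The round-trip point can be made concrete with hypothetical coefficients (the numbers below are ours, not the article's Equations 1 and 2): two independently fitted linear equations are generally not exact inverses, whereas an algebraic inverse recovers the original rating.

```python
# Hypothetical linear fits in the style of Equations 1 and 2;
# the coefficients below are invented for illustration.

def est_7_from_5(x5, a=1.38, b=0.10):
    """Estimate a 7-point rating from a 5-point rating (hypothetical fit)."""
    return a * x5 + b

def est_5_from_7(x7, a=0.70, b=0.15):
    """Estimate a 5-point rating from a 7-point rating (hypothetical fit)."""
    return a * x7 + b

def exact_inverse_of_7_from_5(x7, a=1.38, b=0.10):
    """Algebraic inverse of est_7_from_5: x5 = (x7 - b) / a."""
    return (x7 - b) / a

x5 = 3.0
round_trip = est_5_from_7(est_7_from_5(x5))              # not equal to x5
recovered = exact_inverse_of_7_from_5(est_7_from_5(x5))  # equal to x5
```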
A pair of inverse linear equations can be calculated by averaging the derived regression equations, rewriting Equations 1 and 2 uniformly with x7 representing 7-point ratings and x5 representing 5-point ratings; the resulting functionally inverse pair is given as Equations 13 and 14.

DISCUSSION

This article has addressed problems in the comparison, equivalence, and estimation of scores derived from rating scales with unequal numbers of response categories or alternatives. Such problems are ubiquitous in a wide variety of pure and applied research, and nonempirical solutions are inadequate. They are analogous to psychophysical problems, requiring solutions based on empirical information about how people respond to rating scales that differ only in their numbers of response categories.
The results showed that linear regression equations gave results virtually equivalent to those derived from more complicated transformations. In hindsight this is not surprising. Psychophysical relations between the magnitude of sensations and the physical intensity of their corresponding stimuli have usually been found to be best described by logarithmic or power functions (Stevens, 1975). In the case of rating scales with unequal numbers of response categories, the relationship between the two variables is a psychological rather than a psychophysical relation, and some simple relationship could perhaps have been anticipated.
The multiplicative constant a in the linear equation y = ax + b turned out to be close to 7/5 in Equation 13 and to 5/7 in Equation 14. However, the linear regression equations are preferable to the simple proportional transformation (multiplying by 7/5 or 5/7), because they are empirically derived, include the extra specification of an additive constant, and provide error terms. Straightforward z transformations fit the data as well as the linear transformations, but they can be applied only when data are available from which to estimate the variance or standard deviation of the untransformed scores, and such information is not always provided in summaries of data collected in the past. When standard deviations are unavailable, z scores cannot be calculated, whereas transformations via the inverse linear equations derived in this article may still be used for converting scores.
Although 5-point and 7-point rating scales are by far the most common, other scale lengths are sometimes used. Further research is required to determine whether the conclusions reported in this article apply more generally to other scale lengths. Meanwhile, the inverse Equations 13 and 14 for the comparison of 5-point and 7-point data are recommended for the estimation of equivalences.