Scanner Data Price Indexes: Addressing Some Unresolved Issues

Scanner data are increasingly being used in the calculation of price indexes such as the CPI. The preeminent approach is the RYGEKS method (Ivancic, Diewert and Fox 2011). This uses multilateral methods to construct price parities across a rolling year and then links these to construct a nonrevisable index. While this approach performs well, there remain some unresolved issues, in particular the optimal window length and the linking method. In this note, these questions are addressed. A novel linking method is proposed, along with the use of weighted GEKS as opposed to a fixed window. These approaches are illustrated empirically on a large scanner dataset and perform well.


INTRODUCTION
The advent of scanner data, that is, electronic point-of-sale price and quantity data collected by retailers, has the potential to significantly change the way price indexes such as the Consumer Price Index (CPI) are constructed. Not only do scanner data offer a virtual census of transaction prices for certain products, but they also provide quantity information. Ostensibly, this new data source could significantly increase the quality of economic statistics. This has led statistical agencies in a number of countries, such as the Netherlands, Norway, New Zealand, Australia, Sweden, and Switzerland, to begin integrating scanner data into their CPIs.
However, the use of scanner data has proved problematic in practice. Price discounts, which induce consumers to stock up on a product, can distort measures of price change because inventories influence expenditure patterns in later periods. This problem has been widely documented (de Haan and van der Grient 2011). The approach outlined by Ivancic, Diewert and Fox (2011), and implemented by de Haan and van der Grient (2011) and de Haan and Krsinich (2014), nevertheless proposed a way forward. They built on the earlier work of Balk (1981) and Kokoski, Moulton, and Zieschang (1999), who noted that multilateral index number methods, an approach usually reserved for spatial comparisons between countries, could also be used in the temporal context. Ivancic, Diewert and Fox (2011) developed a multilateral approach, called the rolling year GEKS method or RYGEKS, which was specifically designed for calculating indexes in a temporal context such as the CPI.
The RYGEKS method is an extension of the widely used GEKS approach for imposing transitivity on bilateral indexes (Gini 1931; Eltetö and Köves 1964; Szulc 1964). The GEKS method imposes transitivity on a set of bilateral indexes, $P_{rt}$ for $r, t \in T$, where $T$ is an index set of time periods, by taking the geometric mean of the bilateral indexes linked via each period in turn. Denote the set of transitive log indexes as $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_T)$. Then, the index between periods $r$ and $t$ is derived as

$$\alpha_t - \alpha_r = \frac{1}{|V|}\sum_{v \in V}\left(\ln P_{rv} + \ln P_{vt}\right), \tag{1}$$

where $V \subseteq T$ is the set of periods over which the indexes are transitivized. RYGEKS tailors this approach to temporal comparisons, and to the CPI in particular. Two factors drive this. First, in temporal comparisons the set of products on the market continually evolves, so comparisons between periods that are far apart are likely to be less reliable than comparisons between nearby periods, because fewer products are matched. Second, it is generally viewed as desirable not to revise the CPI. This is problematic for GEKS: when data for a new period become available, they will almost certainly change the price parities (that is, the calculated relative price levels for each period) compared with those calculated without the new data. The suggestion of Ivancic, Diewert, and Fox (2011) is twofold. First, to address the reliability problem, they suggested limiting $V$ to 13 months (for a monthly index) rather than using all the available data. Second, they outlined a method for linking the new parities, calculated using the current window, onto previously published index numbers. To illustrate the linking method, denote the previously published log parities for the first 12 months of the 13-month rolling window as $\bar\alpha = (\bar\alpha_1, \bar\alpha_2, \ldots, \bar\alpha_{12})$. They suggest calculating $\bar\alpha_{13}$ as

$$\bar\alpha_{13} = \bar\alpha_{12} + \left(\alpha_{13} - \alpha_{12}\right).$$

Here, $\alpha_{12}$ and $\alpha_{13}$ are the newly estimated parities in the current window.
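As an illustration of the GEKS formula $\alpha_t - \alpha_r = \frac{1}{|V|}\sum_{v \in V}(\ln P_{rv} + \ln P_{vt})$, the following Python sketch computes GEKS log parities from a matrix of bilateral log indexes. It assumes that `log_p[r][t]` holds $\ln P_{rt}$, that these indexes satisfy time reversal ($\ln P_{tr} = -\ln P_{rt}$), and that $V$ is the full set of periods supplied; the function names are hypothetical. Because the formula only determines differences $\alpha_t - \alpha_r$, the parities are normalized so that the first period's log parity is zero.

```python
def geks_log_parities(log_p):
    """GEKS log parities alpha_0..alpha_{n-1}, normalized so alpha_0 == 0.

    log_p[r][t] holds ln P_rt (period t relative to base period r) and is
    assumed to satisfy time reversal: log_p[t][r] == -log_p[r][t].
    """
    n = len(log_p)
    # alpha_t - alpha_0 = (1/|V|) * sum over v of (ln P_0v + ln P_vt)
    return [sum(log_p[0][v] + log_p[v][t] for v in range(n)) / n
            for t in range(n)]


def geks_log_index(log_p, r, t):
    """Transitive GEKS log index for period t relative to period r."""
    alpha = geks_log_parities(log_p)
    return alpha[t] - alpha[r]


# Hypothetical, non-transitive bilateral log indexes for three periods.
log_p = [[0.00, 0.02, 0.05],
         [-0.02, 0.00, 0.04],
         [-0.05, -0.04, 0.00]]
print(geks_log_parities(log_p))
```

Any input that is already transitive (one where $\ln P_{rt} = a_t - a_r$ for some log levels $a$) is returned unchanged, up to the normalization.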
Since Ivancic, Diewert, and Fox (2011), there has been little resolution of the questions of the optimal window length and linking method. The choice of linking method remains an area of active investigation (de Haan 2015). For example, Krsinich (2014) recently suggested $\bar\alpha_{13} = \bar\alpha_1 + (\alpha_{13} - \alpha_1)$ as the linking method. Here the new parity is linked via the oldest period in the window rather than via the most recent period, as in Ivancic, Diewert, and Fox (2011). Of course, a number of different rolling window lengths could be used, and various other modifications of the GEKS approach have been suggested to mitigate the matching problem (Lamboray and Krsinich 2015).
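The two splicing rules can be compared in a short Python sketch. It assumes the previously published log parities and the re-estimated window parities are plain lists ordered by period, with the window containing exactly one new period; `splice` and its `link` argument are hypothetical names. Setting `link="latest"` gives $\bar\alpha_{13} = \bar\alpha_{12} + (\alpha_{13} - \alpha_{12})$ (Ivancic, Diewert, and Fox 2011) and `link="first"` gives $\bar\alpha_{13} = \bar\alpha_1 + (\alpha_{13} - \alpha_1)$ (Krsinich 2014).

```python
def splice(published, window_alpha, link="latest"):
    """Return the log parity for the newest period in a rolling window.

    published:    previously published log parities for all but the newest
                  period of the window (e.g. alpha-bar_1 .. alpha-bar_12).
    window_alpha: log parities re-estimated on the whole window
                  (e.g. alpha_1 .. alpha_13).
    link:         "latest" links via the most recent published period;
                  "first" links via the oldest period in the window.
    """
    if len(window_alpha) != len(published) + 1:
        raise ValueError("the window must contain exactly one new period")
    if link == "latest":
        k = len(published) - 1
    elif link == "first":
        k = 0
    else:
        raise ValueError("link must be 'latest' or 'first'")
    # Published parity of the link period plus the movement from the link
    # period to the new period within the current window.
    return published[k] + (window_alpha[-1] - window_alpha[k])


# Hypothetical parities for a 13-month window.
published = [0.01 * i for i in range(12)]
window_alpha = [0.0] * 12 + [0.3]
print(splice(published, window_alpha, "latest"), splice(published, window_alpha, "first"))
```

When the re-estimated window parities agree with the published ones up to a constant, both rules give the same answer; they diverge only when re-estimation changes the relative levels within the window.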
In the next two sections we address each of these issues in turn. With regard to window length, in Section 2 we argue that rather than searching for an optimal window length, a natural solution is to use weighted GEKS (WGEKS). Through a suitable choice of weights, this allows for the fact that comparisons tend to become less reliable as the periods being compared become further apart. With regard to the linking problem, in Section 3 we advocate minimizing a constrained least-squares criterion. This treats all periods symmetrically and results in a particularly simple expression for updating the existing parities. Section 4 illustrates our suggestions empirically using a large scanner dataset. We consider the linking method, weighted and unweighted GEKS over various windows, the influence of different choices of weights, and the sensitivity of results to the imposition of the nonrevisability constraint. While the indexes approximate each other fairly closely, there are some important differences between them. WGEKS and the proposed linking method perform well, while unweighted GEKS performs poorly with large windows. Our results also indicate that while the precise choice of weights in WGEKS does not appear to be pivotal, the imposition of the nonrevisability constraint is not entirely innocuous. Section 5 summarizes our conclusions.

THE OPTIMAL WINDOW LENGTH
Within the framework of Ivancic, Diewert, and Fox (2011), the choice of window length involves a tension between, on the one hand, using as much of the data as possible and, on the other, using only reliable bilateral comparisons. As the window gets larger, the computation includes comparisons that are likely to be increasingly unreliable because fewer products are matched. In practice, they advocated a window of 13 months, arguing that this is the minimum length that allows for annually seasonal products. An alternative approach, as outlined by de Haan and Krsinich (2014), is to impute the "missing" prices using regression methods.
Another way of resolving this tension, by explicitly allowing for the differential reliability of the bilateral indexes, is WGEKS (see, e.g., Rao 2001). That is, consider the weighted least squares problem of minimizing the sum of squared errors (SSE) by choice of $\alpha$,

$$\min_{\alpha}\; SSE(\alpha) = \sum_{r \in V}\sum_{t \in V} w_{rt}\left(\ln P_{rt} - \alpha_t + \alpha_r\right)^2, \tag{2}$$

where the parities are identified only up to a normalization such as setting one of them to zero. It is well known that with equal weights ($w_{rt} = w$) the optimal solution satisfies (1), provided the bilateral indexes satisfy time reversal, $\ln P_{tr} = -\ln P_{rt}$ (see Rao and Banerjee 1986). Hence, WGEKS is a generalization of the standard approach that explicitly allows for the differential reliability of the bilateral comparisons. Indeed, the windowing approach of Ivancic, Diewert, and Fox (2011) can be thought of as a specific type of WGEKS that assigns the bilateral indexes weights of one or zero depending on whether they fall within or outside the window. Our suggestion is instead to use more informative weights, so that comparisons with the distant past do not exert undue influence on the measurement of current price change.
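A minimal sketch of the WGEKS problem, minimizing $\sum_{r}\sum_{t} w_{rt}(\ln P_{rt} - \alpha_t + \alpha_r)^2$, follows in Python. It assumes symmetric nonnegative weights $w_{rt} = w_{tr}$, time-reversible bilateral log indexes, and enough positive weights to connect all periods, so that the normal equations have a unique solution once $\alpha_0$ is fixed at zero. To stay dependency-free it uses a small hand-written Gaussian elimination; `solve` and `wgeks_log_parities` are hypothetical names. Setting all off-diagonal weights to the same value should reproduce the unweighted GEKS parities.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(m[i][c]))
        m[c], m[p] = m[p], m[c]
        for i in range(c + 1, n):
            f = m[i][c] / m[c][c]
            for j in range(c, n + 1):
                m[i][j] -= f * m[c][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def wgeks_log_parities(log_p, w):
    """Weighted GEKS log parities, normalized so that alpha_0 == 0.

    log_p[r][t] = ln P_rt (time reversal assumed); w[r][t] = w_rt is assumed
    symmetric, nonnegative, and to connect all periods.
    """
    n = len(log_p)
    a = [[0.0] * (n - 1) for _ in range(n - 1)]
    b = [0.0] * (n - 1)
    # First-order condition for each alpha_k with k >= 1:
    #   sum over j != k of w_jk * (alpha_k - alpha_j - ln P_jk) = 0
    for k in range(1, n):
        for j in range(n):
            if j == k:
                continue
            a[k - 1][k - 1] += w[j][k]
            if j > 0:  # alpha_0 is fixed at zero, so it drops out
                a[k - 1][j - 1] -= w[j][k]
            b[k - 1] += w[j][k] * log_p[j][k]
    return [0.0] + solve(a, b)


# Hypothetical bilateral log indexes and equal weights (plain GEKS).
log_p = [[0.00, 0.02, 0.05],
         [-0.02, 0.00, 0.04],
         [-0.05, -0.04, 0.00]]
equal = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
print(wgeks_log_parities(log_p, equal))
```

Downweighting an unreliable comparison (for example, a long-span pair with few matched products) pulls the solution toward the remaining, better-matched comparisons rather than discarding the pair outright as a window would.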
The weights in WGEKS have been constructed in a range of ways in the literature (Rao and Timmer 2003; Diewert 2005; Hill and Timmer 2006). In the empirical section below, we consider three approaches that are consistent with this literature. The first, on which we primarily focus, is what we call the average matched expenditure share (AMES) method. Here, the weights are calculated as

$$w_{rt} = \sum_{i \in I_{rt}} \tfrac{1}{2}\left(\bar s_{ir} + \bar s_{it}\right),$$

where $i$ indexes products, $I_t$ is the index set of products available in period $t$, $\bar I_{rt} = I_r \cup I_t$ is the set of products available in either period, and $I_{rt} = I_r \cap I_t$ is the index set of products available in both periods. If prices and quantities are denoted $p_{it}$ and $q_{it}$, respectively, then the expenditure shares are $\bar s_{it} = p_{it}q_{it} / \sum_{j \in \bar I_{rt}} p_{jt}q_{jt}$. This choice gives higher weight to comparisons with large matched expenditure shares.
A key advantage is that this approach closely concords with the weighting structure of the Törnqvist index number formula adopted below. Note also that it treats each period's expenditure shares symmetrically even if total expenditures are quite different. An alternative is to use average matched expenditure (AME):

$$w_{rt} = \sum_{i \in I_{rt}} \tfrac{1}{2}\left(p_{ir}q_{ir} + p_{it}q_{it}\right).$$

Here, larger weight is given to comparisons where expenditures are matched and also where expenditures are large. This last feature of AME is potentially both an advantage and a disadvantage. When there are more purchases, the individual prices calculated from the data will be more reliable and hence should receive higher weight. However, in a highly inflationary environment AME weights, unlike AMES weights, can become distorted and give too much emphasis to periods with high prices. This is not of great concern in our application below, however, as we consider a period when inflation was fairly benign. Finally, rather than focusing on expenditures, we could use average matched product shares (AMPS):

$$w_{rt} = \tfrac{1}{2}\left(\frac{|I_{rt}|}{|I_t|} + \frac{|I_{rt}|}{|I_r|}\right).$$

This is quite similar to AMES except that it gives each product equal weight rather than weighting by relative expenditure. We find that AMPS declines faster than AMES as the distance between $r$ and $t$ rises, because it tends to be the lower-expenditure products that disappear from the market.
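The three weighting schemes (AMES, AME, and AMPS) can be sketched in Python as below. This assumes each period's data are a dict mapping a product identifier to a `(price, quantity)` pair, with only products actually sold in that period present; the function names are hypothetical. Because a product unavailable in a period has no expenditure there, the AMES share denominator over $\bar I_{rt} = I_r \cup I_t$ equals total expenditure in that period.

```python
def expenditure(period):
    """Expenditure p * q for each product sold in a period."""
    return {i: p * q for i, (p, q) in period.items()}


def ames_weight(period_r, period_t):
    """Average matched expenditure share (AMES) weight w_rt."""
    e_r, e_t = expenditure(period_r), expenditure(period_t)
    tot_r, tot_t = sum(e_r.values()), sum(e_t.values())
    matched = e_r.keys() & e_t.keys()  # I_rt: products sold in both periods
    return sum(0.5 * (e_r[i] / tot_r + e_t[i] / tot_t) for i in matched)


def ame_weight(period_r, period_t):
    """Average matched expenditure (AME) weight w_rt."""
    e_r, e_t = expenditure(period_r), expenditure(period_t)
    matched = e_r.keys() & e_t.keys()
    return sum(0.5 * (e_r[i] + e_t[i]) for i in matched)


def amps_weight(period_r, period_t):
    """Average matched product share (AMPS) weight w_rt."""
    n_matched = len(period_r.keys() & period_t.keys())
    return 0.5 * (n_matched / len(period_t) + n_matched / len(period_r))


# Hypothetical data: product "c" exits and product "d" enters.
r = {"a": (2.0, 10), "b": (1.0, 20), "c": (5.0, 4)}
t = {"a": (2.2, 10), "b": (1.0, 10), "d": (3.0, 10)}
print(ames_weight(r, t), ame_weight(r, t), amps_weight(r, t))
```

All three weights are symmetric in $r$ and $t$, which is what the least squares formulations of WGEKS and the constrained linking rule rely on.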
With reasonable weights, such as those outlined above, the WGEKS method is likely to make better use of the available bilateral comparisons than the simple 0-1 weighting scheme reflected in windowing. It also potentially enables the use of a longer window because price indexes which are less reliable, but contain some useful information, will be suitably downweighted. The way in which the weights degrade as the length of time between periods rises also provides a more informed basis upon which to choose a window length if such an approach is required in a production environment. We explore the performance of WGEKS indexes and various weighting schemes in the empirical section.

THE LINKING METHOD
In the introduction, we outlined two approaches to imposing the nonrevisability constraint on scanner data indexes via linking. A compelling alternative is to estimate constrained WGEKS parities. More formally, suppose data for the index set of periods $V$ are available, that the latest period is period $r$, and that we have previously estimated parities, denoted $\bar\alpha_t$, for all $t \in V$, $t \neq r$. Then, we propose estimating $\alpha_r$ by solving the constrained least squares (CLS) problem

$$\min_{\alpha_r}\; SSE(\alpha) \quad \text{subject to} \quad \alpha_t = \bar\alpha_t \;\text{ for all } t \in V,\ t \neq r. \tag{3}$$

This is a particularly simple problem because there is only a single parameter to estimate, $\alpha_r$. Given the formulation of $SSE(\alpha)$ in (2), all the squared terms not involving $\alpha_r$ are fixed. With symmetric weights ($w_{rt} = w_{tr}$) and time-reversible bilateral indexes, it is easy to show that the solution to (3) is

$$\alpha_r = \frac{\sum_{t \in V,\, t \neq r} w_{rt}\left(\bar\alpha_t + \ln P_{tr}\right)}{\sum_{t \in V,\, t \neq r} w_{rt}}. \tag{4}$$

This derives the new parity as a weighted average of the price level in period $r$ implied by each period $t \in V$, $t \neq r$, that is, the published level $\bar\alpha_t$ inflated by the bilateral price change from $t$ to $r$. This solution differs from both Ivancic, Diewert, and Fox (2011) and Krsinich (2014) primarily in that it treats all periods symmetrically rather than linking via a single period. Hence, it can be argued that the proposed method preserves as much of the GEKS principle as possible, namely treating each period symmetrically, while ensuring that the nonrevisability constraint is enforced. The approach is particularly easy to implement, since (4) requires only the bilateral comparisons between the new period and each of the old periods, together with their weights.
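The CLS linking rule $\alpha_r = \sum_t w_{rt}(\bar\alpha_t + \ln P_{tr}) / \sum_t w_{rt}$ takes only a few lines of Python. The sketch below assumes, as in the derivation, symmetric weights and time-reversible bilateral indexes, and takes three aligned lists over the old periods $t \neq r$: the published parities $\bar\alpha_t$, the bilateral log indexes $\ln P_{tr}$ of the new period relative to each old period, and the weights $w_{rt}$. The helper `constrained_sse` evaluates the terms of $SSE(\alpha)$ that involve $\alpha_r$ (up to a factor of two) so the minimizing property can be checked numerically. Both function names are hypothetical.

```python
def cls_link(published, log_p_to_new, weights):
    """Constrained least squares log parity alpha_r for the new period r.

    published[k]:    published log parity alpha-bar_t of old period t
    log_p_to_new[k]: bilateral log index ln P_tr (new period r relative to t)
    weights[k]:      weight w_rt between the new period and old period t
    """
    if not (len(published) == len(log_p_to_new) == len(weights)):
        raise ValueError("inputs must have equal length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    # Weighted average of the published levels inflated to period r.
    return sum(w * (a + lp) for a, lp, w in zip(published, log_p_to_new, weights)) / total


def constrained_sse(alpha_r, published, log_p_to_new, weights):
    """SSE terms involving alpha_r, with the old parities held fixed."""
    return sum(w * (lp - alpha_r + a) ** 2
               for a, lp, w in zip(published, log_p_to_new, weights))


# Hypothetical inputs for three previously published periods.
published = [0.0, 0.01, 0.03]
log_p_to_new = [0.05, 0.035, 0.02]
weights = [0.5, 0.8, 1.0]
print(cls_link(published, log_p_to_new, weights))
```

Putting all the weight on the most recent old period reduces the rule to $\bar\alpha_{r-1} + \ln P_{r-1,r}$, that is, linking via a single period with its bilateral index; positive weights on the other periods average over all of them instead.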

AN EMPIRICAL COMPARISON OF METHODS
To investigate the approaches outlined above, we make use of a large scanner dataset made available by IRI (see Bronnenberg, Kruger, and Mela 2008). This includes supermarket scanner data for a number of product categories across various U.S. cities. We focus on three cities, Dallas, Los Angeles (LA), and New York (NY), and seven product categories: beer, carbonated drinks, coffee, margarine and butter, soup, toilet tissue, and toothpaste. The data stretch from 2001 to 2012 and include weekly prices and quantities at the barcode level in each surveyed store.
To check the robustness of our results, we explored three different ways of constructing the data. First, we used data from all stores and defined a product as a unique combination of barcode and store (approach I). Second, we defined a product similarly but constructed a balanced store panel (approach II), using only data from stores observed in every period for each city-product category combination. As a result, the product matching rate in this case wholly reflects product turnover and not changes in the store sample. Third, we defined a product as a unique combination of store chain and barcode (approach III). This aggregates across stores within a chain and was one of the approaches considered in Ivancic, Diewert, and Fox (2011). They recommended against aggregating across all stores, as store characteristics and quality may vary and this may contaminate estimated price change. Within a retail chain, however, store quality is likely to be much more homogeneous.
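The three product definitions can be sketched in Python as follows. This assumes each transaction record is a dict with `period`, `store`, `chain`, `barcode`, `price`, and `quantity` fields, that all records belong to one city-product category combination, and that quantities are positive; the field and function names are hypothetical. How prices are combined across a chain's stores in approach III is not specified here, so the sketch assumes unit values (total expenditure divided by total quantity).

```python
from collections import defaultdict


def balanced_stores(records):
    """Stores observed in every period that appears in the records."""
    periods = {rec["period"] for rec in records}
    seen = defaultdict(set)
    for rec in records:
        seen[rec["store"]].add(rec["period"])
    return {store for store, ps in seen.items() if ps == periods}


def build_products(records, approach):
    """Map (period, product key) -> (unit value price, total quantity).

    "I":   product = (barcode, store), all stores
    "II":  as "I", restricted to a balanced store panel
    "III": product = (chain, barcode), aggregated over the chain's stores
    """
    if approach == "II":
        keep = balanced_stores(records)
        records = [rec for rec in records if rec["store"] in keep]
    elif approach not in ("I", "III"):
        raise ValueError("approach must be 'I', 'II' or 'III'")
    spend = defaultdict(float)
    qty = defaultdict(float)
    for rec in records:
        if approach == "III":
            key = (rec["chain"], rec["barcode"])
        else:
            key = (rec["barcode"], rec["store"])
        spend[(rec["period"], key)] += rec["price"] * rec["quantity"]
        qty[(rec["period"], key)] += rec["quantity"]
    # Unit value: total expenditure divided by total quantity.
    return {k: (spend[k] / qty[k], qty[k]) for k in spend}


# Hypothetical records: store s2 is missing in period 2.
records = [
    {"period": 1, "store": "s1", "chain": "c1", "barcode": "b1", "price": 2.0, "quantity": 10},
    {"period": 1, "store": "s2", "chain": "c1", "barcode": "b1", "price": 3.0, "quantity": 10},
    {"period": 2, "store": "s1", "chain": "c1", "barcode": "b1", "price": 2.0, "quantity": 5},
]
print(build_products(records, "III"))
```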
To highlight our key results, and to keep our presentation concise, we confine some tables to the online appendix. Detailed summary statistics, by city and product category, on the number of observations, products, stores, unique product-store (or product-chain) combinations, and time periods, as well as total expenditure, can be found there. We estimate 19 indexes using each of the three datasets outlined above. For the bilateral index number formula, we adopt the widely used Törnqvist index,

$$\ln P_{rt} = \sum_{i \in I_{rt}} \tfrac{1}{2}\left(s_{ir} + s_{it}\right)\ln\left(\frac{p_{it}}{p_{ir}}\right),$$

where $s_{it} = p_{it}q_{it} / \sum_{j \in I_{rt}} p_{jt}q_{jt}$ is the expenditure share of product $i$ among the matched products. Our objective is to investigate the impact of the rolling window length and the weights in WGEKS, the linking method, and the imposition of the nonrevisability constraint. The indexes, and the assumptions used in their construction, are listed in Table 1. Method B is that suggested by Ivancic, Diewert, and Fox (2011). In each case, we started the linking process after 13 months of data were available (or 6 months when this window length was used).

Table 1 notes: … (2014), CLS = the proposed constrained least squares approach; max = the maximal window is used (i.e., all available data); None = no weights, AMES = average matched expenditure share, AMPS = average matched product share, AME = average matched expenditure.
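The matched-model Törnqvist index, $\ln P_{rt} = \sum_{i \in I_{rt}} \frac{1}{2}(s_{ir} + s_{it})\ln(p_{it}/p_{ir})$, can be sketched in Python as below. It assumes each period's data are a dict mapping a product identifier to a `(price, quantity)` pair and that the shares $s_{it}$ are normalized over the matched products $I_{rt}$; `tornqvist_log` is a hypothetical name. Products sold in only one of the two periods are ignored.

```python
import math


def tornqvist_log(period_r, period_t):
    """Matched-model Törnqvist log index ln P_rt for period t relative to r.

    Each period maps product id -> (price, quantity). Expenditure shares
    are computed over the matched products only.
    """
    matched = period_r.keys() & period_t.keys()
    if not matched:
        raise ValueError("no matched products between the two periods")
    e_r = {i: period_r[i][0] * period_r[i][1] for i in matched}
    e_t = {i: period_t[i][0] * period_t[i][1] for i in matched}
    tot_r, tot_t = sum(e_r.values()), sum(e_t.values())
    return sum(0.5 * (e_r[i] / tot_r + e_t[i] / tot_t)
               * math.log(period_t[i][0] / period_r[i][0])
               for i in matched)


# Hypothetical data: "c" exits, "d" enters, only "a" changes price.
r = {"a": (1.0, 10), "b": (2.0, 5), "c": (4.0, 1)}
t = {"a": (1.1, 10), "b": (2.0, 6), "d": (9.0, 3)}
print(tornqvist_log(r, t))
```

The Törnqvist index satisfies time reversal ($\ln P_{tr} = -\ln P_{rt}$), which is the property the least squares interpretation of GEKS and the CLS linking rule rely on.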
The performance of the indexes is evaluated on the basis of three main criteria. The first, and most important, is the difference between the indexes, measured by the difference in average annual inflation between each index and every other index. This is then averaged across product categories and cities to provide a measure of deviation for each pair of indexes. Second, we consider the extent to which different methods produce concordant price changes over time, by comparing the absolute difference in annual inflation rates between each pair of indexes. That is, comparing indexes, say A and B, for each city and product category we calculate

$$d^{AB}_t = \left|\pi^A_t - \pi^B_t\right|, \qquad \pi^A_t = 100\left(\frac{A_t}{A_{t-12}} - 1\right),$$

where $\pi^B_t$ is defined analogously. We then average $d^{AB}_t$ across time, product categories, and cities. This measures the similarity of the indexes and the extent to which they record different measures of inflation in real time. Finally, the variability of an index is also an important consideration, so we calculate the standard deviation of the log change in each index over time, on both a monthly and an annual basis.

Our computations generate a large number of results, which are presented in their entirety in the online appendix. However, a key feature of the results is that they are relatively robust to the data used: the results across the three approaches to data construction are quantitatively and qualitatively very similar. Hence, we focus the discussion on case I, which uses data from all stores and defines a product as a unique barcode-store combination. The index differences, absolute differences, and standard deviations are reported in Tables 2-4.

Table 2. Matrix of differences in annual % change (averaged across city and product category)
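The evaluation criteria can be sketched in Python for a single pair of monthly index series. This assumes both series cover the same months, defines the annual percent change as $\pi_t = 100(I_t / I_{t-12} - 1)$, and interprets the difference in average annual inflation as the difference between the means of $\pi^A_t$ and $\pi^B_t$; the function names are hypothetical, and averaging across cities and product categories is left out.

```python
import math
import statistics


def annual_pct_change(index):
    """pi_t = 100 * (I_t / I_{t-12} - 1) for a monthly index series."""
    return [100.0 * (index[t] / index[t - 12] - 1.0) for t in range(12, len(index))]


def mean_diff(index_a, index_b):
    """Difference in average annual inflation between indexes A and B."""
    return (statistics.fmean(annual_pct_change(index_a))
            - statistics.fmean(annual_pct_change(index_b)))


def mean_abs_diff(index_a, index_b):
    """Average over t of |pi^A_t - pi^B_t|, in percentage points."""
    pa, pb = annual_pct_change(index_a), annual_pct_change(index_b)
    return statistics.fmean([abs(a - b) for a, b in zip(pa, pb)])


def log_change_sd(index, lag):
    """Standard deviation of ln(I_t / I_{t-lag}); lag=1 monthly, lag=12 annual."""
    return statistics.stdev([math.log(index[t] / index[t - lag])
                             for t in range(lag, len(index))])


# Hypothetical series: A grows 1% a month, B is flat.
a = [100 * 1.01 ** t for t in range(25)]
b = [100.0] * 25
print(mean_diff(a, b), mean_abs_diff(a, b), log_change_sd(a, 1))
```

Two indexes can have a small `mean_diff` but a large `mean_abs_diff` when they record similar cumulative inflation at different times, which is why the second criterion captures differences in the timing of measured price change.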
To illustrate the similarity of the results across data types, we estimate the correlation between each of the statistics for each data type. These are reported in Table 5. The correlations are high. For example, the correlation between the average differences in annual inflation for data type I, shown in Table 2, and the corresponding results for data type II is 0.9524. The lowest correlation is that between data types I and III for the average differences in annual inflation, but this is still 0.7714, illustrating a strong concordance in the results. All correlations are significant at the 1% level. Perhaps the most important feature of our empirical results, and one that is likely to provide some reassurance to statistical agencies, is that shown in Table 2: none of the indexes calculated here gives clearly erroneous results. This contrasts with chained indexes, which can exhibit significant drift (de Haan and van der Grient 2011). Nevertheless, some of the differences between indexes are nontrivial, especially for certain products and cities, and these differences appear to relate to the choices made in index construction.

Table 3. Matrix of absolute differences in annual % change (averaged across city and product category)
First, consider the issue of window length. Indexes A-E are calculated using unweighted GEKS with windows of 6, 13, 24, 48, and the maximum possible number of months, respectively. While window lengths of 24, 48, or more months have not been suggested in the literature, these indexes provide some useful insight into the sensitivity of the results to the choice of window length. Although none of these indexes shows large deviations in average annual inflation, Table 3 reveals some significant differences in the timing of measured price change, as reflected in the absolute differences. These differences grow as the window lengths diverge. In particular, the average absolute difference in annual inflation between E and B is 1.40 percentage points. Another notable feature of larger windows is an increase in the variability of index movements: Table 4 shows that widening the window increases the volatility of both the annual and monthly log changes.
The window lengths in indexes F-J correspond to those for A-E, but these indexes use WGEKS with AMES weights. Interestingly, the deviations between the indexes are somewhat narrower than in the unweighted case. For example, G (the WGEKS index with a 13-month window) differs from J (with the maximal window) by 0.75 percentage points in terms of average absolute annual percent change. Also, while the standard deviation of the WGEKS indexes rises as the window widens, the increase is not as large as that observed for the unweighted indexes.
The results at the city-product category level show that one driver of the poor performance of the unweighted GEKS method with large windows is high product attrition over time. We illustrate this by considering two product categories: beer, which has a low attrition rate, and toilet tissue, which has a high attrition rate. Indexes B, D, E, and J for Los Angeles are shown in Figure 1, while a scatterplot of the attrition rates, based on the AMES criterion, is shown in Figure 2 (the online appendix provides a table of attrition rates by product category and city for spans of 6, 12, 24, 48, 72, and 120 months). All the indexes approximate each other closely for beer, but this is not the case for toilet tissue. The higher attrition rate has a particularly detrimental influence on index E, which uses the unweighted maximal window. On the other hand, index J, which also uses a maximal window but is AMES-weighted, performs fairly well.
Now consider the choice of linking method. Indexes K and L use a 13-month window, unweighted GEKS, and two different linking methods. As the results in Table 3 indicate, the CLS approach (used in index L) yields results similar to index B, which uses the linking procedure proposed by Ivancic, Diewert, and Fox (2011). By contrast, some differences are evident between index L and index K, which uses the approach of Krsinich (2014), though they are not particularly large. We also construct indexes M and N, which use a 13-month window, AMES weights in WGEKS, and two different linking methods. Here, too, we find that the CLS approach (index N) produces results more similar to the linking method of Ivancic, Diewert, and Fox (2011) (index G) than to that of Krsinich (2014) (index M). Another interesting feature of the linking methods is their impact on index variability. According to Table 4, the standard deviation of the annual log change is much the same for indexes B, K, and L, and also for G, M, and N. However, for K and M, which use the Krsinich (2014) approach to linking, the monthly standard deviation is relatively high. This likely reflects that method's use of annual changes as the link, which can result in more volatile monthly movements. Such a possibility was noted by de Haan (2015), and our results support this conjecture.
Given our advocacy of WGEKS, it is important to explore the role played by the choice of weights. We calculate indexes O, P, Q, and R, which use the CLS linking method, the maximal window, and the various weighting methods described above, or no weights in the case of O. Interestingly, according to Table 3, the weighted indexes P, Q, and R approximate each other very closely. However, this does not mean that the choice of weights is irrelevant, as index O, which is unweighted, differs from P, Q, and R. Rather, it is likely that the weighting schemes we have chosen, though conceptually different, yield fairly similar weights in our data.
Finally, we consider the impact of imposing the nonrevisability constraint. Index S uses WGEKS with AMES weights, but here we estimate the index for each month using all the data. This is compared with P, which uses the same approach except that it adopts the CLS linking method to ensure nonrevisability. Interestingly, there appear to be nontrivial differences between these two approaches. While P and S do not differ greatly in average inflation, the average absolute difference in the annual percent change in Table 3 is nontrivial at 0.61 percentage points. Note also in Table 4 that index S has lower variance than index P.

CONCLUSION
In summary, our results provide some support for the WGEKS approach with the CLS linking method. While the choice of linking method does not appear to be very important empirically, the CLS method is likely to be attractive on theoretical grounds. Interestingly, the imposition of the nonrevisability constraint is not entirely innocuous: while it does not appear to lead to large differences in the results, it can change the timing of measured price change. Our results indicate that the choice of window length is more significant. Larger windows lead to greater variability in the unweighted GEKS indexes than in the WGEKS indexes. These effects were fairly modest for commonly used window lengths but larger for long windows. However, using weights to determine a type of window appears to be a useful way forward, as the discussion can then focus primarily on the choice of weights rather than on window length. Moreover, given that most reasonable weighting strategies are likely to result in quite similar relative weights, as in our application, these choices are unlikely to be highly controversial, which should lead to more reproducible index numbers.

SUPPLEMENTARY MATERIALS
The online appendix contains a comprehensive listing of the results shown in Tables 2-4 for each data type. In addition, data summary statistics and product attrition rates are provided for each city in each product category.