Analysis of the Potential Market for Out-of-Print eBooks

The growth of the electronic book market has allowed publishers to make many previously out-of-print titles available, cost-effectively, in an electronic format. However, as of January 2012, there were still nearly 2.7 million out-of-print titles unavailable as eBooks. The goal of this paper is to estimate how much producer and consumer surplus could be created by making these out-of-print titles available in eBook markets. To do this, we first collect a unique dataset comprising a random sample of out-of-print titles that are, and that are not, available in eBook markets. We then use Bayesian Propensity Score Matching techniques to match books in these two samples based on their observable characteristics. Using these matched titles, we estimate that making the remaining 2.7 million out-of-print books available as eBooks could create $740 million in revenue and $860 million in consumer surplus in the first year after their debut. We also estimate that $460 million of this revenue would accrue directly to publishers and authors as profit.


I. Introduction
Although it has been 40 years since the first eBook was created by Michael S. Hart in 1971,¹ and more than 10 years since the first eBook was sold online in 1998, eBooks did not show significant market share growth until the first Kindle was introduced by Amazon in November 2007. Since then, sales of eBooks have grown rapidly, as reflected in a variety of industry statistics. For example, in 2010, sales of Kindle titles at Amazon exceeded sales of hardcover titles for the first time (Miller, 2010). Among the roughly 1.4 million titles available in the Kindle store, many represented digitized versions of titles that had been unavailable in print for some time. Our data show that, of 3,720 to-be-released Kindle titles from October 1, 2011 to December 31, 2011, 596 titles (16%) were already available in physical format (the remainder were new books released in both eBook and print formats or new books available only in eBook format).
¹ Source: http://en.wikipedia.org/wiki/eBook
² Source: http://www.idc.com/about/viewpressrelease.jsp?containerId=prUS22737611
³ The year-end AAP sales report represents data provided by 84 U.S. publishing houses.
⁴ Source: http://www.ebookreaders.org.uk/amazon-Kindle/

Digitization of catalog titles - books that have been available in print for some time - has become a new revenue source for publishers and writers, especially for independent writers. For example, novelist Barbara Freethy re-released 12 of her out-of-print titles priced at $0.99 each. The re-release was well received by the market, with her book "Don't Say A Word," which had gone out of print in 1995, climbing to No. 2 on Barnes & Noble's NOOK bestseller list in 2011 (Owen, 2011).
There are several potential reasons that previously out-of-print titles might perform well as eBooks. First, eBooks have a very different cost structure than print titles. Because of the fixed costs associated with physical printing runs, after an initial print run sells out, it only makes sense to reprint a book if the expected residual demand exceeds 500 to 1,000 copies. Books with expected demand below these levels are allowed to go out of print. This leaves a great deal of demand unmet - demand that could be filled in an electronic format, where the fixed costs of digitization are very low and the marginal costs of delivery are near zero. A second reason out-of-print titles might do well as eBooks is the increased opportunity for discovery afforded by digital marketplaces. Physical bookstores stock only 20,000-100,000 unique titles, whereas online retailers can stock as many books as are available. Add to this increased opportunities to use recommendation engines, peer reviews, and personalized advertisements, and consumers can discover a far broader selection of titles than they could in a physical storefront (Zentner, Smith, and Kaya 2012; Brynjolfsson, Hu, and Smith 2010; Kumar, Smith, and Telang 2011). Finally, it is possible that some titles will benefit disproportionately from the convenience and immediate gratification offered by electronic delivery of eBooks.
Together these arguments suggest that the eBook marketplace might give new life to previously out-of-print titles. The goal of this paper is to produce estimates of the producer and consumer surplus that could be created by bringing the world's 2.7 million out-of-print titles back into print as eBooks. To do this, we first generate a random sample of out-of-print books that are available as eBooks, and of out-of-print titles that are not available as eBooks. We then use Bayesian Propensity Score Matching techniques to match titles across these two groups based on observable characteristics. Based on these techniques, we estimate that making the world's out-of-print titles available as eBooks could create $740 million in revenue in the first year after publication, $460 million of which would accrue to publishers and authors. In addition, we estimate that making these books available would create $860 million in consumer surplus in the first year after publication.

II. Literature Review
This paper draws on several literatures, notably the marketing and information systems literatures on how electronic markets influence variety, sales, and welfare. In this context, Brynjolfsson, Hu, and Smith (2003) estimate the consumer surplus gain from access to increased product variety in online stores versus physical stores. Based on data from 2000, they find an increase of nearly $1 billion in consumer surplus from increased product variety in books alone. Brynjolfsson, Hu, and Rahman (2009) extend this result to show that there is very little competition between online and offline retailers in niche product settings, and Brynjolfsson, Hu, and Simester (2011) show how electronic marketplaces decrease consumer search costs for products relative to the search costs that would be seen in physical marketplaces.
Finally, Brynjolfsson, Hu, and Smith (2010) find that the consumer surplus gain from "long tail" markets is significantly larger in 2008 than it was in 2000.
Our paper also draws heavily on the statistical literature on Propensity Score Matching techniques. These techniques were first proposed by Rosenbaum and Rubin (1983) as a way to remove bias due to observed covariates.

III. Methodology
Our research focuses on estimating the consumer surplus gain from introducing previously out-of-print books to the eBook market. By out-of-print, we mean books that are not stocked by new-book retailers and distributors (i.e., they are potentially available only through used book markets). We operationalize this in our study by treating books that are not available directly from Amazon (even if they may be available from an Amazon marketplace seller) as out-of-print. Figure 2 provides an example of such an out-of-print title.
Using this definition, our proposed methodology relies on generating a random sample of books that are out-of-print but available in eBook format, which we refer to as "Kindle Out-Of-Print" or KOOP titles, and books that are out-of-print and not available in eBook format, which we refer to as "Non-Kindle Out-Of-Print" or NOOP titles.
In our study, we are interested in predicting the potential sales of NOOP titles if they were made available in the Kindle marketplace, which can be given as

$ATU = E[Y_1 \mid D=0] - E[Y_0 \mid D=0]$  (1)

where $ATU$ is the average treatment effect on the untreated: the effect of moving a book from being unavailable ($Y_0$) to available ($Y_1$) in the Kindle marketplace when the book was previously unavailable in the Kindle marketplace ($D=0$). Here $E[Y_1 \mid D=0]$ is the expected revenue generated by digitizing a NOOP title and $E[Y_0 \mid D=0]$ is the current Kindle sales of NOOP titles, which is by definition 0.
Unfortunately, we do not know sales or pricing information for a potential NOOP title, given that it is not yet available in the Kindle marketplace. Moreover, we cannot directly conduct an experiment that randomly chooses NOOP titles and brings them into the Kindle marketplace. However, we can observe the sales and prices of titles that have already been re-released. Thus a tentative solution to the problem of estimating (1) is to infer the sales and prices of NOOP titles from those of the KOOP titles that have been re-released, as in

$\widehat{ATU} = E[Y_1 \mid D=1] - E[Y_0 \mid D=0]$  (2)

The challenge in directly calculating the treatment effect using (2) is that one must assume there is no difference between the KOOP and NOOP samples. If this is not true, the bias from inferring the true estimate in (1) from the estimate in (2) is given by

$B = E[Y_1 \mid D=1] - E[Y_1 \mid D=0]$  (3)

The NOOP and KOOP samples are likely to differ, given that publishers may intentionally digitize titles that are more likely to be successful in the eBook market before they digitize other titles. In our study, we use Propensity Score Matching to match NOOP titles to similar KOOP titles in an effort to remove this bias. Specifically, after calculating the propensity score using observable characteristics of the books in our sample, we assume that books with the same propensity score can be seen as being randomly assigned to their respective KOOP or NOOP group.
We use the following steps to match books in our sample: (1) calculation of propensity scores using Probit/Logit models; (2) matching on the propensity score using Nearest Neighbor Matching, Stratification Matching, Caliper Matching, Mahalanobis Metric Matching, or similar methods; and (3) multivariate analysis of the matched groups. We note that Propensity Score Matching relies on two important assumptions. The first is unconfoundedness: any selection bias is due only to observed variables. The other assumption is overlap: there must be sufficient overlap between the propensity scores in both samples (NOOP and KOOP in our case) to support matching.

Sample selection of KOOP titles and NOOP titles
Our first goal is to generate random samples of KOOP and NOOP titles. To identify KOOP titles, we began with a random sample of Kindle titles and cross-matched these books with the print book page at Amazon to determine the International Standard Book Number (ISBN) of the matching print title for each book, and to determine whether the book was out of print. After removing all books that were still in print, 4,210 KOOP books remained in our sample.
We then determined the physical characteristics of these KOOP titles by searching for each ISBN in Global Books in Print (GBIP), Bing, and Amazon, as outlined below. We tracked the daily rank and price of these KOOP titles for 8 days, from November 22, 2011 to November 29, 2011, and calculated the weekly average rank and weekly average price for each title, which we subsequently use to estimate Kindle sales for these titles.
We obtained a random sample of 100,000 NOOP candidate titles by randomly selecting titles that are no longer in print from GBIP. We then dropped all titles published after 2005, titles with significant missing product information, and titles that already have Kindle editions available. This yields a sample of 7,930 NOOP titles.

Calculating Propensity Score for KOOP titles and NOOP titles
After identifying KOOP and NOOP titles, we need to match titles in the NOOP group with titles in the KOOP group. To do this, we first select the variables to be used in the Propensity Score Matching process. Brookhart et al. (2006) suggest that, in selecting Propensity Score Matching variables, researchers should include all variables that might affect the outcome, even if they are not related to the exposure; doing so decreases the variance of the estimated exposure effect without increasing bias. Following this approach, we include the following variables in our propensity score calculation: (1) Price: the list price of the print version of a title. (9) Large Publisher: an indicator variable set to one for "large publishers," where the publisher of a book is identified from the second through sixth digits of its ISBN. Table 1 and Table 2 present summary statistics for the data. These statistics show clear differences between the two groups, but also a significant amount of overlap for most variables. The differences between the summary statistics suggest a need to use Propensity Score Matching techniques to control for bias across the two samples, while the overlap suggests an opportunity for these techniques to succeed. To do this, we first calculate the propensity score for each title using the following Probit model:

$D_i^* = X_i\beta + \epsilon_i, \qquad D_i = 1 \text{ if } D_i^* > 0$

where $D_i^*$ is the latent utility underlying $D_i$, the publisher's choice to publish book $i$ in Kindle format, and $X_i$ contains the covariates listed above.
The predicted value $\hat{p}_i = \Phi(X_i\hat{\beta})$ is the propensity score. Table 3 displays the resulting coefficients for this regression. These results suggest that (not surprisingly) publishers' decisions regarding which books to bring back into print are not random, but are heavily influenced by observable book characteristics: list price, number of pages, and rank of the physical format all negatively affect the probability of an OOP title being digitized, while the number of Bing search results and the total number of titles from the publisher in our sample positively affect the probability of a title being digitized.
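To make this step concrete, here is a minimal sketch of the propensity score estimation, assuming a pandas DataFrame `books` already loaded in memory, with hypothetical column names for the covariates above and an indicator column `kindle` (1 = KOOP, 0 = NOOP):

```python
# Minimal sketch: propensity score estimation via a Probit model.
# `books` and all column names below are hypothetical assumptions.
import statsmodels.api as sm

covariates = ["list_price", "num_pages", "print_rank",
              "bing_results", "publisher_titles", "large_publisher"]

X = sm.add_constant(books[covariates])
probit = sm.Probit(books["kindle"], X).fit()   # kindle: 1 = KOOP, 0 = NOOP

# The fitted probability Phi(X'beta_hat) is each title's propensity score.
books["pscore"] = probit.predict(X)
```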
Beyond these individual coefficients, we are also interested in the distribution of propensity scores for NOOP and KOOP titles, and whether there is sufficient overlap between these distributions. Figure 4 displays the density plot of the propensity scores from the two samples. From this plot, it is clear that the distributions of propensity scores for the KOOP and NOOP groups are quite different, with KOOP titles generally having higher propensity scores, but that there is also significant overlap between the two distributions for propensity score values between 0.2 and 0.8 (see also Figure 9 for a histogram of titles in each group by propensity score value).

Calibrating Sales Rank and Sales Quantity
Before matching KOOP and NOOP titles based on these propensity scores, we first must estimate the sales (and revenue) that titles in the KOOP group receive. Unfortunately, Amazon does not publish Kindle sales on a per-title basis. It does, however, list the "sales rank" of each Kindle title, and we use techniques established in the literature to map these sales ranks to sales levels.
Specifically, prior research has shown that the relationship between Amazon sales and sales ranks approximates a Pareto distribution (Brynjolfsson, Hu, and Smith 2003; Chevalier and Goolsbee 2003), which after a log transformation is given as

$\ln(Quantity) = \alpha + \beta \ln(Rank)$  (6)

We then calibrate this relationship using data provided by a major publisher matching weekly Kindle sales to observed Kindle sales ranks. This dataset covers weekly sales and sales ranks for 713 eBook titles over 10 weeks.
In our setting it is particularly important that this relationship produces strong fits in the tail of the distribution (titles with lower sales). Our initial exploratory data analysis using (6) found that, consistent with the prior literature (Brynjolfsson, Hu, and Smith 2006), the Pareto distribution does not fit well in the tails. Because of this, we estimated a form of (6) using various polynomial rank terms, finding that a third-degree polynomial best fits our data based on BIC and $R^2$ measures:

$\ln(Quantity) = \alpha + \beta_1 \ln(Rank) + \beta_2 \ln(Rank)^2 + \beta_3 \ln(Rank)^3$  (7)

The resulting calibration estimates, and the observed sales-rank pairs, are shown in Figure 5. This figure suggests that, while we obtain reasonably good fit for observations with ranks below 200,000, the fit is not as good in the extreme tail (ranks above 200,000).
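A minimal sketch of this calibration step, assuming 1-D arrays `rank` and `sales` holding the publisher-provided weekly rank-sales pairs (the names are hypothetical, and the fitted coefficients are not the paper's Table 4 estimates):

```python
# Sketch: calibrate the cubic log-rank model (7) on publisher data.
import numpy as np
import statsmodels.api as sm

log_rank = np.log(rank)                        # rank, sales: 1-D arrays
X = sm.add_constant(np.column_stack([log_rank, log_rank**2, log_rank**3]))
fit = sm.OLS(np.log(sales), X).fit()

def predict_weekly_sales(r):
    """Map an observed Kindle sales rank to estimated weekly unit sales."""
    lr = np.log(r)
    return float(np.exp(fit.params @ np.array([1.0, lr, lr**2, lr**3])))
```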
Because of this, we complement the method outlined above with a simple experiment (first proposed by Chevalier and Goolsbee 2003) in which we order several copies of books with ranks greater than 200,000 and observe their sales ranks both before and after purchase. Specifically, we randomly selected 30 Kindle titles with ranks between 200,000 and 1,000,000. We then purchased between 1 and 3 copies of these books and tracked their sales ranks before and after the purchases. The resulting ranks are shown in Table 5, where we made our initial purchases at 2:00PM on "Day 1." This table shows that the effect of a sale did not show up in the sales rank until 6-7 hours after the initial purchase. We then use the approximate rank changes from 1, 2, or 3 purchases to estimate the decrease in rank one would see when a copy of a low-selling title is purchased.
In summary, we estimate sales from observed sales ranks as follows (see the sketch after this list):
(1) For titles with ranks below 200,000, we predict sales using the regression results shown in Table 4.
(2) For titles with ranks above 200,000, we assign sales according to the expected sales for the interval the title's rank falls into, based on the experiment described above. For example, if a title has a weekly rank of 231,221, we assign it sales of 0.838 copies per week.
(3) If a title does not have a rank, we assume it has no sales.
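A sketch of the combined assignment rule, reusing `predict_weekly_sales` from the calibration sketch above; the interval sales levels other than the 0.838 example are illustrative placeholders, not the paper's Table 5 results:

```python
# Sketch of the three-rule sales assignment described above.
experiment_intervals = [
    (200_000, 400_000, 0.838),
    (400_000, 700_000, 0.5),      # placeholder value
    (700_000, 1_000_000, 0.2),    # placeholder value
]

def weekly_sales(rank):
    if rank is None:                         # rule (3): no rank -> no sales
        return 0.0
    if rank < 200_000:                       # rule (1): regression prediction
        return predict_weekly_sales(rank)
    for lo, hi, q in experiment_intervals:   # rule (2): experiment intervals
        if lo <= rank < hi:
            return q
    return 0.0
```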

Adjusting Weekly Ranks for Sales Decay
Unfortunately, the results above only tell us sales at a particular point in time after the release of a Kindle title. They do not tell us what that title was selling initially after release. Because we are interested in estimating sales after the initial release of a title, we need to attempt to estimate the decay curve of Kindle sales over time.
To do this, we obtain the Kindle release date for each book in our sample and then attempt to estimate the rank in an arbitrary week based on the distribution of sales by week as follows:

$Rank_{it} = \theta_i + f(Week_{it}) + \epsilon_{it}$  (8)

$Rank_{i0} = \theta_i + f(0) + \epsilon_{i0}$  (9)

where $\theta_i$ is the fixed effect of title $i$ on rank and $f(\cdot)$ is a function of $Week$, the number of weeks since the title's release. We then combine (8) and (9) to get (10), which can be used to calculate $Rank_{i0}$:

$Rank_{i0} = Rank_{it} - [f(Week_{it}) - f(0)] - (\epsilon_{it} - \epsilon_{i0})$  (10)

Thus, $Rank_{i0}$ can be estimated by $Rank_{it} - [f(Week_{it}) - f(0)]$ as long as the error term is not large.
An assumption of (8) and (9) is that, aside from time, only fixed title characteristics affect rank. This is a fairly strong assumption, but it is balanced by the fact that these out-of-print titles typically receive little promotion and have very infrequent price changes.
As noted above, we do not have many books that we observe at the same number of weeks after release.
Instead, we can use another method to estimate $f$, and then use (10) to recover the rank. Specifically, our sample contains both older and newer titles that fall into different release weeks. If these titles experience similar decay, the average rank of titles observed at each week of age approximates what sales look like at a given time since a title's debut. As such, we use the following model to estimate $f$:

$Rank_{it} = X_i\gamma + g(Week_{it}) + \epsilon_{it}$  (11)

where $X_i$ are book characteristics and $g(\cdot)$ is a function of $Week$. In order to choose the actual form of (11), we first calculate the average rank for titles at each week of age and plot these averages by week. We also conducted a kernel regression of average ranks on time. The data and fitted lines are plotted in Figure 6. From Figure 6, we can see that sales seem to climb after a title's debut and then drop gradually after around 125 weeks, and that we might need to fit different models to newer and older titles separately.
Using this, we applied forward stepwise AIC for model selection, where (12) is the initial model and (13) is the full model. We tried different cut points to divide the dataset into two subsets, and then applied the forward stepwise selection process based on AIC to each subset. After some trial and error we found that a cut at week 125 works well. Models (14) and (15) fit each subset as suggested by this approach, with the estimates of model (14) shown in Table 9 and the estimates of model (15) in Table 10.
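A forward stepwise AIC search over polynomial Week terms could look like the following sketch, assuming a DataFrame `panel` with hypothetical columns `avg_rank` (average rank of titles at a given age) and `week`:

```python
# Sketch: forward stepwise AIC over candidate polynomial Week terms.
import statsmodels.formula.api as smf

candidates = ["week", "I(week**2)", "I(week**3)"]   # candidate terms

def forward_aic(panel, base="avg_rank ~ 1"):
    formula, remaining = base, list(candidates)
    best_aic = smf.ols(formula, data=panel).fit().aic
    improved = True
    while improved and remaining:
        improved = False
        # Try adding each remaining term; keep the one that lowers AIC most.
        aic, term = min((smf.ols(f"{formula} + {t}", data=panel).fit().aic, t)
                        for t in remaining)
        if aic < best_aic:
            best_aic, formula = aic, f"{formula} + {term}"
            remaining.remove(term)
            improved = True
    return formula, best_aic
```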
From (14) and (15), we find that the Week term appears only in (14), suggesting that KOOP titles do not show an obvious sign of decay until 75 weeks after their debut. We also estimated two other models on the data from this pre-decay period: one includes only the Week variable, and the other includes all variables in model (15) plus Week as independent variables. The results for these models are shown in Table 10.
These estimates show that Week is not significant in either of these models. One explanation for this "stable" period might be that these titles are obscure and rarely receive promotion; thus, the length of time it takes for consumers to become informed of a KOOP title's debut might be relatively uniformly distributed over a long period.
We adjust the ranks of all titles using the last week (week 125) of the "stable" period as the standard week. Thus, if a title has a Week value greater than 125, we do not need to adjust its rank; if a title has a Week value smaller than 125, we adjust its rank using (14), as sketched below.
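A sketch of this adjustment, assuming a fitted function `decay_model(week)` that returns the expected average rank at a given number of weeks since release, per model (14); both names are hypothetical:

```python
# Sketch: adjust an observed rank back to the standard week (week 125).
STANDARD_WEEK = 125

def adjusted_rank(observed_rank, weeks_since_release, decay_model):
    # Past the stable period: no adjustment needed.
    if weeks_since_release >= STANDARD_WEEK:
        return observed_rank
    # Shift by the expected decay between the observed week and week 125,
    # following the logic of equation (10).
    return observed_rank + (decay_model(STANDARD_WEEK)
                            - decay_model(weeks_since_release))
```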
In our estimates, we pay special attention to the 24 titles in our sample that have sales ranks lower than 20,000. To be conservative, we do not adjust their ranks. The distribution of ranks after adjustment is shown in Figure 8.

Matching Propensity Scores
Next, we attempt to exploit this overlap to match NOOP titles to KOOP titles based on their propensity scores. We first note that the limited overlap between the KOOP and NOOP samples is not a problem for propensity score values above 0.8, because our goal is to match NOOP titles to KOOP titles, and in this range there are more than enough KOOP titles relative to NOOP titles. The lack of overlap is a problem, however, for propensity score values below 0.2, since we have only 256 KOOP titles in this range (6.1% of all KOOP titles). Because of this, to be conservative we only consider books with propensity scores larger than 0.2 in our analysis, effectively treating NOOP titles with propensity scores between 0 and 0.2 as having no impact on consumer or producer surplus if they were digitized. For titles with propensity scores greater than 0.2, we attempt to match titles across groups using the two methods outlined below.

(1) Nearest Neighbor Matching (NNM)
Using the nearest neighbor matching (NNM) method, we select the KOOP title with the propensity score closest to that of the NOOP title to be matched. Since the number of KOOP titles is much smaller than the number of NOOP titles to be matched, we use NNM without replacement. In order to avoid bad matches, we also only consider pairs whose propensity scores differ by less than 0.005. A sketch of this procedure appears below.
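The sketch assumes `noop` and `koop` are lists of `(title_id, pscore)` pairs; for simplicity it processes NOOP titles in ascending propensity score order, which is one of several reasonable ordering choices for greedy matching:

```python
# Sketch: greedy nearest-neighbor matching without replacement,
# with a 0.005 caliper on the propensity score difference.
CALIPER = 0.005

def nearest_neighbor_match(noop, koop):
    available = dict(koop)            # KOOP titles not yet matched
    matches = {}
    for title, ps in sorted(noop, key=lambda pair: pair[1]):
        if not available:
            break
        best = min(available, key=lambda k: abs(available[k] - ps))
        if abs(available[best] - ps) < CALIPER:
            matches[title] = best
            del available[best]       # without replacement
    return matches
```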
To check the matching quality of this technique, we note that good propensity score matches should balance the distribution of the relevant variables across the control and treatment groups (Caliendo and Kopeinig 2008). From Table 7 and Figure 10, we can see that the distributions of variables after matching are quite similar. In addition to checking the distribution of variables in both groups, Sianesi (2004) suggests that researchers re-estimate the propensity score for both groups after matching and compare the pseudo-$R^2$ values. If the matching is good, the pseudo-$R^2$ after matching should be low.
We used this method and found that McFadden's pseudo-$R^2$ drops from 0.3566 to 0.0064 after matching.⁵ This suggests that, after matching, the variables used for matching can no longer distinguish between the two groups. Thus, the average treatment effect on the untreated group can be calculated as the average of outcomes in the matched KOOP group.
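A sketch of this balance check, reusing the `books` DataFrame and `covariates` list from the estimation sketch above, with `matched_index` as a hypothetical index selecting the matched NOOP and KOOP titles:

```python
# Sketch: re-estimate the Probit on the matched sample only and compare
# McFadden's pseudo-R^2 (statsmodels exposes it as `.prsquared`).
import statsmodels.api as sm

matched = books.loc[matched_index]            # matched NOOP + KOOP titles
X_m = sm.add_constant(matched[covariates])
refit = sm.Probit(matched["kindle"], X_m).fit()

print("pseudo-R^2 after matching:", refit.prsquared)   # ~0.0064 in the paper
```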
Using these matched samples, and the Kindle sales values estimated above, we find that the ATU for sales would be 1.945 copies per title per week, which is higher than the average weekly sales of KOOP titles with propensity scores larger than 0.2, and the ATU for revenue would be $13.74 per title per week, which is lower than the average revenue of KOOP titles with propensity scores larger than 0.2.
(2) Stratification Method

Cochran (1968) shows that five subclasses are often sufficient to remove over 90% of the bias due to a subclassifying variable or covariate. As the number of subclassifying variables increases, however, the number of subclasses needed increases exponentially (Cochran and Chambers 1965). Since the propensity score is a scalar summary of multiple covariates, stratifying on the propensity score alone into 5 subclasses is often enough to remove over 90% of the bias due to each of the covariates (Rosenbaum and Rubin 1984). To implement this approach, we take all titles with propensity scores larger than 0.2 and stratify them into 8 subgroups based on the propensity score: a title with a propensity score between 0.2 and 0.3 is assigned to stratum 1, and so on through stratum 8 (propensity scores of 0.9 to 1). The average treatment effect can then be calculated using (16) and (17).
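Since the exact forms of (16) and (17) are not reproduced here, the following sketch shows one standard reading of the stratified ATU estimator: average the outcome over KOOP titles within each stratum, then weight each stratum by its share of NOOP titles:

```python
# Sketch: stratified ATU with 8 strata over propensity scores in [0.2, 1.0].
import numpy as np

def stratified_atu(ps_noop, ps_koop, outcome_koop):
    edges = np.arange(0.2, 1.01, 0.1)         # 0.2, 0.3, ..., 1.0
    total_noop = len(ps_noop)
    atu = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        koop_in = (ps_koop >= lo) & (ps_koop < hi)
        n_noop = np.sum((ps_noop >= lo) & (ps_noop < hi))
        if koop_in.any() and n_noop:
            # Weight each stratum's mean KOOP outcome by its NOOP share.
            atu += (n_noop / total_noop) * outcome_koop[koop_in].mean()
    return atu
```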
The results using these two equations are shown in Table 12. We find that the ATU for sales using stratification matching is 1.71 copies per title per week, which is higher than the average sales of KOOP titles with propensity scores larger than 0.2, and the ATU for revenue is $12.94 per title per week, which is smaller than the average revenue of KOOP titles with propensity scores larger than 0.2.

Bayesian Propensity Score Matching (BPSM)
The conventional Propensity Score Matching methods discussed earlier do not provide confidence intervals that reflect the uncertainty in the estimated propensity scores. This variance is especially important in our study because of the skewness of sales across titles: although the number of high-selling titles is relatively small, they could excessively influence our results, particularly if some of the bestselling KOOP titles are matched multiple times to NOOP samples.
We therefore use Bayesian Propensity Score Matching, which allows us to draw multiple sets of propensity scores from their posterior distribution and repeat the matching process with each set, to estimate confidence intervals that account for propensity score uncertainty. We implement the Bayesian Propensity Score Matching approach using the two-stage method proposed by Gelman et al. (2003) and Kaplan and Chen (2011). Our specific model is the same as the Probit model used above, and we choose a diffuse prior. We run 15,000 iterations of this model with the thinning parameter set to 3 and a burn-in period of 2,000 draws, leaving 3,000 sets of propensity scores for use in our estimates. The trace plots for the model variables are shown in Figure 11, and Table 11 shows the coefficient estimates, which are quite similar to those obtained using the conventional propensity score method above.
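The paper's exact sampler follows Gelman et al. (2003) and Kaplan and Chen (2011); as an illustrative stand-in only, the following sketch draws posterior propensity score sets with a simple random-walk Metropolis sampler under a diffuse normal prior:

```python
# Illustrative stand-in: random-walk Metropolis for the Bayesian Probit.
import numpy as np
from scipy.stats import norm

def log_posterior(beta, X, d):
    p = norm.cdf(X @ beta).clip(1e-10, 1 - 1e-10)
    loglik = np.sum(d * np.log(p) + (1 - d) * np.log(1 - p))
    return loglik + norm.logpdf(beta, scale=10).sum()   # diffuse prior

def sample_pscores(X, d, n_iter=15_000, thin=3, burn=2_000, step=0.05):
    rng = np.random.default_rng(0)
    beta = np.zeros(X.shape[1])
    lp = log_posterior(beta, X, d)
    kept = []
    for i in range(n_iter):
        prop = beta + step * rng.standard_normal(beta.shape)
        lp_prop = log_posterior(prop, X, d)
        if np.log(rng.random()) < lp_prop - lp:   # Metropolis accept step
            beta, lp = prop, lp_prop
        if i % thin == 0:
            kept.append(norm.cdf(X @ beta))       # one propensity score set
    # Drop 2,000 burn-in sets, leaving ~3,000 as in the paper's setup.
    return np.array(kept[burn:])
```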
We then use the resulting 3,000 sets of propensity scores to generate matches based on both the Nearest Neighbor and Stratification methods applied above. This gives us an interval that accounts for the variation due to uncertainty in the propensity score. The resulting estimates are shown in Table 12 and summarized below: (1) The expected average sales of NOOP titles are 1.53-1.78 copies/week (25%-75% CI) using the Nearest Neighbor method and 1.61-1.74 copies/week (25%-75% CI) using the stratification method. This is much higher than the average weekly KOOP sales of 1.43 copies.
(2) The expected average revenue for a NOOP title is $12.01-13.61/week (25%-75% CI) using the Nearest Neighbor method, and $12.60-13.15/week (25%-75% CI) using the stratification method. This is much lower than the average weekly revenue for KOOP titles of $14.69.

Figure 12 shows that the probability of being digitized is strongly correlated with expected revenue. This is especially obvious for titles with very high propensity scores (0.9-1). We can see this more clearly by regressing, over all Kindle titles, the timing of digital release on the propensity score and publisher size:

$ReleaseWeek_i = \delta_0 + \delta_1 PS_i + \delta_2 LargePublisher_i + u_i$  (20)

Table 13 displays the results of this regression and shows that publishers tend to release titles with higher propensity scores earlier than other titles (negative $\delta_1$). Likewise, larger publishers enter the digital market earlier than other publishers do (negative $\delta_2$).
Before moving on, we examine the robustness of our results. To do this, we randomly draw subsamples of (1) 6,000, (2) 8,000, (3) 10,000, and (4) 12,000 titles and re-run the previous steps on each subsample to compare how the results differ. Table 14 shows the estimates using these different subsets. Although the estimated ATU varies across the random subsets, the variation is small, so we consider the estimates robust.

Welfare Analysis
In this section, we use these propensity score and Kindle sales estimates to calculate estimates of the producer and consumer surplus that could be realized by making current NOOP titles available in Kindle format. In the following analysis we do this by estimating these figures for the first year after Kindle release for a random sample of 100,000 NOOP titles with a propensity score larger than 0.2.
To evaluate the potential impact of this digitization on consumer surplus, we follow the technique developed by Hausman (1981) and applied by Brynjolfsson, Hu, and Smith (2003) and Hausman and Leonard (2002), measuring the compensating variation (CV) from introducing the new goods at their virtual prices.⁶ Following Hausman (1981) and Brynjolfsson, Hu, and Smith (2003), we assume the consumer's demand follows a Cobb-Douglas demand function. Applying Roy's identity (24) and solving this function yields expressions (25) and (26), from which it can be shown (Hausman 1981) that the compensating variation is given by (27). Further, following Brynjolfsson, Hu, and Smith (2003), if we assume zero income elasticity for books - based on the fact that books make up a relatively small proportion of overall consumer expenditures - equation (27) simplifies to

$CV_j = -\frac{p_j q_j}{1+\epsilon}$  (28)

where $p_j$ and $q_j$ are the price and initial quantity of title $j$ and $\epsilon$ is the price elasticity of demand. Assuming a constant elasticity across Kindle titles, the total CV for all titles is given by

$CV = -\frac{1}{1+\epsilon}\sum_{j=1}^{N} p_j q_j$  (29)

where N is the number of NOOP titles to be digitized.

⁶ The virtual price is defined by Hausman (1981) as the lowest price that would set demand equal to zero.
If we take the average price and initial quantity over a random sample of titles, we can further simplify (29) to

$CV \approx -\frac{N\,\bar{p}\,\bar{q}}{1+\epsilon}$  (30)

This leaves our main task as estimating price ($\bar{p}$) and sales ($\bar{q}$) for the NOOP samples in order to calculate average revenue, and then multiplying this revenue by $-\frac{1}{1+\epsilon}$ to obtain the consumer surplus from digitizing a large number of NOOP titles. We discuss this approach in more detail below.
(1) Expected Revenue

We calculated the expected average weekly revenue from digitizing Kindle titles as part of the Propensity Score Matching discussion above. Since sales over the first 75 weeks after a title's debut are relatively stable, we can simply multiply the average expected weekly revenue of one NOOP book by the number of titles to be digitized and the number of weeks. The first column of Table 15 displays the results of this calculation and shows expected revenue of $627.17 to $707.51 (25%-75% CI) per title for the first year after debut using nearest neighbor matching, and $655.20 to $683.80 (25%-75% CI) using stratification. This is lower than $763.88, the total revenue generated from the same number of randomly selected KOOP titles with propensity scores larger than 0.2, and lower than the $714.22 and $672.93 estimates that would result from the traditional estimation approach with nearest neighbor and stratification matching, respectively.

(2) Publisher Welfare
To calculate publisher welfare, we first note that, based on current Kindle sales contracts, publishers receive 70% of the marginal profit generated from Kindle sales, which is price minus delivery cost. Net of digitization costs, first-year publisher welfare (31) can then be approximated as

$W_{pub} = \left[\,0.7\,(ATU_R - c_d\,\bar{s}\,ATU_S)\times 52 - c_{scan}\right]\times N$  (32)

where $ATU_R$ is the average treatment effect using weekly revenue as the outcome, $ATU_S$ is the average treatment effect using weekly sales as the outcome, $N$ is the number of NOOP titles to be digitized (1 for the estimates below), $c_d$ is the per-megabyte delivery cost, $\bar{s}$ is the average size of NOOP titles (which we assume to be 5MB), and $c_{scan}$ is the average scanning cost (estimated at $5-$10/book, where we use $10/book to be conservative).⁸
Using the estimates for $ATU_R$ and $ATU_S$ obtained above, we estimate (see Table 14) that the average publisher welfare from digitizing one previously unavailable (NOOP) title with a propensity score above 0.2 is between $405 and $421 (25%-75% CI) using the Nearest Neighbor method and between $387 and $437 (25%-75% CI) using the stratification method.

(3) Retailer Welfare
To estimate retailer welfare, we use the fact that Amazon receives the remaining 30% of the marginal profit. Following a similar approach as in (32), retailer welfare is

$W_{ret} = 0.3\,(ATU_R - c_d\,\bar{s}\,ATU_S)\times 52 \times N$  (33)

Our results using this equation are shown in the third column of Table 14. We find that retailer welfare per title is between $170 and $191 (25%-75% CI) using the Nearest Neighbor method and between $178 and $185 (25%-75% CI) using the stratification method.
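A sketch tying the publisher and retailer welfare calculations together; the per-megabyte delivery rate `DELIVERY_PER_MB` is an assumed placeholder, since the text does not state Amazon's delivery fee:

```python
# Sketch: first-year publisher and retailer welfare per title.
WEEKS = 52
AVG_SIZE_MB = 5            # assumed average file size (from the text)
DELIVERY_PER_MB = 0.15     # assumed $/MB delivery fee (not stated in text)
SCAN_COST = 10.0           # conservative scanning cost per title

def welfare_per_title(atu_revenue, atu_sales):
    """atu_revenue in $/title/week; atu_sales in copies/title/week."""
    delivery = DELIVERY_PER_MB * AVG_SIZE_MB * atu_sales
    margin = (atu_revenue - delivery) * WEEKS   # annual margin net of delivery
    publisher = 0.7 * margin - SCAN_COST        # 70% split, minus digitization
    retailer = 0.3 * margin                     # Amazon's 30% share
    return publisher, retailer
```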

(4) Consumer Surplus
Following equation (30), consumer surplus can be calculated as

$CV = -\frac{N\,\bar{p}\,\bar{q}}{1+\epsilon}$  (35)

where all parameters are known except the price elasticity ($\epsilon$).
To calculate the price elasticity, we start from the log-log relationship between sales and price implied by our constant-elasticity demand specification and regress log weekly sales on log price. The results, shown in Table 16, suggest that the Kindle price elasticity is between -1.53 and -1.86.
We note that this is similar to the price elasticity of physical books found in previous studies (for example, Brynjolfsson, Hu, and Smith (2003) estimated print book elasticity between -1.56 and -1.79, and Ghose and Gu (2006) estimated print price elasticity between -1.49 and -1.89).
To be conservative, we use a price elasticity of -1.86 in (35), which results in a consumer surplus estimate of between $729.27 and $822.69 (25%-75% CI) per title using the Nearest Neighbor method and between $761.86 and $795.12 (25%-75% CI) using the stratification method.
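As a worked check of equation (35) using the stratification estimates:

```python
# Worked check of (35): per-title CV = -(annual revenue) / (1 + elasticity).
elasticity = -1.86
for annual_revenue in (655.20, 683.80):        # stratification, 25%-75% CI
    cv = -annual_revenue / (1 + elasticity)    # = revenue / 0.86
    print(round(cv, 2))                        # -> 761.86, 795.12
```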

IV. Discussion
As noted above, the growth of the eBook market has created a significant potential opportunity for publishers and authors to bring previously out-of-print titles back into the marketplace through electronic distribution. The goal of this paper is to attempt to generate economic estimates of the producer and consumer surplus that could be created by digitizing and selling the 2.7 million books that are currently unavailable in eBook format.
In this paper we generated these estimates by converting observed sales ranks into sales estimates for a random sample of out-of-print titles that are available on the Kindle marketplace (KOOP). We then used propensity score matching techniques to match these KOOP titles to a similar random sample of out-of-print titles that were not available on the Kindle marketplace (NOOP), and assumed that, if made available in an electronic marketplace, a NOOP title's sales would approximate the estimated sales of its matched KOOP title. Finally, we used these estimates, along with established methods for calculating the surplus generated by new goods, to estimate the consumer and producer surplus that would be generated by digitizing randomly selected NOOP titles with propensity scores above 0.2. These estimates are presented above.
With these estimates, we can then generate a total estimate of the consumer and producer surplus that could be created by digitizing all of the world's 2.7 million out-of-print titles and making them available as eBooks: we multiply the per-title estimates by 2.7 million and then scale them to account for the fact that 41.3% of our titles (40.8% to 41.8%, 25%-75% CI) have propensity scores above 0.2, the cutoff point for obtaining reliable estimates in our data.
After doing this, we find that bringing the world's 2.7 million out-of-print titles back into print as eBooks could create $740 million in revenue in the first year after publication, $460 million of which would accrue to the publishers and authors. In addition, we estimate that making these books available would create $860 million in consumer surplus in the first year after publication.
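As a worked check of how these headline figures follow from the per-title estimates (the per-title dollar values below are approximate midpoints implied by the totals, each within the reported CIs):

```python
# Worked check: scale per-title estimates to the headline totals.
n_titles = 2_700_000 * 0.413            # ~1.12M titles with pscore > 0.2
per_title = {"revenue": 663, "producer": 412, "consumer": 771}  # approx. $
for name, value in per_title.items():
    # -> 739, 459, 860 ($M), matching the rounded $740M/$460M/$860M figures
    print(name, round(n_titles * value / 1e6))
```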
However, we wish to note carefully that our methodology for obtaining these estimates has several important limitations. First, our estimates rely on the accuracy of the propensity score matching across NOOP and KOOP titles, which is based on observable book characteristics. If these observable characteristics do not adequately capture publishers' decisions about which out-of-print titles to bring into the Kindle market, our results could be biased. To check how such selection might affect our predictions, we eliminate the top 10% of bestselling KOOP titles - which one might argue are titles that were deliberately and successfully selected by publishers - and use the remaining titles to match with our NOOP samples. Under this restriction, the average weekly sales per title drop to 0.23 copies, and the average weekly revenue per title drops to $2.65. Based on this calculation, making the remaining 2.7 million out-of-print books available as eBooks could create $150 million in revenue and $177 million in consumer surplus in the first year after their debut, with $55 million of the revenue accruing directly to publishers and authors as profit.
These numbers are much lower than those obtained using all KOOP samples, but they suggest that the surplus created by digitization remains substantial even if the top-selling titles were deliberately and successfully selected by publishers. Second, lacking publicly available Kindle sales data, our estimates rely on our ability to properly map observed Kindle sales ranks to actual sales levels. While we tried to be both careful and conservative in this estimation, as noted above, the fit between sales rank and sales is relatively poor for low-selling titles - the focus of our research - and this might also bias our results. Third, we only considered sales of titles with propensity scores above 0.2 when calculating the surplus generated by releasing 2.7 million titles, which causes our estimate to understate the total surplus. A final category of limitations arises from the fact that our estimates are (of necessity) based on the current size and scope of the eBook market. Our estimates could change (and indeed would likely increase) as the penetration of eBook readers increases. Previous research shows that the cannibalization of physical book sales by the eBook version of the same title is negligible (Hu and Smith, 2011). However, we may be overestimating the true surplus if the sales of these newly digitized titles cannibalize the sales of titles that are currently available in Kindle format. In spite of these limitations, we believe that our estimates provide a useful first effort to estimate the changes in consumer surplus resulting from the introduction of new goods in this strategic market.
We also note that the method proposed in this paper could be applied by publishers to decide which of their out-of-print catalog titles to digitize first. Publishers could also adapt our proposed methods to take into account other, unobservable, book characteristics that might influence the decision to introduce books into the Kindle market.