Inferring App Demand from Publicly Available Data

With an abundance of products available online, many online retailers provide sales rankings to make it easier for consumers to find the best-selling products. Amazon successfully implemented product rankings online a decade ago, and Apple's App Store has done so more recently. However, neither market provides actual download data, a statistic that is very useful to both practitioners and researchers. In the past, researchers developed various strategies to infer demand from rank data. Almost all of that work relies either on an experiment that shifts sales or on collaboration with a vendor to obtain actual sales data. In this research, we present an innovative method that uses public data to infer the rank–demand relationship for paid apps on Apple's iTunes App Store. We find that the top-ranked paid app for iPhone generates 150 times more downloads than the paid app ranked at 200. Similarly, the top paid app on iPad generates 120 times more downloads than the paid app ranked at 200. We conclude with a discussion of extending this framework to the Android platform, in-app purchases, and free apps.


Introduction
The growth of mobile phones and smart phones over the last few years has been phenomenal. Based on recently published reports, there are about 106 million users of smart phones 2 in the United States. Globally, there are 1.1 billion active mobile subscriptions 3 with over 100,000 new smart phones being sold every quarter. 4 As more countries deploy high-speed wireless networks, users are spending an increasing amount of time on their phones. A significant reason for this growth has been attributed to the availability of mobile phone applications (apps), which are becoming ubiquitous on all mobile operating systems. In April 2012, according to Appshopper.com, there were over 787,000 apps available for the iOS platform, and 50 percent of mobile phone users used downloaded applications. Similarly, as of May 2011, the total number of apps available for the Android platform was approximately 200,000 (Barra 2011). Apple's iPhone ushered in an era where developers were able to sell their innovative applications to a large consumer base through the iTunes App Store platform. These apps cover a wide variety of domains including games, location services, productivity, and healthcare. Apple announced that more than 15 billion apps had been downloaded from its app store as of July 2011. 6 Clearly, the app market has found favor with customers. Mobile apps attract end consumers and create diverse opportunities for additional revenue for app developers, device manufacturers, and cellular service providers. More importantly, as users come to value these apps more and more, app stores have an opportunity to engender strong network externalities. Thus, these platforms lure developers, sometimes by providing deep subsidies, to write diverse applications for them. This has resulted in growth in the number of both large and small app publishing firms entering this highly dynamic market.
This increase in diversity of developers results in greater variety of software applications available to consumers (Boudreau 2012). Based on recent statistics, 7 there are 500,000 apps approved for the iOS platform developed by over 85,000 app developers.
The growth of the app market provides a great opportunity to examine important questions around software innovation, firm entry and exit strategy, software product pricing and promotion, platform leadership, and externality. However, our understanding of this market is limited due to the lack of demand data. Similar to Amazon's book market, Apple, Google, and Nokia do not provide sales information on any application. In fact, even app developers get somewhat aggregated data from these platform owners. For example, Apple may not provide a developer with separate download counts for the iPad and iPhone versions of an application. 8 Additionally, app developers themselves are reluctant to share any details on demand for competitive reasons. Thus, most individuals have access only to aggregate numbers or to rank data. For example, Apple's App Store typically provides a list of the top 200 paid apps, top 200 free apps, or top 200 highest-grossing apps. Unfortunately, having access to just an app's rank is not very useful because one cannot infer the value of an app placed at a given rank. For a developer, it is hard to determine whether the cost of moving up a few ranks through promotional activities is worth the benefits. Similarly, one cannot readily determine whether a particular niche in the app market is viable or how to set various marketing mix variables. Thus, having access to demand data is highly beneficial to both practitioners and researchers.
Fortunately, researchers were able to infer demand from rank data in the case of Amazon's book sales (Brynjolfsson et al. 2003, 2010; Chevalier and Goolsbee 2003; Chevalier and Mayzlin 2006). In these papers, rank and sales are assumed to be related via the power law (or Pareto distribution), implying that a small number of products capture a large share of the market. The typical Pareto distribution that has been estimated in extant research is

d_r = b × r^(−a)     (1)

where d_r is the sales of the product at rank r, b is the scale parameter, and a is the shape parameter.
To estimate the model parameters, we need both rank and sales information for apps. Rank information is generally available through the publishers of those ranks. But to get demand data, researchers either conducted experiments or collaborated with publishers. Chevalier and Goolsbee (2003) conducted a creative experiment to infer demand. They selected low-selling books (for which demand was known or assumed to be very small) and purchased large quantities (relative to the already low demand) of each book on Amazon. As the ranks of the books changed, they could infer the relationship between the sales rank and the experimentally induced demand. The downside of this approach is that such experiments are practical only for very low-selling books; therefore, inferring the relationship at the top ranks using this approach is not entirely accurate. In their study, Brynjolfsson et al. (2003) collaborated with a book publisher to get access to demand data to establish the sales and rank relationship. In both approaches, it is imperative to get access to demand data. However, the most important aspect of calibrating this relationship is that it paves the way for other interesting work. For example, it has been used to estimate the elasticity of substitution between new and used goods. Ghose and Sundararajan (2006) used it to study the software security product marketplace. Brynjolfsson et al. (2003) and Anderson (2004) used this to establish the long-tail phenomenon on the Internet. One can also study the dynamics of demand over time.

Shape parameter estimates from prior research (Table 1) include:
- Chevalier and Goolsbee (2003), data source Poynter (2000): 1.199
- Chevalier and Goolsbee (2003), data source Weingarten (2001): 1.05
- Chevalier and Goolsbee (2003), evidence from various experiments suggesting a value between 0.9 and 1.3: 1.2
- Brynjolfsson et al. (2003): 0.871
- Data source Weingarten (2001): 0.952
- Chevalier and Mayzlin (2006): 0.78
- Brynjolfsson et al. (2010): 0.613
In summary, the use of sales rank to compute actual sales, or its use in lieu of sales, has become common in academic research because actual sales data is unavailable.
With the increasing popularity of mobile apps, we expect that many researchers will be studying the dynamics of this market. In this paper, we provide a methodology to link rank with actual downloads for mobile apps using publicly available data. While our method is similar to methods proposed in prior studies, it differs on a few key dimensions. First, to calibrate the relationship between rank and sales, prior work needed access to demand data (either from an experiment or from a book publisher). Getting demand data is quite challenging, much more so in the app market, rendering such approaches difficult here. In our paper, we calibrate the rank–sales relationship for mobile apps using publicly available data alone; access to any demand data is not needed at all. Second, many prior studies calibrated this relationship using books with very low rank/baseline sales, which has the potential to introduce prediction inaccuracies for top-selling books. In our case, we calibrate the relationship for top-ranking apps. We believe we can therefore provide more accurate estimates for the top-selling apps (which sell disproportionately more).
In recent work, Carare (2012) examines how the past rank of an app influences future demand. Since demand data is unavailable, he provides a method that overcomes the lack of demand data to estimate the parameters. In our paper, we provide a framework to infer demand from publicly available rank information. This direct measure of demand then allows for estimation of other variables of interest; for example, Carare's method cannot recover price elasticity (see page 732).
In this paper, we will illustrate that one can calibrate the rank–sales relationship using publicly available data alone. Furthermore, to the best of our knowledge, this is the first study that tries to calibrate the relationship between app rank and sales for a mobile platform. The next two sections discuss our estimation method and data. We then present our results and provide some validation of our method. In the final section, we provide evidence of robustness and generalizability and present our conclusions.

Model
Our subject for inferring the demand will be Apple's App Store. We will discuss Google's Android store (recently renamed Google Play Store) in a later section. A key feature of app stores is that three different rank lists are publicly available: top-free applications, top-paid applications, and top-grossing applications. The top-free list shows the most-downloaded applications that have no upfront purchase price.
The top-paid list shows the most-downloaded applications that have a non-zero price. The top-grossing list ranks the applications based on revenue generation.
Like the extant literature, we assume a Pareto distribution for inferring downloads from rank. Assuming the number of downloads of an application at rank r_p in the top-paid list is d_rp, the Pareto distribution can be written as

d_rp = b_p × r_p^(−a_p)     (2)

Here b_p defines the scale factor, which depends on the total market size for iPad or iPhone apps, and a_p defines the shape of the Pareto curve.
Similarly, we define the Pareto distribution of apps in the top-grossing list, where the revenue generated by the application at rank r_g in that list can be written as the product of the price (p) and the number of downloads (d_rp) of the same application in the top-paid list. Thus we can write the distribution for the top-grossing apps as

p × d_rp = b_g × r_g^(−a_g)     (3)

Equation (3) assumes that the top-grossing apps generate their revenues from upfront pricing only. Additionally, both free and paid apps may include additional features inside the application that users may purchase. Thus, paid apps may generate some additional revenue that is not reflected in (3). However, in-app features are most common for free apps; for paid apps, the dominant source of revenue is still the upfront price. In the "Discussion" section, we will discuss how our method can be adapted to in-app purchase options.
In equations (2) and (3), we know the values of p, r_p, and r_g from publicly available data. The unknown parameters that we need to estimate are b_p, b_g, a_p, and a_g. Substituting (2) into (3) and taking logs, we can rewrite (3) as

log(r_g) = (1/a_g) × log(b_g/b_p) + (a_p/a_g) × log(r_p) − (1/a_g) × log(p)     (4)

which yields the estimable regression

log(r_g) = β_0 + β_1 × log(r_p) + β_2 × log(p) + ε_g     (5)

where the regression coefficients map to the model parameters as

β_1 = a_p/a_g     (6)
β_2 = −1/a_g     (7)
β_0 = (1/a_g) × log(b_g/b_p)     (8)

This can be estimated using a simple truncated ordinary least squares regression. For readability, we do not index r_p and d_rp for a given app i. In other words, rank and price information for an app i is treated as independent cross-sectional data even if the same app appears multiple times.
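To make the estimation procedure concrete, the following minimal sketch runs the log-log regression on synthetic data and inverts the coefficient mappings to recover the shape parameters. All ranks, prices, and parameter values here are simulated and illustrative; they are not the paper's data or Table 5 estimates.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative "true" parameters (not the paper's estimates)
a_p, a_g = 0.94, 0.90          # shape parameters, paid and grossing lists
log_bg_over_bp = 3.0           # log of the scale-parameter ratio b_g / b_p

# Synthetic paid ranks and prices for 200 apps
r_p = np.arange(1, 201)
p = rng.choice([0.99, 1.99, 2.99, 4.99], size=200)

# Grossing rank implied by the model, plus noise:
# log(r_g) = (1/a_g)*log(b_g/b_p) + (a_p/a_g)*log(r_p) - (1/a_g)*log(p) + eps
log_rg = (log_bg_over_bp / a_g + (a_p / a_g) * np.log(r_p)
          - (1.0 / a_g) * np.log(p) + rng.normal(0, 0.05, size=200))

# OLS: regress log(r_g) on a constant, log(r_p), and log(p)
X = np.column_stack([np.ones(200), np.log(r_p), np.log(p)])
beta, *_ = np.linalg.lstsq(X, log_rg, rcond=None)
b0, b1, b2 = beta

# Invert the coefficient mappings: a_g = -1/beta_2, a_p = beta_1 * a_g
a_g_hat = -1.0 / b2
a_p_hat = b1 * a_g_hat
print(round(a_p_hat, 2), round(a_g_hat, 2))
```

Because the regression is linear once logs are taken, ordinary least squares suffices; the shape parameters follow directly from the estimated coefficients.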
Notice from (8) that we can only recover the ratio of the scale parameters (b_g/b_p). Estimating the individual values of the scale parameters (b_p and b_g) requires additional information. Since actual download information for an individual app is not readily available, we use aggregate downloads in a day to recover b_p and b_g. To see this, notice that if we know the aggregate downloads (D_t) of the top N ranked apps, then

D_t = Σ_{r=1}^{N} d_r = Σ_{r=1}^{N} b_p × r^(−a_p)     (9)

Thus, with knowledge of the total number of downloads of the top-ranked apps, we can recover b_p (and, analogously from aggregate revenue, b_g) as

b_p = D_t / Σ_{r=1}^{N} r^(−a_p)     (10)

In the equation above, the shape parameter a_p is estimated from the prior equations, and the sum of individual app downloads (d_rp) over ranks defines the total downloads associated with all top-ranked apps.

Data
As we mentioned earlier, the top-paid rank list and top-grossing rank list are readily available from various websites such as Apple, Appshopper, AppAnnie, and Mobilewalla. Our data period was from April 2011 to May 2011. The information collected contained the top 200 paid and the top 200 grossing app rankings, recorded twice each day during this period for both iPad and iPhone. We also collected data on prices. It should also be noted that the presented methodology can be scaled to incorporate a ranking list of any size, as long as data is available. The summary statistics are given in Table 2.
We observed 20 different categories for apps where 38 percent were categorized as games, 13 percent as productivity, 7 percent as entertainment, 6 percent as utilities, 5 percent as photography, 5 percent as education, 4 percent as business, 3 percent as news, and 18 percent as the remaining 12 categories. A snapshot of the categories is presented in Figure 1.
In our calibration, we did not use any app characteristics other than the rank and price of an app. 9 The descriptive statistics are presented only to show the differences between apps for iPad and iPhone. For example, the average app file size is smaller for the iPhone (when compared to the iPad) and app prices are lower, suggesting the possibility of fewer graphical details, potentially due to the smaller screen size on iPhones.

Figure 3. Overlapping Apps on Top 200 Paid and Top 200 Grossing App Lists (iPad and iPhone)
We observe that, on average, 28 percent of apps overlap between iPhone and iPad lists and the average correlation between ranks is 0.46. We also found that 128 (10%) unique apps out of 1,223 have a presence in the top paid lists for both the iPad and iPhone. Similarly, 207 (13%) unique apps out of 1,638 have a presence in the top free lists for both the iPad and iPhone.
We plot the rank correlation across two different platforms in Figure 2.
For our analysis, we use apps that appear on both lists. A summary of the overlap between these two lists is provided in Table 3.
On average, 53 percent of the top 200 paid apps on iPad and 46 percent of the top 200 paid apps on iPhone are also ranked among the top 200 grossing apps. As seen in Figure 3, there is a strong correlation (average value = 0.55 on iPad and 0.49 on iPhone) between the ranks of apps on the top-paid list and the top-grossing list, which suggests that a higher rank (lower numerical value) in the paid list tends to generate larger revenue. A positive and significant estimate of β_1 suggests that an increase in the rank on the top-paid list increases the rank on the top-grossing list as well. The estimate for price (β_2) is negative and significant, suggesting that, all else equal, higher prices lead to more revenues and a lower numerical rank on the top-grossing list. 10 We are not examining the effect of price on sales (or elasticity) but are simply using price to connect the two lists. As shown in equations (6), (7), and (8), once we estimate (4), we can readily recover the shape parameters for both the top-grossing and top-paid apps. They are produced in Table 5.

Shape Parameter (a)
The estimated values suggest that most sales occur in the "head," so the distribution of app demand is top-heavy (even within the 200 top apps).
A true benefit of the shape parameter is that we can estimate the ratio of the number of downloads of two apps that are ranked differently during any given day in Apple's app store:

d_r1 / d_r2 = (r_2 / r_1)^(a_p)     (11)

and, for revenues, where g_r denotes the revenue of the app at rank r on the top-grossing list,

g_r1 / g_r2 = (r_2 / r_1)^(a_g)     (12)

The benefit of (11) and (12) is that one can compare the value of different ranks: we can infer the number of downloads or revenues for different ranks. For example, the number of downloads enjoyed by a top-ranked iPad app is 120 times higher than the app ranked at 200. Similarly, a top-ranked iPhone app gets 150 times more downloads than the app ranked at 200. Also, from the shape parameters for revenue (based on the top-grossing app list), the top-ranked app grosses 1.86 times more revenue than the second-ranked app on the iPad. This relative valuation is an important factor for firms when they are investing marketing dollars in promoting their applications. One can readily infer the benefit of moving up or down in rank relative to the money spent on promotions.
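Because these ratios depend only on the shape parameter, the headline multiples are easy to reproduce. The snippet below is a sketch using rounded shape values: 0.90 and 0.94 are rounded versions of our paid-list estimates, while the grossing-list value of 0.895 is backed out from the 1.86× figure in the text rather than taken from Table 5.

```python
# Ratio of downloads between two ranks depends only on the shape parameter:
# d_r1 / d_r2 = (r2 / r1) ** shape
def download_ratio(r1: int, r2: int, shape: float) -> float:
    """How many times more downloads rank r1 gets than rank r2."""
    return (r2 / r1) ** shape

# Rounded paid-list shape parameters
ratio_ipad = download_ratio(1, 200, 0.90)    # roughly the 120x reported for iPad
ratio_iphone = download_ratio(1, 200, 0.94)  # roughly the 150x reported for iPhone

# Revenue ratio between ranks 1 and 2 on the iPad top-grossing list,
# with a grossing shape of ~0.895 (implied by the 1.86x figure)
ratio_rev = download_ratio(1, 2, 0.895)
```

A developer weighing a promotion can apply the same function to any pair of ranks, e.g. `download_ratio(10, 20, 0.90)` for the gain from moving from rank 20 to rank 10.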

Scale Parameter (b)
Usually, the shape parameter would be all we are interested in. Estimates of the shape parameters readily allow for comparison between two ranks. Moreover, the shape parameters tend to remain more stable over time. However, we can now provide a method to estimate the scale parameter as well.
To estimate the scale parameter we need additional information. Note from equation (7) that we can only recover the ratio of scale parameters. To estimate absolute scale parameters, we would need access to actual sale volume from a vendor. However, as we showed in equations (9) and (10), even the aggregate total number of downloads is enough to recover these parameters.
Fortunately, the total number of downloads is calculated and presented by various app store analytics firms. As estimated by one such firm (Distimo), the total number of downloads per day for the top 300 paid iPad apps is approximately 110,680, and the total for the top 300 paid iPhone apps is approximately 386,545. We can readily plug these numbers into equations (9) and (10) to estimate the scale parameters. 11 Given these statistics and the coefficients from the regression above, the estimated scale parameters using equations (9) and (10) are provided in the first row of Table 6.
11 Even though we estimated the shape parameter with the top 200 apps, extrapolating demand to the top 300 apps is straightforward.
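As a sanity check, the plug-in calculation below recovers the scale parameters from Distimo's aggregate top-300 counts. The shape values used (0.90 for iPad, 0.94 for iPhone) are rounded versions of our estimates, so the outputs are approximate, but they land close to the Table 6 scale estimates (and to the roughly 13,516 and 52,958 daily downloads for the top-ranked iPad and iPhone apps reported in the Conclusion).

```python
# Recover the scale parameter b_p from aggregate downloads, per equation (10):
# D_t = sum_{r=1..N} b_p * r**(-a_p)  =>  b_p = D_t / sum_{r=1..N} r**(-a_p)
def scale_from_aggregate(total_downloads: float, shape: float, n_ranks: int) -> float:
    denom = sum(r ** (-shape) for r in range(1, n_ranks + 1))
    return total_downloads / denom

# Distimo aggregates: downloads per day for the top 300 paid apps
b_ipad = scale_from_aggregate(110_680, 0.90, 300)    # close to ~13,500/day
b_iphone = scale_from_aggregate(386_545, 0.94, 300)  # close to ~53,000/day
```

Since the scale parameter equals the fitted downloads at rank 1, these values can be read directly as the estimated daily downloads of the top-ranked app.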

Figure 4. Number of App Downloads Versus App Rank on Top Paid List (iPad and iPhone)
Thus we can now illustrate the relationship that links the total downloads of each app to any given rank. Similarly, we can specify the function that links revenues with rank. Using our estimated parameters, the download functions are approximately

d_r(iPad) ≈ 13,516 × r^(−0.90)     (13)
d_r(iPhone) ≈ 52,958 × r^(−0.94)     (14)

with analogous power functions linking revenue to rank on the top-grossing list. We expect the estimate of the scale parameter to change over time as more consumers buy apps on their mobile devices. Nonetheless, aggregate download numbers allow for the estimation of the scale parameter.
The graph in Figure 4 plots equations (13) and (14) for app sales as a function of app rank in the top-paid list for the iPad and iPhone. Given the high value of the shape parameter, the number of downloads drops sharply. An app ranked 200 on the iPad would generate about 100 downloads per day, while an app ranked 1000 would generate only about 25. Given that there are more than 200,000 apps available, it is fair to conclude that most of them generate little or no demand.

Model Validation
One challenge in testing these models is the lack of available data on downloads and app revenue. We developed three different ways to test the validity of our model.
First, recall that we estimate two different models, one for total downloads and the other for total revenue. The second model is simply price multiplied by the demand from the first model. Thus, we can cross-validate the models by estimating downloads from the first model and multiplying by the app price to get an estimate of the revenue from the second model. From Table 7, we can see that the two sets of estimated revenues are very close (confirmed by a t-test), which provides confidence in the accuracy of the computed models.
Additionally, to validate the model, we partnered with two separate app developers (who have requested to remain anonymous). Developer D1 shared data on an application, its rank, and total downloads (iPad + iPhone) for a month. Developer D2 shared similar data, but her app was available for iPhone only. Because D1's downloads aggregate iPad and iPhone, we cannot estimate the shape parameter from these data. We plot actual and predicted downloads per day for this app over the sample period, as seen in Figure 5. It is evident from the plot that the model does a good job of predicting the sales of a given app when its rank is known. This provides additional evidence regarding the robustness of our method.

Data from developer D2 is used to estimate the shape parameter. However, D2's app is a low-ranked app (average rank is 350). Typically Apple does not publish this rank, but the developer worked with an app analytics firm that claimed to have this information. The data spanned 5 months, from January 2011 to May 2011. The average number of downloads during this period was 301 (std. dev. = 13). Using this data (N = 55), 12 we estimated the shape parameter to be 0.89. This is consistent with our estimate of 0.94, despite the fact that the app was in the tail while our model was estimated using top 200 data. Thus, the actual data from two separate apps validates our method in general.

Finally, Distimo also shared the average daily revenue aggregated over the top 200 apps in the top-grossing list: approximately $632,158 for iPad and $1,014,371 for iPhone. Using these numbers and the approach suggested in equation (9), we estimated b_g as $80,538 for the iPad and $120,375 for the iPhone. These scale parameters are close to the estimates we provided in Table 6, further validating our approach to infer app demand from publicly available data.

12 The rank data was available only for a few days during each month.

Discussion
So far we have presented a methodology to infer the functional form of demand for Apple's App Store (iPad and iPhone). We believe that this approach is portable to any size of ranked lists and any platform as long as the rank data from multiple lists is available publicly. This is an important contribution as it opens doors for both researchers and practitioners to investigate interesting research and marketing investment questions for mobile platforms. Since the Android platform is another fast-growing mobile platform and in-app purchases are gaining momentum, the following discussion shows how our framework can be ported. We also discuss how our approach can be extended to the free app rank list.

Android Platform
Although the Google Android store provides crude information on the range of total lifetime downloads of an application, it does not provide any meaningful periodic demand numbers. However, similar to Apple, Google does provide both a top-paid app list and a top-grossing app list. Therefore, we can easily port our method to develop the rank–demand relationship for the Android app store. We collected similar data for one week in April 2012 and re-estimated the shape parameters, as shown in Table 8.
We find that the shape parameter for the paid list (a_p) is similar to the one estimated for the iOS platforms, but the shape parameter for revenue (a_g) is larger. This suggests that the revenue generated by apps on the top-grossing list is more skewed on the Android platform.

In-App Purchase (IAP)
Apple's iOS and Google's Android platforms allow for in-app purchases of content, functionality, services, or subscriptions, 13 allowing app developers to generate additional revenue from either paid apps or free apps. In-app purchases allow consumers to buy these additional features after exploring the capabilities and benefits of an app.
Notice that our analysis depends on tying the paid list to the grossing list via price to generate our estimates. However, if in-app purchase options become a major source of revenue, we need to modify our estimates. In what follows, we explore this option in detail. To account for in-app purchases, we rewrite equation (3) as

(p + θ × I_IAP) × d_rp = b_g × r_g^(−a_g)

Here I_IAP is an indicator variable identifying the availability of an in-app purchase option. If there is no in-app purchase available (I_IAP = 0), our analysis reduces to what we did earlier.
If in-app purchases are available (I IAP = 1), θ determines the revenue generated from in-app purchases. In short, all else equal, if an app generates large revenues from in-app purchases, its position on the top grossing list will move relative to the top paid list. Since we know which apps have in-app purchase options, it allows for ready identification of θ.
Taking logs of both sides reduces the regression equation to

log(r_g) = β_0 + β_1 × log(r_p) + β_2 × log(p + θ × I_IAP) + ε_g     (15)

Estimating the parameters using a nonlinear regression, we find that the shape parameters are consistent (a_p_iPad = 0.90, a_p_iPhone = 0.939). Thus, the addition of in-app purchase options does not change the slope parameter of our estimated distribution. We estimate θ = 0.16 (p-value < 0.03), which suggests that, on average, in-app purchase revenue is approximately 16 cents per download of an app (about 7 percent of revenue). Thus, our method allows us to calibrate the rank–sales relationship readily, even in the presence of in-app purchase options.
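Equation (15) is nonlinear only in θ, so one simple estimation strategy is to profile θ out: run OLS at each candidate value of θ on a grid and keep the value that minimizes the residual sum of squares. The sketch below demonstrates this on synthetic data with known parameters; it is an illustrative alternative to the paper's nonlinear regression, and all data and parameter values are simulated.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data generated from equation (15) with known parameters
n = 400
a_p, a_g, theta_true = 0.94, 0.90, 0.16
r_p = np.arange(1, n + 1)
price = rng.choice([0.99, 1.99, 2.99], size=n)
has_iap = rng.integers(0, 2, size=n)  # I_IAP indicator (0 or 1)

log_rg = ((a_p / a_g) * np.log(r_p)
          - (1.0 / a_g) * np.log(price + theta_true * has_iap)
          + 2.0 / a_g  # arbitrary intercept, log(b_g/b_p)/a_g
          + rng.normal(0, 0.05, size=n))

# Grid search over theta: at each candidate, the model is linear, so fit OLS
best_theta, best_sse = None, np.inf
for theta in np.arange(0.0, 0.5, 0.01):
    X = np.column_stack([np.ones(n), np.log(r_p),
                         np.log(price + theta * has_iap)])
    beta, *_ = np.linalg.lstsq(X, log_rg, rcond=None)
    sse = np.sum((log_rg - X @ beta) ** 2)
    if sse < best_sse:
        best_theta, best_sse = theta, sse

print(best_theta)  # should land near the true value of 0.16
```

Because we observe which apps offer in-app purchases, θ is identified by the differing rank-price relationship between apps with and without the option.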
Of course, our model's accuracy will decrease if every paid app has an in-app purchase option and generates unequal revenues from in-app purchases for unobserved reasons. Even then, if we have some observables that can predict in-app revenues, we can readily use this method. 14 In short, our method can be readily modified to accommodate the in-app purchase option.

Free Apps
Our approach so far has focused on demand estimation for the paid apps that are placed on the top 200 bestselling list and overlap with the top-grossing app list. Another extension of this work is to estimate the demand for free apps, where the overlap with the top-grossing list is purely because of in-app purchases. Since free apps attract over 10 times the volume of downloads, it is even more lucrative for app developers to infer the extent of revenue generated using in-app purchases for free apps. We briefly discuss how our method can be applied to free apps.

13 https://developer.apple.com/news/pdf/in_app_purchase.pdf
14 Generally, we would expect that, in the future, apps on the top-paid list will also generate more revenues from in-app purchases. This positive correlation will keep their position on the top-grossing list intact, allowing our method to go through readily.
The key difference between paid and free apps is price, so the revenues for free apps come entirely from in-app purchases:

θ × d_rF = b_g × r_g^(−a_g)

where d_rF = b_F × r_F^(−a_F) is the number of downloads of the free app at rank r_F in the top-free list. Using the overlap between the top-grossing apps and the top-free apps, our estimable form, similar to equation (5), reduces to

log(r_g) = β_0 + β_1 × log(r_F) + ε_g

where the estimated slope relates to the model parameters as β_1 = a_F/a_g. Recall that our grossing parameter a_g is already estimated from the paid list. Therefore, we can readily recover a_F from the estimate of β_1. 15

In our estimation, we treat each data point as an independent observation even if the same app appears multiple times. This is because the rank of an app is driven by its sales alone. If that were not the case and some other factors affected rank, then it is possible that a higher-ranked app might have lower downloads. In that case, we can rely on fixed-effect models, which assume that each app is separate and unique and has its own intercept. Our method can readily be used if Apple were to start a ranking scheme that does not rely entirely on demand alone. 16

Conclusion
We have used publicly available rank data and presented a methodology to estimate product sales from the rankings of apps listed in the top-200 lists on Apple's App Store for both iPhone and iPad. From our analysis, we find that the number of downloads enjoyed by an iPad app ranked first on the top-paid list is 120 times the number of downloads for the app ranked 200. Similarly, an iPhone app ranked first gets 150 times more downloads than the app ranked at 200. We also show that the iPhone app ranked first on the top-grossing list earns 95 times more revenue than the app ranked 200; the corresponding number for the iPad App Store is 110. Thus our model allows for comparison between any two ranked apps.
We also provide a method to estimate the scale parameter from aggregate data. Thus, we show that the top-ranked app on the iPad is downloaded 13,516 times per day and the top-ranked app on the iPhone is downloaded 52,958 times per day (April–May 2011). These estimates should help app developers and marketing professionals guide their marketing efforts.
We have validated our model in different ways, including gathering data from two separate app developers. We also consider various extensions. For example, we extend our method to Google's Android platform to calibrate a similar relationship. We consider the possibility of how in-app purchase options could affect our estimates. We show how our method can be readily adapted to account for the expected growth of in-app purchase options in the future. We also show how our method can be extended to the free app rank list.
Most importantly, we believe that inferring demand data from rank is highly valuable for researchers. It will open doors for more exciting and interesting research that has not been possible due to the absence of reasonable demand estimates. Mobile apps are an important and fast-growing technology market. Understanding this market and the opportunities it offers is important for different stakeholders. We believe our research method and results have many important implications and hence make a very useful contribution not only to the academic literature, but also to managers and entrepreneurs. We also believe our methods can generalize to any platform that provides top-paid and top-grossing rankings. Top mobile platforms indeed provide such rankings.

15 We estimate a_F = 0.45 and 0.62, respectively, for iPhone and iPad from our data. These estimates are lower than for paid apps, suggesting that the curve for free apps is not as skewed as for paid apps.
16 If we used the fixed-effect approach, our estimates of the shape parameter increase to 1.33 for iPhone and 1.64 for iPad. They are significantly higher than the estimates without fixed effects or the shape parameter estimated from the actual sales data from developer D2.
Our research can be improved along many different dimensions. Our dataset is limited to the top 200 apps; having access to a larger number of ranked apps (say, the top 1,000) should provide a better fit. Similarly, while we verified our estimates with data from only two app providers, there are significant opportunities to expand this data collection across more apps and categories. Also, we assumed that both download- and revenue-based rankings follow a power law, but alternative and more precise distributions could be developed if more data were available from app producers. We hope future research will extend and refine our methods.

Copyright of MIS Quarterly is the property of MIS Quarterly & The Society for Information Management, and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use.