Change in Copyright Law as a Market Intervention to Realize the Welfare Potential of Text Mining in Scientific Research

As text mining technologies advance, it becomes easier to analyse and search through an ever-growing amount of literature and to semantically integrate literary works within large corpora of literature as well as with the emerging global web of linked data. It becomes possible to use and reuse literary works in more efficient and productive ways. Under these new conditions and possibilities the existing justification of copyright law is called into question and a reassessment of the law’s efficiency is required. Copyright law is justified by the economic rationale of a dynamic incentive for production of knowledge by means of temporary property rights and monopoly rents for rights holders. The objective of this paper is to examine the current welfare economics-based justification of copyright law with regard to text mining technology uses in scientific research. The paper proposes possible changes to copyright law to address the conflict between the current copyright law and the opportunity of increased productivity. The paper combines micro-economic argumentation of the “Economic analysis of law” with growth theory and seeks to illustrate a public welfare analysis for the specific constellation of literary works information goods markets, researchers and research organisations as productive users of works, and research text mining services. After introducing public welfare analysis and its application to questions of market failure and information externalities, the paper describes the technical possibilities, benefits and conditions of text mining as well as relevant market players. It then explains the economics of copyright law and how copyright currently applies to text mining uses. The economic methodology is then applied to the legal situation and the possibilities to realize the potential of text mining in scientific research.
The paper also takes copyright’s competition objective and the coupling/tying of literary works markets and text mining services into account, considers network externalities and international trade, and draws analogies to other markets with similar challenges. The findings of the paper are presented mainly for national-level public welfare analysis, but can also be applied at the regional level. Applied to the economic region of the European Union (EU), the conclusion of this article is: A mandatory copyright limitation exempting all research/scientific-purpose text mining uses, including uses with commercial purpose, from copyright should be implemented in EU copyright law, combined with the requirements that anti-circumvention law and contract law must not prevent executing such a limitation and that “legal access”, defined as technical access to literary works, is a sufficient condition to exercise this limitation, because this is the best way to maximise public welfare in the EU.
NB: A public consultation on the review of EU copyright law, which also addresses text mining, is currently underway (extended deadline for submissions: 5 March 2014).


Introduction
Text mining technologies have matured and as a form of computational research today enable more productive scientific research and in many disciplines have become indispensable to keep up with the rising surge of literature produced. Text mining technologies are often applied to copyrighted works.
The extent to which researchers and text mining service providers facilitating those researchers can do so without copyright holders' permission and associated transaction costs has become a factor of national competitiveness. Therefore, highly developed countries are broadening copyright limitations in their national legislation and EU member states are pushing for more flexibility and fewer constraints on copyright limitations in EU copyright law. Various proposals to achieve this have been made, most notably proposals of a designated copyright limitation applicable to text mining uses with scientific purpose and application of fair use. In this context, provisions to prevent the override of copyright limitations through contracts and technical protection measures (TPMs) as well as definitions of "legal access" to copyrighted works as a criterion for execution of a copyright limitation have been proposed. Copyright legislation and its changes are forms of government market intervention and subject to public welfare assessment following the overall objective to maximise national public welfare and increase national competitiveness.
In this article a welfare economics approach to copyright law is employed to explore the effects a change in copyright law with regard to text mining would have on the literary works information goods markets, on researchers and research organisations as text mining users, on the text mining service market and on public welfare as a whole.
First, the overriding public welfare objective will be described, detailing how intervention measures are appraised and how such appraisals take positive information externalities into account. Second, research text mining and its benefits are presented, followed by an introduction to the basics of copyright law and how it currently applies to text mining uses. Further, copyright law's intervention rationale and how copyright law affects production and use of literary works will be explored. The 'non-expressive use' copyright limitation will be explained as a broader concept comprising text mining uses.
Finally, the focus will shift to copyright's competition objectives and how those objectives play into the market power relationship between non-expressive use-based services and the copyright-based literary works market, concluding with an illustration of public welfare analysis concerning two options: exempting research text mining for non-commercial uses only and exempting all research text mining uses. The result of the illustration, that exempting all research text mining uses leads to the highest level of public welfare, can be assumed to apply to most developed countries as well as to the larger economic area of the EU.
The article therefore follows the same direction as Reichman & Okediji's (2012) approach of incremental change, i.e. change within the logic of property law: Copyright law's capacity to impede the use of the cumulative literature and data that "digitally integrated scientific research methods massively ingest poses a threat to basic scientific research methods today… the task is to reconcile the historical values of intellectual property law with the modalities of a digital age, in order to reinforce the needs of scientific investigators operating under twenty-first century conditions, and to stimulate maximum public welfare payoffs from their new technological tools."

Public welfare and government intervention

Public welfare objectives
Copyright law is often viewed in terms of its intellectual property (IP) aspect and as it applies to individual self-contained creative works. In the context of text mining, individual self-contained works become of secondary importance and copyright's public welfare function deserves more attention.
To this end it is worthwhile to look at what public welfare objectives are and how public welfare can be measured.
A market economy is a social system in which laws grant entitlements, such as property rights, to private stakeholders and protect those entitlements. Public welfare is a measurement describing the total economic and social welfare of a society. The overall policy objective of governments in most developed countries is to maximise public welfare.
Public welfare objectives work toward this overriding aim and consist of market price-based and distributional objectives as well as social and non-economic objectives. Despite the broad range of objectives under the umbrella of public welfare objectives, market price-based objectives can be considered the most important ones and the aim is to reach market efficiency which represents maximum public welfare contribution from markets.
However, "there are many obstacles that prevent a community's resources from being distributed among different uses or occupations in the most effective way. The study of these … seeks to bring into clearer light some of the ways in which it now is, or eventually may become, feasible for governments to control the play of economic forces in such wise as to promote the economic welfare, and, through that, the total welfare, of their citizens as a whole" (Pigou 1932). In this vein, economic efficiency has been defined "in terms of allocative efficiency which maximises social welfare (i.e. maximising the sum of individuals' utility)" (PWC 2011). It is described also as market efficiency, optimal resource allocation, resource use efficiency or market equilibrium. Circumstances which prevent economic efficiency are described as market failure.
The market failure approach is based on a few assumptions to abstract from and reduce complexity in its application. One assumption is that resources are scarce. Because resources are scarce, the problem of how best to allocate resources to achieve maximum public welfare exists in the first place. Another assumption is that market participants act rationally out of self-interest and make decisions based on complete information. It is also assumed that all market participants have a limited budget. Therefore, acting rationally and out of self-interest implies that market participants compare the costs and benefits of the different options at their disposal and choose the option which achieves the most utility and benefits for them. Consumers choose between goods to buy aiming for the highest utility per price unit and producers choose between options of what goods to produce aiming for the most benefits in the form of profits. Choices between goods are also determined by how substitutable different goods are for each other.
Further to this, the market failure concept is based on the ideal of a fully competitive market where there is no price control and prices perfectly match consumer preferences and demand on the one side and producer preferences and goods on offer on the other side. Competition reveals market demands and induces producers to meet these demands by operating in the most productive way. Depending on goods' substitutability consumers can switch to another option of which good to buy, inducing competition among producers. Competition thereby increases overall productivity, i.e. with the same resource input more and better goods can be produced.
Entitlements are traded (transactions) and in a perfect market prices serve as a signal to direct resources to their highest-valued uses. "No perfectly informed and rationally behaving homo economicus will engage in a transaction, if his situation would deteriorate by it. Therefore, transactions will only take place in cases where they enhance the individual welfare of all parties to the transaction. Because those who have the highest value for a good, are willing to pay the highest price and the seller will only sell for a price that is higher than his own valuation, the good will end up with the actor who values it the most and with whom it will therefore contribute the most to social welfare" (Koelman 2004).
Market failure occurs and maximum public welfare is prevented when another conceivable outcome exists where one market participant can become better off without someone else becoming worse off.
Market failure can occur for a number of reasons such as the nature of the good in question, market externalities, the nature of transactions and information asymmetries as well as market structure and market power. When market failure is found to occur government intervention may be justified.
However, market failure does not necessarily lead to government intervention, and social and non-economic aims can also justify government intervention.
Under the 'law and economics' approach, legislation is one type of government intervention: "the desirability of a law is not established by its 'justice', but instead by its 'efficiency'. That is to say, a 'good' rule is one that results in 'maximum social welfare'" (Koelman 2004). Copyright law and its changes are government market interventions following this approach. They are property right law interventions which, in contrast to other property right laws, exist and can change at the legislature's discretion. For copyright law as a market intervention, market efficiency as well as social and non-economic objectives are applicable.
Market externalities (also called market external effects or spillovers) can be a type of market failure and are the main justification for copyright law. Therefore, they deserve extra attention. The term "external" in this context refers to the ideal market model where all benefits and costs from a market transaction are captured in the price paid in the transactions, i.e. they are internalised in the market.
As a consequence, a saleable good with a negative externality, whose price omits the costs borne by third parties, would be under-priced, leading to overproduction and overconsumption. Too many resources are spent in this market which could be used in other areas of society to better value for society as a whole. For example, the pollution resulting from the production of electricity by burning coal negatively affects people living in the vicinity of the power plant, but those negative effects are not 'priced in' when the electricity is sold.
The third party, the people living in the vicinity of the plant, suffers but is not part of the transaction between the producer and consumer of the electricity.
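The overproduction argument can be made concrete with a minimal numerical sketch. It assumes a linear inverse demand curve, a constant private marginal cost and a constant external cost per unit; the function name and all figures are hypothetical and are not taken from the paper.

```python
def negative_externality(a, b, private_cost, external_cost):
    """Compare the market outcome with the social optimum.

    Assumes inverse demand P(q) = a - b*q and constant marginal costs
    (both are simplifying assumptions made for this illustration).
    """
    # The market ignores the external cost: price equals private marginal cost.
    market_quantity = (a - private_cost) / b
    # The social optimum also prices in the cost borne by third parties.
    optimal_quantity = (a - private_cost - external_cost) / b
    overproduction = market_quantity - optimal_quantity
    # Welfare lost on units whose full social cost exceeds what buyers would pay
    # (the area of a triangle with base `overproduction` and height `external_cost`).
    welfare_loss = 0.5 * external_cost * overproduction
    return market_quantity, optimal_quantity, overproduction, welfare_loss


# Illustrative figures only.
q_market, q_optimal, excess, loss = negative_externality(
    a=100.0, b=1.0, private_cost=20.0, external_cost=10.0
)
print(q_market, q_optimal, excess, loss)
```

With an external cost of zero the two quantities coincide and the welfare loss disappears, which is the ideal market case in which all costs are internalised in the price.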
Examples of goods with positive externalities are solar panels, which reduce pollution; vocational training, which improves employees' skills but whose benefits may lie too far in the future to encourage an employer to invest in it; and education and research in general, which are likely to have positive effects that are barely conceivable at the time they take place.
Accordingly, external effects are benefits and costs which are not captured in the market price but are borne by others than the parties involved in the transaction. A slightly broader definition would be: "externalities are benefits (costs) realized by one person as a result of another person's activity without payment (compensation). Externalities generally are not fully factored into a person's decision to engage in the activity." (Frischmann & Lemley 2006) Negative externalities are typically associated with environmental damage for which the producer of a good is not held to account, but social degradation indicated by crime and erosion of families is also an example.
Positive externalities are also associated with knowledge and infrastructure.
Public welfare is constituted not only by goods' private value for individuals acting as market participants, but by the social value of goods, i.e. what benefits (or costs) a good brings to individuals irrespective of their market transactions and to the society as a whole.
"Many public policy debates over legal and economic issues boil down to a debate over which types of externality-producing activities to be concerned with and the extent to which institutions should be designed to regulate some and promote other externality-producing activities" (Frischmann & Lemley 2006). In this sense, to achieve maximum public welfare and make the most appropriate decision concerning intervention measures, the market price-based data is only part of what needs to be taken into account. Estimates of market externalities and the social value of goods are similarly important.

Social cost-benefit analysis
As a government intervention a change in copyright law would be subject to public welfare analysis of which a social cost-benefit analysis is the pivotal element. Following market objectives and nonmarket objectives and by estimating market data as well as externalities government bodies including EU-level institutions (e.g. EC 2008) use welfare economics to evaluate market interventions retrospectively as well as to appraise the anticipated impacts of contemplated market interventions on public welfare.
Welfare economics uses techniques of microeconomics and aggregates them to make macroeconomic conclusions, asking "in what way will a change in the law influence the decisions of market participants and how will their decisions affect social welfare?" (Koelman 2004).
To do this the public welfare of the current state of society as well as the public welfare of the alternative conceivable state of society as it would result from a specific proposed intervention is estimated to analyse the impact of the intervention.
For each of those two options the public welfare is calculated by analysing the respective option's intervention-related social benefits and social costs, weighing the total benefits achieved against the total costs incurred. Unrelated costs and benefits are assumed unchanged. The public welfare value of each state is time adjusted reflecting the change of money values over time and risk adjusted reflecting the uncertainties of the value estimations. Irrevocable costs (sunk costs) are not taken into account, but opportunity costs of continuing to tie up resources are (UK Government 2011).
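The calculation described above can be sketched in a few lines. This is a minimal illustration, assuming a constant discount rate for the time adjustment and a single hypothetical `confidence` factor that scales benefits down as a crude risk adjustment; the function names, the adjustment method and all figures are assumptions rather than the paper's method.

```python
def present_value(flows, discount_rate):
    """Discount per-period net values (period 0 first) to today's money."""
    return sum(v / (1 + discount_rate) ** t for t, v in enumerate(flows))


def option_welfare(benefits, costs, discount_rate, confidence=1.0):
    """Time- and risk-adjusted public welfare of one option.

    benefits, costs: per-period intervention-related social benefits and costs.
                     Sunk costs are simply left out of `costs`; opportunity costs
                     of tied-up resources belong in it.
    confidence:      hypothetical factor in [0, 1] that scales benefits down
                     to reflect uncertainty in the estimates.
    """
    net_flows = [confidence * b - c for b, c in zip(benefits, costs)]
    return present_value(net_flows, discount_rate)


# Illustrative figures only (three periods, 3.5% discount rate).
current = option_welfare([100, 100, 100], [40, 40, 40], discount_rate=0.035)
alternative = option_welfare(
    [110, 130, 150], [60, 45, 45], discount_rate=0.035, confidence=0.9
)
print(round(current, 2), round(alternative, 2))
```

Costs and benefits unrelated to the intervention are identical in both options and therefore do not need to appear in either list.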

The results for both options can be compared by combining them in a specific type of cost-benefit analysis. By combining both results the public welfare of the alternative option appears as opportunity costs in the combined cost-benefit analysis.
Therefore the net public welfare can be a net cost (the public welfare of the alternative state as opportunity costs outweighs the public welfare of the current state) or accordingly, a net benefit. If the opportunity costs prevail in the calculation the intervention would be justified as it would reshape market conditions toward better outcomes and increased public welfare; the society would be better off with the intervention.
With such a public welfare cost-benefit analysis also the net value of the government intervention itself can be quantified in monetary terms. For example, if the result of the analysis is that an intervention would increase public welfare then its net value (or net impact) would be the absolute value of the difference between the two options' respective public welfare. The net value of an intervention is the most important result of social cost-benefit analysis although it doesn't need to be the only one. Other measurement categories can be used to appraise and compare intervention options.
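As a rough sketch of this comparison, the function below treats the public welfare of the alternative state as an opportunity cost of staying with the current state. The function name, the dictionary keys and the figures are hypothetical.

```python
def appraise_intervention(welfare_current, welfare_alternative):
    """Combined cost-benefit view of the current state versus the alternative."""
    # The alternative's welfare enters as the opportunity cost of the current state.
    net_public_welfare = welfare_current - welfare_alternative
    return {
        "net_public_welfare": net_public_welfare,
        # A net cost (negative value) means the opportunity costs prevail.
        "intervention_justified": net_public_welfare < 0,
        # Net value of the intervention: absolute difference between the two states.
        "net_value_of_intervention": abs(net_public_welfare),
    }


# Illustrative figures only.
result = appraise_intervention(welfare_current=170.0, welfare_alternative=195.0)
print(result)
```

In this example the opportunity costs outweigh the welfare of the current state, so the intervention would be considered justified with a net value of 25.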
Benefit-cost ratio or cost-effectiveness analysis expresses the ratio between total costs and the net value of the intervention rather than its absolute net value. The economic internal rate of return can be used to express the net value of an intervention option as it would spread over a set number of periods of time as a rate of return over all periods, i.e. as a discount rate that would result in a zero net value.
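Both measures can be sketched briefly. The example assumes one common convention for the benefit-cost ratio (discounted benefits divided by discounted costs) and finds the internal rate of return by bisection, which presumes that the net value changes sign exactly once within the search interval. Function names and figures are hypothetical.

```python
def npv(flows, rate):
    """Net present value of per-period flows (period 0 first)."""
    return sum(v / (1 + rate) ** t for t, v in enumerate(flows))


def benefit_cost_ratio(benefits, costs, rate):
    """Discounted benefits divided by discounted costs (assumed convention)."""
    return npv(benefits, rate) / npv(costs, rate)


def internal_rate_of_return(flows, low=0.0, high=1.0, tol=1e-9):
    """Discount rate at which the net value is zero, found by bisection."""
    if npv(flows, low) * npv(flows, high) > 0:
        raise ValueError("no sign change of the net value in the search interval")
    while high - low > tol:
        mid = (low + high) / 2
        if npv(flows, low) * npv(flows, mid) <= 0:
            high = mid  # the root lies in the lower half
        else:
            low = mid   # the root lies in the upper half
    return (low + high) / 2


# Illustrative figures only: spend 100 now, gain 110 one period later.
irr = internal_rate_of_return([-100.0, 110.0])
bcr = benefit_cost_ratio([0.0, 110.0], [100.0, 0.0], rate=irr)
print(round(irr, 4), round(bcr, 4))
```

At the internal rate of return the discounted benefits equal the discounted costs, so the benefit-cost ratio computed at that rate is one.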
Multi-criteria analysis is used to select alternatives according to a set of different criteria in contrast to cost-benefit analysis which focuses on the unique criterion of maximisation of social welfare (EC 2008). Economic impact assessments are also conducted to take into consideration sector, region or country specific socio-economic impacts of an intervention project (EC 2008). In principle all these assessment methods can be used for decisions on change in any economic entity not just on national level, but also on a more local or regional/supranational level.

Price formation and primarily affected market
While governments can intervene for a number of reasons, market price-based data often provides the first point of reference for measuring public welfare of different intervention options. Often the intervention impact can be located in one market which is primarily affected. For change in copyright law concerning text mining literary works the primarily affected market is the copyright-based literary works market.
Market prices usually reflect the best alternative uses that the goods could be put to (UK Government 2011). The price formation model, a technique of microeconomics, is used to analyse changes of welfare contribution from markets affected by an intervention, i.e. techniques of microeconomics are aggregated to make macroeconomic conclusions. Conventionally, welfare contribution from a specific market is defined as the sum of the producer surplus and consumer surplus for all goods produced in that market. If public welfare were constituted by market aspects only, then public welfare would be the sum of the producer surplus and consumer surplus for all goods produced in a society. Producer surplus is the quantity of the good sold multiplied by the price (i.e. the revenue for the producer) minus the marginal production costs of the good. Fixed costs are sunk costs. Transaction costs inherent in a particular market are often borne partially by producers and consumers and would need to be subtracted from the respective surplus.
The consumer surplus is typically determined as the price consumers would be willing to pay minus the formed price they actually pay.
The "deadweight loss" is potential surplus not realized as people who value the good more than its marginal cost are not able to buy it. The idea of the price formation model is that the quantity consumed at the formed price in the context of the wider market environment represents optimal resource allocation and maximum public welfare contribution from that market.
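The three quantities can be illustrated with a small sketch. It assumes a linear inverse demand curve and a constant marginal cost, and ignores fixed and transaction costs; the names and figures are hypothetical and serve only to show how the surpluses and the deadweight loss relate to each other.

```python
from typing import NamedTuple


class MarketWelfare(NamedTuple):
    quantity: float
    consumer_surplus: float
    producer_surplus: float
    deadweight_loss: float


def market_welfare(a, b, marginal_cost, price):
    """Surpluses at a given price, assuming inverse demand P(q) = a - b*q."""
    if not marginal_cost <= price <= a:
        raise ValueError("price must lie between marginal cost and the choke price a")
    quantity = (a - price) / b
    efficient_quantity = (a - marginal_cost) / b
    # Willingness to pay above the price, summed over all buyers (a triangle).
    consumer_surplus = 0.5 * (a - price) * quantity
    # Revenue minus marginal production costs of the units sold.
    producer_surplus = (price - marginal_cost) * quantity
    # Surplus not realized on units valued above marginal cost but not bought.
    deadweight_loss = 0.5 * (price - marginal_cost) * (efficient_quantity - quantity)
    return MarketWelfare(quantity, consumer_surplus, producer_surplus, deadweight_loss)


# Illustrative figures only: a price well above marginal cost.
w = market_welfare(a=100.0, b=1.0, marginal_cost=20.0, price=60.0)
print(w)
```

Under these assumptions the three areas always add up to the surplus available when the price equals marginal cost, so any deadweight loss is welfare contribution the market fails to deliver.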
The most common economic indicator is the gross domestic product (GDP) or its variant GNI (gross national income) presenting the sum of the producer surplus for all goods produced in a society and thereby also indicating economic growth over time. While more producer surplus usually means more tax revenue for the national budget, GDP/GNI alone is not suited as an indicator for public welfare. By contrast, public welfare analysis also estimates the consumer surplus and for an intervention appraisal the change to the combined producer and consumer surplus is taken into account. In any case, market price-based estimates are limited in that they provide little information about market external effects and how they would change with an intervention.

Downstream market effects
For change in copyright law acting as a market intervention not only the copyright-based literary works market is affected, but other markets are affected as well and market price-based effects can be detected there.
A distinct market within the market economy is often defined by the type of good produced. How resources are turned into goods and how prices form in one market is often related to upstream markets where the producing firm acts as a consumer to obtain goods for its production. Markets for upstream value-adding steps as well as markets for natural resources, capital and labour are upstream markets. For example, an intervention that increases the rate of employment would be seen as having a positive impact, measurable not only in the primarily affected market.
In a wider sense, the inter-market substitution can be an intervention impact. To take this kind of impact into consideration in an appraisal the decreased output in the one market would be offset with the increased output in the other market to estimate the net impact on output. Similarly, the net impact of the substitution on employment could be determined and taken into consideration.

Besides upstream market impacts and inter-market substitution, downstream markets can be affected by an intervention. Downstream markets are markets where the producers are consumers in the primarily affected market. Downstream markets play a particularly important role in how externalities deriving from the primarily affected market are taken into account in public welfare analysis and therefore deserve particular attention.
So-called pecuniary externalities are effects which are captured in the downstream market. Pecuniary externalities are mostly discussed in the context of goods which are sources of significant positive externalities, and that is how they will be discussed in this article. To take account of pecuniary externalities in an intervention appraisal, three points should be considered. First, pecuniary externalities represent a transfer of wealth among private parties only, not an increase in overall wealth. Second, pecuniary externalities are not real externalities, which has implications for how to measure them. Third, pecuniary externalities as they are appropriated in the downstream market are a more suitable measurement than the consumer surplus of the primarily affected market for taking account of private benefits beyond producer surplus in markets for goods with substantial real externalities. Dnes (2011) articulates the first point as follows: "Pecuniary externality occurs when loss of money by one party is associated with financial gain to another …: it represents redistribution of income with no impact on … overall economic welfare." Related to this, the second point focuses on the fact that despite the name "externality", pecuniary externalities are actually not necessarily external to the market in question (e.g. the market primarily affected by a market intervention) and certainly are not external to the market system as a whole: "[Pecuniary externalities do not] result in a wedge between private and social returns" (Frischmann & Lemley 2006).
They are rather associated with the private return described as "consumer surplus", but in contrast to consumer surplus focus on surplus which is actually internal to a transaction instead of assuming that the consumers' willingness to pay is derived from the social value of a good: "To the extent that the parties transact and recognize the sharing of surplus between them, then the benefits are not really external to the transaction" (Frischmann & Lemley 2006).
The concept of pecuniary externalities conceptualises the social value, i.e. the total value, of a good as the sum of producer surplus, real externalities which are actually external to the market, and surplus actually captured by the consumer under no assumption of perfect market conditions. Pecuniary externalities "affect distribution (e.g., the division of surplus between producers and consumers) [and more] … generally affect participants in a market, or parties to a transaction, by operating on the price mechanism. The most common sort [of pecuniary externality] involve differences between what a consumer is willing to pay for a product and what the consumer is actually required to pay in the market. This "consumer surplus" is external to the producer's activities and decisionmaking-the producer cares … not how much more [beyond the market price] I value [the good]" (Frischmann & Lemley 2006).
In any case, pecuniary externalities are not external to a system of interrelated relevant markets in which the price mechanism takes effect overall. Besides wealth transfer between one market and a downstream market, inter-market and intra-market substitution effects can also be entirely based on pecuniary externalities.
To illustrate intra-market substitution entirely based on pecuniary externalities Frischmann & Lemley (2006) cite another author's example: "[c]ompetition is a rich source of 'pecuniary' externalities … Suppose A opens a gas station opposite B's gas station and as a result siphons revenues from B. Since B's loss is A's gain, there is no diminution in overall wealth and hence no social cost, even though B is harmed by A's competition and thus incurs a private cost", i.e. B's loss is a negative pecuniary externality for A.
Similarly, for an inter-market substitution entirely based on pecuniary externalities a decreased output in another relevant market (B's loss) offsetting the increased outputs in the market in question (A's gain) would be a negative pecuniary externality for the market in question (A).
The concept of pecuniary externalities is one from the macroeconomic perspective. From this perspective it does not matter so much from which private party wealth is transferred to which other private party. It does not even matter so much to which extent private parties are consenting to the wealth transfer, i.e. whether they are parties of a transaction. What really matters from this macroeconomic perspective is that wealth transfer through pecuniary externalities does not change total output and overall wealth.
For public welfare analysis it means that no matter where the wealth transfers representing pecuniary externalities take place they can, in principle, be detected within the intervention-relevant markets and thereby can be measured with market price-based data in intervention appraisals.
The third point concerns the positive externalities that wealth transfers into and beyond downstream markets represent. It builds on the second point by asking how the private benefit beyond producer surplus should be conceptualised in a market for goods with substantial real externalities.
For goods with substantial real externalities the consumer surplus is not a suitable measurement (e.g. for measuring public welfare), because consumer surplus is a theoretical assumption which may work for goods without externalities but not for goods with systemic or intended externalities. The concept of consumer surplus builds on the concept of the ideal market where externalities are deemed failures and are not recognized in the price formation model. Accordingly, in this ideal market model consumer surplus appears to represent the social value of the good, because the private and social market effects are assumed identical in the model. For markets with systemic or intended externalities this assumption is misleading: the social value of a good cannot be inferred from the consumer's willingness to pay for the good. For example, persons crossing a bridge cannot be assumed to be willing to pay as much for their crossings as would be necessary to cover the full costs of producing the bridge, even when the amounts all of them would be willing to pay are taken together, and even if some of them carry goods to a village's farmers market (a downstream market) to make a profit.
Persons crossing a bridge would be willing to pay no more than their more or less immediate private gain, so their combined willingness to pay is unlikely to amount to the social benefits.
In markets with significant externalities, the discrepancy between consumers' private benefit and expected gain from a good and the overall social benefits from the good, and the consequent understatement of social benefits in the price mechanism, becomes clearer when consumers of the primarily affected market are viewed as producers in downstream markets: "[S]pillovers frequently arise in situations where beneficiaries' willingness to pay understates societal demand. … The market system works rather well in responding to consumer preferences measured by consumers' willingness to pay for goods and services. But the demand-signaling function of the price mechanism does not necessarily work well when purchasers use a resource as an input to produce a different good for which they themselves cannot expect to capture the full social value. For example, where an input is used to produce public or merit goods, the productive user may fail to observe, appreciate, or appropriate the social value. Purchasers' willingness to pay reflects their private demand - that is, the value that they expect to realize - and does not take into account value that others might realize as a result of their use… Their private willingness to pay accordingly understates the social value of their use. Dynamically, this demand manifestation problem works its way upstream" (Frischmann & Lemley 2006).
The wealth transfer captured in the downstream market differs from the total wealth transfer, which includes wealth 'leaking' out of and through downstream markets as real externality. Consequently, for assessment of the intervention impact on markets with significant externalities, pecuniary externalities - the wealth actually captured in downstream markets - rather than a hypothetical consumer surplus should be used as a market price-based measurement. Real externalities, by definition and in contrast to pecuniary externalities, represent the difference between private returns in intervention-relevant markets and the social returns.
In the case of goods with real externalities the assumption of a perfect market with consumer surplus simply reflecting the social value of a good is not suitable. Public welfare impact of an intervention is reflected in changes to private returns as producer surplus and pecuniary externalities as well as real externalities. Impacts on real externalities should be measured separately as will be discussed in later sections in this article.
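The accounting described in this paragraph can be sketched as a simple decomposition. The symbols are notation introduced here for illustration only and do not appear in the original analysis: $\Delta W$ is the change in public welfare caused by an intervention, $\Delta PS$ the change in producer surplus, $\Delta E^{\mathrm{pec}}$ the change in pecuniary externalities captured in downstream markets, and $\Delta E^{\mathrm{real}}$ the change in real externalities, with $R^{\mathrm{social}}$ and $R^{\mathrm{private}}$ standing for social and private returns.

```latex
\begin{align}
  \Delta W &= \Delta PS + \Delta E^{\mathrm{pec}} + \Delta E^{\mathrm{real}}, \\
  E^{\mathrm{real}} &= R^{\mathrm{social}} - R^{\mathrm{private}}.
\end{align}
```

The first two terms can, at least in principle, be estimated from market prices, whereas the third has to be measured separately.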

Negative and positive externalities
As with many types of market intervention, for a change in copyright law it is essential to measure and estimate the effects the change would have outside of the primarily affected market and other affected markets. Market externalities are omnipresent, and their existence alone, although described as market failure, is not necessarily a reason for market intervention. Some externalities cannot possibly be internalised and are unavoidable and systemic. Others are in fact intended.
In any case, not all costs and benefits are captured in the market price and market price-based estimates can only give a partial answer to the question of how costs and benefits for the society would change with an intervention measure.
Hence, for public welfare analysis, taking into account market external effects is as important as, and sometimes even more important than, estimating market price-based data: "The full value of goods such as health, educational success, family and community stability, and environmental assets cannot simply be inferred from market prices, but we should not neglect such important social impacts in policy making" (UK Government 2011).
External effects in this context are also described as non-market impacts, non-market goods or the wider social costs and benefits for or wider effects on the society as a whole.
Judging an intervention based on equity objectives for example cannot rely in full on market data.
Equity objectives seek even welfare distribution based on geographical criteria as well as on fairness with regard to how much stakeholders in a market have actually contributed to the value created in that market. Equity objectives are therefore partially based on more local market price data, but are also concerned with subjective judgement of what constitutes value contribution and what is fair.
In the former sense, option appraisal on a national level can adjust costs and benefits according to their distributional impacts. In the latter sense, equity objectives are based less on market prices than on social and non-economic criteria which take into account market external effects.
Social indicators, on the other hand, explicitly address environmental and social costs. For example, the quality-adjusted life year (QALY) measurement is used for valuation of projected health gains resulting from an intervention. The Index of Sustainable Economic Welfare (ISEW) and the Genuine Progress Indicator (GPI) take into account environmental and social factors. The GPI is developed, among others, by the EU, and is designed to replace GDP as an indicator. How useful and necessary social indicators are can be seen from the following representative example: The GDP growth resulting from an intervention is highly positive, but does not take into account environmental costs. The GPI, taking into account the environmental costs, could turn out to be zero or negative. The GPI would still more realistically express the public welfare impact of an intervention.
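The representative example above can be made concrete with a few lines of arithmetic. The following is a minimal sketch with purely hypothetical figures; the function name and the simple subtraction are assumptions made for illustration and do not reproduce the actual GPI methodology, which involves many more adjustments.

```python
# Hypothetical figures for one intervention, in arbitrary monetary units.
gdp_gain = 100.0            # market-priced output growth attributed to the intervention
environmental_cost = 120.0  # assumed non-market cost that GDP leaves out


def adjusted_welfare_change(market_gain: float, non_market_costs: float) -> float:
    """Return the market gain net of non-market costs (a GPI-style adjustment)."""
    return market_gain - non_market_costs


adjusted = adjusted_welfare_change(gdp_gain, environmental_cost)
print(f"GDP-based change: {gdp_gain:+.1f}")
print(f"Adjusted change:  {adjusted:+.1f}")
```

With these assumed numbers the market-priced figure is positive while the adjusted figure is negative, which is the situation the example describes.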
Other social indicators include data about informal work, levels of education, leisure time, security and factors reflecting non-economic policy objectives.
Non-market effects can be taken into account based on monetary values other than values based on market price data, valuation of unvalued costs and benefits with monetary values, non-monetary quantification as well as qualitative assessments.
Valuation with monetary values can include inferring implicit prices by observing or surveying consumers or eliciting money values with approaches such as hedonic pricing, compensation tests or willingness-to-pay for non-tradable goods approaches. Social indicator and equity based approaches are used complementarily to market price-based indicators in public welfare analysis to estimate the total impact of an intervention and select the best alternative. While social indicator approaches are often an essential ingredient for public welfare analysis, they are concerned mainly with negative externalities.
However, this article is mainly concerned with positive externalities. Besides education, positive externalities play a role in a number of areas: "Domestic economic policy in most industrialized countries recognizes and seeks to promote and implement non-market driven welfare benefits. For example, government policies in areas such as … scientific research, or tax-based support for infant industries, are standard examples of welfare benefits that do not derive from the free market model" (Okediji 2000).
Further examples of positive externalities are those deriving from knowledge, infrastructure, security and natural habitats. Valuations of positive external effects can include leisure time, social equality and personal freedoms.
Including externalities in public welfare assessment matters not only because these effects arise as collateral to market effects or simply co-exist with them; of major importance is how they feed back into the market and, specifically for positive externalities, how they provide the "costless" basis for productivity growth.
Copyright law applies to creative works including literary works. Literary works are information goods. Positive externalities derived from information goods provide a major source for such a "costless" basis and growth. Therefore in the following text, the nature of different types of goods including information goods will be explained before showing how positive externalities deriving from information goods can be conceptualised in the context of economic analysis.

Types of goods
In general, a good is something that can satisfy someone's wants or needs and thus has economic utility. There are four main types of goods shown in the two-dimensional classification in Figure 1.
There are two determinants for the nature of the good: excludability and subtractability (sometimes referred to as rivalry). Excludability describes how the use of a good can be controlled and users can be excluded or included. 'Non-excludable' means that if a public good is made available to one consumer, it is effectively made available to everyone (UK Government 2011).
Subtractability indicates the extent to which one person's use subtracts from an available good so that another's use is diminished.
Non-excludability can also imply non-rejectability, i.e. people cannot choose whether or not to consume the good.
Both 'private goods', such as shoes or houses, and 'club goods', such as a movie shown in a cinema or a public park, are excludable. Private goods are excludable and also highly subtractive - one person using a pair of shoes in fact outright prevents another person from using it. "Private goods are consumed rivalrously and thus are naturally scarce" (Frischmann & Lemley 2006). By contrast, club goods have a lower level of subtractability. For example, one person watching a movie in a cinema doesn't prevent another person from watching the movie in the same cinema. To the extent that the provision of the club good does not rely on scarce resources, club goods are also less scarce.
Common-pool resources (sometimes referred to as common goods), which can be natural resources such as water, forests and wildlife for example, are subtractive. One person's use of such a common good reduces the benefits available to another. But common goods are also non-excludable, and users might have unrestricted access and compete for a limited supply of these goods. Consequently, overuse can lead to resource depletion - something also described as the "tragedy of the commons".
A public good in the context of economic analyses is defined as non-excludable and non-subtractive.
Public goods (also called pure public goods) are goods which cannot possibly be made excludable without incurring prohibitively high exclusion costs. National defence, for example, is a public good: once it exists, it would be difficult to exclude specific individuals within the country from being defended against an air strike. Equally, an additional person being defended within an area wouldn't prevent others in this area from being defended, i.e. the level of subtractability is low.
While public goods often are produced by the public sector (and financed through taxes), the term "public good" does not necessarily refer to the fact that the good is produced by the public sector. The term just describes the intrinsic nature of a good and doesn't explicate who is producing it. A "good" can be a product or service, and the term prejudges neither the nature of the good nor whether or not it is governed by property rights.
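The two-dimensional classification described above can be sketched as a small lookup. This is an illustrative simplification: the text treats excludability and subtractability as matters of degree, whereas the sketch reduces each to a yes/no flag, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Good:
    name: str
    excludable: bool   # can users be excluded from the good?
    subtractive: bool  # does one person's use diminish another's?


def classify(good: Good) -> str:
    """Map the two determinants onto the four main types of goods."""
    if good.excludable and good.subtractive:
        return "private good"
    if good.excludable:
        return "club good"
    if good.subtractive:
        return "common-pool resource"
    return "public good"


# Examples taken from the text, with their flags set as the text describes them.
examples = [
    Good("pair of shoes", excludable=True, subtractive=True),
    Good("movie shown in a cinema", excludable=True, subtractive=False),
    Good("forest", excludable=False, subtractive=True),
    Good("national defence", excludable=False, subtractive=False),
]

for good in examples:
    print(f"{good.name}: {classify(good)}")
```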
Supplying a "service" does not involve transfer of the ownership of the service itself. Services typically are procured for a limited period of time and irreversibly vanish once rendered. Also, services typically are rendered and consumed during the same period of time and each service delivery is unique even if the same service consumer requests the same service again. Therefore, services are rather subtractive and excludable goods.
The term "product" is used inconsistently. It often refers to items which are governed by property rights and traded as such, or simply to physical goods which often by nature are more excludable and subtractive. Sometimes the term more generally refers to something which is produced, or to bundled offerings where property rights or physical goods are combined with services or other entitlements. In this sense the line between what is a product and what is a service cannot always be clearly drawn.

Information goods
Information can be divided into private information and public information. The former is destined for a distinct user and has no value per se, while public information addresses indistinct users and does have a value per se (KEA & CERNA 2007). "Information goods are pieces of public information that carry specific utilities" (KEA & CERNA 2007).
Economics treats information mostly in one of two contexts: either in the context of how its creation, dissemination and use in general should be organised, or in the context of how availability of information and symmetry thereof determines market transactions. This article treats information in the former context, and when the term "information" is used it refers to public information.
Information goods are economic intangibles as they can only be stored and delivered by means of a medium. The medium can be a tangible carrier medium, but the main value of information goods derives from the information they contain and not from their medium. Tangible goods can embody ideas and information, but if they are not meant as the carrier medium for the information they are not information goods. Therefore, reproducing information goods is typically much cheaper compared to reproducing a tangible good.
Information services are not information goods. Examples of information goods are literary works, sold individually or on a recurring basis as collections (e.g. journal subscriptions), formulae, protocols describing methods, images, audio-visual recordings, software, etc. Producing a first copy of an information good involves a relatively high level of fixed cost. However, today information goods are mainly reproduced without a physical carrier medium to which the information has to be tied; they are reproduced and transmitted through computers and computer networks. Therefore, the reproduction costs are effectively zero.
In fact, copies of information goods today are made predominantly by default as part of technological processes. Within an intranet, and assuming the Internet keeps its end-to-end principle also in general, distribution costs can be said to be close to zero too, because distribution through computer networks is just a matter of encoding and decoding information and sending minuscule bits of data. Processing power and storage capacities of computers as well as bandwidth and compression capabilities can be expected to rise further, reinforcing these trends. Therefore, information goods' marginal costs can be generalized to be zero.
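The cost structure described here, a high fixed cost for the first copy and a marginal cost of roughly zero, can be illustrated with a short sketch. The fixed-cost figure and the function name are hypothetical and chosen only to show how the average cost per copy falls as the number of copies grows.

```python
def average_cost(fixed_cost: float, marginal_cost: float, copies: int) -> float:
    """Average cost per copy: first-copy cost spread over all copies plus marginal cost."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return fixed_cost / copies + marginal_cost


FIXED_COST = 10_000.0  # hypothetical cost of producing the first copy
MARGINAL_COST = 0.0    # assumption following the text: reproduction is effectively free

for n in (1, 100, 10_000, 1_000_000):
    print(n, average_cost(FIXED_COST, MARGINAL_COST, n))
```

Under these assumptions the average cost approaches the marginal cost of zero as the number of copies increases, while the first-copy cost stays the same.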
Information goods are also intrinsically not excludable, and with digital technologies information can be reproduced without loss of quality. The value of information goods can still decrease with increased use for reasons such as presentation in an utterly inappropriate context, but such cases are rare exceptions, and the contrary of value loss can be the case as well: "Unlike land, forage or other types of exhaustible resources, knowledge is not depleted, by use for consumption; datasets are not subject to being 'over-grazed' but instead are likely to be enriched and rendered more accurate as more researchers are allowed to comb through them" (David 2001). Therefore information goods today by and large have a very low level of subtractability.
Consequently, since information goods today can be reproduced and distributed at close-to-zero costs without degrading, information goods are effectively not a scarce good. Due to their non-excludability and low levels of subtractability, information goods are intrinsically pure public goods.
The term "global public good" has been coined for public goods which are non-excludable and have low subtractability throughout the world as opposed to just in one State or region. In contrast to national defence, for example, information goods are global public goods.
In the context of public goods in general and with a focus on types of ownership the term "commons" is used to describe common-pool resources or public goods depending on the level of subtractability of the good described as a commons. Accordingly, information goods today measured by their intrinsic properties fall in the latter category of type of commons.
While information goods are intrinsically pure public goods, in practice they fall somewhere in a spectrum between pure public goods and partially excludable club goods, depending on the extent to which they are made excludable through legal and technical means.
Illustrating the partial excludability of information goods in practice Frischmann & Lemley (2006) with reference to property rights which confer this partial excludability describe information goods' "commons component" (pure public good aspect) and "property component" (club good aspect).
Due to their intrinsic public good properties information goods are a rich source of externalities, which are predominantly positive externalities.
"The main economic characteristic of information … is that it carries strong externalities. This means that its circulation not only provides utility to the one who gets it but also to society as a whole" (KEA & CERNA 2007).
Those positive information externalities for purpose of economic analysis can be framed in two ways - as a subset of knowledge spillovers or as intellectual infrastructure spillovers. In the former context information goods still appear more like self-contained pieces which can be amassed, are content, form corpora, are 'stored', transmitted, etc. In the latter context information appears rather as pre-integrated networks dissociated from spatial terminology.

Information externalities
Knowledge and infrastructure externalities are widely acknowledged as major sources for public welfare benefits. To get a better idea about external effects from information goods first knowledge externalities deserve a closer inspection. Similar to information goods, knowledge doesn't fit well in traditional economic theory because it is neither physical and scarce nor excludable. It's a public good.
To the extent wealth from knowledge is not captured in the market system it appears as market externality, i.e. as knowledge spillover. Knowledge spillover is a broader concept than external effects from information goods. Knowledge "includes cognitive categories, rules for the interpretation of information, tacit skills as well as problem-solution and retrieval strategies, which cannot be captured in [information itself]" (Von Engelhardt 2008). For example, the "argument in mathematics: the argument itself is information but only few mathematicians have the knowledge to understand the argument" (Von Engelhardt 2008).
Building on this example Von Engelhardt establishes: "[When using the term knowledge spillovers] we refer to the diffusion of information as well as the learning and knowledge-related effects on a population which are caused by the diffusion: when an argument is published in a scientific journal it has a number of consequences which go beyond the primary effect of the diffusion of the information itself. By studying the argument some mathematicians may refine their own mathematical skills; some scientists may use the insights of the argument for solutions of other problems etc. All these effects of a publication, i.e. the effects on mathematical skills as well as on the diffusion of the information itself, are in the following explanations included in the term knowledge spillovers." Knowledge spillover in this sense is very similar to external effects from information goods - it comprises knowledge-related effects such as learning and other human abilities and activities.
However, the concept of knowledge spillover is broader in that the knowledge related effects can be borne and conveyed not only by information goods such as a book, but by any product or service.
Often ideas and innovations are "embodied" in a product or service which is not an information good. Therefore, externalities specifically deriving from information goods can be seen as a sub-set of knowledge spillovers.
Another way to conceptualise external effects from information goods is to view them as infrastructure externalities. Frischmann & Lemley (2006) describe information and ideas as an infrastructure resource or infrastructure good - "shareable resources capable of being widely used" which are generally sources of substantial positive externalities: "Ideas themselves are a good example of [intangible] infrastructure, because they are not merely passively consumed but frequently are reused for productive purposes." A third way to frame external effects from information goods is to name them straight away "information externalities" or "information spillover". Ramello (2005) for example implies how information externalities through creation and disclosure of information contribute to a "complex dynamic connected with the special attributes of knowledge". In principle information externalities arise from any type of information good including literary works of any sort.
Perhaps in an effort to explain their incommensurability, information externalities have also been compared to externalities of environmental damage in the form of waste, for example, and it has been concluded that information externalities can also be negative. The problem with waste though is not that it exists, but that it is eating up natural resources and space. Knowledge spillovers may also eat up attention span and distract, but, on an individual basis as well as on society level, filtering knowledge and, more importantly, integrating information and synthesising and abstracting knowledge is possible with fewer limits than renewing natural resources once they are burned. Physical waste is threatening and costly.
Intellectual waste is either irrelevant and can be ignored or is a source of serendipity and innovation.
What for one person is waste can be a good find for another. Hence, while it's justified not to count the turnover of physical waste removal industries or industries of remediation of environmental and accidental damage into an output measure for public welfare, industries for information externality 'waste' management may well be counted in. In whatever way information goods are viewed, directly as the source of information externalities, as intangible infrastructure or as conveyer or constituent of knowledge, information goods are a major source of positive market externalities.
For appraisal of a market intervention such as copyright law which affects production and use of information goods the fact that information externalities exist and are positive and are significant for public welfare leaves the questions to which extent and where those benefits become effective, i.e. whether or not and by whom they get appropriated, and how to measure them.

Efficiency residual and endogenous growth
A weakness of cost-benefit analysis is not only that not all types of costs and benefits, and how they would change with an intervention, can be explicated or even known, but also the lack of traceability and attributability of costs and benefits. Change in copyright law acting as a market intervention determines the extent to which information externalities can be leveraged as a source of increased productivity and growth, but the lack of traceability and attributability of those positive effects requires indirect measurements.
In general, national level market data is aggregated from data from smaller economic units such as individual firms. External effects resulting from expenditures and other measures partly spread out into the unknown as well as feed inward into production activities from unknown sources.
As a consequence, positive external effects, most importantly positive knowledge, information and infrastructure externalities, are poorly accounted for in market price-based data. Beyond this, external effects do not represent a fixed total value which - if not traceable - at least is appropriated somewhere, and they also spill over from and into other countries, which adds further difficulties to modelling and measuring.
At least in macroeconomic accounting such measurement problems lead to massive inconsistencies, which have spurred the development of new economic growth theories which seek to tackle these measurement difficulties and explain advances in the organisation of production and market external factors with regression based indirect measurement approaches. Such approaches can be used to take into account also effects which can't be grasped with direct measurements. With regression based estimates, government interventions taken in the past can be evaluated based on historical data to inform about likely impacts of contemplated future interventions.
Growth theories encompass models for empirical measurement of inputs distinguished in different production factors as well as a production function which represents the total output as a function of those production factors.
Growth models today recognize and take into account the production factors physical capital (incl. natural resources), labour, human capital and knowledge capital, with a prime emphasis on the latter two, and they include a residual parameter called total factor productivity (TFP).
The production function can be used to optimise the allocation of resources to the respective factors, and the residual parameter expresses change in productivity not attributable to the accounted factor inputs. In simplified form, the production function could, for example, look like this: Output = f(P, L, H, K, TFP).
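To make the role of the residual more tangible, the simplified production function can be given an explicit form. The following is a sketch under a stated assumption: a Cobb-Douglas specification is used here only because it is a common illustrative choice, and it is not claimed to be the form used in the models discussed in this article. $Y$ stands for Output, $P$, $L$, $H$ and $K$ for the four factors in the order listed above, and the exponents $\alpha, \beta, \gamma, \delta$ are hypothetical output elasticities.

```latex
\begin{align}
  Y &= \mathit{TFP} \cdot P^{\alpha} L^{\beta} H^{\gamma} K^{\delta}, \\
  \ln \mathit{TFP} &= \ln Y - \alpha \ln P - \beta \ln L - \gamma \ln H - \delta \ln K.
\end{align}
```

In this form TFP is whatever part of output remains once the measured factor inputs, weighted by their elasticities, have been accounted for.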
These models are called endogenous growth models as the virtuous cycles building on knowledge, infrastructure, technology, ingenuity, etc. and leading to innovation and productivity growth are not seen as exogenous to the economic system (which would imply that investment in knowledge e.g. has no predictable effect on output), but are integral to the model. The issue advancing economies face is that you can work harder and save more money only so much, and spending money for labour and natural resources doesn't change the fact that those resources are increasingly exhausted. Ultimately, what becomes most important is to innovate and increase productivity.
Endogenous growth models "recognize that some part of innovation is, in fact, a form of capital … Increments of knowledge are put on an equal footing with all other forms of investment, and therefore the rate of innovation is endogenous to the model. … What is new in endogenous growth theory is the assumption that the marginal product of (generalized) capital is constant - not diminishing. … The TFP residual captures changes in the amount of output that can be produced by a given quantity of inputs. … To the extent that productivity is affected by innovation, it is the costless part of technical change that it captures" (Hulten 2001).
TFP, therefore, is not a direct measurement of increase of productivity, but is inferred with regression analysis and represents the inclusion in the economic system model of market external effects which are otherwise not taken into account but are nonetheless important, most notably for the explanation of productivity growth on national level.
The effects determining TFP "are lumped together as a "left-over" factor (hence the name "residual"). … this is the source of the famous epithet, 'a measure of our ignorance'" (Hulten 2001).
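How such a "left-over" factor is obtained can be sketched numerically. Note the swap in technique: instead of the regression analysis mentioned in the text, the sketch uses the simpler growth-accounting variant, in which factor shares are taken as given rather than estimated. All growth rates, shares and names below are hypothetical.

```python
def tfp_growth(output_growth: float, input_growth: dict, shares: dict) -> float:
    """Residual output growth not explained by share-weighted input growth."""
    if set(input_growth) != set(shares):
        raise ValueError("input_growth and shares must cover the same factors")
    explained = sum(shares[f] * input_growth[f] for f in shares)
    return output_growth - explained


# Hypothetical annual growth rates of the four factor inputs.
input_growth = {"physical": 0.020, "labour": 0.010, "human": 0.015, "knowledge": 0.030}
# Hypothetical factor shares (assumed to sum to 1).
shares = {"physical": 0.3, "labour": 0.4, "human": 0.2, "knowledge": 0.1}

residual = tfp_growth(0.030, input_growth, shares)
print(f"TFP growth residual: {residual:.3f}")
```

Whatever output growth the weighted inputs do not explain ends up in the residual, which is why it lumps together all unmeasured influences, including real externalities.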
Endogenous growth models represent recognition of positive externalities' impact on growth and enable economists to pursue questions concerning the magnitude of this impact.
Besides the production function, a cost function approach can also be used to determine TFP. The cost function is a dual to the production function, basically addressing the same optimisation problem, and can also be used to distinguish increased productivity attributable to accounted inputs from increased productivity arising from unknown sources.
In any case, for the production function and cost function approaches used on a national level an increase of TFP indicates that the society gets more produced with the same or less resource inputs.
Data used for input and output can be purely market price-based, but also include data estimates on externalities. Also non-monetary values can be modelled in. Depending on the value types used, increased TFP also indicates an increase in public welfare -more social benefits are achieved with the same or less social costs incurred.
As with cost-benefit analysis, TFP and its regression based methods can be used not just on a national level, but also for smaller economic units such as industries and individual firms as well as on supranational level.

Determinants of TFP
While the TFP residual represents external effects that are otherwise unaccounted for in the production function due to measurement constraints, it is possible to identify some categories of the most important determinants for TFP. Knowledge is the key determinant.
The most important source of knowledge is considered to be R&D, but other knowledge producing activities create knowledge as well. Other sources of new knowledge are a high level of education and the experience and concentration of producers of new knowledge such as scientists and engineers.
Knowledge has increased productivity through automation of production processes and other ways of improving organisational forms. In this sense, TFP also reflects increased capabilities and use of technology including information technology.
The positive impact of knowledge on productivity in principle applies to all industries and countries where the knowledge is available, but the mere availability of information, knowledge and technology does not necessarily lead to increased productivity. It also needs to be utilized and applied to actually realize existing productivity growth potential. The extent to which this potential is realized determines TFP as well.
One aspect of this is people's ability to 'absorb' knowledge. This ability is higher when more knowledge is already absorbed, i.e. if the level of education is higher, but also depends on health, wealth and other life conditions. Another important determinant of TFP is physical infrastructure as well as infrastructure and standardisation in general. The level of market competition, structure of corporate formations, international trade openness and other market conditions affect TFP as well. While TFP is an expression of productivity, in turn, innovation is underpinning "productivity gains that drive economic growth and social welfare" (McDonald et al. 2012).
Specialization enhances "by improving the total sum of knowledge. Ideas combine to create new ideas and this process is self-generating and self-feeding in a dynamic way" (Isaksson 2007). Innovation is induced through market transactions, but also to a significant extent through real externalities which are reflected in TFP. TFP growth can be used as an index of innovative output (Branstetter 1998).
TFP reflects capabilities, use and adoption of information technology, but also other conditions on how information can be used such as legal conditions have an impact on productivity.
Related to this and in general the regulatory and legislative framework factors in alongside institutional and organizational forms as well as norms and societal attitudes.
In sum, productivity growth heavily builds on innovation which is often induced through real knowledge externalities.
How much innovation takes place depends on how innovation-friendly conditions are in a society. Innovation policy and innovation objectives feeding into other regulatory and legislative measures such as technology policy, science policy and intellectual property determine how innovation-friendly conditions are and therefore determine TFP.

Information externalities as non-pecuniary externalities
For analysis of effects a change in copyright law would have on national level productivity and public welfare the characteristics of externalities deriving from creative works as information goods need to be considered.
Knowledge spillovers in general and information externalities in particular are mainly positive and they are mainly real externalities, also called non-pecuniary or technological externalities.
Non-pecuniary externalities represent the social value of goods beyond the private value realized by market participants through market transactions. They represent "benefits (or costs) realized by … parties [which are] … agents who are not participating in the relevant market and thus have not transacted with the provider of the benefits or costs" (Frischmann & Lemley 2006).
With the macroeconomic perspective focussing on the national level overall effect of externalities, the term "third party", when used to describe those agents, denotes affected parties not only outside of a transaction but more generally parties which are not affected as participants in a relevant market by operating on the price mechanism.
The macroeconomic perspective focuses on the characteristic of non-pecuniary externalities to associate "gains and losses with changes in underlying outputs in the economy" (Dnes 2011) as opposed to pecuniary externalities which redistribute wealth without changing total outputs in the economy.
For a market transaction between one market and its downstream market the existence of non-pecuniary externalities means that there are not just benefits (or costs) transferred into the downstream market, but also benefits (or costs) are 'leaking' out of the system of relevant markets altogether.
In the same vein, inter-market and intra-market substitution (e.g. as a result of government intervention) needs to be viewed with third parties in mind.
While an entirely pecuniary externalities-based substitution effect would mean that decreased output for B is offset by increased output for A, and the negative pecuniary externality for B clears the positive pecuniary externality for A with no effect on total output, a substitution effect may also entail non-pecuniary externalities and asymmetry between negative and positive externalities. For example, as a result of an intervention a negative pecuniary externality (B's loss) may be associated with a positive pecuniary externality (A's gain) as well as an additional positive non-pecuniary externality (a third party's gain). In this case, the net value of the substitution represents an increase in total output, i.e. there are social benefits and the net effect on public welfare is positive.
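The accounting behind this example can be sketched in a few lines. The function name and all figures below are hypothetical and chosen only for illustration; the sketch assumes that gains and losses can be expressed in a common unit of output.

```python
# Hypothetical figures, chosen only to illustrate the accounting in the text.
def net_output_change(pecuniary_gain_a, pecuniary_loss_b, non_pecuniary_gain_third_parties):
    """Return the change in total output caused by a substitution effect.

    Pecuniary gains and losses are transfers between market participants;
    the non-pecuniary term is value accruing to parties outside the markets.
    """
    pecuniary_net = pecuniary_gain_a - pecuniary_loss_b
    return pecuniary_net + non_pecuniary_gain_third_parties


# Purely pecuniary substitution: A's gain exactly offsets B's loss.
purely_pecuniary = net_output_change(100.0, 100.0, 0.0)

# Asymmetric case: the same transfer plus a gain for third parties.
with_spillover = net_output_change(100.0, 100.0, 30.0)

print(purely_pecuniary)  # 0.0
print(with_spillover)    # 30.0
```

In the first case total output is unchanged; in the second the third-party gain is the whole of the net social benefit.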
While externalities deriving from knowledge embodied in a product or service may be appropriated and traceable in the market, a large proportion of information externalities cannot be detected in any market: "to the extent a particular innovation is embodied in a product or service, its social product is computable in principle. How it actually will show up in our national product accounts will depend on the competitive structure of the industry and the ingenuity and energy of the "price" reporting agencies. In principle, a complete hedonic calculation would produce the right prices in the right industry and would allow us to attribute productivity growth where it actually occurred. Its influence in downstream industries could then be viewed as just another response to declining real factor prices, a pecuniary externality … The more difficult to measure and the possibly more interesting and pervasive aspect of … [knowledge] externalities is the impact of the discovered ideas or compounds on the productivity of the research endeavours of others. This is a non-pecuniary externality which is not embodied in a particular service or product, though it might be conveyed by a printed article or a news release" (Griliches 1992).
Information externalities, being mainly non-pecuniary knowledge externalities, are not only difficult to trace within markets but are at the same time essential for innovation and nation-level productivity growth. With merely pecuniary knowledge spillovers, "innovation occurs in the upstream sector, but the benefits spill over downstream. There is no presumption, though, that this financial windfall will create further innovation … At most, it may allow firms in the downstream sector to move along … It is not the stuff of which endogenous growth is made. Without further innovation in the upstream sector … there will be no further growth. … this kind of spillover can be contrasted with a disembodied knowledge spillover … In this case, … knowledge … becomes part of a general pool of knowledge - the "state of the art". Subsequent innovators are able to build upon this foundation of general knowledge, using it as a complement to their own R&D activities. With these kinds of spillovers, innovations tend to beget subsequent innovations … It is ultimately this kind of spillover that can produce both endogenous growth and … comparative advantage among nations. … At the firm level, the most intense knowledge spillovers may be those which take place between direct competitors who buy nothing from one another" (Branstetter 1998).
The fact that information externalities play an essential role in innovation and endogenous growth is consistent with the fact that positive non-pecuniary externalities represent social benefits, and suggests that more non-pecuniary information externalities lead to more total output and public welfare.
This thesis may be supported by viewing information as intangible infrastructure. Doing so might reveal that the potential of information externalities to increase public welfare is less limited than that of spillovers from physical infrastructure, because not only the spillover has public good characteristics, but information goods themselves are pure public goods. As pure public goods with a much lower level of subtractability than physical infrastructure, information goods can be used by a much larger number of people without degrading, irrespective of who produces the infrastructure.
The difference in level of subtractability, and the consequent difference in potential and limitation to increase public welfare between spillovers from information goods as intangible infrastructure and spillovers from physical infrastructure (and the difference in implications for measurement), prevails even if the infrastructure is produced by the public sector: "There are other public goods which raise somewhat similar measurement problems [as those for non-pecuniary knowledge spillovers]; the provision of roads to the motor transport industry, of airports and flight controllers to the airlines, and of security services to private businesses. All of these have certain aspects of increasing returns to them but are also subject eventually to congestion in use and hence reasonable pricing schemes are feasible in principle" (Griliches 1992).
In this sense, the 'leaking', 'spilling' and transfer of wealth, at least for non-pecuniary information externalities, is not like spilling a portion of some constant total amount from one pot into another, but information is rather like a matrix for human thought without a predestined fixed value of what those thoughts might produce.
From the motivations of shifting to endogenous growth theory it might rather be assumed that more non-pecuniary information externalities lead to more total output and public welfare, although not without interplay with other essential factors such as the time available to information users and the institutions in place.

Information externalities measurement
Measuring the value of information externalities directly on a national level (i.e. not through regression analysis) is problematic in many ways. For example, the social value of an information good is defined by its "productive" use, but "productive" use is often deemed to occur only explicitly and only in corporate settings. Another issue is how capacity limits are attributed to the information itself instead of to its carrier medium. Furthermore, it is challenging to identify measurable bases for estimating the value of information flows, as well as models to infer value from those base measurement categories, volume of bits being one proposed measurement category (see e.g. Dienes 2011). Consequently, the impact of information externalities on productivity cannot be determined through direct measurement, but should be measured indirectly through their impact on production activities as a knowledge externality. How to measure knowledge externality and its impact on productivity is mainly discussed in the context of R&D knowledge production: The "productivity research community has been interested in the causes, effects, and implications of R&D spillovers for a long time. … this topic and its fundamental importance with regard to economic growth in the long run was introduced to a larger audience within economics through the "endogenous growth" … literatures" (Branstetter 1998).
The potential of information externalities in particular for growth in productivity, the economy and public welfare may be lower than that of knowledge spillovers in general in the R&D context. On the other hand, their impact might be higher due to the pure public good characteristics of information goods. In any case, the approaches for measuring knowledge spillovers in the R&D context can also be applied to measuring information externalities.
The fact that information externalities exist and are mainly non-pecuniary externalities implies two aspects of flows of value through information externalities. First, trading information goods leads to wealth 'leaking' out of the information goods market and its downstream markets into the unknown.
Secondly, wealth deriving from information externalities feeds back inward from the unknown into production activities of potentially any 'third' party, including firms and individuals, regardless of whether those parties participate in the information goods market and its downstream market. This is the "demand side of the social benefits of spillovers" (Frischmann & Lemley 2006).
Information externalities can affect those parties positively just by otherwise interacting with other firms and individuals, by observing the market, consciously or unintentionally adopting new processes and picking up information and innovative ideas from others. These inward feeding information externalities are part of the 'costless' input 'factor' for producers leading to overall society-level productivity growth. As is the case for any non-pecuniary externality, this input cannot be attributed to any specific investment or source. Therefore, regression-based models are used to measure the impact deriving from such unknown sources of productivity growth. Griliches (1992) explains how such unknown sources as "R&D spillovers" can be taken into account when measuring impacts of R&D spending: "There are basically two types of estimates to be found in the literature: estimates of social returns to a particular well identified innovation or a class of innovations whose effects are limited to a particular industry or sector and can be measured there; and regression based estimates of overall returns to a particular stream of "outside" R&D expenditures, outside the firm or sector in question. … [For the regression approach] measures of output or TFP … across firms or industries, are related to measures of R&D "capital"… A subset of such studies includes also measures of … "borrowable" R&D capital in an attempt to estimate the contribution of spillovers to the growth in productivity." The inclusion of a measure for "borrowable" capital in the production function equation represents the recognition and inclusion of unknown and otherwise unaccounted input sources. It also represents the mathematical tool to express with a specific value the effect those unknown sources have on output. It is not an absolute value, but a coefficient value gained from data through regression analysis.
Its validity depends on the measurement concept and interpretation: it can be used to interpret the relationship between knowledge investment from unknown sources and TFP, and thereby indicates the magnitude of the impact of knowledge externalities on productivity growth. Branstetter (1998) shows similar approaches for national level measurements. One such approach measures the rate of innovation, taking into account "a stock of general knowledge … which is presumed to be costlessly accessible to all innovators" as an inward feeding input from unknown source as well as 'leaking' outward. "Each innovation yields both a new product design (whose benefits the innovator can appropriate) and a unit addition to the stock of general knowledge [i.e. social benefit]. Over time, this foundation of knowledge grows, and this allows more innovation to occur without an increase in its resource cost. Thus, knowledge spillovers serve as "engines of endogenous growth", allowing economic growth to proceed indefinitely without diminishing returns setting in" (Branstetter 1998).
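A minimal sketch of the regression-based approach may help to make the role of the "borrowable" capital coefficient concrete. It assumes a log-linear relationship between TFP, a firm's own R&D capital and a pool of outside R&D capital; this functional form, the firm data and the elasticities are hypothetical and not taken from Griliches (1992) or Branstetter (1998). The data are synthetic and noise-free, so the fit simply recovers the assumed coefficients.

```python
import math


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so that row operations apply to both sides.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Six hypothetical firms: (own R&D capital, borrowable outside R&D capital).
firms = [(10.0, 200.0), (20.0, 150.0), (40.0, 400.0),
         (80.0, 100.0), (15.0, 800.0), (60.0, 300.0)]
TRUE_CONST, TRUE_OWN, TRUE_SPILLOVER = 0.50, 0.10, 0.05  # assumed values

# Regressors: constant, log own R&D capital, log borrowable R&D capital.
rows = [[1.0, math.log(own), math.log(pool)] for own, pool in firms]
log_tfp = [TRUE_CONST + TRUE_OWN * r[1] + TRUE_SPILLOVER * r[2] for r in rows]

const, own_elasticity, spillover_elasticity = ols(rows, log_tfp)
print(round(own_elasticity, 4), round(spillover_elasticity, 4))
```

The estimated `spillover_elasticity` is the coefficient value discussed above: it is not an absolute amount of value, but an estimate of how strongly TFP responds to knowledge capital the firm did not pay for. With real data the estimate would carry sampling error and depend on how the outside R&D pool is constructed.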
While heterogeneity of producers, data aggregation and fluctuations in output influenced by factors other than innovation pose measurement challenges, specially constructed production functions or cost functions with spillover variables similar to the 'borrowable capital' measure can be used to measure the impact on productivity of intra-national (country internal) R&D spillovers as well as R&D spillovers feeding in from other countries.
Another way to show the significance of knowledge spillovers is to calculate a social rate of return on R&D investment. "Social" in this context means taking into account firms or industries other than the investing firm where the investment yielded fruit, presumably all firms or industries which are deemed to be affected. The social rate of return, or conversion rate, thus expresses the overall impact of the investment. By contrast, a private rate of return is the impact on the investing firm only.
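The distinction between the two rates can be illustrated with a simple calculation. All figures are hypothetical, and the sketch assumes that the social rate counts the investing firm's own benefit together with the benefits accruing to other affected firms; studies differ in how they delimit the set of affected firms.

```python
def rate_of_return(annual_benefit, investment):
    """Annual benefit expressed as a share of the sum invested."""
    return annual_benefit / investment


# Hypothetical R&D project.
investment = 1_000_000.0
benefit_to_investor = 250_000.0
# Benefits accruing to other firms deemed to be affected (assumed values).
benefits_to_other_firms = [150_000.0, 100_000.0, 200_000.0]

private_rate = rate_of_return(benefit_to_investor, investment)
social_rate = rate_of_return(
    benefit_to_investor + sum(benefits_to_other_firms), investment
)

print(private_rate)  # 0.25
print(social_rate)   # 0.7
```

The gap between the two rates is the part of the return that 'leaks' out of the investing firm as a spillover.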
The significance of R&D spillovers has been shown in numerous studies. Concerning the 'leaking' aspect it has been shown (see e.g. Griliches 1992, Frischmann & Lemley 2006) that the social rate of return of an investment is predominantly and significantly larger than the private rate of return. Griliches (1992), among others, concludes that the 'feeding in' aspect and knowledge spillovers from R&D investment are both prevalent and important, and, potentially, a major source of endogenous growth.
Depending on the private rate of return and the productivity growth a firm can achieve with its own R&D investment and without it (drawing on the stock of general knowledge), the firm may decide not to invest in R&D or to invest in R&D at a higher level of economic entity. The firm could invest at industry level with a pre-competitive research budget pooled by several firms of that industry, for entirely economic reasons, simply acknowledging the reality of how knowledge spreads and yields fruit. In fact, investment from public funds in "basic research", as opposed to more specialized firm-financed R&D, can be seen as partially driven by such a rationale.
However, the aim of this section is not to explain how R&D investments should be made or measured, but to give an idea of how information externalities in general can be measured as a subset of knowledge spillovers on a national level. Methods for measuring information externalities are essential for public welfare analysis in the context of changes to copyright, because measuring externalities is an essential part of public welfare analysis. A government intervention of copyright alteration would be an intervention affecting information goods production and use on a national level and how information externalities from those goods contribute to national public welfare.
Certainly, many investment projects and other interventions such as publicly funded education or infrastructure projects are sources of increasing social returns and are evaluated and appraised on such a scale, but undoubtedly many questions remain as to how such a measurement could be conducted for a change in copyright. For example, information goods are frequently reused for productive purposes (Frischmann & Lemley 2006) and potentially any literate individual in a country, buying information goods or not, appears as a potential producer. How does heterogeneity of producers play in then?
Related to this, what is the influence of concentration of knowledge in cities, clusters, etc. and "proximity"?
How is wealth derived from information defined? Is it actual use, or the extent to which it is possible to use information, with wealth assumed to arise from the serendipity thereof and the generativity of the system as a whole?
Related to this, do negative information externalities exist, how are results possibly distorted through depreciation of other negative externalities and how does absorptive capacity factor in? How relevant is the interplay of information with other market goods and externalities and how do network externalities from the information itself and also from information technology play in? Human capital is often excluded or it is subsumed in the knowledge capital factor beside R&D expenditure and costs for research papers, patents, licenses, etc. So, what is the measurement concept, what type of residual parameter should be used and which input factors and cost and benefit value types should enter the production function to begin with (affecting what is reflected in the residual parameter)? What are appropriate value types to measure innovation outputs? What other factors could be a disincentive to actually use information for productive purposes? How are inter-temporal spillovers valuated? How can data be accumulated on a national level or on supranational level, e.g. in the economic area of the EU?
Whatever the approach, the fact stands that if otherwise unaccounted external effects not related to a particular intervention are assumed constant and TFP or another type of residual parameter used increases as a result of the intervention then the net effect of the intervention is positive and would be expressed as such in the residual parameter. "[P]roductivity at the aggregate level will increase if productivity in each constituent industry rises … (and so on, down the aggregation hierarchy)" (Hulten 2001), and productivity growth means increase in public welfare. TFP or a similar residual parameter specifically designed for measuring information externalities seems to be a useful tool to measure national and EU level effects on productivity growth as a result of a contemplated market intervention affecting information goods production and use such as changing copyright legislation.

Complementary market effects
Complementary markets (also called aftermarkets) are markets producing goods which are complementary to the goods produced in the primarily affected market, but are neither upstream nor downstream markets.
Complementary markets nonetheless often concern "downstream uses" and their conditions and benefits, but those uses are irrespective of the commercial relationship between the user of a good and the primarily affected market. They should therefore not be confused with "downstream markets", which imply a commercial relationship of the user with the primarily affected market and are distinct from complementary markets, which may imply a commercial relationship with the user but not with the primarily affected market.
They rather represent their own value system, but may share some customers with the primarily affected market. Due to the complementarity of the goods produced and customer overlap integration of businesses of both markets in one company has some attraction in certain constellations.
Complementary goods are, for example, a printer and its cartridge or DVDs and DVD players. Markets for hardware and software goods which facilitate use and production of information goods are in turn complementary to the information goods market.
In the wider sense, there could be pecuniary externalities in the case of inter-market substitution, but there are no pecuniary externalities from an upstream-downstream market relationship and there is normally no wealth transfer.
On the other hand, due to the complementarity of the goods produced in the two markets there is some correlation between the market activity in the primarily affected market and the complementary market, but this correlation can take different forms.
For example, if an intervention primarily affects the information goods market and subsequently its downstream markets and the production and use of information goods in general, it can also affect complementary information technology markets.
More production and use of knowledge in general leads to economic growth which is also constituted by increased economic activity in research and educational sectors as well as information technology industries.
Increased economic activity in a market can mean increased output from existing products and services as well as providing additional products and services. A market intervention such as changes to copyright leading to more information production and more use might induce increased economic activity in information technology markets which can facilitate information producers as well as information users in markets downstream to the primarily affected information goods market.
Increased information use without increased production of information goods, for instance due to an increase in positive non-pecuniary externalities deriving from the information goods market, might generate additional economic activity in complementary information technology markets that facilitate information users. The users so facilitated can include downstream market players, but the additional products and services, or the increased output from existing ones, arise from the greater facilitation potential created by increased non-pecuniary externalities, irrespective of the user's role as downstream market player. The development of information technology alone can also lead to increased economic activity in those markets, irrespective of developments in the information goods market.
If information goods markets and information technology markets are both affected by the same intervention, although there is no upstream-downstream relationship, effects in both markets can be seen as an inter-market substitution. While reallocation of budget as well as a shift in distribution of gains between both markets can be seen as the pecuniary aspect of the substitution effect, additional economic activity in the information technology market due to increased information use and increased non-pecuniary externalities derived from the information markets appears by itself as a non-pecuniary externality to the information markets. The substitution effect is a combination of pecuniary substitution with no effect on total output and overall wealth and yet additional non-pecuniary externality in the form of increased economic activity of the information technology market which represents increased total output and overall wealth.
Estimating an inter-market substitution net effect of this sort in an intervention appraisal would not only consider pecuniary substitution but also weigh the changes in output in the information markets against changes to non-pecuniary externalities including those from increased output in the information technology market.
Although non-pecuniary externalities are external effects, in the case of complementary market effects they may be partly detectable in market data of the complementary market. The interpretation of complementary market data as a non-pecuniary external effect of an intervention requires additional information about the proportion of business activity from information technology products and services related to the overall business activities of firms operating in the information technology market.
As a firm can produce goods from within businesses across different markets, businesses of a complementary market can be run by firms which also operate in a downstream market or in the primarily affected market.
For an intervention appraisal it might become difficult to get market data broken down on business level as firms may not be required to report results on a business unit level. Consequently, with market price data alone it may prove difficult to measure intervention impacts in complementary markets as distinct from impacts in downstream markets or the primarily affected market.
Estimating the proportions of business unit contribution to overall profit can help to tackle this measurement problem. Also non-market data and qualitative data about the businesses each company runs can be drawn in to get a better picture of the distinct effects in the primarily affected market, downstream market and complementary market.
Information technology markets contribute to an overall increase of output in the form of their own economic activity, but they also facilitate information use and thereby productivity growth. Such effects could not, at least, be detected in the information technology market data, but would be reflected, alongside other unaccounted effects deriving from intervention impact on the complementary information markets, in the TFP or similar residual parameter when measuring the total impact of a contemplated intervention.

Information and network externalities
Network externalities are mainly positive externalities and are based on network effects, which means that the value of a network grows more than proportionally with the number of nodes connected to the network.
Consequently, with an increasing number of network users the products or services constituting the network gain value and each user of that product or service gains utility.
"Positive network externalities of consumption are a kind of demand side network economics of scale" (Poon 2012).
Network effects in one market can indirectly also increase the value of a complementary market which in turn can increase the value of the market with the network effect. If two complementary markets are both markets with network effects they not only can increase each other's value mutually, but also network effects can multiply.
It follows that markets for goods with network externalities have distinct features with regard to competition and market structure. The value of infrastructure and the extent to which it is a source of network externalities also depend on technical standards and interoperability: the more interoperable a network or infrastructure is, the more value, additional network externalities and innovation it generates.

Standards can include web standards, format and data standards and metadata standards, but in a wider sense also standardised legal terms and legislation can be seen as standards increasing interoperability.
The most prominent example of a network good with a high level of interoperability, externalities and innovation potential is the Internet itself, but telecommunication infrastructure and often software applications are also sources of network externalities. The value of scientific journals, as a perpetual publication platform, depends on quantity and prestige of authors using the platform and can involve network effects. Different layers of software are also typical examples of complementary markets with inter-market network effects. The concept of network effects and network externalities can also be applied to networks of knowledge and information.
In the context of information goods, network externalities can derive from information use as well as from information technology. In the former sense the information itself is the network infrastructure and in the latter sense the information technology building the network or building on it represents the network infrastructure.
Information goods' network externalities are implicit in the concept of intangible infrastructure which Frischmann & Lemley (2006) use to explain the economic characteristic of information goods to possess significant positive externalities. Also Ramello (2002), among others, traces back developments in the information good market to network externalities "on the demand side".
Information good network externalities are enhanced through increased technical integration of information goods, through digitisation and linking together of literary works through computer networks. This implies an unprecedented potential value gain from integrating and linking information with decreasing granularity and using corpora of large numbers of individual literary works in integrated and federated ways.
While the Internet itself exists as information technology and infrastructure with massive positive externalities more specifically relevant for information goods are information technologies, which not only enable distribution and linking of documents but also processing of information. Markets for this latter type of information technology are a complementary market to the information good market.
Processing information often integrates information in one form or another, leading to information network effects which go beyond information externalities deriving from information goods as self-contained pieces of information. Those extra positive externalities can be assumed to additionally increase productivity on every level including the national level.
To show how such productivity increase through use of information technology occurs and how it can be measured, a firm-level change seems most suitable. The following example specifically illustrates the full impact of such effect in that it boils down the information goods market of information producer and information user (as a producer in a downstream market) to one party.

Poon (2012) uses a production function to show the impact on a firm of adoption of an Enterprise Resource Management (ERM) system. This type of information-processing technology is used to organise, categorise, produce and exchange information goods within the firm as well as among suppliers and customers of the firm to optimise its workflows and resource allocation. Using the ERM system adds value deriving from information network externalities to the existing information externalities through automation of processing and integration of information goods as well as standardisation. This value is added just by one firm adopting the system and increases with an increasing number of organizations adopting the system.
With the production function which takes account of the costs for the ERM system as an input factor and by keeping other variables constant a productivity increase resulting from information network externalities can be shown. The investment in the ERM system enables the firm to produce a higher output per unit of raw materials input in particular due to its capabilities to leverage information network effects through integration of information goods.
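The logic of holding other inputs constant can be sketched with a simple production function. The Cobb-Douglas form, the exponents and all input values below are assumptions made for illustration only; they are not the function or figures used by Poon (2012).

```python
def output(materials, labour, it_capital, a=1.0, alpha=0.5, beta=0.3, gamma=0.2):
    """Hypothetical Cobb-Douglas production function.

    `it_capital` stands for the investment in the ERM system as an input
    factor; `a` is a residual efficiency parameter held constant here.
    """
    return a * materials**alpha * labour**beta * it_capital**gamma


# Raw materials and labour are kept constant; only the IT input changes.
materials, labour = 100.0, 50.0
before = output(materials, labour, it_capital=1.0)
after = output(materials, labour, it_capital=4.0)

output_per_material_before = before / materials
output_per_material_after = after / materials
productivity_ratio = output_per_material_after / output_per_material_before

print(round(productivity_ratio, 4))
```

Because materials and labour do not change, the whole increase in output per unit of raw materials is attributable to the IT input. Under the assumed exponent the ratio equals 4 raised to the power 0.2.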
Focusing on the inter-firm adoption of such a system it can be assumed that some of the increase of productivity is promoted by the information technology network effects of the system itself. This example shows how increasing positive information externalities by adding information network externalities through use of information technology can lead to increased productivity.
The productivity growth here does not come solely from unknown sources because the firm itself is the producer, but the extent to which information network externalities add value is unknown and productivity growth can be expressed with a residual parameter in the production function.
Besides the productivity growth of the firm which bought the ERM system, the information technology vendor which sold the system increased its profits, and this impact of the project to adopt the system could be measured in the technology vendor market. There might also be non-pecuniary externalities as a result of the system adoption. At least theoretically, any positive effects not captured in the firm's or the technology vendor's accounting could be measured with methods similar to those commonly used for R&D impact. The purpose of the example, however, is not to show how specifically to measure network externalities, but to show that they exist and that they need to be taken into consideration when changing the market setting for information production and use, for instance by changing copyright law.
Changes to both information network effects and information technology network effects resulting from an intervention concerning information production and use can be detected, in their effect on productivity, on firm level, industry level or national level using regression-based approaches. The additional benefits from such externalities are captured by the user of the information (which in the example here is largely also the producer of the information) or leaked out as non-pecuniary externalities. Increased economic activity in the information technology market is a market effect complementary to the information goods market.

International trade
Not only are domestic markets and market externalities deriving from those markets within national borders relevant for national level public welfare analysis, but also how goods are traded between countries and how market external effects spill over national borders in both directions. Externalities leak out of the national economic system into other countries and externalities feed into production processes of potentially any person or firm or other organisation within the national economic system.
The key indicators for assessing the impact of a contemplated market intervention which affects the country as a whole are market price data on national performance as well as on imports and exports, social indicators which take into account more realistically market external effects in general and regression based methods to estimate the impact of an intervention on national level productivity using a residual parameter and taking into account externalities which are not accounted for otherwise. The focus is on those costs and benefits affecting the residents of the country where and for which the assessment is conducted.
The primary aim is still to maximise the national public welfare and not the public welfare of other countries. Therefore, the intervention option which leads to the most positive public welfare contribution on a national level would be selected.
International market price-based effects are represented by imports and exports. Exports are reflected in increased producer welfare in the exporting country while the consumer welfare - where pecuniary externalities are significant - accrues in the country to which the goods are exported. Along the same lines, imports are reflected in increased consumer welfare in the importing country while the producers benefit in the country from which the goods are imported.
Through foreign affairs and international agreements, countries seek to improve export conditions for businesses residing in the country as well as to improve their residents' private welfare by policies leading to lower prices for imported goods. Countries also seek to avoid negative market external effects spilling over into their territories or otherwise affecting their residents.
The overall balance of trade in a country is a record of all economic transactions between those in the country and those in other countries. It can be split by type of goods, for example into a separate balance for physical goods and one for immaterial goods and services. The immaterial goods and services balance records the value of a whole range of immaterial goods such as tourism, transport services, financial services, patents and licences. Export and import here do not mean shipping something over national borders, but refer, for example, to people living in one country paying for services in other countries, which would count as an import.
For example, a tourist paying for a tour guide abroad would count as an import of this service although it is rendered abroad. At the same time being abroad is not a necessary condition to consume a service rendered abroad. For instance, a service operated in another country and consumed via the Internet would also count as an import.

The overall balance of trade as well as its partial balances are used to get a picture of the quantities and types of goods a country exports and imports. If the value of the goods and services sold by companies in one country to other countries is greater than the value of all goods and services bought from other countries, then the trade balance is positive and the country is a net exporter. To the extent that public welfare is measured by indicators such as GDP, the value of this net export may be interpreted as a public welfare benefit. Alternatively or complementarily, other indicators and assessment models can be used.
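The split into partial balances can be illustrated with a short Python sketch. The category names and values are invented, and the function is a hypothetical helper, not part of any statistical standard.

```python
def trade_balance(transactions):
    """Return exports minus imports per category, plus the overall balance.

    Each transaction is a (category, direction, value) tuple, where
    direction is either "export" or "import".
    """
    balances = {}
    for category, direction, value in transactions:
        if direction not in ("export", "import"):
            raise ValueError(f"unknown direction: {direction}")
        sign = 1 if direction == "export" else -1
        balances[category] = balances.get(category, 0) + sign * value
    balances["overall"] = sum(balances.values())
    return balances


# Invented figures, in millions of an arbitrary currency
transactions = [
    ("physical goods", "export", 500),
    ("physical goods", "import", 450),
    ("services and licences", "export", 120),
    ("services and licences", "import", 200),  # e.g. a service consumed via the Internet
]

balances = trade_balance(transactions)
status = "net exporter" if balances["overall"] > 0 else "net importer"
print(balances, status)
```

In this made-up case the country is a net exporter of physical goods but a net importer overall, because payments for services and licences rendered abroad outweigh the surplus in goods.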

International spillovers
Also, externalities are first and foremost estimated with the focus on those costs and benefits affecting the residents of the country for which the assessment is conducted. For example, negative externalities affecting the country's residents rooted in environmental damage in a neighbouring country are weighed in, but negative externalities spreading out of the country would be considered in other ways but not weighed in the assessment. In other words, if an intervention would lead to changes across countries which represent an increase in the combined public welfare of all countries affected but, as a consequence of negative externalities spilling over from the other countries into the country where the intervention is contemplated, would effectively decrease national public welfare, then the intervention would not be worth pursuing.
Similarly, if a positive externality spills over into another country at the cost of the national economy, the spillover is not considered a benefit weighing in against the cost at national level, because the spillover is not to the benefit of the residents of that country.
If the external effects crossing national borders become predominant international agreements may yet be a better option than pursuing an intervention option just based on weighing the effects on national residents: "The failures of unilateralism often suggest that the problem at issue is closely linked to externalities created by political, economic or other forms of interdependence. In other words, when behavior by one State negatively impacts the welfare of another, the potential for conflict increases … When the benefits of mutual cooperation outweigh the privileges associated with sovereign discretion … States will generally seek, and respond favorably to multilateral solutions. Thus, international agreements typically offer benefits that in the aggregate are considered welfare-improving for members over the alternative of an unregulated domain" (Hugenholtz & Okediji 2008).
However, the public welfare rationale, and the way costs, benefits and interests of different stakeholders are weighed against each other, continues on the international stage. In fact, the international market external impacts of an intervention are still estimated where it is feasible and reasonable, and data about those effects together with the data from the national public welfare assessment are also the basis for international trade agreements. Therefore, national public welfare assessment is still the first port of call for many types of contemplated interventions.

Externalities in general gain recognition in public welfare assessments. Assessments of market intervention concerning information goods production and use, and the positive externalities of knowledge spilling into the country for which public welfare is assessed, become increasingly important. Some economists extend the earlier introduced R&D spillover measurement studies to world level and propose that knowledge spillovers in general are the key to sustained economic growth: "The more flows of knowledge within and across national borders via Internet, the more knowledge stock is enhanced and the more productivity-enhancing new knowledge-based services it can develop, permitting sustained endogenous economic growth in a virtuous spiral" (Yoon 2004).
Paul Romer in 1994 in "The Origins of Endogenous Growth" in the context of international knowledge transfer asked the famous question: "In a developing country like the Philippines, what are the best institutional arrangements for gaining access to the knowledge that already exists in the rest of the world?". Today, with information technology advancing, information externalities gaining importance as a source of growth and other factors of growth further depleting, the famous question can be asked for developed countries as well, with copyright laws being part of the institutional arrangements.

Copyright's public welfare objective
Copyright is legislation applying to information goods. Historically, copyright laws were justified by natural law theories as well as utilitarian theory. Natural law theories justify copyright laws as a means to enable authors to monetize their work as a result of their labour and to protect the authors' personality embodied in the work. Utilitarian theory holds up the objective of promotion of public welfare as a justification. Utilitarian theory based justification is mostly associated with US copyright law where the advancement of public welfare as the primary goal is deep-seated in the Constitution, copyright legislation and case law.
But also the legal tradition of the common law countries and EU copyright law explicitly recognize the goal that the grant of rights over the fruits of creative enterprise "is directed first and foremost at the promotion of the public interest" as their foundational element (Hugenholtz & Okediji 2008).
Natural law theory is reflected in countries in Continental Europe more than in the US or common law countries, but the overriding public welfare objective at EU level is highly visible nonetheless. Besides the fact that it is common practice to apply welfare economics to assess EU policy measures, Koelman (2004), specifically referring to EU copyright law, points out that its recitals are "reminiscent of the Anglo-Saxon perspective on copyright. The author has to be able to recoup his investments, because that is to the advantage of all (i.e. of social welfare)." Previously, the EU's leitmotif was the "Four Freedoms" (free movement of goods, people, services and capital). In 2008 it added a "Fifth Freedom" which, among other goals, aims to remove barriers to the free movement of knowledge in the EU and constitutes a principal welfare objective (Reichman & Okediji 2009). Public welfare objectives of copyright laws are also recognized by international agreements on copyright such as the TRIPS agreement of which EU member states are signatories.
Beyond this there are technical and economic reasons why governments, including the EU, resort to interest balancing - which effectively is a public welfare approach. One technical reason is that, due to the public good and zero marginal cost properties of information goods and because copying occurs by default mainly for technical reasons, natural law theories have been rendered unworkable as the basis for a working copyright system.
An economic reason, particularly in developed countries, is that knowledge and information have become of paramount importance as a driver for economic growth and national competitiveness and are an essential factor in international economic policies. Therefore, copyright laws are increasingly viewed through the lenses of wider economic and innovation policy goals with increased focus on welfare economics and copyright's social welfare function.
It could be added that due to the omnipresence of the copying of information, consumer law, competition law and fundamental rights objectives gain recognition as well and increasingly need to be considered when assessing copyright.
As a result of those developments, copyright has moved to the centre stage of government policy, shifting the focus in justification of copyright laws further away from just enabling remuneration of rights holders toward public welfare goals. Remuneration of copyright holders and copyright protection are a means to the end and must give way if exclusive control over works would decrease public welfare. Copyright legislation today is market intervention with the overriding objective of maximising public welfare.

What is text mining
Often, text mining (or text analytics) involves the copying of copyrighted literary works.
Historically, the discipline of text analytics started to develop in the government intelligence sector in the 1940s as elaborate methods to systematically identify distinct elements of text across larger numbers of text documents such as transcripts and news articles, in the beginning manually with pen and paper. The methods involved identifying those elements based on certain criteria, interpreting them along certain criteria, weighing them according to scaling systems and drawing up reports based on weight counts and other statistical methods. From such reports conclusions could be drawn confirming or defeating a thesis or suspicion, or the reports could serve project aims in other ways.
This process became increasingly automated leading to distinct research methods called computational text analytics or text mining which can be subsumed under computational research.
Text mining is a type of human language technology. Human language technologies include technologies for speech-recognition, speech-text-speech conversion, transcription, translation, subtitling, information retrieval, search, information extraction, annotation, text categorisation, summarisation and language modelling, among others. The common theme of those technologies is that they are sophisticated information technologies, essentially automating the processing of human language.
Text mining is a type of human language technology specifically applied to natural language text documents. It is sometimes subsumed under a broader category of "text and data mining" (TDM) which also includes data mining applied to fully structured data such as empirical alphanumeric data sets captured in a table or generic database. In the context of some areas of research, this broader category reflects the increasing importance of data sets as a means to communicate, integrate and leverage research results.
The term "text mining" is specifically used for natural language texts. Sometimes it is used for relatively simple processing types such as cross-corpora automated Boolean queries or word frequency analysis, but typically, at least in the context of research text mining, it refers to more complex linguistic analysis and knowledge representation modelling building on a set of information processing technologies, most importantly natural language processing and named-entity recognition and extraction, data mining and semantic annotation. A typical text mining process can comprise different stages which can be combined into a single workflow as shown in Figure 2. Text mining thereby involves identifying elements and grammatical structure in text documents such as terms, entities, facts, assertions and other linguistic patterns. It seeks to build a machine-readable knowledge representation of the texts mostly based on predefined models comprising concepts and relationship types.
For this, text mining typically involves automatically extracting structured information from the natural language documents based on natural language processing and thereby turning 'unstructured' documents (unstructured with reference to the level of granularity of structure used for machine processing) into machine-'readable' documents.
It thereby tries to capture the meaning of the expression and can deal with phenomena such as synonyms, acronyms, dialects and slang which represent a challenge for automated language processing. Vocabularies of different granularity and structure, including ontologies, can be used as the predefined model but can also be derived and improved by means of text mining (and then be used as the predefined model for other text mining projects).
Text mining basically turns the natural language documents into a database. It generates a formal representation, although, of course, not all meaning is represented and made machine-readable, but only those elements of the original expression in the text documents which are captured in the extracted structured data.
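How natural language documents become rows of structured data can be shown with a deliberately simplified Python sketch. It assumes a tiny, hypothetical predefined vocabulary that maps synonyms and acronyms to concepts and records which concepts co-occur in a sentence. Real text mining systems rely on far richer natural language processing than this dictionary lookup.

```python
import re

# Hypothetical predefined model: surface forms (synonyms, acronyms) -> concept
VOCABULARY = {
    "aspirin": "Aspirin",
    "acetylsalicylic acid": "Aspirin",
    "asa": "Aspirin",
    "heart attack": "MyocardialInfarction",
    "myocardial infarction": "MyocardialInfarction",
    "mi": "MyocardialInfarction",
}


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def find_concepts(sentence):
    """Return the distinct concepts whose surface forms occur in the sentence."""
    lowered = sentence.lower()
    found = []
    for surface, concept in VOCABULARY.items():
        if re.search(r"\b" + re.escape(surface) + r"\b", lowered):
            if concept not in found:
                found.append(concept)
    return found


def mine(documents):
    """Turn documents into rows of (doc_id, sentence_no, concept_a, concept_b)."""
    rows = []
    for doc_id, text in documents.items():
        for n, sentence in enumerate(split_sentences(text)):
            concepts = sorted(find_concepts(sentence))
            for i, a in enumerate(concepts):
                for b in concepts[i + 1:]:
                    rows.append((doc_id, n, a, b))
    return rows


# Invented example sentences
documents = {
    "doc1": "ASA lowers the risk of heart attack. The trial ran for two years.",
    "doc2": "Acetylsalicylic acid was given after myocardial infarction.",
}

rows = mine(documents)
for row in rows:
    print(row)
```

Both documents produce the same concept pair although they share none of the relevant words. Only the co-occurrence is captured, not the full meaning of either sentence, which matches the limitation noted above.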
Annotation is used throughout the later steps of the process but can also be used for semantic annotation according to a widely used annotation system, enabling linking and integration of text documents on granularity levels down to word level (i.e. concept level) across large numbers of documents and corpora of documents. In this sense text mining is a semantic technology and contributes to what is called Linked Data. If the linking and integration is aimed to take place on an Internet scale, text mining in this sense is also contributing to, and arguably is an essential part of, global scale Linked Data and the Semantic Web and its applications.
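Semantic annotation down to word level can be sketched as follows. The concept and predicate URIs under example.org are hypothetical placeholders; a real project would use identifiers from a widely shared vocabulary. The sketch assumes that simple case-insensitive matching is enough to locate a concept.

```python
import re

# Hypothetical concept identifiers, for illustration only
CONCEPT_URIS = {
    "copyright": "http://example.org/concept/Copyright",
    "text mining": "http://example.org/concept/TextMining",
}
REFERS_TO = "http://example.org/vocab/refersTo"  # hypothetical predicate


def annotate(doc_uri, text):
    """Return (subject, predicate, object) triples linking text spans to concepts."""
    triples = []
    for term, concept_uri in CONCEPT_URIS.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, text, re.IGNORECASE):
            # Identify the annotated span by its character offsets in the document
            span_uri = f"{doc_uri}#char={match.start()},{match.end()}"
            triples.append((span_uri, REFERS_TO, concept_uri))
    return triples


first = annotate("http://example.org/doc/1", "Text mining and copyright.")
second = annotate("http://example.org/doc/2", "Copyright limits reuse.")

# Documents are linked wherever their spans point to the same concept
shared = {t[2] for t in first} & {t[2] for t in second}
print(shared)
```

Because both documents point to the same concept identifier, they can be integrated at concept level without comparing their wording, which is the basic mechanism behind Linked Data.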
Based on the formal representation of text documents, a number of different applications can build on text mining including visualisation applications. Hence, the arrangement of the text mining process and the applications realized with it represent a range of applications varying by type of content text-mined and the type of text mining user.
The term "research text mining" for the purpose of this article is defined as the use of text mining in scientific research or as 'scientific-purpose' text mining. Research text mining in the humanities may build on historic and fiction literature while a legal researcher could focus on patents and a researcher in biomedicine might focus on existing biomedical scholarly literature. Often part of the text mining process includes copying of natural language documents which are subject to copyright laws - as shown in […]
The rendering of a text mining service can be delivered as a standard service via the Internet, e.g. with a standard web interface, irrespective of the location of the user and of the service personnel, but can also take the form of a bespoke solution and involve preparatory case-by-case technical assistance which could involve face-to-face consultations.
Text mining services thereby can come in a range of different forms and can be bundled with other services and content-based products. For example, a standard service delivered remotely can be solely a website available to anyone capable of using the Internet. A more customised service can be adjusted for specific needs for end users within an organisation and a corpus of text content the organisation has exclusive access to. A highly bespoke solution can be a combination of a software service combined with personal assistance for technical implementation or just a one-off analysis of a particular corpus of text material.

Benefits
The majority of researchers using text mining technology relies on services offered by the organisation they are affiliated with or other service businesses. In some domains substantial resources are spent by the organisations themselves for developing text mining based services. At the same time there is a drastic growth globally in the number of researchers and number of published research articles as well as natural language text in general.
It is widely acknowledged that it has become extremely difficult for researchers to keep up with new publications in their domain.
Fraser & Dunstan (2010)  Text mining is also unprecedented in its ability to find unsuspected knowledge. TDM has already enabled new medical discoveries through linking existing drugs with new medical applications, and uncovering previously unsuspected linkages between proteins, genes, pathways and diseases (Hargreaves 2011b).
In the humanities, the application of text mining to literary texts published over the centuries allows new questions to be pursued: "One example of text mining is research that compares the frequency with which authors used "is" to refer to the United States rather than "are" over time" (Baer 2012). Implicit in this, it enables discoveries human effort alone could not achieve due to information 'overload' or overlook and "filter failure" (Shirky 2008).
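The "is" versus "are" comparison quoted above is simple enough to sketch in Python. The corpus below consists of invented sentences, and the grouping by decade is an assumption about how such a study might aggregate its counts.

```python
import re
from collections import Counter

# Matches "the United States is" or "the United States are", case-insensitively
PATTERN = re.compile(r"\bthe United States (is|are)\b", re.IGNORECASE)


def verb_counts_by_decade(corpus):
    """Count 'is' and 'are' after 'the United States', grouped by decade."""
    counts = {}
    for year, text in corpus:
        decade = year // 10 * 10
        bucket = counts.setdefault(decade, Counter())
        for verb in PATTERN.findall(text):
            bucket[verb.lower()] += 1
    return counts


# Invented sentences standing in for dated literary works
corpus = [
    (1795, "The United States are resolved to remain neutral."),
    (1798, "The United States are at peace."),
    (1902, "The United States is a great power. The United States is growing."),
]

counts = verb_counts_by_decade(corpus)
for decade in sorted(counts):
    print(decade, dict(counts[decade]))
```

Applied to a corpus spanning centuries, the same few lines answer a question that would be impractical to pursue by reading alone. They also need the full text of every work in the corpus.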
Therefore, text mining technologies' potential lies in solving essential problems, increasing quality and quantity of research results, making the research process more efficient, cost-effective and productive and creating new value.
The beneficiaries are the researchers themselves, the organisations they are working for, other businesses offering text mining services as well as the public at large since research with the help of text mining technologies can deliver more input into production processes in other industries and additional overall social and economic benefits. Those benefits include an increased level of innovation, productivity growth and economic activity.  While the production of the actual research results represents the mammoth share of the total costs, the time spent by researchers searching constitutes costs at a similar scale as the costs for the whole science publishing industry combined.
As the UK Government (2012) notes, "costs to British researchers from the time spent looking for the right material to read form a significant proportion of the overall cost of research. Reducing those costs would free researchers' resources for other uses." Also the 20% of time spent for reading as presented in Figure 4 and the associated costs would be an area where efficiency gains would be of considerable value. TDM could reduce human reading time by 80% and could increase efficiencies in managing both small and big data by 50% (McDonald et al. 2012).
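As a rough illustration of the order of magnitude, the two figures above can be combined. This assumes, purely for the sake of the arithmetic, that the 80% reduction applies to the whole 20% reading share; the figures come from different sources, so the result is an upper-bound sketch rather than an estimate.

```latex
\underbrace{0.20}_{\text{share of time spent reading}}
\times
\underbrace{0.80}_{\text{reduction in reading time}}
= 0.16
```

Under that assumption up to 16% of researcher time would be freed for other uses.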
Hence, a wider use of text mining technology and promoting its adoption and development has huge potential to realize productivity gains by contributing to such cost reductions which would translate into increased public welfare.
"The potential of TDM technology is enormous. If encouraged, we believe TDM will within a small number of years be an everyday tool used for the discovery of knowledge, and will create significant benefits for industry, citizens and governments" (Kelly et al. 2013). One aspect of the public welfare increase potential is the increased economic activity (or its emergence to begin with) of the research text mining service market.

Market
The research text mining market comprises businesses which facilitate the use of text mining technologies for research purposes. These include businesses which provide text mining technology services to other organisations or individuals as their main business, as well as businesses which provide such services to other business units within the same organisation. Typical cases where the service is rendered in-house are research intensive companies such as pharmaceutical companies as well as universities and publishing houses. Publishers also increasingly diversify into the text mining facilitation business.
While text mining technology and software components are sometimes sold as a one-off-purchase product without providing a service, the predominant form of facilitating the use of text mining technologies is an ongoing business relationship of aiding the technology user. The text mining market is constituted mainly of use licenses on proprietary technology with a major service component and throughout servicing businesses which includes software-as-a-service business types.
Such services are offered as stand-alone services as well as integrated in larger service packs and solutions. The text mining market overlaps with enterprise and web search markets as well as knowledge management systems and business intelligence markets. It is also to some extent converging with those markets. For example, search technology services increasingly use text mining technology to complement keyword based search technology with semantic search technology.
The volume of the overall global text mining market has been estimated at between $400mn and $1.2bn, with growth rates of at least 25%. Such estimates are of limited accuracy because text mining products and services are often part of a larger package or solution, and because text mining activities integrated within service providers' wider business portfolios, or provided in-house rather than to other organisations, are not reported separately.
Some explicit estimates are limited to the technology vendors' turnover and would be more limited still. For the purpose of this article and to provide an idea of the dimensions, the total global value of text mining businesses is assumed to be around $1bn.
While the total text mining market can be partitioned in verticals like health care, fraud detection, law enforcement, customer relationship management, etc., science and medicine as well as the publishing industry as a whole can be seen as such verticals.
While estimating the size of the global research text mining market would go beyond what can be covered in this article, it is known that there are over a hundred companies in all sizes which in one way or the other offer text mining products or services specifically targeting researchers and research intensive organisations. The total Science, Technology and Medicine (STM) text mining sales value was estimated at $40mn (Bousfield 2010) with similar growth rates as the overall text mining market.
The value of the research text mining market would go beyond the sales value and beyond text mining services for STM literature.
It is assumed that the total value is more than $60mn by now -which compares to about $90bn resource costs for searching, reading and publishing and distribution in the global communications system as presented in Figure 4 (CEPA 2008).
To realize the full benefits of text mining, the full text is needed. Text mining users and services facilitating research uses depend partly on access and legal conditions for use of copyrighted works. Increasingly, copyrighted works are not obtained by a customer by copying them onto the customer's server, but through an access license and web interface to the publisher's server, with downloading of files taking place only on an individual basis, e.g. in the form of PDF files. Since text mining needs the full text of large quantities of copyrighted works, publishers as copyright holders, if they are not providing the text mining service themselves, receive text mining "requests". In such cases the use of copyrighted works for text mining can be facilitated through mass-downloading of self-contained files such as PDF files, through crawling the publishers' websites or through APIs (Application Programming Interfaces). Smit & Van der Graaf (2011), specifically with reference to science publishers, found that such requests come from corporate customers, individual researchers and academic groups, information processing intermediaries such as abstracting and indexing services as well as from other providers of information products and services.
The extent to which full-text literary works, research articles and other works can be used depends on legal and access conditions determined by copyright law as well as on the market structure and power relations between copyright-holding publishing intermediaries, downstream information market customers and text mining service businesses.
"A more open market for the development of analytical technologies has potential to offer opportunities to new and existing businesses, with potential knock-on impacts for growth" (UK Government 2012). What market openness means and how businesses and society as a whole would be affected by this will be explored in this article.

Players
Text mining services basically depend on three things: software capabilities; access to literary works and other natural language text documents; and, for copyrighted material, legal conditions to apply text mining software to the material.
With this, the extent to which research text mining services can be offered depends not only on the core asset software and technical expertise the text mining service provider can offer, but also on a complex environment of interlaced conditions of access to and rights to use natural language text documents, which can be broken down to three types of players. Types of players in this context, although with a helicopter view of contribution of those players to public welfare, can be largely defined by the type of business they operate and the business interests they have with regard to the potential benefits text mining technologies represent. The types of players therefore are not a template for companies and other organisations as an organisation can have stakes in the 'game' in the role of more than one player.
The first type of player is a user who wants to apply text mining technology to natural text documents.
If a firm or university for example wants to use the technology within its intranet so that a larger number of its employees (or members or patrons) can effectively use it, it would still be the company which is the primary user of the technology. The second type of player is a text mining service business facilitating users. The third type of player is a copyright-based intermediary business, typically a publishing house, in the sense that its main business is to trade pieces of natural language documents as information goods relying on the excludability of those documents or corpora of documents conferred through copyright law.
The fact that the types of players are not a template for organisations can mean that a company appears in the role of a user as well as text mining service which it renders in-house to itself. Equally a copyright-based business can render text mining services. If a text mining service is operating mainly as its own business, i.e. appears as outsourced for the user and copyright-business, then it is sometimes called a text mining vendor.
Beyond the three players it is also relevant for the consideration of the business environment that natural text documents, in many cases also of high relevance for research text mining, are publicly available through open licenses or implied licenses and for some documents copyright does not apply at all.
Users who use copyrighted material appear as downstream markets to the literary works information goods market which would be the primarily affected market of a government intervention changing copyright. Text mining service businesses to the extent they render their service for copyrighted material appear as complementary markets to the literary works information goods market. Copyright law thereby becomes a determinant to which extent the potential public welfare gains of text mining technologies can be realized. How copyright determines the legal conditions as well as access conditions for use and development of text mining technologies will be explained in the following, after explaining the underlying public welfare rationale of copyright law.

Origin
With the invention of the printing press, reproduction and distribution of authored literary works became a lucrative business - at the beginning without government intervention. Systems of granting printing privileges were utilized in Europe from 1469. The first legislation applying to literary works started to be adopted in the 17th century in the book trade in the form of patents, but that was expensive and complicated. Around the same time internal trade conventions were used by printers and book sellers to secure and increase their return on investment. "You needed to be a member of a local guild and there was a rule about reserving particular titles to particular individuals so that their investment in producing an impression wouldn't be blown away by the first re-printer coming along. This is done by registration and for a small fee. These conventions included private enforcement done by printers and booksellers, and were not valid beyond the local trade community ... The first copyright law which was invented in 1710 in England was a result of 20 years of lobbying by booksellers and printers with the aim to exclude and outlaw people they labelled as pirates" (Brito 2010).
As such, "the genesis of modern copyright law was trade - a commercial battle between booksellers in eighteenth-century England over proprietary rights in manuscripts … With the promulgation of the world's first copyright statute, the Statute of Anne of 1710, the privilege was codified and given to authors, instead of publishers. …" (Okediji 2000). Today similar copyright laws are adopted in most countries.

Scope of protection
Copyright is a type of intellectual property (IP) and it applies to creative works including literary works. Today literary works under copyright law not only comprise books but most types of authored texts of sufficient length and even computer programs. By means of copyright laws, governments confer to authors exclusive rights to empower them to exclude others from using their works.
Copyright is a bundle of exclusive primary and subsidiary economic rights and moral rights. The scope of copyright protection is further confined to protected subject matter and by copyright 'limitations and exceptions' (short: copyright limitations).
The protected subject matter does not comprise ideas and facts, but is limited to the "creative expression", i.e. the individual way an author presents his intellectual output. This "idea-expression" distinction guarantees "subsequent authors the necessary breathing space to make their own contributions by adding to, re-using, or re-interpreting, the facts and ideas embodied in the original work" (Sag 2009). Copyright protection is often described in three dimensions: the period of time, the scope and the level of enforcement of existing laws. In general, the term 'scope' in this context is not used consistently, but in this article refers to the subject matter which is defined as protected under the law and uses of protected subject matter which are not exempted from copyright protection by some form of copyright limitation.

Copyright limitations
Copyright limitations "accommodate more specifically a variety of cultural, social, informational, economic and political needs and purposes" (Hugenholtz & Senftleben 2011) by exempting particular types of uses from copyright protection. If a type of use is exempted then a user does not need to obtain permission from the copyright holder for this type of use. Copyright limitations have evolved together with the copyright system as a whole.
There are basically two approaches for copyright limitations in the law. Both approaches can be combined.
One approach is an exhaustive list of designated limitations. Each designated limitation explicitly describes a specific type of use which is exempted. This approach is currently in place in the EU and will be further explained in later sections.
The other approach is a more open-ended type of limitation where the question whether or not a specific type of use is exempted from copyright is addressed with a set of normative factors. This latter approach is typically codified in the law with a fair use provision. The fair use factors are not exhaustive and do not have to be considered in a cumulative way. Often the provided factors concern the type of work, the type of use, the extent of use, and the impact the use presumably has on the market of the original work. Beyond the factors, a fair use provision outlines exemplary uses, giving more indication of the cases in which a use would be exempted from copyright.
Fair use is more flexible than designated limitations in accommodating new types of uses in the law. On the other hand, defending a fair use claim in court can be expensive, which favours corporations with considerable financial resources. Fair use has also been criticised for a lack of legal predictability, but others found that fair use in fact is predictable (Sag 2012) and that, for example, copyright in the EU is far less predictable than commonly assumed (Hugenholtz & Senftleben 2011). Fair use court decisions from one country can often be used in other fair use countries, which increases overall predictability.
Fair dealing and fair use have historically the same roots in the former British Imperial territories. In the context of this article the most relevant types of copyright limitations are limitations for text and data mining/analytics, for search engine uses, generally for uses with scientific purpose, limitations for temporary copying (including caching and transient or incidental copying) and to a certain extent limitations on copyright for libraries.

National law
Copyright is national law. It automatically grants copyrights to authors depending on where the work is created, and it applies to users and is enforced in the country where the work is used.
National constitutional law can include language which can justify copyright laws to be anchored in legislation. Most notably, the US Constitution authorizes the US Congress to "promote the progress of science and useful arts, by securing for limited times to authors and inventors the exclusive right to their respective writings and discoveries" (USA 1787).
The significance of case law, also known as common law, concerning copyright varies and depends on the type of legal system a country has adopted. In the US, case law is given more power and flexibility to determine how copyright law is interpreted. This can effectively lead to changes in how law is applied which, in countries with a civil law approach, could be achieved only through parliamentary process and changes to legislation. Accordingly, changes to copyright law in this sense are slower in civil law countries. Most EU member states operate a civil law system. In case law systems, court cases are more frequent as well as more important as a reference for users of works to anticipate whether a specific type of use would be exempted from copyright. The fair use provision in copyright law is typically associated with a case law system.
While common law and civil law systems function differently, the arguments brought up for a specific type of conflict of interest in principle follow the same logic in both systems.

International and supranational law
The potential conflict of author rights granted in one country with the use rights given by copyright law in another country is mitigated by aligning national copyright laws through international agreements. Such agreements and conventions are mainly about ensuring equal protection of copyright holders against potentially infringing uses in other countries.
Historically, for example, France, which had a strong interest in the protection of its own works abroad, was in conflict with Belgium, whose printing industry had specialized in reprinting French works. Von Lewinski (1999) recounts how this conflict was solved: France employed its general economic superiority by arranging a packaged deal. France was ready to sign a general trade agreement, which would bring about advantages for Belgium in areas of trade other than intellectual property, only under the condition that Belgium would agree to reciprocal protection of French literary property. The interests of other Belgian industries prevailed over those of the Belgian reprinting industry. The agreement was signed.
Bilateral agreements have evolved into larger regional and near-global international agreements constituting today's international copyright law, with which national copyright laws have to comply. In this sense, in contrast to national law which applies to citizens of a State, international law primarily applies to the States. It also differs in that it applies only if a State has expressly consented, as opposed to national law which applies to a State's citizens by default. Another EU level agreement worth mentioning is the EU Database Directive which confers protection rights to database creators. A "database" is defined as distinct from just a collection of individual works made available online or otherwise in that additional investment is made in its creation separate from the production of the works themselves and that this investment is not made by routine (Reichman & Okediji 2012). While this Directive may apply to results from text-mining literary works it appears not to apply to the use of literary works as such, because the investments made in repositories and 'databases' containing the literary works are made by routine.
The Berne Convention, WCT and TRIPS agreements have been signed by nearly all countries worldwide and are less restrictive than the EU Copyright Directive. The EU law also stands out through a higher level of standardisation of national laws among its member states.
International as well as supranational EU copyright law set minimum standards for copyright protection such as a minimum copyright scope. Those standards include provisions on protectable subject matter and "outer boundaries" (Reichman & Okediji 2012) to the extent copyright limitations are recognized and permitted to be implemented by their signatories and member states respectively.
The EU approach to copyright limitations, as well as the approach of its member states, is to have a set number of designated copyright limitations. The Copyright Directive lists one mandatory limitation; all other limitations listed are optional to be implemented in national law. Additionally, the Directive has included the so-called Three-step test, because this Test represents an outer boundary to copyright limitations under the international TRIPS and WCT Agreements with which the EU has to comply.
The Three-step test is a normative set of rules similar to fair use and has become the outer boundary for copyright limitations on a near global scale as it is included in slight variations in all the major international copyright agreements. In the TRIPS Agreement, for example, Article 9 obliges WTO members to comply with the Berne Convention (except the section on moral rights), and Article 13 sets out the Test, requiring that members "shall confine limitations or exceptions to exclusive rights to certain special cases which do not conflict with a normal exploitation of the work and do not unreasonably prejudice the legitimate interests of the right holder" (WTO 1994).

The Test's interpretation as it is applied is often quite restrictive, which is somewhat contrary to the intentions leading to its inclusion in the respective agreements. Its intention is to leave countries space to shape national law according to their specific interests and also to create new copyright limitations in response to technological developments (see e.g. Reichman & Okediji 2012).
Besides a generally stronger emphasis on economic objectives at EU level compared to national level, EU level copyright law is also distinct from many national copyright laws in that it does not oblige adherence to the idea-expression principle: It does "not contain provisions on the subject matter" of copyrights (Koelman 2004).

Flanking rights, anti-circumvention and contract law
Historically, the reproduction right which is part of the bundle of rights conferred by copyright law has been added to the bundle as a "flanking right" (Koelman 2004) to the distribution right. It provides an additional protective mechanism.
The reproduction of information goods as such does not yet harm the producer of an information good who wants to sell the good, i.e. reprinting a book and storing the reprints in a warehouse without distributing them does not harm the commercial interest of the seller of the original prints.
Disallowing the reproduction as such though prevents the activity preceding the distribution and therefore gives the seller an additional legal device to prevent unauthorized distribution by preventing the preceding activity.
Along the same lines, anti-circumvention law is an additional protective mechanism for producers, sellers and other intermediaries in the information goods market to prevent copying. Anti-circumvention laws were meant to adjust copyright to the new technical condition of mainstream adoption of computers and computer networks for production, dissemination and use of information goods by complementing copyright laws.
They complement copyright law by protecting technical protection measures (TPMs) and by determining the legality of using certain technologies and practices to circumvent TPMs, from the producer-side perspective as well as from the user-side perspective.
For example, TPMs can be used to prevent the use of software to automate and thereby speed up the otherwise manual process of copying information goods through computer networks. If a TPM is circumvented by a user to copy the good anyway, then this use may well be unlawful under anti-circumvention law even if the use would be lawful under copyright law.
In legal terms, anti-circumvention laws represent a "nexus" between law applying to technology and law applying to information goods. TPM protection by anti-circumvention law has been interpreted as a means of copyright enforcement (Westkamp 2011).
In most cases they represent an extension of the scope of copyright protection in that they can be used to prevent uses through computer networks of subject matter not protected under copyright law as well as uses exempted by copyright limitations.

In such cases anti-circumvention laws overrule copyright law. Computer code has been described as a form of law to highlight this issue. Copyright limitations are not overruled by anti-circumvention law in relatively few countries (e.g. in Norway and Australia).
In the same vein, contract law is interlaced with copyright law in that it governs, through end user contracts, market transactions between a producer-side seller of an information good and a buyer (who is not necessarily the user, e.g. in cases of institutional purchases).
It can be used to impose conditions upon the transaction, such as conditions limiting the users' freedom to use subject matter not protected under copyright law as well as uses exempted by copyright limitations. If the interlacing of the two legal systems is of that kind, then contract law too overrules copyright and represents an extension of the scope of copyright protection.
"The emergence of technological protection mechanisms (TPMs), often reinforced by one-sided contractual provisions, have enabled copyright owners to exercise an unprecedented level of control over both the access to and the utilization of creative works worldwide" (Hugenholtz & Okediji 2008).
The most important laws on national and supranational level where anti-circumvention laws are inscribed are the US' Digital Millennium Copyright Act (DMCA) and the EU's InfoSoc Directive.

Text mining under copyright law
Applying text-mining technologies to literary works is not explicitly considered in international copyright law but is increasingly gaining attention on a national level. The focus in this article is on how copyright law applies to text-mining uses of literary works and not on how copyright law applies to text mining software or to text mining results. Basically, there are two ways in which text mining is treated under the law today: either text mining uses are exempted from copyright or they are not.
If they are exempted, this happens in one of three ways: through a designated copyright limitation specifically for text mining; through another type of designated exemption defined in a way that text mining falls within it; or through the fair use copyright limitation, subject to case precedent and the evolving interpretation of text mining uses as being "fair". Exemptions can be further differentiated into exemptions which apply to text mining uses only if they are for research uses, i.e. with research purpose, as opposed to all uses. Exemptions can also be differentiated into exemptions which apply to text mining uses only if they are for non-commercial uses as opposed to all uses. Such distinctions would primarily decrease the number of users who could exercise such an exemption.
Also, the extent to which contract law or anti-circumvention law can overrule copyright limitations determines the extent to which an existing exemption can actually be exercised. With this, types of legal treatment can be further differentiated with regard to the extent and types of use cases and users for which text mining is de facto exempted.
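The differentiation described above can be read as a small decision procedure. The following Python sketch is only an illustration under simplifying assumptions: the class names `Exemption` and `Use`, their fields and the function `de_facto_exempt` are hypothetical, and real legal assessment is of course not reducible to four boolean conditions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exemption:
    """Hypothetical, simplified model of a text mining copyright limitation."""
    research_only: bool          # applies only to uses with research purpose
    non_commercial_only: bool    # applies only to non-commercial uses
    contract_can_override: bool  # contract terms may exclude the exempted use
    tpm_can_override: bool       # anti-circumvention law may block the use


@dataclass(frozen=True)
class Use:
    """Hypothetical description of a single text mining use."""
    research_purpose: bool
    commercial: bool
    contract_forbids_mining: bool
    needs_tpm_circumvention: bool


def de_facto_exempt(exemption: Exemption, use: Use) -> bool:
    """Return True if the use can actually be exercised under the exemption."""
    if exemption.research_only and not use.research_purpose:
        return False
    if exemption.non_commercial_only and use.commercial:
        return False
    if exemption.contract_can_override and use.contract_forbids_mining:
        return False
    if exemption.tpm_can_override and use.needs_tpm_circumvention:
        return False
    return True


# A narrow exemption versus a broad one that cannot be overruled.
narrow = Exemption(research_only=True, non_commercial_only=True,
                   contract_can_override=True, tpm_can_override=True)
broad = Exemption(research_only=True, non_commercial_only=False,
                  contract_can_override=False, tpm_can_override=False)

# Commercial research use of works behind a restrictive contract and a TPM.
commercial_research = Use(research_purpose=True, commercial=True,
                          contract_forbids_mining=True,
                          needs_tpm_circumvention=True)

print(de_facto_exempt(narrow, commercial_research))  # False
print(de_facto_exempt(broad, commercial_research))   # True
```

In this sketch each additional restriction or override can only shrink the set of uses that are de facto exempted, which mirrors the observation that such distinctions primarily decrease the number of users able to exercise an exemption.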
Not least, copyright also defines the conditions for applicability of the exemption with regard to technical access to literary works by other intermediaries beyond copyright-based intermediaries (i.e. beyond most publishers). Technical access is obtained by other service businesses such as abstracting and indexing services, search engines, discovery services and document management services simply as part of their business relations with publishers or 'end' users of literary works. Accordingly, text mining copyright limitations could be further differentiated by how such 'third party' access may provide enough legal ground for exercise of an exemption by such businesses. While the distinctive definition of types of 'end' users which can exercise a text mining exemption indirectly affects all text mining service businesses, the question of what type of access is defined to qualify for applying the exemption primarily concerns the market for text mining businesses. This market and overall use of text mining technology might look very different if third party businesses could offer text mining services based on a text mining exemption for literary works they have technical access to.
The only example of a country where text mining currently is exempted with a designated exemption (seemingly not limited to non-commercial text mining) is Japan. The "Japan Copyright Act (2011) makes explicit provision to allow text mining, with Article 47 making a limitation to copyright: "For the purpose of information analysis ('information analysis' means to extract information, concerned with languages, sounds, images or other elements constituting such information, from many works or other much information, and to make a comparison, a classification or other statistical analysis of such information; the same shall apply hereinafter in this Article) by using a computer, it shall be permissible to make recording on a memory, or to make adaptation (including a recording of a derivative work created by such adaptation), of a work, to the extent deemed necessary'" (McDonald et al. (2012) citing CRIC (2011)).
In the UK and Ireland, changes to copyright law by introducing a designated copyright limitation specifically for non-commercial research are on their way, probably to be implemented in the UK in spring 2014.
The only example where text mining was explicitly considered within a fair use country was in the US landmark case The Authors Guild vs. Hathitrust Digital Library (HDL). In this case, text mining uses by the HDL, a large library group, were found to be fair: "The use to which the works in the HDL are put is transformative because the copies serve an entirely different purpose than the original works: the purpose is superior search capabilities rather than actual access to copyrighted material. The search capabilities of the HDL have already given rise to new methods of academic inquiry such as text mining" (Baer 2012).
The case is of high significance around the world. It concerns the copyrighted part of the around 10 million books digitised by the libraries in cooperation with Google. Accordingly, a similar court decision is pending in the case Authors Guild vs. Google, which differs slightly in that Google, in contrast to the HDL, is a commercial entity. Nevertheless, the use in that case is anticipated to be found fair as well.
In Australia, text mining is explicitly considered in ongoing consultations on copyright reform which will conclude and lead to legislative reform this year. The approach there is to introduce a US-style fair use system and as an alternative include text mining uses within the fair dealing copyright limitation.
Fair use case precedent can be leveraged in other countries with fair use. Besides the US, Israel, Singapore (with effect 2006), the Philippines (2007) and South Korea (with effect 2012) also have a fair use approach. Taiwan has also been cited as a fair use country where similar conditions occur as in the other fair use countries. The term "fair use" is not strictly distinguished from fair dealing, in that the fair dealing copyright limitation is also based on a normative factor assessment. Fair dealing is gaining traction globally as a limitation to be applied more flexibly, so that it also includes uses through new digital technologies, which can include text mining uses.
Countries like the UK and Ireland depend on EU legislation. The copyright limitations they are likely to introduce soon fall under the research copyright limitation under EU law, which is limited to non-commercial research. Hence, the UK and Ireland could not introduce a research text mining copyright limitation including commercial research uses without a change to EU copyright law. EU copyright law is under review as well, particularly considering research text mining. A decision on change of EU copyright law, possibly including the introduction of a copyright limitation option or obligation for member states applicable to research text mining including commercial uses, is planned to be made in 2014.
In most countries the application of text mining technologies to copyrighted works in general is treated as if it were covered by copyright, i.e. text mining is treated as a form of derivative use and those uses are covered by the bundle of rights conferred to the author by copyright law. Accordingly, the application of text mining, which normally involves making a copy of the works which are text-mined, is considered to require a separate type of licence beyond an access licence, similar to an extra licence for a translation of a work.
For appraisal of a contemplated change to copyright law with regard to text mining uses an understanding of the economic justification of copyright law in general is indispensable, i.e. an understanding of copyright's intervention rationale subject to the overriding goal to maximise public welfare.

Internalising positive externalities
Literary works are information goods. Copyright applies to information goods, which are goods that intrinsically have public good character and are typically associated with a high level of positive externalities. That is, in the absence of some economic institution such as copyright law, information goods, once available, would spread quickly and widely and would basically be available to all, and users would have little inclination to pay as the benefits can be enjoyed anyway. This also implies that the private benefit of an individual user, reflected in her inclination to pay, is lower than the social benefits of such a good. Consequently, there is not much inclination for private producers to invest in such goods, because the producer would have a hard time offering a market price perceived as appropriate. Benefits are enjoyed anyway without entering into a market transaction and "payment cannot be exacted from the benefited [third] party" (Pigou 1932). Due to those positive externalities an information goods market would barely come into existence and there would be an under-supply and under-consumption of such goods.
Similarly, for goods with negative externalities compensation cannot be enforced on behalf of the injured third parties (Pigou 1932) and there would be oversupply and overconsumption. For both types of externality the market and its participants' rational behaviour do not achieve the overall optimal production and consumption of the good, which is referred to as a "market failure" and is detrimental to the objective of public welfare maximisation. In general, different types of market intervention are used to deal with externality problems, most notably property rights, but also tax-subsidy schemes and public sector production.
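The under-supply argument can be made concrete with a stylised calculation. The sketch below assumes, purely for illustration, a linear private marginal benefit `a - b*q`, a constant marginal cost `c` and a constant marginal external benefit `e` per unit; none of these functional forms or numbers come from the sources cited here.

```python
def market_quantity(a: float, b: float, c: float) -> float:
    """Quantity where private marginal benefit a - b*q equals marginal cost c."""
    return (a - c) / b


def social_optimum_quantity(a: float, b: float, c: float, e: float) -> float:
    """Quantity where social marginal benefit a + e - b*q equals marginal cost c."""
    return (a + e - c) / b


def welfare_loss(b: float, e: float) -> float:
    """Welfare forgone by producing the market quantity instead of the optimum.

    This is the area of the triangle between the social marginal benefit
    and marginal cost over the missing units: 0.5 * e * (e / b).
    """
    return e ** 2 / (2 * b)


# Illustrative (assumed) parameters.
a, b, c, e = 10.0, 1.0, 4.0, 2.0
q_market = market_quantity(a, b, c)              # 6.0
q_optimum = social_optimum_quantity(a, b, c, e)  # 8.0
loss = welfare_loss(b, e)                        # 2.0
```

With these assumed numbers the market supplies 6 units although 8 would be optimal, because the external benefit of 2 per unit is not reflected in users' inclination to pay.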
For example, tax-subsidy schemes are used to alleviate market failure. The goal is to internalise the external effects in the market price system by attributing external effects to those parties which are actually causing them as they are involved in the market transaction concerning the good. Basically, producers and consumers are involved in the market transaction and taxes and subsidies can be imposed on the producer-side as well as consumer-side. For example, a tax would be imposed on producers of goods with negative externalities, but a tax can also be imposed on consumers of goods with positive externalities. A power plant owner could be called to account for air pollution by having to pay an extra tax. The costs imposed on third parties in the form of suffering from pollution are internalised in the market in the form of the tax. They are internalised in that the producer bears additional costs, e.g. the tax itself or the costs for a new filter system, and in the form of the costs passed on to the electricity consumer who has to pay more for the 'clean' electricity. If the externality is internalised in the market through continued taxes, the public funds function like a market-mediating pool of money with tax income and subsidy costs. The ingoing and outgoing amounts of money are not earmarked.
A similar type of market intervention would be price control. Here the government intervenes in the pricing mechanism by setting minimum or maximum price limits with which market participants have to comply, but does not disburse or exact monetary values on behalf of affected third parties, who are often the public at large.
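The power plant example can be illustrated with a minimal numerical sketch. It assumes a linear demand `a - b*q`, a linear private marginal cost `c + d*q` and a per-unit tax set equal to an assumed constant marginal external cost; the function name `equilibrium` and all parameter values are hypothetical.

```python
def equilibrium(a: float, b: float, c: float, d: float, tax: float = 0.0):
    """Market equilibrium with a per-unit tax levied on the producer.

    Demand:  consumer price = a - b*q
    Supply:  producer net price = c + d*q
    The tax drives a wedge: consumer price = producer net price + tax.
    Returns (quantity, consumer_price, producer_net_price).
    """
    q = (a - c - tax) / (b + d)
    consumer_price = a - b * q
    producer_net_price = consumer_price - tax
    return q, consumer_price, producer_net_price


# Illustrative (assumed) electricity market; external cost of 20 per unit.
a, b, c, d = 100.0, 1.0, 20.0, 1.0
marginal_external_cost = 20.0

q0, p0, net0 = equilibrium(a, b, c, d)                              # 40.0, 60.0, 60.0
q1, p1, net1 = equilibrium(a, b, c, d, tax=marginal_external_cost)  # 30.0, 70.0, 50.0
tax_revenue = marginal_external_cost * q1                           # 600.0
```

In this example the tax reduces output from 40 to 30 units. The consumer price rises from 60 to 70 while the producer's net price falls from 60 to 50, so the internalised cost is shared between producer and consumer, and the revenue of 600 flows into the public funds.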
Another type of market intervention to deal with externalities would be public sector production. The function of the self-regulating 'free' market of private producers to ensure optimal production of goods through price formation is not mediated by public funds, but relinquished to the public sector altogether. Instead of enticing (or discouraging) certain behaviour of private producers and consumers to make a market for goods with externalities function, the public sector is producing the good. This is also a kind of subsidy as it is paid out of the public funds. As the public sector acts as a producer it can still be seen as an internalisation of externalities; someone takes care of the production whatever the nature of the good produced.

Property rights are another type of market intervention that deal with externalities. Copyright laws represent the granting of property rights to deal with the positive externalities of information goods.
For example, a patch of land can have no owner and no fence, but this may lead to many people or cattle using the land and depleting its limited natural resources. Privatising the patch of land makes someone the owner of the land, giving the owner rights to exclude others from using it, for example by using a fence. On the other hand, the land owner can be held accountable when negative effects from the land spill over to another person's land. The owner could also impose access conditions to exact payment if other people have some demand for using the land. Hence, with property rights, external effects can be internalised in the market and price system. Incentive for the owner to take care of the land, e.g. to make it arable, as well as liability, are established by granting rights for the owner to exclude others, which effectively confers excludability characteristics to the good (i.e. the land) itself. Depending on the extent to which the good is still subtractive, the good is turned into a club good or private good. A patch of land becomes a club good to the extent that people using it do not subtract from the benefits other people could get from using it. On the whole, for larger numbers of users and depending on the type of use, a patch of land would appear as a private good, for example if the patch of land, or the patch of real property for that matter, is small enough in size so that use by one person already would subtract from the benefits another user could gain from using it. Along those lines, tangible goods, i.e. physical goods somehow tied back to natural resources, can be managed relatively well with property rights. It is relatively straightforward to define property rights according to the physical properties of the good. "Physical objects suggest at least a core definition [of property rights] congruent with their physical attributes" (Sag 2009).
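The distinction drawn here follows the standard two-way classification of goods by excludability and subtractability. As a minimal sketch, with the enum `GoodType` and the function `classify` being illustrative names only:

```python
from enum import Enum


class GoodType(Enum):
    PRIVATE = "private good"
    CLUB = "club good"
    COMMON_POOL = "common-pool resource"
    PUBLIC = "public good"


def classify(excludable: bool, subtractive: bool) -> GoodType:
    """Classify a good by excludability and subtractability (rivalry)."""
    if excludable:
        return GoodType.PRIVATE if subtractive else GoodType.CLUB
    return GoodType.COMMON_POOL if subtractive else GoodType.PUBLIC


# Unowned, unfenced land whose resources can be depleted.
unowned_land = classify(excludable=False, subtractive=True)

# The same land after property rights confer excludability.
small_private_plot = classify(excludable=True, subtractive=True)
large_shared_estate = classify(excludable=True, subtractive=False)
```

Granting property rights changes only the first argument: the land moves from the non-excludable row to the excludable row, and whether it then appears as a private good or a club good depends on how subtractive its use is.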
The question whether or not to privatise a tangible good often does not arise at all, because the excludability of physical goods is part of their intrinsic character. Trespassing the rights of the property owner can be relatively clearly identified as the physical or spatial intrusion by others in the owner's 'exclusion area'. Therefore, conferring property rights to tangible goods at large is an appropriate type of market intervention.
While some questions about the scope of tangible property rights remain, on the whole managing tangible goods in this way is beneficial and increases public welfare (investment by private producers is ensured). This has been done in this way more or less intuitively for thousands of years.
Property rights can prevent the negative externalities of depletion of land: third parties' unaccounted 'business' of ransacking land, e.g. by chopping trees, would be prevented by the land owner's private interest to preserve the value of the land. Tangible property rights ensure that investment in the preservation and production of tangible goods is made by legally backing up the owner. They increase efficiency of overall production of goods and in how resources are used and decrease transaction costs.
Tangible property rights in principle represent the same rationale on which taxes are imposed on a factory owner's unaccounted business of ransacking clean air which, if 'only' on a global scale, is degradable. The rationale is internalisation of externalities to make a market more efficient. Based on this rationale producers of solar panels are subsidized and educational services are produced by the public sector. Without internalising the positive externalities of those public welfare enhancing production activities, public welfare benefits would not be realized due to the lack of investment incentive for private producers.
Subsidies are disbursed to the 'clean-energy' producer who avoids depletion of clean air by substituting fuel-based electricity production with comparably inexhaustible solar-energy based electricity production. Education is produced by the public sector because of private producers' lack of incentive to invest in making human minds, in this sense, arable. Without such market interventions the public welfare benefits from education and cultivation would not be realized, because the social benefits from education in the form of knowledge spread throughout the society and over generations without much chance to capture those benefits in a market transaction.
In all of these cases, the rationale is to ensure the optimal level of the production of goods by internalising externalities in some way, leading to maximisation of public welfare. The question now is how does the internalisation of externalities work for intangible goods such as information goods?
More specifically, how do governments deal with positive externalities deriving from literary works?

Literary and creative works property rights
Information goods are pieces of information that carry specific utilities. Literary works as a type of creative work are information goods. The internalisation tool and type of government intervention of choice for literary works is property law. Copyright law confers property rights by default to the producer of a literary work, instituting intellectual property.
Conferring those property rights to the producer also confers excludability characteristics to the intrinsically non-excludable literary works goods. This enables the producer to recoup production costs by the ability to impose conditions on reproduction, distribution and use of and access to the produced works.
Thereby copyright grants to the producer a trade advantage for marketing the literary work as a market good, but intentionally limits the conferred exclusive rights by defining the protected subject matter as not absolute as well as defining not all types of uses as excludable. The scope of copyright protection is limited in that the subject matter does not comprise the ideas and facts conveyed in the protected expression and in that it does not comprise the types of uses defined by copyright limitations. In this way copyright law not only incentivises production of marketable information goods, but also guarantees a certain level of positive externalities deriving from such goods. The ideas and facts contained in the expressions and uses of a literary work defined by copyright limitations based on the purpose of the use are intentionally left out of the literary works information goods market. The intention is to ensure the allocation of property rights on information goods at an optimal level of information externalities benefitting the society as a whole beyond the producers and consumers of the literary works goods.
Among the reasons for this limitation of scope are that the utility and value deriving from information goods are elusive and not tied to the use of a particular tangible good. Therefore property rights are naturally limited in their ability to capture those values in a market system and in any case it is hard to identify who is actually (not) increasing utility.
The famous Coase theorem explains how property right schemes can tackle the externalities problem and achieve efficient market outcomes. It requires that property can be clearly defined and that all parties benefiting (or burdened) can be involved in organizing market transactions. Both conditions cannot be fulfilled for information goods.
Related to this, the production of literary works often builds on reusing existing literary works, and at a certain level of granularity of the property rights scheme and at a certain level of complexity of the sequenced and self-feeding production processes the transaction costs would become prohibitively high. More generally, the transaction costs would become too high and appear as a negative effect for any type of reuse of a literary work, even if the work is passively consumed or reused for production of other goods besides new literary works. For example, negotiating a price with every person benefitting from basic education services would be as infeasible as negotiating compensation with every person suffering from polluted air. In this sense, the limited scope of copyright represents the level to which a property rights scheme is feasible and economically justifiable as a means to internalise the positive externalities deriving from literary works and other creative works.
The positive externalities intentionally 'left out' of the property scope from the moment a literary work is produced do not distort market allocation, but are an integral part of the property rights scheme taking into account the natural good characteristics of literary works. Market allocation of rights on literary works as well as increasing information externalities are part of the economic rationale of copyright law to maximise public welfare. "Even where externalities distort market allocation, those distortions may be social welfare enhancing. … Conversely, extending property rights to internalize externalities may distort market allocation in a manner detrimental to social welfare" (Frischmann & Lemley 2006). Frischmann & Lemley (2006) explicate this symmetry of copyright between externalities and property rights: "Property rights distort resource allocation in the same way as externalities, although perhaps in the opposite, or at least a different, direction. … whether an externality is positive or negative depends on how and to whom we allocate rights. … When relevant, both externalities and property rights distort the market allocation of resources; when irrelevant, neither does." On the one side copyright encourages production of literary works and other creative works to increase public welfare by conferring partial excludability to the works. It ensures "that private producers have the necessary incentive to create and second, to preserve the ability to appropriate value from the utility users derive from the created work" (Okediji 2000).
On the other side it ensures stimulation of information externalities to increase public welfare and guarantees that information externalities from using literary works can feed into the common pool of knowledge resources and into subsequent productive activities including production of subsequent literary works without the producer being involved in market transactions, but nonetheless generating social returns.
In public welfare analysis the two sides and the respective stakeholder interests need to be balanced.
This balancing act has been expressed in many ways. The UK Government (2011) in the context of a welfare assessment concerning changes to copyright law puts it this way: "There is a trade-off between the increased incentives and rewards given to creators, and the economic and cultural benefits that flow from this, and the negative impacts on users of copyright works who face restricted supply, increased transaction costs, and less freedom to use knowledge, data, and cultural works." Houghton & Gruen (2012) emphasise that more copyright also leads to "reduced production of copyright works where those works build on other copyright works [which together with underconsumption of goods] may detract more from welfare than any gains more stringent copyright holds out for copyright industries." Okediji (2000) alludes to the potential increase of public welfare, non-monetary values and total output of wealth through non-pecuniary information externalities: "The fact that copyright is more than just the sum of the economic values it can generate is one marker of its social welfare function. There has never been serious question that the protection of intellectual property imposes certain costs to the public." Sprigman (Brito 2012) reminds us that future generations are stakeholders too and that limiting access to knowledge and making the spread of knowledge more expensive can hamper scientific and educational progress and have terrible knock-on effects on society in the long term.
Finding the right balance and determining the copyright scope which maximises public welfare is a complex endeavour and requires a closer look at the copyright rationale with a focus on its dynamic dimension.

Dynamic view and balancing scope
Striking this balance to maximise public welfare is the core tenet of copyright law rationale. Property rights in general are rights to use the property in certain ways and exclude others from using it in certain ways. Copyright balance defines the scope of the rights copyright guarantees for producers of works and thereby defines to which extent users have rights to use the works. The scope of copyright determines the level of incentive and reward for producers based on the level of control the rights holder has over the work, i.e. which subject matter is covered and which uses of the commoditized parts require permission.
By determining to what extent users have rights, it also determines private benefits for users and the level of real externalities spreading into the society at large, condensing in productive activities of potentially any person or organisation in the society. The scope of copyright draws the line between the "private rights component of copyright law [improving] investment incentives" (Frischmann & Lemley 2006) and the "commons component" comprising parts of the subject matter, uses of expired works in their entirety and uses exempted by copyright limitations.
Since new literary works are often produced based on use of existing literary works the system of production of new works and reuse of existing works is a self-feeding process. To grasp the public welfare implications of the self-feeding dynamic, copyright is viewed in two dimensions: The static view describes the market inefficiency copyright introduces. Through the monopoly right the copyright holder is protected from competing producers including unauthorized copying of works, i.e. less competition takes place. "The author's exclusive rights under copyright law provide a buffer against price competition. This buffer to competition allows the author to charge higher prices than she otherwise could" (Sag 2009). Because more competition among producers fosters innovation and keeps the prices down, to the benefit of consumers and society as a whole, less competition means there is market inefficiency which represents unrealized public welfare benefits (i.e. costs for the society as a whole). The "rationing of ideas through copyright increases the costs for follow-on creators and excludes certain individuals from creative activities" (Ramello 2002). The increased costs represent an increased social welfare loss due to under-utilization.
The dynamic view is based on the assumption that under perfect competition the level of new works created would be well below the optimum for public welfare and stresses the incentive theory and how it ensures knowledge production through creation of new works over time, thereby 're-establishing' market efficiency. Yoon (2002) shows that "an increase in copyright protection … will always decrease the social welfare loss due to underproduction".
The two views brought together in economic analysis of copyright give the "standard theory of copyright … [which] rests on the assumption that the social cost of the monopoly granted by the right is effectively less than the expected benefits, with a positive balance that maximises welfare. In other words, the static inefficiency associated with the monopolies granted by the right is offset by the expected dynamic efficiency resulting from the production of an optimal level of new ideas" (Ramello 2002).
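The trade-off described in the standard theory can be written down in a stylised form. The following is only an illustrative sketch, not a model taken from the cited literature: the symbols $s$ (copyright scope), $B(s)$ (dynamic benefits from incentivised production) and $C(s)$ (static costs from the monopoly and restricted use) are assumptions introduced here for illustration.

```latex
% Stylised welfare balance (all symbols are illustrative assumptions):
%   s     copyright scope (subject matter and uses covered)
%   B(s)  dynamic benefit: welfare from works produced due to the incentive
%   C(s)  static cost: welfare lost through monopoly pricing and restricted use
\[
  W(s) = B(s) - C(s), \qquad B'(s) \ge 0, \quad C'(s) \ge 0 .
\]
% Protection is justified only where the dynamic benefit exceeds the static cost:
\[
  B(s) > C(s) .
\]
% If B is concave and C is convex, welfare is maximised at the scope s^* where
% the marginal incentive benefit equals the marginal static cost:
\[
  B'(s^{*}) = C'(s^{*}) .
\]
```

Under these assumptions, extending the scope beyond $s^{*}$ adds more static cost than dynamic benefit, which is the sense in which the positive balance "maximises welfare" only up to a certain level of protection.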
However, this static-dynamic view focuses on the information goods market and on the monopoly on and production of new creative works. Beyond this, other productive activities are also based on using existing creative works. For literary works in particular, the following can be said: while the dynamic efficiency and incentive concern only the producers of the works, who are incentivised by the exclusive exclusion rights, the static inefficiency applies to users of all works, who can be producers of goods other than new literary (or creative) works in any downstream market. To grasp the full dynamic of copyright's balancing act, the costs for those producers of other goods need to be taken into account as well.

Transaction costs
Transaction costs are costs incurred to initiate and carry out a market transaction. They include costs for finding the right good and owner or seller (search costs), bargaining costs, administrative costs, as well as costs for exacting payment and preventing infringement (infringement monitoring and enforcement costs). "Transaction costs are central to an economic understanding of property rights because they dictate both the scope and the form of private rights. … The allocation and definition of property rights determines both which individuals have the authority to decide how a specific resource is used and to whom the costs and benefits of that use will flow" (Sag 2009).
Accordingly, transaction costs are in some way shared between producers and consumers and subtracted from the combined consumer and producer surplus. They also determine the efficiency of the market and the total value of the combined consumer and producer surplus, which contributes to public welfare. If costs for enforcement are borne by parties other than the producer in the information goods market (e.g. by the State, which provides and enforces the legal system), those costs would additionally subtract from the public welfare contribution of that market.
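How transaction costs eat into the combined surplus can be illustrated with a minimal numerical sketch. It assumes a textbook competitive market with a linear inverse demand curve P = a - b*Q, a constant marginal cost c and a per-unit transaction cost t; the function name and all parameter values are hypothetical and chosen only for illustration, not taken from the sources cited in this article.

```python
def combined_surplus(a: float, b: float, c: float, t: float) -> float:
    """Combined consumer and producer surplus net of transaction costs.

    Illustrative assumptions (not from the article):
      - inverse demand P = a - b * Q
      - constant marginal production cost c
      - per-unit transaction cost t, shared in some way by both sides
    Trade takes place up to the quantity where willingness to pay
    equals production cost plus transaction cost.
    """
    margin = a - c - t          # net gain from trade on the first unit
    if margin <= 0:
        return 0.0              # transaction costs are prohibitive: no trade
    quantity = margin / b       # quantity traded in equilibrium
    return 0.5 * margin * quantity  # area of the remaining surplus triangle


if __name__ == "__main__":
    # The surplus shrinks as the per-unit transaction cost rises.
    for t in (0.0, 2.0, 8.0):
        print(t, combined_surplus(a=10.0, b=1.0, c=2.0, t=t))
```

In this sketch the surplus falls with the square of the remaining margin, so a rising transaction cost reduces the welfare contribution of the market more than proportionally until trade stops altogether.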
Generally it can be said that with the adoption of computer networks transaction costs have decreased because someone who is looking for something can just search the Web to find the 'something', compare different sellers more easily and pay online. This decrease in transaction cost would in turn increase the consumer and producer surplus and thereby also increase public welfare. However, this logic does not hold if the 'something' is an information good.
Property rights on information goods are a legal construct not based on the physical properties of the good. The value deriving from information is to a large extent a real externality, which has no predetermined fixed value. It is also hard to define and capture the scattered 'bits' of value within a property rights scheme and to a large extent (third party) beneficiaries cannot even be identified. For all those reasons the property rights on information goods are by economic calculus limited in scope and payment is not expected to be exactable for all benefits. Anything else would incur transaction costs too high to be justifiable under the public welfare rationale, or, in the words of the static-dynamic view, with increasing copyright scope the transaction costs increase more than proportionally.
While only few people benefit from the positive externality of a private piece of land which the owner keeps free of pest plants, thereby preventing the spread of pest plants into the neighbours' gardens and the public park nearby, the information externalities of an influential book represent benefits to a very large number of people irrespective of their physical location, many of them unaware of their benefits or otherwise unidentifiable.
The increased value of the public park untroubled by pest plants as well as the social value of the book potentially affecting people around the world need to be taken into account in public welfare analysis.
The book externalities are knowledge spillovers and, unlike the untroubledness of parks, multiply less bound to physical limits.

The underlying character of knowledge spillovers can also be viewed from the perspective of the owner of the good from which the positive externalities emanate. For the owner those externalities would appear as transaction costs (including infringement monitoring and enforcement costs).
Enforcing rights on remuneration for a pest plant prevention service and exacting payment from the neighbour or the city council for this service may still be conceivable. Exacting payment for tiny bits of utility emanating from a book, without predestined fixed total value and spread out over large numbers of beneficiaries who are barely aware of their luck, would be an insurmountable task. Knowledge spillovers are boundless, and this characteristic is exacerbated if the knowledge is not embodied and conveyed in a physical good but conveyed by an information good which by itself has public good character.
Because of the elusiveness of knowledge spillovers, the scope of property rights on information goods is defined to a certain level and does not even attempt to capture the whole process of value streams spreading out from an information good and feeding into follow-on production of subsequent information goods and other goods. Accordingly, "the definition of intellectual property rights [for intangible goods] must be even more sensitive [than for physical objects] to transaction costs, not just those between willing parties, but those imposed on the rest of the world. … transaction costs tend to be higher in intellectual property because it is frequently difficult to identify such property because by definition it has no unique physical site … the need to keep transaction costs low explains the idea-expression distinction" (Sag 2009).

Literary works as production input
The self-feeding circular process of production and use of creative works by itself is a mix of wealth transfers via market transactions and non-market value streams in the form of positive externalities, i.e. wealth feeding out into the common pool of knowledge and from there back into the creation of new works without market transaction. Beyond this, literary works as an information good are a source of information externality which is a type of knowledge spillover, and the utility from literary works not only feeds into production of new literary works. Copyright scope also determines to what extent spillovers from literary works, the ideas they contain and the knowledge they convey feed into production of other types of goods. It is a mix of wealth transfers via market transactions and non-pecuniary externalities which, through a 'detour' into the common pool of knowledge, feed into the production of new goods without market transactions.
Although mostly discussed as such in the context of patent law, copyright too determines the level of innovation-generating positive spillovers as a demand-side information access and reuse condition.
For both types of production, new literary works as well as other goods, and whether through market transactions or not, producers can be not only authors of new literary works, or only scientists or only particular firms or only research organisations, but a broader range of non-passive consumers and organisations (assuming a minimum level of constructiveness and, for the literary works market, also literacy). In the context of this article the focus will be on uses of literary works with scientific purpose, which suggests that the potential magnitude of information externalities feeding into the production of new goods is higher than average.
In any case, both pecuniary and non-pecuniary information externalities from literary works are an important input for production of not only new literary works but also other goods. Thereby, they are important for innovation and eventually productivity growth and increased public welfare.
Copyright law predetermines to what extent, where and by whom those positive externalities from literary works are appropriated and how the potential increase of social value derived from literary works is turned into practice, depending on the making and breaking of virtuous cycles of innovation.
Hence, the scope of copyright represents a condition of how innovation-friendly a country is and how much of the potential productivity growth and public welfare gains from literary works can be realized.

Balancing with copyright limitations
The market system for literary works maximises public welfare because it does not cover all subject matter and uses of the works. "Flexibilities may be found in all elements of [the copyright] structure ...
Regardless of the relative fluidity of these and other core concepts of copyright law, limitations … are obviously the main instruments of flexibility" (Hugenholtz & Senftleben 2011). Copyright's limits and limitations guarantee that public welfare costs from protection do not outweigh the public welfare benefits gained from protection. To hold the balance between the conflicting stakeholder interests as well as for adjusting copyright by changing its scope, copyright limitations are the most common legal tool.
The public welfare function of copyright and specifically of its limitations has been explained on numerous occasions: Striking the balance in copyright law is pivotal to the public welfare goals and this balance is maintained through "doctrines in copyright law such as fair use" (Okediji 2000). Fair use sanctions uses of copyrighted material generating positive spillover (Hamdan-Livramento 2009). Ghafele & Gibert (2012) with reference to the copyright limitation for private copying note that "when private copying encourages widespread use [of works] it can increase total welfare even if this results in fewer originals being produced". Furthermore, countries with the overriding objective to maximise public welfare such as the UK, the US and Australia are currently reviewing their copyright laws including proposals for new copyright limitations.
The question is how much copyright protection is best for public welfare and, following from this, which types of copyright limitations are best suited to maximise public welfare. When technology changes the conditions, the question becomes how the copyright scope can be adjusted to achieve maximum public welfare under new conditions. Technology has enormously increased the potential uses of literary works and positive externalities emanating from them including information network externalities. Some types of uses possible with new technologies might not lead to the kind of market failure that copyright exists to pre-empt. Freeing these uses would create welfare gains, measured in terms of access to works (Sprigman 2009). Information technology makes production and dissemination of information goods less costly and therefore less dependent on monetary reward: more works are being created irrespective of the incentive copyright seeks to provide.
But even if uses possible with new technologies would lead to decreased output in the information goods markets this does not automatically mean a decrease in public welfare. Distorting one market can mean driving another market and beyond this can mean overall increased social benefits. "A balance has to be found between the gain in social[/public] welfare of the increased incentive and the social loss of hindering non-rival usage. In cases where a higher level of excludability does not increase production, the social loss of hindering non-rival usage increases while that loss is not compensated by a gain in social welfare of an enhanced incentive to create. In such circumstances, additional control must be inefficient" (Koelman 2004). Non-expressive uses of literary works represent relatively new technologies and they might also represent such a circumstance where making the uses excludable does not increase incentive to produce more literary works (benefits) to the extent it hinders non-rival uses of literary works (costs), with overall detrimental effects on public welfare.

Non-expressive use copyright limitation
Certain types of copying of a literary work are exempted from copyright due to the function the information technology used for copying the work fulfils. For example, a transient copying copyright limitation exempts copying of works for transmission through computer networks.
Copying of works as part of technological processes in many cases is not limited to pure transmission, but offers unprecedented ways to store, manage, process, search, analyse and integrate literary works.
Those uses represent a whole range of technical applications, which not only increase productivity in information goods markets and downstream markets and business activities in the respective technology markets, but also increase the benefits for users of literary works and the society as a whole.
The effects of copying of works as part of technological processes on producers of literary works have been interpreted in different ways including as additional benefits and as added costs from the higher risk of infringement and substitution of the original works. At the same time, in many cases the potential public welfare increase from technology-enabled uses is held back by outdated copyright laws. However, new legal concepts to frame copyright laws according to developments in information technology are on the rise. One such concept summarises different proposals to exempt from copyright those types of copying of works which are part of a technological process but go beyond transient copying.
Transient copying is sometimes defined by criteria such as whether or not human intervention is required for deletion of the copy (as opposed to an automatic deletion) or the time duration the copy is kept in the computer memory. In other cases transient copying is mixed with what is called "temporary" copying, but from a legal perspective many technology uses are not covered by transient and temporary copying copyright limitations or there is a high degree of uncertainty whether or not they are covered. Legal uncertainty represents risk and costs for the information technology businesses relying on copying literary works for non-expressive use. If technology uses such as using a search service were deemed not to be exempted from copyright, the transaction costs for running such services would be prohibitive.
The new legal concept is referred to as copyright limitation for non-consumptive use, non-expressive use or non-display use. In other economic contexts, the term "non-consumptive" refers to uses of resources which are not used up, such as water for a decorative fountain. The term non-expressive use seems more appropriate for the context of intangible goods and the specific property definitions of copyright law as it recalls what copyright protects: expression.
The proposed copyright limitations subsumed in the concept of a non-expressive use limitation have in common that they seek to exempt copying which is part of a technological process but does not trade on the underlying purpose of the expression of copyrighted works. The purpose of the expression is to be consumed or enjoyed by a human. Non-expressive use by contrast does not involve consumption or enjoyment by humans.
The concept of non-expressive use is thereby strongly related to doctrines such as the idea-expression distinction, which limits the subject matter protected by copyright law to the expression.
In the context of the landmark case The Authors Guild vs. Hathitrust, where fair use was found, Sag (2012) argued that "non-expressive uses have no potential substitution effect on any legally cognizable market for copyrighted works, because copyright only protects markets for expression, and not markets for discoveries, ideas, facts, principles, or concepts".
Referring to the public welfare objectives inherent in the fair use assessment, he continues: "the mass digitization of books for text-mining purposes is a form of incidental or 'intermediate' copying that enables ultimately non-expressive, non-infringing, and socially beneficial uses without unduly treading on any expressive - i.e., legally cognizable - uses of the works" (Sag 2012).
The concept of non-expressive use is also strongly related to the concept of transformative use, which is often used as justification for exempting uses of works from copyright protection. The concept of transformative use is rooted in the US copyright law's fair use factors. "Transformative" in this […] harming the financial interests of copyright holders but delivering additional products ("transformations" - a positive technological externality associated with a negative pecuniary one) and not supporting simpler products (labeled derivative) that merely redistribute revenues from the copyright holder."
With the concept of the non-expressive use copyright limitation, the criterion for what type of use is exempted from copyright moves away from the technical act of literally copying information to the purpose of the copying. For example, to answer the question whether or not a non-expressive use limitation applies for a certain type of use, the duration a copy is kept on a computer would be just one of many criteria or not be relevant at all.
Technology applications that are the target for this category of proposed copyright limitations have been named copy-reliant technologies. Such technologies, besides search engines, also include text mining, plagiarism detection software and reading devices for visually impaired and deaf people.
The idea of the non-expressive use limitation is to adjust and optimise the initial allocation of rights to the reality of advancing information technology and omnipresent copying predominantly as part of the technological processes to maximise public welfare. Lawmakers should optimize initial allocation and allocate property rights to their best initial use so as to minimize the harm caused by inevitable failures to reach private agreements, i.e. licence agreements, permissions, etc. (Sag 2009).

The concept of non-expressive use rights shifts the exclusion boundary represented by the scope of copyright away from the copying itself toward the actual purpose of the copying. The core tenet of limiting copyright scope to cover only expression, but not the underlying facts and ideas, is applied in that the property right should protect only expressive uses, i.e. uses whose intended purpose is that humans read the expression, but not copying for the purpose of the functioning of technology, i.e. non-expressive machine uses. When property is defined by the purpose of the use, the copying itself, and whether or not a piece of content sits on someone's computer server, is of secondary importance.
A non-expressive use copyright limitation also addresses the inherent additional transaction costs problem arising from the development of computer-networked information technology. It is not only the potential utility emanating from information goods which has public good character and is elusive and hard to capture in a property rights system; the conveyers of this utility, the information goods themselves, also have public good characteristics. Under the condition of advancing information technologies it becomes increasingly difficult if not impossible to base the capturing within the property rights scheme on the technical act of the copying of pieces of information.
A non-expressive use copyright limitation, by shifting the exclusion line to the purpose, permits the copying as part of technological processes by default and may reallocate transaction costs to the copyright holder in that it is the copyright holder who has to explicitly opt out of the default rather than the user who has to explicitly obtain permission for non-expressive uses. While for some rights holders this may indeed increase transaction costs as the opting out represents some private costs, for all rights owners and users of literary works combined exempting non-expressive uses might decrease transaction costs and increase public welfare.
Transaction costs decrease especially for integrated cross-corpora non-expressive uses of works such as conducting a Web search query. Exempting non-expressive uses also enables additional non-pecuniary information externalities in the form of additional information network effects and increased economic activity of complementary information technology services, e.g. a service facilitating cross-corpora Web searches.
The legal concept could materialise as a separate designated copyright limitation, as an extension of the temporary copying limitation or through interpretation of fair use by court practice in fair use countries.
Beyond the de facto application of the non-expressive use copyright limitation in several countries besides the US for search engine and other copy-reliant technology uses, it has been de facto applied explicitly for text mining uses in at least one case in the US: The Authors Guild vs. Hathitrust, which found that fair use applies. In consultations on copyright reforms it has been expressed that firms in the US regularly rely on text mining uses being exempted from copyright by fair use. The Hathitrust, the defendant in the Hathitrust case, in possession of digital copies of millions of copyrighted works among its vast library holdings, after the ruling announced that its infrastructure permits non-expressive research on copyrighted text without transmitting the text itself to the researcher.
Non-expressive use and text mining are used in a research context by numerous companies, universities and other organisations as well as individuals. They are used on publicly available works via the Internet as well as on works purchased by organisations for use within the organisation's intranet.
In this article's example, a company could increase its private welfare by leveraging information externalities and information network externalities, using an Enterprise Management System for automating information processing and integrating information goods. Along the same lines, the information technologies addressed with the concept of non-expressive uses represent an unprecedented potential to use works in welfare-enhancing ways: enhancing the private welfare of information users as information good downstream market participants and of complementary market players and, beyond this, enhancing public welfare.
Much of this potential and the actual gains can be seen in reports on "Big Data". Big Data includes data in form of natural language text documents including copyrighted literary works. For example: "According to the McKinsey Global Institute's (MGI) 'Big Data' report [(MGI 2011)], the generation of information and data has become a 'torrent', pouring into all sectors of the global economy and is predicted to increase at a rate of 40% annually. Exploitation of this vast data and information resource can generate significant economic benefits, says the report, including enhancements in productivity and competitiveness, as well as generating additional value for consumers. For example, MGI predict that effective and creative use [including non-expressive and text mining uses] of these large data sets in the US health care sector could generate more than $300bn in value per annum and reduce national health care expenditures by around 8%" (McDonald et al. 2012).
The potential for economic benefits and productivity gains from non-expressive uses of literary works is particularly high for text mining, because text mining inherently leverages information network externalities through large-scale, fine-grained, concept-level integration of literary works with each other and with other forms of data.
The private economic benefits and productivity gains realized in firms and other organisations are reflected in productivity gains and other benefits on a national level. They represent an unprecedented potential to increase public welfare through copy-reliant technologies and non-expressive uses of literary works such as text mining.
On the other side, the investment in the production of literary works, as far as it is incentivised by the copyright exclusion mechanism, and the public welfare benefits deriving from it need to be taken into account too. The question in public welfare analysis of the right balance and the right scope of copyright with regard to non-expressive uses and copy-reliant technologies in general may be answered with the legal tool of a wholesale non-expressive use copyright limitation. Hargreaves (2011) proposed an "adaptability" or non-consumptive use copyright limitation at EU level which would fall into this category: "The UK should give a lead at EU level to develop a further copyright exception designed to build into the EU framework adaptability to new technologies. This would be designed to allow uses enabled by technology of works in ways which do not directly trade on the underlying creative and expressive purpose of the work".
For text mining for research as a specific type of non-expressive use, the question can be answered also with a more specific research text mining copyright limitation. Hargreaves (2011) also proposed such a more specific copyright limitation for the EU level for text and data analytics including commercial uses.

Anti-circumvention law and non-expressive use
Technical access to a literary work and legal rights to use the work are two different conditions, and copyright limitations and the rights to exercise them are often overridden by TPMs. Because of this, anti-circumvention law has unavoidably become a focal point of many debates about copyright limitations and their public welfare function. In the context of non-expressive uses of copyrighted works, including text mining uses, anti-circumvention law comes into play as an additional control measure to exclude unwanted application of text mining software to copyrighted works.
It thereby stands in stark contrast to the non-expressive use copyright limitation, which seeks to do the opposite: rather than redefining the property rights on literary works by extending control over technical access, it redefines them away from access control and towards criteria of type and purpose of use, with technological access being of secondary importance.
Use of TPMs in the context of text mining can mean that a user is required to identify herself and explain the purpose of her text mining request before being permitted to text-mine. It can also serve as a way to distinguish between text mining requests from commercial organisations and 'non-profit' organisations and therefore potentially as a price discrimination tool, i.e. charging the former.
While "substantive copyright law allows the exercise of a [copyright] limitation for general public welfare purposes" (Westkamp 2011), TPMs cannot distinguish between infringing uses and uses of literary works exempted by, for example, a copyright limitation. Therefore, the interpretation of anti-circumvention law as a means of copyright enforcement and the implied "normative hierarchy" between anti-circumvention law and copyright law is misguided (Westkamp 2011).
Changes to copyright law concerning copyright limitations, at least, are closely tied to TPM protection rules (and the associated primacy of contract law). A public welfare assessment concerning copyright limitations would need to take into account the potentially increased control over the use of literary works, and the consequent market power, that copyright holders effectively gain by means of anti-circumvention law. This market power reflects the extension of copyright scope and of the excludability of information goods through TPMs. The extension thereby diminishes the intended positive information externalities which are at the core of the copyright rationale.
In the EU context, for example, copyright limitations are effectively waived by strong TPM protection rules, which raises the question of how the function of copyright limitations to maximise public welfare can still work. Hilty (2007) notes that "it remains a mystery how the European legislature can argue that the aim is to foster development [in the EU] … or that the Directive in question complies with the 'fundamental principles of law including the freedom of expression and the public interest' … The possibility of promoting the [EU] Information Society in a manner involving division of labour - i.e. through the use of specialized information brokers - ... has been scuppered by the European legislature." Westkamp (2011) concludes that there is no evidence of welfare benefits from current EU anti-circumvention law.
Specifically for research text mining copyright limitations, removing the power of contract law and anti-circumvention law to override them has been proposed in the context of a UK copyright reform consultation and will very likely be put into practice in April 2014. In Ireland, copyright limitations already cannot be overridden by contracts, and in Australia they cannot be overridden by TPMs. In both countries, a legislative reform exempting text mining uses is under review.
In any case, the way anti-circumvention laws are shaped and how they relate to copyright limitations meant to enable non-expressive uses also affects the power relations between copyright holders, users of literary works and service providers facilitating non-expressive uses. The business activities of all three players and the wider social benefits are affected, and together these determine public welfare. The role of service providers facilitating non-expressive uses will be explored in the following.

Non-expressive use based complementary markets
The scope of copyright and specifically which copyright limitations are in place in a given country determine how much value a user of copyrighted literary works gains as a consumer in the literary works information goods market and as a follow-on producer in markets downstream.
The scope of copyright and copyright limitations also determines the extent to which information externalities from the literary works are non-pecuniary and lead to increased social benefits in the form of a widened common pool of knowledge, from where they can additionally feed back into productive activities.
While complementary markets depending on permission from copyright holders can always serve copyright holders, the scope of copyright and the copyright limitations a legislature chooses to put in place also determine the extent to which complementary markets can serve downstream market producers and, in general, producers which benefit from the common pool of knowledge. Such complementary services can be information technology and software services as well as education and research services.
More specifically, the scope of copyright, and how copyright defines which type of access qualifies for an exemption, determines the extent to which complementary services relying on non-expressive copying of copyrighted works can serve users of literary works. Which types of access qualify is defined, in the wording of copyright law, by what counts as "legal access".

Two conditions can be distinguished: first, the extent to which the actual users have to obtain an extra license beyond their legitimate right to use works acquired through purchase or through an access license agreement with the copyright holder; and second, which types of access beyond the users' legitimate access qualify.
The first condition concerns the number of users which could be served by the complementary business, based solely on the legitimate access to works the user already has, without the user having to obtain an extra license. In any given country, this could be: none, where non-expressive uses are not exempted; only users with a non-commercial research purpose, where the copyright limitation applies only to non-commercial non-expressive research uses; only users with a research purpose, including commercial research, where all non-expressive research uses of the works are exempted; or any non-expressive uses, where the exemption is not limited to non-commercial uses or research uses. The user would still need to have legitimate access to a specified corpus of literary works for the complementary business to render the service for this corpus.
The second condition concerns the number of users that could be served by the complementary business based on technical access the complementary business itself has gained, irrespective of the user's access to the full-text works.
The question is whether such technical access counts as "legal access" under the law. Technical access does not differentiate between protected subject matter and unprotected subject matter and does not differentiate between exempted uses or copyright protected uses. Either someone has access to the full-text or not.
Technical access is also obtained by other information processing service businesses such as abstracting and indexing services, search engines, discovery services and document management services typically by means of an open or implied license for literary works publicly available via the Internet or by means of service agreements with copyright holders of the works or other users of the information processing service.
Where an open or implied license applies, the complementary service typically obtains access by crawling the websites where the works reside, although APIs and automated downloading can be used as well. A license is implied where a copyright holder making the work publicly available does not opt out of the crawling process by including a machine-readable opt-out notice on the website, such as a robots.txt file signalling to the crawling service that inclusion in the service is not desired. In this sense, a license agreement can exist as a two-sided agreement in the form of a contract, or as an open or implied license for publicly available literary works.
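The opt-out mechanism described above can be illustrated with a short sketch. It uses Python's standard `urllib.robotparser` module to check whether a given robots.txt opts a crawler out of a URL. The robots.txt content, the crawler name `ExampleMiningBot` and the `publisher.example` URLs are hypothetical and chosen only for illustration; whether the absence of an opt-out amounts to an implied license is a legal question that the code does not answer.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the publisher opts its article pages out of
# crawling by one named crawler and leaves all other crawlers unrestricted.
SAMPLE_ROBOTS_TXT = """\
User-agent: ExampleMiningBot
Disallow: /articles/

User-agent: *
Disallow:
"""


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt text does not opt this URL out for this crawler."""
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines; no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("ExampleMiningBot", "OtherBot"):
        allowed = may_crawl(
            SAMPLE_ROBOTS_TXT, agent, "https://publisher.example/articles/123"
        )
        print(agent, "may crawl the article:", allowed)
```

A crawler that respects such notices would skip every URL for which `may_crawl` returns `False`.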
A service agreement with a copyright holder with a conditional access business model (i.e. where the works are not publicly available) is used to gain access to the works when a search engine, discovery service or abstracting and indexing service provides its service to the copyright holder, e.g. driving traffic to the copyright holder's website, thereby increasing usage and exposure of the works and eventually increasing sales and revenues to the conditional access business.
A service agreement between the information processing service provider and its users is used to obtain access to the works in the case of, for example, a document management service. Here a large number of agreements with a large number of users of the works, instead of a small number of agreements with a small number of copyright holders, enables the service provider to gain access in aggregate to a similar, and in some areas even larger, corpus of literary works than would be possible with service agreements with the copyright holders. The larger number of service agreements can be accompanied by a more streamlined process for concluding them, e.g. the agreement is part of the terms of service of the information processing service, which can appear to the user as a website, and the agreement is made simply by clicking a button when the user starts using the service.
In any case, this "legal access" condition is crucial. If the scope of copyright is defined so that non-expressive uses are exempted not just for literary works the users themselves have legitimate access to, but also for works any information processing service provider has access to, then the extent to which complementary services relying on non-expressive copying of copyrighted works can serve users of literary works is obviously enlarged. More works could be text-mined to serve users irrespective of any legitimate access users have to those works. Combined with the differentiation under the first condition, this would similarly result in four options, but rid of the condition that the user needs to have legitimate access. The number of users which could be served by the complementary business would then no longer depend on the legitimate access to works the user has, and the question of whether or not the user has obtained an extra license becomes obsolete. The number of users could still be limited to research users and non-commercial users. The four options within a given country where the specific copyright law applies would then be: no users could be served based on the exemption, where non-expressive uses are not exempted (technical access does not matter); users with a non-commercial research purpose only; users with a research purpose only; or all users, depending on the type of exemption. In each case, the information processing service provider with technical access to the literary works (e.g. those the user wishes to have text-mined) could offer non-expressive use based services (e.g. text mining services) to the respective user group without the users themselves having to obtain an extra license.
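The interplay of the two conditions can be laid out as a small decision function. The following Python sketch is only a schematic reading of the options discussed above, not a statement of any actual law: the enum members, parameter names and the function `can_serve_without_extra_license` are hypothetical labels introduced for illustration.

```python
from enum import Enum


class Exemption(Enum):
    """Hypothetical labels for the four exemption options discussed in the text."""
    NONE = "non-expressive uses not exempted"
    NONCOMMERCIAL_RESEARCH = "non-commercial research uses only"
    RESEARCH = "all research uses, including commercial research"
    ANY_USE = "any non-expressive use"


class Purpose(Enum):
    """Hypothetical labels for the purpose of the user requesting the service."""
    NONCOMMERCIAL_RESEARCH = "non-commercial research"
    COMMERCIAL_RESEARCH = "commercial research"
    OTHER = "other"


def purpose_is_exempted(exemption: Exemption, purpose: Purpose) -> bool:
    """First condition: does the exemption cover this type of user and purpose?"""
    if exemption is Exemption.NONE:
        return False
    if exemption is Exemption.NONCOMMERCIAL_RESEARCH:
        return purpose is Purpose.NONCOMMERCIAL_RESEARCH
    if exemption is Exemption.RESEARCH:
        return purpose in (Purpose.NONCOMMERCIAL_RESEARCH, Purpose.COMMERCIAL_RESEARCH)
    return True  # Exemption.ANY_USE


def can_serve_without_extra_license(
    exemption: Exemption,
    purpose: Purpose,
    user_has_legitimate_access: bool,
    provider_has_technical_access: bool,
    technical_access_is_legal_access: bool,
) -> bool:
    """Can a service be rendered on a given corpus without any extra license?"""
    if not purpose_is_exempted(exemption, purpose):
        return False
    if user_has_legitimate_access:
        return True
    # Second condition: the provider's own technical access only helps
    # if the law treats technical access as "legal access".
    return provider_has_technical_access and technical_access_is_legal_access
```

For instance, under a non-commercial research exemption a provider relying only on its own technical access could serve a non-commercial researcher if technical access counts as "legal access", while a commercial researcher would still need a license regardless of the access condition.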
If non-expressive uses were exempted from copyright in a given country and, additionally, technical access were deemed "legal access" and a sufficient condition for providing non-expressive use based services, then a search engine, document management service or abstracting and indexing service could offer those services without the requirement that the users of those services have legitimate access to the works accessed.
A search engine might offer a specific text mining service or semantic search service to non-commercial researchers when a non-commercial research text mining copyright limitation is in place in the country where the search service operates. Likewise, a document management service might offer a text mining service to all researchers, without need for licenses, when a commercial research text mining copyright limitation is in place in the country where the service operates.
Considering the similarities between search technology and text mining technology it makes perfect sense that the Hathitrust library group's text mining services indeed fall under fair use as they are at least as transformative as search technology.
The libraries within the group have gained legitimate access to the literary works by acquiring them and would not need an extra license for the extra copies required for a non-expressive use; neither do service providers assisting the libraries in those non-expressive uses need to obtain extra licenses.
At the same time, limiting the services to library patrons would still be more restrictive than offering such services to any researcher or any user, something Google could do at least for the millions of books digitised under its agreement with the Hathitrust if fair use were found in the case Authors Guild vs. Google. Google in this context can be seen as a non-expressive use based service provider which, should the court decide in its favour, would be offering services beyond search, such as text mining services, based solely on the technical access it has gained through its deal with the library group.
If the court found those (in this case commercial purpose) non-expressive use services to be fair, then Google could and would offer those services to any user without those users having to obtain any license, similar to a web search service. If the use is found to be fair, then this practice would be legal not just for Google but for any non-expressive use based service provider that wants to offer such services to any users. There is not much reason why this should not also be the case for other ways of obtaining technical access to literary works and for other types of literary works.
Similarly, it makes sense that in Japan text mining (or information analysis) as well as search uses are exempted from copyright and that, seemingly, technical access is also legal access and thereby the only condition for offering services based on either type of use. It would not be surprising if the providers of those non-expressive use based services were in many cases the same companies, because a service which has obtained access as part of one service business automatically also has technical access for offering the other.
In the case of a non-expressive use copyright limitation, if the access condition is defined in such a way, the providers of services based on non-expressive use are additionally less dependent on the copyright holders' consent to offer their service. The non-expressive use based service market would be more open and decoupled from the literary works information goods market. More businesses and companies could offer such services to more users as complementary businesses to the literary works market. Information externalities, which the copyright rationale intends to promote, would increase and could be leveraged more broadly, presumably leading to an increase in public welfare.
Viewed the other way around, the more types of users are required to obtain a license for non-expressive uses, and the more non-expressive use based service businesses have to rely on their users' legitimate access to literary works (rather than being able to offer such services based on the technical access to literary works they have by other means), the more those services would depend on the copyright holder and the less open such a service market would be. To the extent that businesses want to offer non-expressive use based services applied to literary works beyond the specified corpus of works a user has legitimate access to, they would have to obtain a license themselves from the copyright holder. To this extent they would be more a downstream business than a complementary business and would have to bear much higher transaction costs. That is, the non-expressive use based service itself would appear as a consumer in the information goods market and transact with many copyright holders over property rights for large numbers of literary works. If a search service had to obtain permission for every single piece of content it indexes, it would become prohibitively expensive to offer such a service. Similarly, other non-expressive use based service providers (and their customers) would have to bear such costs, which would subtract from the economic activity of actually rendering the service.
Transaction costs of this sort would not only stifle the take-off of the non-expressive use based service market, but also hold back the development of the copy-reliant technology. Non-expressive uses over and across large corpora of literary works would entail prohibitive transaction costs, and copyright holders, depending on the number of works they hold copyright for, would effectively be granted monopoly power in the information technology service market as well.
There is not much reason to require non-expressive use based services to obtain permissions, because they do not trade on the expressive aspect of the works and do not even use the expression as it is intended to be protected by copyright.
At least for the specific type of non-expressive use based service of providing text mining services the additional value created by the service should not be seen as deriving from the intellectually created expression of the literary works incentivised by copyright law. The value is derived from the text mining technology as an information technology leveraging additional information externalities including information network externalities through automation of information processing and integrated cross corpora non-expressive use of literary works.
Beyond this, research text mining in particular can be assumed to leverage additional positive information externalities well beyond the average level which could be leveraged by non-expressive uses, due to the nature of what people with a research purpose use works for and what they do with the benefits gained from using them. Research text mining is a type of research, and research is widely acknowledged as an essential driver of public welfare gains and is funded in many other ways. Exempting such uses from copyright would just mean another way of funding research, by tuning down the intangible property rights trade advantage 'subsidy' for the copyright holders and tuning up the freedom for researchers to use technology to increase research productivity.
Because the additional value is derived from the technology rather than from the expression contained in the literary works protected by copyright, and because research activities by their very purpose generate an above-average level of positive information externalities and value, research text mining should be seen as a pure complementary market to begin with. A situation where a research text mining service provider would have to obtain permission from a copyright holder (as well as the requirement that a user who already has legitimate access would have to obtain an extra license) should rather be seen as the exception. Therefore, research text mining markets are treated as complementary markets in this article.
New technologies require adjustments to the scope of copyright. If the new technology enables more value to be gained from using literary works through additional positive externalities deriving from the works, the balance in the public welfare analysis shifts toward the user side and the copyright scope needs to be decreased. For the emergence of text mining technology, the balance can be re-established by exempting research text mining from copyright, either specifically or as part of a broader non-expressive use copyright limitation, and by opening up the market for research text mining services. In this way public welfare can be maximised. This argumentation is also plausible when explicitly considering the competition aspect of copyright.

Copyright competition objectives and non-expressive use
Government policy objectives are seeking to ensure competition in markets as a means to the end of optimal allocation of resources to achieve maximum public welfare through these markets. Those competition objectives are deeply interlaced with copyright objectives as described in the static and dynamic dimensions of copyright rationale.
The dynamic efficiency of the literary works market describes the production of more new works through an incentive for the producers of those works in the form of property rights. The copyright rationale assumes that the dynamic efficiency outweighs the static inefficiency and costs caused by the same property rights. While this static-dynamic view focuses on positive and negative effects of copyright law on public welfare in general, the competition objectives of copyright law additionally come into play, as they concern the market structure and power relations between the literary works information goods market, its downstream markets and complementary markets.
Research text mining services, as a complementary market to the literary works market, depend on three components: text mining technology, access to literary works and, for copyrighted material, the legal conditions concerning the application of the technology to the works.
In practice, since text mining uses are often deemed not to fall under any existing copyright limitation, the literary works information goods market and the text mining service market are partly coupled through copyright laws. Therefore, a competitive advantage of a text mining services provider can derive not only from the technology component (which would in any case be reasonable for a technology service) but also from the access component and the legal conditions component.
The scope of copyright and of its ancillary legal regimes, and their definition of the extent to which businesses with technical access to literary works can offer text mining services, determine the level of competitive advantage of some players in the research text mining market as well as the disadvantages for other players. Therefore, the way copyright law and its ancillary regimes determine market structure and the level of competition in the text mining service market is another aspect of how those laws affect public welfare.
Primarily, they affect public welfare through the different types of text mining exemptions which are or could be put in place, as alternatives to not exempting text mining uses from copyright at all. They also affect public welfare through the relationship of anti-circumvention law and contract law to the copyright limitations. Public welfare is further affected by how copyright and its ancillary regimes determine the type and number of businesses likely to become active in the text mining service market simply by defining whether or not technical access is sufficient to render the service under a copyright exemption, i.e. whether technical access falls within what is defined as "legal access". This in turn heavily affects the number of users for which a text mining service could be rendered without copyright holder permission, and thus how open the text mining service market is.
If only research text mining uses or only non-commercial research text mining uses are exempted, the remaining types of uses are still coupled to the copyright-based literary works market. The more types of uses are exempted and decoupled, the more users can easily be served under the exemption, and the larger and more competitive the text mining service market would be.
If technical access counts as "legal access" then more types of businesses and more businesses can serve more text mining users just based on technical access they have from other business relationships but without requiring the text mining user to have legitimate access to the works in question.
Here based on the legal access condition the text mining service market is more or less decoupled from the literary works market. If technical access is a sufficient condition then yet more users could be served and more businesses and business types could enter the game, and the text mining service market would be yet larger and more competitive.
The legal access condition is orthogonal to the 'purpose and type of user' condition, because even if technical access suffices to offer the service under the exemption, the service could still only be rendered to the type of user defined in the exemption. For example, services based solely on technical access the service provider has could still only be rendered to non-commercial research users if only non-commercial research text mining is exempted.
Making the text mining service market more competitive and larger, i.e. opening it up to a larger extent by decoupling it from the literary works market through less restrictive copyright, ancillary rights and access conditions, would not only increase business activities in the text mining service market, resulting in better and cheaper text mining services and more choice for text mining users, but would also increase the information externalities from literary works feeding into the production processes in downstream markets and information externalities feeding out of the market into the common pool of knowledge, increasing overall productivity and public welfare.
Scale and comprehensiveness of the corpora of literary works which can be text-mined are important for text mining services, and text mining inherently integrates literary works and enables integrated uses of such works. Additional information network externalities would therefore arise when the text mining service market is more open, to the additional benefit of downstream market users as well as to the additionally increased social benefit on a national level. When more users can draw more insight from larger corpora of literature using text mining technology because fewer permissions from copyright holders are required, the potential for increased public welfare from the technology and from the integration of works can be realized to a larger extent than would be possible with more permission requirements.
The market structure of the interrelated text mining services market and copyright works market, and the power relations between the (potentially) involved market players, including players like search engine service providers, determine the overall outcome of those markets and their contribution to public welfare. They also determine contributions to public welfare from downstream markets and from any producer in a market economy who, even without transacting in any of these markets, benefits from non-pecuniary externalities from literary works.
How market structure and power relations take shape under a given copyright law has implications for the appraisal, based on public welfare objectives, of an intervention concerning the applicability of copyright to text mining uses.
Both the definition of copyright scope and the definition of what counts as "legal access" concern, in different ways, the weighing of changes in output in the research text mining service market against other effects of such a government intervention.
For assessing the impact of changes to copyright with regard to text mining uses viewed from a competition and market structure perspective, the two conditions can be conceptualised with a model of vertically layered markets and neutrality rules as competition ensuring policy measure. With this model the market structure and competition aspect of copyright market intervention appraisal can be compared with economic evidence from other vertically layered markets.
While the definition of copyright scope determines to which extent users can apply text mining technology to literary works irrespective of additional licenses for such uses, i.e. in a neutral way, the legal access condition beyond this determines to which extent text mining service providers can offer their service to users irrespective of what licenses and access the user has and irrespective of licenses it has directly with the copyright holder.
The legal access condition also represents an overlay of text mining service market power and market power in other markets where those (potential) text mining service providers gain technical access. For example, the market power a large search engine service has in the search market overlaps with its potential power in the text mining market. This implies not only that an anti-competitive market structure in the form of excess market power, in horizontal or vertical power relations, held by individual players in the other service markets could be inherited by the text mining market, potentially causing anti-competitive market structures there. It also creates intricate triangular power relations between the text mining market, the other service market and the literary works information goods market, implying a coopetition relationship between the player in the other service market and the player in the literary works market.
For example, a publishing house could depend on web traffic from search engines for its business of selling pieces of copyrighted information goods to as many customers as possible (cooperation) and at the same time find itself faced with the search service provider as an unwanted text mining service provider (competition). The latter may be unwanted because text mining services may be seen not as driving traffic to the publisher's website but rather as substitutes for the pieces and collections of copyrighted literary works the publisher is selling, or because the publisher wants to diversify into the text mining market itself, hoping to leverage existing access and customer relationship assets.
It is in this sense also that the definition of a market and of a business defined by the type of good produced and business models is not a template for companies to follow. Diversification is a common practice and growth of a market can also mean a company operating in more than one market is growing its business portfolio in that market.
Market power overlap between the text mining service market and another service market where technical access is gained could be seen in the supposed excess market power Google has in the search service market. The question whether or not Google has excessive market power in the search market has been assessed by the antitrust authorities in the US and it was found that such excess market power does not exist. In the EU a similar assessment is pending.
In any case, the consideration of the legal access condition in a public welfare analysis concerning a contemplated exemption of text mining uses or other specific non-expressive use copyright limitations would require assessment of those other service markets and exceeds the scope of this article.
Nevertheless, the conditions copyright sets on the power relationship between the copyright-reliant literary works information goods market and the complementary text mining service market can be seen in comments on copyright law's impact on "secondary" markets. For example, Geiger et al. (2008), in reply to a Green Paper issued by the European Commission on copyright reform in the EU, note with reference to copyright limitations: "The Three-Step Test should not be applied in a manner that safeguards anti-competitive practices or impedes the establishment of a harmonious balance between the legitimate interests of right holders, on the one hand, and competition (especially competition in secondary markets) on the other ... The Three-Step Test should be interpreted in a manner that respects the legitimate interests of third parties, including ... interests in competition, notably on secondary markets; and other public interests, notably in scientific progress and cultural, social, or economic development".
In response to the same Green Paper, Hilty et al. (2008) propose to separate copyright protection for producers of original expression from a range of services building on top of the primary publication in order to alleviate the negative effects of exclusivity of rights granted by copyright law on the service level: "[T]he intention is to allow competing content providers (e.g. secondary publishers, database providers, indexes, archives, information brokers, etc.) to enter into competition with the original publisher with respect to the same content, but with differently prepared and refined products (e.g. layout of the document, machine-readability, file size, etc.) or services (citation linking, data extraction, information broking, etc.). As a result, the specific added value should be invested by any disseminator independently, which leads to freedom of choice on the side of the user with regard to which kind of refinement he wants to pay for".
This freedom of choice represents the competition on the service level and the independence of the "disseminator", such as a data extraction service, represents the decoupling of this type of service market from the copyright-based literary works market.
The economic model of vertically layered markets, vertical integration (i.e. coupling) and neutrality rules (sometimes also referred to as separation or open access rules) can be applied to copyright law, copyright limitations and "legal access" as a condition for text mining service provision.
The verticality of the layers is represented by the fact that the technology is applied to the copyrighted work by a user or by a technology service on its behalf, subsequently or building "on top" of the copyrighted content layer. A neutrality rule separates (or decouples) the one layer from the other while the public welfare objective is to improve competition in the vertically layered markets and forestall abuse of power of a player in one layer against a player in the other layer based on the vertical dependencies which occur automatically due to technical reasons. The neutrality rule ensures that a market player on which others depend in one layer does not extend its power into other market layers by using the technical dependencies in anticompetitive ways.
For example, network neutrality rules forestall that hardware infrastructure market players discriminate against service market players that operate on top of the infrastructure and in this sense technically depend on it. The network is made "neutral," a "dumb pipe," to prevent the network provider from extending its market power into the service markets and thereby pushing the 'on-top' service providers out of the service market. Such vertical integration is private welfare enhancing for the network provider, but decreases competition and innovation in the service markets and with this decreases overall public welfare. As in this example, often the underlying layer is an infrastructure and less vertical integration also represents increased non-pecuniary infrastructure externalities through more open access to the infrastructure. Decreased competition and decreased non-pecuniary externalities lead to less innovation and technology development and are detrimental to public welfare aims overall.
A similar analogy could be drawn with the famous antitrust case where Microsoft had to unbundle (or decouple) its Windows operating system from its Internet Explorer 'on-top' service. The competition objective here was similar: because market power in the underlying market for operating systems (additionally through horizontal market power in the form of large market share) was abused to gain market power in the browser market, the government intervened to unbundle the operating system and browser and in this sense uncouple the operating system market from the browser market.
The rationale was also similar in that the decreased competition in the browser market and decreased real infrastructure externalities led to less choice for browser consumers, less innovation and technology development in that market and, overall, a smaller public welfare contribution from both markets.
The Microsoft example is similar in principle. Neutrality rules are not usually associated with individual firms or individual antitrust cases though, but apply wholesale to the respective markets.
Although not triggered by anticompetitive behaviour, open data policies and the public sector production of satellite navigation data can be seen as a layered model where the on-top layer services' economic activity is economic justification to, in a sense, open up the underlying market. There have been service providers offering services based on public sector data before open data policies existed, but it was hard to get hold of the data, which was not publicly available, and was not standardised.
Open data policies (i.e. public sector information policies) remove exactly those two barriers: they make public sector information publicly available so that it can be used by any service provider who has a good idea what to do with it, and they mandate certain standardised and machine-readable formats to make it even easier for the service layer market to flourish. Both measures can be seen as aspects of the same 'neutrality' rationale. Evidence also shows that this rationale pays off. Open data has led to increased economic activity (as well as, in a sense, more competition), innovation and technology development in the service markets which have evolved on top of the underlying public sector information. This is possible through leveraging the potential to increase information externalities of data which was to a large extent not easily accessible and not integrated before. The new services' economic activities themselves contribute to public welfare, but beyond this they have evolved to make producers in a broad range of other industries more productive and thereby increase public welfare in this way as well.
Satellite navigation data has also spurred a massive service industry based on the data which is often produced by the public sector and made available. How the data is used is not prejudiced by the data producer and any service can compete in the service market based on the same data. Other examples with a similar rationale and layered model are scientific data, weather information and maps as the underlying layers with service markets on top.
Open data, satellite navigation data and the other types of information mentioned above show the same strong correlation between fewer restrictions on, or otherwise improved conditions for, access to underlying data and increased economic activity in the service layer on top. They also show productivity gains for a broad range of producers who benefit more from the data and services, resulting in aggregate in increased public welfare. The openness of the data layer represents nonexclusiveness and, through that, increased positive externalities from the data, which is similar to the neutrality rules which exist for the hardware network infrastructure underlying a plethora of Internet based services.
The layered market and neutrality rule principle can also be applied to literary works, copyright and copy-reliant technology services. Copyright's limited scope or a copyright limitation would make a certain part of the work's subject matter or a certain use non-exclusive and open, leading to intended information externalities deriving from literary works. Service businesses can thrive, building additional economic activity by facilitating the use of exempted subject matter and of uses not protected by copyright.
Specifically for text mining uses, the literary works content layer can be seen as an infrastructure layer and the intervention of exempting text mining uses from copyright overruling anti-circumvention and contract law and defining technical access as "legal access" represents a neutrality rule. The fact that users, irrespective of an extra text mining license, can indiscriminately and non-exclusively apply text mining technology and the fact that (potential) text mining service providers can render text mining services to users just based on technical access without transacting property rights increases competition among text mining service providers and increases positive information externalities as well as information network externalities. Without necessarily diminishing the value of the information goods market those effects, in turn, promote the development of text mining technology as well as more innovation within downstream markets, within the complementary text mining markets as well as through real externalities in any area and industry where people and firms benefit from more knowledge.
In sum, a text mining copyright limitation and the definition of technical access as "legal access" also viewed from a competition perspective and as a neutrality rule can be assumed to increase public welfare.
Confirmation of this assumption and estimation of its magnitude, again, could be detected in the market data, particularly in the copyright-based/copyright-dependent literary works information goods market and in the complementary text mining service market. It could also be detected in the market data of a broad range of downstream markets including firms in research-intensive industries such as pharmaceutical and biotech, but also in other industries which can benefit from more choice in using text mining technologies or engaging text mining service providers to leverage information externalities and productivity growth potential.
The benefits from a text mining copyright limitation and definition of technical access as legal access, here presented as a neutrality rule, would also materialise in the form of increased non-pecuniary externalities and could also be detected beyond the market data in the overall productivity growth on a national level. However, the focus of such rules is on competition and technology development in the service layer market. Without a neutrality rule, promising technologies that depend on another market layer cannot thrive, and the potential productivity gains based on the technology therefore cannot be realized as they could with a neutrality rule. For text mining technology and research text mining uses this means primarily that text mining technology cannot be used and further developed to tap the potential of increased productivity from large scale integration of text works and leveraging information externalities as it could be tapped with a neutrality rule in place. Increased productivity could be achieved within production of new scientific publications and other information goods as well as the production of other research outputs and goods such as a new drug, wearable photovoltaics, a history exhibition, etc.

Text mining exemption public welfare effects
Taking all the insights from previous sections together, public welfare can be seen as a function of the level of control copyright laws and their ancillary legal regimes give to copyright holders over literary works or vice versa as a function of the level of user rights to non-exclusively use literary works in certain ways.
Literary works are a source of significant information externalities which can be seen as a subset of knowledge spillovers or as intangible infrastructure externalities. In any case, externalities are predominantly positive. These positive externalities are mainly non-pecuniary and have no predestined fixed total value when adding to a society's total output and public welfare. In the context of text mining at least they are enhanced by information network externalities.
Positive externalities from research uses of literary works increase public welfare more than the average of all uses of all creative works or all information goods. In the context of research text mining this is reflected by the fact that a large proportion of the works text-mined is scientific literature. This has implications on how the copyright balance applies for this type of literature since for most scientific literature such as journal articles the incentive is not to gain rights to exclude others from using the works. Also, a large proportion of the research of which the scientific literature is one result is funded by public funds. Those facts are the base for argumentation for exempting research text mining from copyright such as the argument that such exemption would increase the return of public investment and that the overall contribution and risk for those works are borne mainly by the public funds rather than publishers to which copyright is transferred. The latter argument is based on equity objectives in public welfare assessments. However, those arguments are specific to scientific literature and on the whole are only partly relevant because copyright applies to all literature and research text mining can be applied on all types of copyrighted literature. Nonetheless, they support the argument that positive externalities from research text mining uses of literary works increase public welfare more than other text mining uses.
In any case, copyright law and its ancillaries determine the level of control by copyright holders over literary works and thereby the level of competition in non-expressive use based service markets. They also determine the overall level of how information externalities serve as input in production processes of downstream markets, including individual active 'consumers' as producers. The laws also determine to which extent beyond the market transaction (pecuniary) based wealth transfers value can (rather diffusely) leak out into the common pool of knowledge and from there feed back into production processes of potentially any producer without market transaction (non-pecuniary).
For text mining uses of literary works as non-expressive uses, the enhancement of information externalities through network externalities is particularly significant due to the contribution of text mining technology to highly integrated infrastructure-type cross corpora uses of literary works.
For copyright change concerning text mining uses, public welfare of a given country can be seen as a function of copyright limitations applicable to text mining, of how ancillary regimes relate to (e.g. not overrule) those limitations and the definition of "legal access" as further determining the level of competition in the text mining services market.
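As a purely notational sketch of this functional relationship (the symbols $W$, $\lambda$, $\rho$ and $\alpha$ are introduced here for illustration only and are assumptions, not part of an established model), the dependence could be written as follows. The monotonicity condition restates the thesis argued in this section and is not an independently derived result.

```latex
% W       : public welfare of the given country
% \lambda : scope of copyright limitations applicable to text mining
% \rho    : degree to which ancillary regimes (anti-circumvention law,
%           contract law) leave those limitations intact
% \alpha  : breadth of the definition of "legal access"
%           (widest when technical access suffices)
\[
  W = W(\lambda, \rho, \alpha),
  \qquad
  \frac{\partial W}{\partial \lambda} > 0,\quad
  \frac{\partial W}{\partial \rho} > 0,\quad
  \frac{\partial W}{\partial \alpha} > 0
  \quad \text{over the range of options considered here.}
\]
```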
Public welfare impacts can be detected in market data of the intervention relevant markets as well as in national level productivity growth which, as it is derived through non-pecuniary externalities, cannot be measured directly and is estimated with regression analysis. The intervention relevant markets are the primarily affected literary works information goods market (copyright holders), markets downstream of the literary works market as buyers of rights on literary works (appropriating private value beyond the producer surplus in the primarily affected market) and the text mining service market mainly as a complementary market. Beyond the changes in market price-based data, the impact of a change to copyright can be detected indirectly through regression based methods in the aggregated productivity growth of all producers in the given country.
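The regression-based detection described above can be illustrated, in its very simplest form, by an ordinary least squares fit of productivity growth on an index of copyright openness. The following is a minimal sketch under strong assumptions: the openness index, the single-regressor specification and all numbers are hypothetical and synthetic, constructed so that the true slope is known. It shows only the mechanics of the estimation, not an empirical result.

```python
from statistics import mean


def ols_slope_intercept(x, y):
    """Ordinary least squares for one regressor: returns (slope, intercept)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# SYNTHETIC data, for illustration only (not observations of any country).
# openness: hypothetical index of how far text mining is exempted.
# growth:   hypothetical productivity growth in percent, constructed
#           with intercept 1.0 and slope 0.5 and no noise.
openness = [0.0, 1.0, 2.0, 3.0, 4.0]
growth = [1.0 + 0.5 * o for o in openness]

slope, intercept = ols_slope_intercept(openness, growth)
print(f"estimated effect of openness on growth: {slope:.2f}")
```

A real estimation would need control variables, noisy multi-country panel data and a strategy for separating the copyright effect from other drivers of productivity growth. The sketch deliberately leaves all of that out.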
To illustrate the impact which a change in the legal treatment of research text mining would have on public welfare, three different states of society can be compared, which do not differentiate legal tools (including the ancillary regimes) to exempt uses of literary works and neglect the market power overlay effects from other non-expressive use based service markets. The three societal states are state of society A, where research text mining is not exempted; state of society B, where research text mining is exempted for non-commercial uses and technical access is considered a form of legal access; and state C with the same conditions but all research text mining uses are exempted.
Several other options could be considered as well, such as extending exempted uses beyond research uses or to all non-expressive uses, all of which would concern the legal treatment of text mining.
Options where technical access is not sufficient to qualify as legal access for an exception could also be assessed. The purpose of the exercise, however, is just to bring across the idea of how public welfare in this context can be assessed and to show that exempting all research text mining uses, as opposed to just non-commercial research text mining uses or no text mining uses at all, would achieve a higher level of public welfare.
It is also worth noting that while the opportunity costs of not being able to apply text mining to copyrighted works without permission of the rights holder apply to all copyrighted works, because copyright is granted by default to any original creative expression, the incentive theory does not apply by default, i.e. not all works are created because of the incentive of exclusivity that copyright confers. For example, scientific publications are often not incentivised in this way. Hence, when comparing the benefits and (opportunity) costs of a particular copyright law in place (the opportunity costs representing the public welfare which could be achieved with the respective alternative option after government intervention), the creation of works not incentivised by copyright and the benefits deriving from those works to a large extent cannot be counted as a benefit.
State A is the state of society that would prevail in most countries in the world today if copyright with regard to text mining uses were not changed. The producer surplus in the information market would increase since text mining is deemed to require an additional license.
Therefore, an additional licensing market for text mining would emerge, enabling rights holders to generate additional revenues from selling text mining rights on literary works. Those rights would also be sold to researchers and research institutes who want to apply text mining technology to literary works. Those rights would be in addition to the access rights or ownership of the works which a user would have to acquire as well if she has not procured them yet. In practice for non-commercial research text mining such additional rights would need to be obtained. Often such rights are granted free of charge, but users depend on the benevolence of the rights holder and there is no guarantee that this practice will be continued after text mining technology is adopted en masse. Users must justify uses, provide information and in one way or the other, invest more time and effort to get to the point that they can start using the technology. This extra time and effort adds to the large transaction costs users are confronted with due to the fact that text mining is usually applied across large corpora of literary works for which rights are held by large numbers of rights holders and large numbers of permissions need to be obtained. This transaction cost problem is one of the most cited arguments for exempting text mining from copyright.
The transaction cost problem for text mining uses, as for all non-expressive uses, is acute and possibly insoluble without exempting such uses from copyright. The problem exists because of the sheer number of rights holders that would need to be approached, reluctance by rights holders to grant rights and the fact that a license cannot be obtained for all copyrighted works. It also represents a high transaction cost incurred by the user for works which were not created because of the copyright incentive. The scale of the problem is staggering: "The subject of data analytics will be varied ranging from publishers' journals, grey literature through to the internet. Given the tens if not hundreds of thousands of publishers that exist globally through to the approximate 645,000,000 websites (source: Netcraft) it is simply not practical to negotiate access to this material on a case by case basis in order to mine it for facts" (BL 2012).
Even if a text mining use is limited to scholarly articles, the problem would still be massive as there are more than 50 million scholarly articles published to date (Jinha 2010) and 1.6 to two million articles are added each year with an annual growth rate of about 4%. There are 90,000 different publishers in 215 different countries and more than 336,000 periodicals listed in Ulrich's Periodicals Directory with 25,000 of those being STM journal titles.

The costs for tracking down copyright holders at such a scale and clearing, or possibly negotiating, permissions with them would be prohibitive. While text mining just a few thousand books or articles can make sense in particular contexts it still often requires a reasonable level of comprehensiveness of the body of works mined, e.g. works in a specific domain, to unfold its value. The literary works of interest for a scientist using text-mining are often scattered across tens and hundreds of thousands of sources with hundreds or thousands of copyright holders.
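The order of magnitude of these clearance costs can be made tangible with a back-of-the-envelope calculation. The sketch below uses the rights holder counts mentioned in the text (thousands of copyright holders for one scientist's corpus, 90,000 publishers overall). The two cost parameters are hypothetical assumptions chosen only for illustration and are not empirical estimates.

```python
# HYPOTHETICAL assumptions, for illustration only.
HOURS_PER_PERMISSION = 2.0       # staff time to locate one rights holder and clear one licence
WORKING_HOURS_PER_YEAR = 1_600   # one full-time member of staff


def clearance_person_years(rights_holders: int,
                           hours_per_permission: float = HOURS_PER_PERMISSION) -> float:
    """Person-years needed to clear permissions one rights holder at a time."""
    return rights_holders * hours_per_permission / WORKING_HOURS_PER_YEAR


# A domain corpus spread over 1,000 copyright holders.
print(clearance_person_years(1_000))    # 1.25 person-years
# All 90,000 publishers mentioned in the text.
print(clearance_person_years(90_000))   # 112.5 person-years
```

Even under the optimistic assumption of two hours per permission, clearing rights for a single domain corpus would take more than a person-year before any research could begin.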
There is conflicting information on the second aspect of high transaction costs, the reluctance of rights holders to grant rights. Some reports say that among scientific publishers well beyond 50% of text mining requests are answered positively, while others have shown that this is not the case.
Additionally, rights are often negotiated on a case by case basis. Both the uncertainty and the time and staff needed for negotiations add to the transaction costs. Collective licensing solutions are being developed, which would lower the negotiation and requesting costs, but this will only be a solution for works where rights holders have chosen to participate in the licensing solution, which will necessarily not cover all works needed for at least some types of text mining. Collective licensing on an international scale is an immensely complex process in itself.
Another transaction cost issue is that not all content can be licensed. In some areas more than 50% of all works are orphan works for which no rights holder can be identified. It has also been contended that web content poses legal risks. Kelly et al. (2013), in the context of the EU level copyright reform process, note: "we are extremely unclear who - other than perhaps the government as was concluded in Japan - is in a position to "grant permission" for the mining of the open web and would be most grateful for clarification on this crucial point." Further adding to the complexity of the process of obtaining rights are differences in copyright laws among different jurisdictions and jointly held legal rights. Treating text mining uses as property rights would also entail that any such use must not inadvertently mine texts for which permission is not obtained.
There is no doubt that the emergence of text mining technology has brought additional potential value deriving from use of literary works. Without exempting text mining uses, this potential can only be realized to a small extent. The prohibitive transaction costs often forestall the text mining use right away: researchers have limited resources and would rather choose another type of research which is less risky and time consuming. Then no additional information externalities can feed into the production process of downstream markets and productivity cannot be increased at all for those producers. In other cases, the transaction costs subtract significantly from the value the user gains from using literary works for research text mining. While it may be easy to extend existing access licenses to cover text mining, it is still complex and laborious to do this across large numbers of rights holders even with standard licenses and collective licensing schemes. In both cases the licensing market is still inefficient.
There are significant untapped opportunities to leverage additional positive externalities through use of text mining on the side of the (potential) users of text mining technologies on literary works. There are fewer users who could tap the value and there are fewer works for which it could be tapped. The added value is diminished by transaction costs for the few who would tap it for a comparably small portion of the overall number of literary works available. Additional non-pecuniary externalities are barely realized, preventing feedback of additional 'costless' value into production processes of individuals and firms without market transactions across the society.
Text mining service providers could serve far fewer text mining users, and if they wanted to offer services beyond the few users who can afford an extra text mining license they would have to obtain a license themselves. To this extent, they would rather be a downstream market dependent on the rights holder, and it is inconceivable that a service provider would in fact enter into this role under those conditions. If copyright applied to text mining uses, the text mining service market would be dominated by businesses which hold copyrights on large corpora of literary works. Therefore, the text mining service market would be coupled to the literary works information goods market and show a low level of competition and comparably little economic activity. Also, the development of the technology would be held back as fewer (potential) text mining service providers would invest in the technology.
The missed opportunity to increase value through text mining technology weighs significantly against the gained producer surplus from text mining rights transactions. The value of the combined consumer and producer surplus which contributes to public welfare is lower than it would be with the exemption of text mining uses. Additionally, the lower economic activity in the text mining service market would offset the producer surplus gain in the literary works market.
Text mining rights are an attempt to internalise value into market transactions which derives mainly from technology uses across large numbers of works for which a user would need to transact. It represents the attempt to capture the scattered 'bits' of value deriving from cross corpora integrated uses within a property rights scheme where it can barely be identified. Information network externalities characterise the extent to which a particular work has contributed to the overall value.
Text mining licenses represent a system where a property rights scheme is thought to be able to maximise the overall value from text mining uses of literary works just by providing a framework of reallocating those rights among market participants. It tries to solve the inherent transaction cost problem within the property right logic.
In state B this logic is abandoned for non-commercial research uses. The transaction costs problem is largely solved for this type of use (no permission requirement, negotiation, justification or fee). The system of property rights allocates rights to begin with where the value is created -on the user side.
For non-commercial research uses the potential of additional value from information externalities through cross-corpora integrated text mining uses could be tapped and users would benefit largely from this.
Those benefits are reflected in increased productivity of non-commercial producers such as universities. User organisations such as research intensive commercial companies could mostly not benefit from the exemption.
Non-commercial research uses would free up non-pecuniary externalities leaking out into a knowledge pool to the benefit of all producers in a society, but this would still be limited to leakage from non-commercial uses. For some rights holders, fewer saleable text mining rights could diminish revenues, and some additional infringement monitoring and enforcement costs could arise. However, a reduction of copyright holders' revenues through a decrease of copyright scope cannot be taken as a given. A study on how the introduction of fair use in Singapore has affected the content industry as well as other industries has shown that gains in the other industries are not accompanied by losses of the content industry; the gain appears as additional value (Ghafele & Gibert 2012).
In any case, in state B text mining services could serve non-commercial research text mining users irrespective of rights holder permission. The non-commercial research text mining market would be uncoupled from the literary works market leading to more competition and technology development for this part of the research text mining market. To this extent, text mining services operate as true complementary markets.
In state C, the same effects would also occur for other research text mining uses including uses with commercial purposes. The property rights scheme is abandoned for all research text mining uses, solving the transaction cost problem for all those uses by allocating rights from the beginning to the user side where the additional value from text mining literary works is created. The potential of additional value from information externalities through cross-corpora integrated text mining uses could be tabbed and all research users could benefit from this. Any organisation or producer could benefit more from transacted information goods as well as through non-pecuniary externalities feedback.
For the rights holder, text mining rights could not be sold to researchers anymore leading to revenue loss and possibly additional infringement monitoring and enforcement costs. Text mining services could serve all research text mining users irrespective of rights holder permission. The research text mining market as a whole would be uncoupled from the literary works market leading to more competition, efficiency and technology development in the research text mining market as a whole. To a larger extent then in state B, text mining services would operate as true complementary markets. The gained benefits for users and society as a whole and additional economic activity in the text mining service markets in states B and C would offset the decreased licence revenues and additional costs for rights holders. Overall, public welfare would be increased compared to the society in state A. The public welfare increase is in higher gear in state C.
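The comparison of the three societal states can be laid out as a simple welfare accounting exercise. The sketch below is an illustration under stated assumptions: the welfare components follow the effects discussed in this section, but the additive form and every number are hypothetical, chosen only to reproduce the qualitative ranking argued here. They are not estimates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SocietalState:
    name: str
    licence_revenue: float      # rights holders' revenue from text mining rights
    user_benefit: float         # value users gain from text mining literary works
    service_activity: float     # economic activity in the text mining service market
    spillovers: float           # non-pecuniary externalities feeding the knowledge pool
    transaction_costs: float    # search, negotiation and justification costs
    enforcement_costs: float    # extra infringement monitoring and enforcement

    def welfare(self) -> float:
        """Sum of benefits minus costs (a hypothetical additive form)."""
        benefits = (self.licence_revenue + self.user_benefit
                    + self.service_activity + self.spillovers)
        return benefits - self.transaction_costs - self.enforcement_costs


# HYPOTHETICAL index values, for illustration only.
STATES = [
    SocietalState("A", 10.0, 5.0, 2.0, 1.0, 6.0, 0.0),    # no exemption
    SocietalState("B", 6.0, 15.0, 6.0, 5.0, 3.0, 1.0),    # non-commercial research exempted
    SocietalState("C", 2.0, 30.0, 12.0, 10.0, 0.0, 2.0),  # all research uses exempted
]

ranking = sorted(STATES, key=lambda s: s.welfare(), reverse=True)
for state in ranking:
    print(state.name, state.welfare())
```

With these illustrative values the ranking is C, B, A: licence revenue falls from state A to state C, but the fall is more than offset by user benefits, service market activity and spillovers. Whether the offset holds in practice is an empirical question that would have to be settled with the market data and productivity estimates discussed above.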
State C is also to be favoured because a distinction between non-commercial and commercial research is in many cases impossible to draw and detrimental to the intended productivity gains.
Researchers in both academia and industry rely on the same information. Non-commercial and commercial research are often indistinguishable, such as in cases of university-industry cooperation. Treating them separately would again result in duplicated time, effort and expenses. Disallowing downstream commercial use also complicates further use of results, including publications. Researchers would face additional risk and costs from possible commercial reuse of their results. A non-commercial research text mining exemption would also be limited in its impact because "most research is conducted by commercial entities such as pharmaceutical companies" (Murray-Rust et al. 2012).
In state B, as well as in state C, the exemption represents a transfer of wealth in the form of rights from the producer side to the user side, but in both cases the loss of the rights holder is a pecuniary externality to the user side, while the gain on the user side is a non-pecuniary externality for the rights holder. The gains on the user side represent increased and multiplied positive information externalities, including network externalities, which lead to an overall increase of public welfare even if rights holders and the literary works information goods market may lose in value.
This indicates that exempting research text mining uses (including commercial uses) and defining technical access as "legal access" would lead to a state of society with the highest level of public welfare. This would hold even if the overall effects of exempting research text mining uses from copyright in this way were less favourable, given the priority of public welfare on the national level and the competitive advantage of the national economy over other nations. The British Library in the United Kingdom summarises: "the Japanese government introduced a limitation and exception for data and text mining in 2000, and US technology companies assert fair use under the US Copyright Act for this activity. We are therefore in a position of competitive disadvantage where Americans and Japanese can data mine UK material that we cannot. We also note that the US government is actually funding significant work in the area of big data and text mining, which no doubt will have potential spin-off benefits for US industry." (BL 2012) A government intervention exempting all research text mining uses from copyright and defining legal access so as to completely open up the research text mining market would be the rational and justified response for the given country. This would reflect the technological developments and the increased value potential that text mining technologies represent. It also represents increased possibilities to leverage knowledge spillovers within a country and spillovers feeding into the national economy from many other nations. The decreased private benefits of rights holders may lead to decreased export revenue from the literary works market, but the additional benefits from internal and cross-border spillovers might outweigh the lost export revenue. At the same time, the complementary text mining service market thrives with such an exemption, not only increasing economic activity from users within the country, but also exporting services.
Overall, also from an international trade and spillover perspective, an intervention exempting all research text mining uses and a research text mining market decoupled from the literary works market would be the most appropriate government intervention. It would ensure copyright law serves its public welfare purpose by rebalancing the social benefits and social (opportunity) costs it causes by making text mining uses of literary works excludable. It would also represent a net effect of investment in knowledge, increased innovation and productivity growth on a national level.

This result is also confirmed by further evidence and by numerous indicative examples and arguments expressed in copyright reform consultations specifically addressing text mining uses in the UK, Ireland, Australia and at EU level, as well as in proceedings of text mining related court cases in the US. It is further confirmed by responses from governments and legislative proposals in the UK, Ireland and Australia, by the court decision in the HathiTrust case in the US, and by the legal treatment of non-expressive technologies similar to text mining in general, such as search technologies.
The EU copyright system is hierarchical: member states must keep their copyright limitations within the outer boundaries of the limitations permitted under EU law. Consequently, if public welfare analysis at the national level in an EU member state finds that all research text mining uses should be exempted, as opposed to only non-commercial research text mining uses, this member state would have to wait or push for change at EU level. The ongoing EU copyright reform process is therefore of eminent importance for all member states and for their options to choose how they want to treat research text mining under copyright law in their national interest.
This article illustrates that an intervention appraisal concerning the treatment of research text mining uses under copyright law, with the overall goal of maximising public welfare on the national level, can also be applied to the supranational level for the economic area of the EU. Crucial for EU-level copyright law is not only which type of copyright limitation is chosen, how legal access is defined and how ancillary laws relate to copyright law, but also whether EU copyright law merely widens the outer boundaries to give more options to EU member states or introduces some form of mandatory copyright limitation applicable to research text mining.

Conclusion
The conclusion of the argument in this article is that at EU level a mandatory copyright limitation for all forms of research/scientific-purpose text mining should be adopted, ancillary laws should not be permitted to override such a limitation, and technical access should be a sufficient condition to exercise this copyright limitation.
Such a limitation means that users in pursuit of science can use computers to process and analyse copyrighted text works without copyright holders' permission and without permission-related transaction costs. To this end, text mining based information services can be rendered in a country where text mining is exempted from copyright in this way, with copyright-related legal risks eliminated. To the extent that text mining uses are exempted and technical access to copyrighted works is a sufficient condition to exercise the copyright limitation, the text mining service market is decoupled from the copyright-based literary works information goods market.
This leads to a more open text mining services market, resulting in more competition and economic activity in this market and consequent stimulation of text mining technology development in that country. Those services can be exported and in any case increase the overall contribution to national level output just from this market. Beyond this, those services enable information goods downstream markets, as well as any production activity irrespective of market transaction with the literary works information goods market, to better leverage information externalities for productivity growth. If the downstream market or other production activity resides in the country in question, this productivity growth directly accumulates to national level productivity growth.
If the downstream market or other production activity resides in another country, the productivity growth there would still benefit the country's public welfare. First, the ability to increase productivity through text mining services would be reflected in the prices paid for the text mining services exported to the other country. Second, increased information externalities arising from text mining use in other countries feed back into the national economic system as knowledge spillovers. The former shows up in the service goods trade balance as increased national product and the latter shows up as increased national-level productivity from unaccounted sources from abroad.
Both increased economic activity in the text mining market, through services rendered domestically or for users abroad, and increased productivity, through increased text-mining-enabled intra-firm, intra-industry and intra-national information spillovers as well as international information spillovers, represent an increase of public welfare in the country where text mining is exempted from copyright.
Text mining technologies are tools with the potential to leverage knowledge spillovers from literary works to increase productivity and economic output as well as social benefits on national level and beyond and without trading on the underlying expression of literary works protected by copyright law.
What is true for knowledge spillovers in general is also true for information spillovers where the conveyer and constituent of the spillover itself is a public good and where value is gained from large scale integrated non-expressive uses of cumulative literature.
The potential to increase public welfare deriving from text mining technologies has few upper limits due to the public good character of literary works and the network character of natural language text. This potential can be realized only to a small extent by treating text mining uses on literary works (individual works or collections of works, which are necessarily small in comparison to the total amount of text) as intellectual property transacted in a market for text mining licences or other non-expressive use licences. To realize the full potential, research text mining uses should be exempted from copyright; contracts and anti-circumvention laws should not overrule such an exemption; and technical access to literary works should be a sufficient condition to exercise the text mining exemption. The potential to increase productivity, economic output and social benefits by exempting text mining uses in this way is realized to a lesser extent when only non-commercial research text mining is exempted compared to exempting all research text mining uses.
Text mining technology thereby plays a crucial role in increasingly leveraging positive information externalities and taps the potential of the "costless" part of endogenous growth.

The findings of this article, presented mainly for national level public welfare analysis, can be applied to EU level: A mandatory copyright limitation exempting all research/scientific-purpose text mining uses, including uses with commercial purpose, from copyright should be implemented in EU copyright law as a means of implementing such rules in EU member states' copyright laws. It should be combined with the requirements that anti-circumvention law and contract law must not prevent executing such a limitation and that technical access to literary works is a sufficient condition to exercise the limitation. This is the best way to maximise the public welfare of the EU.