The contribution of interregional and inter-field knowledge spillovers to regional Smart Specialisation

ABSTRACT Industrial policy based on Smart Specialisation emphasizes the exploitation of industrial linkages based on technological rather than intermediate product linkages. This paper develops microlevel analysis using the Organisation for Economic Co-operation and Development’s (OECD) patent citation database on the intensity of technical relatedness depending on the cited and citing technological fields. The results are used to estimate the patterns of interregional knowledge spillovers in the European Union and explain spatial differences in patent growth. The approach to identifying spillovers provides an improved toolbox to guide implementation of Smart Specialisation policies.


INTRODUCTION
Traditional industrial policies develop firm clusters based mainly on intermediate product linkages. When applied to high-tech fields, where linkages in knowledge are more important than input-output linkages, traditional policies face challenges. The problem is that the products of high-tech companies are often intellectual rather than physical. Consequently, they do not generate flows of intermediate products. Cooperation in innovative activities often takes the form of knowledge and idea exchanges, which can occur across industries without physical input-output linkages. Simply agglomerating upstream and downstream companies following the conventional wisdom of traditional industrial policies can be ineffective for high-tech industries. Therefore, in the European Union (EU), Foray (2014) proposed a 'Smart Specialisation Strategy' (S3), an innovation-policy concept geared towards encouraging regional economies to build competitive advantage in new technological fields on the basis of local strengths. The S3 has become the central strategy of the EU Cohesion Policy, which funds the Smart Specialisation of EU member states and regions (Foray, 2018). The implementation problem for S3 is: How should regions determine which S3 domains to support? Studies of S3 emphasize knowledge relatedness. The S3 domains to be determined in each region are made up not of industries (categorized by major industry/product classifications: HS, ISIC and SITC) but of technology fields (categorized by major patent classifications: IPC, USPC and CPC) (D'Adda et al., 2020). In terms of technological upstream-downstream linkages, these S3 domains should be related to existing local advantages.
Because knowledge is not observable, there is no agreement on the measure of relatedness between tech classes. A prevailing approach is to identify co-occurrence. Examples include the probability of 'two IPC classes appearing on the same patent' (Balland et al., 2019; Kogler et al., 2017; Vlčková et al., 2018) and 'the probability that a region is specialized in a specific pair of IPC classes' (D'Adda et al., 2020; Santoalha, 2019). The shortcoming of a co-occurrence measure is the lack of a causal relation in which the upstream technology serves as one of the reasons/preconditions for the development of the downstream technology. Measures that consider the frequency and direction of knowledge exchanges can provide more insight into the relations among technological fields. Knowledge exchanges take place in two forms: voluntary technology transfer and involuntary knowledge spillover. Unfortunately, technology transfer data, such as patent licensing, are sparse and often limited to activities of firms within a sector or region for a short period. Spillover measures often rely on patent citations. Alternative data sources for knowledge spillover include patent interferences (Ganguli et al., 2020) and intersectoral labour …
The answer to Q1 leads to a spatial knowledge spillover network in which spatial factors affect the intensity of inter-field relatedness. On that basis, the answer to Q2 is that the contribution of interregional and inter-field spillovers to local innovations (new patents) can be quantified. In the application to Smart Specialisation, the method in this paper provides measures of agglomeration effects of innovative activities and an index of revealed comparative technological advantage for EU regions. This represents a new quantitative analysis toolbox that has been lacking in policymaking.
The remainder of the paper is organized as follows. The next section reviews the literature. The third section introduces the data source. The fourth section explains the regression models and methods for generating the key results, which are presented in the fifth section. The final section concludes the paper.

THE KNOWLEDGE SPILLOVER LITERATURE
The lack of consideration for both interregional and inter-field knowledge spillovers is the central drawback in Smart Specialisation studies. The vast literature on knowledge spillovers offers many relevant studies, but most have only a limited spatial component.
The seminal paper of Jaffe et al. (1993) finds geographical localization in knowledge spillovers using US Patent and Trademark Office (USPTO) data: cited and citing patents are more likely to originate from the same geographical location. The authors reach this conclusion by comparing every realized citation with a control sample that represents the cited patent's probability of co-location with a random patent from the same technology class and year as the citing patent. Thompson and Fox-Kean (2005) propose an improvement in the selection of controls for the existing geographical distribution of innovative activities and question whether evidence of intra-national localization exists. Griffith et al. (2011) find that localization at the country level has been falling, likely as a result of a reduction in communication and travel costs. In their study of biotechnology patents from the USPTO, Johnson and Lybecker (2012, p. 21) claim that 'distance is becoming less important for spillovers with time'. Although there may be a weakening spatial effect for certain regions or technologies, the majority of the literature finds strong distance effects or home effects (at the country, state or metropolitan level) at both the USPTO and the European Patent Office (EPO), after controlling for the initial spatial distribution of innovation activities (Maurseth & Verspagen, 2002; Thompson, 2006; Bacchiocchi & Montobbio, 2010; Singh & Marx, 2013; van den Berge et al., 2017; Ganguli et al., 2020; Kwon et al., 2022; Abramo et al., 2020; Buzard et al., 2020). This is surprising considering that patent documents are globally accessible to inventors. As per Krugman (1991, p. 53), knowledge spillovers 'leave no paper trail by which they may be measured or tracked'. This suggests that patent citations should be regarded as imperfect paper records, rather than as channels, of the dissemination of knowledge.
A likely reason is that a large proportion of knowledge is still diffusing in the form of tacit knowledge, which is not easily gleaned from patent documents. Distance is still an obstacle to this type of transmission. The channels unobserved in paper documents may consist of in-person communications and other interactions between inventors that trigger knowledge exchange and subsequent patent citations. Even when no 'actual' relation exists in a citation and there is no direct communication between inventors, there can be indirect linkages, maybe through a third party. In this interregional study, such indirect linkages, reflected in the citation probabilities between regions, can still serve as an indirect measure of interregional relatedness and knowledge spillovers.
The study of heterogeneity in the localization or distance decay effects across technologies is very limited, though it seems intuitive to assume that the effect of distance as an obstacle to technological diffusion depends on the features of the knowledge. Malerba et al. (2013) briefly mention that national/international and intra-/intersectoral spillovers vary across industries. A few others also acknowledge the heterogeneity in localization/distance effects across industries (Duranton & Overman, 2005; Ellison & Glaeser, 1997; Kwon et al., 2022) and (cited) patent classes (Murata et al., 2014). However, heterogeneity in distance decay across cited-citing class combinations has yet to be investigated. This paper tests the hypothesis that spatial knowledge spillover is more than a passive diffusion process determined by the features of the cited class alone. How the citing class matches with the cited class also affects the coefficient of distance decay.
When distance is omitted from the analysis, inter-field spillovers are assumed to occur in an aspatial knowledge network. For instance, Verspagen and de Loo (1999) estimate an intersectoral technology flow matrix in which distance is not considered a factor affecting the technological flows between industrial sectors. They also find that the technology flow network differs distinctly from the production input-output network. Acemoglu et al. (2016) estimate an inter-field technology flow matrix in which the USPTO patent classes are used as technology fields. Their approach is consistent with the S3 in its concern for relatedness among technologies.
This paper tests the hypothesis that distance matters by adding it as the third dimension to a two-dimensional aspatial technology flow matrix. The hypothesis can be formulated as follows: the technology flow matrix varies over space, depending on the distance between the knowledge sender and the knowledge receiver. Figure 1 illustrates the spatial innovation matrix. An interclass innovation matrix, such as the one estimated by Acemoglu et al. (2016), has constant coefficients. This paper develops the spatial innovation matrix by arguing that the coefficients fall with the distance (D) between inventors. Moreover, the coefficient of distance decay, captured by the function f, varies with the cited and citing classes. Empirically, citation-level regressions similar to Singh and Marx (2013) are used to estimate the heterogeneous coefficients of distance decay, which affect the intensity of spillovers from one technological class to another. Such heterogeneity has not been studied in the literature. With the estimation results from the regressions, the spatial innovation matrix M(D) can be used to calculate regional knowledge spillover conditional on the distance between regions, as illustrated by the example in Figure 2. Suppose there are four regions in the world: 'Triangle', 'Square', 'Parallelogram' and 'Rectangle'. The distances to Triangle from the other regions are, respectively, D1, D2 and D3, which determine M(D1), M(D2) and M(D3). Intra-Triangle distance is D0, which determines M(D0). Given the initial number of patents by tech class (1, 2, … ) in each region, the products of each region's vector of patent numbers and the corresponding innovation matrix can be summed to obtain the vector of knowledge spillovers to Triangle. Similarly, the spillovers to all regions can be obtained. The measure of knowledge spillover, combined with local initial conditions, is then used to generate an index of revealed comparative technological advantage for S3 policy analysis.
This paper quantifies interregional knowledge spillovers and incorporates them in the calculation of regional innovation potentials, which is the major contribution to the S3 literature. The fourth section discusses the method formally.
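The four-region example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the functional form of M(D) used here (base coefficients scaled by a power of distance) and the names `spatial_matrix`, `spillovers_to`, `base` and `decay` are hypothetical stand-ins, not the specification estimated in the paper.

```python
import numpy as np

def spatial_matrix(base, decay, distance_km):
    """Hypothetical M(D): each base coefficient (cited class in rows,
    citing class in columns) is scaled by (1 + D) ** decay, so a negative
    decay value makes the coefficient fall with distance.
    ASSUMPTION: this functional form is illustrative only."""
    return base * np.exp(decay * np.log1p(distance_km))

def spillovers_to(target, patents, distances, base, decay):
    """Sum, over all source regions, the product of the source's patent
    vector (by cited class) and M(D) for the source-target distance."""
    total = np.zeros(base.shape[1])
    for source, counts in patents.items():
        m = spatial_matrix(base, decay, distances[(source, target)])
        total += np.asarray(counts, dtype=float) @ m
    return total

# Toy inputs (made up): two tech classes, two regions.
base = np.eye(2)                      # only intra-class flows
decay = -np.ones((2, 2))              # same decay for every class pair
patents = {"Triangle": [1, 0], "Square": [0, 2]}
distances = {("Triangle", "Triangle"): 0.0, ("Square", "Triangle"): 1.0}
print(spillovers_to("Triangle", patents, distances, base, decay))
```

Letting `decay` be a full matrix rather than a scalar is what allows distance decay to differ across cited-citing class pairs.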

DATA
In studies of innovation, a patent citation is often regarded as a knowledge spillover from the cited patent to the citing patent. Therefore, mining the accessible and abundant patent citation data is one of the few ways of estimating the intangible knowledge network.
Like many European studies, this paper relies on the OECD's Citations database, July 2020, an extraction of the EPO's Worldwide Statistical Patent Database (PATSTAT, spring 2020). It contains EPO patents with priority years from 1977 to 2019. Each observation in the data contains information about a pair of cited-citing patents, with the locations of patent inventors indicated using NUTS (Nomenclature of Territorial Units for Statistics) codes merged from the OECD's REGPAT Database. The geographical information about administrative boundaries is provided by EuroGeographics. After controlling for the data limitations described below, the cohort of cited patents filed from 2004 to 2008 is chosen for the estimation of the knowledge spillover network. The citing patents associated with the chosen cited patents are limited to a 10-year window after the cited priority year. There are 409,340 pairs of citations in this cohort after excluding self-citations, non-EPO cited patents and citation pairs that are missing key information.
To control for the initial distribution of innovative activities, some unrealized citations are drawn from the sample using the weighted exogenous sampling maximum likelihood (WESML) approach from Singh and Marx (2013). See the fourth section and Appendix A in the supplemental data online for further explanations of WESML.

Data processing
The EPO data need to be processed before being used for analysis in this paper. In their study of USPTO patent data, Kuhn et al. (2020) demonstrate that some implicit assumptions about the nature of patent citations can lead to biased results. This paper, based on EPO data, has similar problems that need to be addressed.

The selection problem
EPO patents are a selected sample. First, the cost of submitting an international application to the EPO acts as a potential filter for patents of low quality. An inventor can choose to file an application in a national patent office and/or at the EPO. Low-quality patents are more likely to be filed domestically. Because one concern with respect to using patent data is that low-quality patents do not represent innovative activities, the selection of higher quality patents at the EPO mitigates this problem. Second, cited patents are selected. Although cited patents can be found in global patent searches in the OECD's Citations database, the locations and IPC codes for non-EPO cited patents are either of low precision or missing, which limits the sample to patent citation pairs filed at the EPO and located in the EPO member countries. This selection problem is less of a concern for studies of knowledge spillovers within the EU.

Ambiguity in IPC classes and locations
In EPO data, a patent's main IPC class (the first three digits of the IPC code) and first inventor are not given. Multiple IPC codes are provided by the patent examiners, and multiple inventors of a patent are listed in the case of a co-invention. In this paper, the multiple classes given to a patent are ranked by frequency. The most frequent IPC class is assigned to the patent. See Appendix A in the supplemental data online for details.
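The frequency ranking described above can be illustrated with a short Python sketch. The function name `main_ipc_class` and the tie-breaking rule (the class that appears first wins) are assumptions made for this example; the paper's exact procedure is given in its Appendix A.

```python
from collections import Counter

def main_ipc_class(ipc_codes):
    """Return the most frequent three-character IPC class among the codes
    listed on a patent, or None if no code is given.
    ASSUMPTION: ties are broken by order of first appearance."""
    classes = [code.strip()[:3].upper() for code in ipc_codes if code.strip()]
    if not classes:
        return None
    counts = Counter(classes)
    top = max(counts.values())
    for ipc_class in classes:  # preserves the listing order for ties
        if counts[ipc_class] == top:
            return ipc_class

# Two codes fall in class C12 and one in A61, so C12 is assigned.
print(main_ipc_class(["C12N 15/09", "C12Q 1/68", "A61K 38/00"]))
```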
If a patent is a co-invention, it is divided equally into sub-patents by the number of inventors. Each sub-patent corresponds to one inventor and is given a corresponding NUTS-3 location recorded in the OECD's REGPAT Database. The internal centroid of the NUTS-3 region is used to approximate the actual location of an inventor. Such an approximation results in inaccurate estimation of the distance between inventors. However, as EU regional development policies are implemented at the NUTS-2 level (a coarser geographical distinction), the distance measure based on the available geographical information at NUTS-3 level in the EPO patent documentation has an acceptable level of accuracy for studies of S3 domains (see Appendix A online for details). Intra-urban (Arauzo-Carod, 2021) or hyperlocal (Donegan & Lowe, 2020) studies can provide more insight about the efficient distribution of innovative activities within regions, but this paper has to leave it for future studies due to the limitation in geographical information.
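A minimal sketch of the splitting and distance steps follows. The paper computes distances in ArcGIS; the haversine great-circle formula below is a swapped-in approximation, and the function names, region labels and the 6371 km Earth radius are assumptions of this example.

```python
import math

def sub_patents(patent_id, inventor_regions):
    """Split a patent equally among its inventors: one (patent, region,
    share) record per inventor, with shares summing to 1."""
    share = 1.0 / len(inventor_regions)
    return [(patent_id, region, share) for region in inventor_regions]

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two centroids given in decimal
    degrees. ASSUMPTION: a spherical Earth, not the paper's GIS method."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(min(1.0, a)))

# A co-invention with two inventors in two (placeholder) NUTS-3 regions.
print(sub_patents("EP-example", ["region_A", "region_B"]))
```

Each sub-patent would then be paired with the centroid of its NUTS-3 region before distances between cited and citing inventors are computed.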

Examiner- versus applicant-added citations
The USPTO requires applicants to disclose all relevant prior art in their applications. There is no similar 'duty of disclosure' at the EPO (Bacchiocchi & Montobbio, 2010). When dealing with USPTO data, applicant-added citations are usually viewed as better proxies for knowledge flow among inventors. In EPO data, applicant-added citations account for only 20% of total citations. The absence of a 'duty of disclosure' discourages applicants from submitting references. In the EPO system, applicant-added citations consist of lists of prior art that are at best incomplete. Submitted references at the EPO are more likely to support the importance and contribution of the citing patent, while the undisclosed references may be nearly identical works. On the other hand, some argue that examiner-added citations are more relevant than applicant-added citations (Alcácer et al., 2009; Hegde & Sampat, 2009; Kuhn et al., 2020). Therefore, this paper treats applicant- and examiner-added citations equally.

METHOD
Citation-level regression
Q1. Do knowledge spillovers across different cited-citing IPC class combinations have heterogeneous coefficients of distance decay? To answer the question, this paper uses citation-level data to estimate the effect of distance on the probability of a citation between two patents occurring (equation 1). See Singh and Marx (2013) for the initial use of this regression model.
In equation (1), observations are indexed by $i \in C_m$, where $C_m$ is the set of backward citations of field m, including citations that were realized (cases, $Citation_{mi} = 1$) and potential citations that did not happen (controls, $Citation_{mi} = 0$). $D_{mi}$ stands for the logarithm of the distance (km) between the cited and citing inventors, calculated in ArcGIS based on the internal centroids of NUTS-3 regions. The dissimilarity between two IPC classes is based on differences in their average propensity to cite and spill over to different IPC classes. By including the interaction term 'dissimilarity × $D_{mi}$', the regression model differs from existing studies in that different cited-citing class combinations can have different coefficients of distance decay. Control variables in vector X include the home effects at the country and NUTS-2 level, cited and citing NUTS-2 fixed effects, dissimilarity, adjacency (between NUTS-2 regions), citation lag years (based on priority years), and proximity in language and culture. Because potential citations that did not happen cannot be observed, a choice-based sampling method (WESML) is used to generate a control group of unrealized citing patents for each citation, 'weighting each observation by the reciprocal of the ex-ante probability of its inclusion in the sample' (Singh & Marx, 2013, p. 2061). The sampling weights given by WESML ($w_{WESML}$) are multiplied by two additional weights. First, because of co-invention, there can be multiple inventors (and associated locations) for a cited/citing patent, complicating the calculation of distance. To solve this problem, a citation is divided into 'sub-citations' of cited-citing inventor pairs, each given a unique distance between the pair and a corresponding weight (see Appendix A in the supplemental data online).
Second, the contribution of a cited patent in a citation pair is smaller if the citing patent cites a greater number of prior works. Relevant studies in the literature generally treat all citations equally. This paper instead assigns a weight to each case-control group to account for the relative importance of the citation in its contribution to a new patent. Consequently, the final weight is the product of $w_{WESML}$ and these two additional weights. After sampling and weighting, the data are fitted to a complementary log-log model, which fits rare events better than logit models. Equation (1) is estimated for each IPC class m, treated as the cited field.
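For orientation, a complementary log-log model with the regressors described above could be written as follows. This is a sketch assembled from the verbal definitions in this subsection, not a quotation of equation (1): the coefficient labels $b^0_m$ and $b^1_m$ follow the way the results section refers to them, while the control coefficient vector $\gamma_m$ is a name assumed here.

```latex
% Sketch only: standard complementary log-log link applied to the
% regressors named in the text; \gamma_m is an assumed symbol.
\Pr(Citation_{mi} = 1)
  = 1 - \exp\!\Big\{ -\exp\!\big( b^0_m D_{mi}
      + b^1_m \,(dissimilarity_{mi} \times D_{mi})
      + X_{mi}\gamma_m \big) \Big\},
  \qquad i \in C_m
```

Under this form, the weighted likelihood would use the final weight described above for each sub-citation, and a separate set of coefficients is obtained for each cited class m.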

The dissimilarity measure
To construct a dissimilarity measure, this paper first estimates the aspatial technology flow matrices based on the citation relations. Following Verspagen and de Loo (1999) and Acemoglu et al. (2016), a technological linkage can be found between the cited patent/class and the citing patent/class. The technology fields are defined by the three-digit IPC class codes. Classifying technologies to the three-digit level is common practice (Balland et al., 2019; D'Adda et al., 2020). Finer classification (e.g., to the four-digit IPC subclass level) may improve the precision in identifying technologies, but could also lead to selection bias, as matching a citing patent with a control patent in the same technology becomes more problematic due to the limitation of sample size. Cited fields are entered as row names of the matrix, and citing fields are entered as column names. $k_{mn}$ measures the average contribution of an EPO patent in field m to a new EPO patent in field n (equation 2).
It is computed over $C_{mn}$, the set of forward citations that field m received from field n, with citations indexed by i. $\theta_i$ is the reciprocal of the citing patent's total number of citations, so that each citation is weighted by the average contribution of a cited patent to a new citing patent. Similarly, $a_{mn}$ measures the average number of class m patents required for a new patent in class n (equation 3).
Here each citation is weighted by $\eta_i$, the reciprocal of the number of citations received by the cited patent. In equation (3), if a patent cites a widely cited patent, its average 'consumption' of the knowledge presented in the other patent is given a smaller weight. The upstream-downstream relations estimated in matrices K and A can be interpreted as the average aspatial spillover effects found in EPO data, as they have not considered the effect of distance. Let $A'$ be the transpose of matrix A. Horizontally join (column join) matrix $A'$ and matrix K to get a 123 × 246 matrix $A'K = (A', K)$. Each row (representing an IPC class) of $A'K$ contains information about the average 'knowledge input' required from other classes (the first 123 elements in the row) and the average 'knowledge spillover' to other classes (the last 123 elements in the row). Using squared Euclidean distance, the dissimilarity between row $j'$ and row $j$ can be expressed as follows:
$$dissimilarity_{j'j} = \sum_{c=1}^{123} \left(a'_{j'c} - a'_{jc}\right)^2 + \sum_{c=1}^{123} \left(k_{j'c} - k_{jc}\right)^2 \qquad (4)$$
where $a'_{j'c}$ ($a'_{jc}$) is the element in row $j'$ ($j$) and column c of submatrix $A'$, and $k_{j'c}$ ($k_{jc}$) is the element in row $j'$ ($j$) and column c of submatrix K. The resulting dissimilarity matrix is 123 × 123 (see Appendix A in the supplemental data online).
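The construction of the dissimilarity matrix from A and K can be sketched with NumPy. The function name `dissimilarity_matrix` and the 2 × 2 toy matrices are made up for illustration; the paper's matrices are 123 × 123.

```python
import numpy as np

def dissimilarity_matrix(A, K):
    """Join the transpose of A with K column-wise, then return the squared
    Euclidean distance between every pair of rows (one row per IPC class)."""
    AK = np.hstack([np.asarray(A, dtype=float).T, np.asarray(K, dtype=float)])
    diff = AK[:, None, :] - AK[None, :, :]   # shape: (classes, classes, 2 * classes)
    return (diff ** 2).sum(axis=2)

# Toy example with two classes (values are invented).
A = [[1.0, 0.0], [0.0, 1.0]]
K = [[0.0, 2.0], [2.0, 0.0]]
print(dissimilarity_matrix(A, K))
```

The result is symmetric with zeros on the diagonal, which matches the statement that dissimilarity equals zero for intra-class spillover.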

The contribution of knowledge spillover to patent output
Q2. What is the quantitative contribution of spatial knowledge spillovers to regional innovation? The probabilities of citations between tech fields estimated by the citation-level regression above help to calculate the knowledge spillovers received by regions. Assume that region $r'$ needs to predict its patent output in IPC class $m'$ in year $y'$ ($y' > 2008$). Equation (1) gives a weighted average probability, $Prob^{r',m',y'}_{r,m,y'-lag}$, that a new patent in class $m'$, region $r'$ and year $y'$ cites an existing patent in class m and region r with a priority year $y' - lag$ ($lag \in \{1, 2, 3, \dots, 10\}$). The expected number of patent citations per patent in class $m'$, region $r'$ and year $y'$ is a summation over three dimensions:
$$\sum_{r}\sum_{m}\sum_{lag=1}^{10} Prob^{r',m',y'}_{r,m,y'-lag}\, N_{r,m,y'-lag} \qquad (5)$$
where $N_{r,m,y'-lag}$ is the number of patents in class m and region r with a priority year $y' - lag$. The aggregate spillovers received, or the total number of expected patent citations, in class $m'$, region $r'$ and year $y'$, is given by:
$$PC_{r',m',y'} = N_{r',m',y'} \sum_{r}\sum_{m}\sum_{lag=1}^{10} Prob^{r',m',y'}_{r,m,y'-lag}\, N_{r,m,y'-lag} \qquad (6)$$
However, the patent output in year $y'$, $N_{r',m',y'}$, is the variable to be predicted and is therefore unknown. In equation (6), $N_{r',m',y'}$ has to be approximated by patent output in the last period ($N_{r',m',y'-1}$). A panel regression model (equation 7) can be used to estimate the extent to which the spillover effect contributes to new patents, following Acemoglu et al. (2016).
The log-log specification of the regression model reflects the expectation that citations ($PC_{r',m',y'}$) interact with the number of patents ($N_{r',m',y'-1}$) in a multiplicative way. The lagged value also serves as an inclusive indicator to control for everything other than the spillover effect. Note that $PC_{r',m',y'}$ is the expected number of new citations rather than the number of new patents. However, it is used as an indicator of patent growth due to spillover. The λ's stand for region, IPC class and year fixed effects. Because annual changes in patent applications can be sensitive to random shocks, a cumulative, cross-sectional approach similar to Acemoglu et al. (2016) is also estimated (equation 8). Because the PC's are estimated based on historical data, they can be viewed as exogenous, and the coefficients $b_1$ and $d_1$ are unbiased and consistent. This paper conservatively refrains from adding more regional-level controls to equations (7) and (8), which may be subject to the 'bad controls' problem (Cinelli et al., 2020) and bias the estimates of the coefficients of primary interest. Other research interested in local factors affecting innovative activities, such as population, gross domestic product (GDP) per capita and infrastructure, can add these controls with caution. For this study focusing on knowledge spillover, the effects of the 'omitted' regional controls are implicitly included in the fixed effects and the control for the initial number of patents.
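The triple summation behind the expected citations, and the approximation of current output by last period's output, can be sketched as below. The array layout (cited region × cited class × lag) and the function names are assumptions of this example, and the numbers are invented.

```python
import numpy as np

def expected_citations_per_patent(prob, n_prior):
    """Sum citation probabilities times prior patent counts over cited
    region, cited class and lag. Both arrays have shape (regions, classes, lags)."""
    return float((np.asarray(prob) * np.asarray(n_prior)).sum())

def aggregate_spillover(prob, n_prior, n_last_period):
    """Total expected citations for one target (region, class, year), using
    last period's patent output as a stand-in for the unknown current output."""
    return n_last_period * expected_citations_per_patent(prob, n_prior)

# Toy inputs: 2 cited regions, 3 cited classes, 10 lag years.
prob = np.full((2, 3, 10), 0.001)     # invented citation probabilities
n_prior = np.full((2, 3, 10), 5.0)    # invented prior patent counts
print(aggregate_spillover(prob, n_prior, n_last_period=20.0))
```

The resulting value would enter the patent-output regressions in logs, alongside the lagged patent count and the fixed effects.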

Coefficients of distance decay
This section examines the heterogeneity in the coefficients of distance decay for cited-citing IPC class pairs. A separate regression is conducted for each cited IPC class. Of the 123 regressions, 97 converge in the iteration process. First, $b^0_m$ in equation (1) is estimated. The results are shown in Figure 3. By construction of the dissimilarity measure, dissimilarity = 0 for intra-class spillover, in which case $b^0_m$ represents the coefficient of intra-class distance decay of knowledge spillover. A total of 71 of the 97 coefficients are significantly negative at the 95% confidence level. The other classes have insignificant coefficients. The average marginal effects (MEs) of $D_{mi}$, given that cited and citing inventors are in the same NUTS-2 region, have a median of -0.0000269. Compared with the average probabilities shown in Figure B1 in Appendix B in the supplemental data online, the magnitudes of the marginal effects are large. Activities of a technology class with a strongly negative ME should agglomerate over space to benefit from knowledge spillovers. The high-tech classes topping the list include B81 (Microstructural Technology), B01 (Physical or Chemical Processes or Apparatus in General), F16 (Engineering Elements or Units … ), and C12 (Biochemistry … ). For the full list of MEs, see Appendix A online.
Second, interacting $D_{mi}$ with the dissimilarity measure allows the coefficient of distance decay to vary with combinations of cited-citing IPC classes, relative to intra-class distance decay. Figure 4 depicts the $b^1_m$'s, the coefficients of inter-class distance decay. For 28 out of 97 classes, there are significantly sharper declines in the probabilities of citations with greater dissimilarities between the cited and citing IPC classes. In very rare cases, dissimilarity mitigates the distance decay in knowledge spillover. For the other classes, dissimilarity does not seem to significantly affect distance decay, in which case the inter- and intra-class coefficients of distance decay are not significantly different. The results are partly consistent with the general conclusion of Malerba et al. (2013, p. 718) that 'intrasectoral knowledge flows are much less affected by distance than the intersectoral ones'.
Figure 3. Coefficients of intra-class distance decay.
In summary, (intra- and inter-class) distance decay still prevails in knowledge spillover. According to the specification in equation (1), the coefficient of distance decay is:
$$\beta = b^0_m + b^1_m \times dissimilarity$$
Given the regression results and the pairwise dissimilarity measure, the coefficient of distance decay can be estimated for 11,931 cited-citing pairs of IPC classes (see Appendix A in the supplemental data online). Of these, 9791 show significant distance decay (β < 0 and |t| > 1.96), and only five show a significantly increasing probability of citations with distance.

Adjacency, home effects and lag effect
The effect of the cited and citing inventors being in adjacent NUTS-2 regions is depicted in Figure 5. A total of 71 of the 97 classes show no significant adjacency effect. Only six classes show significantly positive adjacency effects, and 20 classes show significantly negative adjacency effects. These results suggest that, after controlling for distance, adjacency at the NUTS-2 level generally does not affect knowledge spillover. If more research can be done at a finer geographical unit, such as NUTS-3, adjacency is expected to positively affect spillover.
Results in Figure 6 show that 80 out of the 97 classes have positive home NUTS-2 effects, of which 39 are significant. In addition, as shown in Figure 7, 40 out of the 97 classes show significant positive home country effects. Intuitively, the lag effect over time is generally negative, which is supported by the results in Figure 8.

An application to interregional relatedness
This study differs from the current literature in estimating the coefficient of distance decay for cited-citing pairs of IPC classes, which is of theoretical and practical relevance. In regional policymaking and firm location decisions, a region/firm must determine whether proximity to an innovative neighbour can be of benefit and what technology domains can benefit from such an advantage. IPC class pairs with sharper distance decay should be more concentrated. The dissimilarity between two fields matters, which means that a region can more easily learn from another with a similar or related knowledge base, and be less constrained by distance. Regional S3's are particularly interested in interregional relatedness, which can be proxied by the intensity of knowledge spillovers. Following the method discussed in the fourth section, the citation probabilities can be used to estimate knowledge spillover flows by cited-citing region-field, which is illustrated with the example of Latvia, a small economy with a less complicated knowledge network structure. To determine the S3 of Latvia (NUTS-2 code LV00), policymakers can ask three questions:
• What are the regions most linked to local innovation?
• What are the local tech fields that can benefit the most from interregional knowledge spillover?
• Specifically, how can Latvia cooperate interregionally with its major partners?
These can be answered based on the knowledge spillovers estimated in equations (5) and (6). Table 1 lists the estimated aggregate knowledge spillovers potentially received by Latvia by tech field. C12 (biochemistry; beer; spirits; wine; vinegar; microbiology; enzymology; mutation or genetic engineering) appears to have the greatest potential to benefit from interregional linkages, followed by A61 (medical or veterinary science; hygiene). Similar to Figure 9, detailed maps with knowledge spillovers disaggregated by tech field can also be generated to illustrate the spatial distribution of the knowledge sources of each IPC class. In the case of IPC class C12 of Latvia, the result indicates that Oberbayern (DE21, 19.78) is the major source. Table 2 shows the potential linkages between Oberbayern and Latvia. Note that C12 is the only active field between Latvia and Oberbayern.
Figure 9. Estimated knowledge spillovers to Latvia by NUTS-2 region. Source: Calculation based on the Organisation for Economic Co-operation and Development's (OECD) Citations database, July 2020.
The estimates above are absolute measures of the intensity of interregional knowledge spillovers for tech fields in a region. The revealed comparative advantages in the next subsection are better suited to determining the relative advantage of a region.
Regional patent output with knowledge spillover

As discussed in the fourth section, the regression results of equations (7) and (8) are shown in Tables 3 and 4, respectively.
In models (1)-(3) of Table 3, a statistically significant relation is found between $PC_{r',m',y'}$ (the expected annual level of total citations by IPC class in a region due to spillover) and $P_{r',m',y'}$ (the actual patent output). Models (2) and (3) control for the initial level of patents and fixed effects for NUTS-2, IPC class and year. Because variation in patent growth rates tends to be larger for smaller $P_{r',m',99-08}$, $P_{r',m',99-08}$ is used as the sample weight in models (1) and (2) to address heteroskedasticity. For instance, a region may initially have one patent in a particular class but three in the next decade. The dramatic growth of this region involves more randomness than the growth of a region that started out with 100 patents in this class. This weighting strategy is consistent with Acemoglu et al. (2016). Unweighted results are shown in model (3). The results suggest a 1.56-4.42% increase in patenting for every 10% increase in the expected number of citations, which is consistent with USPTO class-level estimations: Acemoglu et al. (2016, p. 11486) observe 'a 3-4% increase in patenting for every 10% increase in expected patenting'. Considering that this study is conducted at a finer geographical level, such consistency with prior research is more than acceptable. Knowledge spillovers explain 43.8% of the variation in actual output, as indicated by the $R^2$ for model (1), and 42.2-68.2% of the variation is explained by the combination of spillovers, the initial level of patents, and fixed effects, as indicated by the $R^2$ for models (2) and (3). Due to the weighting strategy, results in model (2) should explain innovation-active regions better.

Table 4 presents an alternative cross-sectional and cumulative approach based on equation (8). Variation in spillover alone accounts for about 47.4% of the aggregate variation in actual regional patent output by IPC class, based on the $R^2$ for model (1).
Meanwhile, variations in initial patent stock account for 78.8% of the aggregate variation based on the $R^2$ for model (2). Overall, model (3) fits the data well in terms of $R^2$. Results support the dominant role of the local initial condition: a 10% increase in initial patent stock increases the number of new patents in the next decade by 5.14%.
Regional policies that only consider initial local strengths in target technology domains neglect the synergies in interclass and interregional knowledge spillovers, which are captured by $PC_{r',m',y'}$. According to model (3), every 10% increase in $PC_{r',m',09-18}$ contributes to a 2.41% rise in the patent growth rates, a significant factor when compared with the 5.14% marginal effect of initial patent stocks. This model explains variations in patent output quite well, with an $R^2$ of 89.6%, an improvement of more than 10 percentage points over the 78.8% in model (2). All indications are that including knowledge spillover in studies of regional production of innovations is important. Note that the effect of spillover may be understated because: (1) citations that happen after 10 years are not included in the calculation of spillovers; (2) new patents can lead to a second generation of patents; and (3) the spillovers from patents filed more than 10 years ago are absorbed into the effect of the initial level of patents.
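The weighting strategy and the reading of the coefficients as percentage effects can be illustrated with a minimal sketch. It assumes a log-log specification without fixed effects, which may differ from equations (7) and (8); the function names, the synthetic data and the coefficient values in `TRUE_BETA` are hypothetical and are not estimates from this paper.

```python
import numpy as np


def weighted_loglog_fit(patents, spillover, initial, weights=None):
    """Weighted least squares for the assumed specification
    log(patents) = b0 + b1*log(spillover) + b2*log(initial) + error.
    Fixed effects are omitted to keep the sketch short."""
    y = np.log(patents)
    X = np.column_stack([np.ones_like(y), np.log(spillover), np.log(initial)])
    w = np.ones_like(y) if weights is None else np.asarray(weights, dtype=float)
    sw = np.sqrt(w)  # scaling rows by sqrt(w) turns WLS into ordinary least squares
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta


def pct_effect(elasticity, pct_increase=10.0):
    """Exact percentage change in output implied by an elasticity."""
    return ((1.0 + pct_increase / 100.0) ** elasticity - 1.0) * 100.0


# Synthetic region-class observations (hypothetical, for illustration only).
rng = np.random.default_rng(0)
n = 5000
initial = rng.integers(1, 200, size=n).astype(float)   # initial patent stock
spillover = rng.uniform(1.0, 500.0, size=n)            # expected citations
TRUE_BETA = np.array([0.5, 0.24, 0.51])                # hypothetical coefficients
# Noise is larger where the initial stock is small, mimicking the
# heteroskedasticity that motivates weighting by the initial stock.
noise = rng.normal(0.0, 1.0, size=n) / np.sqrt(initial)
patents = np.exp(TRUE_BETA[0]
                 + TRUE_BETA[1] * np.log(spillover)
                 + TRUE_BETA[2] * np.log(initial)
                 + noise)

beta_w = weighted_loglog_fit(patents, spillover, initial, weights=initial)
beta_u = weighted_loglog_fit(patents, spillover, initial)
print("weighted:  ", beta_w.round(3))
print("unweighted:", beta_u.round(3))
print("exact effect of a 10% rise at elasticity 0.241:", round(pct_effect(0.241), 2), "%")
```

Under this assumed specification, the exact effect of a 10% increase is slightly smaller than ten times the elasticity, because the latter is a linear approximation.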
The composition of $PC_{r',m',y'}$ summarized from the data is shown in Figure 10. Intra-class and intra-regional spillovers account for only 9.79% of expected new citations. Interregional spillovers constitute over 85% of expected new citations, which highlights the importance of interregional linkages. Results in model (4) indicate that intra- and inter-class citations are not significantly different in their marginal contributions to new patents (0.0457-0.0534). However, interregional citations have significantly greater marginal contributions (0.105-0.129) to new patents than intra-regional citations. On average, intra-class and intra-regional citations are certainly more likely to occur. But conditional on the occurrence of a citation, an inter-class or interregional citation is more conducive to innovation.

Predictions about new patent output next year can be made based on the model in this paper. Assume that in year $y'$, region $r'$ is considering a high-tech development project. First, given the estimation of equation (1), the local authority can obtain predicted probabilities of citing a patent from the interregional inter-class knowledge space. Similar to the example in Figure 2, given the number of patents by IPC class in each region from patent data and the citation probabilities, equation (5) calculates the measure of knowledge spillovers, $PC_{r',m',y'}$. Based on equation (7) and the results for model (2) in Table 3, the patent output of region $r'$ in class $m'$ in the next period, $N_{r',m',y'+1}$, can then be predicted. It is possible to obtain a prediction of $N_{r',m',y'+2}$ based on $N_{r',m',y'+1}$, and then similarly predict future patent output in the next decade, admittedly with diminishing reliability. The prediction of $N_{r',m',y'+1}$ can deviate from the realized value of $N_{r',m',y'+1}$ for various reasons. Regions and tech classes that underperformed in relatedness to others are likely to have fewer new patents than expected.
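The construction of the spillover measure and its composition can be sketched as follows. The sketch assumes that expected citations are the sum, over all cited region-class cells, of the number of patents multiplied by a citation probability; this is a simplified reading of equation (5), and the array layout, function names and random inputs are hypothetical.

```python
import numpy as np


def expected_citations(patents, cite_prob):
    """patents[r, m]: patents in cited region r and class m.
    cite_prob[r, m, s, n]: assumed expected citations that one patent in (r, m)
    receives from citing region s and class n.
    Returns an array indexed by the citing cell (s, n)."""
    return np.einsum("rm,rmsn->sn", patents, cite_prob)


def composition_shares(patents, cite_prob):
    """Split total expected citations by whether the cited and citing cells
    share the same region and the same class."""
    flows = patents[:, :, None, None] * cite_prob       # shape (R, M, R, M)
    n_regions, n_classes = patents.shape
    same_region = np.eye(n_regions, dtype=bool)[:, None, :, None]
    same_class = np.eye(n_classes, dtype=bool)[None, :, None, :]
    masks = {
        "intra-region, intra-class": same_region & same_class,
        "intra-region, inter-class": same_region & ~same_class,
        "inter-region, intra-class": ~same_region & same_class,
        "inter-region, inter-class": ~same_region & ~same_class,
    }
    total = flows.sum()
    return {name: float((flows * mask).sum() / total) for name, mask in masks.items()}


# Hypothetical inputs: 3 regions, 2 classes, random counts and probabilities.
rng = np.random.default_rng(1)
patents = rng.integers(1, 50, size=(3, 2)).astype(float)
cite_prob = rng.uniform(0.0, 0.05, size=(3, 2, 3, 2))

pc = expected_citations(patents, cite_prob)
shares = composition_shares(patents, cite_prob)
for name, share in shares.items():
    print(f"{name}: {share:.3f}")
```

The four shares partition the total, so they sum to one; with real citation probabilities they would correspond to the kind of breakdown reported in Figure 10.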
Although more research is needed to further improve the accuracy of the predictions, this paper proposes a standard method to translate the predictions about patent outputs into suggestions about Smart Specialisation. Region $r'$ can determine its prioritized technologies based on the expected patent output next year, which is a good indicator of the trending direction of a tech class. A standard Balassa index of revealed comparative advantage can be calculated to assist the selection of prioritized technologies. (2020), also rely on varieties of the RCA index to discover regional advantages. But the RCA in this paper incorporates the effects of the interfield and interregional spillovers and is therefore a more comprehensive evaluation of regional advantages in a spatial knowledge spillover network.

Table 4. Cross-sectional regression results.

For 2017, the RCAs based on the predicted patent outputs ($\widehat{RCA}$) are compared with the RCAs based on actual values of $N_{r',m',2017}$ ($RCA$). $\widehat{RCA}$ is consistent with $RCA$, in the sense that both are simultaneously greater than 1 or less than 1, for 6152 of the 10,883 valid region-class observations. For 3527 observations, $\widehat{RCA} > 1$ and realized $RCA < 1$, suggesting unrealized potential for innovation. It is likely that connections to other regions and classes are so underrated that the local inventors are not given their fair share of opportunities to cooperate with the rest of the world. For the remaining 1204 observations, $\widehat{RCA} < 1$ and realized $RCA > 1$; these clearly outperform the average.

The selection results based on technological RCAs need further evaluation by economists and technical experts. For instance, a region can be very active in a peripheral technological field such as A43 Footwear, and it may also have an advantage in B82 Nanotechnology. A technological RCA is more central for the development of Nanotechnology than for Footwear, a traditional industry in which the RCA in goods production matters more. But the advantage in Footwear innovative activities indicates that technicalization may be taking place in this traditional industry, for example, athletic shoes with air cushions. The regional Smart Specialisation project can support such innovations in A43 rather than investments in the traditional part of this industry. However, when there is a trade-off between the two fields due to limited resources, criteria other than RCA matter.
Balland and Rigby (2017) and Balland et al. (2019) suggest supporting more complex technologies. Montresor and Quatraro (2017) propose emphasizing key enabling technologies (KETs), which are essential for regional development. More research is required to calculate the measures of complexity and KETs, but Nanotechnology would most probably be preferred of these two options. In a prudent policy analysis, the advice of technical experts from related fields should also be included.
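The comparison between predicted and realized RCAs can be sketched as follows, using the standard Balassa definition: the share of a class in a region's patents divided by the share of that class in all patents. The sketch assumes that every region and every class has at least one patent, and it labels cells where an RCA equals exactly 1 as consistent; the function names and the small example matrices are hypothetical.

```python
import numpy as np


def balassa_rca(counts):
    """Balassa index for a (regions x classes) matrix of patent counts:
    RCA[r, m] = (N[r, m] / N[r, .]) / (N[., m] / N[., .])."""
    counts = np.asarray(counts, dtype=float)
    region_share = counts / counts.sum(axis=1, keepdims=True)
    overall_share = counts.sum(axis=0, keepdims=True) / counts.sum()
    return region_share / overall_share


def classify(predicted_counts, actual_counts):
    """Label each region-class cell by comparing predicted and realized RCA."""
    rca_hat = balassa_rca(predicted_counts)
    rca = balassa_rca(actual_counts)
    labels = np.full(rca.shape, "consistent", dtype=object)
    labels[(rca_hat > 1) & (rca < 1)] = "unrealized potential"
    labels[(rca_hat < 1) & (rca > 1)] = "outperforming"
    return labels


# Hypothetical counts for two regions (rows) and two classes (columns).
predicted = np.array([[10, 30], [30, 10]])
actual = np.array([[20, 20], [5, 35]])

print(balassa_rca(predicted))
print(balassa_rca(actual).round(3))
print(classify(predicted, actual))
```

Cells labelled as unrealized potential are those where the spillover-based prediction implies an advantage that the realized patent counts do not yet show.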
Another limitation possibly comes from the intrinsic time lag in citation analyses. The technological linkages are identified from citation data covering the last 10-20 years. The corresponding results may be weakened for some tech fields experiencing structural changes in technological linkages accompanied by an explosion of patents, for example, Industry 4.0 and green technologies. Follow-up research could benefit from reducing the citation window from 10 to five years in order to capture more recent changes. However, there is a trade-off in that a narrower window covers fewer citations, which reduces the number of observations. According to Verspagen and de Loo (1999), the average citation lag is 4.67 years and the standard deviation is 3.27 years. In other words, the loss of observations can be significant. This limitation of citation analysis is another reason for suggesting involvement of local technical experts in policy debates.
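The size of this trade-off can be gauged with a rough sketch. It assumes that citation lags follow a lognormal distribution matched to the mean and standard deviation reported above; the distributional shape is an assumption made only for illustration and is not taken from Verspagen and de Loo (1999), and the function names are hypothetical.

```python
from math import log, sqrt
from statistics import NormalDist

MEAN_LAG = 4.67  # average citation lag in years, as cited in the text
SD_LAG = 3.27    # standard deviation in years, as cited in the text


def lognormal_params(mean, sd):
    """Moment-match a lognormal distribution to a given mean and standard deviation."""
    sigma2 = log(1.0 + (sd / mean) ** 2)
    mu = log(mean) - sigma2 / 2.0
    return mu, sqrt(sigma2)


def share_within(window_years, mean=MEAN_LAG, sd=SD_LAG):
    """Share of citations with a lag no longer than the window,
    under the assumed lognormal shape."""
    mu, sigma = lognormal_params(mean, sd)
    return NormalDist(mu, sigma).cdf(log(window_years))


five_year = share_within(5)
ten_year = share_within(10)
retained = five_year / ten_year  # citations kept when moving from 10 to 5 years

print(f"share of citations within 5 years:  {five_year:.3f}")
print(f"share of citations within 10 years: {ten_year:.3f}")
print(f"retained when shortening the window: {retained:.3f}")
```

Other distributional assumptions would give different shares, so the output should be read only as an order of magnitude for the loss of observations.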

CONCLUSIONS AND IMPLICATIONS
This paper contributes to the knowledge spillover literature by acknowledging the heterogeneity in distance decay across cited-citing IPC class combinations. Because distance affects knowledge spillovers heterogeneously, this heterogeneity has important implications for the agglomeration of innovative activities. The paper also adds spatial factors to the estimation of the knowledge spillover network, which to this point has generally been estimated in an aspatial manner or spatially in a limited range of technologies. The estimation results are used to explain the effect of interregional knowledge spillovers on regional patent growth and to generate indicators for tech classes to be prioritized in S3 policy. The RCA in this paper is a more comprehensive evaluation of regional advantages. Unlike the national analysis by Acemoglu et al. (2016), regional analysis must consider not only the role of upstream technological development determined by technical relatedness, but also the distance to the agglomerations of technological advancements in other regions. For instance, if a region plans to branch into a new technology, it must consider not only the technical relatedness to its existing local strengths, but also the spillovers received from other regions after distance decay. Additionally, if there is dramatic distance decay in the spillovers to the technology to be prioritized, the region must also consider agglomerating upstream technological activities within the region. In this process, understanding the heterogeneous distance decay in inter-class knowledge spillovers based on results of citation-level regressions is important for the prediction of regional patent growth. Although local conditions have a dominant role in determining local advantages, the results of the paper show that interregional spillovers also make significant aggregate and marginal contributions to regional patent growth, which cannot be ignored in a complete evaluation of regional advantages.
Technologies that are expected to be advantaged (technological RCA > 1) in the future should be prioritized in regional Smart Specialisation.
The method in this paper represents a new toolbox for Smart Specialisation policy-making. But it is cautiously suggested that regions choose fields with more complexity and value added from the list of advantaged IPC classes.
There are some cases in which a region may even choose to prioritize a disadvantaged but essential technology. To support the prioritized fields, advantaged or disadvantaged, increasing the expected knowledge spillover is more practical than changing the initial stock of patents. There is little local authorities can do about the initial state of a region. Investments in R&D are common practice. But much more can be done to foster intra- and interregional knowledge spillover in order to increase the patent output in a class beyond the expected value, in ways that may even turn a marginally disadvantaged field into an advantage. For example, distance and time lag appear to have negative effects on spillover. To reduce the geographical distance to innovative inventors, policies can encourage immigration of high-tech inventors and the creation of local branch research centres of high-tech companies. From another perspective, if geographical distance cannot be easily changed, the economic distance can be reduced by lowering transportation costs. Because the calculation of the economic distance among EU regions is beyond the scope of this paper, more research is necessary to achieve a better understanding of the mechanism. Additionally, policies that promote intra- and interregional communication among researchers and inventors (e.g., conferences, regional associations, patent licensing, and cooperative research projects) will likely reduce the time lag and increase the intensity of knowledge spillover. In addition, the sources of the home NUTS-2 and country effects need to be investigated in more detail.