Distance from diasporas and immigrants’ location choice: evidence from Italy

ABSTRACT Diasporas play a fundamental role in explaining the location choice of new immigrants. We investigate the spatial dimension of diaspora externalities focusing on immigrants in Italian local labour market areas (LLMAs). We show that the net pull effect of diasporas spills over an estimated average distance of 82 km. We find evidence of negative spatial spillovers at greater geographical distances, suggesting a competition effect from neighbouring diasporas. Ethnic-specific labour markets and ethnic consumption externalities are important channels through which the distance–decay effects of diasporas take place. We also find that the spatial effects of diasporas are highly heterogeneous across gender and origin countries.


INTRODUCTION
Diasporas (the networks of immigrants from a specific origin country living in foreign localities) shape the geography of international migration flows by affecting the location choices of new waves of immigrants (Beine et al., 2011; see also, e.g., Winters et al., 2001; Bauer et al., 2002; Jayet et al., 2010; Jayet & Ukrayinchuk, 2007; Dimaggio & Garip, 2012; Beine et al., 2015; Garip & Asad, 2016). The magnet effect of diasporas works through several channels, such as family reunification, lower search and information costs, access to opportunities in the labour market, direct support from family and friends in adjusting to the new environment, and consumption externalities (Carrington et al., 1996; Chiswick et al., 2001; Coniglio, 2003; Munshi, 2003). Diasporas might also generate negative externalities due to labour market competition, the hostility of natives and the associated stigma that newcomers might experience (Borjas, 1995; Gonzalez, 1998). Whether new immigrants choose to locate near or far from diaspora communities depends on the interplay between these positive and negative externalities.
From a theoretical point of view, these externalities are likely to be mediated by geographical distance from the diasporas; in other words, their relative strength is likely to follow a distance gradient. For instance, geographical proximity might be crucial for benefiting from direct support by members of the diaspora in terms of housing or food. On the other hand, externalities arising from information sharing or job search might also operate at greater distances. Consumption externalities due to a wider and cheaper supply of ethnic goods are likely to be enjoyed at even greater distances, as immigrants do not need to visit these localized ethnic consumption hubs every day. Because of congestion costs or other negative externalities, new immigrants may find it profitable to settle in areas neighbouring large diasporas but not too close to them. As the distance from a diaspora increases, the externality of neighbouring diasporas may become negative, since they exert their own force of attraction on potential new immigrants; in other words, different locations with large diasporas may be competing to attract new waves of immigrants.
In this paper, we study and measure the distance-decay effect of diaspora externalities using detailed data on immigration flows into 611 Italian local labour market areas (LLMAs) over the period 2007-18. Our approach allows us to measure both the direct (localized) diaspora externalities and the indirect ones (spatial spillovers). The focus on Italy is particularly informative because, since the mid-1970s, Italy has turned from an emigration into an immigration country. However, only since 2002 (with the so-called 'big regularization') has foreign immigration significantly changed the composition of the Italian population (Bonifazi et al., 2009).
At the beginning of 2019, the foreign population reached 5.26 million (8.7% of the total population) and included 191 different nationalities, with Romania, Albania, Morocco, China and Ukraine as the main countries of origin.
Our contribution innovates over existing studies that consider only strongly localized network effects, thereby ruling out a priori the likely third-region effect, that is, the spillovers associated with the presence of a home-country community in a third destination area (for instance, a neighbouring area). Our methodological approach allows us to shed light on the spatial distance over which diaspora externalities operate. We show that ignoring spatial spillovers leads to an upward bias in the estimates of the localized diaspora effects, but to an overall underestimation of the pull effect of diasporas. To our knowledge, this is also one of the first studies to investigate empirically the spatial competition between diasporas in attracting new immigrants. Following existing studies, we first derive and test a non-spatial model of immigration flows in Italy under the restrictive hypothesis of strongly localized diaspora externalities (i.e., assuming that diasporas generate external effects only within a specific locality and not in neighbouring ones). We then enrich the baseline model by introducing spatial effects and investigate the role of diasporas in neighbouring locations in shaping migration inflows. Diasporas in the 611 Italian LLMAs are computed using past stocks of immigrants from the same origin country.
In line with the existing literature, the baseline model yields a sizable network effect: a 1% larger stock of settled immigrants in an LLMA in 2007 is associated with about 0.66% more new immigrants from the same origin country into that LLMA over the following decade. However, the baseline (non-spatial) model cannot capture the geographical scope of network externalities, which are not constrained by LLMA borders. Significant positive spatial spillovers (indirect effects) are indeed evident up to the second-order spatial lag, yielding a total (direct plus indirect) positive effect of about 0.75. The attractive force of diasporas is thus exerted over an estimated average distance of up to 82 km. Finally, we find some evidence that spatial effects turn negative at greater distances, suggesting that a competition effect prevails as flows are diverted to localities with larger diasporas.
The estimated average effects mask large heterogeneity in the estimated coefficients across gender and origin countries. Diaspora externalities exert an overall stronger agglomeration effect on new female immigrants than on males, as positive indirect effects are more intense for female immigrants. The different impact of diasporas by gender is largely explained by the different mix of origin countries and by gender heterogeneity in labour market integration. Another possible explanation of this gender difference is the role played by family reunification. The location choice of immigrants who are joining family members is naturally more constrained and more likely to fall in areas with a high concentration of immigrants from the same country of origin, and in the period considered in this study family reunification was significantly more frequent for female than for male immigrants. For male immigrants, we find a steeper spatial gradient of diaspora externalities for immigrants from culturally and economically distant origin countries. For these countries, such as Egypt, Pakistan or Bangladesh, the effects of the localized diasporas are significantly larger than the indirect effects. Conversely, for female immigrants, who are mainly employed as household collaborators, the spatial gradient is less steep, as neighbouring diasporas exert net-positive externalities on the location of newcomers. We interpret this result as reflecting the importance of ethnic networks for access to job opportunities among immigrants from relatively poor countries.
The rest of the paper is structured as follows. The next section presents some stylized facts on immigration in Italy during the sample period. The third section presents the empirical model and discusses some econometric issues. The fourth section reports the results of the econometric analysis. The last section concludes.

IMMIGRATION IN ITALY OVER THE PERIOD 2007-18: SOME STYLIZED FACTS
Our analysis is based on official administrative data on migratory movements at the municipal level from the Italian population registers, collected by ISTAT. We selected flows of immigrants aged 15-64 years from each origin country to each LLMA of destination, by gender, over the period 2007-18. LLMAs are highly integrated clusters of contiguous municipalities whose boundaries are identified according to the self-containment of commuting flows (ISTAT, 2015). According to the most recent map, compiled by ISTAT in 2011, the national territory can be partitioned into 611 LLMAs.
In the following analysis we use immigration flows aggregated over the period 2007-18, as lags in the registration of the administrative data make the time-series dimension of the dyadic data problematic. Table 1 shows the distribution by origin and gender of foreign immigrants aged 15-64 years legally registered in Italy over the period 2007-18. Half of all migrants (50%) came from Europe: 34% from the European Union (EU) and 16% from Eastern Europe. Africa and Asia each sent about 20% of migrants, while fewer than 10% arrived from the United States. The geographical distribution by gender shows a higher share of female immigrants from the EU and a higher share of male immigrants from Africa and Asia. The ranking of the top 10 countries of origin is reported in the lower part of Table 1. Romania's accession to the EU in 2007 granted its citizens a special intra-EU mobility status, which helps explain the strong incidence of EU immigrants over the whole sample period. All the other countries in the top 10 except Poland, each with a share below 7%, are non-EU countries.
The LLMAs of destination can first be grouped into four macro-regions: North-West, North-East, Centre and Mezzogiorno (the South and the two major islands, Sicilia and Sardegna). Table 2 shows that over the sample period, 30% of all immigrants aged 15-64 years went to the North-West, 23% to the North-East, 25% to the Centre and 21% to the Mezzogiorno. The distribution by gender is very similar to the overall one, except for a lower share of women moving to the Mezzogiorno and a higher share moving to the North-East. The top 10 ranking confirms the dominance of the largest metropolitan areas (Roma, Milano and Torino). However, immigration in Italy is not just a metropolitan phenomenon: it is also widespread across the territory, especially in industrial districts specialized in 'Made in Italy' products. Flow maps also show that Romanians are widely spread across the territory, while the other ethnic groups are concentrated in a few LLMAs (see Appendix B in the supplemental data online).
Maps of local Moran statistics allow us to assess the sign of the spatial association of immigration rates across local areas. The scatter maps (reported in Appendix C in the supplemental data online) show a significant cluster of high values in the North and the Centre and a significant cluster of low values in the Mezzogiorno. Both of these clusters of positive spatial autocorrelation are more pronounced for female immigrants. For male immigrants, by contrast, several LLMAs in the North show negative local spatial autocorrelation: low values surrounded by high values and high values surrounded by low values. A few spatial outliers are also detected in the Mezzogiorno. Moreover, strong heterogeneity across countries of origin emerges. This analysis highlights the importance of adopting a spatial approach to the determinants of the location patterns of new immigrants, as well as the need to consider gender- and origin-country-specific patterns.
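The local Moran statistics mentioned above can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' code: it uses row-standardized weights and a hypothetical five-area chain, and it classifies each area into the high-high, low-low, high-low and low-high quadrants of the Moran scatter plot but, unlike the maps in Appendix C, performs no permutation-based significance test. The function names are hypothetical.

```python
def row_standardize(adj):
    """Turn a binary adjacency matrix (list of lists) into row-standardized weights."""
    weights = []
    for row in adj:
        total = sum(row)
        # Units without neighbours keep a row of zeros.
        weights.append([v / total if total else 0.0 for v in row])
    return weights


def local_moran(x, w):
    """Return a list of (I_i, quadrant) pairs for values x and weights w."""
    n = len(x)
    mean = sum(x) / n
    z = [v - mean for v in x]              # deviations from the mean
    m2 = sum(v * v for v in z) / n         # second moment of the deviations
    out = []
    for i in range(n):
        lag = sum(w[i][j] * z[j] for j in range(n))   # spatially lagged deviation
        stat = z[i] * lag / m2
        if z[i] >= 0 and lag >= 0:
            quadrant = "HH"    # high value surrounded by high values
        elif z[i] < 0 and lag < 0:
            quadrant = "LL"    # low value surrounded by low values
        elif z[i] >= 0:
            quadrant = "HL"    # spatial outlier: high among low
        else:
            quadrant = "LH"    # spatial outlier: low among high
        out.append((stat, quadrant))
    return out


# Hypothetical example: five areas on a line, each adjacent to the next.
rates = [10.0, 9.0, 8.0, 1.0, 2.0]
chain = [[0, 1, 0, 0, 0],
         [1, 0, 1, 0, 0],
         [0, 1, 0, 1, 0],
         [0, 0, 1, 0, 1],
         [0, 0, 0, 1, 0]]
for area, (stat, quadrant) in enumerate(local_moran(rates, row_standardize(chain))):
    print(area, round(stat, 3), quadrant)
```

In this toy chain the first two areas form a high-high cluster, the third is a high-low outlier next to the low-value areas, and the last two form a low-low cluster.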

ECONOMETRIC MODEL
3.1. The negative binomial spatial gravity model
The literature on community network externalities (Bauer et al., 2005; Beine et al., 2011, 2015; Zavodny, 1999) has typically used gravity models that rely on three types of factors to explain migration flows from an origin i to a destination j: (1) origin-specific attributes that capture push factors; (2) destination-specific attributes that represent the attractiveness of the destination; and (3) origin-destination variables that capture dyadic factors that constrain or encourage new flows. The last category typically includes separation variables, such as the geographical distance between i and j, and the stock of migrants from origin i settled in destination j at the beginning of the sample period, which captures the net diaspora effect (in our case, the number of foreign citizens from each country i settled in each LLMA j in Italy). All these studies find positive effects of networks on the location decisions of newly arriving migrants, corroborating the hypothesis that networks can provide their newly arrived members with ethnic goods, better information on housing or employment opportunities, help with the settlement process, or financial and legal assistance. Most of these studies neglect spatial dependence, implicitly or explicitly assuming that the network effect is strongly localized, that is, that the community network only influences the location choice of new migrants for the spatial unit where it is settled. However, this assumption may turn out to be rather strong because of spatial spillover effects (substantive spatial dependence), as discussed above, or model misspecifications (such as spatially autocorrelated heterogeneity, non-linearities or common factors) that can generate spatial autocorrelation (nuisance spatial dependence).
Therefore, following Nowotny and Pennerstorfer (2019), we allow for spatial spillovers by using a spatial econometric specification of the gravity model. As also pointed out by Nowotny and Pennerstorfer (2019), community networks are unlikely to generate global spatial spillover effects, that is, externalities propagating to all other spatial units. Information or consumption externalities originating from a specific LLMA in Italy can propagate beyond the boundaries of the local system, but with a very strong distance decay. Therefore, in our model specification, we can rule out a data-generating process in which idiosyncratic shocks diffuse spatially through a spatial multiplier mechanism, as is typical of a spatial lag model or a spatial Durbin model. Instead, we base our model specification on the theoretically rooted assumption that spillovers involve only a limited number of neighbours.
More formally, our econometric specification is based on a random utility maximization (RUM) model similar to the one proposed by Beine et al. (2015), but including spatial spillover effects. The dependent variable $N_{ij}$ measures the number of migrants aged 15-64 years who arrived between 2007 and 2018 from origin country i in LLMA j, distinguished by gender. Given the discrete nature of this dyadic outcome (count data with many zeros and a right-skewed distribution), we apply a gravity negative binomial model (Winkelmann, 2008) to explain the count of movers, accounting for over-dispersion in the data. The econometric specification is therefore given by:
$$E\left[N_{ij}\right] = \exp\Big(\mu_i + \mu_j + \alpha \ln(1 + M_{ij}) + \sum_{\ell=1}^{L} \beta_\ell \sum_{l \neq j} w^{\ell}_{jl} \ln(1 + M_{il}) + \delta \ln d_{ij}\Big) \qquad (1)$$
where $M_{ij}$ is the stock, registered in 2007, of foreign citizens from country i settled in LLMA j; $d_{ij}$ is the great circle distance between each origin country and each LLMA of destination, calculated using the geographical coordinates of the centroids of the origin and destination polygons, and $\delta$ is the corresponding distance elasticity; the remaining terms are defined below. Both $d_{ij}$ and $M_{ij}$ influence local assimilation costs for potential migrants from country i to destination j. These costs increase with bilateral distance and decrease (increase) with the size of the diaspora network at the destination if positive (negative) externalities dominate negative (positive) ones. However, we cannot rule out that network members also influence the location choice of new migrants in other LLMAs by affecting their assimilation costs. To capture this additional effect, we assume that local assimilation costs may also decrease (or increase) with the size of the networks of migrants from country i settled in LLMAs l ($M_{il}$ with $l \neq j$) belonging to the neighbourhood of j. We cannot define arbitrarily the geographical scope of the relevant neighbourhood generating spatial spillovers on LLMA j. More flexibly, we assume that spatial contagion effects may occur up to a certain spatial contiguity order (L) that can be assessed empirically.
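To make the link between equation (1) and a count likelihood concrete, the sketch below computes the conditional mean implied by equation (1) and the log-probability of an observed count under a negative binomial distribution. Assumptions: we use the common NB2 parameterization (variance $\mu + \phi\mu^2$), which the text does not specify; $\phi$, the function names and all parameter values are hypothetical; estimation of the parameters (including the fixed effects and the IV step) is not shown.

```python
import math


def expected_flow(mu_i, mu_j, alpha, stock_ij, betas, lagged_logs, delta, dist_km):
    """Conditional mean of N_ij under the log-linear index of equation (1).

    lagged_logs[k] is the spatial lag sum_{l != j} w^{k+1}_{jl} ln(1 + M_il)
    for neighbourhood order k + 1; betas holds the matching coefficients.
    """
    index = mu_i + mu_j + alpha * math.log1p(stock_ij) + delta * math.log(dist_km)
    index += sum(b * s for b, s in zip(betas, lagged_logs))
    return math.exp(index)


def nb2_logpmf(y, mean, phi):
    """Log P(N = y) for a negative binomial with E[N] = mean, Var[N] = mean + phi * mean**2."""
    r = 1.0 / phi  # "size" parameter; phi -> 0 approaches the Poisson case
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + mean)) + y * math.log(mean / (r + mean)))


# Sanity check with made-up parameter values: the probabilities sum to one
# and reproduce the assumed mean.
m = expected_flow(0.5, 0.2, 0.66, 120, [0.1, 0.02], [3.0, 2.5], -0.9, 2000.0)
probs = [math.exp(nb2_logpmf(y, m, 0.8)) for y in range(5000)]
print(round(sum(probs), 6), round(sum(y * p for y, p in enumerate(probs)) / m, 6))
```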
We only expect, as usual, a distance-decay effect of spatial spillovers (stationarity condition). Moreover, we cannot exclude that neighbouring LLMAs (or, more precisely, the community networks settled within them) compete with each other to attract new immigrants. In this case, a negative spillover would dominate the positive contagion effect. Thus, each $\ell$-th term $\sum_{l \neq j} w^{\ell}_{jl} \ln(1 + M_{il})$ measures the spatially lagged value of the network variable at neighbourhood order $\ell$, with $w^{\ell}_{jl}$ the elements of a row-standardized binary spatial weights matrix indicating the existence of a neighbourhood linkage between LLMAs j and l. We computed these spatial lag terms using two alternative spatial weights matrices: (1) a distance-based binary weights matrix; and (2) a queen contiguity weights matrix. Each element of the first matrix takes value 1 if the (great circle) distance between two LLMAs is less than a threshold distance, and 0 otherwise. The chosen cut-off (41 km) is the minimum distance ensuring that each location has at least one neighbour. Each element of the second matrix takes value 1 if two LLMAs share a boundary, and 0 otherwise. Both weights matrices are symmetric before row standardization. Second- and third-order spatial weights matrices are also generated (fourth and higher orders generate islands, i.e., non-connected LLMAs). All matrices, at all orders, have been row-standardized before computing the spatial lags of the network variable. Here, we only report the results obtained using the binary distance matrix; the results with the contiguity matrix are reported in Section E of the Appendix in the supplemental data online. Parameters $(\alpha, \beta_\ell)$ are the relative contributions of the network externality through the direct (local) and indirect (spatial spillover) diaspora channels, respectively.
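The construction of the distance-band weights can be sketched as follows. This is an illustrative sketch under our own assumptions, not the authors' code: centroids are (latitude, longitude) pairs, distances use the haversine formula on a spherical Earth of radius 6371 km, higher orders are treated as distance rings of the same width as the cut-off (as in the 41-82 km and 82-123 km rings used in the empirical section), and rows without neighbours are left as zeros. All function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation


def great_circle_km(p, q):
    """Haversine distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def min_connecting_cutoff(dist):
    """Smallest cut-off leaving no unit without a neighbour: the largest nearest-neighbour distance."""
    n = len(dist)
    return max(min(dist[j][l] for l in range(n) if l != j) for j in range(n))


def ring_weights(dist, lower, upper):
    """Row-standardized binary weights: w_jl = 1 if lower < d_jl <= upper and l != j."""
    n = len(dist)
    weights = []
    for j in range(n):
        row = [1.0 if l != j and lower < dist[j][l] <= upper else 0.0 for l in range(n)]
        total = sum(row)
        weights.append([v / total for v in row] if total else row)
    return weights


def spatial_lag(weights, stocks):
    """For one origin i: sum_{l != j} w_jl * ln(1 + M_il) for every destination j."""
    logs = [math.log1p(m) for m in stocks]
    return [sum(w * v for w, v in zip(row, logs)) for row in weights]


# Hypothetical example: four areas on a line at 0, 30, 70 and 150 km.
positions = [0.0, 30.0, 70.0, 150.0]
dist = [[abs(a - b) for b in positions] for a in positions]
cut = min_connecting_cutoff(dist)                                  # 80.0 here
rings = [ring_weights(dist, k * cut, (k + 1) * cut) for k in range(3)]
stocks = [0.0, math.e - 1, math.e ** 2 - 1, 0.0]                    # ln(1 + M) ~ 0, 1, 2, 0
print(cut, spatial_lag(rings[0], stocks))
```

Because each row is standardized, every spatial lag is a weighted average of $\ln(1 + M_{il})$ over the LLMAs in the relevant ring.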
The model specification includes country-of-origin ($\mu_i$) and LLMA-of-destination ($\mu_j$) dummy variables, capturing unobserved origin-country and destination-location characteristics. This large set of fixed effects also allows us to account for multilateral resistance to migration (MRM), defined as the confounding influence exerted by the attractiveness of alternative destinations on the bilateral migration flow (Bertoli & Moraga, 2013; Ortega & Peri, 2013).
Moreover, the inclusion of spatial lags of the network variable helps control for the case in which MRM generates spatial autocorrelation (or weak cross-sectional dependence). Consider, for example, the case where neighbouring LLMAs (defined in terms of physical proximity) share similar unobservable characteristics (i.e., unobservable sources of attractiveness for individuals), which generates within-neighbourhood correlation in the stochastic component of utility. Examples of such unobservable factors are the demand and supply conditions for specific jobs that are controlled by a certain ethnic group i; the boundaries of these specific labour markets do not coincide with those of LLMAs. Failing to control for localized spatial spillover effects of migrants' networks generates a correlation between the realizations of the stochastic components of utility corresponding to any pair of destinations belonging to the same neighbourhood, and this correlation may differ across origin countries i. More formally, when local spatial spillover effects are not controlled for, the identification of the effect of migrants' networks can be confounded by the community network externalities generated in nearby LLMAs, and the error term of the RUM model becomes the composite term $r_{ij} + \varepsilon_{ij}$, where the component $r_{ij} = \sum_{\ell=1}^{L} \beta_\ell \sum_{l \neq j} w^{\ell}_{jl} \ln(1 + M_{il})$ represents MRM, as it captures the influence exerted by the opportunities (and barriers) to migrate to other destinations on migration from country i to LLMA j, and $\varepsilon_{ij} \sim$ EVT-1 (extreme value type 1). When $\beta_\ell < 0$, an increase in $M_{il}$ redirects towards l proportionally more individuals who would have opted for destination j than individuals who would have stayed in the country of origin i, thus reducing the bilateral migration flow $N_{ij}$ in equation (1), that is, generating a negative spatial externality. A positive spatial externality occurs instead when $\beta_\ell > 0$.
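The confounding argument above can be illustrated with a small simulation. This is a linear (OLS) analogue chosen for brevity, not the negative binomial model estimated in the paper, and every number in it is an assumption: the local network variable and its spatial lag share a common component (neighbouring LLMAs host similar communities), and the true spillover coefficient is positive. Omitting the spatial lag then loads part of the spillover onto the local coefficient, which is the direction of bias discussed in the introduction.

```python
import random


def solve(a, b):
    """Solve a small linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(y, regressors):
    """OLS coefficients (intercept first) via the normal equations."""
    X = [[1.0] + [col[i] for col in regressors] for i in range(len(y))]
    k = len(X[0])
    xtx = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(k)]
    return solve(xtx, xty)


rng = random.Random(42)
n = 5000
common = [rng.gauss(0, 1) for _ in range(n)]          # shared neighbourhood component
local = [c + rng.gauss(0, 1) for c in common]         # stands in for ln(1 + M_ij)
lagged = [c + rng.gauss(0, 1) for c in common]        # stands in for its spatial lag
y = [0.6 * x + 0.15 * s + rng.gauss(0, 1) for x, s in zip(local, lagged)]

full = ols(y, [local, lagged])    # local coefficient close to the true 0.6
short = ols(y, [local])           # spatial lag omitted
print(round(full[1], 3), round(short[1], 3))
```

With these assumed values, the population slope of the spatial lag on the local variable is 1/2, so the short regression's local coefficient should be close to 0.6 + 0.15 * 0.5 = 0.675.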
We can check the relevance of weak cross-sectional dependence in the residuals of our regression model by using the CD test proposed by Pesaran (2015), and by treating the dyadic structure of the dataset as a balanced panel. The results of this test are discussed in the following sections.
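Pesaran's CD statistic aggregates the pairwise correlations of residuals across cross-sectional units: $CD = \sqrt{2T/(N(N-1))}\sum_{i<j}\hat\rho_{ij}$. The sketch below is a minimal implementation for a balanced residual matrix. How the dyadic data are arranged into "units" and the repeated dimension is an assumption on our part (for instance, LLMAs as units and origin countries as the repeated dimension), since the text only states that the dyads are treated as a balanced panel. Under the null of weak cross-sectional dependence, CD is approximately standard normal.

```python
import math
import random


def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den


def pesaran_cd(resid):
    """CD statistic and two-sided p-value for a T x N residual matrix resid[t][i]."""
    T, N = len(resid), len(resid[0])
    series = [[resid[t][i] for t in range(T)] for i in range(N)]
    total = sum(corr(series[i], series[j])
                for i in range(N - 1) for j in range(i + 1, N))
    cd = math.sqrt(2.0 * T / (N * (N - 1))) * total
    p_value = math.erfc(abs(cd) / math.sqrt(2.0))  # N(0, 1) reference distribution
    return cd, p_value


# Hypothetical residuals with no cross-sectional dependence: CD should be small.
rng = random.Random(1)
independent = [[rng.gauss(0, 1) for _ in range(20)] for _ in range(200)]
print(pesaran_cd(independent))
```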

Accounting for endogeneity bias
Another potential source of bias is the endogeneity of the migrants' network variables. Our econometric model (equation 1) includes both origin-specific ($\mu_i$) and destination-specific ($\mu_j$) fixed effects. However, it does not control for unobserved bilateral factors ($v_{ij}$) affecting the bilateral migration flows $N_{ij}$. These unobserved factors contribute to a composite error term $u_{ij} = v_{ij} + \varepsilon_{ij}$, where $\varepsilon_{ij}$ are i.i.d. random variables with zero mean and finite variance. If the unobserved factors also affect the network variable $\ln(1 + M_{ij})$ as well as its spatially lagged terms, the composite error term $u_{ij}$ becomes correlated with these covariates, generating an endogeneity bias. For example, origin-specific unobserved location preferences may affect both past and present location choices. To address this problem, we use a two-stage instrumental variables (IV) approach, as suggested by Winkelmann (2008).
Valid instruments must be correlated with the endogenous terms but must not have a direct effect on the dependent variable $N_{ij}$ (exclusion restriction). Here, we use the stocks $\ln(1 + M_{ij})$ observed in 1993 and 2002, as well as their spatially lagged terms, as instruments for the same variables observed in 2007.
Using the stocks in 1993 and especially in 2002 as exogenous instruments may appear inappropriate because of the short time lag relative to our (potentially) endogenous variable. For example, Beine et al. (2015) used the stock in 1950 as an instrument in their analysis of the US case over the 1990s. However, it is worth recalling that immigration to Italy is a very recent phenomenon and that foreign immigration only began to generate a real change in the composition of the Italian population after the 'big regularization' ('Bossi-Fini' Law) of 2002. This is even more true for the stock of foreign population originating from Eastern European countries.
For example, emigration from Romania (the main ethnic group in Italy) started only in 1989 with the fall of Communism. However, substantial Romanian flows to Italy only started in 1996 and were mainly irregular. From January 2002, a new wave of immigration from Romania began following the abolition of the visa requirement for Romanians visiting EU countries for less than three months. From that moment, Italy became one of the main destinations of Romanians. Since Romania's accession to the EU in 2007 (the year in which the stock of foreign population is measured in our empirical analysis), Romanians have no longer been subject to a visa requirement for residence in Schengen countries. Therefore, we may consider the 2002 abolition of the visa requirement for temporary visits by Romanians as a temporary shock that affected migration flows from 2002 to 2007 (and thus contributed to the formation of the 2007 stock) but had no direct persistent effect on migration flows after 2007, when Romania became an EU member. These arguments allow us to use the spatial distribution of Romanians in 2002 as a good instrument for the stock of Romanians measured in 2007. They can be extended to other former Communist countries that joined the EU in 2007 (Bulgaria) and in 2004 (Poland, Hungary, Slovenia, Slovakia, Czech Republic, Lithuania, Latvia and Estonia). Altogether, these countries account for about one-third of immigrants to Italy over the sample period 2007-18 (Table 1), so the corresponding community networks computed in 2002 can also be considered a sufficiently lagged value of the potentially endogenous variable.
The entire set of instrumental variables is therefore composed of: (1) the stock of the foreign population from nationality i settled in each LLMA j in 1993, as well as its first-, second- and third-order spatial lags; and (2) the stock of the foreign population from former Communist countries that joined the EU in 2004 or 2007 settled in each LLMA j in 2002, as well as its first-, second- and third-order spatial lags.
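The two-stage logic can be sketched as a control-function (two-stage residual inclusion) estimator, one common way to implement IV in count-data models. This is a simplified illustration, not the authors' estimator: for brevity we swap in a Poisson second stage for the negative binomial, use a single instrument instead of the full set listed above, and simulate hypothetical data in which an unobserved bilateral factor raises both the 2007 stock and the subsequent flow, while the lagged stock (the instrument) affects the flow only through the 2007 stock.

```python
import math
import random


def solve(a, b):
    """Solve a small linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]


def design(cols, n):
    """Design matrix with an intercept column."""
    return [[1.0] + [col[i] for col in cols] for i in range(n)]


def ols(y, cols):
    """OLS coefficients, intercept first."""
    X = design(cols, len(y))
    k = len(X[0])
    xtx = [[sum(r[a] * r[b] for r in X) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * v for r, v in zip(X, y)) for a in range(k)]
    return solve(xtx, xty)


def poisson_fit(y, cols, max_iter=50):
    """Poisson regression with log link, fitted by Newton-Raphson (intercept first)."""
    X = design(cols, len(y))
    k = len(X[0])
    beta = [math.log(sum(y) / len(y))] + [0.0] * (k - 1)
    for _ in range(max_iter):
        mu = [math.exp(sum(b * x for b, x in zip(beta, r))) for r in X]
        score = [sum((v - m) * r[a] for v, m, r in zip(y, mu, X)) for a in range(k)]
        info = [[sum(m * r[a] * r[b] for m, r in zip(mu, X)) for b in range(k)] for a in range(k)]
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < 1e-10:
            break
    return beta


def draw_poisson(rng, lam):
    """Knuth's multiplication method; adequate for the small means simulated here."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


rng = random.Random(7)
n = 2000
z = [rng.gauss(0, 1) for _ in range(n)]        # instrument: a lagged stock (e.g. 2002)
v = [rng.gauss(0, 1) for _ in range(n)]        # unobserved bilateral factor
x = [zi + vi for zi, vi in zip(z, v)]          # endogenous 2007 network variable
flows = [draw_poisson(rng, math.exp(0.2 + 0.5 * xi + 0.5 * vi)) for xi, vi in zip(x, v)]

naive = poisson_fit(flows, [x])                              # ignores endogeneity
first = ols(x, [z])                                          # first stage
v_hat = [xi - first[0] - first[1] * zi for xi, zi in zip(x, z)]
control = poisson_fit(flows, [x, v_hat])                     # second stage with control function
print(round(naive[1], 3), round(control[1], 3))
```

In this simulated setting the naive slope is pushed above the true value of 0.5, while including the first-stage residual removes most of the bias. In practice, standard errors from such a two-step procedure must account for the estimated first stage, for example by bootstrapping.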

EMPIRICAL RESULTS
In this section we report the estimation results of equation (1) using the IV approach described above. The sample includes the 56 origin countries with at least 5000 citizens residing in Italy in 2007, accounting for 96.4% of total immigration in the subsequent decade. We start the discussion of the empirical results with a comparison between the a-spatial and the spatial IV-NB gravity models (Tables 3 and 4). In the a-spatial version, we only include the localized network effect, while in the spatial model we include spatial lags of the network variable to identify spatial spillover effects. These spatial lags are computed using binary weights matrices based on increasing distance cut-offs. Specifically, for any LLMA j, the spatial lag $\sum_{l \neq j} w^{0-41}_{jl} \ln(1 + M_{il})$ is the weighted average of $\ln(1 + M_{il})$, the (log) number of foreign citizens from country i, over the surrounding LLMAs within a radius of 41 km. We also computed similar measures for the ring between 41 and 82 km ($\sum_{l \neq j} w^{41-82}_{jl} \ln(1 + M_{il})$) and for the ring between 82 and 123 km ($\sum_{l \neq j} w^{82-123}_{jl} \ln(1 + M_{il})$). For all migrants together, the results of the a-spatial model reveal that a 1% increase in the initial community network size in 2007 leads to a 0.66% increase in the bilateral flow of immigrants over the subsequent decade. A similar elasticity $\alpha$ is observed for male immigrants, while for females the elasticity is 0.73, suggesting a slightly greater diaspora effect for women. These results therefore confirm the consensus in the literature, although the elasticity is lower than the value of 1 found for the US case but higher than the magnitude (0.5) estimated by Jayet et al. (2010) for Italy.
The elasticity with respect to the great circle distance between the country of origin and the LLMA of destination is always negative and significant, as expected; it is higher for males and lower for females. Finally, the diagnostic tests reveal no signs of strong cross-sectional dependence in the residuals of any a-spatial model, although some evidence of weak cross-sectional dependence emerges. The evidence for the spatial model reported in Table 3 confirms a sizable direct network effect of about 0.62 (0.64 for male and 0.66 for female immigrants). The coefficients associated with the two spatial lags within 82 km are positive and significant, indicating that LLMAs benefit from being located close to other LLMAs hosting the same ethnic communities. However, the parameter of the second-order spatial lag is much lower, indicating that positive localized spatial spillovers decrease as distance increases. In other words, the effect of community networks spills over not only to physically neighbouring LLMAs but also to LLMAs up to a distance of 82 km, although with a lower impact. The use of higher-order lags of the network variable therefore allows us to assess the existence of a distance-decay mechanism in the influence of community networks on immigration flows. The cumulative effect of the spatial-lag coefficients is about 0.13 (0.08 for males and 0.21 for females), indicating non-negligible spatial spillover (indirect) effects. The total (direct plus indirect) network (or assimilation) effect is therefore about 0.75 for all immigrants, 0.72 for males and 0.87 for females. For males, a negative spatial spillover effect emerges from community networks settled in the ring between 82 and 123 km; the magnitude of this competition effect is −0.059.
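The total effects quoted above are simple sums of the direct elasticity and the cumulated spatial-lag coefficients. The short sketch below reproduces that arithmetic from the figures in the text and, under the log-linear mean of equation (1), converts an elasticity into the implied percentage change in expected flows. The conversion assumes that the percentage increase applies to $1 + M$ and, for the total effect, to the destination and all its neighbouring rings at once (with row-standardized weights, each spatial lag then rises by exactly the same log amount); the helper names are ours.

```python
# Direct elasticity (alpha) and cumulated spatial-lag coefficients (sum of beta_l),
# as reported in the text for the spatial IV-NB model.
ESTIMATES = {"all": (0.62, 0.13), "male": (0.64, 0.08), "female": (0.66, 0.21)}


def total_effect(direct, indirect):
    """Total (direct plus indirect) diaspora elasticity."""
    return direct + indirect


def pct_change_in_flow(elasticity, pct_change_in_stock):
    """Exact % change in the expected flow when (1 + M) rises by the given %, log-linear mean."""
    return ((1 + pct_change_in_stock / 100) ** elasticity - 1) * 100


for group, (direct, indirect) in ESTIMATES.items():
    total = total_effect(direct, indirect)
    print(group, round(total, 2), round(pct_change_in_flow(total, 10), 2))
```

For small changes the exact conversion is almost identical to reading the elasticity directly: a 1% increase with elasticity 0.66 gives a change of about 0.659%.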
Finally, comparing the performance of the specifications with and without spatially lagged terms, we conclude in favour of the spatial specification. For example, for male immigrants the AIC decreases from 181,139 (without spatial lags) to 180,805 (with spatial lags); the improvement is even larger for female immigrants, from 174,985 to 174,107. Moreover, including spatial lags of the network variable removes any sign of weak cross-sectional dependence from the residuals: for the subsample of male immigrants, the CD test statistic is −1.624 (p = 0.104), and for female immigrants it is 0.328 (p = 0.743).
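The model-comparison figures above can be checked with a few lines of arithmetic. This sketch only recomputes the AIC differences and the two-sided normal p-values of the reported CD statistics (CD is asymptotically standard normal under the null of weak cross-sectional dependence); the dictionary and function names are ours.

```python
import math

# Figures reported in the text: (AIC without spatial lags, AIC with spatial lags) and CD statistics.
AIC = {"male": (181_139, 180_805), "female": (174_985, 174_107)}
CD = {"male": -1.624, "female": 0.328}


def aic_gain(aspatial, spatial):
    """Positive values favour the spatial specification (lower AIC is better)."""
    return aspatial - spatial


def cd_p_value(cd):
    """Two-sided p-value of a CD statistic against the standard normal distribution."""
    return math.erfc(abs(cd) / math.sqrt(2.0))


for group in AIC:
    print(group, aic_gain(*AIC[group]), round(cd_p_value(CD[group]), 3))
```

Since AIC = 2k − 2 ln L̂, a handful of extra spatial-lag coefficients changes the penalty by only a few points, so reductions of 334 and 878 points come almost entirely from the improvement in the log-likelihood.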

Assessing the parameter heterogeneity across countries
The general results highlighted above mask interesting heterogeneity across the 56 origin countries, which we analyse in this section. More specifically, we have estimated the IV gravity  Focusing first on the results for the top 10 countries of origin, Figure 1 reports the estimated distance-decay effects of diaspora externalities. Lag 0 measures the localized or direct effects of diasporas on the attraction of newcomers, while lags 1-3 measure the indirect effects of neighbouring diasporas. The spatial gradient is captured by the slope (steepness) of the lines in Figures 1 and 2: the greater the importance of the direct effects relative to the indirect ones, the steeper the spatial gradient of diaspora externalities. Overall, diaspora effects show a decreasing spatial gradient, which conforms to the hypothesis of a strong distance-decay effect generated by immigrants' networks. For male immigrants, we find for some origin countries (in particular, Egypt, Pakistan, Bangladesh, Senegal and Nigeria) a relatively large direct effect of diasporas together with a steep spatial gradient of the effects of neighbouring diasporas. In fact, as Figure 2 shows, the average indirect effect is even negative for some origin countries.
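One simple way to summarize the steepness of the lines in Figures 1 and 2 is the least-squares slope of the estimated effects on the lag index (0 for the direct effect, 1-3 for the rings), alongside the average indirect effect. This operationalization is our assumption, not a statistic reported by the authors, and the coefficient profiles in the example are hypothetical, not estimates from the paper.

```python
def spatial_gradient(effects):
    """Least-squares slope of effects on lag index 0..L; more negative means steeper decay."""
    n = len(effects)
    mean_lag = (n - 1) / 2
    mean_eff = sum(effects) / n
    num = sum((lag - mean_lag) * (e - mean_eff) for lag, e in enumerate(effects))
    den = sum((lag - mean_lag) ** 2 for lag in range(n))
    return num / den


def average_indirect(effects):
    """Mean of the indirect (lag >= 1) effects."""
    return sum(effects[1:]) / len(effects[1:])


# Hypothetical coefficient profiles for lags 0, 1, 2, 3 (not estimates from the paper).
steep = [0.9, 0.1, -0.05, -0.1]
flat = [0.5, 0.4, 0.3, 0.25]
for name, profile in (("steep", steep), ("flat", flat)):
    print(name, round(spatial_gradient(profile), 3), round(average_indirect(profile), 3))
```

A steep profile with a negative average indirect effect corresponds to the pattern described for some male origin groups, while a flat profile with positive indirect effects corresponds to strong spillovers from neighbouring diasporas.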
Access to job market opportunities might be the most relevant driver of these heterogeneous effects. For instance, the Pakistani community in Italy is highly geographically concentrated: two-thirds reside in the North of the country and are largely employed in low-skill jobs, mostly in the industrial sector (43.3% in 2018; Ministero del Lavoro, 2018). Immigrants from Bangladesh also show a strong ethnic specialization in the labour market, with a high share of employment in services (mostly retail trade) and restaurants (approximately 59% of the workforce in 2018). Ethnic networks are crucial for new immigrants from these origin countries, as they are a key source of employment opportunities as well as of consumption externalities, since these immigrants are largely young males living in shared housing.
For Chinese and, to a lesser extent, Albanian immigrants, we find strong overall diaspora externalities with high values for both direct and indirect effects. Although the level of social and cultural integration of Chinese immigrants is relatively low, unlike that of the Albanian community (which has a three-decade-long history of immigration in Italy), both communities show a relatively high level of economic assimilation, and entry into the labour market is still largely realized through ethnic networks (Ministero del Lavoro, 2018). For these countries, diasporas matter, but their force of attraction extends beyond the localized effect.
For new Romanian immigrants (the largest ethnic community in Italy), we find weaker diaspora effects compared with other immigrants. Cultural and language proximity makes the geography of migration of Romanians less affected by pre-existing diasporas. This result holds for both male and female immigrants.
Interestingly, we find evidence of stronger positive indirect effects for female immigrants. We interpret this gender difference as the result of the specialization of female immigrants in household services, such as caretakers and domestic collaborators (49% of female workers versus 8.4% of male workers in 2019). This specialization has a fundamental implication for the spatial distribution of female immigrants compared with male ones: job opportunities in household services are geographically dispersed, as the services of immigrant female workers are demanded by Italian households in every corner of the country. Our findings suggest that diasporas play a fundamental role in the location choice of female immigrants, but the ubiquitous availability of job opportunities makes them a less relevant driver of location choice than for males.
For the interpretation of these results, we must also consider that immigrants from some Sub-Saharan countries (such as Nigeria, Senegal, Eritrea, Ivory Coast and Ghana) mostly enter Italy as asylum seekers. Thus, their first residential location within the country is mostly driven by reception projects carried out by humanitarian organizations, such as the Catholic Church. In these cases, therefore, the endogenous network effect does not play a primary role, despite the high cultural and economic distance of these countries from Italy. It is worth noting that, in the case of male immigrants, four of the top 10 countries of origin are asylum-seeker countries.
Finally, in order to shed light on potential mechanisms behind our findings, we analyse the correlation between the diaspora effect and some measures of cultural and economic distance. Excluding from this analysis the subset of asylum-seeker countries listed above, we find that the direct effect of diasporas is stronger for origin countries that are culturally and economically distant from Italy (Figures 3-5). 17 This result is not surprising, as the magnet effect of diasporas is stronger when newcomers have to navigate a new socio-economic environment that is significantly different from the one they left at home (Bauer et al., 2005).
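The distance measures described in footnote 17 can be sketched as simple ratios and a weighted average. The helper names below are hypothetical, and the equal weights in `cultural_distance` are an assumption: the footnote states only that the Del Gatto and Mastinu measure is a weighted average of genetic, linguistic and religious distance, without giving the weights.

```python
# Hypothetical helpers sketching the distance measures described in footnote 17.
# The equal weights in cultural_distance are an ASSUMPTION, not the published weights.


def gdp_ratio(gdp_pc_origin: float, gdp_pc_italy: float) -> float:
    """Economic distance: origin GDP per capita relative to Italy's."""
    return gdp_pc_origin / gdp_pc_italy


def employment_rate(employees: float, working_age_pop: float) -> float:
    """Employment rate = number of employees / working-age population."""
    return employees / working_age_pop


def employment_rate_ratio(emp_origin: float, wap_origin: float,
                          emp_italy: float, wap_italy: float) -> float:
    """Economic distance: origin employment rate relative to Italy's."""
    return employment_rate(emp_origin, wap_origin) / employment_rate(emp_italy, wap_italy)


def cultural_distance(genetic: float, linguistic: float, religious: float,
                      weights: tuple[float, float, float] = (1 / 3, 1 / 3, 1 / 3)) -> float:
    """Weighted average of genetic, linguistic and religious distance (assumed weights)."""
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(w * d for w, d in zip(weights, (genetic, linguistic, religious)))


# Usage with made-up numbers (not data from the paper):
print(gdp_ratio(10_000, 40_000))
print(employment_rate_ratio(50, 100, 60, 100))
print(cultural_distance(0.3, 0.6, 0.9))
```

Because both economic measures are ratios to Italy, a value below one indicates an origin country poorer (or with lower employment) than Italy; the sign of any correlation with diaspora effects has to be read with that orientation in mind.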
Interestingly, we also find that the ratio between direct and indirect effects, which is a measure of the spatial gradient (or distance-decay effect) of diaspora externalities, is directly related to economic distance for male immigrants, but not for female ones (Figures 6 and 7). Diaspora externalities are strongly localized for male immigrants coming from developing countries, most likely because their job market opportunities strongly depend on ethnic networks. By contrast, the role of networks in the job market performance of male immigrants from relatively more developed countries and of female immigrants (as highlighted above) is weaker, implying that their willingness to move away from diasporas is also higher.
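The gradient measure and its correlation with economic distance can be sketched as follows. This is a minimal illustration under stated assumptions: the indirect effect is taken as the sum of the lag 1-3 coefficients (the text does not specify whether it is a sum or an average), the asylum-seeker origins listed above are excluded, and all country labels and numbers in the example are hypothetical.

```python
import math

# Origins described in the text as mainly asylum-seeker flows; excluded from the correlation.
ASYLUM_SEEKER_ORIGINS = {"Nigeria", "Senegal", "Eritrea", "Ivory Coast", "Ghana"}


def spatial_gradient(direct: float, indirect: list[float]) -> float:
    """Ratio of the direct (lag 0) effect to the indirect effect.

    ASSUMPTION: the indirect effect is the sum of the lag 1-3 coefficients.
    """
    total_indirect = sum(indirect)
    if total_indirect == 0:
        raise ZeroDivisionError("indirect effect is zero; ratio undefined")
    return direct / total_indirect


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def gradient_distance_correlation(estimates: dict[str, tuple[float, list[float]]],
                                  econ_distance: dict[str, float]) -> float:
    """Correlate gradients with economic distance, dropping asylum-seeker origins."""
    countries = sorted(c for c in estimates
                       if c not in ASYLUM_SEEKER_ORIGINS and c in econ_distance)
    gradients = [spatial_gradient(*estimates[c]) for c in countries]
    distances = [econ_distance[c] for c in countries]
    return pearson(gradients, distances)


# Hypothetical coefficients (direct, [lag1, lag2, lag3]); not results from the paper.
estimates = {
    "A": (0.6, [0.1, 0.05, 0.05]),
    "B": (0.4, [0.2, 0.1, 0.1]),
    "C": (0.5, [0.1, 0.1, 0.05]),
    "Nigeria": (0.9, [0.0, 0.0, 0.1]),  # excluded as an asylum-seeker origin
}
distance = {"A": 3.0, "B": 1.0, "C": 2.0, "Nigeria": 0.0}
print(gradient_distance_correlation(estimates, distance))
```

One design caveat: when the summed indirect effect is negative, as the text reports for some origin countries, the ratio changes sign, so a plain ratio may not rank such countries sensibly on steepness.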

CONCLUSIONS
In this contribution we have analysed the role of diasporas in shaping the geography of immigration flows in Italy and, in particular, we have provided novel evidence of the distance-decay effect of ethnic externalities. We have shown that ignoring the role of neighbouring diasporas leads to biased estimates of the magnet effects of established ethnic communities on new immigrants: more precisely, it produces a significant upward bias in the estimate of localized (direct) diaspora externalities and a downward bias in their total effects, which include spatially lagged (indirect) externalities. In this respect, our work contributes to the existing literature by adding new insights into the complex factors that shape international migration flows.
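The direction of these biases can be illustrated with a stylized omitted-variable calculation. This is a simplification, not the model estimated in the paper: it assumes a single row-standardized spatial lag $Wx$ of the diaspora variable $x$, demeaned variables, and an error $\varepsilon$ uncorrelated with $x$ and $Wx$ (the paper instead uses an IV gravity specification with three lags). With row-standardized $W$, a uniform unit increase in $x$ raises $Wx$ by one everywhere, so the total effect is $\beta_0 + \beta_1$.

```latex
\begin{aligned}
y &= \beta_0 x + \beta_1 W x + \varepsilon, \\
\operatorname{plim} \hat{\beta}_{\text{a-spatial}} &= \beta_0 + \beta_1 \delta,
  \qquad \delta = \frac{\operatorname{Cov}(x, Wx)}{\operatorname{Var}(x)}, \\
\operatorname{plim} \hat{\beta}_{\text{a-spatial}} - \beta_0 &= \beta_1 \delta, \\
\operatorname{plim} \hat{\beta}_{\text{a-spatial}} - (\beta_0 + \beta_1) &= \beta_1 (\delta - 1).
\end{aligned}
```

Under these assumptions, positive spillovers ($\beta_1 > 0$) combined with positive but imperfect spatial correlation of diasporas ($0 < \delta < 1$) yield an a-spatial estimate that overstates the direct effect and understates the total effect, which is the pattern described above.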
We have also shown that the spatial dimension of diaspora externalities differs greatly across gender and across origin countries. Exploratory analysis suggests that cultural and socio-economic differences between origin and destination countries might underlie the different importance of local and neighbouring diasporas for immigrants' location choice. The specific nature of the demand for foreign workers might also partly explain the observed heterogeneity. Most notably for the Italian case, the weaker distance-decay effect of diaspora externalities for female immigrants (in particular, those from Eastern Europe and the Philippines) is associated with their specialization in household services, for which employment opportunities are ubiquitously available in the destination country.
One limit of our approach is the lack of a panel dimension due to data availability. 18 Because of this, the observed heterogeneity might be due not only to cross-country features such as cultural and economic distance, but also to the 'vintage' of migration flows in Italy. A longer history of immigration from a given origin country would give newcomers access to a progressively better and wider set of information on opportunities in the destination country and, hence, the role of diasporas in shaping their location choice might weaken over time.
As we focus on a single country (albeit an important destination country in the period considered), we are not able to identify the role of destination-specific characteristics (labour market conditions, immigration policy, domestic norms, etc.), which might also matter for diaspora externalities and their geographical dimension.
Additional and complementary insights might be gained from individual-level longitudinal data, which would allow one to address the research question of this paper by accounting for individual-level characteristics and by looking not only at immigrants' first location choices but also at their relocation over time. We leave these potential avenues of additional analysis for future research.
also standard IV approaches can be used to control for the endogeneity of our main explanatory variable and its spatially lagged values.
11 We use two standard weight matrices that are purely exogenous. These matrices are often used as alternatives and, given the absence of a clear-cut criterion for preferring one over the other, we use both as a robustness check. We avoided other weights matrices that might introduce elements of endogeneity into the estimation (e.g., a weights matrix based on commuting data).
12 We performed a robustness analysis using different thresholds on the minimum size of diasporas in 2007 (1000, 3000, 5000 and 10,000). The results are qualitatively identical and are available in Section F of the Appendix in the supplemental data online. The parameter a associated with $\ln(1 + M_{ij})$ is quite stable across the different subsamples, indicating that the exclusion of small source countries does not affect the results. Moreover, the value of the Akaike information criterion (AIC) is lower when we select the subsample using the threshold of at least 5000 foreign citizens in Italy from the same origin country in 2007.
13 By a-spatial model, we mean a model that only includes the localized diaspora.
14 The diagnostic tests confirm the endogeneity of the network variable and of its spatial lags and the relevance of the instrumental variables used (Table 4).
15 The different magnitude of the localized diaspora effects with respect to Jayet et al. (2010) may depend on several dimensions along which the two studies diverge: time period, spatial unit of analysis, model specification and estimation method. However, the main conclusion about the existence of a robust and sizable network effect is consistently corroborated by both studies.
16 For example, in the case of male immigrants, the CD test statistic for strong dependence (Pesaran, 2004) is 0.591 (p = 0.554), while the CD test statistic for weak dependence (Pesaran, 2015) is 1.840 (p = 0.065) (for the test of weak cross-sectional dependence, we have used the spatial contiguity matrix). In the case of female immigrants, the corresponding values of the statistics are 0.964 (p = 0.335) and 2.077 (p = 0.038).
17 The measure of cultural distance is that proposed by Del Gatto and Mastinu (2016, 2017); it is a weighted average of three distance measures: genetic, linguistic and religious. Economic distance is computed either as the ratio of the GDP per capita of the origin country to that of Italy, or as the ratio of the employment rate (number of employees/working-age population) of the origin country to that of Italy.
18 Appendix G in the supplemental data online reports the results based on subperiods in order to take this limit into account, at least partly. The analysis confirms the robustness of the results reported in the paper.
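Under the null hypothesis, both CD statistics in footnote 16 are asymptotically standard normal, so the reported p-values are two-sided normal tail probabilities. The sketch below recomputes them from the reported statistics using only Python's standard library; the helper name is hypothetical, and differences in the third decimal can arise because the statistics themselves are rounded.

```python
import math


def two_sided_normal_p(stat: float) -> float:
    """P(|Z| >= |stat|) for Z ~ N(0, 1), using 1 - Phi(z) = erfc(z / sqrt(2)) / 2."""
    return math.erfc(abs(stat) / math.sqrt(2.0))


# CD statistics and p-values as reported in footnote 16
# ("strong": Pesaran, 2004; "weak": Pesaran, 2015).
reported = {
    ("male", "strong"): (0.591, 0.554),
    ("male", "weak"): (1.840, 0.065),
    ("female", "strong"): (0.964, 0.335),
    ("female", "weak"): (2.077, 0.038),
}

for (group, test), (cd, p_reported) in reported.items():
    p = two_sided_normal_p(cd)
    print(f"{group:6s} {test:6s} CD={cd:.3f} p(recomputed)={p:.4f} p(reported)={p_reported:.3f}")
```

Each recomputed value agrees with the reported one up to rounding in the third decimal.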