Skill concentration and persistence in Brazil

ABSTRACT This paper links the past and present regional concentration of skills using the spatial distributions of occupations from the Brazilian censuses of 1872, 1920 and 2010. The data indicate that the concentration of top skills is highly persistent. Multivariate regressions show that regions with a high concentration of industrial and liberal occupations in the past have a high concentration of interpersonal, analytical and cognitive skills today. Moreover, it is observed that skill persistence seems to be positively related to market size. Controlling for natural advantages, the dependence on slave labour and immigration in the past does not undermine the relevance of the historical skill distribution.


INTRODUCTION
The scale of an industrial plant depends on the demand for its products. … Since it takes machines to produce machines, and these are themselves the product of many different factories and workshops, machinery is produced efficiently only in a place where factories and workshops are close enough together to help each other work in unison, i.e. in large towns. (von Thünen, 1826/1966)
The spatial concentration of economic activity in general, and especially in large cities where land is most expensive, makes sense if production becomes more efficient. In the above quotation, von Thünen suggests that agglomeration economies arise from industrial production in factories. Indeed, between the 19th and mid-20th centuries, manufacturing was the engine of growth and development. However, the modern economy depends much more on services, financial products and innovation. To understand the (changing) nature of agglomeration economies, the empirical literature has evolved from analyzing the concentration of people to the concentration of industries to the concentration of skills. The last point of view is informative because the task and skill content of jobs within the same industry may be entirely different, whereas workers in a given occupation (engineers, economists, managers, etc.) perform essentially similar activities even across sectors. Moreover, recent work by Michaels, Rauch, and Redding (2019), Ehrl and Monasterio (2017) and Bacolod, Blum, and Strange (2009a) has found that human interactions, analytical skills, and 'face-to-face' or 'soft skills' are essential aspects of agglomeration economies and consequently help explain why cities exist and thrive. As a logical consequence, abilities related to social intelligence and analytical thinking are also primarily concentrated in large cities (Andersson, Klaesson, & Larsson, 2014; Bacolod, Blum, & Strange, 2009b; Florida, Mellander, Stolarick, & Ross, 2012). 
1 In contrast, physical and manual skills tend to be concentrated in smaller cities and rural areas. This recent strand of the literature focuses on the population size of cities or regions, but beyond that little is known as to why some regions are more skilled than others. The contribution of this paper is to link the degree of concentration of skills in regions to their skill composition in the past. To that end, we exploit the spatial distribution of occupations from the Brazilian censuses of 1872, 1920 and 2010 and regress the concentration of top skills in 2010 on the concentration of the most skilled occupations in 1872 or 1920. The data indicate that the regional concentration of top skills is highly persistent. We also observe that persistence occurred primarily in regions with a large initial market size, as measured by gross domestic product (GDP) in 1920 or by population in 1872 or 1920. Understanding those relationships is of interest to economic geographers, historians, urban economists and anyone concerned with development economics. The relation between the current and past distribution of skills is largely unexplored thus far but may contribute to explaining why regional inequality persists over centuries.
Until recently, skills were usually distinguished on the basis of people's educational attainment as high, medium or low skilled. Following Autor, Levy, and Murnane (2003), Acemoglu and Autor (2011) and Firpo, Fortin, and Lemieux (2011), we distinguish between analytical, cognitive, interpersonal and manual skills. Our Brazilian data reveal that highly populated regions have a higher concentration of interpersonal, cognitive and analytical skills. The same applies to managers, scientists and skilled technicians, which are the occupations with the highest wages/skills. The corollary is that manual skills are found primarily in rural regions. Maciente (2013) reports similar results for Brazil but uses different skill definitions.
In both 1872 and 1920, two types of professions stand out for their significance to the current spatial distribution of skills: liberal and industrial professions. Industrial professions comprise factory owners and manufacturing workers, and liberal professions are lawyers, judges, professors and teachers, among others. The data clearly reveal a monotone relation between the local concentration of liberal and industrial professions and the concentration of analytical, cognitive and interpersonal skills today. The skills in these types of occupations are the most highly paid and thus can be characterized as top skills. Using the concentration of other professions, such as vendors, unskilled workers, etc., as a placebo test reveals no significant relation to the current concentration of skills. Moreover, we observe that skill persistence seems to be related to the existence of a large consumer market. Other factors that may have stimulated the long-run development of regions, namely natural advantages (such as proximity to the sea and railroads), the dependence on slave labour and immigration in the past, turn out to have a slight impact on the current distribution of skills when added as controls without, however, undermining the relevance of the historical skill distribution. While these findings are consistent with several existing theories about long-run development, our data do not allow us to pin down a specific transmission channel. The fifth section presents a more extensive discussion about the possible interpretations of our findings.
The paper most closely related to the present one is Michaels et al. (2019). They report that in 1880 mainly manual/physical skills were concentrated in US urban areas, while interpersonal skills are currently predominant in those densely populated areas. A general equilibrium model with multiple regions, sectors, occupations and tasks/skills motivates their explanation for the economy's structural transformation. In essence, Michaels et al. (2019) conclude that the nature of agglomeration economies has changed over time. Owing to falling transport and task trade costs, the benefits of concentrating physical production have diminished, while the concentration of interpersonal skills in densely populated regions has become more attractive. This argument is perfectly consistent with our findings. Yet, not all densely populated regions exhibit a high concentration of interpersonal skills. The present paper adds to the picture that the transition of regions to the current equilibrium did not occur arbitrarily but was largely favoured by the regions' sectoral supply and demand capacity in the past. 2
Our findings are also related to papers on the dynamic aspect of agglomeration externalities and long-run economic development where the time frame of the analysis is of primary importance for the conclusions. Glaeser, Ponzetto, and Tobio (2014, p. 17) recognize that a high manufacturing share in the 1950s 'predicts the decline of cities', whereas we show that the opposite holds true at the outset of the Industrial Revolution. Glaeser, Kallal, Scheinkman, and Shleifer (1992) examine city-industries over a period of 22 years to test whether industrial specialization, competition or diversity is more favourable for employment growth. That paper focuses directly on the development of single localized industries and employment growth, whereas we investigate long-term transitions from the Industrial Revolution up to the present day. Henderson, Kuncoro, and Turner (1995, p. 1068) find evidence for dynamic externalities that 'lead to a buildup of local trade secrets' and that the stages of the industries' product life cycle are related to their skill requirements. One of the few papers that also applies econometric methods to analyze long-run economic development in Brazil since 1872 is Reis (2014). He also acknowledges the importance of both geographical factors and institutions (such as slavery) in explaining the persistent regional income inequality in Brazil.

HISTORICAL BACKGROUND
This section describes what Brazil's economic and sociodemographic conditions were like in 1872, when its first census was conducted, in order to relate how education, transport infrastructure and geography, among others, may have affected the subsequent development and the skill persistence of regions.
In 1872, Brazil was an undemocratic, rural, sparsely populated, slave-owning society and above all very poor. Its per capita income was only 1.8 times the subsistence level, which is similar to Malawi and poorer than Rwanda or Somalia today. The majority of the almost 10 million inhabitants lived in municipalities close to the sea. The enormous size of the country and the Serra do Mar (a mountain range parallel to the Atlantic coast) have historically imposed high transportation costs on Brazil. Transport costs started to fall in the last decades of the 19th century with the expansion of the railway network, especially in the coffee-producing province of São Paulo. In 1870, there were only 678 km of railways; another 2504 km were completed over the following decade (Monasterio & Reis, 2008). By 1930 the extent of the railroad network had grown about 10-fold. Reis (2014) shows that the current extension of the railroads is greater but still runs along the main lines from the past. Even so, the railroad system has never been of great importance for the transport of goods in Brazil. Around the end of the 19th century the largest part of interstate trade was handled via coastal shipping (Marcondes, 2012). Nowadays, truck traffic is more important than maritime commerce. According to the 1872 census, only 1.5 million people were literate. That dismal social situation persisted for a long time. According to Chaudhary, Musacchio, Nafziger, and Yan (2012), only 12% of school-age children were enrolled in primary school in 1910, while at the same time 80% of children in the UK, Germany and the United States were studying. Consequently, the number of teachers (7.2 per 1000 inhabitants) was also extremely low in Brazil, compared with 58 in the United States, 33 in Argentina and 15 in Chile (Kang, 2010, p. 43). In view of those circumstances, the wealthy have traditionally relied on private schools and private teachers. 
With the advent of industrialization, the local elites supported the expansion of mass schooling, though only in regions where there was a demand for skilled workers (Chaudhary et al., 2012). It was not until the end of the 20th century that primary education was universalized. The situation in higher education was likewise precarious. Some law, medical and engineering schools were established in the course of the 19th century, but in 1872 there was still no university in Brazil. The big wave of immigrants, many of whom possessed better skills than the average Brazilian, gained pace from the last quarter of the 19th century onward. By 1920, the Brazilian population had reached 30 million, of which 1.6 million were foreigners or naturalized (Levy, 1974, p. 79). The fact that it had surpassed 190 million inhabitants by 2010 demonstrates that the major population boom occurred after 1920.
The first steps of Brazilian industrialization (mostly light industries with imported machinery) took place in the last decades of the 19th century. Manufacturing activities in 1872 were labour intensive and small scale; it was from the 1920s on that full-blown industrialization took place. The primary sector's share in GDP fell from 38% to 9% between 1920 and 1980. In the same period, the manufacturing industry grew from 12% to more than one-third of GDP (Reis, Cossio, Morandi, Medina, & Abreu, 2002, p. 248). Economic growth rates of the Brazilian economy have varied widely since 1872. It is estimated that per capita income was stagnant in the last two decades of the 19th century. Between 1900 and 1930, income grew at a rate of 1.5% per year and accelerated to an impressive 3.3% a year between 1930 and 1980. The following two decades, by contrast, were called lost decades (average growth of 0.2%) and, finally, moderate growth has returned in the last 15 years.
Regional inequalities in Brazil are well known. The classic paper by Williamson (1965) highlighted Brazil as an outlier: it had by far the highest measured regional disparity among 24 countries. Since then, Azzoni (2001) has identified slow β-convergence of per capita income at the state level. However, there remains a huge regional gap between the south/south-east and north/north-east. Even more recently, Monasterio (2010) and Reis (2014) have shown that places that were relatively developed in Brazil by 1872 remain so in the 21st century.

EMPIRICAL IMPLEMENTATION
The 1872, 1920 and 2010 censuses
The first Brazilian census undertaken with reliable methods and complete coverage of Brazilian territory was carried out in 1872 (Botelho, 2005). The 1920 census also provides reliable data. For the purpose of the present study, we extracted the citizens' place of residence and their professions. The occupation categories are not directly comparable between 1872 and 1920. 3 An exact match between all categories is not even required by our empirical strategy because only the two most skilled historical trades are of main interest here: liberal and industrial professions. In 1920, the former are divided into more categories than in 1872, while the manufacturing sector has undergone severe structural changes. To minimize consistency problems, all the liberal professions as well as the professions in the manufacturing sector are combined into two 'super-categories'. 4 In any case, we regard the fact that industrial production takes place as more important than the question of which type of manufactured product is actually produced by which occupation.
The most recent census dates from 2010. It also provides information about workers' occupations and their place of residence. We follow the convention of restricting the sample to individuals aged between 15 and 65 years who are neither civil servants nor members of the armed forces.
Finally, we complement those census data sets with maps from the Brazilian Institute of Geography and Statistics (IBGE) to calculate the shortest distance from the centroid of a region to the coastline. The IPEA provides data on the average distance of each municipality to the state capital, the GDP in 1920 and the number of train stations in 1920. The large share of the informal sector in Brazil complicates a clear distinction between the working and the employable population even today. Therefore, we prefer to use the reliable measures of population from the censuses as our proxies for the regions' size.

Generation of minimum comparable areas (AMCs)
The number of municipalities has increased from 624 in 1872 to 5570 today. For the analysis, however, we need a stable (comparable) spatial delineation of regions over time. We rely on the procedure developed in Ehrl (2017) to construct so-called minimum comparable areas (AMCs) for the period 1872-2010. The basic idea is to combine current municipalities so that the aggregates are consistent with the old borders of municipalities, taking all combinations and divisions of municipalities over the last 130 years into account. Consequently, we aggregate the number of inhabitants and all other region-specific variables from 1872, 1920 and 2010 in the AMCs displayed in Figure D1 in Appendix D in the supplemental data online. 5

Philipp Ehrl and Leonardo Monasterio
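The aggregation idea behind the AMCs can be illustrated with a minimal sketch. It is not the procedure of Ehrl (2017) itself: it merely assumes that every border change (split, merger or territory transfer) is recorded as a pair of municipality codes that must end up in the same area, and groups municipalities with a union-find structure. The function names, the input format and the codes are hypothetical.

```python
from collections import defaultdict


def build_amcs(municipalities, border_changes):
    """Group municipalities into comparable areas (illustrative sketch only).

    municipalities: iterable of municipality codes (hypothetical identifiers).
    border_changes: iterable of (a, b) pairs; each pair says that territory
    moved between a and b at some point, so both belong to the same area.
    """
    parent = {m: m for m in municipalities}

    def find(m):
        # Walk up to the root, shortening the path along the way.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for a, b in border_changes:
        # Codes that only appear in border_changes are registered here.
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(set)
    for m in list(parent):
        groups[find(m)].add(m)
    return sorted(sorted(g) for g in groups.values())


def aggregate(values, amcs):
    """Sum a municipality-level variable (e.g. inhabitants) within each area."""
    return [sum(values.get(m, 0) for m in amc) for amc in amcs]
```

Any variable measured at the municipality level in a given census year can then be summed within each group, which is what makes the 1872, 1920 and 2010 figures comparable on a common set of regions.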

Skills
A skill is a worker's endowment of capabilities for performing different activities (tasks) in everyday work life (Acemoglu & Autor, 2011). On the basis of the frequency and the intensity with which a specific task is performed and the skill that is required, a large number of occupations can be systematized in a straightforward way. Using a mapping of skills to occupations, workers' productive abilities may be compared along a small number of skill dimensions. A further advantage of skill measures is that the distinction of workers according to their actual activities is more meaningful for some issues of economic analysis than, for example, their formal education. Skills/tasks recently have found their way into labour economics and other related research fields through the work of Autor et al. (2003). They argue that technological progress in the form of information technology, computers and automation can replace routine and manual activities, whereas they complement analytical and cognitive skills. Consequently, one can also expect workers' skills to play an important role in the analysis of agglomeration economies. In Brazil, no workforce survey exists that assesses the skill/task requirements of workers in their occupations. Thus, the only way to make progress is to adopt data from another country. 6 One advantage thereof is that we can use established definitions that make this study comparable with existing work. Using the well-known definitions in Acemoglu and Autor (2011) and Firpo et al. (2011), each occupation is characterized by a unique value of analytical, cognitive, manual and face-to-face, that is, interpersonal skills. 7 The definition of those four skill measures is based on different elements from the O*NET data. See Appendix A in the supplemental data online for further details on the elements of the skill measures. 
That section also justifies why we focus mostly on face-to-face skills and it presents some numbers and graphs that relate the distribution of skills, occupations and wages in Brazil. Specifically, Figure A1 reveals that occupations that use analytical, cognitive and face-to-face skills intensively are at the upper end of the wage distribution, that is, they are most productive and/or the ones with the highest demand. The observed patterns resemble those in the United States or Germany.
To proceed with our analysis, we require a measure for the spatial concentration of skills. Studies on industrial concentration or diversity rely frequently on the Hirschman-Herfindahl index, but unlike the sectoral affiliation, a job's skill content is not a binary variable. Every worker possesses analytical, interpersonal, cognitive and manual skills, but to a different extent. Our dependent variable is a region-specific skill index that is calculated in two steps. First, to make the skill scores comparable with one another, we standardize the occupation-specific values to a mean of 10 and a standard deviation of 1. Second, we define the concentration indices (one for each skill category) as the average of the standardized individual skill scores in the AMC. 8 The value of the index is thus independent of the regions' population size. Moreover, the index is easy to interpret, as higher values indicate a higher concentration of skills. Table D1 in Appendix D in the supplemental data online summarizes the name, source and year of reference for the main variables in the following estimations.
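The two steps of the index can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the standardization is taken over occupations with equal weights (the text does not say whether worker weights are used), and the occupation labels, region labels and function names are hypothetical.

```python
from statistics import mean, pstdev


def standardize(scores, target_mean=10.0, target_sd=1.0):
    """Rescale occupation-level skill scores to a given mean and SD.

    Assumption: every occupation counts once (unweighted standardization).
    """
    values = list(scores.values())
    mu = mean(values)
    sd = pstdev(values)
    return {occ: target_mean + target_sd * (s - mu) / sd
            for occ, s in scores.items()}


def concentration_index(workers, std_scores):
    """Average standardized skill score of the workers in each region.

    workers: iterable of (region, occupation) pairs, one pair per worker.
    """
    totals, counts = {}, {}
    for region, occ in workers:
        totals[region] = totals.get(region, 0.0) + std_scores[occ]
        counts[region] = counts.get(region, 0) + 1
    return {region: totals[region] / counts[region] for region in totals}
```

Because the index is an average over workers, doubling the number of workers in every occupation of a region leaves its value unchanged, which is the sense in which it is independent of population size.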

Empirical strategy
Having introduced the main variables, the following two basic specifications sketch the empirical analysis. In the main body of the present paper, the dependent variable will be the concentration of interpersonal skills in AMC k in t = 2010, as defined in equation (A.1) in Appendix A in the supplemental data online. In further extensions in Appendix A, we also present results for the concentration of cognitive and analytical skills, as well as for the concentration of high-skilled occupations (managers, scientists and skilled technicians) all of which are highly similar to what we obtain for the interpersonal skills.
The main variables of interest are $occ_{o,k,t_0}$, that is, the regional concentration of occupation o in AMC k in either $t_0$ = 1872 or 1920, defined as the log of the number of professionals in occupation o per 1000 inhabitants. We expect that the top skills in the past, that is, industrial and liberal professions, are of primary importance for the concentration of skills today. In a robustness check, other occupations such as agricultural workers, sales persons, etc. are used alternatively. X contains control variables that may have determined the economic development of regions or that may still affect the attractiveness and productivity in AMC k: the distance to the sea and to the state capital, the presence of a railroad network in 1920 as well as the concentration of slaves and foreigners in each AMC in 1872. Probably the most obvious and most relevant persistent productivity advantage for Brazilian cities is the proximity to the sea. Shipping was the dominant transportation mode in Brazil at the turn of the 20th century and today it continues to be favourable for exporters and specific industries (Marcondes, 2012). This location advantage is also evident because most of the major cities are close to the coastline. According to our data, 41% of GDP and 34% of the population are located in AMCs that are within a 50 km band from the coast. By similar reasoning, we include a dummy for the existence of a railroad network in each AMC in 1920. Being the state capital also clearly favours prosperous development because the state government offers attractive jobs and in general a higher level of public goods. By the same reasoning, regions in close proximity to the state capital may also have been more attractive, principally for skilled workers. 
These natural and transport cost advantages are used by several other studies on development in Brazil, for example, Caselli and Michaels (2013) or Da Mata, Deichmann, Henderson, Lall, and Wang (2007).

The 1872 census also registered how many foreigners and slaves were living in each region. In contrast to the presence of skilled occupations, the dependence on slave labour may indicate a backwardness of the local economy and we expect a negative relation to the concentration of interpersonal skills. A higher concentration of slaves may also indicate that the industrial structure of the region was dominated by agriculture (Reis, 2014). Immigrants were initially attracted by state-sponsored programmes to substitute the supply of slave labour. Rocha, Ferraz, and Soares (2017) and de Carvalho Filho and Monasterio (2012) document that the majority of non-Iberian immigrants at that time came from Europe and possessed above-average education. Regions with a larger share of immigrants show persistently higher education and economic development was also more favourable in retrospect. Finally, the initial market size could be another potential factor that facilitated the subsequent development. Therefore, we include either the region's GDP or population in 1920 as an additional control variable MS.
The second main specification includes an interaction between the measure of market size ($MS_{k,t_0}$) and the regional skill concentration in the past.
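The two specifications described in this section can be written down as the following sketch. Only $occ_{o,k,t_0}$, X and MS are named in the text; the dependent-variable label $skill_{k,t}$, the coefficient symbols $\alpha$, $\beta_o$, $\theta_o$, $\gamma$, $\delta$ and the error term $\varepsilon_k$ are assumed notation, and the exact form used by the authors may differ.

```latex
\begin{align}
skill_{k,t} &= \alpha + \sum_{o} \beta_o \, occ_{o,k,t_0}
             + X_k' \gamma + \delta \, MS_{k,t_0} + \varepsilon_k \\
skill_{k,t} &= \alpha + \sum_{o} \beta_o \, occ_{o,k,t_0}
             + \sum_{o} \theta_o \left( occ_{o,k,t_0} \times MS_{k,t_0} \right)
             + X_k' \gamma + \delta \, MS_{k,t_0} + \varepsilon_k
\end{align}
```

Here $t = 2010$, $t_0 \in \{1872, 1920\}$, and the sum runs over the historical occupation groups included in a given column (in the main results, industrial and liberal professions). Under this reading, a positive $\theta_o$ would mean that the link between past occupations and current skills is stronger where the initial market was larger.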

Current concentration of skills
The present section provides an overview of the spatial distribution of skills in Brazil and of whether Brazil exhibits patterns comparable with those of other countries.
The vertical axes of the graphs in Figure 1 denote the concentration of the skill measures. The index of face-to-face skills, for example, varies between 9.4 and 10.1. It is already evident that there are considerable differences in the vocational orientation of regions. Each scatter plot additionally reports the outcome of a simple ordinary least squares (OLS) regression of the skill concentration on the log population per AMC. All four graphs show a clear pattern. The larger the size of the region in terms of population, the more analytical, face-to-face and cognitive skills are required in the local economy. Figure 1(d) reveals that physical skills are, unsurprisingly, used most in rural regions. All four correlations are highly significant.
It is a general result of our study that analytical, cognitive and interpersonal skills show the same patterns not only in terms of spatial concentration but also in terms of the other dimensions. Figure 1(d) makes clear that manual skills are largely a mirror image of those other skills. A high intensity of manual skills may equally well be interpreted as the absence of analytical and interpersonal skills. Thus, for the rest of the study, we confine ourselves to the face-to-face skill measure to avoid repetitive descriptions and results.
To get an idea of the dimension of the skill differences, consider the following stylized examples. São Luís do Quitunde in the federal state of Alagoas in the north of Brazil has an average analytical skill of 9.2, which is equivalent to a worker in a paper factory. In contrast, Florianópolis, the capital of the state of Santa Catarina, in the south has one of the highest analytical skill concentrations: an average occupation has the equivalent of a mechatronics technician or human resource analyst (9.6). In Areias (São Paulo), one of the smallest AMCs with only 3696 inhabitants, about 40% of the population makes a living from agriculture and another 33% works in the construction or retail trade. The face-to-face skill index in Areias is 9.47, which roughly corresponds to a worker in the construction sector. On the other side of the face-to-face skill distribution cities such as Recife, Belo Horizonte and Rio de Janeiro have a value of about 10. Artists, like musicians or art directors, have just such an interpersonal skill score.

Concentration of skills in the past
It is well known that settlement patterns are persistent over time. The majority of capital cities around the world have existed for centuries. Ciccone and Hall (1996), and many other authors after them, exploit the persistence of population density to instrument current levels with values from the 19th century. Note that the exact location of a major city is not always based on sound, rational causes but also depends on historic events, first-mover advantage and hysteresis (Fujita, Krugman, & Venables, 2001). Figure 2 illustrates the relations between the current concentration of interpersonal skills and the concentration of liberal and industrial professions in both 1872 and 1920, as measured by the log of the number of people with those professions per 1000 inhabitants. The upper left graph of Figure 2 shows that the proportion of industrial professions in 1872 can account for 21% of the spatial dispersion of interpersonal skills in 2010. The coefficient of this linear regression is highly significant and indicates that doubling the number of industrialists in a region is associated with a 0.06 (or 0.5 standard deviation) higher face-to-face skill index. The relation is weaker regarding the share of the liberal professions in 1872. This picture changes markedly 48 years later: for the liberal professions in 1920, the R² of the linear regression equals 42% and is thus higher than that of the industrial professions. Regarding analytical and cognitive skills a very similar picture emerges (cf. Figures D2 and D3 in Appendix D in the supplemental data online).
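The 'doubling' interpretation follows from the log specification. As a sketch, assume the bivariate regression has the form $y_k = a + b \ln x_k$, where $y_k$ is the face-to-face index and $x_k$ the number of industrial professionals per 1000 inhabitants; the symbols $a$ and $b$ and the use of the natural logarithm are assumptions, and the slope is not reported in the text but backed out here from the stated effect.

```latex
\begin{align}
\Delta y &= b \ln(2x) - b \ln(x) = b \ln 2 \approx 0.693\, b \\
\Delta y = 0.06 \;&\Rightarrow\; b \approx \frac{0.06}{0.693} \approx 0.087 \\
0.06 = 0.5\,\sigma_y \;&\Rightarrow\; \sigma_y \approx 0.12
\end{align}
```

Under these assumptions the slope would be about 0.087, and the cross-regional standard deviation of the index about 0.12, which is consistent with the index range of 9.4 to 10.1 reported for Figure 1.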
One possible interpretation of these observations is that the occupational structure of a region 140 or 90 years ago has been shaping the further economic development of that region until the present day. In particular, regions with a high concentration of industrial and liberal professions in 1872/1920 tend to have a high concentration of high-skilled jobs, which largely require interpersonal (and analytical) skills. It would certainly be quite ambitious to pin down a single channel through which history has shaped the present distribution of occupations and skills. Yet Figure 2 indicates that the distribution of historical trades seems to be closely related to the long-run industrial development of regions. The data also show that the more recent the historical census, the stronger the relation to the previous occupational composition. The fifth section provides a discussion of possible explanations for the observed skill persistence.
Table 1 contains some multivariate regressions to enable an understanding of the relative importance of the different skill concentrations in the past. In line with the simple regression lines displayed in Figure 2, the concentration of liberal professions in 1872 has the lowest correlation with the concentration of interpersonal skills, whereas for 1920 the concentration of liberal professions shows a much stronger relation to current skills. Nevertheless, columns (2) and (3) indicate that the local concentrations of industrial and liberal professions in 1920 are equally significant for the spatial localization of skills today. In line with the previous statement, the relevance of both liberal and industrial professions supports the notion that there are multiple mechanisms that connect local skills to development.
Column (4) in Table 1 repeats the previous estimation with all four historical skill concentrations but includes controls for natural advantages and for other possible determinants of a favourable long-run development. The logs of the numbers of foreigners and slaves per 1000 inhabitants in 1872 are added in column (5), and column (6) finally also controls for the regions' initial endowment of wealth. If settlement decisions by industrialists and people with liberal professions as well as the long-run economic development up to the skill distribution in today's industries were substantially affected by those first-nature advantages, then the coefficients and in particular the explanatory power of our skill concentration variables should differ from the previous estimates. The estimates in Table 1 clearly indicate that this is not the case. In fact, only the distance to the state capital has a significant effect. We therefore conclude that although the distance from the coastline and the railroad transport system may represent a productivity advantage even today, their existence has not affected the distribution of productive skills in a crucial manner.

Combining historical trades and population size
Considering the possibility of heterogeneity between regions, we divide the AMCs according to two different variables that reflect the size of the market in the past: population size in 1872 and 1920 as well as GDP, for which only data from 1920 are available. According to each of these three criteria variables, two groups of regions are defined. AMCs in the 'large' group have values of the criteria variable above the median in the distribution of AMCs. Consequently, the other half of the AMCs in the 'small' group has a below-median market size. 9
Table 2 presents the estimations after the division of the sample according to the agglomeration groups defined above. Again, we report only estimates for the face-to-face skill concentration but the current concentrations of analytical and cognitive skills follow the same pattern. Each estimate contains the previous control variables, that is, the local share of slaves and foreigners, as well as the first-nature advantage proxies.
Three patterns stand out. First, no matter which of the three agglomeration group definitions is used, only the skill concentration in the large group shows a coefficient that is significant at the 1% level. Second, the explained variation of the current distribution of skills in the group of AMCs with a large market is about twice as high as in the regions with a small market size. Third, while the coefficients in the group of large regions are comparable with the previous estimations in Table 1, the coefficients of historical skills in the group of small regions are either insignificant or seem to counterbalance each other. Table 2 thus points to another interesting observation that was not obvious from the previous estimations: the observed skill persistence seems to occur primarily in regions with a large market size in the past.
Appendix B in the supplemental data online contains further estimations that confirm the robustness of our results. First, we control for the size of the region in the past in order to distinguish the effect of the historical skill concentration from generic size and agglomeration effects. Second, we perform a placebo test with the concentration of other professions in the past. Third, we run spatial autoregressive and spatial error models to test the dependence of local development on the neighbouring regions, which apparently is not of major importance.

DISCUSSION
The results of the present paper indicate that the spatial distribution of top skills in Brazil is persistent over a period of nine or even 12 decades. While our data do not allow us to offer causal evidence on any of the possible transmission mechanisms, the current section at least provides two plausible explanations for our findings in the light of previous empirical and theoretical contributions. Given the long time span of the analyzed period, skill persistence cannot be traced back to single workers observed at different moments in time. The transmission of knowledge and skills can, however, be either inter-generational or through the local culture, institutions or the like. In a setting similar to the present one, Fritsch and Wyrwich (2014) argue that the persistence of new business formation is most likely due to a local entrepreneurship culture. Positive externalities from a concentration of workers with the 'same skilled trade' are one explanation for the transmission of knowledge within industries and over time (Marshall, 1890).
Another possible explanation for the persistence of skills is the positive long-run development in regions with a concentration of skilled trades and a large market size. The second section described how both education and manufacturing production in factories were scarce and hence quite valuable at the turn of the 19th century. The composition of the local economy was much more important than nowadays because overland transport costs were high. In a similar vein to von Thünen's quotation, Arrow (1962) stresses that physical production leads to dynamic benefits, that is, 'learning by doing'. In the presence of scale economies in the manufacturing sector and significant transport costs, higher local demand decreases unit production costs and the resulting lower prices attract even more individuals from other regions. Since those individuals are consumers and workers at the same time, this perpetual, self-reinforcing process provides firms with an ever greater demand and the necessary labour input to supply more products.
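The cost side of this mechanism can be made explicit with a stylized example. The linear cost function, the fixed cost F, the marginal cost c and the markup μ below are illustrative assumptions; they are not taken from the data or from the models cited in this section.

```latex
% Illustrative assumption: a plant with fixed cost F > 0 and constant marginal cost c > 0.
C(q) = F + c\,q, \qquad
AC(q) = \frac{C(q)}{q} = \frac{F}{q} + c, \qquad
\frac{\mathrm{d}AC}{\mathrm{d}q} = -\frac{F}{q^{2}} < 0 .
% Illustrative assumption: a constant markup \mu \ge 1 over unit cost.
p(q) = \mu\,AC(q) \quad\Rightarrow\quad
\frac{\mathrm{d}p}{\mathrm{d}q} = -\mu\,\frac{F}{q^{2}} < 0 .
```

Under these assumptions, doubling local demand from q to 2q lowers the unit cost by F/(2q), so regions with a larger market face lower prices, which is the first link of the self-reinforcing process described above.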
Besides the manual workforce, the subsequent development of manufacturing industries certainly also depended on the development of new ideas and production techniques, that is, on education in general, provided by professors and teachers. Lawyers and judges, for example, who are also part of the liberal professions, may have served to stabilize the fragile institutions, maintain law and order and guarantee property rights, all essential conditions for investment and economic growth (Acemoglu, Johnson, & Robinson, 2001). Given that constellation, it seems plausible that favourable economic development was more likely in regions where the seed for knowledge and industrial production was already sown, whereas regions shaped by agriculture and slavery were likely to remain locked-in and without great impulses towards the continuous modernization that finally led to a concentration of high-skilled occupations. This interpretation is in line with the New Economic Geography, whose theory stresses that the circular linkages between economies of scale (supply) and market potential create a virtuous cycle for the local economy and lead to a continuous development of agglomerations (Fujita et al., 2001). Empirical assessments of these theories over the long run are mainly concerned with testing the effects of market potential, and they are focused on developed countries. Noteworthy studies are Crafts (2005) and Crafts and Mulatu (2005).

CONCLUSIONS
Research on the advantages of, and reasons for, the existence of agglomerations has a long tradition. The present paper adds to two fairly recent aspects of this literature.
Opening up the specific content of occupations, several papers document that a concentration of interpersonal or 'soft' skills is an essential aspect of agglomeration economies. Michaels et al. (2019) argue that this circumstance is the result of modern production technology and ever lower transport costs. Before these developments, the main productivity gains were achieved by the concentration of industrial (large-scale) production. We are the first to document how the specialization of regions in certain occupations changed over the course of a century. Specifically, we find that regions with a high concentration of industrial occupations and/or liberal professions (professors, lawyers, etc.) over 90 years ago nowadays host a larger proportion of highly qualified workers who primarily require interpersonal, cognitive and analytical skills in their jobs.
Prior studies on long-run economic development have stressed the importance of population size, market potential or the role of institutions. The present results add to that by demonstrating that the relation between industrial/liberal professions in the past and the concentration of skills at present is particularly strong in regions that already had a large population/market in the past. Certainly, the distribution of knowledge and production does not guarantee a favourable development because, among many different factors, access to financial markets, politics, chance or the individual decision of businessmen also affect the local economy for better or worse. The interactions between the industrial structure and those other possible influences still offer many interesting questions for future research.
From a policy point of view, persistence works against the effectiveness of interventions in lagging regions. Moreover, our results suggest the existence of a critical mass and threshold effects for development. Brazilian history illustrates this challenge. For decades regional policies have been largely aimed at providing capital subsidies and infrastructure to poor areas but their results have been modest: huge regional inequalities are still a major problem. If we believe that the underlying forces of the observed process are still at work, we should expect further concentration and a magnification of regional inequalities in the future. The good news is that our results suggest that policies that change the local composition of skills may have powerful effects on regional trajectories. Once a significant change in the occupational structure is achieved, long-lasting payoffs can be expected. The fact that transport costs are much lower today implies that it has become easier to form an export base and enter a positive growth path. For example, policy makers may try to attract firms that bring cognitive, analytical and interactive skills to the regions. Additionally, the creation of universities or training centres in lagging regions can improve the supply of such skills.

Table 2 note: … (1)-(4), respectively, and according to the gross domestic product (GDP) in 1920 in columns (5) and (6). Regressions also control for the share of slaves and foreigners in each AMC, the GDP in 1920, as well as for the existence of a railroad network, the distance to the sea and the state capital. Robust standard errors are shown in parentheses. Regressions are weighted by population size in 2010. *Significance at 10%, **5% and ***1% levels.

ACKNOWLEDGEMENTS
… and REAP meeting, UCLA, NARSC, CIDE, IPEA in Brasília and the UCB for the discussions and valuable comments. Philipp Ehrl gratefully acknowledges financial support from CAPES.

DISCLOSURE STATEMENT
No potential conflict of interest was reported by the authors.

NOTES
1. Even though the definition of skills is somewhat different in each paper, the overall implication of this evidence is straightforward. First, a concentration of interactive and analytical skills can be found in the largest, that is, most attractive and productive, regions. Second, a concentration of those skills seems to make workers more productive.
2. Further differences between the present paper and Michaels et al. (2019) exist regarding the definitions of regions and skills. Michaels et al. fix the task/skill definitions to describe economic activity in urban areas in both 1880 and today, whereas we define stable regions and follow the evolution of skills over time.
3. In 1920, occupations were classified into 48 different categories altogether, whereas in 1872, 36 categories were distinguished.
4. The census in 1872 considered the following occupations as liberal: religious professions, juridical professions, doctors, surgeons, pharmacists, midwives, artists, professors, teachers and 'men of letters'. The 1920 census registered basically the same groups of professions but only distinguished between religious, juridical, medical, magisterial and scientific professions.
5. Some of the 479 resulting AMCs have a vast area and obvious interpretation problems arise in these cases. Nevertheless, we refrain from arbitrarily excluding any regions and note that even by focusing only on regions that have the size of integrated labour markets (either 40,000, 10,000 or 2500 km²) we obtained very similar results in those four cases.
6. The assignment of an occupation characteristics survey to another country has been made previously. Andersson et al. (2014), for example, transfer a German task classification into Swedish data, and Ehrl (2018) relies on the same mapping between Brazilian jobs and US skills in a study about offshoring.
7. Bacolod et al. (2009a) rely on a less recent survey (the DOT) for their skill definitions but use a similar category that they call 'people skills'.
8. See section A for the explicit formulas that aggregate and transform the O*NET skill elements to the final regional skill concentration indices.
9. Dividing the sample according to the population-weighted mean, for example, results in a low number of AMCs in the 'large' group. Given that the total number of AMCs is already quite low, the division according to the median into a roughly equal number of observations in both groups allows for a more reasonable identification of the effects.