How workforce migrates in China: A perspective from social media
Dataset posted on 03.02.2018 by Jichang Zhao
Weibo Data Description:
Weibo tweets posted during the Spring Festival travel rush, from January 13 to February 21, 2017, were thoroughly collected to build the workforce migration network at the granularity of city.
Number of cities: 371
Number of city pairs (directed): 120,361
Number of city pairs (undirected): 61,759
Total flux (number of movements): 41,454,268
Each line in the file (WorkforceMigrate.csv) is a 12-tuple (city1,city2,flux,gdp1,gdp2,ave_gdp1,ave_gdp2,population1,population2,geographical_distance,travel_time,travel_distance) describing workforce movement from city1 to city2. The fields are as follows:
1. city1: origin city id
2. city2: destination city id
3. flux: number of movements from city1 to city2
4. gdp1: GDP of city1
5. gdp2: GDP of city2
6. ave_gdp1: the per capita GDP of city1
7. ave_gdp2: the per capita GDP of city2
8. population1: the number of permanent residents in city1
9. population2: the number of permanent residents in city2
10. geographical_distance: geographical distance between city1 and city2
11. travel_time: travel time from city1 to city2 provided by Baidu Map API
12. travel_distance: travel distance from city1 to city2 provided by Baidu Map API
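The field list above can be read with a short script. This is a minimal sketch, not the author's code. It assumes the file is comma-separated, that the column order follows the 12-tuple given above, and that there is no header row (set the hypothetical `has_header` flag otherwise). The names `read_migration` and `summarize` are illustrative, and the sample rows are made-up values, not taken from the dataset.

```python
import csv
import io

# Column order copied from the 12-tuple in the description (assumption:
# the file stores the fields in exactly this order).
COLUMNS = [
    "city1", "city2", "flux", "gdp1", "gdp2", "ave_gdp1", "ave_gdp2",
    "population1", "population2", "geographical_distance",
    "travel_time", "travel_distance",
]


def read_migration(stream, has_header=False):
    """Yield one dict per row, with `flux` converted to an integer."""
    reader = csv.reader(stream)
    if has_header:
        next(reader, None)  # skip the header row if the file has one
    for row in reader:
        if not row:
            continue  # ignore blank lines
        record = dict(zip(COLUMNS, (field.strip() for field in row)))
        # int(float(...)) tolerates values written as "10" or "10.0"
        record["flux"] = int(float(record["flux"]))
        yield record


def summarize(records):
    """Count cities, directed pairs, undirected pairs, and total flux."""
    cities, directed, undirected = set(), set(), set()
    total_flux = 0
    for r in records:
        a, b = r["city1"], r["city2"]
        cities.update((a, b))
        directed.add((a, b))
        undirected.add(frozenset((a, b)))  # (a, b) and (b, a) collapse
        total_flux += r["flux"]
    return {
        "cities": len(cities),
        "directed_pairs": len(directed),
        "undirected_pairs": len(undirected),
        "total_flux": total_flux,
    }


# Made-up rows for illustration only.
SAMPLE = (
    "1,2,10,500,800,5,8,100,100,120.5,2.0,150.0\n"
    "2,1,5,800,500,8,5,100,100,120.5,2.0,150.0\n"
    "1,3,7,500,300,5,3,100,100,300.0,4.5,380.0\n"
)
stats = summarize(read_migration(io.StringIO(SAMPLE)))
```

Run on the real file (for example via `open("WorkforceMigrate.csv", newline="")`), the same summary should be comparable with the counts listed at the top of this section, provided the assumptions about the file layout hold.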
The demographic and economic information for 2015 was collected at the granularity of province.
Province Data Format:
Each line in the file (ProvinceInfo.csv) is an 8-tuple (province, gdp15, Information Technology Industry,Financial Industry,Real Estate Industry,Scientific Research and Technical Services Industry,income15,R&D) describing the economic information of a province. The fields are as follows:
1. province: province id
2. gdp15: GDP of province
3. Information Technology Industry: ratio of practitioners in the information technology industry
4. Financial Industry: ratio of practitioners in the financial industry
5. Real Estate Industry: ratio of practitioners in the real estate industry
6. Scientific Research and Technical Services Industry: ratio of practitioners in the scientific research and technical services industry
7. income15: per capita disposable income
8. R&D: funds invested in research and development
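Because several province column names contain spaces, it can help to load the file into a dictionary keyed by province id. The sketch below is illustrative only. It assumes a comma-separated file without a header row, with columns in the order of the 8-tuple above and all non-id fields numeric. `read_provinces` is a hypothetical helper name, and the sample values are invented.

```python
import csv
import io

# Column names copied from the 8-tuple in the description.
PROVINCE_COLUMNS = [
    "province",
    "gdp15",
    "Information Technology Industry",
    "Financial Industry",
    "Real Estate Industry",
    "Scientific Research and Technical Services Industry",
    "income15",
    "R&D",
]


def read_provinces(stream, has_header=False):
    """Return {province_id: {column_name: float_value}}."""
    reader = csv.DictReader(stream, fieldnames=PROVINCE_COLUMNS)
    if has_header:
        next(reader, None)  # with explicit fieldnames, row 1 is data unless skipped
    table = {}
    for row in reader:
        province_id = row.pop("province").strip()
        table[province_id] = {name: float(value) for name, value in row.items()}
    return table


# Made-up rows for illustration only.
PROVINCE_SAMPLE = (
    "1,100.0,0.02,0.03,0.01,0.02,30000,50.0\n"
    "2,200.0,0.05,0.04,0.02,0.03,45000,120.0\n"
)
provinces = read_provinces(io.StringIO(PROVINCE_SAMPLE))

# Example query: the province with the largest IT-industry ratio.
top_it = max(provinces, key=lambda p: provinces[p]["Information Technology Industry"])
```

Keeping province ids as strings avoids guessing how ids are encoded in the file.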
Train Data Description:
The national railway line data, covering 5,878 trains in total from the train schedule, were collected to build the train network at the granularity of city.
Number of cities: 284
Number of city pairs (undirected): 12,381
Each line in the file (Train.csv) is a triple (city1, city2, train_count) describing the trains that pass through both city1 and city2. The fields are as follows:
1. city1: city id
2. city2: city id
3. train_count: the number of trains that pass through city1 and city2
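Since the train pairs are undirected, one way to use the triples is a symmetric adjacency map, so that a lookup works in either city order. This is a rough sketch under assumptions: a comma-separated file with no header row and each undirected pair listed once. `read_train_network` is a hypothetical name, and the sample rows are invented.

```python
import csv
import io
from collections import defaultdict


def read_train_network(stream, has_header=False):
    """Return {city: {neighbor: train_count}} with both directions filled in."""
    network = defaultdict(dict)
    reader = csv.reader(stream)
    if has_header:
        next(reader, None)  # skip the header row if the file has one
    for row in reader:
        if not row:
            continue  # ignore blank lines
        city1, city2 = row[0].strip(), row[1].strip()
        train_count = int(row[2])
        # Store the edge under both endpoints; if a pair appears twice in
        # the file, the later row overwrites the earlier one.
        network[city1][city2] = train_count
        network[city2][city1] = train_count
    return network


# Made-up rows for illustration only.
TRAIN_SAMPLE = "1,2,30\n1,3,12\n2,3,5\n"
network = read_train_network(io.StringIO(TRAIN_SAMPLE))

# Each undirected edge is stored twice, so halve the total neighbor count.
edge_count = sum(len(neighbors) for neighbors in network.values()) // 2
```

On the real file, `len(network)` and `edge_count` should be comparable with the city and city-pair counts stated above, as long as the layout assumptions hold.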
If you have any issues, please feel free to contact email@example.com.