{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Joining Dimensions data to Wikidata\n",
"\n",
"GRID identifiers are available in Wikidata. By querying Wikidata with SPARQL alongside the Dimensions API, we can join information in Dimensions to other attributes that Wikidata holds about institutions. In this example, Wikidata is used to help us understand why some universities have such high numbers of papers with no external authors."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"from dslquery import dslquery"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In example 8, we used Dimensions to find institutions with very high numbers of internal publications (publications with no external authors). In this example, we use Wikidata to see whether these numbers correlate with high numbers of students."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1) Query Wikidata, and put the results in a dataframe"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"                          inception  students\n",
"grid                                         \n",
"grid.497287.7  1999-01-01T00:00:00Z       100\n",
"grid.461653.3  1979-01-01T00:00:00Z       110\n",
"grid.448855.0  2010-01-01T00:00:00Z       109\n",
"grid.466243.1  1925-01-01T00:00:00Z       104\n",
"grid.465925.9  1805-01-01T00:00:00Z        43"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# pip (or pip3) install sparqlwrapper\n",
"# https://rdflib.github.io/sparqlwrapper/\n",
"\n",
"from SPARQLWrapper import SPARQLWrapper, JSON\n",
"\n",
"sparql = SPARQLWrapper(\"https://query.wikidata.org/sparql\")\n",
"\n",
"# P2427 = GRID ID, P2196 = students count, P571 = inception\n",
"sparql.setQuery(\"\"\"SELECT ?grid ?inception ?students\n",
"WHERE {\n",
"  ?inst wdt:P2427 ?grid;\n",
"        wdt:P2196 ?students;\n",
"        wdt:P571 ?inception .\n",
"}\"\"\")\n",
"sparql.setReturnFormat(JSON)\n",
"results = sparql.query().convert()\n",
"\n",
"cols = results['head']['vars']\n",
"\n",
"# Flatten each SPARQL binding into a row of plain values\n",
"out = []\n",
"for row in results['results']['bindings']:\n",
"    item = []\n",
"    for c in cols:\n",
"        item.append(row.get(c, {}).get('value'))\n",
"    out.append(item)\n",
"\n",
"wddf = pd.DataFrame(out, columns=cols).set_index('grid')\n",
"\n",
"wddf.head()"
]
},
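{
"cell_type": "markdown",
"metadata": {},
"source": [
"Every value returned by the query above is a string, including `students` and `inception`. The sketch below is one possible way to wrap the flattening loop in a function and convert those columns to numbers. It is a minimal sketch, not part of the original notebook: the function name `bindings_to_frame`, the `inception_year` column and the `sample` dictionary are assumptions made for illustration, with values copied from outputs in this notebook.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical sample shaped like the SPARQL JSON results format.\n",
"sample = {\n",
"    'head': {'vars': ['grid', 'inception', 'students']},\n",
"    'results': {'bindings': [\n",
"        {'grid': {'type': 'literal', 'value': 'grid.497287.7'},\n",
"         'inception': {'type': 'literal', 'value': '1999-01-01T00:00:00Z'},\n",
"         'students': {'type': 'literal', 'value': '100'}},\n",
"        {'grid': {'type': 'literal', 'value': 'grid.4991.5'},\n",
"         'inception': {'type': 'literal', 'value': '1096-01-01T00:00:00Z'},\n",
"         'students': {'type': 'literal', 'value': '19791'}},\n",
"    ]},\n",
"}\n",
"\n",
"\n",
"def bindings_to_frame(results):\n",
"    # Flatten SPARQL JSON bindings into a DataFrame indexed by GRID ID.\n",
"    cols = results['head']['vars']\n",
"    rows = [[b.get(c, {}).get('value') for c in cols]\n",
"            for b in results['results']['bindings']]\n",
"    df = pd.DataFrame(rows, columns=cols).set_index('grid')\n",
"    # Student counts arrive as strings; unparsable values become NaN.\n",
"    df['students'] = pd.to_numeric(df['students'], errors='coerce')\n",
"    # Keep only the year, taken from the first four characters of the date.\n",
"    df['inception_year'] = pd.to_numeric(df['inception'].str[:4], errors='coerce')\n",
"    return df\n",
"\n",
"\n",
"typed = bindings_to_frame(sample)\n",
"print(typed.dtypes)\n",
"```\n",
"\n",
"Taking the year from the first four characters, rather than calling `pd.to_datetime`, is a deliberate choice here: it assumes four-digit years and sidesteps inception dates such as 1096, which can fall outside the range of pandas' default nanosecond timestamps."
]
},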
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2) Get internal collaboration information on institutions from Dimensions"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Execution time: 1.1190330982208252\n"
]
}
],
"source": [
"# Publications after 2012 that list exactly one research organisation,\n",
"# aggregated by organisation (limited to 200 organisations)\n",
"dsldf = pd.DataFrame(\n",
"    dslquery(\"\"\"\n",
"        search publications\n",
"        where year > \"2012\"\n",
"        and count(research_orgs) = 1\n",
"        return research_orgs limit 200\n",
"    \"\"\")['research_orgs']\n",
").set_index('id')\n",
"dsldf.index.name = 'grid'"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"               acronym  count  country_name                           name\n",
"grid                                                                      \n",
"grid.11899.38      USP  29893        Brazil        University of Sao Paulo\n",
"grid.12527.33      THU  25896         China            Tsinghua University\n",
"grid.13402.34      ZJU  25513         China            Zhejiang University\n",
"grid.17063.33      NaN  25437        Canada          University of Toronto\n",
"grid.16821.3c     SJTU  25139         China  Shanghai Jiao Tong University"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dsldf.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3) Join the data together\n",
"Only some of the institutions have student numbers in Wikidata, so the match is partial. Among those that do match, there appears to be a relationship between a high number of students and a high number of internal publications."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" acronym count country_name \\\n",
"grid \n",
"grid.11899.38 USP 29893 Brazil \n",
"grid.12527.33 THU 25896 China \n",
"grid.13402.34 ZJU 25513 China \n",
"grid.17063.33 NaN 25437 Canada \n",
"grid.16821.3c SJTU 25139 China \n",
"grid.214458.e UM 23737 United States \n",
"grid.26999.3d UT 21234 Japan \n",
"grid.21107.35 JHU 20518 United States \n",
"grid.4991.5 NaN 20507 United Kingdom \n",
"grid.168010.e SU 20287 United States \n",
"grid.83440.3b UCL 18550 United Kingdom \n",
"grid.19006.3e UCLA 17649 United States \n",
"grid.5335.0 NaN 17343 United Kingdom \n",
"grid.17635.36 NaN 16919 United States \n",
"grid.17091.3e UBC 16708 Canada \n",
"grid.66875.3a NaN 16640 United States \n",
"grid.34477.33 UW 16465 United States \n",
"grid.15276.37 UF 16165 United States \n",
"grid.38142.3c NaN 16047 United States \n",
"grid.25879.31 NaN 15877 United States \n",
"grid.258799.8 NaN 15692 Japan \n",
"grid.43169.39 XJTU 15623 China \n",
"grid.14003.36 UW 15534 United States \n",
"grid.47100.32 NaN 15440 United States \n",
"grid.19373.3f HIT 15384 China \n",
"grid.4886.2 RAS 14978 Russia \n",
"grid.261331.4 OSU 14642 United States \n",
"grid.17089.37 NaN 14555 Canada \n",
"grid.33199.31 HUST 14485 China \n",
"grid.64939.31 BUAA 14467 China \n",
"... ... ... ... \n",
"grid.4488.0 TUD 6587 Germany \n",
"grid.46072.37 UT 6586 Iran \n",
"grid.410877.d UTM 6548 Malaysia \n",
"grid.8532.c UFRGS 6511 Brazil \n",
"grid.5600.3 NaN 6479 United Kingdom \n",
"grid.26790.3a U Miami 6463 United States \n",
"grid.440785.a NaN 6457 China \n",
"grid.14848.31 NaN 6445 Canada \n",
"grid.4795.f NaN 6435 Spain \n",
"grid.411327.2 HHU 6426 Germany \n",
"grid.7737.4 UH 6409 Finland \n",
"grid.4793.9 NaN 6360 Greece \n",
"grid.31880.32 BUPT 6358 China \n",
"grid.267313.2 NaN 6356 United States \n",
"grid.4643.5 NaN 6356 Italy \n",
"grid.1010.0 NaN 6334 Australia \n",
"grid.7700.0 NaN 6333 Germany \n",
"grid.32197.3e TIT 6324 Japan \n",
"grid.5333.6 EPFL 6315 Switzerland \n",
"grid.5947.f NTNU 6314 Norway \n",
"grid.20515.33 NaN 6276 Japan \n",
"grid.5719.a NaN 6259 Germany \n",
"grid.7372.1 NaN 6220 United Kingdom \n",
"grid.7839.5 NaN 6218 Germany \n",
"grid.429017.9 IIT KGP 6197 India \n",
"grid.16750.35 NaN 6142 United States \n",
"grid.440736.2 NaN 6119 China \n",
"grid.5477.1 NaN 6086 Netherlands \n",
"grid.8404.8 NaN 6051 Italy \n",
"grid.7177.6 UvA 6043 Netherlands \n",
"\n",
" name \\\n",
"grid \n",
"grid.11899.38 University of Sao Paulo \n",
"grid.12527.33 Tsinghua University \n",
"grid.13402.34 Zhejiang University \n",
"grid.17063.33 University of Toronto \n",
"grid.16821.3c Shanghai Jiao Tong University \n",
"grid.214458.e University of Michigan \n",
"grid.26999.3d University of Tokyo \n",
"grid.21107.35 Johns Hopkins University \n",
"grid.4991.5 University of Oxford \n",
"grid.168010.e Stanford University \n",
"grid.83440.3b University College London \n",
"grid.19006.3e University of California Los Angeles \n",
"grid.5335.0 University of Cambridge \n",
"grid.17635.36 University of Minnesota \n",
"grid.17091.3e University of British Columbia \n",
"grid.66875.3a Mayo Clinic \n",
"grid.34477.33 University of Washington \n",
"grid.15276.37 University of Florida \n",
"grid.38142.3c Harvard University \n",
"grid.25879.31 University of Pennsylvania \n",
"grid.258799.8 Kyoto University \n",
"grid.43169.39 Xi'an Jiaotong University \n",
"grid.14003.36 University of Wisconsin–Madison \n",
"grid.47100.32 Yale University \n",
"grid.19373.3f Harbin Institute of Technology \n",
"grid.4886.2 Russian Academy of Sciences \n",
"grid.261331.4 The Ohio State University \n",
"grid.17089.37 University of Alberta \n",
"grid.33199.31 Huazhong University of Science and Technology \n",
"grid.64939.31 Beihang University \n",
"... ... \n",
"grid.4488.0 Dresden University of Technology \n",
"grid.46072.37 University of Tehran \n",
"grid.410877.d University of Technology Malaysia \n",
"grid.8532.c Federal University of Rio Grande do Sul \n",
"grid.5600.3 Cardiff University \n",
"grid.26790.3a University of Miami \n",
"grid.440785.a Jiangsu University \n",
"grid.14848.31 University of Montreal \n",
"grid.4795.f Complutense University of Madrid \n",
"grid.411327.2 Heinrich Heine University Düsseldorf \n",
"grid.7737.4 University of Helsinki \n",
"grid.4793.9 Aristotle University of Thessaloniki \n",
"grid.31880.32 Beijing University of Posts and Telecommunicat... \n",
"grid.267313.2 The University of Texas Southwestern Medical C... \n",
"grid.4643.5 Polytechnic University of Milan \n",
"grid.1010.0 University of Adelaide \n",
"grid.7700.0 Heidelberg University \n",
"grid.32197.3e Tokyo Institute of Technology \n",
"grid.5333.6 Swiss Federal Institute of Technology in Lausanne \n",
"grid.5947.f Norwegian University of Science and Technology \n",
"grid.20515.33 University of Tsukuba \n",
"grid.5719.a University of Stuttgart \n",
"grid.7372.1 University of Warwick \n",
"grid.7839.5 Goethe University Frankfurt \n",
"grid.429017.9 Indian Institute of Technology Kharagpur \n",
"grid.16750.35 Princeton University \n",
"grid.440736.2 Xidian University \n",
"grid.5477.1 Utrecht University \n",
"grid.8404.8 University of Florence \n",
"grid.7177.6 University of Amsterdam \n",
"\n",
" inception students \n",
"grid \n",
"grid.11899.38 1934-01-01T00:00:00Z 96364 \n",
"grid.12527.33 NaN NaN \n",
"grid.13402.34 1897-05-21T00:00:00Z 39000 \n",
"grid.17063.33 NaN NaN \n",
"grid.16821.3c NaN NaN \n",
"grid.214458.e NaN NaN \n",
"grid.26999.3d 1877-04-12T00:00:00Z 28253 \n",
"grid.21107.35 NaN NaN \n",
"grid.4991.5 1096-01-01T00:00:00Z 19791 \n",
"grid.168010.e 1891-01-01T00:00:00Z 16336 \n",
"grid.83440.3b 1826-01-01T00:00:00Z 23250 \n",
"grid.19006.3e NaN NaN \n",
"grid.5335.0 1209-01-01T00:00:00Z 18977 \n",
"grid.17635.36 NaN NaN \n",
"grid.17091.3e NaN NaN \n",
"grid.66875.3a NaN NaN \n",
"grid.34477.33 NaN NaN \n",
"grid.15276.37 1853-01-01T00:00:00Z 49500 \n",
"grid.38142.3c 1636-01-01T00:00:00Z 22000 \n",
"grid.25879.31 NaN NaN \n",
"grid.258799.8 NaN NaN \n",
"grid.43169.39 NaN NaN \n",
"grid.14003.36 NaN NaN \n",
"grid.47100.32 1701-01-01T00:00:00Z 12336 \n",
"grid.19373.3f NaN NaN \n",
"grid.4886.2 NaN NaN \n",
"grid.261331.4 NaN NaN \n",
"grid.17089.37 NaN NaN \n",
"grid.33199.31 NaN NaN \n",
"grid.64939.31 NaN NaN \n",
"... ... ... \n",
"grid.4488.0 1828-01-01T00:00:00Z 36962 \n",
"grid.46072.37 NaN NaN \n",
"grid.410877.d NaN NaN \n",
"grid.8532.c NaN NaN \n",
"grid.5600.3 NaN NaN \n",
"grid.26790.3a NaN NaN \n",
"grid.440785.a NaN NaN \n",
"grid.14848.31 NaN NaN \n",
"grid.4795.f NaN NaN \n",
"grid.411327.2 1965-11-16T00:00:00Z 33715 \n",
"grid.7737.4 1640-01-01T00:00:00Z 38000 \n",
"grid.4793.9 NaN NaN \n",
"grid.31880.32 NaN NaN \n",
"grid.267313.2 NaN NaN \n",
"grid.4643.5 NaN NaN \n",
"grid.1010.0 1874-01-01T00:00:00Z 25000 \n",
"grid.7700.0 1386-01-01T00:00:00Z 28413 \n",
"grid.32197.3e NaN NaN \n",
"grid.5333.6 NaN NaN \n",
"grid.5947.f NaN NaN \n",
"grid.20515.33 NaN NaN \n",
"grid.5719.a 1967-01-01T00:00:00Z 26457 \n",
"grid.7372.1 1965-01-01T00:00:00Z 25615 \n",
"grid.7839.5 1914-01-01T00:00:00Z 46429 \n",
"grid.429017.9 NaN NaN \n",
"grid.16750.35 NaN NaN \n",
"grid.440736.2 NaN NaN \n",
"grid.5477.1 NaN NaN \n",
"grid.8404.8 NaN NaN \n",
"grid.7177.6 1632-01-01T00:00:00Z 31123 \n",
"\n",
"[200 rows x 6 columns]"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.merge(dsldf, wddf, on='grid', how='left')"
]
},
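{
"cell_type": "markdown",
"metadata": {},
"source": [
"The relationship suggested above can be checked numerically. The sketch below is not part of the original analysis: it uses a handful of rows copied from the merge output above as a stand-in for the full join, converts `students` to numbers, reports how many institutions matched, and computes a rank correlation. The names `merged`, `matched`, `coverage` and `rho` are illustrative assumptions.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# A few rows copied from the merge output above; None marks no Wikidata match.\n",
"merged = pd.DataFrame(\n",
"    {\n",
"        'count': [29893, 25896, 25513, 21234, 20507, 20287, 18550],\n",
"        'students': ['96364', None, '39000', '28253', '19791', '16336', '23250'],\n",
"    },\n",
"    index=pd.Index(\n",
"        ['grid.11899.38', 'grid.12527.33', 'grid.13402.34', 'grid.26999.3d',\n",
"         'grid.4991.5', 'grid.168010.e', 'grid.83440.3b'],\n",
"        name='grid',\n",
"    ),\n",
")\n",
"\n",
"# Wikidata values are strings, so convert before doing any arithmetic.\n",
"merged['students'] = pd.to_numeric(merged['students'], errors='coerce')\n",
"\n",
"# Share of institutions that actually have a student count in Wikidata.\n",
"matched = merged.dropna(subset=['students'])\n",
"coverage = len(matched) / len(merged)\n",
"\n",
"# Spearman rank correlation, computed as the Pearson correlation of the ranks.\n",
"rho = matched['count'].rank().corr(matched['students'].rank())\n",
"\n",
"print(f'{len(matched)} of {len(merged)} matched ({coverage:.0%}), rho = {rho:.2f}')\n",
"```\n",
"\n",
"On the full data, `merged` would be the result of `pd.merge(dsldf, wddf, on='grid', how='left')`. With so few rows the figure here is illustrative only, and a correlation on its own would not explain why the two quantities move together."
]
},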
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.5"
}
},
"nbformat": 4,
"nbformat_minor": 2
}