10.6084/m9.figshare.5235823 Yaroslav Nechaev Francesco Corcoglioniti Claudio Giuliano SocialLink: knowledge transfer between social media and linked open data Springer Nature 2017 Linked data linked open data DBpedia Wikidata RDF Machine Learning Social Media data harvesting semi-structured data structured data knowledge transfer knowledge bases entity linking event detection user profiling supervised alignment approach Named Entity Linking 2017-10-18 13:16:11 Dataset https://springernature.figshare.com/articles/dataset/SocialLink_knowledge_transfer_between_social_media_and_linked_open_data/5235823 <p>This dataset contains canonical citations (DOIs) for the SocialLink dataset (15th May 2017 release), alignment data and code, and entity data in .csv and .json format.</p><p>SocialLink is a publicly available Linked Open Data dataset that matches social media accounts on Twitter to the corresponding entities in multiple language chapters of DBpedia. By bridging the Twitter social media world and the Linked Open Data cloud, SocialLink enables knowledge transfer between the two: on the one hand, it supports Semantic Web practitioners in better harvesting the vast amounts of valuable, up-to-date information available on Twitter; on the other hand, it permits Social Media researchers to leverage DBpedia data when processing the noisy, semi-structured data of Twitter.</p><p>The SocialLink dataset is created by the SocialLink Pipeline, which aligns 271,000 DBpedia persons and organisations to their Twitter profiles via data acquisition, candidate acquisition and candidate selection phases.</p><p>Data files are stored in compressed .gz format that can be uncompressed using standard compression utilities. 
Diagrams are presented in .pdf format; .csv, .json and .java files can be opened in a text editor; .tql files can be accessed via MS SQL Server.</p><p><b>Format descriptions:</b></p><p><b>JSON</b></p><p>The JSON file is a single array containing one object per DBpedia entity; all objects share a similar structure.</p><p>The <b>candidates</b> property contains the list of candidate IDs for each entity, while the <b>scores</b> property contains a confidence score for each candidate, as reported by our candidate selection algorithm.</p><p>The <b>twitter_id</b> property is present when a certain threshold is met (thresholds are selected according to the high F1 setup from our paper).</p><p><b>CSV</b></p><p>Each row of the CSV file contains information about one entity. Each row looks like this:</p><p>http://dbpedia.org/resource/MoShang,"[6887052,26735153,302784580,1331809652,2275404837,2597365788,1516978014,753046765809508356,1512300530,255873440]","[1.579205048600787,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0]",6887052</p><p>The columns contain the same data as the JSON format: the entity URI, the candidate IDs, the candidate scores and the selected Twitter ID. If the Twitter ID cannot be determined, 0 is used in the last column instead.</p><p><b>approach.pdf</b> and <b>rdf.pdf</b> provide visual representations of the SocialLink pipeline and RDF alignments.</p><p>For more detailed information on the RDF modeling choices, see the associated publication. Extensive documentation is available via the SocialLink website (URL below), covering: (i) dataset scope, format, statistics, and access mechanisms; (ii) instructions for deploying and running the SocialLink pipeline to recreate the resource; (iii) example applications using the dataset; and (iv) links to external resources like the GitHub repository and issue tracker.</p><p>Code: <a href="https://github.com/Remper/sociallink">https://github.com/Remper/sociallink</a></p><p>SocialLink Website: <a href="http://sociallink.futuro.media/">http://sociallink.futuro.media/</a></p>
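<p>The CSV layout described above can be read with a few lines of Python. The following is a minimal sketch rather than part of the SocialLink code base: the function name <b>parse_rows</b> and the dictionary keys are illustrative choices, and it assumes the file has no header row and that each row has exactly the four columns shown in the sample row.</p>

```python
import csv
import io
import json

# Sample row copied from the format description above.
SAMPLE = (
    'http://dbpedia.org/resource/MoShang,'
    '"[6887052,26735153,302784580,1331809652,2275404837,2597365788,'
    '1516978014,753046765809508356,1512300530,255873440]",'
    '"[1.579205048600787,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0]",'
    '6887052\n'
)


def parse_rows(handle):
    """Yield one dict per CSV row (key names are illustrative, not official)."""
    for row in csv.reader(handle):
        if len(row) != 4:
            continue  # assumption: skip blank or malformed lines
        entity, candidates, scores, twitter_id = row
        yield {
            "entity": entity,
            # The two list columns are quoted, JSON-style arrays.
            "candidates": json.loads(candidates),
            "scores": json.loads(scores),
            # 0 in the last column means no Twitter ID could be determined.
            "twitter_id": int(twitter_id) or None,
        }


rows = list(parse_rows(io.StringIO(SAMPLE)))
for record in rows:
    print(record["entity"], record["twitter_id"], len(record["candidates"]))
```

<p>To read a compressed file directly, the same function can be given a handle from <b>gzip.open(path, "rt", newline="")</b>, where <b>path</b> stands for wherever the downloaded .csv.gz file was saved.</p>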