ActivityNet-EKG: A resource for Video-Textual-Knowledge-Entity Linking
To understand the content of a document consisting of a video and some textual descriptions, an artificial agent must jointly recognize the entities shown in the video and mentioned in the text, and link them to its background knowledge. This is an important yet complex task, which we call Video-Textual-Knowledge-Entity Linking (ViTEL). The ViTEL task aims to link both video and textual entity mentions to the corresponding candidate entity in a knowledge base (ontology). Solving this problem will open a wide range of opportunities to bring together the multimedia and semantic web research communities and to solve a variety of tasks more effectively. In this project, we propose the ActivityNet-EKG (ActivityNet-Entity-Knowledge-Graph) dataset, consisting of video clips and corresponding descriptions (captions), in which aligned visual and textual entity mentions are annotated with the corresponding entities, typed (classed) according to the DBpedia [1] and Wikidata [41] knowledge bases. The ActivityNet-EKG dataset can be used for training and evaluating algorithms that solve the Video-Textual-Knowledge-Entity Linking problem.
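To make the annotation scheme concrete, the sketch below shows how an aligned visual/textual mention pair linked to DBpedia and Wikidata entities might be represented. All field names and the sample record are illustrative assumptions, not the dataset's actual schema:

```python
# Hypothetical sketch of a ViTEL-style annotation record; field names
# are illustrative assumptions, not the actual ActivityNet-EKG schema.

# A single annotation links an aligned visual/textual mention pair to a
# knowledge-base entity identified by DBpedia and Wikidata URIs.
annotation = {
    "video_id": "v_example",                 # hypothetical clip identifier
    "caption": "A man plays the guitar.",
    "text_mention": {"span": [16, 22], "surface": "guitar"},
    "video_mention": {"frame": 42, "bbox": [120, 80, 260, 300]},
    "entity": {
        "dbpedia": "http://dbpedia.org/resource/Guitar",
        "wikidata": "http://www.wikidata.org/entity/Q6607",
        "type": "http://dbpedia.org/ontology/Instrument",
    },
}

def surface_matches_span(record: dict) -> bool:
    """Check that the textual mention's surface form matches its span."""
    start, end = record["text_mention"]["span"]
    return record["caption"][start:end] == record["text_mention"]["surface"]

print(surface_matches_span(annotation))  # → True
```

A linking algorithm trained on such records would take the caption and video region as input and predict the knowledge-base URIs held in the `entity` field.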