%0 Generic
%A Khadka, Anita
%D 2018
%T Citation-Context Dataset (C2D)
%U https://ordo.open.ac.uk/articles/dataset/Citation-Context_Dataset_C2D_/6865298
%R 10.21954/ou.rd.6865298.v2
%2 https://ndownloader.figshare.com/files/12776420
%K Citation-Context
%K Citation information
%K Academic recommender system
%K Text Mining
%K Information Retrieval and Web Search
%X

We have released the first version of C2D, a citation-context-based dataset created during an experiment for work that will be published as a short paper at RecSys 2018.


The C2D dataset was created from 2 million full-text open-source research publications obtained from CORE. It contains 53 million unique citation records. To construct C2D, we extracted citation information from each publication: the cited document's title, author(s), publication date, and citation-context. The assumptions behind the citation-context extraction are described in more detail below.


First, we extracted the positions at which citations are mentioned, together with their citation-contexts, i.e. the text surrounding each citation. For our purposes, a citation-context consists of three sentences: the sentence in which the reference is cited, the preceding sentence, and the following sentence. If the citing sentence is at the start of a paragraph, no preceding sentence is extracted; if it is at the end of a paragraph, no following sentence is extracted.
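
The three-sentence window described above can be illustrated with a minimal Python sketch. It is not the code used to build C2D. The citation marker pattern (numeric brackets such as [12]), the naive sentence splitter, and the function names are all assumptions made for illustration, since the description does not say how citations were detected or how sentences were split.

```python
import re

# Assumption: citations appear as numeric brackets such as [12] or [3, 4].
CITATION = re.compile(r"\[\d+(?:\s*,\s*\d+)*\]")


def split_sentences(paragraph):
    """Naive splitter (assumption): break after '.', '!' or '?' plus whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]


def citation_contexts(paragraph):
    """Return one context string per sentence that contains a citation marker."""
    sentences = split_sentences(paragraph)
    contexts = []
    for i, sentence in enumerate(sentences):
        if not CITATION.search(sentence):
            continue
        # The slice is clamped to the paragraph, so a citing sentence at the
        # start has no preceding sentence and one at the end has no following one.
        window = sentences[max(i - 1, 0): i + 2]
        contexts.append(" ".join(window))
    return contexts


paragraph = (
    "Recommenders are common. Prior work used citations [1]. "
    "We extend it. Results follow [2]."
)
for context in citation_contexts(paragraph):
    print(context)
```

Because the function works on one paragraph at a time, a context never crosses a paragraph boundary, which matches the start-of-paragraph and end-of-paragraph rule stated above.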


The dataset therefore contains the following attributes:

Attributes:

Please cite our paper if you use this dataset.


Note:

%I The Open University