This is the dataset used in the experiments of the paper "Modeling citation worthiness by using attention‑based bidirectional long short‑term memory networks and interpretable models".
There are one million sentences in total, split into training, validation, and test sets at 60%, 20%, and 20%, respectively.
For the pre-processing of the dataset, please refer to the paper.
The data are stored in JSONL format (each row is a JSON object). A few rows are listed here as examples:
{"sec_name":"introduction","cur_sent_id":"12213838@0#3$0","next_sent_id":"12213838@0#3$1","cur_sent":"All three spectrin subunits are essential for normal development.","next_sent":"βH, encoded by the karst locus, is an essential protein that is required for epithelial morphogenesis .","cur_scaled_len_features":{"type":1,"values":[0.17716535433070865,0.13513513513513514]},"next_scaled_len_features":{"type":1,"values":[0.32677165354330706,0.35135135135135137]},"cur_has_citation":0,"next_has_citation":1}
{"sec_name":"results","prev_sent_id":"12230634@1@1#0$2","cur_sent_id":"12230634@1@1#0$3","next_sent_id":"12230634@1@1#0$4","prev_sent":"μIU/ml at the 2.0-h postprandial time point.","cur_sent":"Statistically significant differences between the mean plasma insulin levels of dogs treated with 50 mg/kg of GSNO, and those treated with 50 mg/kg GSNO and vitamin C (50 mg/kg) were observed at the 1.0-h and 1.5-h time points (P < 0.05).","next_sent":"The mean plasma insulin concentrations in the dogs treated with 50 mg/kg of vitamin C and 50 mg/kg of GSNO, or 50 mg/kg of GSNO was significantly altered compared to those of controls or captopril-treated dogs (P < 0.05).","prev_scaled_len_features":{"type":1,"values":[0.09448818897637795,0.08108108108108109]},"cur_scaled_len_features":{"type":1,"values":[0.8582677165354331,1.0]},"next_scaled_len_features":{"type":1,"values":[0.7913385826771654,0.9459459459459459]},"prev_has_citation":0,"cur_has_citation":0,"next_has_citation":0}
{"sec_name":"results","prev_sent_id":"12213837@1@0#3$3","cur_sent_id":"12213837@1@0#3$4","next_sent_id":"12213837@1@0#3$5","prev_sent":"Cleavage of VAMP2 by BoNT/D releases the NH2-terminal 59 amino acids from the protein and eliminates exocytosis.","cur_sent":"However, in this case, exocytosis cannot be recovered by addition of the cleaved fragment .","next_sent":"Peptides that exactly correspond to the BoNT/D cleavage site (VAMP2 aa 25–59 and 60–94-cys) were equally efficient at mediating liposome fusion (unpublished data).","prev_scaled_len_features":{"type":1,"values":[0.36220472440944884,0.35135135135135137]},"cur_scaled_len_features":{"type":1,"values":[0.2795275590551181,0.2972972972972973]},"next_scaled_len_features":{"type":1,"values":[0.562992125984252,0.5135135135135135]},"prev_has_citation":0,"cur_has_citation":1,"next_has_citation":0}
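The rows above can be read line by line with a standard JSON parser. The following is a minimal sketch, not the authors' loading code. It assumes that `cur_has_citation` is the target label for `cur_sent` (the field names suggest this, but it is not stated here), and that the `prev_*` and `next_*` fields may be absent, as in the first example row, which has no `prev_*` fields. The function names `iter_records`, `summarize`, and `load_jsonl` are hypothetical, as is any file path passed to `load_jsonl`. The meaning of the two numbers in `*_scaled_len_features` is not described here; see the paper.

```python
import json
from typing import Iterable, Iterator, List


def iter_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-empty line of JSONL text."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        yield json.loads(line)


def summarize(record: dict) -> dict:
    """Pick out the fields for the current sentence.

    Context fields are read with .get() because some rows
    (e.g. the first sentence of a paragraph) lack prev_* keys.
    """
    return {
        "section": record["sec_name"],
        "sentence": record["cur_sent"],
        # Assumption: this flag is the label for cur_sent.
        "label": record["cur_has_citation"],
        "prev_sentence": record.get("prev_sent"),
        "next_sentence": record.get("next_sent"),
        "length_features": record["cur_scaled_len_features"]["values"],
    }


def load_jsonl(path: str) -> List[dict]:
    """Read a whole JSONL file; the path is supplied by the caller."""
    with open(path, encoding="utf-8") as handle:
        return [summarize(rec) for rec in iter_records(handle)]
```

Reading the file as UTF-8 matters because the sentences contain non-ASCII characters such as "β" and "μ". For the full one-million-sentence files, iterating with `iter_records` instead of building a list keeps memory use low.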