Biomedical Abstract Meaning Representation as Linked Data

Version 4 2016-10-14, 23:04

Version 3 2016-10-11, 15:27

Version 2 2016-10-11, 15:22

Version 1 2016-04-29, 23:59

dataset

posted on 2016-10-14, 23:04 authored by Gully Burns, Jose Luis Ambite, Ulf Hermjakob, The AMR Development Team

This release provides an RDF-based linked data translation of the bio-AMR-v0.8 data, prepared by the L2K2R2 curation team. We acknowledge the original curation work by ISI/USC and SDL, which is released at http://amr.isi.edu/download.html.

This corpus includes annotations of cancer-related PubMed articles, covering 3 full papers (PMID:24651010, PMID:11777939, PMID:15630473) as well as the result sections of 46 additional PubMed papers. The corpus also includes about 1000 sentences each from the BEL BioCreative training corpus and the Chicago Corpus. The Bio AMR corpus is split into dev (500 snt.), training (5,452 snt.), and test (500 snt.) sets, and is available as RDF translations of both the core AMR data and the automatically aligned AMR data.

This release contains training, development and test files with an explanatory README file. We include a written note describing the differences between the Propbank and AMR representations used in this version.

UPDATE (10/11/2016). We have corrected some issues with grounding in the following two files:

1. amr-bio-v2.grounded.txt
2. amr-bio-v2.grounded.rdf

For this update, we reconfigured our automatic grounding service to resolve certain typed entities to URLs from UniProt, the Gene Ontology and PubChem, and to link each entity to the best-matching candidate in all cases (rather than presenting the top five).
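
The difference between presenting the top five candidates and linking only the best match can be sketched as follows. This is an illustrative sketch only, not the grounding service itself: the `Candidate` type, the score field, the function names, and the example.org URIs are all hypothetical and do not come from the release.

```python
from typing import List, NamedTuple, Optional


class Candidate(NamedTuple):
    uri: str      # hypothetical identifier, not taken from the release
    score: float  # hypothetical match score; higher means a better match


def ground_top_k(candidates: List[Candidate], k: int = 5) -> List[Candidate]:
    """Earlier behaviour as described: keep the k best-scoring candidates."""
    return sorted(candidates, key=lambda c: c.score, reverse=True)[:k]


def ground_best(candidates: List[Candidate]) -> Optional[Candidate]:
    """Updated behaviour as described: keep only the single best candidate."""
    if not candidates:
        return None
    return max(candidates, key=lambda c: c.score)


# Hypothetical candidates for one typed entity mention.
candidates = [
    Candidate("http://example.org/candidate/A", 0.62),
    Candidate("http://example.org/candidate/B", 0.91),
    Candidate("http://example.org/candidate/C", 0.15),
]

top = ground_top_k(candidates)   # ranked list, at most five entries
best = ground_best(candidates)   # single link for the entity
```

Linking a single candidate gives each grounded entity exactly one URL, which is simpler to consume as linked data, at the cost of hiding lower-ranked alternatives.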