ES_PT.tmx.tar.gz (593.3 kB)
EN_ES.tmx.tar.gz (23.74 MB)
EN_PT.tar.gz (338.58 MB)
EN_ES_PT.tmx.tar.gz (45.67 MB)
subset_well_parsed.json.tar.gz (183.81 MB)
moses_like_en_es.tar.gz (21.9 MB)
moses_like_en_pt_es.tar.gz (43.45 MB)
moses_like_en_pt.tar.gz (307.78 MB)
A Large Parallel Corpus of Full-Text Scientific Articles
Dataset, 965.51 MB in total
Modified on 2019-01-21, 09:00

NOTE FOR WMT PARTICIPANTS:
An easier-to-use version for MT is available in Moses format (one sentence per line). The names of these files start with moses_like.
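As a rough illustration of what "one sentence per line" implies, the sketch below pairs up sentences from two line-aligned text files. It assumes each moses_like archive unpacks to one plain-text file per language, with line N in one file translating line N in the other; the layout and file names inside the archives are not stated here, so the function name read_moses_pairs and the paths in the usage note are hypothetical.

```python
from itertools import zip_longest
from typing import Iterator, Tuple


def read_moses_pairs(
    src_path: str, tgt_path: str, encoding: str = "utf-8"
) -> Iterator[Tuple[str, str]]:
    """Yield (source, target) sentence pairs from two line-aligned files.

    Assumption: both files hold one sentence per line and line N of the
    source file corresponds to line N of the target file.
    """
    with open(src_path, encoding=encoding) as src, \
            open(tgt_path, encoding=encoding) as tgt:
        # zip_longest pads the shorter file with None, which lets us detect
        # a length mismatch instead of silently dropping sentences.
        for line_no, (s, t) in enumerate(zip_longest(src, tgt), start=1):
            if s is None or t is None:
                raise ValueError(f"files differ in length at line {line_no}")
            yield s.rstrip("\n"), t.rstrip("\n")
```

A call might look like `for en, pt in read_moses_pairs("corpus.en", "corpus.pt"): ...`, where both paths are placeholders for whatever files the archive actually contains. Raising on a length mismatch is a deliberate choice here, since misaligned files would otherwise produce wrong sentence pairs without any warning.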
If you use this dataset, please cite the following work:
@InProceedings{L18-1546,
  author = "Soares, Felipe and Moreira, Viviane and Becker, Karin",
  title = "A Large Parallel Corpus of Full-Text Scientific Articles",
  booktitle = "Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC-2018)",
  year = "2018",
  publisher = "European Language Resource Association",
  location = "Miyazaki, Japan",
  url = "http://aclweb.org/anthology/L18-1546"
}