An easier-to-use version for machine translation (MT) is available in Moses format (one sentence per line). The file names start with moses_like.
If you use this dataset, please cite the following work:
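As a rough illustration of how one-sentence-per-line files are usually consumed, the sketch below pairs line N of a source-language file with line N of a target-language file. The function name `read_parallel` and the file names in the usage comment are hypothetical, and UTF-8 encoding is an assumption; check the actual names and encoding of the moses_like files you downloaded.

```python
from itertools import zip_longest
from pathlib import Path
from typing import Iterator, Tuple, Union

PathLike = Union[str, Path]


def read_parallel(src_path: PathLike, tgt_path: PathLike) -> Iterator[Tuple[str, str]]:
    """Yield (source, target) sentence pairs from two line-aligned files.

    Assumes both files are UTF-8 and that line N of one file is the
    translation of line N of the other (one sentence per line).
    """
    with open(src_path, encoding="utf-8") as src, \
            open(tgt_path, encoding="utf-8") as tgt:
        # zip_longest pads the shorter file with None, so a length
        # mismatch is reported instead of being silently truncated.
        for line_no, (s, t) in enumerate(zip_longest(src, tgt), start=1):
            if s is None or t is None:
                raise ValueError(f"files differ in length at line {line_no}")
            yield s.rstrip("\n"), t.rstrip("\n")


# Usage (hypothetical file names, adjust to the files you actually have):
# for en, pt in read_parallel("moses_like.en", "moses_like.pt"):
#     print(en, "|||", pt)
```

Raising an error on a length mismatch is a design choice: a misaligned pair of files would otherwise produce wrong sentence pairs without any warning.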
@InProceedings{L18-1546,
author = "Soares, Felipe
and Moreira, Viviane
and Becker, Karin",
title = "A Large Parallel Corpus of Full-Text Scientific Articles",
booktitle = "Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC-2018)",
year = "2018",
publisher = "European Language Resource Association",
location = "Miyazaki, Japan",
url = "http://aclweb.org/anthology/L18-1546"
}
We developed a parallel corpus of full-text scientific articles collected from the SciELO database in the following languages: English, Portuguese, and Spanish. The corpus is sentence-aligned for all language pairs, and trilingually aligned for a small subset of sentences.