A Large Parallel Corpus of Full-Text Scientific Articles

An easier-to-use version for machine translation (MT) is available in Moses format (one sentence per line). These files start with moses_like.
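As a minimal sketch of how the Moses-format files might be read, the Python snippet below pairs up two line-aligned files, one per language. It assumes that line N of the source file corresponds to line N of the target file and that the files are UTF-8 encoded. The function name read_moses_pairs and the file names in the usage comment are hypothetical and not part of the dataset.

```python
from itertools import zip_longest
from typing import Iterator, Tuple


def read_moses_pairs(src_path: str, tgt_path: str) -> Iterator[Tuple[str, str]]:
    """Yield (source, target) sentence pairs from two line-aligned files.

    Assumes one sentence per line and UTF-8 encoding (an assumption,
    not something the dataset description states).
    """
    with open(src_path, encoding="utf-8") as src, \
            open(tgt_path, encoding="utf-8") as tgt:
        for line_no, (s, t) in enumerate(zip_longest(src, tgt), start=1):
            # zip_longest pads the shorter file with None, which lets us
            # detect a length mismatch instead of silently truncating.
            if s is None or t is None:
                raise ValueError(f"files differ in length at line {line_no}")
            yield s.rstrip("\n"), t.rstrip("\n")


# Hypothetical usage; substitute the actual moses_like file names:
# for en, pt in read_moses_pairs("moses_like.en", "moses_like.pt"):
#     print(en, "|||", pt)
```

Raising an error on a length mismatch, rather than stopping at the shorter file, is a deliberate choice here: a misaligned pair of files would otherwise produce wrong sentence pairs without any warning.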

If you use this dataset, please cite the following work:
We developed a parallel corpus of full-text scientific articles collected from the SciELO database in the following languages: English, Portuguese and Spanish. The corpus is sentence-aligned for all language pairs, and trilingually aligned for a small subset of sentences.