数据集:
scielo
任务:
计算机处理:
multilingual大小:
100K<n<1M语言创建人:
found批注创建人:
found源数据集:
original预印本库:
arxiv:1905.01852许可:
A parallel corpus of full-text scientific articles collected from Scielo database in the following languages:English, Portuguese and Spanish. The corpus is sentence aligned for all language pairs, as well as trilingual aligned for a small subset of sentences. Alignment was carried out using the Hunalign algorithm.
The underlying task is machine translation.
[More Information Needed]
[More Information Needed]
[More Information Needed]
[More Information Needed]
[More Information Needed]
[More Information Needed]
Who are the source language producers?[More Information Needed]
[More Information Needed]
Who are the annotators?[More Information Needed]
[More Information Needed]
[More Information Needed]
[More Information Needed]
[More Information Needed]
[More Information Needed]
[More Information Needed]
@inproceedings{soares2018large,
title={A Large Parallel Corpus of Full-Text Scientific Articles},
author={Soares, Felipe and Moreira, Viviane and Becker, Karin},
booktitle={Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC-2018)},
year={2018}
}
Thanks to @patil-suraj for adding this dataset.