数据集:
csebuetnlp/BanglaParaphrase
任务:
语言:
计算机处理:
monolingual大小:
100K<n<1M语言创建人:
found批注创建人:
found源数据集:
original预印本库:
arxiv:2210.05109许可:
We present BanglaParaphrase, a high quality synthetic Bangla paraphrase dataset containing about 466k paraphrase pairs. The paraphrases ensures high quality by being semantically coherent and syntactically diverse.
from datasets import load_dataset
from datasets import load_dataset
ds = load_dataset("csebuetnlp/BanglaParaphrase")
 One example from the train part of the dataset is given below in JSON format.
{
"source": "বেশিরভাগ সময় প্রকৃতির দয়ার ওপরেই বেঁচে থাকতেন উপজাতিরা।", 
"target": "বেশিরভাগ সময়ই উপজাতিরা প্রকৃতির দয়ার উপর নির্ভরশীল ছিল।"
}
 Dataset with train-dev-test example counts are given below:
| Language | ISO 639-1 Code | Train | Validation | Test | 
|---|---|---|---|---|
| Bengali | bn | 419, 967 | 233, 31 | 233, 32 | 
Contents of this repository are restricted to only non-commercial research purposes under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0) . Copyright of the dataset contents belongs to the original copyright holders.
@article{akil2022banglaparaphrase,
  title={BanglaParaphrase: A High-Quality Bangla Paraphrase Dataset},
  author={Akil, Ajwad and Sultana, Najrin and Bhattacharjee, Abhik and Shahriyar, Rifat},
  journal={arXiv preprint arXiv:2210.05109},
  year={2022}
}