Dataset: ruanchaves/test_stanford
Language:
Multilinguality: monolingual
Language creators: machine-generated
Annotations creators: expert-generated
Source datasets: original
ArXiv: arxiv:1501.03210
License:
The Stanford Sentiment Analysis Dataset, manually annotated by Bansal et al.
English
{
"index": 1467856821,
"hashtag": "therapyfail",
"segmentation": "therapy fail",
"gold_position": 8,
"rank": {
"position": [
1,
2,
3,
4,
5,
6,
7,
8,
9,
10,
11,
12,
13,
14,
15,
16,
17,
18,
19,
20
],
"candidate": [
"therap y fail",
"the rap y fail",
"t her apy fail",
"the rap yfail",
"t he rap y fail",
"thera py fail",
"ther apy fail",
"th era py fail",
"therapy fail",
"therapy fai l",
"the r apy fail",
"the rapyfa il",
"the rapy fail",
"t herapy fail",
"the rapyfail",
"therapy f ai l",
"therapy fa il",
"the rapyf a il",
"therapy f ail",
"the ra py fail"
]
}
}
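As a rough illustration of how a record like the sample above might be read, the following Python sketch looks up the gold segmentation in the candidate list. The helper name `find_gold_index` is hypothetical and not part of any library. In this one sample, `gold_position` (8) happens to equal the zero-based index of the gold segmentation in `rank.candidate`, while `rank.position` counts from 1; whether that holds for every record is an assumption, since the card does not say so.

```python
# Copy of the sample record above, with only the fields used here.
record = {
    "hashtag": "therapyfail",
    "segmentation": "therapy fail",
    "gold_position": 8,
    "rank": {
        "position": list(range(1, 21)),
        "candidate": [
            "therap y fail", "the rap y fail", "t her apy fail",
            "the rap yfail", "t he rap y fail", "thera py fail",
            "ther apy fail", "th era py fail", "therapy fail",
            "therapy fai l", "the r apy fail", "the rapyfa il",
            "the rapy fail", "t herapy fail", "the rapyfail",
            "therapy f ai l", "therapy fa il", "the rapyf a il",
            "therapy f ail", "the ra py fail",
        ],
    },
}


def find_gold_index(rec):
    """Hypothetical helper: zero-based index of the gold segmentation
    among the ranked candidates, or None if it is not listed."""
    candidates = rec["rank"]["candidate"]
    try:
        return candidates.index(rec["segmentation"])
    except ValueError:
        return None


gold_index = find_gold_index(record)
print(gold_index)                               # 8
print(gold_index == record["gold_position"])    # True for this sample
print(record["rank"]["position"][gold_index])   # 9 (positions count from 1)
```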
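All hashtag segmentation and identifier splitting datasets on this profile share the same basic fields: hashtag and segmentation, or identifier and segmentation.
The only difference between hashtag and segmentation, or between identifier and segmentation, is the whitespace characters. Spell checking, expanding abbreviations, and correcting characters to uppercase go into other fields.
There is always whitespace between an alphanumeric character and any sequence of special characters (such as _, :, ~).
If there are any annotations for named entity recognition or other token classification tasks, they are given in a spans field.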
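The two whitespace rules described above can be checked with a minimal Python sketch. Both function names are hypothetical, "alphanumeric" is assumed to mean ASCII letters and digits, and the `my_var` identifier is an invented example rather than a record from the dataset.

```python
import re

# An alphanumeric character directly next to a special character
# (anything that is neither alphanumeric nor whitespace).
# Assumption: "alphanumeric" means ASCII letters and digits.
_ADJACENT = re.compile(r"[A-Za-z0-9][^A-Za-z0-9\s]|[^A-Za-z0-9\s][A-Za-z0-9]")


def only_whitespace_differs(raw, segmentation):
    """True if removing all whitespace from the segmentation
    gives back the raw hashtag or identifier."""
    return "".join(segmentation.split()) == raw


def specials_are_separated(segmentation):
    """True if no alphanumeric character touches a special character."""
    return _ADJACENT.search(segmentation) is None


# Sample from this card.
print(only_whitespace_differs("therapyfail", "therapy fail"))  # True
print(specials_are_separated("therapy fail"))                  # True

# Invented identifier example, for illustration only.
print(only_whitespace_differs("my_var", "my _ var"))           # True
print(specials_are_separated("my _ var"))                      # True
print(specials_are_separated("my_ var"))                       # False
```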
@misc{bansal2015deep,
title={Towards Deep Semantic Analysis Of Hashtags},
author={Piyush Bansal and Romil Bansal and Vasudeva Varma},
year={2015},
eprint={1501.03210},
archivePrefix={arXiv},
primaryClass={cs.IR}
}
This dataset was added by @ruanchaves while developing the hashformers library.