Dataset:
big_patent
Tasks:
Languages:
Multilinguality:
monolingual
Language creators:
found
Annotations creators:
no-annotation
Source datasets:
original
ArXiv:
arxiv:1906.03741
License:
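The BIGPATENT dataset consists of 1.3 million records of U.S. patent documents along with human-written abstractive summaries. Each U.S. patent application is filed under a Cooperative Patent Classification (CPC) code. There are nine classification categories, identified by the codes a, b, c, d, e, f, g, h, and y.
The current defaults are version 2.1.2 (updated to use cased raw strings) and the "all" CPC codes: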
from datasets import load_dataset
ds = load_dataset("big_patent") # default is 'all' CPC codes
ds = load_dataset("big_patent", "all") # the same as above
ds = load_dataset("big_patent", "a") # only 'a' CPC codes
ds = load_dataset("big_patent", codes=["a", "b"])
To use version 1.0.0 (lowercased, tokenized words), pass both the codes and version parameters:
ds = load_dataset("big_patent", codes="all", version="1.0.0")
ds = load_dataset("big_patent", codes="a", version="1.0.0")
ds = load_dataset("big_patent", codes=["a", "b"], version="1.0.0")
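[More Information Needed]
English
Each instance contains a pair of description and abstract. The description is extracted from the Description section of the patent, while the abstract is extracted from the Abstract section.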
{
'description': 'FIELD OF THE INVENTION \n [0001] This invention relates to novel calcium phosphate-coated implantable medical devices and processes of making same. The unique calcium-phosphate coated implantable medical devices minimize...',
'abstract': 'This invention relates to novel calcium phosphate-coated implantable medical devices...'
}
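To make the relationship between the two fields concrete, here is a minimal sketch that counts whitespace-separated tokens in each field and reports how much longer the description is than the abstract. The helper name `summarize_instance` and the shortened `example` record are illustrative assumptions, not part of the dataset or of any library; real descriptions are far longer than this excerpt.

```python
def summarize_instance(instance):
    """Return whitespace-token counts and the description/abstract length ratio."""
    description_tokens = instance["description"].split()
    abstract_tokens = instance["abstract"].split()
    # Guard against an empty abstract so the ratio is always defined.
    ratio = len(description_tokens) / max(len(abstract_tokens), 1)
    return {
        "description_tokens": len(description_tokens),
        "abstract_tokens": len(abstract_tokens),
        "compression_ratio": ratio,
    }


# Hypothetical, heavily shortened record with the same two keys as a real instance.
example = {
    "description": (
        "FIELD OF THE INVENTION \n [0001] This invention relates to novel "
        "calcium phosphate-coated implantable medical devices and processes "
        "of making same."
    ),
    "abstract": (
        "This invention relates to novel calcium phosphate-coated "
        "implantable medical devices"
    ),
}

stats = summarize_instance(example)
print(stats)
```

Whitespace splitting is only a rough proxy for length; a tokenizer matched to the downstream model would give different counts.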
| | train | validation | test |
|---|---|---|---|
| all | 1207222 | 67068 | 67072 |
| a | 174134 | 9674 | 9675 |
| b | 161520 | 8973 | 8974 |
| c | 101042 | 5613 | 5614 |
| d | 10164 | 565 | 565 |
| e | 34443 | 1914 | 1914 |
| f | 85568 | 4754 | 4754 |
| g | 258935 | 14385 | 14386 |
| h | 257019 | 14279 | 14279 |
| y | 124397 | 6911 | 6911 |
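As a quick consistency check on the table above, the sketch below sums the per-code rows and compares them with the "all" row, then derives the share of each split. The numbers are copied from the table; the names `SPLIT_SIZES`, `ALL_SIZES`, and `column_totals` are illustrative and do not belong to any library.

```python
# (train, validation, test) sizes per CPC code, copied from the table.
SPLIT_SIZES = {
    "a": (174134, 9674, 9675),
    "b": (161520, 8973, 8974),
    "c": (101042, 5613, 5614),
    "d": (10164, 565, 565),
    "e": (34443, 1914, 1914),
    "f": (85568, 4754, 4754),
    "g": (258935, 14385, 14386),
    "h": (257019, 14279, 14279),
    "y": (124397, 6911, 6911),
}
ALL_SIZES = (1207222, 67068, 67072)


def column_totals(sizes):
    """Sum each split column across all CPC codes."""
    return tuple(sum(column) for column in zip(*sizes.values()))


totals = column_totals(SPLIT_SIZES)
total_records = sum(ALL_SIZES)
# Share of each split in the full dataset, rounded to three decimals.
fractions = tuple(round(n / total_records, 3) for n in ALL_SIZES)

print("per-code totals match 'all':", totals == ALL_SIZES)
print("train/validation/test shares:", fractions)
```

The per-code rows add up exactly to the "all" row, and the split is close to 90/5/5.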
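[More Information Needed]
[More Information Needed]
Who are the source language producers? [More Information Needed]
[More Information Needed]
Who are the annotators? [More Information Needed]
[More Information Needed]
[More Information Needed]
[More Information Needed]
[More Information Needed]
[More Information Needed]
[More Information Needed]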
@article{DBLP:journals/corr/abs-1906-03741,
author = {Eva Sharma and
Chen Li and
Lu Wang},
title = {{BIGPATENT:} {A} Large-Scale Dataset for Abstractive and Coherent
Summarization},
journal = {CoRR},
volume = {abs/1906.03741},
year = {2019},
url = {http://arxiv.org/abs/1906.03741},
eprinttype = {arXiv},
eprint = {1906.03741},
timestamp = {Wed, 26 Jun 2019 07:14:58 +0200},
biburl = {https://dblp.org/rec/journals/corr/abs-1906-03741.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
Thanks to @mattbui for adding this dataset.