Dataset: wkrl/cord
Tasks:
Sub-tasks: parsing
Languages:
Multilinguality: monolingual
Size: 1K<n<10K
Language creators: crowdsourced
Annotation creators: crowdsourced
Source datasets: original
License:
[More Information Needed]
[More Information Needed]
[More Information Needed]
{
    "id": datasets.Value("string"),
    "words": datasets.Sequence(datasets.Value("string")),
    "bboxes": datasets.Sequence(datasets.Sequence(datasets.Value("int64"))),
    "labels": datasets.Sequence(datasets.features.ClassLabel(names=_LABELS)),
    "images": datasets.features.Image(),
}
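The schema above suggests that each record carries parallel per-token lists. The following is a minimal sketch of a consistency check for one record, written in plain Python so it runs without the `datasets` library. It rests on two assumptions not stated in the card: that `words`, `bboxes`, and `labels` have one entry per token, and that each box holds four integers such as `[x0, y0, x1, y1]`. The helper name `check_record` and the sample record are hypothetical, and the `images` field is not inspected.

```python
from typing import Any, Dict, List


def check_record(record: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in one record (empty if it looks consistent)."""
    problems: List[str] = []
    words = record.get("words", [])
    bboxes = record.get("bboxes", [])
    labels = record.get("labels", [])

    if not isinstance(record.get("id"), str):
        problems.append("id is not a string")

    # Assumption: words, bboxes and labels are parallel lists, one entry per token.
    if not (len(words) == len(bboxes) == len(labels)):
        problems.append(
            f"length mismatch: {len(words)} words, "
            f"{len(bboxes)} bboxes, {len(labels)} labels"
        )

    for i, box in enumerate(bboxes):
        # Assumption: a box is four integers; the schema only says "sequence of int64".
        if len(box) != 4 or not all(isinstance(v, int) for v in box):
            problems.append(f"bbox {i} is not four integers: {box!r}")

    return problems


# Hypothetical record invented for illustration (not taken from the dataset).
example = {
    "id": "0",
    "words": ["Latte", "4.50"],
    "bboxes": [[10, 20, 80, 40], [200, 20, 250, 40]],
    "labels": [0, 1],
}
print(check_record(example))
```

A check like this could be run over a few records after loading to confirm the assumed alignment before feeding the data to a token-classification model.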
Creative Commons Attribution 4.0 International License
@article{park2019cord,
  title={CORD: A Consolidated Receipt Dataset for Post-OCR Parsing},
  author={Park, Seunghyun and Shin, Seung and Lee, Bado and Lee, Junyeop and Surh, Jaeheung and Seo, Minjoon and Lee, Hwalsuk},
  booktitle={Document Intelligence Workshop at Neural Information Processing Systems},
  year={2019}
}
Thanks to @clovaai for adding this dataset.