Dataset: dali-does/clevr-math
Tasks:
Languages:
Multilinguality: monolingual
Language creators: machine-generated
Annotation creators: machine-generated
Source datasets: clevr
Preprint: arxiv:2208.05358
License:
A dataset for compositional multimodal mathematical reasoning, based on CLEVR.
Using with CLIP

from datasets import load_dataset, DownloadConfig
from transformers import CLIPProcessor

# Download settings: resume interrupted downloads, use 8 processes,
# and re-download even if a cached copy exists.
dl_config = DownloadConfig(resume_download=True,
                           num_proc=8,
                           force_download=True)

# Load 'general' instance of dataset
dataset = load_dataset('dali-does/clevr-math', download_config=dl_config)

# Load version with only multihop in test data
dataset_multihop = load_dataset('dali-does/clevr-math', 'multihop',
                                download_config=dl_config)

model_path = "openai/clip-vit-base-patch32"
extractor = CLIPProcessor.from_pretrained(model_path)

def transform_tokenize(e):
    # Convert images to RGB, then tokenize the questions and preprocess the images.
    e['image'] = [image.convert('RGB') for image in e['image']]
    return extractor(text=e['question'],
                     images=e['image'],
                     padding=True)

# map() takes no padding argument; padding is configured in the processor call above.
dataset = dataset.map(transform_tokenize,
                      batched=True,
                      num_proc=8)

# Keep only the examples whose template name starts with 'subtraction'
dataset_subtraction = dataset.filter(lambda e:
                                     e['template'].startswith('subtraction'), num_proc=4)
The code above loads the data and preprocesses the text with CLIP. A leaderboard will be published at a later date.
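The `filter` call in the loading example selects rows by a prefix of the `template` field. The following is a minimal sketch of the same prefix logic on plain Python dictionaries, so it runs without downloading anything. The rows, the question texts, and every template name other than the `subtraction` prefix are hypothetical placeholders, not values confirmed by this card.

```python
from collections import Counter

# Hypothetical rows that mimic the 'template', 'question' and 'label' fields.
# Template names and question texts are illustrative assumptions.
rows = [
    {"template": "subtraction", "question": "placeholder question 1", "label": 3},
    {"template": "subtraction-multihop", "question": "placeholder question 2", "label": 1},
    {"template": "addition", "question": "placeholder question 3", "label": 5},
]


def select_by_template_prefix(rows, prefix):
    """Return the rows whose 'template' value starts with the given prefix."""
    return [row for row in rows if row["template"].startswith(prefix)]


def template_counts(rows):
    """Count how many rows use each template name."""
    return Counter(row["template"] for row in rows)


subtraction_rows = select_by_template_prefix(rows, "subtraction")
counts = template_counts(rows)
```

Because the match is on a prefix, any template whose name merely begins with `subtraction` is also selected. Counting template names first, as `template_counts` does, is one way to check which rows a given prefix will capture.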
The dataset currently supports English only. Extending it to other languages requires rewriting the CLEVR templates in the target language.
import datasets

features = datasets.Features(
    {
        "template": datasets.Value("string"),
        "id": datasets.Value("string"),
        "question": datasets.Value("string"),
        "image": datasets.Image(),
        "label": datasets.Value("int64")
    }
)
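As a rough illustration of what the schema above implies for a single example, the sketch below checks a plain dictionary against the five field names. It is an assumption-laden helper, not part of the dataset's tooling: `validate_example` and `EXPECTED_TYPES` are hypothetical names, the sample values are made up, and the `image` field is only checked for presence because its decoded type depends on the image library in use.

```python
# Expected Python type for each scalar field in the schema.
# 'image' is handled separately: only its presence is checked.
EXPECTED_TYPES = {"template": str, "id": str, "question": str, "label": int}


def validate_example(example):
    """Return a list of problems found in one example dict (empty if none)."""
    problems = []
    if "image" not in example:
        problems.append("missing field: image")
    for name, expected in EXPECTED_TYPES.items():
        if name not in example:
            problems.append(f"missing field: {name}")
        elif not isinstance(example[name], expected):
            actual = type(example[name]).__name__
            problems.append(f"{name}: expected {expected.__name__}, got {actual}")
    return problems


# Hypothetical example; the values are placeholders, not real dataset content.
good = {
    "template": "subtraction",
    "id": "hypothetical-id",
    "question": "placeholder question",
    "image": None,  # stands in for a decoded image object
    "label": 4,
}
bad = dict(good, label="4")  # label given as a string instead of an integer

good_problems = validate_example(good)
bad_problems = validate_example(bad)
```

A check like this can be useful before training, since `label` is an integer count rather than a class name or free text.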
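train/validation/test
The data is generated with the code provided with the CLEVR dataset, using Blender and templates constructed by the dataset curators.
[More Information Needed]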
Adam Dahlgren Lindström - dali@cs.umu.se
Licensed under the Creative Commons Attribution-ShareAlike 4.0 International license (CC BY-SA 4.0).
[More Information Needed]
@misc{https://doi.org/10.48550/arxiv.2208.05358,
  doi = {10.48550/ARXIV.2208.05358},
  url = {https://arxiv.org/abs/2208.05358},
  author = {Lindström, Adam Dahlgren and Abraham, Savitha Sam},
  keywords = {Machine Learning (cs.LG), Computation and Language (cs.CL), Computer Vision and Pattern Recognition (cs.CV), FOS: Computer and information sciences, FOS: Computer and information sciences, I.2.7; I.2.10; I.2.6; I.4.8; I.1.4},
  title = {CLEVR-Math: A Dataset for Compositional Language, Visual, and Mathematical Reasoning},
  publisher = {arXiv},
  year = {2022},
  copyright = {Creative Commons Attribution Share Alike 4.0 International}
}
Thanks to @dali-does for adding this dataset.