CMMLU: Massive Multitask Language Understanding Evaluation in Chinese

Homepage: https://github.com/haonan-li/CMMLU
Repository: https://huggingface.co/datasets/haonan-li/cmmlu
Paper: CMMLU: Measuring Massive Multitask Language Understanding in Chinese

Introduction

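CMMLU is a comprehensive Chinese evaluation suite designed to assess the advanced knowledge and reasoning abilities of LLMs in the context of Chinese language and culture. It covers 67 topics ranging from elementary to advanced professional level, including subjects that require computational expertise, such as physics and mathematics, as well as disciplines in the humanities and social sciences. Because of their context-specific nuances and wording, many of these tasks cannot easily be translated from other languages. In addition, many tasks in CMMLU have answers that are specific to China and do not necessarily hold in other regions or languages.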

Leaderboard

The latest leaderboard is maintained in our GitHub repository.

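Data

For each topic we provide a development set and a test set. The development set contains 5 questions and the test set contains 100+ questions.

Each question in the dataset is a multiple-choice question with 4 options, exactly one of which is correct.

Here are two examples: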

    题目：同一物种的两类细胞各产生一种分泌蛋白，组成这两种蛋白质的各种氨基酸含量相同，但排列顺序不同。其原因是参与这两种蛋白质合成的：
    A. tRNA种类不同
    B. 同一密码子所决定的氨基酸不同
    C. mRNA碱基序列不同
    D. 核糖体成分不同
    答案是：C

    题目：某种植物病毒V是通过稻飞虱吸食水稻汁液在水稻间传播的。稻田中青蛙数量的增加可减少该病毒在水稻间的传播。下列叙述正确的是：
    A. 青蛙与稻飞虱是捕食关系
    B. 水稻和病毒V是互利共生关系
    C. 病毒V与青蛙是寄生关系
    D. 水稻与青蛙是竞争关系
    答案是：
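
The layout of the two examples above can be reproduced programmatically. The following is a minimal sketch, not the authors' evaluation code: it assumes each record exposes the columns `Question`, `A`, `B`, `C`, `D`, and `Answer` (verify this against the output of `print(cmmlu['test'][0])`), and the helper names `format_example` and `build_prompt` are hypothetical.

```python
# Assumed column names; check them against a real record before relying on them.
CHOICES = ("A", "B", "C", "D")


def format_example(record, include_answer=True):
    """Render one question in the layout used by the examples above."""
    lines = ["题目：" + record["Question"]]
    for letter in CHOICES:
        lines.append(f"{letter}. {record[letter]}")
    # Leave the answer slot empty for the question the model must answer.
    answer = record["Answer"] if include_answer else ""
    lines.append("答案是：" + answer)
    return "\n".join(lines)


def build_prompt(solved_records, open_record):
    """Join solved questions with one unanswered question."""
    parts = [format_example(r) for r in solved_records]
    parts.append(format_example(open_record, include_answer=False))
    return "\n\n".join(parts)


# The two records below are copied from the examples above.
solved = {
    "Question": "同一物种的两类细胞各产生一种分泌蛋白，组成这两种蛋白质的各种氨基酸含量相同，但排列顺序不同。其原因是参与这两种蛋白质合成的：",
    "A": "tRNA种类不同",
    "B": "同一密码子所决定的氨基酸不同",
    "C": "mRNA碱基序列不同",
    "D": "核糖体成分不同",
    "Answer": "C",
}
unsolved = {
    "Question": "某种植物病毒V是通过稻飞虱吸食水稻汁液在水稻间传播的。稻田中青蛙数量的增加可减少该病毒在水稻间的传播。下列叙述正确的是：",
    "A": "青蛙与稻飞虱是捕食关系",
    "B": "水稻和病毒V是互利共生关系",
    "C": "病毒V与青蛙是寄生关系",
    "D": "水稻与青蛙是竞争关系",
}

prompt = build_prompt([solved], unsolved)
print(prompt)
```

Passing the 5 development questions of a topic as `solved_records` would give a 5-shot prompt; whether that matches a particular evaluation setup is left to the reader to confirm.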

Load data

from datasets import load_dataset

# Load a single subject ('agronomy') and print the first test question.
cmmlu = load_dataset("haonan-li/cmmlu", "agronomy")
print(cmmlu['test'][0])

Load all data at once

task_list = ['agronomy', 'anatomy', 'ancient_chinese', 'arts', 'astronomy', 'business_ethics', 'chinese_civil_service_exam', 'chinese_driving_rule', 'chinese_food_culture', 'chinese_foreign_policy', 'chinese_history', 'chinese_literature', 
'chinese_teacher_qualification', 'clinical_knowledge', 'college_actuarial_science', 'college_education', 'college_engineering_hydrology', 'college_law', 'college_mathematics', 'college_medical_statistics', 'college_medicine', 'computer_science',
'computer_security', 'conceptual_physics', 'construction_project_management', 'economics', 'education', 'electrical_engineering', 'elementary_chinese', 'elementary_commonsense', 'elementary_information_and_technology', 'elementary_mathematics', 
'ethnology', 'food_science', 'genetics', 'global_facts', 'high_school_biology', 'high_school_chemistry', 'high_school_geography', 'high_school_mathematics', 'high_school_physics', 'high_school_politics', 'human_sexuality',
'international_law', 'journalism', 'jurisprudence', 'legal_and_moral_basis', 'logical', 'machine_learning', 'management', 'marketing', 'marxist_theory', 'modern_chinese', 'nutrition', 'philosophy', 'professional_accounting', 'professional_law', 
'professional_medicine', 'professional_psychology', 'public_relations', 'security_study', 'sociology', 'sports_science', 'traditional_chinese_medicine', 'virology', 'world_history', 'world_religions']

from datasets import load_dataset

# Build a dict that maps each subject name to its loaded dataset.
cmmlu = {k: load_dataset("haonan-li/cmmlu", k) for k in task_list}
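
Once every subject is loaded, it can be useful to check the split sizes against the figures stated in the Data section (5 development questions and 100+ test questions per topic). The sketch below is illustrative only: `summarize_splits` is a hypothetical helper, the split name `dev` is an assumption (only `test` appears in the snippets here), and the `toy` dictionary is placeholder data so the example runs without downloading anything.

```python
def summarize_splits(datasets_by_subject, splits=("dev", "test")):
    """Return {subject: {split: number_of_questions}} for the splits present."""
    return {
        subject: {split: len(data[split]) for split in splits if split in data}
        for subject, data in datasets_by_subject.items()
    }


# Placeholder stand-in for the `cmmlu` dict built above; the sizes are made up.
toy = {"agronomy": {"dev": [None] * 5, "test": [None] * 120}}

summary = summarize_splits(toy)
print(summary)  # {'agronomy': {'dev': 5, 'test': 120}}
```

With the real data, `summarize_splits(cmmlu)` should work the same way, since each loaded subject can be indexed by split name and supports `len`.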

Citation

@misc{li2023cmmlu,
      title={CMMLU: Measuring massive multitask language understanding in Chinese}, 
      author={Haonan Li and Yixuan Zhang and Fajri Koto and Yifei Yang and Hai Zhao and Yeyun Gong and Nan Duan and Timothy Baldwin},
      year={2023},
      eprint={2306.09212},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

License

The CMMLU dataset is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Author:

haonan-li

Dataset size:

1.04 MB