Datasets:
gem
Preprint:
arxiv:2102.01672
License:
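GEM is a benchmark environment for natural language generation with a focus on evaluation, both through human annotations and automated metrics.
GEM aims to:
It is our goal to regularly update GEM and to encourage more inclusive practices in dataset development by extending existing data or developing datasets for additional languages.
You can find more complete information in the dataset cards for each of the subsets:
The subsets are organized by task: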
{
"summarization": {
"mlsum": ["mlsum_de", "mlsum_es"],
"wiki_lingua": ["wiki_lingua_es_en", "wiki_lingua_ru_en", "wiki_lingua_tr_en", "wiki_lingua_vi_en"],
"xsum": ["xsum"],
},
"struct2text": {
"common_gen": ["common_gen"],
"cs_restaurants": ["cs_restaurants"],
"dart": ["dart"],
"e2e": ["e2e_nlg"],
"totto": ["totto"],
"web_nlg": ["web_nlg_en", "web_nlg_ru"],
},
"simplification": {
"wiki_auto_asset_turk": ["wiki_auto_asset_turk"],
},
"dialog": {
"schema_guided_dialog": ["schema_guided_dialog"],
},
}
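The listing above is a nested mapping from task to dataset to configuration names. The following is a minimal sketch of how it could be flattened or queried in Python; the helper names `all_configs` and `task_of` are illustrative assumptions and not part of GEM itself.

```python
# Task -> dataset -> configuration names, copied from the listing above.
GEM_SUBSETS = {
    "summarization": {
        "mlsum": ["mlsum_de", "mlsum_es"],
        "wiki_lingua": ["wiki_lingua_es_en", "wiki_lingua_ru_en",
                        "wiki_lingua_tr_en", "wiki_lingua_vi_en"],
        "xsum": ["xsum"],
    },
    "struct2text": {
        "common_gen": ["common_gen"],
        "cs_restaurants": ["cs_restaurants"],
        "dart": ["dart"],
        "e2e": ["e2e_nlg"],
        "totto": ["totto"],
        "web_nlg": ["web_nlg_en", "web_nlg_ru"],
    },
    "simplification": {
        "wiki_auto_asset_turk": ["wiki_auto_asset_turk"],
    },
    "dialog": {
        "schema_guided_dialog": ["schema_guided_dialog"],
    },
}


def all_configs(subsets=GEM_SUBSETS):
    """Return every configuration name, in listing order."""
    return [
        config
        for datasets in subsets.values()
        for names in datasets.values()
        for config in names
    ]


def task_of(config, subsets=GEM_SUBSETS):
    """Return the task that a configuration name is listed under."""
    for task, datasets in subsets.items():
        for names in datasets.values():
            if config in names:
                return task
    raise KeyError(f"unknown configuration: {config}")
```

For example, `task_of("web_nlg_ru")` returns `"struct2text"`.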
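Each example has a single target in the training set, and a set of references (with one or more items) in the validation and test sets.
An example of a common_gen validation instance looks as follows.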
{'concept_set_id': 0,
'concepts': ['field', 'look', 'stand'],
'gem_id': 'common_gen-validation-0',
'references': ['The player stood in the field looking at the batter.',
'The coach stands along the field, looking at the goalkeeper.',
'I stood and looked across the field, peacefully.',
'Someone stands, looking around the empty field.'],
'target': 'The player stood in the field looking at the batter.'}
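Because validation and test examples carry a list of `references` while training examples carry a single `target`, evaluation code may want one accessor that works on both. The sketch below assumes that a training example either lacks the `references` field or has it empty; that assumption, and the helper name `evaluation_references`, are illustrative rather than taken from GEM.

```python
def evaluation_references(example):
    """Return the reference texts to compare a system output against.

    Uses the `references` list when it is present and non-empty, and
    otherwise falls back to a one-element list holding `target`
    (assumed here to be the situation for training examples).
    """
    references = example.get("references") or []
    return list(references) if references else [example["target"]]


validation_example = {
    "gem_id": "common_gen-validation-0",
    "references": [
        "The player stood in the field looking at the batter.",
        "The coach stands along the field, looking at the goalkeeper.",
        "I stood and looked across the field, peacefully.",
        "Someone stands, looking around the empty field.",
    ],
    "target": "The player stood in the field looking at the batter.",
}
refs = evaluation_references(validation_example)
```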
An example of a cs_restaurants validation instance looks as follows.
{'dialog_act': '?request(area)',
'dialog_act_delexicalized': '?request(area)',
'gem_id': 'cs_restaurants-validation-0',
'references': ['Jakou lokalitu hledáte ?'],
'target': 'Jakou lokalitu hledáte ?',
'target_delexicalized': 'Jakou lokalitu hledáte ?'}
An example of a dart validation instance looks as follows.
{'dart_id': 0,
'gem_id': 'dart-validation-0',
'references': ['A school from Mars Hill, North Carolina, joined in 1973.'],
'subtree_was_extended': True,
'target': 'A school from Mars Hill, North Carolina, joined in 1973.',
'target_sources': ['WikiSQL_decl_sents'],
'tripleset': [['Mars Hill College', 'JOINED', '1973'], ['Mars Hill College', 'LOCATION', 'Mars Hill, North Carolina']]}
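The `tripleset` field is a list of `[subject, predicate, object]` lists. Text-to-text models need it as a single string, and one possible way to produce that is sketched below. The separators and the function name `linearize_tripleset` are assumptions chosen for illustration; GEM does not prescribe a linearization.

```python
def linearize_tripleset(tripleset, triple_sep=" && ", field_sep=" | "):
    """Join each triple's fields, then join the triples, into one string."""
    return triple_sep.join(field_sep.join(triple) for triple in tripleset)


tripleset = [
    ["Mars Hill College", "JOINED", "1973"],
    ["Mars Hill College", "LOCATION", "Mars Hill, North Carolina"],
]
linearized = linearize_tripleset(tripleset)
```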
An example of an e2e_nlg validation instance looks as follows.
{'gem_id': 'e2e_nlg-validation-0',
'meaning_representation': 'name[Alimentum], area[city centre], familyFriendly[no]',
'references': ['There is a place in the city centre, Alimentum, that is not family-friendly.'],
'target': 'There is a place in the city centre, Alimentum, that is not family-friendly.'}
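The `meaning_representation` string above packs slot-value pairs in the form `slot[value]`, separated by commas. A minimal sketch of turning it into a dictionary follows; the format is inferred only from the example shown, and the function name `parse_meaning_representation` is hypothetical.

```python
import re

# A slot name (anything except brackets and commas) followed by a bracketed value.
_SLOT_PATTERN = re.compile(r"\s*([^\[\],]+)\[([^\]]*)\]")


def parse_meaning_representation(mr):
    """Parse 'slot[value], slot[value]' into a {slot: value} dictionary."""
    return {
        match.group(1).strip(): match.group(2)
        for match in _SLOT_PATTERN.finditer(mr)
    }


slots = parse_meaning_representation(
    "name[Alimentum], area[city centre], familyFriendly[no]"
)
```

Values that themselves contain `]` would not be handled by this pattern.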
An example of a mlsum_de validation instance looks as follows.
{'date': '00/04/2019',
'gem_id': 'mlsum_de-validation-0',
'references': ['In einer Kleinstadt auf der Insel Usedom war eine junge Frau tot in ihrer Wohnung gefunden worden. Nun stehen zwei Bekannte unter Verdacht.'],
'target': 'In einer Kleinstadt auf der Insel Usedom war eine junge Frau tot in ihrer Wohnung gefunden worden. Nun stehen zwei Bekannte unter Verdacht.',
'text': 'Kerzen und Blumen stehen vor dem Eingang eines Hauses, in dem eine 18-jährige Frau tot aufgefunden wurde. In einer Kleinstadt auf der Insel Usedom war eine junge Frau tot in ...',
'title': 'Tod von 18-Jähriger auf Usedom: Zwei Festnahmen',
'topic': 'panorama',
'url': 'https://www.sueddeutsche.de/panorama/usedom-frau-tot-festnahme-verdaechtige-1.4412256'}
An example of a mlsum_es validation instance looks as follows.
{'date': '05/01/2019',
'gem_id': 'mlsum_es-validation-0',
'references': ['El diseñador que dio carta de naturaleza al estilo genuinamente americano celebra el medio siglo de su marca entre grandes fastos y problemas financieros. Conectar con las nuevas generaciones es el regalo que precisa más que nunca'],
'target': 'El diseñador que dio carta de naturaleza al estilo genuinamente americano celebra el medio siglo de su marca entre grandes fastos y problemas financieros. Conectar con las nuevas generaciones es el regalo que precisa más que nunca',
'text': 'Un oso de peluche marcándose un heelflip de monopatín es todo lo que Ralph Lauren necesitaba esta Navidad. Estampado en un jersey de lana azul marino, supone la guinda que corona ...',
'title': 'Ralph Lauren busca el secreto de la eterna juventud',
'topic': 'elpais estilo',
'url': 'http://elpais.com/elpais/2019/01/04/estilo/1546617396_933318.html'}
An example of a schema_guided_dialog validation instance looks as follows.
{'dialog_acts': [{'act': 2, 'slot': 'song_name', 'values': ['Carnivore']}, {'act': 2, 'slot': 'playback_device', 'values': ['TV']}],
'dialog_id': '10_00054',
'gem_id': 'schema_guided_dialog-validation-0',
'prompt': 'Yes, I would.',
'references': ['Please confirm the song Carnivore on tv.'],
'target': 'Please confirm the song Carnivore on tv.',
'turn_id': 15}
An example of a totto validation instance looks as follows.
{'example_id': '7391450717765563190',
'gem_id': 'totto-validation-0',
'highlighted_cells': [[3, 0], [3, 2], [3, 3]],
'overlap_subset': 'True',
'references': ['Daniel Henry Chamberlain was the 76th Governor of South Carolina from 1874.',
'Daniel Henry Chamberlain was the 76th Governor of South Carolina, beginning in 1874.',
'Daniel Henry Chamberlain was the 76th Governor of South Carolina who took office in 1874.'],
'sentence_annotations': [{'final_sentence': 'Daniel Henry Chamberlain was the 76th Governor of South Carolina from 1874.',
'original_sentence': 'Daniel Henry Chamberlain (June 23, 1835 – April 13, 1907) was an American planter, lawyer, author and the 76th Governor of South Carolina '
'from 1874 until 1877.',
'sentence_after_ambiguity': 'Daniel Henry Chamberlain was the 76th Governor of South Carolina from 1874.',
'sentence_after_deletion': 'Daniel Henry Chamberlain was the 76th Governor of South Carolina from 1874.'},
...
],
'table': [[{'column_span': 1, 'is_header': True, 'row_span': 1, 'value': '#'},
{'column_span': 2, 'is_header': True, 'row_span': 1, 'value': 'Governor'},
{'column_span': 1, 'is_header': True, 'row_span': 1, 'value': 'Took Office'},
{'column_span': 1, 'is_header': True, 'row_span': 1, 'value': 'Left Office'}],
[{'column_span': 1, 'is_header': True, 'row_span': 1, 'value': '74'},
{'column_span': 1, 'is_header': False, 'row_span': 1, 'value': '-'},
{'column_span': 1, 'is_header': False, 'row_span': 1, 'value': 'Robert Kingston Scott'},
{'column_span': 1, 'is_header': False, 'row_span': 1, 'value': 'July 6, 1868'}],
...
],
'table_page_title': 'List of Governors of South Carolina',
'table_section_text': 'Parties Democratic Republican',
'table_section_title': 'Governors under the Constitution of 1868',
'table_webpage_url': 'http://en.wikipedia.org/wiki/List_of_Governors_of_South_Carolina',
'target': 'Daniel Henry Chamberlain was the 76th Governor of South Carolina from 1874.',
'totto_id': 0}
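In the example above, `table` is a list of rows of cell dictionaries, and `highlighted_cells` appears to hold `[row_index, column_index]` pairs into it. The sketch below looks up the highlighted values under that assumption. Since the table in the example is elided, the code uses a small hypothetical table rather than the real one, and the function name `highlighted_values` is illustrative.

```python
def highlighted_values(table, highlighted_cells):
    """Return the `value` of each cell addressed by a [row, column] pair."""
    return [table[row][col]["value"] for row, col in highlighted_cells]


def cell(value, is_header=False):
    """Build a cell dictionary with the same keys as the example above."""
    return {"column_span": 1, "is_header": is_header, "row_span": 1, "value": value}


# Hypothetical two-row table; NOT the actual table from the example.
toy_table = [
    [cell("#", True), cell("Governor", True), cell("Took Office", True)],
    [cell("76"), cell("Daniel Henry Chamberlain"), cell("1874")],
]
values = highlighted_values(toy_table, [[1, 0], [1, 2]])
```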
An example of a web_nlg_en validation instance looks as follows.
{'category': 'Airport',
'gem_id': 'web_nlg_en-validation-0',
'input': ['Aarhus | leader | Jacob_Bundsgaard'],
'references': ['The leader of Aarhus is Jacob Bundsgaard.'],
'target': 'The leader of Aarhus is Jacob Bundsgaard.',
'webnlg_id': 'dev/Airport/1/Id1'}
An example of a web_nlg_ru validation instance looks as follows.
{'category': 'Airport',
'gem_id': 'web_nlg_ru-validation-0',
'input': ['Punjab,_Pakistan | leaderTitle | Provincial_Assembly_of_the_Punjab'],
'references': ['Пенджаб, Пакистан, возглавляется Провинциальной ассамблеей Пенджаба.', 'Пенджаб, Пакистан возглавляется Провинциальной ассамблеей Пенджаба.'],
'target': 'Пенджаб, Пакистан, возглавляется Провинциальной ассамблеей Пенджаба.',
'webnlg_id': 'dev/Airport/1/Id1'}
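In both web_nlg examples above, each `input` item is a string of the form `subject | predicate | object`, with underscores standing in for spaces inside entity names. A minimal sketch of splitting such a string follows. The format is inferred from the two examples shown, and the names `Triple`, `parse_triple` and `readable` are hypothetical.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def parse_triple(text):
    """Split 'subject | predicate | object' into a Triple."""
    parts = [part.strip() for part in text.split(" | ")]
    if len(parts) != 3:
        raise ValueError(f"expected 3 fields, got {len(parts)}: {text!r}")
    return Triple(*parts)


def readable(entity):
    """Replace underscores with spaces for display purposes."""
    return entity.replace("_", " ")


triple = parse_triple("Punjab,_Pakistan | leaderTitle | Provincial_Assembly_of_the_Punjab")
```

Here `readable(triple.subject)` gives `"Punjab, Pakistan"`.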
An example of a wiki_auto_asset_turk validation instance looks as follows.
{'gem_id': 'wiki_auto_asset_turk-validation-0',
'references': ['The Gandalf Awards honor excellent writing in in fantasy literature.'],
'source': 'The Gandalf Awards, honoring achievement in fantasy literature, were conferred by the World Science Fiction Society annually from 1974 to 1981.',
'source_id': '350_691837-1-0-0',
'target': 'The Gandalf Awards honor excellent writing in in fantasy literature.',
'target_id': '350_691837-0-0-0'}
An example of a wiki_lingua_es_en validation instance looks as follows.
{'references': ["Practice matted hair prevention from early in your cat's life. Make sure that your cat is grooming itself effectively. Keep a close eye on cats with long hair."],
'source': 'Muchas personas presentan problemas porque no cepillaron el pelaje de sus gatos en una etapa temprana de su vida, ya que no lo consideraban necesario. Sin embargo, a medida que...',
'target': "Practice matted hair prevention from early in your cat's life. Make sure that your cat is grooming itself effectively. Keep a close eye on cats with long hair."}
An example of a wiki_lingua_ru_en validation instance looks as follows.
{'gem_id': 'wiki_lingua_ru_en-val-0',
'references': ['Get immediate medical care if you notice signs of a complication. Undergo diagnostic tests to check for gallstones and complications. Ask your doctor about your treatment '
'options.'],
'source': 'И хотя, скорее всего, вам не о чем волноваться, следует незамедлительно обратиться к врачу, если вы подозреваете, что у вас возникло осложнение желчекаменной болезни. Это ...',
'target': 'Get immediate medical care if you notice signs of a complication. Undergo diagnostic tests to check for gallstones and complications. Ask your doctor about your treatment '
'options.'}
An example of a wiki_lingua_tr_en validation instance looks as follows.
{'gem_id': 'wiki_lingua_tr_en-val-0',
'references': ['Open Instagram. Go to the video you want to download. Tap ⋮. Tap Copy Link. Open Google Chrome. Tap the address bar. Go to the SaveFromWeb site. Tap the "Paste Instagram Video" text box. Tap and hold the text box. Tap PASTE. Tap Download. Download the video. Find the video on your Android.'],
'source': 'Instagram uygulamasının çok renkli kamera şeklindeki simgesine dokun. Daha önce giriş yaptıysan Instagram haber kaynağı açılır. Giriş yapmadıysan istendiğinde e-posta adresini ...',
'target': 'Open Instagram. Go to the video you want to download. Tap ⋮. Tap Copy Link. Open Google Chrome. Tap the address bar. Go to the SaveFromWeb site. Tap the "Paste Instagram Video" text box. Tap and hold the text box. Tap PASTE. Tap Download. Download the video. Find the video on your Android.'}
An example of a wiki_lingua_vi_en validation instance looks as follows.
{'gem_id': 'wiki_lingua_vi_en-val-0',
'references': ['Select the right time of year for planting the tree. You will usually want to plant your tree when it is dormant, or not flowering, during cooler or colder times of year.'],
'source': 'Bạn muốn cung cấp cho cây cơ hội tốt nhất để phát triển và sinh tồn. Trồng cây đúng thời điểm trong năm chính là yếu tố then chốt. Thời điểm sẽ thay đổi phụ thuộc vào loài cây ...',
'target': 'Select the right time of year for planting the tree. You will usually want to plant your tree when it is dormant, or not flowering, during cooler or colder times of year.'}
An example of a xsum validation instance looks as follows.
{'document': 'Burberry reported pre-tax profits of £166m for the year to March. A year ago it made a loss of £16.1m, hit by charges at its Spanish operations.\n'
'In the past year it has opened 21 new stores and closed nine. It plans to open 20-30 stores this year worldwide.\n'
'The group has also focused on promoting the Burberry brand online...',
'gem_id': 'xsum-validation-0',
'references': ['Luxury fashion designer Burberry has returned to profit after opening new stores and spending more on online marketing'],
'target': 'Luxury fashion designer Burberry has returned to profit after opening new stores and spending more on online marketing',
'xsum_id': '10162122'}
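Every example above carries a `gem_id` that appears to follow the pattern `<subset>-<split>-<index>`, with the wiki_lingua subsets writing the split as `val` rather than `validation`. The sketch below parses an identifier under that assumption. The pattern is inferred only from the examples shown, and the function name `parse_gem_id` and the alias table are illustrative.

```python
# Assumed alias seen in the wiki_lingua examples ('val' for 'validation').
SPLIT_ALIASES = {"val": "validation"}


def parse_gem_id(gem_id):
    """Split '<subset>-<split>-<index>' into (subset, split, index).

    Splitting from the right keeps the function correct even if a subset
    name were to contain a hyphen.
    """
    subset, split, index = gem_id.rsplit("-", 2)
    return subset, SPLIT_ALIASES.get(split, split), int(index)


parsed = parse_gem_id("wiki_lingua_ru_en-val-0")
```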
The data fields are the same among all splits.
| subset | train | validation | test |
|---|---|---|---|
| common_gen | 67389 | 993 | 1497 |
| cs_restaurants | 3569 | 781 | 842 |
| dart | 62659 | 2768 | 6959 |
| e2e_nlg | 33525 | 4299 | 4693 |
| mlsum_de | 220748 | 11392 | 10695 |
| mlsum_es | 259886 | 9977 | 13365 |
| schema_guided_dialog | 164982 | 10000 | 10000 |
| totto | 121153 | 7700 | 7700 |
| web_nlg_en | 35426 | 1667 | 1779 |
| web_nlg_ru | 14630 | 790 | 1102 |
| wiki_lingua_es_en | 79515 | 8835 | 19797 |
| wiki_lingua_ru_en | 36898 | 4100 | 9094 |
| wiki_lingua_tr_en | 3193 | 355 | 808 |
| wiki_lingua_vi_en | 9206 | 1023 | 2167 |
| xsum | 23206 | 1117 | 1166 |

| subset | train | validation | test_asset | test_turk |
|---|---|---|---|---|
| wiki_auto_asset_turk | 373801 | 73249 | 359 | 359 |
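The split sizes above differ by orders of magnitude between subsets, so it can help to look at totals and proportions rather than raw counts. The sketch below does this for three of the subsets, with the counts copied from the tables; the names `SPLIT_SIZES` and `split_fractions` are illustrative.

```python
# Counts copied from the tables above (three subsets only, for brevity).
SPLIT_SIZES = {
    "common_gen": {"train": 67389, "validation": 993, "test": 1497},
    "wiki_lingua_tr_en": {"train": 3193, "validation": 355, "test": 808},
    "wiki_auto_asset_turk": {
        "train": 373801,
        "validation": 73249,
        "test_asset": 359,
        "test_turk": 359,
    },
}


def split_fractions(sizes):
    """Return each split's share of the subset's total example count."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


common_gen_total = sum(SPLIT_SIZES["common_gen"].values())
common_gen_fractions = split_fractions(SPLIT_SIZES["common_gen"])
```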
CC-BY-SA-4.0
@article{gem_benchmark,
author = {Sebastian Gehrmann and
Tosin P. Adewumi and
Karmanya Aggarwal and
Pawan Sasanka Ammanamanchi and
Aremu Anuoluwapo and
Antoine Bosselut and
Khyathi Raghavi Chandu and
Miruna{-}Adriana Clinciu and
Dipanjan Das and
Kaustubh D. Dhole and
Wanyu Du and
Esin Durmus and
Ondrej Dusek and
Chris Emezue and
Varun Gangal and
Cristina Garbacea and
Tatsunori Hashimoto and
Yufang Hou and
Yacine Jernite and
Harsh Jhamtani and
Yangfeng Ji and
Shailza Jolly and
Dhruv Kumar and
Faisal Ladhak and
Aman Madaan and
Mounica Maddela and
Khyati Mahajan and
Saad Mahamood and
Bodhisattwa Prasad Majumder and
Pedro Henrique Martins and
Angelina McMillan{-}Major and
Simon Mille and
Emiel van Miltenburg and
Moin Nadeem and
Shashi Narayan and
Vitaly Nikolaev and
Rubungo Andre Niyongabo and
Salomey Osei and
Ankur P. Parikh and
Laura Perez{-}Beltrachini and
Niranjan Ramesh Rao and
Vikas Raunak and
Juan Diego Rodriguez and
Sashank Santhanam and
Jo{\~{a}}o Sedoc and
Thibault Sellam and
Samira Shaikh and
Anastasia Shimorina and
Marco Antonio Sobrevilla Cabezudo and
Hendrik Strobelt and
Nishant Subramani and
Wei Xu and
Diyi Yang and
Akhila Yerukola and
Jiawei Zhou},
title = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
Metrics},
journal = {CoRR},
volume = {abs/2102.01672},
year = {2021},
url = {https://arxiv.org/abs/2102.01672},
archivePrefix = {arXiv},
eprint = {2102.01672}
}
Thanks to @yjernite for adding this dataset.