数据集:
gem
预印本库:
arxiv:2102.01672许可:
GEM is a benchmark environment for Natural Language Generation with a focus on its Evaluation, both through human annotations and automated Metrics.
GEM aims to:
It is our goal to regularly update GEM and to encourage toward more inclusive practices in dataset development by extending existing data or developing datasets for additional languages.
You can find more complete information in the dataset cards for each of the subsets:
The subsets are organized by task:
{
"summarization": {
"mlsum": ["mlsum_de", "mlsum_es"],
"wiki_lingua": ["wiki_lingua_es_en", "wiki_lingua_ru_en", "wiki_lingua_tr_en", "wiki_lingua_vi_en"],
"xsum": ["xsum"],
},
"struct2text": {
"common_gen": ["common_gen"],
"cs_restaurants": ["cs_restaurants"],
"dart": ["dart"],
"e2e": ["e2e_nlg"],
"totto": ["totto"],
"web_nlg": ["web_nlg_en", "web_nlg_ru"],
},
"simplification": {
"wiki_auto_asset_turk": ["wiki_auto_asset_turk"],
},
"dialog": {
"schema_guided_dialog": ["schema_guided_dialog"],
},
}
Each example has one target per example in its training set, and a set of references (with one or more items) in its validation and test set.
An example of validation looks as follows.
{'concept_set_id': 0,
'concepts': ['field', 'look', 'stand'],
'gem_id': 'common_gen-validation-0',
'references': ['The player stood in the field looking at the batter.',
'The coach stands along the field, looking at the goalkeeper.',
'I stood and looked across the field, peacefully.',
'Someone stands, looking around the empty field.'],
'target': 'The player stood in the field looking at the batter.'}
cs_restaurants
An example of validation looks as follows.
{'dialog_act': '?request(area)',
'dialog_act_delexicalized': '?request(area)',
'gem_id': 'cs_restaurants-validation-0',
'references': ['Jakou lokalitu hledáte ?'],
'target': 'Jakou lokalitu hledáte ?',
'target_delexicalized': 'Jakou lokalitu hledáte ?'}
dart
An example of validation looks as follows.
{'dart_id': 0,
'gem_id': 'dart-validation-0',
'references': ['A school from Mars Hill, North Carolina, joined in 1973.'],
'subtree_was_extended': True,
'target': 'A school from Mars Hill, North Carolina, joined in 1973.',
'target_sources': ['WikiSQL_decl_sents'],
'tripleset': [['Mars Hill College', 'JOINED', '1973'], ['Mars Hill College', 'LOCATION', 'Mars Hill, North Carolina']]}
e2e_nlg
An example of validation looks as follows.
{'gem_id': 'e2e_nlg-validation-0',
'meaning_representation': 'name[Alimentum], area[city centre], familyFriendly[no]',
'references': ['There is a place in the city centre, Alimentum, that is not family-friendly.'],
'target': 'There is a place in the city centre, Alimentum, that is not family-friendly.'}
mlsum_de
An example of validation looks as follows.
{'date': '00/04/2019',
'gem_id': 'mlsum_de-validation-0',
'references': ['In einer Kleinstadt auf der Insel Usedom war eine junge Frau tot in ihrer Wohnung gefunden worden. Nun stehen zwei Bekannte unter Verdacht.'],
'target': 'In einer Kleinstadt auf der Insel Usedom war eine junge Frau tot in ihrer Wohnung gefunden worden. Nun stehen zwei Bekannte unter Verdacht.',
'text': 'Kerzen und Blumen stehen vor dem Eingang eines Hauses, in dem eine 18-jährige Frau tot aufgefunden wurde. In einer Kleinstadt auf der Insel Usedom war eine junge Frau tot in ...',
'title': 'Tod von 18-Jähriger auf Usedom: Zwei Festnahmen',
'topic': 'panorama',
'url': 'https://www.sueddeutsche.de/panorama/usedom-frau-tot-festnahme-verdaechtige-1.4412256'}
mlsum_es
An example of validation looks as follows.
{'date': '05/01/2019',
'gem_id': 'mlsum_es-validation-0',
'references': ['El diseñador que dio carta de naturaleza al estilo genuinamente americano celebra el medio siglo de su marca entre grandes fastos y problemas financieros. Conectar con las nuevas generaciones es el regalo que precisa más que nunca'],
'target': 'El diseñador que dio carta de naturaleza al estilo genuinamente americano celebra el medio siglo de su marca entre grandes fastos y problemas financieros. Conectar con las nuevas generaciones es el regalo que precisa más que nunca',
'text': 'Un oso de peluche marcándose un heelflip de monopatín es todo lo que Ralph Lauren necesitaba esta Navidad. Estampado en un jersey de lana azul marino, supone la guinda que corona ...',
'title': 'Ralph Lauren busca el secreto de la eterna juventud',
'topic': 'elpais estilo',
'url': 'http://elpais.com/elpais/2019/01/04/estilo/1546617396_933318.html'}
schema_guided_dialog
An example of validation looks as follows.
{'dialog_acts': [{'act': 2, 'slot': 'song_name', 'values': ['Carnivore']}, {'act': 2, 'slot': 'playback_device', 'values': ['TV']}],
'dialog_id': '10_00054',
'gem_id': 'schema_guided_dialog-validation-0',
'prompt': 'Yes, I would.',
'references': ['Please confirm the song Carnivore on tv.'],
'target': 'Please confirm the song Carnivore on tv.',
'turn_id': 15}
totto
An example of validation looks as follows.
{'example_id': '7391450717765563190',
'gem_id': 'totto-validation-0',
'highlighted_cells': [[3, 0], [3, 2], [3, 3]],
'overlap_subset': 'True',
'references': ['Daniel Henry Chamberlain was the 76th Governor of South Carolina from 1874.',
'Daniel Henry Chamberlain was the 76th Governor of South Carolina, beginning in 1874.',
'Daniel Henry Chamberlain was the 76th Governor of South Carolina who took office in 1874.'],
'sentence_annotations': [{'final_sentence': 'Daniel Henry Chamberlain was the 76th Governor of South Carolina from 1874.',
'original_sentence': 'Daniel Henry Chamberlain (June 23, 1835 – April 13, 1907) was an American planter, lawyer, author and the 76th Governor of South Carolina '
'from 1874 until 1877.',
'sentence_after_ambiguity': 'Daniel Henry Chamberlain was the 76th Governor of South Carolina from 1874.',
'sentence_after_deletion': 'Daniel Henry Chamberlain was the 76th Governor of South Carolina from 1874.'},
...
],
'table': [[{'column_span': 1, 'is_header': True, 'row_span': 1, 'value': '#'},
{'column_span': 2, 'is_header': True, 'row_span': 1, 'value': 'Governor'},
{'column_span': 1, 'is_header': True, 'row_span': 1, 'value': 'Took Office'},
{'column_span': 1, 'is_header': True, 'row_span': 1, 'value': 'Left Office'}],
[{'column_span': 1, 'is_header': True, 'row_span': 1, 'value': '74'},
{'column_span': 1, 'is_header': False, 'row_span': 1, 'value': '-'},
{'column_span': 1, 'is_header': False, 'row_span': 1, 'value': 'Robert Kingston Scott'},
{'column_span': 1, 'is_header': False, 'row_span': 1, 'value': 'July 6, 1868'}],
...
],
'table_page_title': 'List of Governors of South Carolina',
'table_section_text': 'Parties Democratic Republican',
'table_section_title': 'Governors under the Constitution of 1868',
'table_webpage_url': 'http://en.wikipedia.org/wiki/List_of_Governors_of_South_Carolina',
'target': 'Daniel Henry Chamberlain was the 76th Governor of South Carolina from 1874.',
'totto_id': 0}
web_nlg_en
An example of validation looks as follows.
{'category': 'Airport',
'gem_id': 'web_nlg_en-validation-0',
'input': ['Aarhus | leader | Jacob_Bundsgaard'],
'references': ['The leader of Aarhus is Jacob Bundsgaard.'],
'target': 'The leader of Aarhus is Jacob Bundsgaard.',
'webnlg_id': 'dev/Airport/1/Id1'}
web_nlg_ru
An example of validation looks as follows.
{'category': 'Airport',
'gem_id': 'web_nlg_ru-validation-0',
'input': ['Punjab,_Pakistan | leaderTitle | Provincial_Assembly_of_the_Punjab'],
'references': ['Пенджаб, Пакистан, возглавляется Провинциальной ассамблеей Пенджаба.', 'Пенджаб, Пакистан возглавляется Провинциальной ассамблеей Пенджаба.'],
'target': 'Пенджаб, Пакистан, возглавляется Провинциальной ассамблеей Пенджаба.',
'webnlg_id': 'dev/Airport/1/Id1'}
wiki_auto_asset_turk
An example of validation looks as follows.
{'gem_id': 'wiki_auto_asset_turk-validation-0',
'references': ['The Gandalf Awards honor excellent writing in in fantasy literature.'],
'source': 'The Gandalf Awards, honoring achievement in fantasy literature, were conferred by the World Science Fiction Society annually from 1974 to 1981.',
'source_id': '350_691837-1-0-0',
'target': 'The Gandalf Awards honor excellent writing in in fantasy literature.',
'target_id': '350_691837-0-0-0'}
wiki_lingua_es_en
An example of validation looks as follows.
'references': ["Practice matted hair prevention from early in your cat's life. Make sure that your cat is grooming itself effectively. Keep a close eye on cats with long hair."], 'source': 'Muchas personas presentan problemas porque no cepillaron el pelaje de sus gatos en una etapa temprana de su vida, ya que no lo consideraban necesario. Sin embargo, a medida que...', 'target': "Practice matted hair prevention from early in your cat's life. Make sure that your cat is grooming itself effectively. Keep a close eye on cats with long hair."}wiki_lingua_ru_en
An example of validation looks as follows.
{'gem_id': 'wiki_lingua_ru_en-val-0',
'references': ['Get immediate medical care if you notice signs of a complication. Undergo diagnostic tests to check for gallstones and complications. Ask your doctor about your treatment '
'options.'],
'source': 'И хотя, скорее всего, вам не о чем волноваться, следует незамедлительно обратиться к врачу, если вы подозреваете, что у вас возникло осложнение желчекаменной болезни. Это ...',
'target': 'Get immediate medical care if you notice signs of a complication. Undergo diagnostic tests to check for gallstones and complications. Ask your doctor about your treatment '
'options.'}
wiki_lingua_tr_en
An example of validation looks as follows.
{'gem_id': 'wiki_lingua_tr_en-val-0',
'references': ['Open Instagram. Go to the video you want to download. Tap ⋮. Tap Copy Link. Open Google Chrome. Tap the address bar. Go to the SaveFromWeb site. Tap the "Paste Instagram Video" text box. Tap and hold the text box. Tap PASTE. Tap Download. Download the video. Find the video on your Android.'],
'source': 'Instagram uygulamasının çok renkli kamera şeklindeki simgesine dokun. Daha önce giriş yaptıysan Instagram haber kaynağı açılır. Giriş yapmadıysan istendiğinde e-posta adresini ...',
'target': 'Open Instagram. Go to the video you want to download. Tap ⋮. Tap Copy Link. Open Google Chrome. Tap the address bar. Go to the SaveFromWeb site. Tap the "Paste Instagram Video" text box. Tap and hold the text box. Tap PASTE. Tap Download. Download the video. Find the video on your Android.'}
wiki_lingua_vi_en
An example of validation looks as follows.
{'gem_id': 'wiki_lingua_vi_en-val-0',
'references': ['Select the right time of year for planting the tree. You will usually want to plant your tree when it is dormant, or not flowering, during cooler or colder times of year.'],
'source': 'Bạn muốn cung cấp cho cây cơ hội tốt nhất để phát triển và sinh tồn. Trồng cây đúng thời điểm trong năm chính là yếu tố then chốt. Thời điểm sẽ thay đổi phụ thuộc vào loài cây ...',
'target': 'Select the right time of year for planting the tree. You will usually want to plant your tree when it is dormant, or not flowering, during cooler or colder times of year.'}
xsum
An example of validation looks as follows.
{'document': 'Burberry reported pre-tax profits of £166m for the year to March. A year ago it made a loss of £16.1m, hit by charges at its Spanish operations.\n'
'In the past year it has opened 21 new stores and closed nine. It plans to open 20-30 stores this year worldwide.\n'
'The group has also focused on promoting the Burberry brand online...',
'gem_id': 'xsum-validation-0',
'references': ['Luxury fashion designer Burberry has returned to profit after opening new stores and spending more on online marketing'],
'target': 'Luxury fashion designer Burberry has returned to profit after opening new stores and spending more on online marketing',
'xsum_id': '10162122'}
The data fields are the same among all splits.
common_gen| train | validation | test | |
|---|---|---|---|
| common_gen | 67389 | 993 | 1497 |
| train | validation | test | |
|---|---|---|---|
| cs_restaurants | 3569 | 781 | 842 |
| train | validation | test | |
|---|---|---|---|
| dart | 62659 | 2768 | 6959 |
| train | validation | test | |
|---|---|---|---|
| e2e_nlg | 33525 | 4299 | 4693 |
| train | validation | test | |
|---|---|---|---|
| mlsum_de | 220748 | 11392 | 10695 |
| train | validation | test | |
|---|---|---|---|
| mlsum_es | 259886 | 9977 | 13365 |
| train | validation | test | |
|---|---|---|---|
| schema_guided_dialog | 164982 | 10000 | 10000 |
| train | validation | test | |
|---|---|---|---|
| totto | 121153 | 7700 | 7700 |
| train | validation | test | |
|---|---|---|---|
| web_nlg_en | 35426 | 1667 | 1779 |
| train | validation | test | |
|---|---|---|---|
| web_nlg_ru | 14630 | 790 | 1102 |
| train | validation | test_asset | test_turk | |
|---|---|---|---|---|
| wiki_auto_asset_turk | 373801 | 73249 | 359 | 359 |
| train | validation | test | |
|---|---|---|---|
| wiki_lingua_es_en | 79515 | 8835 | 19797 |
| train | validation | test | |
|---|---|---|---|
| wiki_lingua_ru_en | 36898 | 4100 | 9094 |
| train | validation | test | |
|---|---|---|---|
| wiki_lingua_tr_en | 3193 | 355 | 808 |
| train | validation | test | |
|---|---|---|---|
| wiki_lingua_vi_en | 9206 | 1023 | 2167 |
| train | validation | test | |
|---|---|---|---|
| xsum | 23206 | 1117 | 1166 |
CC-BY-SA-4.0
@article{gem_benchmark,
author = {Sebastian Gehrmann and
Tosin P. Adewumi and
Karmanya Aggarwal and
Pawan Sasanka Ammanamanchi and
Aremu Anuoluwapo and
Antoine Bosselut and
Khyathi Raghavi Chandu and
Miruna{-}Adriana Clinciu and
Dipanjan Das and
Kaustubh D. Dhole and
Wanyu Du and
Esin Durmus and
Ondrej Dusek and
Chris Emezue and
Varun Gangal and
Cristina Garbacea and
Tatsunori Hashimoto and
Yufang Hou and
Yacine Jernite and
Harsh Jhamtani and
Yangfeng Ji and
Shailza Jolly and
Dhruv Kumar and
Faisal Ladhak and
Aman Madaan and
Mounica Maddela and
Khyati Mahajan and
Saad Mahamood and
Bodhisattwa Prasad Majumder and
Pedro Henrique Martins and
Angelina McMillan{-}Major and
Simon Mille and
Emiel van Miltenburg and
Moin Nadeem and
Shashi Narayan and
Vitaly Nikolaev and
Rubungo Andre Niyongabo and
Salomey Osei and
Ankur P. Parikh and
Laura Perez{-}Beltrachini and
Niranjan Ramesh Rao and
Vikas Raunak and
Juan Diego Rodriguez and
Sashank Santhanam and
Jo{\~{a}}o Sedoc and
Thibault Sellam and
Samira Shaikh and
Anastasia Shimorina and
Marco Antonio Sobrevilla Cabezudo and
Hendrik Strobelt and
Nishant Subramani and
Wei Xu and
Diyi Yang and
Akhila Yerukola and
Jiawei Zhou},
title = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
Metrics},
journal = {CoRR},
volume = {abs/2102.01672},
year = {2021},
url = {https://arxiv.org/abs/2102.01672},
archivePrefix = {arXiv},
eprint = {2102.01672}
}
Thanks to @yjernite for adding this dataset.