
Dataset Card for "LexGLUE"

Dataset Summary

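Inspired by the recent widespread use of the GLUE multi-task benchmark NLP dataset (Wang et al., 2018), the subsequent more difficult SuperGLUE (Wang et al., 2019), other previous multi-task NLP benchmarks (Conneau and Kiela, 2018; McCann et al., 2018), and similar initiatives in other domains (Peng et al., 2019), we introduce the Legal General Language Understanding Evaluation (LexGLUE) benchmark, a benchmark dataset to evaluate the performance of NLP methods in legal tasks. LexGLUE is based on seven existing legal NLP datasets, selected using criteria largely from SuperGLUE.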

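As in GLUE and SuperGLUE (Wang et al., 2019b,a), one of our goals is to push towards generic (or "foundation") models that can cope with multiple NLP tasks, in our case legal NLP tasks, possibly with limited task-specific fine-tuning. Another goal is to provide a convenient and informative entry point for NLP researchers and practitioners wishing to explore or develop methods for legal NLP. Having these goals in mind, the datasets we include in LexGLUE and the tasks they address have been simplified in several ways, to make it easier for newcomers and generic models to address all tasks.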

The LexGLUE benchmark is accompanied by experimental infrastructure that relies on the Hugging Face Transformers library and resides at https://github.com/coastalcph/lex-glue.

Supported Tasks and Leaderboards

The supported tasks are the following:

| Dataset | Source | Sub-domain | Task Type | Classes |
|---|---|---|---|---|
| ECtHR (Task A) | [9] | ECHR | Multi-label classification | 10+1 |
| ECtHR (Task B) | [10] | ECHR | Multi-label classification | 10+1 |
| SCOTUS | [11] | US Law | Multi-class classification | 14 |
| EUR-LEX | [12] | EU Law | Multi-label classification | 100 |
| LEDGAR | [13] | Contracts | Multi-class classification | 100 |
| UNFAIR-ToS | [14] | Contracts | Multi-label classification | 8+1 |
| CaseHOLD | [15] | US Law | Multiple choice QA | n/a |

ecthr_a

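The European Court of Human Rights (ECtHR) hears allegations that a state has breached human rights provisions of the European Convention of Human Rights (ECHR). For each case, the dataset provides a list of factual paragraphs (facts) from the case description. Each case is mapped to the articles of the ECHR that were violated (if any).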

ecthr_b

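The European Court of Human Rights (ECtHR) hears allegations that a state has breached human rights provisions of the European Convention of Human Rights (ECHR). For each case, the dataset provides a list of factual paragraphs (facts) from the case description. Each case is mapped to the articles of the ECHR that were allegedly violated (i.e., considered by the court).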
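Both ECtHR tasks are multi-label: a case can map to several articles or to none at all. The sketch below turns a `labels` list into a fixed-size 0/1 target vector. It rests on two assumptions that are not stated in this card: that label id `i` is the `i`-th article in the list under Data Fields, and that the "+1" in the task table is an extra column marking cases with an empty label set. `to_multi_hot` and `decode_articles` are hypothetical helpers, not part of the LexGLUE code.

```python
from typing import List

# Hypothetical ordering: label id i is assumed to be ECHR_ARTICLES[i].
ECHR_ARTICLES = [
    "Article 2", "Article 3", "Article 5", "Article 6", "Article 8",
    "Article 9", "Article 10", "Article 11", "Article 14",
    "Article 1 of Protocol 1",
]


def to_multi_hot(labels: List[int], num_labels: int = len(ECHR_ARTICLES),
                 add_none_column: bool = True) -> List[int]:
    """Encode label ids as a 0/1 vector of length num_labels (+1 if requested).

    The optional trailing column is set to 1 only when the case has no label,
    which is the assumed meaning of the "+1" class in the task table.
    """
    vector = [0] * num_labels
    for label_id in labels:
        vector[label_id] = 1
    if add_none_column:
        vector.append(0 if labels else 1)
    return vector


def decode_articles(labels: List[int]) -> List[str]:
    """Map label ids back to article names under the assumed ordering."""
    return [ECHR_ARTICLES[label_id] for label_id in labels]


print(to_multi_hot([5, 6]))     # [0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0]
print(to_multi_hot([]))         # [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
print(decode_articles([5, 6]))  # ['Article 9', 'Article 10']
```

Keeping the "none" flag as an explicit column lets a multi-label loss and per-label F1 treat "no violation" like any other class; drop it with `add_none_column=False` if the extra class is not wanted.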

scotus

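The US Supreme Court (SCOTUS) is the highest federal court in the United States of America and generally hears only the most controversial or otherwise complex cases that have not been sufficiently well resolved by lower courts. This is a single-label multi-class classification task: given a document (court opinion), the task is to predict the relevant issue area. The 14 issue areas cluster 278 issues whose focus is on the subject matter of the controversy (dispute).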
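Because SCOTUS is single-label, a classifier produces one score per issue area and predicts the highest-scoring one. A minimal sketch, assuming (hypothetically) that label id `i` corresponds to issue area `i + 1` in the numbered list under Data Fields; verify this against the dataset's own label names before using it. The score vector in the usage line is made up.

```python
from typing import Sequence

# Assumed order: label id i -> issue area number i + 1 in the Data Fields list.
ISSUE_AREAS = [
    "Criminal Procedure", "Civil Rights", "First Amendment", "Due Process",
    "Privacy", "Attorneys", "Unions", "Economic Activity", "Judicial Power",
    "Federalism", "Interstate Relations", "Federal Taxation", "Miscellaneous",
    "Private Action",
]


def predict_issue_area(scores: Sequence[float]) -> str:
    """Return the name of the issue area with the highest score (argmax)."""
    if len(scores) != len(ISSUE_AREAS):
        raise ValueError(f"expected {len(ISSUE_AREAS)} scores, got {len(scores)}")
    best_id = max(range(len(scores)), key=lambda i: scores[i])
    return ISSUE_AREAS[best_id]


scores = [0.0] * 14
scores[8] = 2.5  # made-up scores that favour label id 8
print(predict_issue_area(scores))  # Judicial Power (under the 0-based assumption)
```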

eurlex

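European Union (EU) legislation is published in the EUR-Lex portal. All EU laws are annotated by the EU Publications Office with multiple concepts from the EuroVoc thesaurus, a multilingual thesaurus maintained by the Publications Office that contains more than 7,000 concepts referring to various activities of the EU and its Member States (e.g., economics, health care, trade). Given a document, the task is to predict its EuroVoc labels (concepts).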
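Since EUR-LEX is multi-label, a model typically emits one independent score (logit) per concept, and the predicted label set is every concept whose sigmoid probability clears a threshold. The sketch below shows that common decision rule; the 0.5 threshold and the helper names are assumptions for illustration, not necessarily what the LexGLUE baselines use.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function (avoids overflow for large |x|)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def decode_multi_label(logits: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Return the ids of all labels whose sigmoid probability exceeds threshold."""
    return [i for i, z in enumerate(logits) if sigmoid(z) > threshold]


logits = [2.0, -1.0, 0.3, -3.0]  # made-up outputs for 4 of the 100 concepts
print(decode_multi_label(logits))  # [0, 2]
```

Unlike the single-label argmax used for SCOTUS or LEDGAR, this rule can return zero, one, or many labels per document.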

ledgar

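The LEDGAR dataset aims at contract provision (paragraph) classification. The contract provisions come from contracts obtained from US Securities and Exchange Commission (SEC) filings, which are publicly available from EDGAR. Each label represents the single main topic (theme) of the corresponding contract provision.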

unfair_tos

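The UNFAIR-ToS dataset contains 50 Terms of Service (ToS) documents from online platforms (e.g., YouTube, eBay, Facebook). The dataset has been annotated at the sentence level with 8 types of unfair contractual terms (sentences), i.e., terms that potentially violate user rights according to European consumer law.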

case_hold

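The CaseHOLD (Case Holdings on Legal Decisions) dataset includes multiple choice questions about holdings of US court cases from the Harvard Law Library case law corpus. Holdings are short summaries of legal rulings that accompany referenced decisions relevant to the present case. The input consists of an excerpt (or prompt) from a court decision containing a reference to a particular case, with the holding statement masked out. The model must identify the correct (masked) holding statement from a selection of five choices.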
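One way to frame a CaseHOLD instance for a model is to expand it into five candidate inputs, one per holding, score each candidate, and predict the index of the best one. The sketch below substitutes each candidate into the `<HOLDING>` marker that appears in the `context` field of the example under Data Instances. This is an illustrative formulation only; the official baselines may instead encode (context, ending) pairs, and the scores would come from a model that is not shown here. Helper names and the toy input are hypothetical.

```python
from typing import List, Sequence

PLACEHOLDER = "<HOLDING>"  # marker that masks the holding in the context field


def fill_candidates(context: str, endings: Sequence[str]) -> List[str]:
    """Build one candidate input per ending by substituting it into the context."""
    if PLACEHOLDER not in context:
        raise ValueError("context has no <HOLDING> marker")
    return [context.replace(PLACEHOLDER, ending) for ending in endings]


def pick_answer(scores: Sequence[float]) -> int:
    """Index of the highest-scoring candidate, i.e. the predicted label."""
    return max(range(len(scores)), key=lambda i: scores[i])


# Toy input, not taken from the dataset.
candidates = fill_candidates("Some earlier case (<HOLDING>).", ["holding A", "holding B"])
print(candidates)  # ['Some earlier case (holding A).', 'Some earlier case (holding B).']
print(pick_answer([0.1, 0.7, 0.2, 0.0, -0.3]))  # 1
```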

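The current leaderboard includes several Transformer-based (Vaswani et al., 2017) pre-trained language models, which achieve state-of-the-art performance in most NLP tasks (Bommasani et al., 2021) and NLU benchmarks (Wang et al., 2019a). Results reported by Chalkidis et al. (2021):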

Task-wise Test Results

| Model | ECtHR A | ECtHR B | SCOTUS | EUR-LEX | LEDGAR | UNFAIR-ToS | CaseHOLD |
|---|---|---|---|---|---|---|---|
| Metric | μ-F1 / m-F1 | μ-F1 / m-F1 | μ-F1 / m-F1 | μ-F1 / m-F1 | μ-F1 / m-F1 | μ-F1 / m-F1 | μ-F1 / m-F1 |
| TFIDF+SVM | 64.7 / 51.7 | 74.6 / 65.1 | 78.2 / 69.5 | 71.3 / 51.4 | 87.2 / 82.4 | 95.4 / 78.8 | n/a |
| Medium-sized Models (L=12, H=768, A=12) | | | | | | | |
| BERT | 71.2 / 63.6 | 79.7 / 73.4 | 68.3 / 58.3 | 71.4 / 57.2 | 87.6 / 81.8 | 95.6 / 81.3 | 70.8 |
| RoBERTa | 69.2 / 59.0 | 77.3 / 68.9 | 71.6 / 62.0 | 71.9 / 57.9 | 87.9 / 82.3 | 95.2 / 79.2 | 71.4 |
| DeBERTa | 70.0 / 60.8 | 78.8 / 71.0 | 71.1 / 62.7 | 72.1 / 57.4 | 88.2 / 83.1 | 95.5 / 80.3 | 72.6 |
| Longformer | 69.9 / 64.7 | 79.4 / 71.7 | 72.9 / 64.0 | 71.6 / 57.7 | 88.2 / 83.0 | 95.5 / 80.9 | 71.9 |
| BigBird | 70.0 / 62.9 | 78.8 / 70.9 | 72.8 / 62.0 | 71.5 / 56.8 | 87.8 / 82.6 | 95.7 / 81.3 | 70.8 |
| Legal-BERT | 70.0 / 64.0 | 80.4 / 74.7 | 76.4 / 66.5 | 72.1 / 57.4 | 88.2 / 83.0 | 96.0 / 83.0 | 75.3 |
| CaseLaw-BERT | 69.8 / 62.9 | 78.8 / 70.3 | 76.6 / 65.9 | 70.7 / 56.6 | 88.3 / 83.0 | 96.0 / 82.3 | 75.4 |
| Large-sized Models (L=24, H=1024, A=16) | | | | | | | |
| RoBERTa | 73.8 / 67.6 | 79.8 / 71.6 | 75.5 / 66.3 | 67.9 / 50.3 | 88.6 / 83.6 | 95.8 / 81.6 | 74.4 |
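The tables report μ-F1 (micro-averaged F1: true positives, false positives and false negatives are pooled over all labels before computing one F1) and m-F1 (macro-averaged F1: the unweighted mean of per-label F1 scores, so rare labels count as much as frequent ones). A dependency-free sketch for 0/1 label matrices follows. Scoring an undefined per-label F1 (no gold and no predicted positives) as 0 is an assumed convention, and the sketch says nothing about how LexGLUE itself handles the extra "+1" class.

```python
from typing import List, Sequence, Tuple


def _f1(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(y_true: Sequence[Sequence[int]],
                   y_pred: Sequence[Sequence[int]]) -> Tuple[float, float]:
    """Return (micro-F1, macro-F1) for 0/1 label matrices of equal shape."""
    n_labels = len(y_true[0])
    tp: List[int] = [0] * n_labels
    fp: List[int] = [0] * n_labels
    fn: List[int] = [0] * n_labels
    for gold_row, pred_row in zip(y_true, y_pred):
        for j, (gold, pred) in enumerate(zip(gold_row, pred_row)):
            if gold and pred:
                tp[j] += 1
            elif pred:
                fp[j] += 1
            elif gold:
                fn[j] += 1
    micro = _f1(sum(tp), sum(fp), sum(fn))  # pool counts over all labels
    macro = sum(_f1(t, p, n) for t, p, n in zip(tp, fp, fn)) / n_labels  # mean per-label F1
    return micro, macro


gold = [[1, 0, 0], [0, 1, 0], [1, 0, 1]]
pred = [[1, 0, 0], [0, 0, 0], [1, 1, 1]]
print(micro_macro_f1(gold, pred))  # (0.75, 0.6666666666666666)
```

For single-label tasks such as SCOTUS and LEDGAR, pass one-hot rows; when every example has exactly one gold and one predicted label, micro-F1 equals accuracy.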

Averaged (Mean over Tasks) Test Results

| Model | Arithmetic (μ-F1 / m-F1) | Harmonic (μ-F1 / m-F1) | Geometric (μ-F1 / m-F1) |
|---|---|---|---|
| Medium-sized Models (L=12, H=768, A=12) | | | |
| BERT | 77.8 / 69.5 | 76.7 / 68.2 | 77.2 / 68.8 |
| RoBERTa | 77.8 / 68.7 | 76.8 / 67.5 | 77.3 / 68.1 |
| DeBERTa | 78.3 / 69.7 | 77.4 / 68.5 | 77.8 / 69.1 |
| Longformer | 78.5 / 70.5 | 77.5 / 69.5 | 78.0 / 70.0 |
| BigBird | 78.2 / 69.6 | 77.2 / 68.5 | 77.7 / 69.0 |
| Legal-BERT | 79.8 / 72.0 | 78.9 / 70.8 | 79.3 / 71.4 |
| CaseLaw-BERT | 79.4 / 70.9 | 78.5 / 69.7 | 78.9 / 70.3 |
| Large-sized Models (L=24, H=1024, A=16) | | | |
| RoBERTa | 79.4 / 70.8 | 78.4 / 69.1 | 78.9 / 70.0 |
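The three aggregates are the arithmetic, harmonic and geometric means of the seven task-wise scores. The sketch below recomputes them for the medium-sized BERT row of the task-wise table using Python's `statistics` module. It assumes CaseHOLD's single score enters both the μ-F1 and the m-F1 average unchanged; under that assumption it reproduces the reported BERT aggregates.

```python
from statistics import geometric_mean, harmonic_mean, mean

# Task-wise test scores of medium-sized BERT, copied from the task-wise table:
# ECtHR A, ECtHR B, SCOTUS, EUR-LEX, LEDGAR, UNFAIR-ToS, CaseHOLD.
BERT_MICRO_F1 = [71.2, 79.7, 68.3, 71.4, 87.6, 95.6, 70.8]
BERT_MACRO_F1 = [63.6, 73.4, 58.3, 57.2, 81.8, 81.3, 70.8]  # CaseHOLD score reused


def aggregate(scores):
    """Return (arithmetic, harmonic, geometric) means rounded to one decimal."""
    return (
        round(mean(scores), 1),
        round(harmonic_mean(scores), 1),
        round(geometric_mean(scores), 1),
    )


print(aggregate(BERT_MICRO_F1))  # (77.8, 76.7, 77.2)
print(aggregate(BERT_MACRO_F1))  # (69.5, 68.2, 68.8)
```

The harmonic and geometric means penalise a model more than the arithmetic mean does when it scores poorly on any single task, which is why they are always at or below the arithmetic average.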

Languages

We only consider English datasets, to make experimentation easier for researchers across the globe.

Dataset Structure

Data Instances

ecthr_a

An example of "train" looks as follows.

{
  "text": ["8. The applicant was arrested in the early morning of 21 October 1990 ...", ...],
  "labels": [6]
}
ecthr_b

An example of "train" looks as follows.

{
  "text": ["8. The applicant was arrested in the early morning of 21 October 1990 ...", ...],
  "labels": [5, 6]
}
scotus

An example of "train" looks as follows.

{
  "text": "Per Curiam\nSUPREME COURT OF THE UNITED STATES\nRANDY WHITE, WARDEN v. ROGER L. WHEELER\n Decided December 14, 2015\nPER CURIAM.\nA death sentence imposed by a Kentucky trial court and\naffirmed by the ...",
  "label": 8
}
eurlex

An example of "train" looks as follows.

{
  "text": "COMMISSION REGULATION (EC) No 1629/96 of 13 August 1996 on an invitation to tender for the refund on export of wholly milled round grain rice to certain third countries ...",
  "labels": [4, 20, 21, 35, 68]
}
ledgar

An example of "train" looks as follows.

{
  "text": "All Taxes shall be the financial responsibility of the party obligated to pay such Taxes as determined by applicable law and neither party is or shall be liable at any time for any of the other party ...",
  "label": 32
}
unfair_tos

An example of "train" looks as follows.

{
  "text": "tinder may terminate your account at any time without notice if it believes that you have violated this agreement.",
  "labels": [2]
}
case_hold

An example of "test" looks as follows.

{
  "context": "In Granato v. City and County of Denver, No. CIV 11-0304 MSK/BNB, 2011 WL 3820730 (D.Colo. Aug. 20, 2011), the Honorable Marcia S. Krieger, now-Chief United States District Judge for the District of Colorado, ruled similarly: At a minimum, a party asserting a Mo-nell claim must plead sufficient facts to identify ... to act pursuant to City or State policy, custom, decision, ordinance, re d 503, 506-07 (3d Cir.l985)(<HOLDING>).",
  "endings": ["holding that courts are to accept allegations in the complaint as being true including monell policies and writing that a federal court reviewing the sufficiency of a complaint has a limited task",
    "holding that for purposes of a class certification motion the court must accept as true all factual allegations in the complaint and may draw reasonable inferences therefrom", 
    "recognizing that the allegations of the complaint must be accepted as true on a threshold motion to dismiss", 
    "holding that a court need not accept as true conclusory allegations which are contradicted by documents referred to in the complaint", 
    "holding that where the defendant was in default the district court correctly accepted the fact allegations of the complaint as true"
  ],
  "label": 0
}

Data Fields

ecthr_a
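  • text: a list of string features (list of factual paragraphs (facts) from the case description).
  • labels: a list of classification labels (list of violated ECHR articles, if any). List of ECHR articles: "Article 2", "Article 3", "Article 5", "Article 6", "Article 8", "Article 9", "Article 10", "Article 11", "Article 14", "Article 1 of Protocol 1"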
ecthr_b
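  • text: a list of string features (list of factual paragraphs (facts) from the case description).
  • labels: a list of classification labels (list of articles considered by the court). List of ECHR articles: "Article 2", "Article 3", "Article 5", "Article 6", "Article 8", "Article 9", "Article 10", "Article 11", "Article 14", "Article 1 of Protocol 1"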
scotus
  • text: a string feature (the court opinion).
  • label: a classification label (the relevant issue area). List of issue areas: (1, Criminal Procedure), (2, Civil Rights), (3, First Amendment), (4, Due Process), (5, Privacy), (6, Attorneys), (7, Unions), (8, Economic Activity), (9, Judicial Power), (10, Federalism), (11, Interstate Relations), (12, Federal Taxation), (13, Miscellaneous), (14, Private Action)
eurlex
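  • text: a string feature (an EU law).
  • labels: a list of classification labels (list of relevant EUROVOC concepts). The list of EUROVOC concepts used here is rather long and includes 100 EUROVOC concepts. You can find the EUROVOC concept descriptors here.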
ledgar
  • text: a string feature (a contract provision/paragraph).
  • label: a classification label (type of contract provision). List of contract provision types: "Adjustments", "Agreements", "Amendments", "Anti-Corruption Laws", "Applicable Laws", "Approvals", "Arbitration", "Assignments", "Assigns", "Authority", "Authorizations", "Base Salary", "Benefits", "Binding Effects", "Books", "Brokers", "Capitalization", "Change In Control", "Closings", "Compliance With Laws", "Confidentiality", "Consent To Jurisdiction", "Consents", "Construction", "Cooperation", "Costs", "Counterparts", "Death", "Defined Terms", "Definitions", "Disability", "Disclosures", "Duties", "Effective Dates", "Effectiveness", "Employment", "Enforceability", "Enforcements", "Entire Agreements", "Erisa", "Existence", "Expenses", "Fees", "Financial Statements", "Forfeitures", "Further Assurances", "General", "Governing Laws", "Headings", "Indemnifications", "Indemnity", "Insurances", "Integration", "Intellectual Property", "Interests", "Interpretations", "Jurisdictions", "Liens", "Litigations", "Miscellaneous", "Modifications", "No Conflicts", "No Defaults", "No Waivers", "Non-Disparagement", "Notices", "Organizations", "Participations", "Payments", "Positions", "Powers", "Publicity", "Qualifications", "Records", "Releases", "Remedies", "Representations", "Sales", "Sanctions", "Severability", "Solvency", "Specific Performance", "Submission To Jurisdiction", "Subsidiaries", "Successors", "Survival", "Tax Withholdings", "Taxes", "Terminations", "Terms", "Titles", "Transactions With Affiliates", "Use Of Proceeds", "Vacations", "Venues", "Vesting", "Waiver Of Jury Trials", "Waivers", "Warranties", "Withholdings"
unfair_tos
  • text: a string feature (a ToS sentence).
  • labels: a list of classification labels (list of unfair types, if any). List of unfair types: "Limitation of liability", "Unilateral termination", "Unilateral change", "Content removal", "Contract by using", "Choice of law", "Jurisdiction", "Arbitration"
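case_hold
  • context: a string feature (a context sentence including a masked holding statement).
  • endings: a list of string features (list of candidate holding statements).
  • label: a classification label (the id of the original/correct holding statement).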

Data Splits

| Dataset | Training | Development | Test | Total |
|---|---|---|---|---|
| ECtHR (Task A) | 9,000 | 1,000 | 1,000 | 11,000 |
| ECtHR (Task B) | 9,000 | 1,000 | 1,000 | 11,000 |
| SCOTUS | 5,000 | 1,400 | 1,400 | 7,800 |
| EUR-LEX | 55,000 | 5,000 | 5,000 | 65,000 |
| LEDGAR | 60,000 | 10,000 | 10,000 | 80,000 |
| UNFAIR-ToS | 5,532 | 2,275 | 1,607 | 9,414 |
| CaseHOLD | 45,000 | 3,900 | 3,900 | 52,800 |
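As a quick consistency check, the sketch below re-adds each row of the split table and reports the share of examples in each split. The numbers are copied from the table; the helper name is hypothetical.

```python
from typing import Dict, Tuple

# (training, development, test, total), copied from the Data Splits table.
SPLITS: Dict[str, Tuple[int, int, int, int]] = {
    "ECtHR (Task A)": (9_000, 1_000, 1_000, 11_000),
    "ECtHR (Task B)": (9_000, 1_000, 1_000, 11_000),
    "SCOTUS": (5_000, 1_400, 1_400, 7_800),
    "EUR-LEX": (55_000, 5_000, 5_000, 65_000),
    "LEDGAR": (60_000, 10_000, 10_000, 80_000),
    "UNFAIR-ToS": (5_532, 2_275, 1_607, 9_414),
    "CaseHOLD": (45_000, 3_900, 3_900, 52_800),
}


def split_percentages(train: int, dev: int, test: int) -> Tuple[float, ...]:
    """Share of examples (in %) in each split, rounded to one decimal."""
    total = train + dev + test
    return tuple(round(100 * n / total, 1) for n in (train, dev, test))


for name, (train, dev, test, total) in SPLITS.items():
    # Every row's three splits should add up to its reported total.
    assert train + dev + test == total, f"{name}: splits do not add up"
    print(f"{name:15s} {split_percentages(train, dev, test)}")
```

UNFAIR-ToS stands out with a much larger share of development and test data than the other datasets.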

Dataset Creation

Curation Rationale

More Information Needed

Source Data

| Dataset | Source | Sub-domain | Task Type |
|---|---|---|---|
| ECtHR (Task A) | [9] | ECHR | Multi-label classification |
| ECtHR (Task B) | [10] | ECHR | Multi-label classification |
| SCOTUS | [11] | US Law | Multi-class classification |
| EUR-LEX | [12] | EU Law | Multi-label classification |
| LEDGAR | [13] | Contracts | Multi-class classification |
| UNFAIR-ToS | [14] | Contracts | Multi-label classification |
| CaseHOLD | [15] | US Law | Multiple choice QA |

Initial Data Collection and Normalization

More Information Needed

Who are the source language producers?

More Information Needed

Annotations

Annotation process

More Information Needed

Who are the annotators?

More Information Needed

Personal and Sensitive Information

More Information Needed

Considerations for Using the Data

Social Impact of Dataset

More Information Needed

Discussion of Biases

More Information Needed

Other Known Limitations

More Information Needed

Additional Information

More Information Needed

Dataset Curators

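Ilias Chalkidis, Abhik Jana, Dirk Hartung, Michael Bommarito, Ion Androutsopoulos, Daniel Martin Katz, and Nikolaos Aletras. LexGLUE: A Benchmark Dataset for Legal Language Understanding in English. 2022. In the Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics. Dublin, Ireland.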

Licensing Information

More Information Needed

Citation Information

Ilias Chalkidis, Abhik Jana, Dirk Hartung, Michael Bommarito, Ion Androutsopoulos, Daniel Martin Katz, and Nikolaos Aletras. LexGLUE: A Benchmark Dataset for Legal Language Understanding in English. 2022. In the Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics. Dublin, Ireland.

@inproceedings{chalkidis-etal-2021-lexglue,
        title={LexGLUE: A Benchmark Dataset for Legal Language Understanding in English}, 
        author={Chalkidis, Ilias and Jana, Abhik and Hartung, Dirk and
        Bommarito, Michael and Androutsopoulos, Ion and Katz, Daniel Martin and
        Aletras, Nikolaos},
        year={2022},
        booktitle={Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics},
        address={Dublin, Ireland},
}

Contributions

Thanks to @iliaschalkidis for adding this dataset.