microsoft/deberta-v2-xxlarge

Model:

microsoft/deberta-v2-xxlarge

Task:

Fill-Mask

Libraries:

PyTorch TensorFlow Transformers

Language:

Other:

deberta-v2 deberta

Preprint:

arxiv:2006.03654

License:

mit


DeBERTa: Decoding-enhanced BERT with Disentangled Attention

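DeBERTa improves on the BERT and RoBERTa models using disentangled attention and an enhanced mask decoder. With 80GB of training data, it outperforms BERT and RoBERTa on the majority of NLU tasks.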

See the official repository for more details and updates.

This is the DeBERTa V2 xxlarge model, with 48 layers and a hidden size of 1536. It has 1.5 billion parameters in total and was trained on 160GB of raw data.
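As a rough sanity check on the 1.5 billion figure, the parameter count can be approximated from the layer count and hidden size alone. The sketch below assumes a feed-forward width of 4x the hidden size and a vocabulary of roughly 128K tokens, neither of which is stated above, and it ignores biases, layer norms, and relative-position embeddings.

```python
# Back-of-the-envelope parameter count for a 48-layer, 1536-hidden Transformer encoder.
# FFN_MULT and VOCAB_SIZE are assumptions for illustration, not values stated above.

NUM_LAYERS = 48
HIDDEN = 1536
FFN_MULT = 4          # assumed: feed-forward width = 4 * hidden size
VOCAB_SIZE = 128_000  # assumed: roughly 128K-token vocabulary


def estimate_params(num_layers: int, hidden: int, ffn_mult: int, vocab: int) -> int:
    """Count weight-matrix parameters only (no biases, layer norms, or position tables)."""
    attention = 4 * hidden * hidden                # query, key, value, output projections
    feed_forward = 2 * hidden * ffn_mult * hidden  # up- and down-projection
    per_layer = attention + feed_forward
    embeddings = vocab * hidden                    # token embedding table
    return num_layers * per_layer + embeddings


total = estimate_params(NUM_LAYERS, HIDDEN, FFN_MULT, VOCAB_SIZE)
print(f"approx. {total / 1e9:.2f}B parameters")
```

Under these assumptions the estimate comes out at about 1.56 billion, which is consistent with the stated total of 1.5 billion.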

Fine-tuning on NLU tasks

We report dev-set results on SQuAD 1.1/2.0 and several GLUE benchmark tasks.

Model	SQuAD 1.1	SQuAD 2.0	MNLI-m/mm	SST-2	QNLI	CoLA	RTE	MRPC	QQP	STS-B
Metric	F1/EM	F1/EM	Acc	Acc	Acc	MCC	Acc	Acc/F1	Acc/F1	P/S
BERT-Large	90.9/84.1	81.8/79.0	86.6/-	93.2	92.3	60.6	70.4	88.0/-	91.3/-	90.0/-
RoBERTa-Large	94.6/88.9	89.4/86.5	90.2/-	96.4	93.9	68.0	86.6	90.9/-	92.2/-	92.4/-
XLNet-Large	95.1/89.7	90.6/87.9	90.8/-	97.0	94.9	69.0	85.9	90.8/-	92.3/-	92.5/-
DeBERTa-Large¹	95.5/90.1	90.7/88.0	91.3/91.1	96.5	95.3	69.5	91.0	92.6/94.6	92.3/-	92.8/92.5
DeBERTa-XLarge¹	-/-	-/-	91.5/91.2	97.0	-	-	93.1	92.1/94.3	-	92.9/92.7
DeBERTa-V2-XLarge¹	95.8/90.8	91.4/88.9	91.7/91.6	97.5	95.8	71.1	93.9	92.0/94.2	92.3/89.8	92.9/92.9
DeBERTa-V2-XXLarge¹,²	96.1/91.4	92.2/89.7	91.7/91.9	97.2	96.0	72.0	93.5	93.1/94.9	92.7/90.3	93.2/93.1

Notes.

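1 Following RoBERTa, for RTE, MRPC, and STS-B we fine-tune starting from DeBERTa-Large-MNLI, DeBERTa-XLarge-MNLI, DeBERTa-V2-XLarge-MNLI, and DeBERTa-V2-XXLarge-MNLI. Results on SST-2/QQP/QNLI/SQuADv2 also improve slightly when starting from the MNLI fine-tuned models, but for those 4 tasks we report only the numbers obtained by fine-tuning from the pretrained base models.
2 To try the XXLarge model with HF transformers, we recommend using deepspeed, as it is faster and saves memory.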
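To make the memory argument in note 2 concrete, here is a rough estimate of the training state for a 1.5 billion parameter model under mixed-precision Adam. It is only a sketch: the 2 + 2 + 12 bytes-per-parameter accounting is the usual one for mixed-precision training, while the sharding of gradients and optimizer state across 8 GPUs is an assumption for illustration and not a setting read from ds_config.json. Activations are not counted.

```python
# Rough training-state memory for a 1.5B-parameter model with mixed-precision Adam.
# The sharding choices and GPU count below are assumptions, not values from ds_config.json.

PARAMS = 1.5e9
FP16_WEIGHTS = 2      # bytes per parameter
FP16_GRADS = 2        # bytes per parameter
OPTIMIZER_STATE = 12  # fp32 master weights + Adam momentum + Adam variance (4 + 4 + 4)


def per_gpu_gb(params: float, num_gpus: int, shard_optimizer: bool, shard_grads: bool) -> float:
    """Memory for weights, gradients, and optimizer state on one GPU, in GB (1e9 bytes)."""
    optimizer = OPTIMIZER_STATE / num_gpus if shard_optimizer else OPTIMIZER_STATE
    grads = FP16_GRADS / num_gpus if shard_grads else FP16_GRADS
    return params * (FP16_WEIGHTS + grads + optimizer) / 1e9


replicated = per_gpu_gb(PARAMS, num_gpus=8, shard_optimizer=False, shard_grads=False)
sharded = per_gpu_gb(PARAMS, num_gpus=8, shard_optimizer=True, shard_grads=True)
print(f"replicated: {replicated:.1f} GB per GPU, sharded: {sharded:.1f} GB per GPU")
```

With every GPU holding a full copy, the state alone takes about 24 GB per GPU; sharding gradients and optimizer state over 8 GPUs brings that to under 6 GB in this estimate.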

Run with Deepspeed:

pip install datasets
pip install deepspeed

# Download the deepspeed config file
wget https://huggingface.co/microsoft/deberta-v2-xxlarge/resolve/main/ds_config.json -O ds_config.json

export TASK_NAME=mnli
output_dir="ds_results"
num_gpus=8
batch_size=8
python -m torch.distributed.launch --nproc_per_node=${num_gpus} \
  run_glue.py \
  --model_name_or_path microsoft/deberta-v2-xxlarge \
  --task_name $TASK_NAME \
  --do_train \
  --do_eval \
  --max_seq_length 256 \
  --per_device_train_batch_size ${batch_size} \
  --learning_rate 3e-6 \
  --num_train_epochs 3 \
  --output_dir $output_dir \
  --overwrite_output_dir \
  --logging_steps 10 \
  --logging_dir $output_dir \
  --deepspeed ds_config.json
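The launch command above fixes the number of GPUs, the per-device batch size, and the number of epochs, which together determine the global batch size and the length of the run. The sketch below works that out under two assumptions that the command does not state: no gradient accumulation, and an MNLI training split of 392,702 examples as distributed with GLUE.

```python
import math

# Values taken from the launch command above.
NUM_GPUS = 8
PER_DEVICE_BATCH = 8
EPOCHS = 3

# Assumptions for illustration: no gradient accumulation, and the MNLI
# training split size as distributed with GLUE.
GRAD_ACCUM_STEPS = 1
MNLI_TRAIN_EXAMPLES = 392_702

global_batch = NUM_GPUS * PER_DEVICE_BATCH * GRAD_ACCUM_STEPS
steps_per_epoch = math.ceil(MNLI_TRAIN_EXAMPLES / global_batch)
total_steps = steps_per_epoch * EPOCHS

print(f"global batch size: {global_batch}")
print(f"optimizer steps per epoch: {steps_per_epoch}")
print(f"total optimizer steps: {total_steps}")
```

If the DeepSpeed config sets a gradient accumulation value other than 1, the global batch size grows and the step counts shrink accordingly.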

You can also run with --sharded_ddp:

cd transformers/examples/text-classification/
export TASK_NAME=mnli
python -m torch.distributed.launch --nproc_per_node=8 run_glue.py --model_name_or_path microsoft/deberta-v2-xxlarge \
  --task_name $TASK_NAME --do_train --do_eval --max_seq_length 256 --per_device_train_batch_size 8 \
  --learning_rate 3e-6 --num_train_epochs 3 --output_dir /tmp/$TASK_NAME/ --overwrite_output_dir --sharded_ddp --fp16

Citation

If you find DeBERTa useful for your work, please cite the following paper:

@inproceedings{
he2021deberta,
title={DEBERTA: DECODING-ENHANCED BERT WITH DISENTANGLED ATTENTION},
author={Pengcheng He and Xiaodong Liu and Jianfeng Gao and Weizhu Chen},
booktitle={International Conference on Learning Representations},
year={2021},
url={https://openreview.net/forum?id=XPZIaotutsD}
}

Author:

Microsoft

Dataset size:

8.75 GB