Model:
Intel/dpt-large-ade
DPT is a Dense Prediction Transformer model trained on ADE20k for semantic segmentation. It was introduced in the paper Vision Transformers for Dense Prediction by Ranftl et al. and first released in this repository.
Disclaimer: The team releasing DPT did not write a model card for this model, so this model card has been written by the Hugging Face team.
DPT uses a Vision Transformer (ViT) as its backbone and adds a neck and head on top for semantic segmentation.
You can use the raw model for semantic segmentation. See the model hub to look for fine-tuned versions on a task that interests you.
Here is how to use this model:
from transformers import DPTFeatureExtractor, DPTForSemanticSegmentation
from PIL import Image
import requests

# Load an example image from the COCO validation set
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Load the feature extractor and the segmentation model
feature_extractor = DPTFeatureExtractor.from_pretrained("Intel/dpt-large-ade")
model = DPTForSemanticSegmentation.from_pretrained("Intel/dpt-large-ade")

# Prepare the image for the model and run a forward pass
inputs = feature_extractor(images=image, return_tensors="pt")
outputs = model(**inputs)
logits = outputs.logits
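The logits above come out at a resolution lower than the input image, with one channel per ADE20k class. A common post-processing step, sketched below with dummy logits so the model download is not required, is to upsample the logits to the image size and take the per-pixel argmax (the tensor shapes here are illustrative assumptions, not values produced by the model):

```python
import torch

# Dummy logits standing in for outputs.logits:
# shape (batch, num_labels, reduced_height, reduced_width);
# dpt-large-ade predicts 150 ADE20k classes per pixel.
logits = torch.randn(1, 150, 120, 160)

# Upsample the logits to the original image size, then take the
# per-pixel argmax over the class dimension to get a segmentation map.
target_size = (480, 640)  # (height, width) of the input image (assumed)
upsampled = torch.nn.functional.interpolate(
    logits, size=target_size, mode="bilinear", align_corners=False
)
segmentation_map = upsampled.argmax(dim=1)[0]  # (height, width) class indices
print(segmentation_map.shape)  # torch.Size([480, 640])
```

Each entry of `segmentation_map` is an integer class index that can be mapped to an ADE20k label name or a color for visualization.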
For more code examples, we refer to the documentation.
@article{DBLP:journals/corr/abs-2103-13413,
author = {Ren{\'{e}} Ranftl and
Alexey Bochkovskiy and
Vladlen Koltun},
title = {Vision Transformers for Dense Prediction},
journal = {CoRR},
volume = {abs/2103.13413},
year = {2021},
url = {https://arxiv.org/abs/2103.13413},
eprinttype = {arXiv},
eprint = {2103.13413},
timestamp = {Wed, 07 Apr 2021 15:31:46 +0200},
biburl = {https://dblp.org/rec/journals/corr/abs-2103-13413.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}