模型:

facebook/sam-vit-base

类库:

PyTorch TensorFlow Transformers

其他:

sam mask-generation

许可:

apache-2.0

模型介绍文件清单

英文

Segment Anything模型（SAM）- ViT Base（ViT-B）版本的模型卡片

Segment Anything模型（SAM）的详细架构。

TL;DR

Link to original repository

Segment Anything模型（SAM）可以从输入提示（如点或框）生成高质量的对象掩码，并且可以用于生成图像中所有对象的掩码。它在1100万张图像和11亿个掩码的训练集上进行了训练，并且在各种分割任务的零样本性能上表现出色。该论文的摘要如下所述：

我们介绍Segment Anything（SA）项目：一个用于图像分割的新任务、模型和数据集。使用我们的高效模型进行数据收集，我们建立了迄今为止最大的分割数据集，包括超过11M个经过许可和尊重隐私的图像上的10亿个掩码。该模型被设计和训练成可提示，因此它可以在新的图像分布和任务上进行零样本转移。我们评估了它在许多任务上的功能，并发现它的零样本性能令人印象深刻，往往与甚至优于先前的全面监督结果相竞争。我们发布了Segment Anything模型（SAM）和相应的数据集（SA-1B），其中包含10亿个掩码和1100万张图像，以促进计算机视觉基础模型的研究。

免责声明：本模型卡片的内容由Hugging Face团队撰写，并且其中的部分内容是从原始 SAM model card 粘贴复制过来的。

模型细节

SAM模型由3个模块组成：

VisionEncoder：基于VIT的图像编码器。它使用对图像的补丁进行注意力计算图像嵌入。使用了相对位置嵌入。
PromptEncoder：为点和边界框生成嵌入
MaskDecoder：一个双向transformer，它在图像嵌入和点嵌入之间进行交叉注意力（->），并在点嵌入和图像嵌入之间进行交叉注意力。输出被输入到
Neck：根据由MaskDecoder产生的上下文掩码预测输出掩码。

使用方法

提示生成掩码

from PIL import Image
import requests
from transformers import SamModel, SamProcessor

model = SamModel.from_pretrained("facebook/sam-vit-base")
processor = SamProcessor.from_pretrained("facebook/sam-vit-base")

img_url = "https://huggingface.co/ybelkada/segment-anything/resolve/main/assets/car.png"
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert("RGB")
input_points = [[[450, 600]]] # 2D localization of a window

inputs = processor(raw_image, input_points=input_points, return_tensors="pt").to("cuda")
outputs = model(**inputs)
masks = processor.image_processor.post_process_masks(outputs.pred_masks.cpu(), inputs["original_sizes"].cpu(), inputs["reshaped_input_sizes"].cpu())
scores = outputs.iou_scores

除了生成掩码的其他参数，您可以传递关于您感兴趣目标的2D位置的信息，包围目标的边界框（格式应为边界框的右上角和左下角点的x、y坐标），分割掩码。根据 the official repository ，在撰写本文时，According to the official repository , 目前不支持将文本作为输入。有关更多详细信息，请参阅这份笔记本，它展示了如何使用模型进行演示，包括一个视觉示例！

自动生成掩码

该模型可以用于以“零样本”方式生成分割掩码，给定一个输入图像。模型会自动提示一个包含1024个点的网格，这些点都会被输入模型。

此流程用于自动生成掩码。以下代码演示了您可以多么轻松地运行它（在任何设备上运行！只需提供适当的points_per_batch参数）

from transformers import pipeline
generator =  pipeline("mask-generation", device = 0, points_per_batch = 256)
image_url = "https://huggingface.co/ybelkada/segment-anything/resolve/main/assets/car.png"
outputs = generator(image_url, points_per_batch = 256)

现在来显示图像：

import matplotlib.pyplot as plt
from PIL import Image
import numpy as np

def show_mask(mask, ax, random_color=False):
    if random_color:
        color = np.concatenate([np.random.random(3), np.array([0.6])], axis=0)
    else:
        color = np.array([30 / 255, 144 / 255, 255 / 255, 0.6])
    h, w = mask.shape[-2:]
    mask_image = mask.reshape(h, w, 1) * color.reshape(1, 1, -1)
    ax.imshow(mask_image)
    

plt.imshow(np.array(raw_image))
ax = plt.gca()
for mask in outputs["masks"]:
    show_mask(mask, ax=ax, random_color=True)
plt.axis("off")
plt.show()

引用

如果您使用了该模型，请使用以下BibTeX条目进行引用。

@article{kirillov2023segany,
  title={Segment Anything},
  author={Kirillov, Alexander and Mintun, Eric and Ravi, Nikhila and Mao, Hanzi and Rolland, Chloe and Gustafson, Laura and Xiao, Tete and Whitehead, Spencer and Berg, Alexander C. and Lo, Wan-Yen and Doll{\'a}r, Piotr and Girshick, Ross},
  journal={arXiv:2304.02643},
  year={2023}
}

作者:

Meta AI

数据集大小:

715.6 MB