Model:

timm/convnext_xlarge.fb_in22k_ft_in1k_384

Model card for convnext_xlarge.fb_in22k_ft_in1k_384

A ConvNeXt image classification model. Pretrained on ImageNet-22k and fine-tuned on ImageNet-1k by the paper authors.

Model Details

Model Usage

Image Classification

from urllib.request import urlopen
from PIL import Image
import timm
import torch  # needed for torch.topk below

img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))

model = timm.create_model('convnext_xlarge.fb_in22k_ft_in1k_384', pretrained=True)
model = model.eval()

# get model specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(img).unsqueeze(0))  # unsqueeze single image into batch of 1

top5_probabilities, top5_class_indices = torch.topk(output.softmax(dim=1) * 100, k=5)
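
To turn the result into something readable, you can print the top-5 indices next to their probabilities. This minimal sketch only uses the tensors computed above; mapping the indices to ImageNet-1k label names would additionally require a label file or helper:

for prob, idx in zip(top5_probabilities[0], top5_class_indices[0]):
    # each idx is an ImageNet-1k class index, each prob a percentage
    print(f'class {idx.item()}: {prob.item():.2f}%')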

Feature Map Extraction

from urllib.request import urlopen
from PIL import Image
import timm

img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))

model = timm.create_model(
    'convnext_xlarge.fb_in22k_ft_in1k_384',
    pretrained=True,
    features_only=True,
)
model = model.eval()

# get model specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(img).unsqueeze(0))  # unsqueeze single image into batch of 1

for o in output:
    # print shape of each feature map in output
    # e.g.:
    #  torch.Size([1, 256, 96, 96])
    #  torch.Size([1, 512, 48, 48])
    #  torch.Size([1, 1024, 24, 24])
    #  torch.Size([1, 2048, 12, 12])

    print(o.shape)
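
If you only need part of the feature pyramid, create_model also accepts an out_indices argument in features_only mode, and model.feature_info describes each returned map. A minimal sketch (the choice of indices here is purely illustrative):

import timm

# keep only the last two stages of the pyramid
model = timm.create_model(
    'convnext_xlarge.fb_in22k_ft_in1k_384',
    pretrained=True,
    features_only=True,
    out_indices=(2, 3),
)

# channel count and reduction (stride vs. the input) of each selected map
print(model.feature_info.channels())   # e.g. [1024, 2048]
print(model.feature_info.reduction())  # e.g. [16, 32]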

Image Embeddings

from urllib.request import urlopen
from PIL import Image
import timm

img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))

model = timm.create_model(
    'convnext_xlarge.fb_in22k_ft_in1k_384',
    pretrained=True,
    num_classes=0,  # remove classifier nn.Linear
)
model = model.eval()

# get model specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(img).unsqueeze(0))  # output is (batch_size, num_features) shaped tensor

# or equivalently (without needing to set num_classes=0)

output = model.forward_features(transforms(img).unsqueeze(0))
# output is unpooled, a (1, 2048, 12, 12) shaped tensor

output = model.forward_head(output, pre_logits=True)
# output is a (1, num_features) shaped tensor
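
The pooled embedding is convenient for image-to-image similarity. A minimal sketch, assuming a second image img2 has been opened the same way as img above (img2 is not part of the original example):

import torch.nn.functional as F

# (1, num_features) pooled embeddings for both images
emb1 = model.forward_head(model.forward_features(transforms(img).unsqueeze(0)), pre_logits=True)
emb2 = model.forward_head(model.forward_features(transforms(img2).unsqueeze(0)), pre_logits=True)

# cosine similarity in [-1, 1]; higher means more visually similar
print(F.cosine_similarity(emb1, emb2).item())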

Model Comparison

Explore this model's dataset and runtime metrics in the timm model results.

All timing numbers are from eager-mode PyTorch 1.13 on an RTX 3090 with AMP.

|model|top1|top5|img_size|param_count|gmacs|macts|samples_per_sec|batch_size|
|-----|----|----|--------|-----------|-----|-----|---------------|----------|
|convnextv2_huge.fcmae_ft_in22k_in1k_512|88.848|98.742|512|660.29|600.81|413.07|28.58|48|
|convnextv2_huge.fcmae_ft_in22k_in1k_384|88.668|98.738|384|660.29|337.96|232.35|50.56|64|
|convnext_xxlarge.clip_laion2b_soup_ft_in1k|88.612|98.704|256|846.47|198.09|124.45|122.45|256|
|convnext_large_mlp.clip_laion2b_soup_ft_in12k_in1k_384|88.312|98.578|384|200.13|101.11|126.74|196.84|256|
|convnextv2_large.fcmae_ft_in22k_in1k_384|88.196|98.532|384|197.96|101.1|126.74|128.94|128|
|convnext_large_mlp.clip_laion2b_soup_ft_in12k_in1k_320|87.968|98.47|320|200.13|70.21|88.02|283.42|256|
|convnext_xlarge.fb_in22k_ft_in1k_384|87.75|98.556|384|350.2|179.2|168.99|124.85|192|
|convnextv2_base.fcmae_ft_in22k_in1k_384|87.646|98.422|384|88.72|45.21|84.49|209.51|256|
|convnext_large.fb_in22k_ft_in1k_384|87.476|98.382|384|197.77|101.1|126.74|194.66|256|
|convnext_large_mlp.clip_laion2b_augreg_ft_in1k|87.344|98.218|256|200.13|44.94|56.33|438.08|256|
|convnextv2_large.fcmae_ft_in22k_in1k|87.26|98.248|224|197.96|34.4|43.13|376.84|256|
|convnext_base.clip_laion2b_augreg_ft_in12k_in1k_384|87.138|98.212|384|88.59|45.21|84.49|365.47|256|
|convnext_xlarge.fb_in22k_ft_in1k|87.002|98.208|224|350.2|60.98|57.5|368.01|256|
|convnext_base.clip_laion2b_augreg_ft_in1k_384|86.796|98.264|384|88.59|45.21|84.49|366.54|256|
|convnextv2_base.fcmae_ft_in22k_in1k|86.74|98.022|224|88.72|15.38|28.75|624.23|256|
|convnext_large.fb_in22k_ft_in1k|86.636|98.028|224|197.77|34.4|43.13|581.43|256|
|convnext_base.clip_laiona_augreg_ft_in1k_384|86.504|97.97|384|88.59|45.21|84.49|368.14|256|
|convnext_base.clip_laion2b_augreg_ft_in12k_in1k|86.344|97.97|256|88.59|20.09|37.55|816.14|256|
|convnextv2_huge.fcmae_ft_in1k|86.256|97.75|224|660.29|115.0|79.07|154.72|256|
|convnext_small.in12k_ft_in1k_384|86.182|97.92|384|50.22|25.58|63.37|516.19|256|
|convnext_base.clip_laion2b_augreg_ft_in1k|86.154|97.68|256|88.59|20.09|37.55|819.86|256|
|convnext_base.fb_in22k_ft_in1k|85.822|97.866|224|88.59|15.38|28.75|1037.66|256|
|convnext_small.fb_in22k_ft_in1k_384|85.778|97.886|384|50.22|25.58|63.37|518.95|256|
|convnextv2_large.fcmae_ft_in1k|85.742|97.584|224|197.96|34.4|43.13|375.23|256|
|convnext_small.in12k_ft_in1k|85.174|97.506|224|50.22|8.71|21.56|1474.31|256|
|convnext_tiny.in12k_ft_in1k_384|85.118|97.608|384|28.59|13.14|39.48|856.76|256|
|convnextv2_tiny.fcmae_ft_in22k_in1k_384|85.112|97.63|384|28.64|13.14|39.48|491.32|256|
|convnextv2_base.fcmae_ft_in1k|84.874|97.09|224|88.72|15.38|28.75|625.33|256|
|convnext_small.fb_in22k_ft_in1k|84.562|97.394|224|50.22|8.71|21.56|1478.29|256|
|convnext_large.fb_in1k|84.282|96.892|224|197.77|34.4|43.13|584.28|256|
|convnext_tiny.in12k_ft_in1k|84.186|97.124|224|28.59|4.47|13.44|2433.7|256|
|convnext_tiny.fb_in22k_ft_in1k_384|84.084|97.14|384|28.59|13.14|39.48|862.95|256|
|convnextv2_tiny.fcmae_ft_in22k_in1k|83.894|96.964|224|28.64|4.47|13.44|1452.72|256|
|convnext_base.fb_in1k|83.82|96.746|224|88.59|15.38|28.75|1054.0|256|
|convnextv2_nano.fcmae_ft_in22k_in1k_384|83.37|96.742|384|15.62|7.22|24.61|801.72|256|
|convnext_small.fb_in1k|83.142|96.434|224|50.22|8.71|21.56|1464.0|256|
|convnextv2_tiny.fcmae_ft_in1k|82.92|96.284|224|28.64|4.47|13.44|1425.62|256|
|convnext_tiny.fb_in22k_ft_in1k|82.898|96.616|224|28.59|4.47|13.44|2480.88|256|
|convnext_nano.in12k_ft_in1k|82.282|96.344|224|15.59|2.46|8.37|3926.52|256|
|convnext_tiny_hnf.a2h_in1k|82.216|95.852|224|28.59|4.47|13.44|2529.75|256|
|convnext_tiny.fb_in1k|82.066|95.854|224|28.59|4.47|13.44|2346.26|256|
|convnextv2_nano.fcmae_ft_in22k_in1k|82.03|96.166|224|15.62|2.46|8.37|2300.18|256|
|convnextv2_nano.fcmae_ft_in1k|81.83|95.738|224|15.62|2.46|8.37|2321.48|256|
|convnext_nano_ols.d1h_in1k|80.866|95.246|224|15.65|2.65|9.38|3523.85|256|
|convnext_nano.d1h_in1k|80.768|95.334|224|15.59|2.46|8.37|3915.58|256|
|convnextv2_pico.fcmae_ft_in1k|80.304|95.072|224|9.07|1.37|6.1|3274.57|256|
|convnext_pico.d1_in1k|79.526|94.558|224|9.05|1.37|6.1|5686.88|256|
|convnext_pico_ols.d1_in1k|79.522|94.692|224|9.06|1.43|6.5|5422.46|256|
|convnextv2_femto.fcmae_ft_in1k|78.488|93.98|224|5.23|0.79|4.57|4264.2|256|
|convnext_femto_ols.d1_in1k|77.86|93.83|224|5.23|0.82|4.87|6910.6|256|
|convnext_femto.d1_in1k|77.454|93.68|224|5.22|0.79|4.57|7189.92|256|
|convnextv2_atto.fcmae_ft_in1k|76.664|93.044|224|3.71|0.55|3.81|4728.91|256|
|convnext_atto_ols.a2_in1k|75.88|92.846|224|3.7|0.58|4.11|7963.16|256|
|convnext_atto.d2_in1k|75.664|92.9|224|3.7|0.55|3.81|8439.22|256|
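
To see which of these checkpoints are available in your installed timm version, you can enumerate them by wildcard; a minimal sketch:

import timm

# list all pretrained ConvNeXt / ConvNeXt-V2 checkpoints known to this install
print(timm.list_models('convnext*', pretrained=True))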

Citation

@article{liu2022convnet,
  author  = {Zhuang Liu and Hanzi Mao and Chao-Yuan Wu and Christoph Feichtenhofer and Trevor Darrell and Saining Xie},
  title   = {A ConvNet for the 2020s},
  journal = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year    = {2022},
}
@misc{rw2019timm,
  author = {Ross Wightman},
  title = {PyTorch Image Models},
  year = {2019},
  publisher = {GitHub},
  journal = {GitHub repository},
  doi = {10.5281/zenodo.4414861},
  howpublished = {\url{https://github.com/huggingface/pytorch-image-models}}
}