Model:

timm/convnext_xlarge.fb_in22k_ft_in1k_384

Model card for convnext_xlarge.fb_in22k_ft_in1k_384

A ConvNeXt image classification model. Pretrained on ImageNet-22k and fine-tuned on ImageNet-1k by the paper authors.

Model Details

Model Usage

Image Classification

from urllib.request import urlopen
from PIL import Image
import timm
import torch  # needed for torch.topk below

img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))

model = timm.create_model('convnext_xlarge.fb_in22k_ft_in1k_384', pretrained=True)
model = model.eval()

# get model specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(img).unsqueeze(0))  # unsqueeze single image into batch of 1

top5_probabilities, top5_class_indices = torch.topk(output.softmax(dim=1) * 100, k=5)
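
To turn the result into something readable, you can print the top-5 indices next to their probabilities. This minimal sketch only uses the tensors computed above; mapping the indices to ImageNet-1k label names would additionally require a label file or helper:

for prob, idx in zip(top5_probabilities[0], top5_class_indices[0]):
    # each idx is an ImageNet-1k class index, each prob a percentage
    print(f'class {idx.item()}: {prob.item():.2f}%')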

Feature Map Extraction

from urllib.request import urlopen
from PIL import Image
import timm

img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))

model = timm.create_model(
    'convnext_xlarge.fb_in22k_ft_in1k_384',
    pretrained=True,
    features_only=True,
)
model = model.eval()

# get model specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(img).unsqueeze(0))  # unsqueeze single image into batch of 1

for o in output:
    # print shape of each feature map in output
    # e.g.:
    #  torch.Size([1, 256, 96, 96])
    #  torch.Size([1, 512, 48, 48])
    #  torch.Size([1, 1024, 24, 24])
    #  torch.Size([1, 2048, 12, 12])

    print(o.shape)
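
If you only need part of the feature pyramid, create_model also accepts an out_indices argument in features_only mode, and model.feature_info describes each returned map. A minimal sketch (the choice of indices here is purely illustrative):

import timm

# keep only the last two stages of the pyramid
model = timm.create_model(
    'convnext_xlarge.fb_in22k_ft_in1k_384',
    pretrained=True,
    features_only=True,
    out_indices=(2, 3),
)

# channel count and reduction (stride vs. the input) of each selected map
print(model.feature_info.channels())   # e.g. [1024, 2048]
print(model.feature_info.reduction())  # e.g. [16, 32]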

Image Embeddings

from urllib.request import urlopen
from PIL import Image
import timm

img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))

model = timm.create_model(
    'convnext_xlarge.fb_in22k_ft_in1k_384',
    pretrained=True,
    num_classes=0,  # remove classifier nn.Linear
)
model = model.eval()

# get model specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(img).unsqueeze(0))  # output is (batch_size, num_features) shaped tensor

# or equivalently (without needing to set num_classes=0)

output = model.forward_features(transforms(img).unsqueeze(0))
# output is unpooled, a (1, 2048, 12, 12) shaped tensor

output = model.forward_head(output, pre_logits=True)
# output is a (1, num_features) shaped tensor
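
The pooled embedding is convenient for image-to-image similarity. A minimal sketch, assuming a second image img2 has been opened the same way as img above (img2 is not part of the original example):

import torch.nn.functional as F

# (1, num_features) pooled embeddings for both images
emb1 = model.forward_head(model.forward_features(transforms(img).unsqueeze(0)), pre_logits=True)
emb2 = model.forward_head(model.forward_features(transforms(img2).unsqueeze(0)), pre_logits=True)

# cosine similarity in [-1, 1]; higher means more visually similar
print(F.cosine_similarity(emb1, emb2).item())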

Model Comparison

Explore this model's dataset and runtime metrics in the timm model results.

All timing numbers are from eager-mode PyTorch 1.13 on an RTX 3090 with AMP.

|model|top1|top5|img_size|param_count|gmacs|macts|samples_per_sec|batch_size|
|-----|----|----|--------|-----------|-----|-----|---------------|----------|
|convnextv2_huge.fcmae_ft_in22k_in1k_512|88.848|98.742|512|660.29|600.81|413.07|28.58|48|
|convnextv2_huge.fcmae_ft_in22k_in1k_384|88.668|98.738|384|660.29|337.96|232.35|50.56|64|
|convnext_xxlarge.clip_laion2b_soup_ft_in1k|88.612|98.704|256|846.47|198.09|124.45|122.45|256|
|convnext_large_mlp.clip_laion2b_soup_ft_in12k_in1k_384|88.312|98.578|384|200.13|101.11|126.74|196.84|256|
|convnextv2_large.fcmae_ft_in22k_in1k_384|88.196|98.532|384|197.96|101.1|126.74|128.94|128|
|convnext_large_mlp.clip_laion2b_soup_ft_in12k_in1k_320|87.968|98.47|320|200.13|70.21|88.02|283.42|256|
|convnext_xlarge.fb_in22k_ft_in1k_384|87.75|98.556|384|350.2|179.2|168.99|124.85|192|
|convnextv2_base.fcmae_ft_in22k_in1k_384|87.646|98.422|384|88.72|45.21|84.49|209.51|256|
|convnext_large.fb_in22k_ft_in1k_384|87.476|98.382|384|197.77|101.1|126.74|194.66|256|
|convnext_large_mlp.clip_laion2b_augreg_ft_in1k|87.344|98.218|256|200.13|44.94|56.33|438.08|256|
|convnextv2_large.fcmae_ft_in22k_in1k|87.26|98.248|224|197.96|34.4|43.13|376.84|256|
|convnext_base.clip_laion2b_augreg_ft_in12k_in1k_384|87.138|98.212|384|88.59|45.21|84.49|365.47|256|
|convnext_xlarge.fb_in22k_ft_in1k|87.002|98.208|224|350.2|60.98|57.5|368.01|256|
|convnext_base.clip_laion2b_augreg_ft_in1k_384|86.796|98.264|384|88.59|45.21|84.49|366.54|256|
|convnextv2_base.fcmae_ft_in22k_in1k|86.74|98.022|224|88.72|15.38|28.75|624.23|256|
|convnext_large.fb_in22k_ft_in1k|86.636|98.028|224|197.77|34.4|43.13|581.43|256|
|convnext_base.clip_laiona_augreg_ft_in1k_384|86.504|97.97|384|88.59|45.21|84.49|368.14|256|
|convnext_base.clip_laion2b_augreg_ft_in12k_in1k|86.344|97.97|256|88.59|20.09|37.55|816.14|256|
|convnextv2_huge.fcmae_ft_in1k|86.256|97.75|224|660.29|115.0|79.07|154.72|256|
|convnext_small.in12k_ft_in1k_384|86.182|97.92|384|50.22|25.58|63.37|516.19|256|
|convnext_base.clip_laion2b_augreg_ft_in1k|86.154|97.68|256|88.59|20.09|37.55|819.86|256|
|convnext_base.fb_in22k_ft_in1k|85.822|97.866|224|88.59|15.38|28.75|1037.66|256|
|convnext_small.fb_in22k_ft_in1k_384|85.778|97.886|384|50.22|25.58|63.37|518.95|256|
|convnextv2_large.fcmae_ft_in1k|85.742|97.584|224|197.96|34.4|43.13|375.23|256|
|convnext_small.in12k_ft_in1k|85.174|97.506|224|50.22|8.71|21.56|1474.31|256|
|convnext_tiny.in12k_ft_in1k_384|85.118|97.608|384|28.59|13.14|39.48|856.76|256|
|convnextv2_tiny.fcmae_ft_in22k_in1k_384|85.112|97.63|384|28.64|13.14|39.48|491.32|256|
|convnextv2_base.fcmae_ft_in1k|84.874|97.09|224|88.72|15.38|28.75|625.33|256|
|convnext_small.fb_in22k_ft_in1k|84.562|97.394|224|50.22|8.71|21.56|1478.29|256|
|convnext_large.fb_in1k|84.282|96.892|224|197.77|34.4|43.13|584.28|256|
|convnext_tiny.in12k_ft_in1k|84.186|97.124|224|28.59|4.47|13.44|2433.7|256|
|convnext_tiny.fb_in22k_ft_in1k_384|84.084|97.14|384|28.59|13.14|39.48|862.95|256|
|convnextv2_tiny.fcmae_ft_in22k_in1k|83.894|96.964|224|28.64|4.47|13.44|1452.72|256|
|convnext_base.fb_in1k|83.82|96.746|224|88.59|15.38|28.75|1054.0|256|
|convnextv2_nano.fcmae_ft_in22k_in1k_384|83.37|96.742|384|15.62|7.22|24.61|801.72|256|
|convnext_small.fb_in1k|83.142|96.434|224|50.22|8.71|21.56|1464.0|256|
|convnextv2_tiny.fcmae_ft_in1k|82.92|96.284|224|28.64|4.47|13.44|1425.62|256|
|convnext_tiny.fb_in22k_ft_in1k|82.898|96.616|224|28.59|4.47|13.44|2480.88|256|
|convnext_nano.in12k_ft_in1k|82.282|96.344|224|15.59|2.46|8.37|3926.52|256|
|convnext_tiny_hnf.a2h_in1k|82.216|95.852|224|28.59|4.47|13.44|2529.75|256|
|convnext_tiny.fb_in1k|82.066|95.854|224|28.59|4.47|13.44|2346.26|256|
|convnextv2_nano.fcmae_ft_in22k_in1k|82.03|96.166|224|15.62|2.46|8.37|2300.18|256|
|convnextv2_nano.fcmae_ft_in1k|81.83|95.738|224|15.62|2.46|8.37|2321.48|256|
|convnext_nano_ols.d1h_in1k|80.866|95.246|224|15.65|2.65|9.38|3523.85|256|
|convnext_nano.d1h_in1k|80.768|95.334|224|15.59|2.46|8.37|3915.58|256|
|convnextv2_pico.fcmae_ft_in1k|80.304|95.072|224|9.07|1.37|6.1|3274.57|256|
|convnext_pico.d1_in1k|79.526|94.558|224|9.05|1.37|6.1|5686.88|256|
|convnext_pico_ols.d1_in1k|79.522|94.692|224|9.06|1.43|6.5|5422.46|256|
|convnextv2_femto.fcmae_ft_in1k|78.488|93.98|224|5.23|0.79|4.57|4264.2|256|
|convnext_femto_ols.d1_in1k|77.86|93.83|224|5.23|0.82|4.87|6910.6|256|
|convnext_femto.d1_in1k|77.454|93.68|224|5.22|0.79|4.57|7189.92|256|
|convnextv2_atto.fcmae_ft_in1k|76.664|93.044|224|3.71|0.55|3.81|4728.91|256|
|convnext_atto_ols.a2_in1k|75.88|92.846|224|3.7|0.58|4.11|7963.16|256|
|convnext_atto.d2_in1k|75.664|92.9|224|3.7|0.55|3.81|8439.22|256|
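
To see which of these checkpoints are available in your installed timm version, you can enumerate them by wildcard; a minimal sketch:

import timm

# list all pretrained ConvNeXt / ConvNeXt-V2 checkpoints known to this install
print(timm.list_models('convnext*', pretrained=True))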

Citation

@article{liu2022convnet,
  author  = {Zhuang Liu and Hanzi Mao and Chao-Yuan Wu and Christoph Feichtenhofer and Trevor Darrell and Saining Xie},
  title   = {A ConvNet for the 2020s},
  journal = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year    = {2022},
}
@misc{rw2019timm,
  author = {Ross Wightman},
  title = {PyTorch Image Models},
  year = {2019},
  publisher = {GitHub},
  journal = {GitHub repository},
  doi = {10.5281/zenodo.4414861},
  howpublished = {\url{https://github.com/huggingface/pytorch-image-models}}
}