Model:
timm/coatnet_rmlp_2_rw_224.sw_in1k
Task:
License:
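A timm-specific CoAtNet image classification model with an MLP Log-CPB (continuous log-coordinate relative position bias) inspired by Swin-V2. Trained in timm on ImageNet-1k by Ross Wightman.
ImageNet-1k training was done on TPUs, thanks to support from the TRC program.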
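MaxxViT covers a number of related model architectures that share a common structure, including:
Aside from the major variants listed above, there are more subtle changes from model to model. Any model name containing the string rw is a timm-specific config, with modelling adjustments made to favour PyTorch eager use. These models were created while training initial reproductions of the originals, so they differ in some details. All models with the string tf exactly match the TensorFlow-based models by the original paper authors, with weights ported to PyTorch. This covers a number of MaxViT models. The official CoAtNet models were never released.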
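The naming convention described above can be checked mechanically. The following is a minimal sketch, assuming the tag appears as its own underscore-separated token in the architecture part of the name (the part before the dot). The helper `weight_origin` and the name `example_tf_224.tag` are hypothetical and exist only for illustration.

```python
def weight_origin(model_name: str) -> str:
    """Classify a model name by the `rw` / `tf` tag described in the text."""
    # Drop the pretrained tag after the dot, e.g. ".sw_in1k".
    architecture = model_name.split(".", 1)[0]
    tokens = architecture.split("_")
    if "rw" in tokens:
        return "timm-specific config"
    if "tf" in tokens:
        return "ported from TensorFlow"
    return "unknown"


# The model documented in this card carries the `rw` tag.
print(weight_origin("coatnet_rmlp_2_rw_224.sw_in1k"))
# Hypothetical name, used only to exercise the `tf` branch.
print(weight_origin("example_tf_224.tag"))
```

Matching whole tokens rather than substrings avoids false positives from names that merely contain the letters `rw` or `tf` inside a longer word.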
# Image classification: top-5 predictions
from urllib.request import urlopen
from PIL import Image
import timm
import torch  # needed for torch.topk below

img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))

model = timm.create_model('coatnet_rmlp_2_rw_224.sw_in1k', pretrained=True)
model = model.eval()

# get model specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(img).unsqueeze(0))  # unsqueeze single image into batch of 1

# convert logits to percentages and keep the five most likely classes
top5_probabilities, top5_class_indices = torch.topk(output.softmax(dim=1) * 100, k=5)
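To make the last step of the classification example concrete, here is a dependency-free sketch of what `output.softmax(dim=1) * 100` followed by `torch.topk(..., k=5)` computes for a single image. The logits are made up, and the helpers `softmax_percent` and `top_k` are illustrative stand-ins, not part of timm or PyTorch.

```python
import math


def softmax_percent(logits):
    """Convert raw logits to percentages that sum to 100."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [100.0 * e / total for e in exps]


def top_k(values, k):
    """Return (values, indices) of the k largest entries, largest first."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k]
    return [values[i] for i in order], order


logits = [0.5, 2.0, -1.0, 3.0, 0.0, 1.0]  # made-up logits for 6 classes
probs, indices = top_k(softmax_percent(logits), k=5)
print(indices)  # class indices ordered from most to least likely
print([round(p, 2) for p in probs])
```

The indices returned this way are positions in the classifier output, so they still need to be mapped to label names for the dataset the model was trained on.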
# Feature map extraction
from urllib.request import urlopen
from PIL import Image
import timm

img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))

model = timm.create_model(
    'coatnet_rmlp_2_rw_224.sw_in1k',
    pretrained=True,
    features_only=True,
)
model = model.eval()

# get model specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(img).unsqueeze(0))  # unsqueeze single image into batch of 1

for o in output:
    # print shape of each feature map in output
    # e.g.:
    #  torch.Size([1, 128, 112, 112])
    #  torch.Size([1, 128, 56, 56])
    #  torch.Size([1, 256, 28, 28])
    #  torch.Size([1, 512, 14, 14])
    #  torch.Size([1, 1024, 7, 7])
    print(o.shape)
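The shapes listed in the comments above follow a simple pattern: each feature map halves the spatial resolution of the previous one. The sketch below reproduces them arithmetically. The channel counts are copied from those comments; the reduction factors (2, 4, 8, 16, 32) are an assumption inferred from dividing the 224-pixel input by each listed spatial size, and `feature_map_shapes` is a hypothetical helper.

```python
def feature_map_shapes(image_size, channels, reductions, batch_size=1):
    """Expected (N, C, H, W) shape of each feature map for a square input."""
    return [
        (batch_size, c, image_size // r, image_size // r)
        for c, r in zip(channels, reductions)
    ]


channels = (128, 128, 256, 512, 1024)  # from the example output above
reductions = (2, 4, 8, 16, 32)         # assumption: 224 / listed spatial size

shapes = feature_map_shapes(224, channels, reductions)
for shape in shapes:
    print(shape)
```

This kind of check is handy when wiring the feature maps into a downstream head that expects particular resolutions.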
# Image embeddings
from urllib.request import urlopen
from PIL import Image
import timm

img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))

model = timm.create_model(
    'coatnet_rmlp_2_rw_224.sw_in1k',
    pretrained=True,
    num_classes=0,  # remove classifier nn.Linear
)
model = model.eval()

# get model specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(img).unsqueeze(0))  # output is (batch_size, num_features) shaped tensor

# or equivalently (without needing to set num_classes=0)

output = model.forward_features(transforms(img).unsqueeze(0))
# output is unpooled, a (1, 1024, 7, 7) shaped tensor

output = model.forward_head(output, pre_logits=True)
# output is a (1, num_features) shaped tensor
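The step from the unpooled `(1, 1024, 7, 7)` tensor to the `(1, num_features)` embedding collapses each channel's spatial grid into a single number. The sketch below illustrates that reduction on a toy input, assuming the head uses global average pooling; the source does not state the pooling type, so treat that as an assumption. `global_average_pool` is a hypothetical pure-Python helper, not a timm function.

```python
def global_average_pool(feature_map):
    """Average each channel's H x W grid down to one number.

    `feature_map` is a nested list shaped (channels, height, width).
    """
    pooled = []
    for channel in feature_map:
        values = [v for row in channel for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


# Toy input: 2 channels of 2 x 2 values (the real tensor has 1024 channels of 7 x 7).
toy = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 0.0], [2.0, 2.0]],
]
embedding = global_average_pool(toy)
print(embedding)  # one value per channel
```

Under that assumption, the embedding length equals the channel count of the last feature map, which is why the pooled output has `num_features` entries.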

By top-1 accuracy:

| model | top1 | top5 | samples / sec | Params (M) | GMAC | Act (M) |
|---|---|---|---|---|---|---|
| 12310321 | 88.53 | 98.64 | 21.76 | 475.77 | 534.14 | 1413.22 |
| 12311321 | 88.32 | 98.54 | 42.53 | 475.32 | 292.78 | 668.76 |
| 12312321 | 88.20 | 98.53 | 50.87 | 119.88 | 138.02 | 703.99 |
| 12313321 | 88.04 | 98.40 | 36.42 | 212.33 | 244.75 | 942.15 |
| 12314321 | 87.98 | 98.56 | 71.75 | 212.03 | 132.55 | 445.84 |
| 12315321 | 87.92 | 98.54 | 104.71 | 119.65 | 73.80 | 332.90 |
| 12316321 | 87.81 | 98.37 | 106.55 | 116.14 | 70.97 | 318.95 |
| 12317321 | 87.47 | 98.37 | 149.49 | 116.09 | 72.98 | 213.74 |
| 12318321 | 87.39 | 98.31 | 160.80 | 73.88 | 47.69 | 209.43 |
| 12319321 | 86.89 | 98.02 | 375.86 | 116.14 | 23.15 | 92.64 |
| 12320321 | 86.64 | 98.02 | 501.03 | 116.09 | 24.20 | 62.77 |
| 12321321 | 86.60 | 97.92 | 50.75 | 119.88 | 138.02 | 703.99 |
| 12322321 | 86.57 | 97.89 | 631.88 | 73.87 | 15.09 | 49.22 |
| 12323321 | 86.52 | 97.88 | 36.04 | 212.33 | 244.75 | 942.15 |
| 12324321 | 86.49 | 97.90 | 620.58 | 73.88 | 15.18 | 54.78 |
| 12325321 | 86.29 | 97.80 | 101.09 | 119.65 | 73.80 | 332.90 |
| 12326321 | 86.23 | 97.69 | 70.56 | 212.03 | 132.55 | 445.84 |
| 12327321 | 86.10 | 97.76 | 88.63 | 69.13 | 67.26 | 383.77 |
| 12328321 | 85.67 | 97.58 | 144.25 | 31.05 | 33.49 | 257.59 |
| 12329321 | 85.54 | 97.46 | 188.35 | 69.02 | 35.87 | 183.65 |
| 12330321 | 85.11 | 97.38 | 293.46 | 30.98 | 17.53 | 123.42 |
| 12331321 | 84.93 | 96.97 | 247.71 | 211.79 | 43.68 | 127.35 |
| 12332321 | 84.90 | 96.96 | 1025.45 | 41.72 | 8.11 | 40.13 |
| 12333321 | 84.85 | 96.99 | 358.25 | 119.47 | 24.04 | 95.01 |
| 12334321 | 84.63 | 97.06 | 575.53 | 66.01 | 14.67 | 58.38 |
| 12335321 | 84.61 | 96.74 | 625.81 | 73.88 | 15.18 | 54.78 |
| 12336321 | 84.49 | 96.76 | 693.82 | 64.90 | 10.75 | 49.30 |
| 12337321 | 84.43 | 96.83 | 647.96 | 68.93 | 11.66 | 53.17 |
| 12338321 | 84.23 | 96.78 | 807.21 | 29.15 | 6.77 | 46.92 |
| 12339321 | 83.62 | 96.38 | 989.59 | 41.72 | 8.04 | 34.60 |
| 12340321 | 83.50 | 96.50 | 1100.53 | 29.06 | 5.11 | 33.11 |
| 12341321 | 83.41 | 96.59 | 1004.94 | 30.92 | 5.60 | 35.78 |
| 12342321 | 83.36 | 96.45 | 1093.03 | 41.69 | 7.85 | 35.47 |
| 12343321 | 83.11 | 96.33 | 1276.88 | 23.70 | 6.26 | 23.05 |
| 12344321 | 83.03 | 96.34 | 1341.24 | 16.78 | 4.37 | 26.05 |
| 12345321 | 82.96 | 96.26 | 1283.24 | 15.50 | 4.47 | 31.92 |
| 12346321 | 82.93 | 96.23 | 1218.17 | 15.45 | 4.46 | 30.28 |
| 12347321 | 82.39 | 96.19 | 1600.14 | 27.44 | 4.67 | 22.04 |
| 12348321 | 82.39 | 95.84 | 1831.21 | 27.44 | 4.43 | 18.73 |
| 12349321 | 82.05 | 95.87 | 2109.09 | 15.15 | 2.62 | 20.34 |
| 12350321 | 81.95 | 95.92 | 2525.52 | 14.70 | 2.47 | 12.80 |
| 12351321 | 81.70 | 95.64 | 2344.52 | 15.14 | 2.41 | 15.41 |
| 12352321 | 80.53 | 95.21 | 1594.71 | 7.52 | 1.85 | 24.86 |

By throughput (samples / sec):

| model | top1 | top5 | samples / sec | Params (M) | GMAC | Act (M) |
|---|---|---|---|---|---|---|
| 12350321 | 81.95 | 95.92 | 2525.52 | 14.70 | 2.47 | 12.80 |
| 12351321 | 81.70 | 95.64 | 2344.52 | 15.14 | 2.41 | 15.41 |
| 12349321 | 82.05 | 95.87 | 2109.09 | 15.15 | 2.62 | 20.34 |
| 12348321 | 82.39 | 95.84 | 1831.21 | 27.44 | 4.43 | 18.73 |
| 12347321 | 82.39 | 96.19 | 1600.14 | 27.44 | 4.67 | 22.04 |
| 12352321 | 80.53 | 95.21 | 1594.71 | 7.52 | 1.85 | 24.86 |
| 12344321 | 83.03 | 96.34 | 1341.24 | 16.78 | 4.37 | 26.05 |
| 12345321 | 82.96 | 96.26 | 1283.24 | 15.50 | 4.47 | 31.92 |
| 12343321 | 83.11 | 96.33 | 1276.88 | 23.70 | 6.26 | 23.05 |
| 12346321 | 82.93 | 96.23 | 1218.17 | 15.45 | 4.46 | 30.28 |
| 12340321 | 83.50 | 96.50 | 1100.53 | 29.06 | 5.11 | 33.11 |
| 12342321 | 83.36 | 96.45 | 1093.03 | 41.69 | 7.85 | 35.47 |
| 12332321 | 84.90 | 96.96 | 1025.45 | 41.72 | 8.11 | 40.13 |
| 12341321 | 83.41 | 96.59 | 1004.94 | 30.92 | 5.60 | 35.78 |
| 12339321 | 83.62 | 96.38 | 989.59 | 41.72 | 8.04 | 34.60 |
| 12338321 | 84.23 | 96.78 | 807.21 | 29.15 | 6.77 | 46.92 |
| 12336321 | 84.49 | 96.76 | 693.82 | 64.90 | 10.75 | 49.30 |
| 12337321 | 84.43 | 96.83 | 647.96 | 68.93 | 11.66 | 53.17 |
| 12322321 | 86.57 | 97.89 | 631.88 | 73.87 | 15.09 | 49.22 |
| 12335321 | 84.61 | 96.74 | 625.81 | 73.88 | 15.18 | 54.78 |
| 12324321 | 86.49 | 97.90 | 620.58 | 73.88 | 15.18 | 54.78 |
| 12334321 | 84.63 | 97.06 | 575.53 | 66.01 | 14.67 | 58.38 |
| 12320321 | 86.64 | 98.02 | 501.03 | 116.09 | 24.20 | 62.77 |
| 12319321 | 86.89 | 98.02 | 375.86 | 116.14 | 23.15 | 92.64 |
| 12333321 | 84.85 | 96.99 | 358.25 | 119.47 | 24.04 | 95.01 |
| 12330321 | 85.11 | 97.38 | 293.46 | 30.98 | 17.53 | 123.42 |
| 12331321 | 84.93 | 96.97 | 247.71 | 211.79 | 43.68 | 127.35 |
| 12329321 | 85.54 | 97.46 | 188.35 | 69.02 | 35.87 | 183.65 |
| 12318321 | 87.39 | 98.31 | 160.80 | 73.88 | 47.69 | 209.43 |
| 12317321 | 87.47 | 98.37 | 149.49 | 116.09 | 72.98 | 213.74 |
| 12328321 | 85.67 | 97.58 | 144.25 | 31.05 | 33.49 | 257.59 |
| 12316321 | 87.81 | 98.37 | 106.55 | 116.14 | 70.97 | 318.95 |
| 12315321 | 87.92 | 98.54 | 104.71 | 119.65 | 73.80 | 332.90 |
| 12325321 | 86.29 | 97.80 | 101.09 | 119.65 | 73.80 | 332.90 |
| 12327321 | 86.10 | 97.76 | 88.63 | 69.13 | 67.26 | 383.77 |
| 12314321 | 87.98 | 98.56 | 71.75 | 212.03 | 132.55 | 445.84 |
| 12326321 | 86.23 | 97.69 | 70.56 | 212.03 | 132.55 | 445.84 |
| 12312321 | 88.20 | 98.53 | 50.87 | 119.88 | 138.02 | 703.99 |
| 12321321 | 86.60 | 97.92 | 50.75 | 119.88 | 138.02 | 703.99 |
| 12311321 | 88.32 | 98.54 | 42.53 | 475.32 | 292.78 | 668.76 |
| 12313321 | 88.04 | 98.40 | 36.42 | 212.33 | 244.75 | 942.15 |
| 12323321 | 86.52 | 97.88 | 36.04 | 212.33 | 244.75 | 942.15 |
| 12310321 | 88.53 | 98.64 | 21.76 | 475.77 | 534.14 | 1413.22 |
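
The two tables hold the same rows in different orders: one descending by top-1 accuracy, the other descending by throughput. As a minimal sketch of how to re-rank the rows by another column, the snippet below parses three rows copied verbatim from the tables and sorts them both ways. The helper `parse_row` and the dictionary keys are illustrative choices.

```python
ROWS_MD = """\
| 12310321 | 88.53 | 98.64 | 21.76 | 475.77 | 534.14 | 1413.22 |
| 12332321 | 84.90 | 96.96 | 1025.45 | 41.72 | 8.11 | 40.13 |
| 12350321 | 81.95 | 95.92 | 2525.52 | 14.70 | 2.47 | 12.80 |
"""


def parse_row(line):
    """Split one markdown table row into the fields used for ranking."""
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    return {
        "model": cells[0],
        "top1": float(cells[1]),
        "samples_per_sec": float(cells[3]),
        "params_m": float(cells[4]),
    }


rows = [parse_row(line) for line in ROWS_MD.splitlines()]

by_top1 = sorted(rows, key=lambda r: r["top1"], reverse=True)
by_throughput = sorted(rows, key=lambda r: r["samples_per_sec"], reverse=True)

print([r["model"] for r in by_top1])
print([r["model"] for r in by_throughput])
```

For these three rows the two orderings are exact reverses of each other, which reflects the accuracy versus speed trade-off visible across the full tables.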
@misc{rw2019timm,
author = {Ross Wightman},
title = {PyTorch Image Models},
year = {2019},
publisher = {GitHub},
journal = {GitHub repository},
doi = {10.5281/zenodo.4414861},
howpublished = {\url{https://github.com/huggingface/pytorch-image-models}}
}
@article{tu2022maxvit,
title={MaxViT: Multi-Axis Vision Transformer},
author={Tu, Zhengzhong and Talebi, Hossein and Zhang, Han and Yang, Feng and Milanfar, Peyman and Bovik, Alan and Li, Yinxiao},
journal={ECCV},
year={2022},
}
@article{dai2021coatnet,
title={CoAtNet: Marrying Convolution and Attention for All Data Sizes},
author={Dai, Zihang and Liu, Hanxiao and Le, Quoc V and Tan, Mingxing},
journal={arXiv preprint arXiv:2106.04803},
year={2021}
}