Model:

timm/xcit_medium_24_p16_224.fb_dist_in1k

Model card for xcit_medium_24_p16_224.fb_dist_in1k

An XCiT (Cross-Covariance Image Transformer) image classification model. Pretrained on ImageNet-1k with distillation by the paper authors.

Model Details

  • Model Type: Image classification / feature backbone
  • Model Stats:
    • Params (M): 84.4
    • GMACs: 16.1
    • Activations (M): 31.7
    • Image size: 224 x 224
  • Papers:
    • XCiT: Cross-Covariance Image Transformers: https://arxiv.org/abs/2106.09681
  • Dataset: ImageNet-1k
  • Original: https://github.com/facebookresearch/xcit
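
To double-check the parameter count listed above, the model can be instantiated and its parameters summed directly (a minimal sketch, assuming timm is installed):

        import timm

        # build the architecture without downloading the pretrained weights
        model = timm.create_model('xcit_medium_24_p16_224.fb_dist_in1k', pretrained=False)
        n_params = sum(p.numel() for p in model.parameters())
        print(f'Params (M): {n_params / 1e6:.1f}')  # expected ~84.4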

Model Usage

Image Classification
        from urllib.request import urlopen
        from PIL import Image
        import timm
        import torch
        
        img = Image.open(urlopen(
            'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
        ))
        
        model = timm.create_model('xcit_medium_24_p16_224.fb_dist_in1k', pretrained=True)
        model = model.eval()
        
        # get model specific transforms (normalization, resize)
        data_config = timm.data.resolve_model_data_config(model)
        transforms = timm.data.create_transform(**data_config, is_training=False)
        
        output = model(transforms(img).unsqueeze(0))  # unsqueeze single image into batch of 1
        
        top5_probabilities, top5_class_indices = torch.topk(output.softmax(dim=1) * 100, k=5)
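
The top-5 indices are ImageNet-1k class ids. To turn them into human-readable labels, one option is the id-to-label mapping hosted on the Hugging Face Hub (a sketch only; the huggingface_hub dependency and the 'huggingface/label-files' dataset repo are assumptions, not part of this card):

        import json
        from huggingface_hub import hf_hub_download

        # assumption: ImageNet-1k id -> label mapping from the 'huggingface/label-files' dataset repo
        label_file = hf_hub_download('huggingface/label-files', 'imagenet-1k-id2label.json', repo_type='dataset')
        with open(label_file) as f:
            id2label = {int(k): v for k, v in json.load(f).items()}

        # print the top-5 labels with their probabilities from the block above
        for prob, idx in zip(top5_probabilities[0].tolist(), top5_class_indices[0].tolist()):
            print(f'{id2label[idx]}: {prob:.2f}%')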
        

Image Embeddings
        from urllib.request import urlopen
        from PIL import Image
        import timm
        
        img = Image.open(urlopen(
            'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
        ))
        
        model = timm.create_model(
            'xcit_medium_24_p16_224.fb_dist_in1k',
            pretrained=True,
            num_classes=0,  # remove classifier nn.Linear
        )
        model = model.eval()
        
        # get model specific transforms (normalization, resize)
        data_config = timm.data.resolve_model_data_config(model)
        transforms = timm.data.create_transform(**data_config, is_training=False)
        
        output = model(transforms(img).unsqueeze(0))  # output is (batch_size, num_features) shaped tensor
        
        # or equivalently (without needing to set num_classes=0)
        
        output = model.forward_features(transforms(img).unsqueeze(0))
        # output is unpooled, a (1, 197, 512) shaped tensor
        
        output = model.forward_head(output, pre_logits=True)
        # output is a (1, num_features) shaped tensor
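
The pooled embedding can be used for downstream tasks such as retrieval or clustering. Below is a minimal sketch of comparing two embeddings with cosine similarity; it reuses the same image twice, so the score is 1.0 — substitute a second image in practice:

        import torch.nn.functional as F

        # pooled (1, num_features) embeddings, produced as in the block above
        emb_a = model.forward_head(model.forward_features(transforms(img).unsqueeze(0)), pre_logits=True)
        emb_b = model.forward_head(model.forward_features(transforms(img).unsqueeze(0)), pre_logits=True)

        similarity = F.cosine_similarity(emb_a, emb_b)
        print(similarity.item())  # 1.0 for identical inputs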
        

Citation

        @article{el2021xcit,
          title={XCiT: Cross-Covariance Image Transformers},
          author={El-Nouby, Alaaeldin and Touvron, Hugo and Caron, Mathilde and Bojanowski, Piotr and Douze, Matthijs and Joulin, Armand and Laptev, Ivan and Neverova, Natalia and Synnaeve, Gabriel and Verbeek, Jakob and others},
          journal={arXiv preprint arXiv:2106.09681},
          year={2021}
        }