模型:

timm/vit_large_patch14_clip_336.datacompxl_ft_inat21

英文

vit_large_patch14_clip_336.datacompxl_ft_inat21 模型卡片

这是对 iNaturalist 2021 竞赛数据( https://github.com/visipedia/inat_comp/tree/master/2021 )进行的一系列 timm 微调实验之一,用于更高容量的模型。

该数据集涵盖了10,000个物种,您可以通过分类小部件探索这些数据集和模型,并使用您后院的图片进行交互,但它比您在 iNaturalist 网站上找到的模型要小很多( https://www.inaturalist.org/blog/75633-a-new-computer-vision-model-v2-1-including-1-770-new-taxa )。

这些模型的训练没有使用额外的元数据(与竞赛情况相同),它们是直接微调的,目的是探索模型预训练数据的差异。

Model Top-1 Top-5 Paper
1238321 92.05 98.01 1239321
12310321 90.85 97.68 12311321
12312321 90.29 97.44 12313321

微调超参数

./distributed_train.sh 4 --data-dir /tfds/ --dataset tfds/i_naturalist2021 --amp -j 8 --model vit_large_patch14_clip_224 --img-size 336 --model-kwargs img_size=336  --val-split val --opt adamw --opt-eps 1e-6 --weight-decay .01 --lr 5e-5 -
-warmup-lr 0 --sched-on-updates --clip-grad 1.0 --pretrained -b 48 --num-classes 10000 --grad-accum-steps 8 --layer-decay 0.8
 ./distributed_train.sh 4 --data-dir /tfds/ --dataset tfds/i_naturalist2021 --amp -j 8 --model eva02_large_patch14_clip_336 --val-split val --opt adamw --opt-eps 1e-6 --weight-decay .01 --lr 5e-5 --warmup-lr 0 --sched-on-updates --clip-gra
d 1.0 --pretrained -b 40 --num-classes 10000 --grad-accum-steps 10 --layer-decay 0.8 --torchcompile

运行验证

python validate.py /tfds/ --dataset tfds/i_naturalist2021 --model hf-hub:timm/eva02_large_patch14_clip_336.merged2b_ft_inat21 --split val --amp

引用文献

@inproceedings{cherti2023reproducible,
  title={Reproducible scaling laws for contrastive language-image learning},
  author={Cherti, Mehdi and Beaumont, Romain and Wightman, Ross and Wortsman, Mitchell and Ilharco, Gabriel and Gordon, Cade and Schuhmann, Christoph and Schmidt, Ludwig and Jitsev, Jenia},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={2818--2829},
  year={2023}
}
@article{datacomp,
  title={DataComp: In search of the next generation of multimodal datasets},
  author={Samir Yitzhak Gadre, Gabriel Ilharco, Alex Fang, Jonathan Hayase, Georgios Smyrnis, Thao Nguyen, Ryan Marten, Mitchell Wortsman, Dhruba Ghosh, Jieyu Zhang, Eyal Orgad, Rahim Entezari, Giannis Daras, Sarah Pratt, Vivek Ramanujan, Yonatan Bitton, Kalyani Marathe, Stephen Mussmann, Richard Vencu, Mehdi Cherti, Ranjay Krishna, Pang Wei Koh, Olga Saukh, Alexander Ratner, Shuran Song, Hannaneh Hajishirzi, Ali Farhadi, Romain Beaumont, Sewoong Oh, Alex Dimakis, Jenia Jitsev, Yair Carmon, Vaishaal Shankar, Ludwig Schmidt},
  journal={arXiv preprint arXiv:2304.14108},
  year={2023}
}
@article{EVA02,
  title={EVA-02: A Visual Representation for Neon Genesis},
  author={Fang, Yuxin and Sun, Quan and Wang, Xinggang and Huang, Tiejun and Wang, Xinlong and Cao, Yue},
  journal={arXiv preprint arXiv:2303.11331},
  year={2023}
}