模型:

timm/vit_large_patch14_clip_336.laion2b_ft_in12k_in1k_inat21

英文

vit_large_patch14_clip_336.laion2b_ft_in12k_in1k_inat21模型卡片

这是一系列在iNaturalist 2021竞赛数据( https://github.com/visipedia/inat_comp/tree/master/2021 )上使用更高容量的模型进行timm微调实验的一部分。

此数据集涵盖了10,000个物种,通过使用来自您后院的图片来探索这些数据集和模型是很有趣的,但它们比您可以在iNaturalist网站上找到的模型要小得多( https://www.inaturalist.org/blog/75633-a-new-computer-vision-model-v2-1-including-1-770-new-taxa )。

在训练这些模型时没有使用额外的元数据(与竞赛情况相同),这是一个简单的微调实验,用于探索模型预训练数据的差异。

Model Top-1 Top-5 Paper
1238321 92.05 98.01 1239321
12310321 90.85 97.68 12311321
12312321 90.29 97.44 12313321

微调超参数

./distributed_train.sh 4 --data-dir /tfds/ --dataset tfds/i_naturalist2021 --amp -j 8 --model vit_large_patch14_clip_224 --img-size 336 --model-kwargs img_size=336  --val-split val --opt adamw --opt-eps 1e-6 --weight-decay .01 --lr 5e-5 -
-warmup-lr 0 --sched-on-updates --clip-grad 1.0 --pretrained -b 48 --num-classes 10000 --grad-accum-steps 8 --layer-decay 0.8
 ./distributed_train.sh 4 --data-dir /tfds/ --dataset tfds/i_naturalist2021 --amp -j 8 --model eva02_large_patch14_clip_336 --val-split val --opt adamw --opt-eps 1e-6 --weight-decay .01 --lr 5e-5 --warmup-lr 0 --sched-on-updates --clip-gra
d 1.0 --pretrained -b 40 --num-classes 10000 --grad-accum-steps 10 --layer-decay 0.8 --torchcompile

运行验证

python validate.py /tfds/ --dataset tfds/i_naturalist2021 --model hf-hub:timm/eva02_large_patch14_clip_336.merged2b_ft_inat21 --split val --amp

引用

@inproceedings{cherti2023reproducible,
  title={Reproducible scaling laws for contrastive language-image learning},
  author={Cherti, Mehdi and Beaumont, Romain and Wightman, Ross and Wortsman, Mitchell and Ilharco, Gabriel and Gordon, Cade and Schuhmann, Christoph and Schmidt, Ludwig and Jitsev, Jenia},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={2818--2829},
  year={2023}
}
@article{datacomp,
  title={DataComp: In search of the next generation of multimodal datasets},
  author={Samir Yitzhak Gadre, Gabriel Ilharco, Alex Fang, Jonathan Hayase, Georgios Smyrnis, Thao Nguyen, Ryan Marten, Mitchell Wortsman, Dhruba Ghosh, Jieyu Zhang, Eyal Orgad, Rahim Entezari, Giannis Daras, Sarah Pratt, Vivek Ramanujan, Yonatan Bitton, Kalyani Marathe, Stephen Mussmann, Richard Vencu, Mehdi Cherti, Ranjay Krishna, Pang Wei Koh, Olga Saukh, Alexander Ratner, Shuran Song, Hannaneh Hajishirzi, Ali Farhadi, Romain Beaumont, Sewoong Oh, Alex Dimakis, Jenia Jitsev, Yair Carmon, Vaishaal Shankar, Ludwig Schmidt},
  journal={arXiv preprint arXiv:2304.14108},
  year={2023}
}
@article{EVA02,
  title={EVA-02: A Visual Representation for Neon Genesis},
  author={Fang, Yuxin and Sun, Quan and Wang, Xinggang and Huang, Tiejun and Wang, Xinlong and Cao, Yue},
  journal={arXiv preprint arXiv:2303.11331},
  year={2023}
}