Model:
ai-forever/ruclip-vit-base-patch32-384
RuCLIP (Russian Contrastive Language–Image Pretraining) is a multimodal model for computing similarities between images and texts and for ranking captions and pictures. RuCLIP builds on a large body of work on zero-shot transfer, computer vision, natural language processing and multimodal learning.
The model was trained by the Sber AI and SberDevices teams.
pip install ruclip
import ruclip
clip, processor = ruclip.load("ruclip-vit-base-patch32-384", device="cuda")
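
To illustrate what "image and text similarities" and caption ranking mean in practice, here is a minimal sketch in plain Python. It does not call the `ruclip` API: the short vectors stand in for embeddings that the loaded model would produce, and the helper names (`cosine_similarity`, `softmax`, `rank_captions`) and the `scale=100.0` temperature are illustrative assumptions, not part of the library.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def rank_captions(image_vec, caption_vecs, captions, scale=100.0):
    """Rank captions by similarity to one image embedding.

    `scale` is an assumed temperature applied to the similarities
    before the softmax (hypothetical value for this sketch).
    """
    sims = [cosine_similarity(image_vec, v) for v in caption_vecs]
    probs = softmax([scale * s for s in sims])
    return sorted(zip(captions, probs), key=lambda pair: pair[1], reverse=True)

# Toy stand-ins for embeddings; real ones would come from the model.
image_vec = [0.9, 0.1, 0.0]
captions = ["кошка", "собака", "машина"]
caption_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

ranking = rank_captions(image_vec, caption_vecs, captions)
for caption, prob in ranking:
    print(f"{caption}: {prob:.4f}")
```

Zero-shot classification follows the same pattern: each class name is turned into a caption, and the class whose caption ranks first is taken as the prediction.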
We have evaluated the performance on the following datasets:
| Dataset | Metric Name | Metric Result |
|---|---|---|
| Food101 | acc | 0.642 |
| CIFAR10 | acc | 0.862 |
| CIFAR100 | acc | 0.529 |
| Birdsnap | acc | 0.161 |
| SUN397 | acc | 0.510 |
| Stanford Cars | acc | 0.572 |
| DTD | acc | 0.390 |
| MNIST | acc | 0.404 |
| STL10 | acc | 0.946 |
| PCam | acc | 0.506 |
| CLEVR | acc | 0.188 |
| Rendered SST2 | acc | 0.508 |
| ImageNet | acc | 0.451 |
| FGVC Aircraft | mean-per-class | 0.053 |
| Oxford Pets | mean-per-class | 0.587 |
| Caltech101 | mean-per-class | 0.834 |
| Flowers102 | mean-per-class | 0.449 |
| HatefulMemes | roc-auc | 0.537 |
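
The table uses three metric names: `acc`, `mean-per-class` and `roc-auc`. The sketch below shows how these are commonly defined, assuming the standard meanings (overall accuracy, accuracy averaged over classes, and area under the ROC curve for binary labels). The function names and toy labels are illustrative only; this is not the evaluation code used to produce the numbers above.

```python
from collections import defaultdict

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_per_class_accuracy(y_true, y_pred):
    """Accuracy computed per class, then averaged with equal class weight."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    return sum(correct[c] / total[c] for c in total) / len(total)

def roc_auc(y_true, scores):
    """Probability that a random positive scores above a random negative.

    Ties count as half a win. Labels are assumed to be 0 or 1.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Imbalanced toy labels: class 0 has four samples, class 1 has two.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 1, 0]
acc = accuracy(y_true, y_pred)
mpc = mean_per_class_accuracy(y_true, y_pred)

# Binary labels with continuous scores for ROC AUC.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(acc, mpc, auc)
```

On imbalanced data the two accuracy variants diverge, since mean-per-class accuracy weights every class equally regardless of how many samples it has.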