Dataset:
laion/laion_100m_vqgan_f8
This dataset contains VQGAN (f8, 8192) embeddings for the images from the first ~100 million image-text pairs of the LAION-400M dataset. VQGAN was introduced in the paper "Taming Transformers for High-Resolution Image Synthesis" and was adopted for training DALLE-mini.
Warning: This large-scale dataset is not curated. It was built for research purposes, so that the broad research community and other interested communities can test model training at a larger scale. It is not meant for any real-world production or application.
VQGAN (f8, 8192) is a pretrained model with downsampling factor f=8, 8192 codebook entries, and Gumbel quantization. We did not perform any fine-tuning and used the VQGAN wrapper from the DALLE-pytorch repository for inference. Since LAION-400M contains 256x256 images, the model produces a 32x32 grid of codes, that is, 1024 codes for each image.
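The code count follows directly from the image size and the downsampling factor: (256 / 8)^2 = 32 x 32 = 1024. The sketch below only illustrates that arithmetic. The function names `num_codes` and `to_grid` are hypothetical, and the row-major layout used when reshaping a flat code sequence is an assumption, not something stated for this dataset.

```python
from typing import List, Sequence


def num_codes(image_size: int = 256, downsampling_factor: int = 8) -> int:
    """Number of discrete codes for a square image of the given side length."""
    if image_size % downsampling_factor != 0:
        raise ValueError("image_size must be divisible by downsampling_factor")
    side = image_size // downsampling_factor  # side length of the latent grid
    return side * side


def to_grid(codes: Sequence[int], side: int) -> List[List[int]]:
    """Reshape a flat code sequence into `side` rows.

    ASSUMPTION: the codes are stored in row-major order.
    """
    if len(codes) != side * side:
        raise ValueError("expected side * side codes")
    return [list(codes[r * side:(r + 1) * side]) for r in range(side)]


if __name__ == "__main__":
    print(num_codes())  # codes per 256x256 image with f=8
```

Each code is an index into the 8192-entry codebook, so valid values would lie in the range 0 to 8191.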
The data is provided as *.parquet files with the embeddings and meta information:
The data corresponds to the shards 00000, 00001, ..., 09999 of LAION-400M. 0.07% of the shards were excluded because they were corrupted in the original dataset.
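As a minimal sketch, the zero-padded shard identifiers in that range can be enumerated as follows. The helper name `shard_names` is hypothetical. The estimate of excluded shards assumes the 0.07% figure applies to exactly these 10,000 shards. Which shards were excluded is not stated here, so the code does not try to guess them.

```python
from typing import List


def shard_names(first: int = 0, last: int = 9999, width: int = 5) -> List[str]:
    """Zero-padded shard identifiers, e.g. '00000' through '09999'."""
    return [f"{i:0{width}d}" for i in range(first, last + 1)]


names = shard_names()

# ASSUMPTION: 0.07% is measured against these 10,000 shards.
excluded_estimate = round(len(names) * 0.07 / 100)

if __name__ == "__main__":
    print(names[0], names[-1], len(names), excluded_estimate)
```

When iterating over the shards, it is safer to check that each file exists than to assume every identifier in the range is present.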
The LAION-400M dataset is distributed under the CC-BY 4.0 license. The VQGAN models are distributed under the MIT license.