
[2103.00020] Learning Transferable Visual Models From Natural Language Supervision
Feb 26, 2021 · By Alec Radford and 11 other authors. This paper explores pre-training models that learn state-of-the-art image representations from natural language captions paired with images.
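The core technique behind CLIP is a symmetric contrastive objective: in a batch of matched image-caption pairs, each image should score highest against its own caption and vice versa. A minimal sketch of that loss, using plain NumPy arrays in place of CLIP's image and text encoders (the function name and toy embeddings are illustrative, not from the paper's code):

```python
import numpy as np

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    image_emb, text_emb: (batch, dim) arrays; row i of each is a matched pair.
    """
    # L2-normalize so the dot product is a cosine similarity.
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)

    # (batch, batch) similarity matrix; diagonal entries are the true pairs.
    logits = image_emb @ text_emb.T / temperature

    def cross_entropy(l):
        l = l - l.max(axis=1, keepdims=True)           # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))            # targets: the diagonal

    # Average the image->text and text->image directions.
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))

# Perfectly aligned pairs should incur less loss than a mismatched batch.
rng = np.random.default_rng(0)
emb = rng.standard_normal((4, 8))
aligned = clip_contrastive_loss(emb, emb)
shuffled = clip_contrastive_loss(emb, emb[::-1])
```

The low temperature sharpens the softmax so the model is pushed to rank the true pair far above all in-batch negatives.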
LLM2CLIP: Powerful Language Model Unlocks Richer Cross-Modality ...
Nov 7, 2024 · Inspired by the rapid progress of large language models (LLMs), we investigate how the superior linguistic understanding and broad world knowledge of LLMs can further strengthen CLIP, …
Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
Dec 6, 2023 · To fulfill the requirements, we introduce Alpha-CLIP, an enhanced version of CLIP with an auxiliary alpha channel that indicates attentive regions, fine-tuned on millions of constructed RGBA …
[2309.16671] Demystifying CLIP Data - arXiv.org
Sep 28, 2023 · By Hu Xu and 9 other authors.
Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese
Nov 2, 2022 · In this work, we construct a large-scale dataset of image-text pairs in Chinese, where most data are retrieved from publicly available datasets, and we pretrain Chinese CLIP models on …
Long-CLIP: Unlocking the Long-Text Capability of CLIP
Mar 22, 2024 · Contrastive Language-Image Pre-training (CLIP) has been the cornerstone for zero-shot classification, text-image retrieval, and text-image generation by aligning image and text modalities. …
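The zero-shot classification mentioned in the snippet above works by embedding each class name as a text prompt and picking the class whose text embedding is most similar to the image embedding. A minimal sketch with toy vectors standing in for CLIP's encoders (the function and the one-hot "embeddings" are illustrative assumptions):

```python
import numpy as np

def zero_shot_classify(image_emb, class_text_embs):
    """Return the index of the class prompt most similar to the image.

    image_emb: (dim,) image embedding; class_text_embs: (n_classes, dim),
    one text embedding per candidate class.
    """
    img = image_emb / np.linalg.norm(image_emb)
    txt = class_text_embs / np.linalg.norm(class_text_embs, axis=1, keepdims=True)
    sims = txt @ img              # cosine similarity to each class prompt
    return int(np.argmax(sims))

# Toy example: the image embedding points mostly along class 1's direction.
classes = np.eye(3)
image = np.array([0.1, 0.9, 0.2])
pred = zero_shot_classify(image, classes)  # → 1
```

No per-dataset training is needed: swapping in a new list of class prompts changes the classifier, which is what makes the approach "zero-shot".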
[2303.12417] CLIP$^2$: Contrastive Language-Image-Point Pretraining ...
Mar 22, 2023 · To take a step toward open-world 3D vision understanding, we propose Contrastive Language-Image-Point Cloud Pretraining (CLIP$^2$) to directly learn the transferable 3D point cloud …
Exploring CLIP for Assessing the Look and Feel of Images
Jul 25, 2022 · In this paper, we go beyond the conventional paradigms by exploring the rich visual language prior encapsulated in Contrastive Language-Image Pre-training (CLIP) models for …
Hierarchical Text-Conditional Image Generation with CLIP Latents
Apr 13, 2022 · By Aditya Ramesh and 4 other authors.