  1. [2103.00020] Learning Transferable Visual Models From Natural …

    Feb 26, 2021 · View a PDF of the paper titled Learning Transferable Visual Models From Natural Language Supervision, by Alec Radford and 11 other authors

  2. arXiv.org e-Print archive

    This paper explores pre-training models for learning state-of-the-art image representations using natural language captions paired with images.

  3. LLM2CLIP: Powerful Language Model Unlocks Richer Cross-Modality ...

    Nov 7, 2024 · Inspired by the rapid progress of large language models (LLMs), we investigate how the superior linguistic understanding and broad world knowledge of LLMs can further strengthen CLIP, …

  4. Alpha-CLIP: A CLIP Model Focusing on Wherever You Want

    Dec 6, 2023 · To fulfill the requirements, we introduce Alpha-CLIP, an enhanced version of CLIP with an auxiliary alpha channel to suggest attentive regions, fine-tuned on millions of constructed RGBA …

  5. [2309.16671] Demystifying CLIP Data - arXiv.org

    Sep 28, 2023 · View a PDF of the paper titled Demystifying CLIP Data, by Hu Xu and 9 other authors

  6. Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese

    Nov 2, 2022 · In this work, we construct a large-scale dataset of image-text pairs in Chinese, where most data are retrieved from publicly available datasets, and we pretrain Chinese CLIP models on …

  7. Long-CLIP: Unlocking the Long-Text Capability of CLIP

    Mar 22, 2024 · Contrastive Language-Image Pre-training (CLIP) has been the cornerstone for zero-shot classification, text-image retrieval, and text-image generation by aligning image and text modalities. …

  8. [2303.12417] CLIP²: Contrastive Language-Image-Point Pretraining ...

    Mar 22, 2023 · To take a step toward open-world 3D vision understanding, we propose Contrastive Language-Image-Point Cloud Pretraining (CLIP²) to directly learn the transferable 3D point cloud …

  9. Exploring CLIP for Assessing the Look and Feel of Images

    Jul 25, 2022 · In this paper, we go beyond the conventional paradigms by exploring the rich visual language prior encapsulated in Contrastive Language-Image Pre-training (CLIP) models for …

  10. Hierarchical Text-Conditional Image Generation with CLIP Latents

    Apr 13, 2022 · View a PDF of the paper titled Hierarchical Text-Conditional Image Generation with CLIP Latents, by Aditya Ramesh and 4 other authors