APE: Aligning Pretrained Encoders to Quickly Learn Aligned Multimodal Representations

This paper was accepted at the workshop “Has It Trained Yet?” at NeurIPS.
Recent advances in learning aligned multimodal representations have been primarily driven by training large neural networks on massive, noisy paired-modality datasets. In this work, we ask whether it is possible to achieve similar results with substantially less training time and data. We achieve this by taking advantage of existing pretrained unimodal encoders and careful curation of alignment data relevant to the downstream task of interest. We study a natural approach to aligning existing encoders via small auxiliary…

Apple Machine Learning Research
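The abstract is cut off before it names the auxiliary component, so the following is only an illustrative sketch, not the paper's method. It assumes one common way to align two frozen unimodal encoders: pass their precomputed embeddings through small linear maps and score the result with a symmetric contrastive (InfoNCE-style) loss over paired examples. Every name here (`project`, `symmetric_contrastive_loss`, the `temperature` default, the toy embeddings) is hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def project(W, v):
    """Apply a small linear map W (a list of rows) to a frozen-encoder embedding v.

    Assumption: in this sketch only W would be trained; the encoders stay fixed.
    """
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def symmetric_contrastive_loss(emb_a, emb_b, temperature=0.07):
    """Average of the A-to-B and B-to-A cross-entropy over a batch of pairs.

    emb_a[i] and emb_b[i] are assumed to describe the same underlying example.
    """
    a = [l2_normalize(v) for v in emb_a]
    b = [l2_normalize(v) for v in emb_b]
    n = len(a)
    # logits[i][j] = scaled cosine similarity between item i of A and item j of B
    logits = [
        [sum(x * y for x, y in zip(a[i], b[j])) / temperature for j in range(n)]
        for i in range(n)
    ]

    def mean_cross_entropy(matrix):
        # The correct "class" for row i is column i (the matching pair).
        total = 0.0
        for i, row in enumerate(matrix):
            peak = max(row)  # subtract the max for a stable log-sum-exp
            log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in row))
            total += log_sum_exp - row[i]
        return total / len(matrix)

    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (mean_cross_entropy(logits) + mean_cross_entropy(transposed))


# Toy usage: identity maps stand in for the (hypothetical) trainable heads.
identity = [[1.0, 0.0], [0.0, 1.0]]
image_embs = [project(identity, v) for v in [[1.0, 0.0], [0.0, 1.0]]]
text_embs = [project(identity, v) for v in [[1.0, 0.0], [0.0, 1.0]]]
swapped_text_embs = list(reversed(text_embs))

loss_aligned = symmetric_contrastive_loss(image_embs, text_embs)
loss_swapped = symmetric_contrastive_loss(image_embs, swapped_text_embs)
```

In this toy setup the loss is near zero when each pair already matches and large when the pairs are swapped, which is the signal a training loop would use to update the small maps while leaving the encoders untouched.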

