Vision-language models that can handle multi-image inputs

January 19, 2024

Amazon AWS

Attention-based representation of multi-image inputs improves performance on downstream vision-language tasks.
