Knowledge distillation method for better vision-language models
February 22, 2024, by Amazon AWS

The method preserves knowledge encoded in the teacher model's attention heads even when the student model has fewer of them.
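The post itself does not spell out the mechanism, so below is only a minimal, hypothetical sketch of one common way to distill attention maps across mismatched head counts: group the teacher's heads, average each group into a target map, and penalize the student's divergence from it. The function name `attention_distill_loss`, the head counts, the even-grouping assumption, and the KL objective are all illustrative assumptions, not the method described in the post.

```python
import torch
import torch.nn.functional as F

def attention_distill_loss(student_attn: torch.Tensor,
                           teacher_attn: torch.Tensor) -> torch.Tensor:
    """Distillation loss between student and grouped teacher attention maps.

    student_attn: (batch, Hs, seq, seq) row-normalized attention probabilities
    teacher_attn: (batch, Ht, seq, seq) row-normalized attention probabilities,
                  where Ht is a multiple of Hs (an assumption of this sketch)
    """
    b, hs, s, _ = student_attn.shape
    ht = teacher_attn.shape[1]
    assert ht % hs == 0, "sketch assumes teacher heads divide evenly"
    # Average each group of Ht/Hs teacher heads into one target map; the mean
    # of row-stochastic maps is still row-stochastic, so no teacher head's
    # attention pattern is simply discarded.
    target = teacher_attn.view(b, hs, ht // hs, s, s).mean(dim=2)
    # KL(grouped teacher || student), with the teacher treated as a fixed
    # target (no gradients flow into the frozen teacher).
    return F.kl_div(student_attn.clamp_min(1e-9).log(),
                    target.detach(), reduction="batchmean")

# Example: 12 teacher heads distilled into 6 student heads.
teacher = torch.softmax(torch.randn(2, 12, 16, 16), dim=-1)
student = torch.softmax(torch.randn(2, 6, 16, 16), dim=-1)
loss = attention_distill_loss(student, teacher)
```

Averaging within groups is only one option; learned head-to-head mappings serve the same goal of keeping every teacher head's information in play despite the smaller student.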