Research/Multimodal
-
[Flamingo] Tackling multiple tasks with a single visual language model
Research/Multimodal 2024. 8. 15. 10:54
Google DeepMind https://deepmind.google/discover/blog/tackling-multiple-tasks-with-a-single-visual-language-model/
One key aspect of intelligence is the ability to quickly learn how to perform a new task when given a brief instruction. For instance, a child may recognise real animals at the zoo after seeing a few pictures of the animals in a book, despite differences between the two. But for a ty..
-
Contrastive Learning
Research/Multimodal 2024. 8. 14. 14:03
https://neurips.cc/media/neurips-2021/Slides/21895.pdf
https://sthalles.github.io/simple-self-supervised-learning/
https://sanghyu.tistory.com/184
What is Self-Supervised Learning?
Self-Supervised Learning (SSL) is a special type of representation learning that enables learning good data representations from unlabelled datasets. It is motivated by the idea of constructing supervised learning tasks out..
-
[CLIP] Connecting text and images
Research/Multimodal 2024. 8. 13. 22:01
https://openai.com/index/clip/
We’re introducing a neural network called CLIP which efficiently learns visual concepts from natural language supervision. CLIP can be applied to any visual classification benchmark by simply providing the names of the visual categories to be recognized, similar to the “zero-shot” capabilities of GPT-2 and GPT-3. Although deep learning has revolutionized computer vi..