Comparing CLIP vs. LLaVA on Zero-Shot Classification by Misaki Matsuura
This study by Misaki Matsuura explores the effectiveness of CLIP (Contrastive Language-Image Pre-training) and LLaVA (Large Language and Vision Assistant) on zero-shot classification. CLIP, with 63 million parameters, is trained on image-text pairs from the internet and retrieves the textual label that best matches an image. On the other h
6 slides