
>Ferret: A Multimodal Large Language Model

What I thought when reading the title: a new base model trained from the ground up on multimodal input, on hundreds to thousands of GPUs.

The reality: a finetune of Vicuna (itself a finetune of Llama 13B), trained on 8xA100. It also reuses parts of LLaVA, an existing multimodal project built on Vicuna. It's not really as exciting as one might think from the title, in my opinion.



This seems like a good but small research project by a research team at Apple, far from what product teams are working on for the next generation of Apple products.


The innovation is the modification of the neural-network architecture to incorporate a spatial-aware visual sampler; the data and the reused models are not the interesting part.
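For intuition, the idea behind a spatial-aware visual sampler can be sketched roughly as follows. This is a toy, stdlib-only illustration, not Ferret's actual implementation: the function name, the grid/mask representation, the random point sampling, and the average pooling are all my assumptions for the sake of a runnable example.

```python
import random

def sample_region_feature(feature_grid, region_mask, num_samples=32, seed=0):
    """Toy spatial-aware visual sampler (illustrative only): given a 2D
    grid of feature vectors and a binary mask marking a free-form region,
    randomly sample points inside the region and average-pool their
    features into a single region embedding."""
    rng = random.Random(seed)
    # Collect the grid coordinates covered by the region.
    coords = [(y, x)
              for y, row in enumerate(region_mask)
              for x, inside in enumerate(row) if inside]
    if not coords:
        raise ValueError("empty region")
    # Sample with replacement, so regions of any shape or size
    # always yield a fixed number of feature vectors.
    picks = [rng.choice(coords) for _ in range(num_samples)]
    dim = len(feature_grid[0][0])
    pooled = [0.0] * dim
    for y, x in picks:
        for d in range(dim):
            pooled[d] += feature_grid[y][x][d]
    return [v / num_samples for v in pooled]

# Usage: a 4x4 grid of 2-d features, region = top-left 2x2 block.
grid = [[[float(y), float(x)] for x in range(4)] for y in range(4)]
mask = [[y < 2 and x < 2 for x in range(4)] for y in range(4)]
emb = sample_region_feature(grid, mask)
```

The point of the sketch is that the sampler turns an arbitrary region (point, box, or free-form scribble) into a fixed-size embedding that the language model can consume alongside text tokens.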


Thanks for the summary.



