Overview
LLaVA (Large Language and Vision Assistant) is a multimodal model that accepts both text and images as input and generates text in response. Typical tasks include describing an image and answering questions about what it shows.
Role in this knowledge base
LLaVA is used as the vision model within the application to process and describe uploaded images.
Key facts
- LLaVA is invoked when an image is part of the user’s prompt, to identify and describe what the image contains.
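
The routing rule in the key fact above can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration, not the application's actual code: the `UserPrompt` class, the `select_model` and `build_request` helpers, the model identifiers, and the payload field names are all hypothetical.

```python
import base64
from dataclasses import dataclass, field


@dataclass
class UserPrompt:
    """Hypothetical container for a user's message and any attached images."""
    text: str
    images: list[bytes] = field(default_factory=list)


# Hypothetical model identifiers; real names depend on the deployment.
VISION_MODEL = "llava"
TEXT_MODEL = "text-only-model"


def select_model(prompt: UserPrompt) -> str:
    """Route to the vision model only when the prompt carries an image."""
    return VISION_MODEL if prompt.images else TEXT_MODEL


def build_request(prompt: UserPrompt) -> dict:
    """Assemble an illustrative request payload (field names are assumed)."""
    request = {"model": select_model(prompt), "prompt": prompt.text}
    if prompt.images:
        # Base64 is a common way to embed binary image data in a JSON body.
        request["images"] = [
            base64.b64encode(image).decode("ascii") for image in prompt.images
        ]
    return request
```

In this sketch, a prompt with no attached images never reaches LLaVA, which matches the behavior described in the key fact. How the application actually sends the request to the model is not specified in this note.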