Meta: Llama 3.2 90B Vision Instruct

by meta-llama

Context: 33K tokens
Modalities: Text, Image → Text
Max Output: 16,384 tokens
Input Price: $0.35 / million tokens
Output Price: $0.40 / million tokens

Overview

Llama 3.2 90B Vision Instruct is a 90-billion-parameter multimodal model built for demanding visual reasoning and language tasks. It delivers strong accuracy on image captioning, visual question answering, and combined image-text comprehension. Pre-trained on large multimodal datasets and fine-tuned with human feedback, it is suited to industries that need advanced multimodal AI, particularly for complex, real-time visual and textual analysis. See the [original model card](https://github.com/meta-llama/llama-models/blob/main/models/llama3_2/MODEL_CARD_VISION.md) for details. Usage of this model is subject to [Meta's Acceptable Use Policy](https://www.llama.com/llama3/use-policy/).

Key Features

  • Multimodal capabilities (Text, Image → Text)
  • 33K-token context window
  • Up to 16,384 output tokens
  • API access available (see the request sketch below)
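
A minimal request sketch for the API access noted above, assuming an OpenAI-compatible chat completions endpoint; the base URL, model identifier, API key variable, and image URL are placeholders, not details from this page.

```python
import os
import requests

# Hypothetical endpoint and model identifier -- adjust to your provider.
BASE_URL = "https://api.example.com/v1"
MODEL_ID = "meta-llama/llama-3.2-90b-vision-instruct"

payload = {
    "model": MODEL_ID,
    "max_tokens": 512,  # must stay within the model's 16,384-token output limit
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in one sentence."},
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }
    ],
}

resp = requests.post(
    f"{BASE_URL}/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['API_KEY']}"},
    json=payload,
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```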

Model Information

Developer: meta-llama

Release Date: September 25, 2024

Context Window: 33K tokens

Modalities: Text, Image → Text

Pricing

Input Tokens: $0.35 / million tokens
Output Tokens: $0.40 / million tokens
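
As a worked example of how these rates translate into per-request cost (the token counts below are illustrative, not from this page):

```python
# Illustrative cost calculation at the listed rates.
INPUT_RATE = 0.35 / 1_000_000   # dollars per input token
OUTPUT_RATE = 0.40 / 1_000_000  # dollars per output token

input_tokens, output_tokens = 10_000, 1_000  # hypothetical request size
cost = input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE
print(f"${cost:.4f}")  # -> $0.0039
```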
