Multimodal Support
This document introduces the current multimodal support in the xLLM inference engine, including supported models, modality types, and offline and online interfaces.
Supported Models
Section titled “Supported Models”- Qwen2.5-VL: including 7B/32B/72B.
- Qwen3-VL: including 2B/4B/8B/32B.
- Qwen3-VL-MoE: including A3B/A22B.
- MiniCPM-V-2_6: 7B.
Modality Types
Section titled “Modality Types”- Images: supports single-image and multi-image inputs, image + prompt combinations, and text-only prompts.