Skip to content
EN

Multimodal Support

This document introduces the current multimodal support in the xLLM inference engine, including supported models, modality types, and offline and online interfaces.

  • Qwen2.5-VL: including 7B/32B/72B.
  • Qwen3-VL: including 2B/4B/8B/32B.
  • Qwen3-VL-MoE: including A3B/A22B.
  • MiniCPM-V-2_6: 7B.
  • Images: supports single-image and multi-image inputs, image + prompt combinations, and text-only prompts.