[Feature]: About video input for qwen3vl

### 🚀 The feature, motivation and pitch

I tried using base64 encoding to provide video input for vllm inference, but it seems this input method is not yet supported by Qwen3VL (I've seen similar issues reported elsewhere). Currently, I can only specify parameters like fps/maximum frames and then pass the local path or URL of the video.

However, in my scenario, my videos are not uniformly sampled; I need to manually sample them first and then input multiple frames. Is there a way to achieve this input method now?

### Alternatives

_No response_

### Additional context

_No response_

### Before submitting a new issue...

- [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the [documentation page](https://docs.vllm.ai/en/latest/), which can answer lots of frequently asked questions.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

[Feature]: About video input for qwen3vl #30129

🚀 The feature, motivation and pitch

Alternatives

Additional context

Before submitting a new issue...

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

[Feature]: About video input for qwen3vl #30129

Description

🚀 The feature, motivation and pitch

Alternatives

Additional context

Before submitting a new issue...

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions