Preview multimodal video generation
Official resolutions, durations, inputs and audio support in one table.
| Brand | Google Gemini via KIE |
| Type | Video generation (Image / Text -> Video) |
| Max resolution | 720p |
| Max duration | 4s |
| Native audio | Not supported |
| Inputs supported | image · text · video |
| Aspect ratios | 16:9 · 9:16 |
| Editorial rating | 4.0 / 5.0 |
| Credit range | 90 – 120 credits / job (120 when a video reference is included) |
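The limits in the table can be checked before a job is submitted. The following is a minimal sketch, assuming a hypothetical `ClipRequest` shape; the field names and the `validate` helper are illustrative and are not part of any KIE or Gemini API.

```python
from dataclasses import dataclass

# Limits copied from the spec table; the constant names are illustrative.
MAX_DURATION_S = 4
MAX_HEIGHT_PX = 720
ASPECT_RATIOS = {"16:9", "9:16"}
INPUT_TYPES = {"text", "image", "video"}


@dataclass
class ClipRequest:
    """Hypothetical request shape, not a real provider schema."""
    prompt: str
    aspect_ratio: str = "16:9"
    duration_s: int = 4
    height_px: int = 720
    input_types: tuple = ("text",)
    audio: bool = False


def validate(req: ClipRequest) -> list[str]:
    """Return a list of problems; an empty list means the request fits the table."""
    problems = []
    if req.aspect_ratio not in ASPECT_RATIOS:
        problems.append(f"unsupported aspect ratio: {req.aspect_ratio}")
    if not 0 < req.duration_s <= MAX_DURATION_S:
        problems.append(f"duration must be 1-{MAX_DURATION_S}s, got {req.duration_s}s")
    if req.height_px > MAX_HEIGHT_PX:
        problems.append(f"resolution above {MAX_HEIGHT_PX}p: {req.height_px}p")
    unknown = set(req.input_types) - INPUT_TYPES
    if unknown:
        problems.append(f"unsupported input types: {sorted(unknown)}")
    if req.audio:
        problems.append("native audio is not supported")
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller show every mismatch at once.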
Gemini Omni is a preview multimodal video model available through KIE. It can combine a written prompt with images and a short video reference, which makes it useful for testing scene direction, reference-driven motion, and early multimodal video workflows. Because provider availability is still uneven, use it as an experimental model: credits are refunded automatically if the provider task fails, and you can compare the same prompt with Kling, Seedance, Wan, or Veo from the same workspace.
Blend prompt, image references, and one source video to test direction before committing to a heavier final model
Useful when the brief depends on visual references rather than a prompt alone
Pass a short motion reference and ask Gemini Omni to reinterpret the scene
Run it beside Kling, Seedance, Wan, or Veo to learn when the preview provider is worth using
Copy these prompts to use directly, or tweak them to fit your needs.
Image reference plus camera motion: combine the reference lighting with a slow cinematic orbit, subtle fabric motion, clean realistic details
Video-guided product reveal: reinterpret the source video motion as a premium product reveal, smooth push-in, reflective highlights
Prompt-first cinematic scene: futuristic workspace, character walks through soft volumetric light, grounded realistic motion
Describe the scene, camera move, and output intent in one concise prompt. Gemini Omni works best when each reference has a clear role.
Use one hero image first, then add supporting images or a short source video only when they clarify the brief.
If the provider queue fails or the result is inconsistent, your credits are returned and you can rerun the same idea with Kling, Seedance, Wan, or Veo.
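The rerun step can be sketched as a simple fallback loop. This is an illustration under stated assumptions: `submit` is a hypothetical callable supplied by the caller, the model identifiers are placeholders, and the fallback order is arbitrary, since the page only names the alternatives without ranking them.

```python
from typing import Callable, Optional, Sequence, Tuple

# Placeholder identifiers; the order is an assumption, not a recommendation.
FALLBACK_MODELS = ("gemini-omni", "kling", "seedance", "wan", "veo")


def generate_with_fallback(
    prompt: str,
    submit: Callable[[str, str], Optional[str]],
    models: Sequence[str] = FALLBACK_MODELS,
) -> Optional[Tuple[str, str]]:
    """Try each model in order and return (model, result) for the first success.

    `submit(model, prompt)` is hypothetical: it is assumed to return a result
    string, return None for an unusable result, or raise RuntimeError when the
    provider task fails.
    """
    for model in models:
        try:
            result = submit(model, prompt)
        except RuntimeError:
            # Failed provider task: move on to the next model.
            continue
        if result is not None:
            return model, result
    return None
```

Keeping the prompt unchanged across models makes the outputs directly comparable, which matches the side-by-side comparison the page suggests.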
Use phrases like "use image 1 for subject identity" or "use the source video only for camera motion."
Example prompt: use the first image for subject identity, source video only for camera movement
Treat Gemini Omni as a preview model. Run a short exploratory clip first, then decide whether to iterate or switch models.
Do not mix references with opposite lighting, camera language, or subject identity unless the prompt explains which one should win.
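One way to keep reference roles explicit is to assemble the prompt from a scene description plus one role clause per reference. The helper below is a hypothetical sketch; `build_prompt` and its arguments are illustrative names, and the wording of the clauses simply follows the phrases suggested in the tips.

```python
def build_prompt(scene: str, roles: dict[str, str]) -> str:
    """Append one explicit role clause per reference to the scene description."""
    clauses = [f"use {ref} for {role}" for ref, role in roles.items()]
    return ", ".join([scene.strip(), *clauses])


prompt = build_prompt(
    "premium product reveal, smooth push-in",
    {"image 1": "subject identity", "the source video": "camera motion only"},
)
print(prompt)
# premium product reveal, smooth push-in, use image 1 for subject identity, use the source video for camera motion only
```

Because each reference gets exactly one clause, conflicting references are easier to spot before the job is submitted.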
Gemini Omni is configured at 90 credits for a 4-second 720p preview clip, or 120 credits when a video reference is included. Failed provider tasks are refunded automatically.
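As a worked example of the pricing above, the net cost of a batch can be estimated by charging only the jobs that succeed. This sketch assumes refunds are for the full job cost, which is how the refund statement reads; the function names are illustrative.

```python
BASE_CREDITS = 90              # 4-second 720p preview clip
VIDEO_REFERENCE_CREDITS = 120  # same clip with a video reference included


def job_credits(has_video_reference: bool) -> int:
    """Credits charged for a single job."""
    return VIDEO_REFERENCE_CREDITS if has_video_reference else BASE_CREDITS


def net_credits(jobs: list[tuple[bool, bool]]) -> int:
    """Net credits for a batch of (has_video_reference, succeeded) jobs.

    Failed jobs are assumed to be refunded in full, so they cost nothing.
    """
    return sum(job_credits(has_video) for has_video, succeeded in jobs if succeeded)


# One text/image job, one video-reference job, one failed video-reference job.
print(net_credits([(False, True), (True, True), (True, False)]))  # 210
```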
Veo 3.1 is the safer production pick for polished Google video output. Gemini Omni is better for previewing multimodal reference workflows.
Kling 3 is stronger for complex physical interaction. Gemini Omni is more experimental when the brief depends on mixed reference types.
Seedance 2 is more reliable for motion and choreography. Gemini Omni is useful when you want to test a prompt plus image plus video reference in one request.
Gemini Omni is best for multimodal concept tests. Its strongest advantage is support for a prompt plus multiple image references in the same generation request; its key limitation is that provider success rate and queue stability are currently less predictable than those of mature production models. To lower iteration cost, validate prompts on free models first, then switch to this model for final renders.
Jump straight into the generator workspace, upload one photo, and test your first AI video now.