Gemini Omni is a Google DeepMind AI model focused on video creation and editing. It lets users change videos step by step by describing what they want in plain language. Its product page describes it as "create anything from any input — starting with video," meaning it can combine different types of reference input, such as images, text, video, or audio, into one cohesive result. It is designed for creative video work in which the scene must stay consistent as successive edits build on one another.
Key Features
Natural conversation editing: Users can edit videos by simply explaining changes in plain language.
Multi-step consistency: Each edit can build on the previous one while keeping the scene coherent.
Reference-based creation: It can use images, text, video, or audio as references to create a single, cohesive output.
Real-world understanding: It uses Gemini’s knowledge of physics, history, science, and cultural context to support more meaningful video storytelling.
Use Cases
Video editing for creators: Change actions, styles, environments, camera angles, objects, or characters in a video using simple prompts.
Creative storytelling: Build scenes that follow real-world logic, such as gravity, motion, and fluid dynamics, or that draw on scientific and historical ideas.
Reference-guided video generation: Use a sketch, image, audio clip, or existing video to guide the look, motion, sound, or style of a new video.