Google on Tuesday introduced Gemini Omni, a new multimodal AI model that combines the company’s Gemini AI models with its media-generation tools, including Veo, Nano Banana, and Genie.
The announcement came during Google I/O 2026, where DeepMind CEO Demis Hassabis described Gemini Omni as “our new model that can create anything from any input.”
“It combines Gemini’s intelligence with the best of our generative media models for a new level of world understanding, multimodality, and editing,” Hassabis said.
We’re dropping Gemini Omni: our first step towards a model that can create anything from anything - starting with video.
Calling Omni a “step towards artificial general intelligence,” Hassabis said Google has spent the past year extending Gemini into “a world model AI that can understand and simulate the world.”
Google says Omni can keep the same characters, backgrounds, and movement consistent even after users make changes to a video—something many AI video models struggle with. The company also says Omni uses Gemini’s reasoning abilities to understand broader instructions, so users can describe the kind of scene they want without manually explaining every detail.
Additional updates include Flow Tools, which allows users to create custom editing workflows using natural-language prompts without coding experience.
Hassabis said Google is starting with video generation but plans to expand access to Omni, which he described as the long-term vision behind Gemini’s multimodal design.
“This was always our goal with Gemini, and why we built it to be multimodal from the very start,” he said.
Google did not immediately respond to Decrypt’s request for comment.