Google introduces ‘VideoPoet’, an AI model that makes videos from text, images and audio

VideoPoet integrates various tasks such as text-to-video, image-to-video, video inpainting and outpainting, video stylisation, and video-to-audio generation, all within a single LLM.

New Delhi, UPDATED: Jan 2, 2024 17:04 IST

Highlights

  • Google introduces ‘VideoPoet,’ a multimodal LLM that produces videos
  • VideoPoet integrates multiple video generation capabilities into a unified language model
  • Researchers believe VideoPoet shows promise for 'any-to-any' generation in the future

Google has unveiled ‘VideoPoet’, a large language model (LLM) built for video generation. The multimodal model can take text, images, video and audio as input and generate video from them.

Revolutionary decoder-only architecture

Google's scientists have built VideoPoet on a 'decoder-only architecture', which allows it to generate content for tasks it hasn't been explicitly trained on. Training involves two steps: pretraining and task-specific adaptation. The pretrained model serves as a general framework that can then be customised for various video generation tasks.

Unified approach

Unlike most existing video generators, which are built on diffusion models, VideoPoet integrates multiple video generation capabilities into a unified language model. It handles tasks such as text-to-video, image-to-video, video inpainting and outpainting, video stylisation, and video-to-audio generation, all within a single LLM.

Key to VideoPoet's success

VideoPoet is an autoregressive model: it generates output one piece at a time, with each new piece conditioned on what it has already produced. Trained on video, audio, image and text data, VideoPoet relies on tokenisation, a process central to natural language processing that converts an input into a sequence of smaller discrete units (tokens) that a language model can process.
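
To illustrate what "autoregressive" means, here is a minimal Python sketch. It is not Google's code: the `generate` helper and the `toy_next_token` rule are hypothetical stand-ins for a trained model, and the tokens are plain integers.

```python
from typing import Callable, List


def generate(next_token: Callable[[List[int]], int],
             prompt: List[int],
             steps: int) -> List[int]:
    """Autoregressive loop: each new token is chosen from all tokens so far."""
    tokens = list(prompt)
    for _ in range(steps):
        # The model only ever sees what has already been produced.
        tokens.append(next_token(tokens))
    return tokens


def toy_next_token(tokens: List[int]) -> int:
    # Hypothetical rule standing in for a trained model's prediction:
    # the sum of the last two tokens, modulo 10.
    if len(tokens) < 2:
        return 0
    return (tokens[-1] + tokens[-2]) % 10


result = generate(toy_next_token, prompt=[1, 2], steps=4)
print(result)
```

In a real system the toy rule would be replaced by a neural network that predicts the next token, and the integers would be tokens produced from video, audio, image or text inputs.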

Unlocking creative possibilities

Researchers believe VideoPoet shows promise for 'any-to-any' generation in the future. It can even assemble a short film by combining multiple video clips. The model is not currently suited to longer videos, but Google suggests this limitation can be overcome by conditioning on the last second of a video to predict the next second, and repeating the process.
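
The idea of extending a clip one second at a time can be sketched roughly as follows. This is an illustration under stated assumptions, not Google's implementation: frames are represented as integers, the frame rate `FPS` is an arbitrary value chosen for the example, and `toy_predictor` is a hypothetical stand-in for the model.

```python
from typing import Callable, List

FPS = 4  # assumed frames per second, chosen only for this example


def extend_video(frames: List[int],
                 predict_next_second: Callable[[List[int]], List[int]],
                 extra_seconds: int,
                 fps: int = FPS) -> List[int]:
    """Lengthen a clip by repeatedly predicting one more second of frames."""
    video = list(frames)
    for _ in range(extra_seconds):
        context = video[-fps:]  # only the last second is used as the condition
        video.extend(predict_next_second(context))
    return video


def toy_predictor(context: List[int]) -> List[int]:
    # Hypothetical stand-in for the model: continue counting from the last frame.
    last = context[-1]
    return [last + i + 1 for i in range(len(context))]


clip = [0, 1, 2, 3]  # one second of "frames"
longer = extend_video(clip, toy_predictor, extra_seconds=2)
print(len(longer))
```

Because each step looks only at the most recent second, the loop can in principle run for as long as needed, which is why this approach is proposed as a route to longer videos.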

Innovative applications

VideoPoet's capabilities extend to altering the movement of objects in existing videos, as exemplified by a quirky scenario where the Mona Lisa yawns. This demonstrates the model's creative prowess in reshaping visual content.

Google's VideoPoet is not just a leap in video generation technology; it's a glimpse into the future of multimedia content creation.

Published on: Jan 2, 2024 17:01 IST
Posted by: Samira Siddiqui