A model for generating videos from text.
Phenaki is an AI model that generates videos from text. It can produce videos several minutes long from text alone, using a sequence of prompts that changes over time for longer videos. It can also generate a video from a single still image paired with a text prompt.
A distinctive component of Phenaki is its video encoder-decoder, which has been shown to outperform the per-frame baselines commonly used in the field. The improvement shows up in two ways: better spatio-temporal quality in the generated videos, and fewer tokens per video, because the encoder compresses across time as well as space instead of tokenizing each frame independently.
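To make the token-count claim concrete, the sketch below compares how many tokens a per-frame tokenizer and a spatio-temporal tokenizer would need for the same clip. It assumes a layout in which the first frame is patched on its own and the remaining frames are grouped into temporal patches. The function names, patch sizes, and clip dimensions are hypothetical illustrations, not Phenaki's published configuration, and the sketch only counts tokens; it says nothing about reconstruction quality.

```python
def per_frame_tokens(num_frames: int, height: int, width: int, patch: int) -> int:
    """Tokens needed when every frame is tokenized independently (per-frame baseline)."""
    return num_frames * (height // patch) * (width // patch)


def causal_spatiotemporal_tokens(
    num_frames: int, height: int, width: int, patch: int, t_patch: int
) -> int:
    """Tokens needed when the first frame is patched alone and the remaining
    frames are grouped into temporal patches of `t_patch` frames.
    This layout is an assumption made for illustration."""
    if (num_frames - 1) % t_patch != 0:
        raise ValueError("num_frames - 1 must be divisible by t_patch")
    time_steps = 1 + (num_frames - 1) // t_patch
    return time_steps * (height // patch) * (width // patch)


# Hypothetical clip: 11 frames of 128x128 pixels, 8x8 spatial patches, 2-frame temporal patches.
print(per_frame_tokens(11, 128, 128, 8))                 # 2816
print(causal_spatiotemporal_tokens(11, 128, 128, 8, 2))  # 1536
```

Grouping frames in time is what shrinks the count: the spatial grid stays 16x16 here, but only 6 time steps are tokenized instead of 11.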
Phenaki uses a bidirectional masked transformer to generate video tokens, conditioned on pre-computed text tokens derived from the prompt. The generated video tokens are then de-tokenized by the video decoder to produce the final video.
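A bidirectional masked transformer is commonly sampled with MaskGIT-style iterative decoding: start with every video token masked, predict all masked positions in parallel, keep the most confident predictions, and re-mask the rest on a shrinking schedule. The sketch below assumes that procedure with a cosine schedule. The `MASK` sentinel, `masked_decode`, and `toy_model` (a stand-in for the real transformer that ignores the video context) are hypothetical, and the source does not specify the exact schedule Phenaki uses.

```python
import math
import random
from typing import Callable, List, Sequence

MASK = -1  # hypothetical sentinel id for a masked video token

# Hypothetical model interface: (text_tokens, video_tokens) -> one probability list per video position.
Model = Callable[[Sequence[int], List[int]], List[List[float]]]


def masked_decode(model: Model, text_tokens: Sequence[int], num_tokens: int,
                  steps: int = 8, seed: int = 0) -> List[int]:
    """Iteratively fill in masked video tokens, conditioned on text tokens."""
    rng = random.Random(seed)
    tokens = [MASK] * num_tokens
    for step in range(steps):
        probs = model(text_tokens, tokens)  # predict every position in parallel
        # Sample a candidate for each still-masked position and record its confidence.
        candidates = {}
        for i, tok in enumerate(tokens):
            if tok == MASK:
                dist = probs[i]
                choice = rng.choices(range(len(dist)), weights=dist)[0]
                candidates[i] = (choice, dist[choice])
        # Cosine schedule: how many tokens stay masked after this step (0 on the last step).
        ratio = math.cos(math.pi / 2 * (step + 1) / steps)
        num_masked_next = math.floor(num_tokens * ratio)
        # Commit the most confident candidates; the rest remain masked for the next step.
        num_to_commit = len(candidates) - num_masked_next
        ranked = sorted(candidates.items(), key=lambda kv: kv[1][1], reverse=True)
        for i, (choice, _) in ranked[:num_to_commit]:
            tokens[i] = choice
    return tokens


def toy_model(text_tokens: Sequence[int], video_tokens: List[int],
              vocab_size: int = 4) -> List[List[float]]:
    """Hypothetical stand-in: strongly prefers (text_tokens[0] + position) % vocab_size."""
    out = []
    for i in range(len(video_tokens)):
        target = (text_tokens[0] + i) % vocab_size
        out.append([0.97 if v == target else 0.01 for v in range(vocab_size)])
    return out


video_tokens = masked_decode(toy_model, text_tokens=[3], num_tokens=16, steps=8)
print(video_tokens)
```

Decoding in a few parallel steps, rather than one token at a time, is the main reason this style of sampler is fast; a real model would also use the already-committed video tokens as context when predicting the rest.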
Overall, Phenaki's ability to turn text into video represents a notable advance in AI-driven video generation. Its efficient video encoder-decoder and text-conditioned token generation together allow it to produce long, high-quality videos from textual prompts.