Google’s Genie AI Crafts Games from Single Images


    Google has announced Genie, a generative AI model for game creation, as part of its continued investment in artificial intelligence.

    Google's AI lab DeepMind has demonstrated Genie, a generative AI model that can learn game mechanics from hundreds of thousands of hours of gameplay video and generate playable games from minimal prompts.


    Unveiling Genie

    As stated in Google's official DeepMind blog post, Genie is a foundation world model trained on online videos. The model can generate “an infinite variety of playable (and action-controllable) worlds from synthetic images, photos, and even sketches.”

    Genie stands for Generative Interactive Environments and was developed in partnership between Google and the University of British Columbia. From just a single image, it can generate side-scrolling 2D platformers in the style of Contra or Super Mario Bros. based on user prompts.

    Google DeepMind said in its announcement that Genie introduces a “new paradigm” of generative artificial intelligence (AI). The company also acknowledged the broader emergence of generative AI models that can produce novel and creative content through language, images, and even video.

    According to Google, Genie was trained on 200,000 hours of unlabeled, publicly available internet gameplay video, a significant portion of which features 2D platformers rather than fully 3D games.

    Genie specifications

    In terms of size, Genie has 11 billion parameters. The model comprises a spatiotemporal video tokenizer, an autoregressive dynamics model, and a simple and scalable latent action model. This design allows Genie to operate frame by frame in the generated environment without needing labels or other domain-specific annotations during training.
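The data flow between those three components can be illustrated with a toy sketch. This is not Genie's actual implementation (the real components are large spatiotemporal transformers); the function bodies here are deterministic stand-ins, and names like `VOCAB_SIZE` and `TOKENS_PER_FRAME` are assumed values for illustration only.

```python
import numpy as np

VOCAB_SIZE = 1024      # size of the discrete video-token codebook (assumed)
NUM_ACTIONS = 8        # size of the discrete latent-action set (assumed)
TOKENS_PER_FRAME = 16  # tokens produced per frame by the tokenizer (assumed)

def tokenize_frame(frame: np.ndarray) -> np.ndarray:
    """Video tokenizer: map a raw frame to discrete tokens.
    Stand-in: hash blocks of pixels into the codebook."""
    flat = frame.reshape(TOKENS_PER_FRAME, -1)
    return flat.sum(axis=1).astype(np.int64) % VOCAB_SIZE

def infer_latent_action(prev_tokens: np.ndarray, next_tokens: np.ndarray) -> int:
    """Latent action model: infer which discrete action explains the change
    between two consecutive frames. Stand-in: hash of the token difference."""
    return int(np.abs(prev_tokens - next_tokens).sum()) % NUM_ACTIONS

def dynamics_step(tokens: np.ndarray, action: int) -> np.ndarray:
    """Dynamics model: predict next-frame tokens from current tokens plus a
    chosen action. Stand-in: a deterministic shift keyed by the action."""
    return (tokens + action + 1) % VOCAB_SIZE

# Frame-by-frame rollout from a single starting image, as described above.
frame = np.arange(16 * 16, dtype=np.int64).reshape(16, 16)
tokens = tokenize_frame(frame)
for action in [0, 3, 1]:  # player-chosen latent actions, one per frame
    tokens = dynamics_step(tokens, action)
print(tokens.shape)
```

The key point the sketch captures is that everything downstream of the tokenizer operates on discrete tokens and discrete actions, which is what lets the model be trained on raw video without action labels.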

    Furthermore, even though Genie is trained on video-only data, it can be directed to generate a diverse set of interactive, controllable environments. Unlike many generative AI models that create content from language, images, and videos, Genie can create playable environments from a single image prompt.

    Google DeepMind researcher Tim Rocktäschel said on X (formerly Twitter) that the focus is on scale rather than on adding inductive bias.

    He added that the team used a dataset of more than 200,000 hours of 2D-platformer videos to train an 11-billion-parameter world model. Genie learns a variety of latent actions that consistently control the character, entirely without supervision.
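The unsupervised latent-action idea can be shown with a toy example. Genie's actual latent action model is a learned neural network with a discrete codebook; the sketch below swaps that for simple k-means clustering, which illustrates the same principle: frame-to-frame transitions are grouped into a small set of discrete actions with no labels involved. The transition vectors, the dimensionality, and the action count of 8 are all made-up values for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "frame transition" vectors: differences between consecutive frames.
# In Genie these would be learned embeddings; here they are random stand-ins.
transitions = rng.normal(size=(500, 4))

NUM_ACTIONS = 8  # size of the discrete latent-action set (assumed)

# Plain k-means to discretize transitions into NUM_ACTIONS latent actions.
# No action labels are used anywhere -- the "unsupervised" part.
centroids = transitions[rng.choice(len(transitions), NUM_ACTIONS, replace=False)]
for _ in range(20):
    # Assign each transition to its nearest centroid (its latent action).
    dists = np.linalg.norm(transitions[:, None] - centroids[None], axis=-1)
    labels = dists.argmin(axis=1)
    # Move each centroid to the mean of the transitions assigned to it.
    for k in range(NUM_ACTIONS):
        if (labels == k).any():
            centroids[k] = transitions[labels == k].mean(axis=0)

print(len(set(labels.tolist())))  # distinct latent actions actually in use
```

Once transitions are clustered this way, the same discrete action IDs can be handed to a player at inference time, which is how video-only training can still yield a controllable game.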

    Photo credit: Google

    Genie's abilities

    According to Google researchers, Genie is driven by three models: a video tokenizer that converts raw video frames into discrete tokens, a latent action model that infers the actions taken between video frames, and a dynamics model that predicts what will happen in the next frame.

    One of the unique features of Genie's base model is its ability to identify the key character in a game without any action or text annotations during training. Thanks to the models that drive it, the user can easily control the character in the AI-generated virtual environment.

    Rocktäschel also said that Genie can turn other media into games. In the accompanying Google DeepMind research paper, Genie is shown creating a variety of action-controllable virtual worlds from diverse inputs.

    Additionally, Rocktäschel said the model can convert any image into a playable 2D world, bringing human-designed works such as sketches to life. As an example, he pointed to artwork by two of the world's youngest creators, Seneca and Caspian.

