Google Labs has a new AI tool called Whisk, an image generator that lets you use pictures as prompts. Rather than recreating your starter images with new details added, it captures their essence and generates a new image from that.
Whisk opens with a bare-bones interface where you provide a subject and pick a style. For now, this simple interface offers only three predefined styles: enamel pin, plushie, and sticker.
Whisk also offers a more advanced editor called “Start from scratch.” It lets you supply a source image or text for each of three categories: subject, scene, and style. An input bar also lets you add extra text for finishing touches. In its current iteration, however, the advanced controls fail to produce results that match what the user asked for.
Google acknowledges that Whisk draws on only “a few key characteristics” of the source image. “For example, the generated subject might have a different height, weight, hairstyle or skin tone,” the company warns.
These limitations stem from how Whisk works: it uses the Gemini language model to write detailed captions of the uploaded source images, then feeds those descriptions into the Imagen 3 image generator. The result is an image based on Gemini’s description of your picture rather than on the picture itself.
Those interested can try out Whisk on the official Google Labs site.