Google’s latest experimental AI tool, Whisk, generates images using other images, rather than just text, as prompts. Powered by Google’s Imagen 3 image generation model, Whisk allows for rapid visual exploration and offers a different approach to creating and manipulating images.
Exploring Whisk’s Image-Based Prompting
Whisk’s initial setup is straightforward. After navigating through the welcome page, email signup, and privacy policy, you’re presented with the main interface. The initial prompt I encountered featured a dinosaur plushie as the image style, alongside options like enamel pins and stickers. Upon selecting a style, you upload an image representing your desired subject. My first attempt, using a photograph of a smartwatch, resulted in a persistent loading issue. However, uploading a more cartoonish image yielded immediate results: plushie figurines of three mythical creatures.
Editing and Text Prompt Integration
Once the initial image is generated, Whisk provides an editing section with a text prompt area. Using the suggested prompt, “the character is eating ice cream,” I generated variations of the creatures holding ice cream cones. The “start from scratch” option allows for complete customization, enabling users to upload their own images or input text prompts from the beginning. An “Inspire Me” button provides image and text suggestions for those seeking inspiration.
Managing Your Image Library
Whisk features a “My Library” section to view and manage created images. Users can enable or disable the library, download individual images, or delete library data entirely. Each image displays its corresponding text prompt, which can be copied for use in other tools. Interestingly, Whisk eventually generated the plushie-smartwatch blend I initially attempted and stored it in My Library, even though the request had appeared to hang on loading. It is worth checking the library after a stalled request, since a generation may finish in the background.
Comparing Whisk with Microsoft Designer
Whisk’s image-based prompting contrasts with Microsoft Designer’s text-prompt-driven approach, which uses OpenAI’s DALL-E 3 model. Replicating the plushie-smartwatch prompt in Microsoft Designer yielded less detailed and somewhat unsettling results, featuring human faces on watch bodies rather than a distinct watch face. In this single test, at least, Whisk’s Imagen 3 model interpreted the context of an image prompt better than DALL-E 3 interpreted the equivalent text prompt.
The Power of Image-Based Prompts
While Whisk incorporates text prompts to refine results and address potential inaccuracies, its core strength lies in its image-based prompting system. Starting from an image for the subject and another for the style gives more direct control over the result than a text description alone, which makes Whisk a promising tool for visual content creation.