OpenAI has upgraded ChatGPT with native image generation powered by its GPT-4o model, so images can now be created directly within the chatbot. This removes the need to use OpenAI’s DALL-E as a separate tool, although DALL-E remains available for users who prefer it. OpenAI has also integrated its Sora AI video generator into ChatGPT, expanding the platform’s creative capabilities.
These new features are now available to ChatGPT users on the free, Plus, Team, and Pro tiers. Enterprise and education users can expect access next week.
A paparazzi-style photo of Karl Marx walking through a mall parking lot.
Previously, DALL-E 3 served as the image generation plugin for paid ChatGPT subscribers, while free users could access a basic version through Microsoft Copilot. GPT-4o is recognized as a leading image generator, particularly in its paid version. Although all ChatGPT users now benefit from native image generation, free-tier users may still run into usage limits, such as caps on file uploads and data analysis.
A horse galloping across the ocean surface.
An extensive post-launch training process at OpenAI, known as “reinforcement learning from human feedback” (RLHF), has significantly improved the realism and text legibility of images generated by GPT-4o. This year-long effort focused on refining the model and fixing issues such as typos in rendered text and inaccuracies in generated hands and faces.
Enhanced Image Generation Capabilities within ChatGPT
Following the May 2024 announcement of GPT-4o, OpenAI employed a team of more than 100 human trainers to refine the model, correcting typos and other common errors in generated images, particularly in hands and faces. A key improvement with GPT-4o is the ability to create images with transparent backgrounds, a valuable feature for businesses and creatives designing logos and other iconography.
A photorealistic image of a farmer.
Addressing Challenges and Ethical Considerations
Despite these advancements, GPT-4o still faces challenges, including the persistent problem of AI “hallucinations” and difficulty keeping images consistent across edits. However, OpenAI has committed to rapid updates and improvements.
Ethical and legal concerns surrounding AI-generated content continue to be a focus. OpenAI asserts that GPT-4o is trained on publicly available data and on proprietary data acquired through partnerships with companies like Shutterstock. Images generated within ChatGPT using GPT-4o will not have AI watermarks but will include C2PA metadata, an industry standard for recording content provenance that can be used to identify AI-generated content.
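For readers who want to see what checking for that metadata might involve, here is a minimal Python sketch. It assumes the PNG embedding described in the C2PA specification, in which the manifest is stored in a chunk of type caBX. The function names are hypothetical and are not part of any OpenAI or C2PA tool. Finding the chunk only shows that a manifest appears to be present; it does not validate the manifest's signature or contents, which requires a full C2PA verifier.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_chunk_types(data: bytes) -> list[str]:
    """Return the chunk type names found in a PNG byte string, in order."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    types = []
    offset = len(PNG_SIGNATURE)
    # Each chunk is: 4-byte big-endian length, 4-byte type, payload, 4-byte CRC.
    while offset + 8 <= len(data):
        length, raw_type = struct.unpack(">I4s", data[offset:offset + 8])
        types.append(raw_type.decode("ascii", errors="replace"))
        offset += 8 + length + 4
    return types


def has_c2pa_chunk(data: bytes) -> bool:
    """Heuristic (assumption): a 'caBX' chunk suggests an embedded C2PA manifest."""
    return "caBX" in png_chunk_types(data)


def make_chunk(chunk_type: bytes, payload: bytes) -> bytes:
    """Build one PNG chunk; used here only to assemble demo data."""
    crc = zlib.crc32(chunk_type + payload) & 0xFFFFFFFF
    return struct.pack(">I", len(payload)) + chunk_type + payload + struct.pack(">I", crc)


if __name__ == "__main__":
    # Synthetic chunk sequence for demonstration, not a viewable image.
    demo = (
        PNG_SIGNATURE
        + make_chunk(b"IHDR", b"\x00" * 13)
        + make_chunk(b"caBX", b"placeholder manifest bytes")
        + make_chunk(b"IEND", b"")
    )
    print(png_chunk_types(demo))
    print(has_c2pa_chunk(demo))
```

To inspect a real file, read it with `open(path, "rb").read()` and pass the bytes to `has_c2pa_chunk`. Other formats such as JPEG embed the manifest differently, and this sketch does not cover them.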
Conclusion: A Significant Leap for AI-Powered Creativity
Native image and video generation within ChatGPT marks a significant step forward in AI-powered creativity. While challenges remain, OpenAI’s commitment to continuous improvement and to addressing ethical concerns positions GPT-4o as a powerful tool for both casual users and professionals. Sora extends the platform’s multimedia capabilities and opens up new possibilities for content creation, while better image realism, more legible text, and support for transparent backgrounds improve the user experience and broaden ChatGPT’s creative potential.