Google takes on OpenAI with flashy text-to-image generator

The AI imagery competition is getting personal.

Google this week unveiled a new challenger to OpenAI’s vaunted DALL-E 2 text-to-image generator, and took shots at its rival’s efforts.

Both models convert text prompts into pictures. But Google’s researchers claim their system provides “unprecedented photorealism and deep language understanding.”

Example qualitative comparisons between Imagen and DALL-E 2 [54] on DrawBench prompts from the Conflicting category. We observe that both DALL-E 2 and Imagen struggle to generate well-aligned images for this category. However, Imagen often generates some well-aligned samples, e.g. “A panda making latte art.” Human raters preferred Imagen over DALL-E 2 for both sample quality and image-text alignment. Credit: Saharia et al.

The cringingly named Imagen system uses a large pre-trained language model as a text encoder. A cascade of diffusion models then turns the user’s words into pictures.


In tests, the Google team said Imagen “significantly outperformed” DALL-E 2.

Imagen vs. DALL-E 2 on DrawBench: (a) image-text alignment and (b) image fidelity. Imagen particularly outshone DALL-E 2 in the colors, positional, text, and description categories. Credit: Saharia et al.

Imagen’s developers have even invented a new method of measuring the supremacy of their creation.

Dubbed DrawBench, the benchmark compares human judgments on the outputs of different text-to-image generators.

Unsurprisingly, Google’s metric gave strong scores to Google’s system.

“With DrawBench, extensive human evaluation shows that Imagen outperforms other recent methods by a significant margin,” the researchers said in their paper.

Example qualitative comparisons between Imagen and DALL-E 2 [54] on DrawBench prompts from the Colors category. DALL-E 2 can struggle to correctly assign colors to objects, especially for prompts with more than one object. Credit: Saharia et al.

The images and metrics certainly look impressive, but Google hasn’t offered an opportunity to scrutinize the results.

You can try some interactive demos on the Imagen website, but these only let you build constrained sentences from a small selection of phrases.

Until the model and code get a public release, cynics will suspect that Google is cherry-picking the results.

Example qualitative comparisons between Imagen and DALL-E 2 [54] on DrawBench prompts from the Text category. Imagen was significantly better than DALL-E 2 in prompts with quoted text. Credit: Saharia et al.

Google’s explanation for keeping the model private echoes one given by OpenAI: the system is too dangerous to release.

The researchers warn that generative methods can spread misinformation, stir harassment, and exacerbate marginalization.

“Our preliminary assessment also suggests Imagen encodes several social biases and stereotypes, including an overall bias towards generating images of people with lighter skin tones and a tendency for images portraying different professions to align with Western gender stereotypes,” said the researchers.

Example qualitative comparisons between Imagen and DALL-E 2 [54] on DrawBench prompts from the Reddit category. Imagen significantly outperformed DALL-E 2 in the positional, text, and description categories. Credit: Saharia et al.

The team concludes that Imagen “is not suitable for public use at this time,” but it does offer hope of a future release.

I await their update with caution. As someone who creates images for articles every day, the prospect of AI labs competing to offer better results is attractive.

On the other hand, I don’t want our robot overlords to replace artists with algorithms.

Story by Thomas Macaulay

Senior reporter

Thomas is a senior reporter at TNW. He covers European tech, with a focus on AI, cybersecurity, and government policy.
