Recently Hacker News has been filled with posts about Stable Diffusion, the new AI text-to-image generator, most of which were along the lines of how it was a really big deal.
AI text-to-image generators are having a moment. The basic operation is simple; you input a text ‘prompt’, and a machine learning model does its best to interpret your description to produce some novel images.
First in this new category came DALL•E, developed by OpenAI, in January 2021.
This year DALL•E has been joined by two other programs, Midjourney and, just last month, Stable Diffusion.
Noting this rapid innovation, The Economist used Midjourney to design its front cover in June.
The ability of these generators is nothing short of astonishing, and the boundaries of what they can do are being pushed forward at an incredible rate.
To make sense of this dizzying change we are going to try to answer a few questions about the new text-to-image algorithms.
How do AI image generators work?
How good is Stable Diffusion?
Are AI image generators a big deal?
This is an amazing story that illustrates just how fast the field of AI is developing.
Let’s dive in.
How do AI image generators work?
“Any sufficiently advanced technology is indistinguishable from magic,” wrote Arthur C. Clarke back in 1973.
When DALL•E was released in January 2021, the images it was able to generate were certainly interesting, but more because they were generated by an AI than for the level of sophistication.
They were novel, but perhaps seen as an incremental improvement.
However, this year we have seen another huge leap forward.
DALL•E 2 was released in April 2022, and the difference in just one year is staggering.
Take a look at the two images below, generated with the same prompt.
This is starting to feel like magic.
The level of detail and accuracy in the second image is dramatically better. As usual with new technologies, other companies were not far behind, with Midjourney and Stable Diffusion released shortly after.
These three programmes (Stable Diffusion, Midjourney and DALL•E 2) are something quite new, rewriting the rules of what AI is capable of.
For a taste of some of the amazing images that have been generated so far, check out this gallery.
All three work in broadly the same way.
Ryan O'Connor from AssemblyAI explains, using DALL•E 2 as the example:
“1. First, a text prompt is input into a text encoder that is trained to map the prompt to a representation space.
2. Next, a model called the prior maps the text encoding to a corresponding image encoding that captures the semantic information of the prompt contained in the text encoding.
3. Finally, an image decoder stochastically generates an image which is a visual manifestation of this semantic information.”
To do this well, the models are trained on hundreds of millions of captioned images, and the final models have a billion or more parameters.
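To make the three stages concrete, here is a deliberately tiny Python sketch of the same pipeline shape: text encoder, then prior, then a stochastic decoder. Everything in it is an assumption made for illustration: the function names (encode_text, prior, decode_image, generate), the 8-number embeddings and the 4x4 "image" are all made up, and hashing and random weights stand in for what real systems learn from data. It is not how DALL•E 2, Midjourney or Stable Diffusion are actually implemented.

```python
import hashlib
import random

DIM = 8  # toy embedding size; real models use far larger vectors


def _normalize(vec):
    """Scale a vector to unit length (leave an all-zero vector unchanged)."""
    norm = sum(v * v for v in vec) ** 0.5 or 1.0
    return [v / norm for v in vec]


def encode_text(prompt):
    """Stage 1 (toy text encoder): map a prompt to a fixed-size vector.

    Hashing each word is a stand-in for a trained encoder.
    """
    vec = [0.0] * DIM
    for word in prompt.lower().split():
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        for i in range(DIM):
            vec[i] += digest[i] / 255.0
    return _normalize(vec)


def prior(text_embedding):
    """Stage 2 (toy prior): map the text embedding to an 'image embedding'.

    A fixed random matrix stands in for learned weights.
    """
    rng = random.Random(0)
    weights = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(DIM)]
    projected = [sum(w * x for w, x in zip(row, text_embedding)) for row in weights]
    return _normalize(projected)


def decode_image(image_embedding, size=4, seed=None):
    """Stage 3 (toy decoder): turn the embedding into a size x size grid.

    The added noise is what makes generation stochastic: the same
    embedding can yield a different image on every run.
    """
    rng = random.Random(seed)
    pixels = []
    for r in range(size):
        row = []
        for c in range(size):
            signal = image_embedding[(r * size + c) % DIM]
            noise = rng.gauss(0.0, 0.1)
            # Clamp to [0, 1], treating each value as a grey level
            row.append(min(1.0, max(0.0, 0.5 + 0.5 * signal + noise)))
        pixels.append(row)
    return pixels


def generate(prompt, seed=None):
    """Run the three stages end to end."""
    return decode_image(prior(encode_text(prompt)), seed=seed)
```

Calling generate("Einstein eating spaghetti", seed=1) returns a 4x4 grid of grey levels between 0 and 1. Reusing the seed reproduces the same grid, while changing it gives a different variation on the same prompt, which is the toy equivalent of asking a real generator for another take.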
When DALL•E 2 was released you had to join a waitlist and get in line for access. It was rolled out to the public quite slowly, likely to give the team time to avoid embarrassing issues like the one Microsoft faced when its AI chatbot Tay quickly turned racist.
Then along came Stable Diffusion, which is open source, so anyone can grab the code and start playing around with it.
So unlike the controlled ‘slow and steady’ release of DALL•E 2, Stable Diffusion has seen an immediate explosion of projects hacking, remaking and extending the code.
Needless to say, the plea by the company to “Use this in an ethical, moral and legal manner” has already been (partly) ignored.
But more on that later.
How good is Stable Diffusion?
In short, it is scarily good.
Imagine having a creative director that could generate almost any image you can conceive of, on demand.
Ever wondered what it would look like if Einstein were eating spaghetti on the International Space Station?
OK, so the face is still in the uncanny valley, but otherwise that’s pretty amazing.
It can also generate images from other images. Why is that interesting? Well you can start with a basic sketch.
And of course you provide a prompt as well:
“A distant futuristic city full of tall buildings inside a huge transparent glass dome, In the middle of a barren desert full of large dunes, Sun rays, Artstation, Dark sky full of stars with a shiny sun, Massive scale, Fog, Highly detailed, Cinematic, Colorful”
And the result is… stunning.
All the AI image generators we have seen so far have their own strengths and weaknesses, but each is hugely impressive, at least compared to anything we saw before 2022.
If you want to see the quirks of how they compare, @fabianstelzer did a bunch of comparisons.
Are AI image generators a big deal?
It used to be assumed that the most creative work would be the last human activity to be taken over by AI.
That assumption now seems to be very wrong.
In August, an image generated by Midjourney won first place in the digital art category of the Colorado State Fair’s fine art competition.
In fairness, it looks very cool, but when people found out it was created by an AI, they were mad.
In a way, this moment is similar to when computers started beating humans at chess. Chess computers can now beat the best human players, but humans still play chess for fun and compete with each other.
So with art. Though AI artists are clearly capable of winning art competitions, it won’t stop human artists from continuing to produce art and compete. But most likely we will quickly see rules preventing submissions generated by AI.
Infuriating art critics is really just the first tiny wavelet before the tsunami of changes we are likely to see throughout the creative industries.
Newspaper illustrators, cartoonists, logo designers, web designers and graphic artists are just some of the professions that are likely to feel the pressure sooner rather than later.
And that’s just the beginning.
Within a couple of years, at this rate of progress, an AI could start to replace 3D animators, architects, interior designers, fashion designers and video game designers.
Basically anyone with ‘design’ in their job title might be starting to feel nervous.
Why employ a human graphic designer to create art for your blog/book/magazine when you can use an AI to generate dozens of possible options in a matter of seconds, for a fraction of the cost?
Instead just describe a scene and the AI can render it for you (this is already happening by the way).
An AI can produce essentially unlimited iterations until you get exactly what you are looking for, and it can do so in a few seconds, for a cost that is ~free.
How can a human designer compete?
Now that AI is approaching or exceeding human creative capabilities in this area, it would seem inevitable that this would impact on the demand for human artists, and I think this will start soon.
If the displacement of human artists by AI was the extent of the changes ahead that would already be a massive societal change, but it’s just scratching the surface.
The other foreseeable changes may cause even more disruption, and some are much more sinister.
Inevitably, it took no more than a day after Stable Diffusion was released for the first algorithmically generated porn site to appear.
I am not even sure we have managed to grapple with the implications of on-demand porn uploaded by humans. But what happens when unlimited algorithmically generated porn can cater for every fetish? What about when these algorithms are used to generate illegal pornography?
It gets pretty dark very quickly.
What about when you can generate personalised deepfake porn of your enemies? Or of political leaders?
How will we be able to know what we can trust when you can generate a video of anyone saying anything? Or doing anything?
DALL•E 2 and Midjourney don’t allow you to create images of real people, but of course there are no such restrictions with Stable Diffusion.
The ability of these AI generators is increasing exponentially, and the consequences are going to be far reaching.
I don’t want to end on such a pessimistic note, so let’s keep in mind that this is going to unlock huge amounts of new art and design, in ways that no human has ever imagined.
So for now, have a play with this technology whilst it still feels like magic.
Until next time,
Jamie