Artificial intelligence as it relates to image creation arrived this year with a wave of unnerving, not-quite-right computer-generated approximations of reality, and tub-thumping pronouncements that the world will never be the same, that another nail had been driven into photography’s coffin.
Time will tell whether the quality of the images or of the predictions was the more off the mark, but it does feel like we are on the bleeding edge of something new. Like it or not.
Machine learning models like DALL-E, Stable Diffusion, and Midjourney were presented to the public this year. This ‘Generative AI’ software takes text prompts as input and uses them to generate images. This is called text-to-image prompting, and it opens up a brave new world of what is called ‘prompt engineering’.
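In practice, much of this new ‘prompt engineering’ amounts to assembling a subject and a stack of style keywords into a single text string for the model. Here is a minimal, purely illustrative sketch of that idea; the `build_prompt` helper and its template wording are assumptions for this example, not part of any particular model’s API (the real systems simply accept free-form text):

```python
def build_prompt(subject, style=None, modifiers=None):
    """Assemble a text-to-image prompt from a subject, an optional
    style, and an optional list of modifier keywords."""
    parts = [subject]
    if style:
        parts.append(f"in the style of {style}")
    if modifiers:
        parts.extend(modifiers)
    # Text-to-image models take the finished string as their input prompt.
    return ", ".join(parts)

prompt = build_prompt(
    "snow leopard on Mt Everest",
    style="a vintage travel poster",
    modifiers=["dramatic lighting", "high detail"],
)
print(prompt)
# snow leopard on Mt Everest, in the style of a vintage travel poster, dramatic lighting, high detail
```

The craft lies in discovering which subjects, styles and modifiers steer a given model toward the image you actually want.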
Javier Ideami, computer engineer, leading Artificial Intelligence expert and visual artist, spoke in October at the Visual 1st Conference in San Francisco on the ‘Disruptive Power of Generative AI’. Here’s an edited version of a report of that presentation written by Visual 1st organizer, Hans Hartman:
Ideami (he has single name status in the AI world) commenced by noting that while the photo and video industry has seen its share of disruptive technology innovations in the past – ranging from digital cameras replacing film cameras to smartphones replacing digital cameras – we have never witnessed an industry-disrupting innovation coming to market at the frenetic pace that generative AI implementations have.
He defined Generative AI as a type of AI technology that creates new data that is different from
the data used to train the system, and noted that its applications are far broader than simply mimicking realistic photography, which has so far been the focus of publicity about the technology.
Potential use cases include product design, visual brainstorming, stock imagery, cartoons, logo development, fantasy game characters, and scientific images.
Ideami conceded that right now, most Generative AI applications are still imperfect. Yet the technology has been hyped to the point that it’s likely there will be disappointments ahead. He said it will take a while before we even realise what the most valuable applications will be, and some of them we might not even be able to imagine yet.
Generative AI uses input prompts to create output. Currently, the input prompts are text or images, which can generate further text, image and audio output, with video, 3D models and shapes as additional output types already reported in research studies.
Ultimately, the input prompts could be any type of data, triggering any type of output, including even multimodal outputs, such as audio combined with visuals. (‘Snow leopard on Mt Everest singing My Way and playing the banjo.’)
Generative AI won’t remove humans from the creative process but, according to Javier, the technology
‘will function like an Iron Man suit that we wear to amplify our creative potential. And there will be
different kinds of these Iron Man suits offering different capabilities and possibilities.’
He elaborated on the discipline that will be critical to the development of Generative AI – prompt engineering. Here’s Wikipedia’s explanation of this new field: ‘In prompt engineering, the description of the task is embedded in the input, e.g., as a question instead of it being implicitly given. Prompt engineering typically works by converting one or more tasks to a prompt-based dataset and training a language model with what has been called “prompt-based learning” or just “prompt learning”.
‘Prompt engineering may work from a large “frozen” pretrained language model where only the representation of the prompt is learned, with what has been called “prefix-tuning” or “prompt tuning”.’
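Stripped of the jargon, ‘converting a task to a prompt-based dataset’ means rewriting examples so the task description is embedded in the input itself. The sketch below shows the idea for a simple sentiment-labelling task; the template wording, the `to_prompt_dataset` helper and the sample reviews are illustrative assumptions for this example, not a standard recipe:

```python
# Illustrative template: the question that describes the task is
# embedded directly in the text, rather than being implied by labels.
TEMPLATE = "Review: {text}\nIs this review positive or negative? {label}"

def to_prompt_dataset(examples):
    """Turn (text, label) pairs into prompt strings that a language
    model could be trained on ('prompt-based learning')."""
    return [TEMPLATE.format(text=t, label=l) for t, l in examples]

data = [
    ("Loved the camera, great lens.", "positive"),
    ("Battery died after a week.", "negative"),
]
for prompt in to_prompt_dataset(data):
    print(prompt)
```

In ‘prefix-tuning’ or ‘prompt tuning’, the big pretrained model itself stays frozen and only a learned representation of such prompts is adjusted, which is what the Wikipedia passage above is getting at.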
(Say that again?)
Below is a 15-minute dive into prompt engineering and Generative AI by computer engineer Patrick Debois which does a better job of explaining the technology than Wikipedia. Geeky (of course) but relatively easy to follow: