
Generating Images with AI

  • johnwpphillips
  • Apr 21, 2024
  • 4 min read

Models for generating images from text, or from text plus another image, are exposed to billions of text-image pairs, which they archive as algorithms. An image might represent a subject or topic: Donald Trump, a forty-two-year-old Chinese woman, a toy bear. It might represent a style: detailed matte painting, deep colour, fantastical, intricate detail, splash screen, complementary colours, fantasy concept art, 8k resolution, trending on Artstation, Unreal Engine 5. A user creates a prompt that identifies a subject and a style. The prompt functions like a message composed of algorithms, to which the model responds with the algorithms that produce the resulting image.
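To make the subject-plus-style construction concrete, here is a minimal sketch using the open-source Stable Diffusion model through the Hugging Face diffusers library; the checkpoint name and settings are illustrative assumptions, not a recommendation.

import torch
from diffusers import StableDiffusionPipeline

# Load a public text-to-image checkpoint (assumed; any Stable Diffusion model works).
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# The prompt names a subject and a style, exactly as described above.
subject = "a toy bear"
style = ("detailed matte painting, deep colour, fantastical, intricate detail, "
         "fantasy concept art, 8k resolution, trending on Artstation")

image = pipe(prompt=f"{subject}, {style}").images[0]
image.save("toy_bear.png")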





Because the models work with such vast numbers, the possibilities exceed what anyone can comprehend. A landscape made of denim? A quilted bear? A dragon from Game of Thrones made out of smoke? Examples like these are the new clichés of AI image production.


Photographers worry that AI prompt writers will put them out of business. The panic is a bit ridiculous. Photographers record the light available to a sensor and produce images through mechanical logic. AI cannot do this and never will be able to. At best the AI will give you an image that looks like a photograph, if you ask it to. Among the billions of image-text pairs to which it has been exposed are innumerable photographs. There remains a need for photography. With AI one plays a different game.



The images created by AI models represent things that do not exist. They represent how things that may or may not exist are represented. When you see an AI picture of Donald Trump you do not see a representation of the person but a simulation of the means of representation. Therefore, the most effective and most loved images in the AI art community are those that work with styles. The topics, derided by people holding onto traditional values in aesthetic style and taste, tend to be those of myth, fairy tales, imaginative art, and fantasy: unicorns, elven culture, dragons, scenes from enduring fantasy fiction like Lord of the Rings or Game of Thrones. Or they depict unutterably cute subjects (Ghibli-style big-eyed creatures made of sand or cuddling up with what in the real world would be their predators). Or, simpler and more ubiquitous still, absurd juxtapositions: an alligator riding a motorbike, a mermaid in space dressed as a goddess, Donald Duck and Winston Churchill with a dinosaur. Never has photography been in less danger of becoming irrelevant.




Artists in the AI universe combine the techniques of writing with those of coding. The most experienced and successful AI artists keep a repertoire of prompt templates at hand, which can be adapted for this or that purpose. Adaptation, in truth, is the name of the game.
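One way to picture such a repertoire is a hypothetical sketch in Python; the template names and wording below are invented for illustration, not drawn from any particular artist's kit.

# A hypothetical prompt-template repertoire; names and wording are invented.
STYLE_TEMPLATES = {
    "matte_fantasy": "{subject}, detailed matte painting, deep colour, fantastical, "
                     "intricate detail, fantasy concept art, 8k resolution",
    "noir": "{subject}, film noir, high-contrast black and white, hard shadows, heavy grain",
    "splash": "{subject}, splash screen, complementary colours, trending on Artstation",
}

def adapt(template_name: str, subject: str) -> str:
    # Fill a stored template with a new subject: adaptation as the name of the game.
    return STYLE_TEMPLATES[template_name].format(subject=subject)

print(adapt("noir", "a quilted bear"))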


An experienced AI artist will have produced any number of low-rank adaptation models (LoRAs) that fine-tune their work, giving it the edge that marks it out as peculiarly theirs. Unlike a base model for AI generation, such as the open-source Stable Diffusion or the proprietary DALL-E and Midjourney, which have been trained on billions of text-image pairs, a LoRA is trained on a far smaller set of images, often no more than a hundred or so and frequently just a few dozen. These LoRAs can be adopted by any user. They add to the creative possibilities of AI art by restricting the outcomes to particular orientations, especially if they are modelled on a personalized topic (i.e., a user's image) or style (painting everything with a touch of noir or a cubist look, or albumen, or turning everything to smoke, or creating surfaces of iridescent foil).
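As a rough illustration of how a shared LoRA is applied on top of a base model, here is a sketch assuming the diffusers library; the LoRA repository name is a placeholder, not a real adaptation.

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Apply LoRA weights that bias every output towards one style,
# e.g. a hypothetical "turn everything to smoke" adaptation.
pipe.load_lora_weights("some-user/smoke-style-lora")  # placeholder repository id

image = pipe("a dragon from Game of Thrones, made of smoke").images[0]
image.save("smoke_dragon.png")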


In fairness, the art of writing the prompt will normally take care of the work of fine-tuning an image. Everything depends on the relative weight of each element (or algorithm) of the prompt. This includes every aspect of the written prompt as well as a start image, which may be masked, and the LoRA. Sophisticated new models will take care of prompt writing and weighting for users with no experience in the art, or perhaps no aptitude for it. We can now expect to see the digital universe filled with images generated entirely according to the logic of the trained models, which convert your ill-formed sentences into subtly weighted prompts, modelled on the sentences provided in the latest round of training.
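A hedged sketch of how those relative weights enter the process, again assuming the diffusers library: in image-to-image generation, strength weights the start image against the written prompt, and guidance_scale weights the prompt against the model's own tendencies. File names and values here are illustrative.

import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

start_image = Image.open("sketch.png").convert("RGB")  # the start image

image = pipe(
    prompt="a landscape made of denim, detailed matte painting, deep colour",
    image=start_image,
    strength=0.6,        # lower values preserve more of the start image
    guidance_scale=7.5,  # higher values follow the written prompt more closely
).images[0]
image.save("denim_landscape.png")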




It is worth considering that the software, and so the logical principles on which the algorithms operate, is similar to, if not more or less the same as, that which drives the latest astonishing programs for automated translation, like DeepL. The logic of AI translation, like that of text-to-image generation, relies on similarly large numbers (billions upon billions) of pairings to which the model is exposed. We do not translate in the same way. If anything, the quality of AI translation is now better than it has ever been, even compared with those geniuses who speak 30 or 40 languages fluently. This is because it depends on billions of pairings, on average aggregates, on expectations, to which psychologists of human behaviour now have access via similar algorithmic calculations.





[Images in this post were supplied by Thyrsus from the Achresis collective]

 
 
 
