DALL-E Proves the Unbounded Abilities of Artificial Intelligence

The creative power of the human mind has often been recognized as the greatest force in art. The ability to internalize real-world circumstances and translate thought into visual form, storytelling or music is a facet of human society that can be traced back to the beginning of recorded history. The sanctity of the human mind within the realm of art has long gone unchallenged, yet modern technology has begun to test the assertion that sentience is required to produce creative works. Artificial intelligence, or AI, is a broad field that includes machine learning, in which computer programs are exposed to data and learn to complete tasks without being explicitly programmed for each one. One recently announced program has demonstrated abilities that are leaps and bounds beyond those of its contemporaries, and has unlocked the previously unforeseen power of AI-generated art.

The new program, known as DALL-E, suggests that the sky is the limit for creative artificial intelligence. DALL-E was introduced in January 2021 by OpenAI, an artificial intelligence lab that has spent the last seven years building applications that approximate human ability in various fields. The platform derives its name from two radically different influences: Spanish painter Salvador Dali and the lovable robotic protagonist of Pixar’s “WALL-E.” It has garnered a devoted online following for its ability to understand complex phrases and produce original computer-generated visuals based upon written sentences.

The platform’s user interface is reminiscent of many search engines, with a text bar where users enter phrases that serve as instructions for generating original images. Within 30 seconds of a user hitting enter, half a dozen rendered images appear onscreen. The content varies slightly from one picture to the next, with some images offering a literal interpretation of the phrase while others explore its implied meanings. This ability to interpret a string of words in several ways demonstrates an inventive level of textual understanding that feels impossibly human for an AI. The platform’s website advertises many of its most impressive capabilities, including “creating anthropomorphized versions of animals and objects, combining unrelated concepts in plausible ways, rendering text, and applying transformations to existing images.” These descriptions only scratch the surface of what DALL-E is capable of, yet OpenAI has already moved beyond this first program in pursuit of something even more humanlike.

DALL-E was quickly followed by DALL-E 2, a similar application that performs nearly the same function but displays crisper images and has a more advanced understanding of English-language syntax. Neither application is available for public use; the latter is in beta testing and has been made available to select online personalities to advertise its features. It is not apparent when or if the platforms will be released for general use, though it seems likely that they would sit behind a paywall should a public version be developed. Because so little is publicly known about the programs’ complete functionality or technical foundation, many have been left to speculate about what code powers the two applications, though OpenAI’s website provides a wealth of knowledge about certain components of their inner workings.

Since its inception in the 1940s, digital computer technology has been able to interpret human inputs and produce a desired response, typically in the form of text. When a search engine or website is asked to display an image, such as on Google Images, it does so by retrieving an existing file that it has associated with the search terms. DALL-E works differently. It is built upon the framework of Generative Pre-trained Transformer 3 (GPT-3), a language model that learns to predict and generate sequences of text. According to OpenAI, DALL-E is a version of GPT-3 trained on a dataset of text-image pairs rather than on text alone. Instead of looking up stored pictures the way a search engine does, it learns associations between words and visual features during training. It harnesses the GPT-3 architecture to recognize the order and significance of the words in a phrase, and it can then generate an original image by combining the disparate content that the phrase describes.
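
For readers curious about what it means for a model to “predict and generate sequences,” the toy Python sketch below may help. It is only an illustration under heavy assumptions: it counts which word tends to follow which in three made-up sentences and then extends a prompt one word at a time. The function names and the tiny corpus are invented for this example, and the real GPT-3 and DALL-E are large neural networks rather than word-count tables.

```python
from collections import Counter, defaultdict

def train_bigram(sequences):
    """Count, for every token, which tokens follow it in the training data."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts

def generate(counts, prompt, max_new_tokens=3):
    """Extend the prompt one token at a time, always picking the most
    frequent follower of the last token (greedy generation)."""
    out = list(prompt)
    for _ in range(max_new_tokens):
        followers = counts.get(out[-1])
        if not followers:
            break  # nothing ever followed this token in training
        out.append(followers.most_common(1)[0][0])
    return out

# Hypothetical training data, invented purely for illustration.
corpus = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "cat", "sat", "on", "the", "sofa"],
    ["the", "cat", "slept"],
]

counts = train_bigram(corpus)
print(generate(counts, ["the"], 3))  # ['the', 'cat', 'sat', 'on']
```

The output is assembled step by step from learned statistics rather than copied whole from a stored sentence. OpenAI describes DALL-E as generating its output one piece at a time in a broadly similar fashion, though with image data in the sequence as well as text.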

There are countless reasons to praise the minds behind DALL-E for concocting a creative tool with such an elevated understanding of language and visual art, though there is also cause for concern. The art world immediately began to worry about a marketplace in which artificial intelligence can push living artists out of a job. The frenzied discourse around DALL-E is understandable for those who fear for their careers, though this is not the first time visual artists have been threatened by, and ultimately survived, the march of technology. Photography was also once a feared new medium, with the ease of capturing real-life imagery seemingly challenging the job security of portrait artists and realist painters. Though the medium could have replaced the demand for painted artworks, the classical forms of the visual arts have survived in the era of cameras because photography constituted a separate sector of the art world and was often used by painters to provide inspiration for their work. OpenAI’s stated goal for developing the DALL-E programs is to assist graphic designers by giving them a tool to quickly generate reference images that can be used in several ways for further artistry. The ability to rapidly generate reference images, including in styles the artist may not have considered, is an incredible asset for those who learn to use it and will likely contribute more to artists than it will take away.

The impressive technology at play within DALL-E poses another ethical dilemma. The significant difference between a sentient artist and a robotic curator is the presence of a moral compass within the former. DALL-E can render photorealistic visuals and could hypothetically be asked to depict damaging content with very little effort from a user. In preparation for such circumstances, the AI refuses to generate images from some violent or explicit terms and will also avoid producing visuals containing public figures. These decisions have pre-emptively blocked some forms of abuse, though crafty users can enter precise, uncensored terms to generate imagery that approximates what the program would refuse to depict under censored terminology. It is easy to blame DALL-E for this defect, though the user is still the driving force behind any reprehensible works the application makes. Human artists have also produced despicable art without the wonders of 21st-century technology, as numerous propaganda artists of past centuries demonstrate. Any method of communication can be channeled toward questionable aims, yet it is not sensible to blame the tool for an issue that lies squarely with its user.
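
To see why a filter based on specific terms can be sidestepped, consider the minimal Python sketch below. It assumes, purely for illustration, that the filter is a simple list of blocked words; the word list and the function name `is_allowed` are hypothetical, and nothing here describes how OpenAI’s actual safeguards are built.

```python
import re

# Hypothetical blocklist, invented for this example only.
BLOCKED_TERMS = {"blood", "gore"}

def is_allowed(prompt, blocked=BLOCKED_TERMS):
    """Return True if none of the blocked words appear in the prompt."""
    words = set(re.findall(r"[a-z']+", prompt.lower()))
    return words.isdisjoint(blocked)

print(is_allowed("a pool of blood"))       # False: contains a blocked word
print(is_allowed("a pool of red liquid"))  # True: same idea, different words
```

A word list of this kind catches the obvious phrasing but not a careful paraphrase, which is the gap the crafty user exploits.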

Though the platform’s name references Dali, it is worth examining the difference between the program and the painter to ease the concerns of those who find DALL-E and its successor dangerous. Salvador Dali was an eccentric painter and a leading figure of the 20th-century Surrealist movement. His incredibly stylized work is instantly recognizable and the product of his own ingenuity; his brush brought into existence contours and compositions that nobody had previously imagined. DALL-E, on the other hand, can only emulate, and its ability to create new styles or forms beyond what exists in the images it was trained on is limited. The program cannot follow in Dali’s footsteps and take the next quantum leap in artistic thought in the way aspiring artists of today undoubtedly will. Whether it is being used to originate, emulate, or outright copy a style or form, it still requires a creative mind to take the wheel and lead it in a certain direction. DALL-E doesn’t need to ring alarm bells for a war against technology; rather, it reminds us that even as artificial intelligence progresses, we can recognize it as an extension of ourselves.