If AI machines can only learn from what has come before, how can they instigate a paradigm shift in creative form? Dr. Gerard Lynch explores.
With recent rapid advances in computational processing and machine learning – particularly the deep learning paradigm – the creative power of artificial intelligence has been unleashed like never before in history.
Artificial Intelligence (AI) methods have been applied to creative tasks in language and literature, the visual arts, musical composition and even the creation of new fusion cuisine. Businesses have been founded to benefit from the fruits of this computational creativity revolution.
But there are some serious questions to answer before we all embrace the new AI creative class. How do we rate the creative output of a machine? Research shows that we, as humans, are harsher towards computational errors than we are towards human failings.
In an Aeon essay on the topic of computational art, Oliver Roeder of data blog FiveThirtyEight boldly proclaims in his title “There is no such thing as computer art, it’s all just art.”
However, for the unconvinced, questions remain. How are these machines capable of creating novelty out of familiarity? How do we encode serendipity into their DNA? And can machines truly be creative?
Lights, Camera, run script…
2016’s Sunspring is a nine-minute “love story” set on a space station, starring Thomas Middleditch, familiar to fans of TV’s Silicon Valley.
There’s just one twist in the tale: the film’s script was generated using a deep learning technique called an LSTM recurrent neural network. The system was fed dozens of science fiction scripts and, with minimal input from human director Oscar Sharp (a gentle nudge on when to change scenes, a little topical guidance), it generated a sci-fi script of its own.
The result is a surreal short film with moments of clarity and an ambient soundtrack, the lyrics for which were also generated by the machine, affectionately named Benjamin. To date the film has racked up more than 600,000 views on YouTube and is no more inaccessible than the arthouse creations of Shane Carruth (Primer, Upstream Colour) or David Lynch (Mulholland Drive), although it probably won’t be in the running for any Academy Awards.
AI has also taken the lead in directing music videos. In the case of the as-yet-unnamed clip commissioned by Saatchi and Saatchi, artificial intelligence was consulted on all creative decisions, including costumes, cuts and editing, and theme generation. Unfortunately, the band involved were not happy with the final version, which again touches on how accepting we are of the limitations of AI versus those of a human being.
Even theatre has had the AI treatment, with Beyond the Fence, a musical set at Greenham Common, cited as the first “data-driven” musical: a collaboration between creativity researchers at Cambridge and the West End.
Will these artificially generated works be credited to AI Smithee or will we soon see a new Oscar category for best Short Film Directed by an AI?
Music for the masses
It’s not just disposable fluff that is being generated; there’s money to be made from AI art. One noteworthy London startup, which recently raised $2m in venture capital funding, has come up with a unique selling point for computational creativity: generating royalty-free background music for use in YouTube clips.
Jukedeck allows users to create tailored backing tracks for their own YouTube videos, bypassing expensive royalties for original tracks and the tedium of browsing through stock music databases. With a few clicks, end-users can channel their inner David Guetta and come up with a slick beat and vibes, based on a number of parameters. The results are far less cringe-worthy than the much-lampooned Microsoft Songsmith application, a digital tool for aiding the songwriting process.
Finnish researchers in the Department of Computer Science at Aalto University have used deep learning techniques to create the ultimate rapping robot. In their paper, titled DopeLearning, Eric Malmi and colleagues train a model using thousands of lines from real human rappers. On http://deepbeat.org/, they allow users to create their own raps using their method, while also showcasing videos of human rappers demoing some of the cuts created by the system.
The Sony Music France-helmed European Research Council project Flow Machines recently debuted the first fully computer-composed pop song, a jaunty three-minute-plus Beatles pastiche called Daddy’s Car, brought to life with lyrics, orchestration and vocals by composer and project leader François Pachet. Their system learns patterns in computational representations of musical scores, and generates original works based on these. Critical response has been grudgingly appreciative, but with nearly 2 million YouTube hits at time of writing, the song is hardly a novelty number.
And finally, not that fans of more highbrow compositions should feel marginalised, the US research agency DARPA has been busy developing a jazz AI with a killer groove.
Cracking the creative code
For decades, researchers in the cognitive sciences have endeavoured to synthesise the creative process into a series of steps; a blueprint for creativity. In a 2002 paper titled Cognitive mechanisms underlying the creative process, UBC psychology professor Dr. Liane Gabora outlines four main steps necessary for creative production. The first step, named preparation, involves the assembly of information relevant to the domain at hand. As humans and creative beings, we can imagine listening to the complete works of a particular composer in order to gain inspiration, or reading all of an author’s novels. In fact, this step encompasses by and large our entire life experience to date. It can also be fraught with failure: imitating a particular style, or producing a work to a specific brief, is rarely easy, and many would-be creatives despair rather quickly at this point.
The next step, incubation, involves processing this knowledge unconsciously. We are still concerned with solving a particular problem but have ceased to actively work on it, turning our mind instead to other tasks.
Following this is the process of illumination. In some ways, this can be seen as a sort of eureka moment, when a breakthrough is made, a particularly inspiring lyric or chord progression is found, or an idea for a poem is born. The Irish Nobel prize-winning poet W. B. Yeats is said to have been inspired to write his poem The Lake Isle of Innisfree by the sound of an advertising waterfall in a shop window in London’s Fleet Street. This sound, coupled with his nostalgia and homesickness, transported him to a simpler period in his life when his greatest desire was to live an isolated existence in rural County Sligo, where he spent his formative years. The birth of an idea provides an ecstatic rush which drives the creative process in humans, the feeling of forging something primal and undiscovered. This step in the process is often romanticised in Hollywood and elsewhere.
Of course, as the old adage about inspiration and perspiration goes, an idea rarely emerges fully formed, which leads to the next stage: turning the raw materials into something more refined, a step dubbed verification in Gabora’s work. It is here that we must shape our nascent idea into something presentable, perhaps conforming more to the norms of our particular medium or genre, and this stage can involve a great deal of effort.
In the music industry we hear of songs conceived in an afternoon; there are also counterexamples, such as the 1976 classic rock track More Than A Feeling by US group Boston, which was allegedly five years in gestation under the obsessive tinkering of lead guitarist and studio wizard Tom Scholz.
What lies beneath
Drawing parallels between this outline of the long-vaunted creative process and the examples of computational creativity above, some differences quickly emerge. We, as humans, are driven by our life experiences, upbringing, beliefs and knowledge; a machine, however, is a blank slate, subject to our initiation. In the data-driven paradigm this means our curation of the data sources the machine learns from; in a rule-based system, the encoding of limits to govern the creative direction. Each approach can have interesting results. A rule-based system is limited by the biases of a small set of individuals, whereas a data-driven approach can be subject to the preconceptions of a much larger population.
A recent critical study of a knowledge representation system known as word2vec, which learns word meanings from their contexts and allows inferences such as man is to woman as king is to queen, found that the system produced sexist inferences. When asked for father is to doctor as mother is to blank, it responded with nurse. This is no fault of the machine per se; it is merely amplifying the status quo in the news media text on which it was trained.
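The analogy mechanism behind word2vec is plain vector arithmetic, and can be sketched in a few lines. The vectors below are tiny hand-crafted stand-ins (real word2vec embeddings have hundreds of dimensions, learned from billions of words), but the computation, vec(king) - vec(man) + vec(woman) landing nearest to vec(queen), is the same:

```python
# A minimal sketch of word2vec-style analogy inference, using tiny
# hand-crafted vectors rather than vectors learned from a real corpus.
import math

# Toy 3-d embeddings: dimensions loosely encode (royalty, male, female).
vectors = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def analogy(a, b, c):
    """Solve 'a is to b as c is to ?' by vector arithmetic."""
    target = [vb - va + vc for va, vb, vc in
              zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(vectors[w], target))

print(analogy("man", "king", "woman"))  # nearest neighbour: queen
```

Because the vectors are learned from text, whatever associations the text carries (doctor with father, nurse with mother) are baked into the same arithmetic.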
“Evaluation of human creativity is itself a highly challenging task, demanding a great deal of meta-analysis to produce verifiable criteria for rating one piece of creative work over another”
Regarding the incubation of ideas, the machine performs steps two and three in parallel, finding associations between diverse nodes of knowledge as it processes the information. Human creativity is inherently non-deterministic; even if you gave two fraternal twins a pen and detailed instructions to write on a particular subject, you would hardly expect them to produce identical essays.
Computational methods such as deep learning also have this property, which makes them ideal for computational creativity projects. The downside of these methods is the lack of accountability and understanding of how a machine produced a particular output.
The final step in the chain, evaluation, can be the most challenging for a machine: evaluating itself and its productions. Evaluation of human creativity is itself a highly challenging task, demanding a great deal of meta-analysis to produce verifiable criteria for rating one piece of creative work over another, as a large degree of subjectivity can be involved. We as humans can almost instinctively tell if something is rotten in the state of a creative work; training a computer to do this is another leap entirely. How do we account for taste in computational creativity?
The proof of the pudding
There is one area, however, where computational creativity can produce results which are both tasteful and tasty. IBM’s Watson supercomputer famously beat human contestants on the popular quiz show Jeopardy!, but how does it fare in the kitchen? The Chef Watson project added a bit of AI sauce to the creation of novel fusion recipes.
At its core, the system was fed thousands of recipe texts, subtly learning which combinations of ingredients go together using natural language processing, coupled with a scientific knowledge base of what makes different foods taste like they do.
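The pairing idea can be caricatured with a toy sketch: score ingredient combinations by the flavour compounds they share. The compound sets below are invented for illustration, not real food chemistry, and this overlap heuristic is a guess at the general flavour-pairing approach rather than IBM’s actual method:

```python
# Toy flavour-pairing sketch: ingredients sharing more flavour
# compounds are scored as better matches. The compound sets are
# illustrative stand-ins, not real food chemistry.
compounds = {
    "chocolate": {"pyrazine", "vanillin", "ester"},
    "apricot":   {"ester", "lactone", "vanillin"},
    "edamame":   {"pyrazine", "aldehyde"},
    "lettuce":   {"aldehyde"},
}

def pairing_score(a, b):
    """Number of flavour compounds two ingredients share."""
    return len(compounds[a] & compounds[b])

# Pick the pair of distinct ingredients with the most shared compounds.
best = max(
    ((a, b) for a in compounds for b in compounds if a < b),
    key=lambda pair: pairing_score(*pair),
)
print(best, pairing_score(*best))  # ('apricot', 'chocolate') 2
```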
Some of the results of the system might leave a sour taste, and Watson seems particularly fond of combining sweet and savoury ingredients, but there’s definitely some food for thought in the results.
Austrian chocolate burritos with edamame beans and apricots anyone?
In a recent TED talk, Dr. Blaise Agüera y Arcas demonstrates the power of creative computing in the visual arts. Formerly the Distinguished Scientist behind creative technologies at Microsoft Research and now at Google’s Machine Intelligence division, he and his team have developed sophisticated systems which learn the characteristics of an image using deep learning and then allow users to apply that image as a filter to their own works. Far from simple Instagram-style filters which mimic sepia-toned images or grainy Polaroid snaps, these filters allow the user to skew images in the style of a Modernist painter such as Piet Mondrian, or apply the characteristics of a Van Gogh work to their holiday photos. The results are jaw-dropping.
Furthermore, this technology is now available to the consumer in the Prisma app, bringing the power of years of machine learning research and artistic expression to the palm of your hand.
Poetry in motion?
Towards the end of the talk, Agüera y Arcas demonstrates another project from his lab: a man wearing a printer connected to a neural network, which generates and prints poetry based on what it sees through a camera. This spin on the photo-caption-generating AI system Google recently developed allows the machine to mix sensory inputs in the creative process.
Given that computers have been able to process text since the early 1960s, it is perhaps unsurprising that some of the early creativity experiments focused on the written word. A system called RACTER, developed by William Chamberlain and Thomas Etter in 1984, generated prose and poetic statements using a set of hand-crafted rules.
The surrealist novel The Policeman’s Beard is Half Constructed is claimed to be the first book written entirely by a computer system, although the examples in the book were edited down from a larger set of RACTER’s outputs. The book contains limericks, Shakespeare-like monologues and philosophical statements such as:
“More than iron, more than gold, more than lead, I need electricity. I need it more than I need lamb or pork or lettuce or cucumber. I need it for my dreams”
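A rule-based generator in this spirit can be sketched with sentence templates and word lists. RACTER’s actual grammar engine was far more elaborate; treat the templates and nouns below as toy stand-ins:

```python
# Toy RACTER-style generation: pick a sentence template and fill its
# slots from a word list. A seeded RNG makes runs repeatable.
import random

TEMPLATES = [
    "More than {a}, more than {b}, I need {c}.",
    "{a} and {b} dream quietly of {c}.",
    "I need {a} more than I need {b} or {c}.",
]
NOUNS = ["iron", "gold", "lead", "electricity", "lamb", "lettuce", "a cucumber"]

def generate(seed=None):
    rng = random.Random(seed)
    template = rng.choice(TEMPLATES)
    a, b, c = rng.sample(NOUNS, 3)  # three distinct nouns
    return template.format(a=a, b=b, c=c)

print(generate(seed=42))
```

The charm, and the limitation, is that every possible output is implicit in the rules the programmers wrote.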
The Postmodern Essay Generator (try it out at http://www.elsewhere.org/journal/pomo/) thumbs a computational nose at impenetrable academic prose in the humanities. The system randomly generates sample essays in a post-modernist style. Whether this project, conceived in 1996, was a statement about the absurd nature of humanities prose or something else entirely will never be truly confirmed.
Foucaultist power relations and cultural feminism
“Sexual identity is part of the failure of truth,” says Sontag. Baudrillard uses the term ‘cultural feminism’ to denote the role of the observer as poet. But Foucault suggests the use of Batailleist `powerful communication’ to attack the status quo.”
If both author and critic have been computerised, what is left for humans to do?
A perhaps unlikely proponent of computational linguistic creativity in a compositional context was the late David Bowie. In 1997 he developed an application, along with a number of friends, to help him in the lyric-writing process. In a Vice article from earlier this year, journalist Matthew Braga writes about how the song Hallo Spaceboy from the 1995 album Outside was written mostly using this technique, which involved taking a piece of text, randomly chopping up its sentences and rearranging them; a digital evolution of the famous cut-up technique favoured by the Beat writer William Burroughs, among others.
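The details of Bowie’s application were never published, but the underlying cut-up idea is easy to sketch: split a text into fragments, shuffle them, and rejoin. The snippet below is a generic digital cut-up, not a reconstruction of Bowie’s tool, and the source text is an invented example:

```python
# A minimal digital cut-up: split on commas, shuffle the fragments,
# rejoin. A seeded RNG makes the rearrangement repeatable.
import random

def cut_up(text, seed=None):
    rng = random.Random(seed)
    fragments = [f.strip() for f in text.split(",") if f.strip()]
    rng.shuffle(fragments)
    return ", ".join(fragments)

source = "the machine dreams in fragments, the poet sleeps, the city hums beneath the rain"
print(cut_up(source, seed=7))
```

The output is always built from the writer’s own raw material; the machine only supplies the serendipitous collisions.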
“Computational poetry is one of the oldest and most well-established areas of interest; however, advances in technology have resulted in a wider range of outputs.”
Dr Simon Colton and his team at Goldsmiths, University of London, created computational poetry based on articles from the Guardian. A complex array of text-processing systems is employed in this endeavour, including sentiment analysis, which can measure the tone (happy/sad) of all of the news on a particular day, along with a linguistic ontology, which models the structure of human language: synonyms and the relations between words. Computational poetry is one of the oldest and most well-established areas of interest; however, advances in technology have resulted in a wider range of outputs.
The following poem was generated by the system from one day’s worth of articles on the Guardian website. A unique feature of their work is that the system generates a natural-language description of the features used to create the poem, offering an insight into its very own creative process.
“It was generally a good news day. I read a story in the Guardian culture section entitled: “South Africa’s ANC celebrates centenary with moment in the sun”. It talked of south africans, interfaith prayers and monochrome photos. Apparently, “The heroic struggle against a racist regime was remembered: those thousands who sacrificed their lives in a quest for human rights and democracy that took more than eight decades” and “At midnight he watched with amusement as Zuma lit the centenary flame, at the second attempt, with some help from a man in blue overalls marked ʻExplosivesʼ”. I wanted to write something highly relevant to the original article. I wrote this poem.”
The resulting poem from Colton’s team, Blue overalls, goes like this:
the repetitive attention of some traditional african chants
a heroic struggle, like the personality of a soldier
an unbearable symbolic timing, like a scream
blue overalls, each like a blueberry
some presidential many selfless leaders
oh! such influential presidents such great presidents blueberry-blue overalls
a knight-heroic struggle
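The sentiment-analysis component that lets the system judge “a good news day” can be caricatured as a lexicon lookup: count positive words against negative ones. The word lists below are illustrative stand-ins, not the lexicon Colton’s team actually used:

```python
# Bare-bones lexicon-based sentiment: positive words minus negative
# words. Real systems use much larger, weighted lexicons.
POSITIVE = {"celebrates", "heroic", "good", "sun", "democracy"}
NEGATIVE = {"racist", "sacrificed", "crisis", "war", "grim"}

def tone(text):
    """Positive score suggests a happy news day, negative a sad one."""
    words = [w.strip(".,:;\"'").lower() for w in text.split()]
    return sum((w in POSITIVE) - (w in NEGATIVE) for w in words)

headline = "South Africa's ANC celebrates centenary with moment in the sun"
print(tone(headline))  # 2: "celebrates" and "sun" both score positive
```

Summed over every headline in a day’s edition, a score like this gives the crude happy/sad signal the poem generator narrates.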
It would of course be difficult to write on the topic of computational linguistic creativity without mentioning that I have my own horse in the race. My own pet computational creativity project falls into a similar domain to Colton et al.’s: given a news article, can we generate a “kicker” title which is creative while also being informative? A kicker, a short descriptive tagline which accompanies the main headline, is used in particular by magazines such as the Economist and online news sources such as Quartz.
Some recent examples from the Economist include Keeping it Riyal (Gulf currencies), Til Debt Do Us Part (financing divorces) and Things Are Looking App (health care and mobile health apps). The approach I took was to extract topical information from an article and use it to search through a massive database of songs, books, TV shows and movies.
By doing this, we draw upon the creativity of hundreds of years of human art, and hopefully ensure that any output is rendered in grammatically correct, creative phrasing.
Some successful outputs of the system included Obama of the People, based on a Quartz article about influencers on Twitter which included the President himself, and World of a Woman, Half Man Half Woman and No Man’s Woman for an Atlantic Magazine feature on how the Norwegians manage to get equal numbers of men and women on their corporate boards.
In the Obama example above, the system identified the President’s name as a frequent keyword in the article, along with people, Twitter, and other words which don’t occur so frequently in song or book titles. The title which matched the most keywords was the well-known Karl Marx quote about religion being opium for the people (or the masses, depending on which translation you go with). The final link was the system deciding that Obama sounded enough like opium (controversial) to replace one with the other, and present this as the most creative, and still grammatically correct, candidate.
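Stripped of the phonetic wordplay, the keyword-matching core of the approach can be sketched as follows. The article text and the tiny title database are toy stand-ins, and the real system’s keyword extraction and scoring were more sophisticated than this overlap count:

```python
# Toy kicker matcher: extract frequent non-stopword keywords from an
# article, then rank candidate titles by keyword overlap.
from collections import Counter

STOPWORDS = {"the", "of", "a", "on", "and", "in", "is", "for", "to"}

def keywords(article, n=5):
    """Top-n frequent words in the article, minus stopwords."""
    words = [w.strip(".,").lower() for w in article.split()]
    counts = Counter(w for w in words if w not in STOPWORDS)
    return {w for w, _ in counts.most_common(n)}

def best_title(article, titles):
    """Title sharing the most keywords with the article."""
    keys = keywords(article)
    return max(titles, key=lambda t: len(keys & set(t.lower().split())))

titles = ["Opium for the People", "A Hard Day's Night", "War and Peace"]
article = ("Obama leads the people on Twitter, and the people follow. "
           "Influencers court the people daily.")
print(best_title(article, titles))  # Opium for the People
```

The Obama-for-opium substitution would then be a second pass, swapping a title word for a phonetically similar keyword; that step is omitted here.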
However, even journalism’s days could be numbered. US-based Narrative Science is using text generation techniques to create artificial reports on subjects which might not normally benefit from journalistic attention, such as Little League baseball in the US, or the thrilling narratives of stock market reporting. Examples of these vignettes can be read on their Forbes page. Their systems were also employed recently during the Rio Olympics on behalf of the Washington Post. Is the writing on the wall for sports journalists, or does this open up new opportunities for longer, more introspective analytical articles in this domain?
The road ahead
The field of computational creativity touches on all areas of human creativity and new systems are emerging rapidly.
One highly important question to ask ourselves regarding computational creativity is this: by allowing machines to synthesise new creations based on existing works, are we simply creating a new form of filter bubble?
As the machines strive to learn our individual preferences and interests, synthesising and delivering content which fits our world view, they subsequently isolate us from challenging issues. Instead of the World Wild Web, we become segregated by our specific interests, safe behind our high digital picket fences, protected from the grim realities which may face us outside our own familiar territory.
“If the machines can only learn from what has come before, how can they instigate a paradigm shift in creative form?”
When folk singer-songwriter (and recent Nobel laureate) Bob Dylan picked up an electric guitar at Newport in 1965, he ushered in a new wave of “plugged-in” folk music which incorporated bold new sounds, challenging the status quo. Before and after the Beatles made their own passage to India, pop artists have travelled far and wide to incorporate the sounds of world music into their work, inspiring albums such as Paul Simon’s Graceland.
Visual artistic movements at the turn of the 20th century abandoned the grand plans of capturing life in a photo-realistic fashion, instead breaking things down into the abstract (Cubism) or basic (Primitivism).
Modern fusion cuisine is born of migration and cultural exchange: take bánh mì, a marriage of French bread and cold cuts with a Vietnamese sense of flavour, or the temakeria, which gave São Paulo’s Japanese immigrants an opportunity to showcase their sushi skills with a Brazilian twist.
These drastic changes and inventions are what is known as a paradigm shift: the rulebook rewritten and the lines redrawn.
If the machines can only learn from what has come before, how can they instigate a paradigm shift in creative form? Is there room for the beautiful mistakes that make for lasting creative impressions?
Or will computational systems be forever known as one-trick ponies, churning out lame pastiches of popular art until we reach for the off button?
Dr Gerard Lynch has a Ph.D in computational linguistics from Trinity College Dublin and enjoys engineering complex computer systems to carry out seemingly frivolous tasks. He is currently thinking at a health tech startup in London.