By turning ‘numbers into words,’ statistics and data science companies are inventing creative algorithms that can program a bot to write seemingly original stories on sports, finance and food. Are our collective careers as story tellers in danger from a helix of code?
Let’s begin at the beginning. In the beginning, the Neanderthal man discovered that he could mix animal fat, earth minerals, pigments from soil, rock, plants and bind it all together with egg to have the first form of paint. He took pains to fill the walls of his cave home with crude, fantasy drawings of the animals running wild in the hot Savannah. It was a ritual way of portraying the hunt, to depict his mastery over the natural scenes over which he had no control, and as a means to glorify the hunts of yesterday. That’s how man began narrating stories. It was considered a uniquely human ability: to combine memory, history, plot, narrative and the proverbial magic-realism to string together stories of our times gone by and times yet to come. Man had learnt to record his dreams, his anxieties and his hopes for the future. Fortuitous, I would say, because the evolution of writing, recording and modern storage devices can directly be traced to that first stroke of organic pigment to the basalt walls.
Cut to 2013 and today we have machines whose primary job title could very well read as ‘story teller’, ‘reporter’ or ‘writer’. I have come across streams of news highlighting the advent of bots that are capable of stringing together snippets on topics as diverse and complex as finance and futures trading, medical and pharma writing, sports and games, and restaurant reviews. It is much cheaper for a content syndicate to outsource writing jobs to bots, which can cobble together anywhere from 100 to 30,000 articles per day depending on the company coding their logic. The companies depend on some complex juggling of programming, data mining, data analysis, and natural language processing to derive very specific results, either in the form of search, social media friend finding, fact checking on data entry tables, or compiled reports. We are all quite familiar with the end product of this mechanism: the algorithm.
What are algorithms and why have they come to rule our digital world? This is HowStuffWorks’ explanation: To make a computer do anything, you have to write a computer program. To write a computer program, you have to tell the computer, step by step, exactly what you want it to do. The computer then “executes” the program, following each step mechanically, to accomplish the end goal. When you are telling the computer what to do, you also get to choose how it’s going to do it. That’s where computer algorithms come in. The algorithm is the basic technique used to get the job done.
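To make the "you also get to choose how" part concrete, here is a minimal Python sketch of one job (finding an entry in a list) done by two different algorithms. The function names and the sample list are my own illustrations, not anything taken from HowStuffWorks.

```python
from bisect import bisect_left


def linear_search(items, target):
    """Walk the list from the start and stop at the first match."""
    for index, item in enumerate(items):
        if item == target:
            return index
    return -1


def binary_search(sorted_items, target):
    """Use the sort order to narrow the search; bisect_left does the halving."""
    index = bisect_left(sorted_items, target)
    if index < len(sorted_items) and sorted_items[index] == target:
        return index
    return -1


# Illustrative data only; the list must be sorted for binary_search.
courses = ["astrophysics", "biology", "chemistry", "geology", "physics"]
print(linear_search(courses, "geology"), binary_search(courses, "geology"))
```

Both functions give the same answer; they differ only in how many steps they take, which is the kind of choice the quoted passage is describing.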
Now, you can begin to understand how densely complex Google’s search algorithms are. When you search for “post-graduate courses+astrophysics”, not only does the search engine have to throw up the list of universities offering the courses, it also sorts through the list in terms of chronology and relevance. The algorithm also takes into account where you reside and your previous search history, giving you what it ‘thinks’ is the most relevant result, in well under 0.22 seconds. We have also been following the debates on the restrictive world that algorithms promote, inducing us to walk a tight-rope path that begins with the Top 10 search results and ends with the curated list of people, news, events, and web media that algorithms spotlight for us. The most vocal assailants of the culture of algorithm curation are Evgeny Morozov (Algorithms and big data), Eli Pariser (The Filter Bubble – What the Internet is Hiding from You), Clay Shirky (Unintelligible Scale of Google Algorithm), and Sherry Turkle (The Second Self).
“Children know that the telephone is a mechanism and that they control it. But it’s not enough to have that kind of understanding about the computer. You have to know how a simulation works. You have to know what an algorithm is.” (via Bill Kerr’s blog)
However, businesses are in it for the hefty bottom line and turnover that bots guarantee. There is no three-way dialogue between the public, the programmer and the for-profit venture, all of whom are stakeholders of the Web, but who have different expectations, needs and relationships within this ecosphere. While newer jobs are being invented for programmers, coders, data miners and analytics executives, is it likely that another set of profiles will soon become redundant? I was particularly keen on understanding where we stand in relation to the writer bots because story-telling is such an “innately” bio-cultural and even spiritual ability. You can’t get a machine to believe in god, can you? You can’t get a machine to feel empathy for a hungry kid, right? Similarly, can you get a machine to tell a story that has the right amount of humor, tragedy, romance, thriller element, and closure device at the end?
Well, it turns out that the answer to this is a yes and a no. Plots in fictional stories are largely devices that can be machine coded. When I was a student of Cultural Studies, I learnt that folk tales and fairy tales, though varying in deeds and events, all follow a common theme or structure of narration: a family in the village, a girl / princess in peril, a prince / hero / woodcutter on a quest, the horror element of a witch / curse / evil stepmother / wolf, rescue mission – love blossoms – another mini-adventure – resolution of conflict – lovers unite – happy ending. If you look at all the famous stories that we grew up on, including Little Red Riding Hood, Snow White, Sleeping Beauty, Beauty and the Beast, Cinderella, Peter Pan, and Thumbelina, all follow a meta-structure that can be mixed and matched by taking out or adding different characters. This seemingly ‘computational’ mechanism followed by tales alludes to the possibility that perhaps stories are indeed codes, symbols for a narrative that is not really beyond a machine’s logic. (Read about the ancient origins of fairy tales).
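To see how little machinery that mix-and-match idea needs, here is a rough Python sketch. The slot names, the word lists and the single template are my own hypothetical choices drawn from the structure described above; nothing here is how any real story bot is known to work.

```python
import random

# Hypothetical character slots, taken from the fairy-tale structure above.
SLOTS = {
    "heroine": ["a girl", "a princess"],
    "hero": ["a prince", "a woodcutter"],
    "villain": ["a witch", "a wolf", "an evil stepmother"],
}

# One fixed plot skeleton: peril, quest, rescue, mini-adventure, happy ending.
TEMPLATE = (
    "Once upon a time, {heroine} from a small village was put in peril by {villain}. "
    "{hero_cap} set out on a quest, rescued her, and love blossomed. "
    "After one more adventure the conflict was resolved and they lived happily ever after."
)


def tell_tale(rng):
    """Fill every slot with a randomly chosen character and render the plot."""
    choices = {slot: rng.choice(options) for slot, options in SLOTS.items()}
    # Capitalised copy of the hero for use at the start of a sentence.
    choices["hero_cap"] = choices["hero"].capitalize()
    return TEMPLATE.format(**choices)


print(tell_tale(random.Random()))
```

Swapping the lists or adding a second template changes the cast and the events, but the skeleton stays the same, which is exactly the point about meta-structure.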
Of course, data analysis companies don’t have to contend with something as ancient as folk tales, but with fact-based data rooted in 21st century English, something which technology start-up Narrative Science took advantage of. Billing itself as a “technology company that solve problems and generate(s) revenue by leveraging highly structured data, turning it into actionable stories and insights,” the Chicago-based firm has understood the truly sell-able part of data: it’s not the numbers or the pie charts, but the stories that “all you have to do is read.” Steven Levy, who interviewed the venture for Wired’s piece on bots that can write, explains in his article: “Narrative Science’s engineers program a set of rules that govern each subject, be it corporate earnings or a sporting event. But how to turn that analysis into prose? The company has hired a team of “meta-writers,” trained journalists who have built a set of templates. They work with the engineers to coach the computers to identify various “angles” from the data…Then comes the structure. Most news stories, particularly about subjects like sports or finance, hew to a pretty predictable formula, and so it’s a relatively simple matter for the meta-writers to create a framework for the articles. To construct sentences, the algorithms use vocabulary compiled by the meta-writers.”
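The pipeline Levy describes (rules pick an “angle” from the data, then a template written by a human supplies the sentence) can be sketched in a few lines of Python. This is only my guess at the general shape: the thresholds, field names and templates below are invented for illustration and are not Narrative Science’s actual rules or vocabulary.

```python
# Hypothetical sentence templates, one per story angle.
TEMPLATES = {
    "draw": "{home} and {away} finished level at {home_score}-{away_score}.",
    "close": "{winner} edged {loser} {high}-{low} in a tight contest.",
    "rout": "{winner} cruised past {loser} {high}-{low}.",
}


def pick_angle(game):
    """Choose a story angle from the score margin (thresholds are assumptions)."""
    margin = abs(game["home_score"] - game["away_score"])
    if margin == 0:
        return "draw"
    if margin <= 3:
        return "close"
    return "rout"


def write_recap(game):
    """Derive the facts a template needs, then fill the template for the angle."""
    home_won = game["home_score"] > game["away_score"]
    facts = dict(game)
    facts["winner"] = game["home"] if home_won else game["away"]
    facts["loser"] = game["away"] if home_won else game["home"]
    facts["high"] = max(game["home_score"], game["away_score"])
    facts["low"] = min(game["home_score"], game["away_score"])
    return TEMPLATES[pick_angle(game)].format(**facts)


# Made-up box score, purely for demonstration.
game = {"home": "Hawks", "away": "Owls", "home_score": 21, "away_score": 20}
print(write_recap(game))  # Hawks edged Owls 21-20 in a tight contest.
```

Feed it a different row of scores and a different sentence comes out, which is roughly why one set of templates can yield thousands of articles a day.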
I am sweating. Or, at least, I feel sympathy sweat break out on my forehead for all my friends who are journalists and into professional reporting on sports and finance. Do they enroll for computer science classes at Coursera and Lynda.com and prepare for a fall-back, foolproof career option as coders? The CTO and co-founder of Narrative Science, Kristian Hammond, has gone a step further in predicting that a computer would win a Pulitzer Prize in 5 years. I didn’t know that the jury at the Pulitzer were handing out awards for the most logically compiled news snippet. Jokes aside, Hammond rushes to assure us that our jobs are not at stake, yet. As Levy reports: “This robonews tsunami, he insists, will not wash away the remaining human reporters who still collect paychecks. Instead the universe of news writing will expand dramatically, as computers mine vast troves of data to produce ultracheap, totally readable accounts of events, trends, and developments that no journalist is currently covering.” Definitely not Pulitzer-worthy.
Adds Levy: “Maybe at some point, humans and algorithms will collaborate, with each partner playing to its strength. Computers, with their flawless memories and ability to access data, might act as legmen to human writers. Or vice-versa, human reporters might interview subjects and pick up stray details—and then send them to a computer that writes it all up.” That day is already here. This BBC News Online article describes how Wikipedia is maintained and managed by not just a team of editors, but also thousands of machine volunteers: “Bots have been around almost as long as Wikipedia itself. The site was founded in 2001, and the next year, one called rambot created about 30,000 articles – at a rate of thousands per day – on individual towns in the US. The bot pulled data directly out of US Census tables. The articles read as if they had been written by a robot. They were short and formulaic and contained little more than strings of demographic statistics. But once they had been created, human editors took over and filled out the entries with historical details, local governance information, and tourist attractions.”
Morozov neatly cuts to the chase: “To understand the limits and opportunities of algorithms in the context of artistic creation, we need to understand that the latter usually consists of three elements: discovery, production, and recommendation.” The process of putting out a new record and promoting an artist begins with spotting talent, which is more often than not serendipitous, coincidental and random. Why certain bands become a hit with music listeners and others bite the dust is not something we have been able to logically analyze. There is no guarantee that a particular kind of sound that was a hit with the previous band will receive the same reception with the new kids on the block. Audience taste changes and right now we haven’t invented prescient bots or written code that can predict the next big wave, be it in music, art or films. This hasn’t stopped a Beijing-based punk rock band from trying to preempt audience reaction, in a circuitous fashion.
Explains Morozov: “Last December, the Global Times, China’s English-language tabloid, ran a story on the local punk band Bear Warrior, which found an ingenious way to measure the audience response to their songs. Its lead singer is a graduate student majoring in precision instruments at a university in Beijing, so he designed a device—”POGO Thermometer”—that measures the intensity of the audience’s dancing through a series of sensors embedded in the floor carpet in the music hall. The signals are then transmitted to a central computer where they are closely analyzed in order to improve future performances. According to the Global Times, the band found that fans “started moving their bodies when the drums kicked in, and they danced the most energetically when he sang higher notes.” As its lead singer put it, “the data helps us understand how we can improve our performance to make the audience respond to our music like we intend.””
Definitely a sell out.
Which brings me to my intellect’s knight in shining armor, Mr Jorge Luis Borges, and the Infinite Monkey Theorem (whose germination can be convolutedly traced all the way to Aristotle, Cicero, Blaise Pascal, Jonathan Swift, Aldous Huxley and Jorge Luis Borges). In simple words, the theorem advances that a monkey hitting keys at random on a typewriter keyboard for an infinite amount of time will almost surely type a given text, such as the complete works of William Shakespeare. In this context, “almost surely” is a mathematical term with a precise meaning, and the “monkey” is not an actual monkey, but a metaphor for an abstract device that produces an endless random sequence of letters and symbols. The relevance of the theory is questionable—the probability of a monkey exactly typing a complete work such as Shakespeare’s Hamlet is so tiny that the chance of it occurring during a period of time even a hundred thousand orders of magnitude longer than the age of the universe is extremely low (but not zero) – Wikipedia. The “not zero” is the punch line. In effect, on a typewriter with 50 keys, each equally likely to be hit, the probability of the six-letter word ‘banana’ being typed in the first six keystrokes is computed as:
(1/50) × (1/50) × (1/50) × (1/50) × (1/50) × (1/50) = (1/50)⁶ = 1/15,625,000,000
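The same arithmetic can be checked with a short Python sketch. It assumes, as the formula does, a keyboard of 50 keys that are all equally likely to be struck; the function name is my own.

```python
from fractions import Fraction


def chance_of_typing(word, keys=50):
    """Probability that random keystrokes spell `word` on the first attempt.

    Assumes `keys` equally likely keys and independent keystrokes.
    """
    return Fraction(1, keys) ** len(word)


p = chance_of_typing("banana")
print(p)  # 1/15625000000
```

Every extra letter multiplies the denominator by another 50, which is why a full play is so much further out of reach than a six-letter word.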
Borges’s 1939 essay The Total Library obliquely takes a cue from this theorem and gives us a universe that is a library, an infinite library filled with books composed of 25 symbols (22 letters, space, period, comma) whose variations with repetition encompass all that is possible to express: in all languages. “Everything: the detailed history of the future, The Egyptians of Aeschylus, the precise number of times that the waters of Ganges have reflected the flight of a falcon, the secret and true name of Rome, the encyclopedia Novalis would have built, my dreams at the dawn of August 14, 1934, the proof of Pierre Fermat’s theorem, the unwritten chapters of Edwin Drood, those same chapters translated into the language spoken by the Garamantes, the paradoxes Berkeley invented about time and not published, the Gnostic gospel of Basilides, the faithful catalog of the Library, the demonstration of the fallacy of this catalog. Everything: but for every sensible line or accurate fact there would be millions of meaningless cacophonies, verbal farragoes, and babblings.”
Borges’s understanding of how permutation and combination work and our own modern-day preoccupation with data mining and big data analytics show that mathematics can indeed help us imagine a plausible course where monkeys and machines can both replace humans on the keyboard. Numbers can stand in for words at a level where the idea and information are conveyed. Ten years down the line, the bots and their creators would have us believe that it’s best we subscribe to news that is crisp, relevant and factual. We are already being fed a diet of news bites that are visibly less nuanced and lack an empathetic approach or an artistic vision. Perhaps, in 100 years, it won’t matter that our story tellers have gone underground, compiling compendiums of times past, of urban mythologies that begin with the line, ‘Once upon a time, there lived a great story teller called Bot’. History (and psychology) tells us that we don’t miss what is out of sight. Data Analytics fine-tunes that sentiment and says: what is outside the filter bubble simply doesn’t exist.