True intelligence is subjective. As machine learning progresses, we are witnessing AI programs capable of calculation, data recognition and broader analysis far beyond what humans are capable of. One aspect of intelligence that AI has been lacking, however, is true imagination. But is this for the best?
Algorithmically generated imagination has long been a theoretical possibility, but it has taken a new approach to neural network interplay for it to be realised. Generative Adversarial Networks (GANs) are a class of algorithms that have recently garnered a great deal of attention, not only from the machine learning R&D community, but also from the mainstream media.
GANs utilise the power of two neural networks that are pitted in direct competition with one another. The goal is to create entirely new example data, particularly images.
How GANs Work
The networks are trained on a set of examples, e.g. pictures of dogs. The generative network constructs completely new outputs, which are sent to the discriminative network to be analysed and scrutinised for authenticity. If the discriminator receives a dog image from the generator and finds it to be inauthentic compared with the original training data set, that verdict is fed back to the generator, which adjusts itself and produces a new variation of the output. In the end, the generative network learns from its mistakes to the point where it can fool the discriminative network.
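For readers who want to see the loop in code, here is a minimal sketch in plain Python. It is an illustration under heavy assumptions, not how any production GAN is built: the "images" are replaced by single numbers drawn around 4.0, the generator is a hypothetical linear function of random noise, the discriminator is a hypothetical logistic classifier, and the names `sigmoid` and `train` are invented for this example. Note that in this sketch the discriminator's "rejection" reaches the generator as a gradient signal rather than as a returned sample.

```python
import math
import random


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def train(steps=200, batch=64, lr=0.05, seed=0):
    """Toy 1-D adversarial training loop (all names are illustrative)."""
    rng = random.Random(seed)
    a, b = 1.0, 0.0  # generator: g(z) = a * z + b
    w, c = 0.0, 0.0  # discriminator: D(x) = sigmoid(w * x + c)
    for _ in range(steps):
        # "Authentic" data (assumed to be centred on 4.0) and generator output.
        real = [rng.gauss(4.0, 0.5) for _ in range(batch)]
        noise = [rng.gauss(0.0, 1.0) for _ in range(batch)]
        fake = [a * z + b for z in noise]

        # Discriminator step: gradient ascent on log D(real) + log(1 - D(fake)).
        d_real = [sigmoid(w * x + c) for x in real]
        d_fake = [sigmoid(w * x + c) for x in fake]
        grad_w = (sum((1 - d) * x for d, x in zip(d_real, real))
                  - sum(d * x for d, x in zip(d_fake, fake))) / batch
        grad_c = (sum(1 - d for d in d_real) - sum(d_fake)) / batch
        w += lr * grad_w
        c += lr * grad_c

        # Generator step: gradient ascent on log D(g(z)), i.e. try to be
        # scored as authentic by the freshly updated discriminator.
        d_fake = [sigmoid(w * x + c) for x in fake]
        grad_a = sum((1 - d) * w * z for d, z in zip(d_fake, noise)) / batch
        grad_b = sum((1 - d) * w for d in d_fake) / batch
        a += lr * grad_a
        b += lr * grad_b
    return {"a": a, "b": b, "w": w, "c": c}
```

Calling `train(steps=1)` shows the basic mechanism: the discriminator learns that larger values look more authentic, and the generator's offset `b` is nudged upward in response. Over longer runs, a setup this simple may oscillate around the real data rather than settle, which hints at why adversarial training is often described as delicate.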
In essence, this is artificial intelligence training itself. Such is the groundbreaking nature of GANs that there is some debate as to which subgenre of machine learning they belong to. Because the ‘authentic’ images used to train the generative model are unlabelled, these adversarial neural networks are classified as unsupervised learning. Equally, the need for a form of initial training data to establish the parameters of the image has led some researchers and developers to dub it ‘semi-unsupervised’ learning.
Whatever their classification, GANs are currently responsible for machine learning’s main headlines, with numerous research papers, news stories and op-ed pieces attracting mainstream attention. In a 2016 online Q&A session, Yann LeCun, Chief Artificial Intelligence Scientist at Facebook AI Research, went so far as to argue that GANs are “the most interesting idea in the last 10 years in ML.”
Deepfakes: Society Killer?
Various media outlets have been quick to showcase the capabilities of GANs - in particular, the creation of so-called deepfakes. These are videos in which GANs have been used to insert celebrities’ faces and voices into pre-existing clips, so that the celebrity appears to do or say whatever the person in the original video did or said. In one instance, Steve Buscemi’s face and voice were superimposed onto Jennifer Lawrence during her 2016 Golden Globes acceptance speech, while more notorious examples have involved famous actors inserted into porn clips.
To get a greater understanding of the potential scope of GAN applications, and whether or not they will be well received, BDJ spoke with two leading experts in the world of machine learning and artificial intelligence. Reza Zadeh is founder and CEO of Matroid, and an adjunct professor at Stanford, previously serving on the Technical Advisory Boards of Microsoft and Databricks. Rachel Thomas is the co-founder of fast.ai, and a professor at the University of San Francisco.
The Reality of GANs
Because of the relative infancy of the technology, the degree of autonomy that GANs are capable of is often misunderstood. Reza offers a tempering clarification concerning the current potential for GANs to craft the desired type of unique images, with no initial developer guidance.
“We've never been able to automatically generate realistic images until GANs came along. From a subjective PoV, I think that's quite cool. However, with GANs, we have trouble prescribing what is generated, and need to give many examples of it, which seems counterproductive because: why are we generating pictures of something we already have? The reality is we can use the quirks of the generation to understand processes that lead to realistic images, and hope that we can eventually create realistic images with very little prescribed.”
As Reza points out, GANs in their current state lack what can be described in layman's terms as true inspiration. They can create, but require a large degree of initial hand-holding from developers - similar to other forms of machine learning.
Rachel has been working with GANs at fast.ai for the past couple of years, and has seen firsthand their potential vs the very public speculation regarding their ongoing capabilities.
“I think that GANs have been overhyped, but that generative models (not necessarily adversarial) are very powerful,” Rachel says. “While we have been teaching GANs since early 2017 in our fast.ai course (which has been taken by over 200,000 students), we have more recently focused on other generative models that provide equally good results, only much faster.
“For instance, in the most recent version of the course, we showed how to make blurry photos (such as the one on the left, which was input) more sharp (the algorithm output the one on the right). We did this first using GANs, and then without (which was much faster). The picture below (on the right) did not use a GAN:
“A downside to GANs is that they are brittle and slow to train. A fast.ai student, Jason Antic, has done some great work adding color to old photos in his project DeOldify:
“Jason uses a generative model that is not a GAN for this.”
One doesn’t have to search very hard to find news articles that showcase the results of GAN image generation, often accompanied by user comments ranging from the highly impressed to the deeply pessimistic regarding the technology’s potential for abuse. Granted, when looking at the generated images of fictitious people, their photorealism is obvious. However, just how big a threat do these capabilities represent, especially given the recent phenomenon of fake news, including allegations that state actors have generated fake profiles on social media sites to spread misinformation?
“The danger isn't that big and is very overblown,” Reza says. “We have to be more careful to distrust images, that's all. It's not a big deal. It used to be photos could be vaguely trusted as evidence of something, but not anymore. As long as we educate the public on that, there's not much danger, or any at all really.”
Technology capable of producing convincing doctored imagery is nothing new. Photoshop and other Adobe Creative Suite applications have brought industry-quality media creation into the consumer sphere, with the limits of their use often being the consumers’ own imaginations. Convincing faked photographs and edited text or Twitter conversations have become common sights across the digital sphere.
“There are dangers from generative models (not specific to GANs, which are just one class of generative models) around disinformation,” Rachel says. “Keep in mind that disinformation is already a huge issue, even when using primitive tools like simple memes and Photoshop. Russia in particular has effectively used disinformation to meddle in the 2016 election, to sow divisiveness, and even fuel measles outbreaks by spreading false info about vaccinations.
“Again, this threat is not specific to GANs, or even to images. Consider the concerns about how OpenAI's new language generation model could be used to create computer-generated text at scale. As a society, we are doing a poor job of addressing misinformation (and how easily the major tech company platforms can be manipulated), and generative models can increase this danger.”
GANs may represent a new level of sophistication for these practices, but the underlying concept of faked imagery and video being widely propagated remains the same. All that is required is ongoing awareness of the matter. “As long as we tell journalists and other folks to distrust images from now on, I think it'll be OK,” says Reza. “This is why I put time into speaking with journalists about it.”
The Positive Potential of GANs
The vast majority of media attention given to GANs focuses on their ability to produce faked representations of celebrities, or the generation of fictitious people for nefarious purposes. Of course, such technological capabilities are completely at the mercy of whoever is utilising them. To discount GANs’ capacity for image manipulation on a more general scale, though, is to be short-sighted about their potential.
“Generative models (not just GANs) hold a lot of creative potential to fix and enhance images (Adobe is investing heavily in deep learning), as well as to create new artwork,” Rachel argues. “In the area of languages, generative models can be used to create text, summarise paragraphs, or answer questions. There is also potential for generative models to be used to augment data (a technique that helps models to train more accurately on smaller data sets).”
The potential of GANs also goes far beyond the realm of image manipulation: the broad nature of the underlying technology leaves the door open to a wide range of applications.
“The idea of two competing neural networks helps us achieve more performant neural networks across many applications,” says Reza. “VR, AR, and Video game overlays can become much more realistic, and there's also the idea of two competing neural networks which can be applied to other domains. That lets us generate audio and other content we've had trouble generating. So it's quite useful.
“For me, the idea of competing neural networks is the most broad applicability, but a sense of excitement there is hard to convey to non machine learning practitioners.”
Developing neural networks to function in symbiosis with one another is an important step towards developing a new category of artificial intelligence - one that displays the ability to modify both its approaches and its responses to a set task. The concepts of teaching and learning require a level of independence from one another, and therefore a level of reasoning. While the direct methodology of GANs at present is still trial and error, there is scope for the adversarial aspect of the technology to increase in complexity.
The Next Step on from GANs
GANs have very much captured the imagination of a large portion of the machine learning community - with their current applications able to generate tangible, visible results that can be shared by those outside of the machine learning field, it is easy to see why. We asked Rachel and Reza what they saw as the most exciting area of machine learning research beyond GANs.
“I think the next breakthrough will be ML models that unify interaction with the physical world with vision and language systems,” Reza says. “Tasks like grabbing objects using a robot arm, self-driving cars, and walking/running simulations are all seeing good progress and are likely to see a big increase in capability soon.”
Rachel adds: “New research is being done to achieve the same results as GANs with generative (non-adversarial) models, which are often much faster. Beyond that, natural language processing is currently going through an explosion of advances (similar to where computer vision was a few years ago).
“Fast progress is being made on generating text, classifying text (e.g. is this review positive or negative?), translation between languages, question answering, text summarisation, and more. For instance, see ULMFit, BERT, and GPT-2 which all came out in the last year.”
Machine learning continues to advance rapidly, and breakthroughs like GANs put the wider field under intense public scrutiny. The creation of deepfake videos is just one aspect of a potentially huge tidal shift in how privacy and personal identity are protected and commoditised. The question is, will legal frameworks need to change drastically to cater for this?
“Legislation has really lagged behind in keeping up with the huge impact and influence that major tech companies are having on our society,” Rachel says. “I think that Anil Dash's framing is helpful: there is no ‘tech industry’ as that label is so broad as to have lost meaning, as more and more industries use technology (and the major tech companies are involved in so many different fields).
“We should focus on regulating specific use cases: such as how algorithms can discriminate in hiring, firing, and criminal justice decisions; how social networks promote extremism and even genocide; how police departments are using facial recognition technology; and increasing surveillance and lack of privacy. I think it is helpful to frame this around what human rights we want to protect.”
The fear over the potential applications of deepfakes has been widespread and, in many cases, hysterical. A future in which no political speech can be trusted and smear campaigns are spawned in the imagination of deepfake programmers is fanciful - so far the faked videos have simply added noise to the conversation rather than diverting it. Those who already hold certain views will want to believe that the Pope endorsed Donald Trump, for example, but the fakes are too easily spotted for most people to be duped.
One reason deepfakes haven’t shaken our political system too violently is that they are fairly straightforward to track. Machine learning algorithms can identify doctored video and, for trolls, time can actually be better used disseminating lies that won’t be picked up by an algorithm. So far, the most damaging uses of deepfakes have been in pornography, splicing the faces of famous people onto the bodies of porn actors. Of course, this needs to be addressed and dealt with, but there is little evidence that deepfakes will have any real impact in politics. Forged photographs, for example, are not a political force. Deepfakes are impressive and uncanny, but not terrifying.
Illustrations by Kseniya Forbender
Editor responsible for this story: Margarita Khartanovich