Celebrity Fashion: GANs generate realistic fake people, and they’re raising big questions in media, the law, and AI
Take a look at the person at the top of this article. She looks friendly. Perhaps she’s someone you’d connect with on LinkedIn, or hire to run your social media. If you saw her at your kid’s gymnastics class, you’d say hello and make some awkward small talk. Depending on your persuasion, you might even swipe right on her Tinder profile if given the chance.
There’s only one problem: she doesn’t exist. The image above was generated using a novel Machine Learning technique called Generative Adversarial Networks (GANs). Invented in 2014, the technique has exploded in popularity and possibility. Turing Award winner Yann LeCun described it as “the coolest idea in machine learning in the last 20 years”. It’s used in video gaming, astronomy, and art. And it’s taking the media and legal worlds by storm.
GANs Under the Hood
GANs work by taking two Deep Learning neural networks and pitting them against each other in a mini battle royale. The first network is the generative network. For images, it’s usually a Convolutional Neural Network that turns random noise into a candidate image. It never looks at the sample images directly; it learns their attributes and patterns through feedback from the second network, until it can produce similar images from scratch.
The second network is the discriminative network. It’s trained on the original data alongside the generative network’s output, and is designed to evaluate whether a specific image does or does not follow the statistical distribution of the original data set. Put more simply, when given a new image, it tries to guess whether or not the image belongs in the original set of images.
This is where things get interesting. Rather than letting the networks quietly hum along on their own, a GAN pits them against each other. The generative network’s goal is to create fake images good enough to fool the discriminative network. And the discriminative network’s goal is to avoid being fooled: to accurately guess which images are fakes and which really do belong with the original data.
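For readers who like to see the tug-of-war in symbols, the 2014 paper that introduced GANs frames it as a two-player minimax game. The notation below is not from this article and is only shorthand: G is the generative network, D is the discriminative network, x is a real image drawn from the original data, and z is the random noise that G turns into a fake image.

```latex
\min_{G} \max_{D} \;
  \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_{z}}\big[\log\big(1 - D(G(z))\big)\big]
```

Here D(x) is the discriminative network’s estimate of the probability that an image is real. D wants both terms to be large (real images scored near 1, fakes scored near 0), while G wants the second term to be small (fakes scored near 1).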
Over and over, the generative network will make a new image, and the discriminative network will evaluate it. They’ll then check their work to see who won each round. As they square off against each other for thousands and thousands of rounds, both networks use backpropagation to learn from their mistakes and successes. The generative network gets better at creating realistic fakes, and the discriminative network gets better at spotting them. Over time, both networks improve by competing with each other. The technique draws on recent advances in parallel computing to train the networks quickly, and NVIDIA and other GPU companies are big early adopters.
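To make the round-by-round contest concrete, here is a deliberately tiny sketch in plain Python. It is not how image GANs are built: instead of pictures and convolutional networks, the “real data” is just numbers clustered around 4, the generative network is a single line (a * z + b), and the discriminative network is a one-input logistic classifier. All names (train_round, gen, disc) and settings (learning rate, batch size, number of rounds) are assumptions made up for this illustration, and the gradients are worked out by hand rather than by a Deep Learning library. The generator update uses the common “maximize log D(fake)” variant of the loss.

```python
import math
import random


def sigmoid(s):
    """Logistic function, written to avoid overflow for large |s|."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def generate(gen, z):
    """Generative network: turn one noise value z into one fake sample."""
    return gen["a"] * z + gen["b"]


def discriminate(disc, x):
    """Discriminative network: estimated probability that x is real."""
    return sigmoid(disc["w"] * x + disc["c"])


def train_round(gen, disc, real_batch, noise_batch, lr=0.05):
    """One round of the contest: update the discriminator, then the generator."""
    fake_batch = [generate(gen, z) for z in noise_batch]
    n_real, n_fake = len(real_batch), len(fake_batch)

    # Discriminator step: push D(real) toward 1 and D(fake) toward 0.
    grad_w = grad_c = 0.0
    for x in real_batch:
        err = discriminate(disc, x) - 1.0  # d/dlogit of -log D(x)
        grad_w += err * x / n_real
        grad_c += err / n_real
    for x in fake_batch:
        err = discriminate(disc, x)        # d/dlogit of -log(1 - D(x))
        grad_w += err * x / n_fake
        grad_c += err / n_fake
    disc["w"] -= lr * grad_w
    disc["c"] -= lr * grad_c

    # Generator step: push D(fake) toward 1, i.e. try to fool the
    # discriminator that was just updated.
    grad_a = grad_b = 0.0
    for z in noise_batch:
        x = generate(gen, z)
        err = (discriminate(disc, x) - 1.0) * disc["w"]  # chain rule through D
        grad_a += err * z / n_fake
        grad_b += err / n_fake
    gen["a"] -= lr * grad_a
    gen["b"] -= lr * grad_b


rng = random.Random(0)
gen = {"a": 1.0, "b": 0.0}   # fakes start out centered on 0
disc = {"w": 0.0, "c": 0.0}  # discriminator starts out guessing 50/50

for _ in range(500):
    real = [rng.gauss(4.0, 0.5) for _ in range(64)]   # "real data" near 4
    noise = [rng.gauss(0.0, 1.0) for _ in range(64)]
    train_round(gen, disc, real, noise)

print("generator parameters after training:", gen)
```

In the first rounds the discriminator learns that larger numbers look more real, and the generator responds by shifting its output upward. Adversarial training like this is known to oscillate rather than settle neatly, so the final numbers from a toy run should be read as an illustration of the back-and-forth, not as a benchmark.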
By the end of the training routine, the generative network has become very good at creating realistic fake images. Running on its own, after all the helpful improvements from its frenemy the discriminative network, it’s now able to generate fakes that are often good enough to fool a human.
It’s a bit like a baseball player swinging a weighted bat before coming up to the plate, or a college student practicing with harder questions than they expect to see on an exam. By training with a cunning adversary that gets better with every trial, the generative network is constantly upping its game. When the tough opponent is removed, and it’s tasked with fooling a run-of-the-mill human, the task is comparatively easy.
GANs Cause Trouble
Unsurprisingly, Artificial Intelligence systems that can create convincing fake humans are causing quite a stir, and far beyond the world of Deep Learning. In the media, GANs are a major threat to credibility. They naturally tie into deepfakes, where a neural network impersonates a real person, creating videos or pictures in which they appear to be doing or saying something they never actually did or said.
Sure, people have always been able to Photoshop a celebrity or politician into an event they never attended, or show them shaking hands with someone they never actually met. But creating a realistic video where they appear to make a racist remark or say something that would inflame their own party is much harder to pull off, and it’s a capability that’s often aided by GANs. It’s an existential threat to the news media, where (fake news aside) credibility of content is absolutely key. How can you know if a hidden camera clip from a whistleblower is real, or an elaborate fake created by a GAN and intended to sabotage an opponent’s reputation?
And there are even darker, thornier problems. GAN-enabled pornography has already appeared on the Internet, often built using the faces of real celebrities. The problem is likely to spread. Celebrities are an easy mark, since there are a lot of photos of them available on the Internet already, and public interest in their personal lives is already high. This makes it comparatively easy to find training data for a GAN, and also very lucrative to release a false video or photo. But as the technologies improve and the required size of training sets shrinks, hackers could potentially create fake X-rated clips of nearly anyone, using these in extortion or blackmail schemes.
Who Owns Fake People?
Beyond these existential threats and creepy risks, even GANs built for positive reasons raise some tricky legal questions. One central question is who actually owns the rights to the images a GAN creates.
US copyright law is pretty clear that a copyrighted work must have a human author. Challenges to this basic premise have failed in diverse and often sensational ways, from cases about the ownership of surveillance footage created by an automated camera to the infamous Monkey Selfie case, where PETA tried to claim that a crested macaque owned the rights to photos he took on a nature photographer’s camera.
If a work needs a human author to receive copyright protection, then does anyone own the rights to images produced by a GAN? After all, they’re not taken by a human wielding a camera; they’re the end result of two battling computer programs. It’s a tough question, but thankfully there are precedents from other fields, which I’ll get into below.
And going beyond ownership, are there limits on what one can do with a GAN? Can you use it to impersonate anyone you’d like?
Humans Fight Back
Faced with the threat of GANs, many organizations and lawmakers are already fighting back. The Screen Actors Guild (SAG), which represents the interests of actors and entertainers, is actively lobbying for regulation that would prevent production companies from replacing living actors with GAN-enabled holograms.
This makes a lot of sense. If you can create a fake, photorealistic Brad Pitt and make him do your bidding, why bother hiring the real actor? The GAN-imagined version doesn’t need breaks, won’t forget his lines, and is unlikely to demand a seven-figure paycheck. When it comes to actors who have passed away, though, things get more complex. SAG would like to lock down the rights to create a synthetic celebrity forever, but this runs up against challenging First Amendment issues.
Several states, too, are already joining the fray. On October 3, 2019, California enacted two laws aimed at synthetic media: AB 602, which lets people sue over sexually explicit fakes made with their likeness without their consent, and AB 730, which restricts distributing deceptive fake clips of political candidates within 60 days of an election. New York is considering legislation that would approach GANs by way of the right of publicity.
GANs for Good
Some amount of regulation for GANs is absolutely needed. Certainly, in the example of synthetic pornography or other exploitative content, it makes sense for lawmakers to step in and weed out bad actors.
There are risks with over-regulating GANs as well, though. When used for good, Generative Adversarial Networks can be an incredibly powerful technology with the potential for a lot of real benefits.
Take, for example, the reason that GANs were created in the first place. The technology was developed not to produce convincing fake people for illustrative purposes, but rather to generate large data sets for training other Deep Learning systems.
In Machine Learning in general, finding good data is hard. Especially with novel networks and techniques, data scientists need lots of images to train a new visual AI system — sometimes a million images or more. Buying all those images is prohibitively expensive, especially for individual scientists or research groups.
GANs were created to solve this problem. With a GAN, a research scientist who is creating, for example, a new facial recognition system wouldn’t need to go out and purchase millions of images of human faces. Rather, they could train a GAN once, and then use it to generate as many fake face images as they needed, and train their new system on these.
GANs are still used primarily for this purpose. It’s not a perfect solution (a colleague at IBM described this process as similar to photocopying a photocopy rather than photocopying an original document, with the same kinds of distortions and quality losses), but it’s still an important tool in a data scientist’s belt. GANs are also crucial where limited real training data is available. They’ve been proposed, for example, as a partial solution to the lack of training images depicting the faces of people beyond Caucasian males, and thus as a way to increase the diversity of Deep Learning systems. You can experiment with your own fake people at ThisPersonDoesNotExist.com.
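The “train once, then generate as many samples as you need” workflow described above can be sketched in a few lines of Python. Everything here is hypothetical: toy_generator stands in for a generative network that has already been trained (a real one would output images, not single numbers), and make_synthetic_dataset and augment are illustrative helper names, not part of any library.

```python
import random


def toy_generator(z):
    """Hypothetical stand-in for a trained generative network.

    A real generator would map a noise vector to an image; this one
    maps a single noise value to a single number.
    """
    return 0.5 * z + 4.0


def make_synthetic_dataset(generator, n_samples, seed=0):
    """Draw fresh noise and run it through the generator n_samples times."""
    rng = random.Random(seed)
    return [generator(rng.gauss(0.0, 1.0)) for _ in range(n_samples)]


def augment(real_samples, generator, n_synthetic, seed=0):
    """Pad a small real data set with generated samples."""
    return list(real_samples) + make_synthetic_dataset(generator, n_synthetic, seed)


scarce_real_data = [3.8, 4.1, 4.4]  # pretend only three real samples exist
training_set = augment(scarce_real_data, toy_generator, n_synthetic=1000)
print(len(training_set), "samples available for training")
```

The photocopy-of-a-photocopy caveat applies here as well: the generated samples can only be as good as the generator that produced them, so any flaws it picked up are carried into whatever system is trained on its output.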
Beyond Machine Learning, GANs have a wide variety of real-world applications. In stock photography and fashion, they can create believable portraits without the need to hire models or rent a location. This makes it far easier for a photographer or designer — especially someone just starting out — to realize a concept or show a new piece of clothing without a high upfront investment.
In other fields, GANs are used anywhere a visual pattern is present. They can model dark matter in astronomy, create 3D models of physical objects from a 2D photograph, create fake rooms and spaces for video games, show how a person is likely to age, and even generate ideas for new molecules or proteins in cancer research.
GANs Going Forward
In the future, GANs will become even more powerful. At the moment, in the visual sphere, they’re mostly limited to producing relatively bounded, highly patterned images where lots of training data is available. Faces are a perfect example: they vary between people but have many of the same basic attributes. And with more than 7 billion real people in the world, there are lots of faces out there for a GAN to learn from (again, assuming its creator has sufficient resources to buy up large data sets).
As technologies improve, though, these barriers will become less important. It’s widely believed that in the next 3–5 years, GANs will advance to the point where they can create entirely novel scenes from scratch, not just close-ups of faces. A designer could say, for example, “I’d like a shot of a woman walking down some stairs, holding the railing, and looking up and over her shoulder”, and a GAN could create that exact scene in photorealistic detail.
Tantalizing glimpses into this future already exist. A network called StackGAN can already do this for single objects, generating a picture of a fake bird based on a textual description of its appearance. And another GAN can already generate a decent, if not perfect, street scene using a blocky map of cars, people, etc.
Clearly, as these technologies advance, they could replace huge portions of photography, movie-making, interior design, or any other field relying on visual media.
Should anyone in these industries immediately look for a new job, or risk being replaced by a GAN? Will all the visual arts be replaced by machines?
Back to the Future
Before we enter total freakout mode, it’s important to pause for a moment, and remember that a field already exists with many of the same attributes and end products as GANs.
In this field, someone dips into their memory of millions of people, places or objects they’ve seen. They draw on extensive training, sometimes from harsh or cunning critics. In some cases, they take in textual descriptions of a desired scene or concept, too. They then pick up some tools and use them to create a totally new image. The image may show a known person in a new circumstance, or it may show an imagined person in a scene that doesn’t actually exist. That field is called “illustration.”
Illustrators, animators, and CGI artists routinely do the same things as a GAN — it’s their whole job to imagine new scenarios, people, and places and bring them to life on the page or screen. In many cases, their creations are photorealistic — especially today, when CGI creations are often indistinguishable from real places or real actors.
Seen as a tool for illustration rather than some totally novel threat, GANs are a lot less scary. Sure, they make creating realistic illustrations much easier and, depending on the skill of the illustrator or animator, more realistic. But in the end, they’re not doing anything new; they’re just applying Deep Learning to an old artistic concept, one that goes back as far as the first human who painted a buffalo on a cave wall.
Viewing GANs as a tool for illustration resolves many of the legal questions about the networks, too. Courts have consistently ruled that works created by CGI are eligible for copyright protection because they are produced through the creative choices of a human operator.
And CGI tools are not always deterministic, either — taking in directions from a human, they routinely “fill in” action between keyframes, or otherwise create novel sequences not directly modelled by the designer. No one would argue that an animator should lose copyright protection for her film because After Effects filled in some of the action between keyframes, or added textures and lighting to her wireframes. It’s her creative choices — and how they lead to the end product — that matter.
GANs exist in a similar space. They can imagine new scenes, but to generate useful output, they still need direction from humans as to what to create. Even when a GAN is randomly generating faces, it still takes a human’s input and direction to decide which ones are believable, which fit with a particular creative project, etc. Providing this direction is a creative exercise itself, and thus one that should qualify for copyright protection. And GANs’ results are not perfect; they still need human help in many cases to yield usable results.
Take, for example, the photo at the top of this article. The woman looks realistic at first glimpse, but look closer. What’s up with her ear? Is that an earring? Some schmutz? The basic image looks pretty good, but it benefits from cleanup and tweaking by this (mildly) skilled human operator. All these tweaks and choices are creative acts that turn the GAN’s raw output into a usable and convincing illustration.