Artificial Intelligence Explained: What Are Generative Adversarial Networks (GANs)?

The field of artificial intelligence (AI) is fast-moving, and new breakthroughs are regularly made. One of the more recent terms rising to prominence is Generative Adversarial Network (GAN) – but what does it mean?

The principle behind the GAN was first proposed in 2014, and at its most basic level, it describes a system that pits two AI systems (neural networks) against each other to improve the quality of their results.

To understand how they work, imagine a blind forger trying to create copies of paintings by great masters. To start with, he has no idea what a painting should look like – but he happens to have a friend who has a photographic memory of every masterpiece that’s ever been painted.

This friend – a detective – has to determine whether each painting the forger shows him matches the features of those created by the real great masters, or is an obvious forgery.

This is the basic idea of how a GAN operates. Because both are AIs, the forger and the detective can act at super-speed, making and detecting thousands of forgeries per second. Both of them then “learn” from the outcome to improve their future performance. As the detective becomes better at detecting forgeries, the forger must become better at creating them.

GANs have been the cause of a lot of excitement within the field of AI development in recent years, due to their ability to create “new” information following rules established by existing information. An example might be writing instruction manuals. By training a GAN on thousands of instruction manuals, it could one day be possible to create a system that could look at any tool, device, or software and then create instructions on how to use it.

So, let’s look into how this works in a bit more depth. The “forger” network that creates fake data is termed the generative network. Its job is to learn the properties of the training data indirectly, through the feedback it receives, and to produce “candidate” data that follows the same rules.

The “detective” network tasked with determining whether the generative network is outputting false (artificially generated) data or real (training) data is known as the discriminative network. Because it competes against the generative network, the system as a whole is described as “adversarial.”
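As a rough sketch of this tug-of-war, the pure-Python toy below shrinks both networks to almost nothing. The generative network is a single learnable offset `b` added to random noise, and the discriminative network is a one-input logistic classifier with parameters `w` and `c`. Every name and number here (`REAL_MEAN`, `REAL_STD`, the learning rate, the step count) is an illustrative assumption, and the gradients are written out by hand; it is not the method used by any real GAN library.

```python
import math
import random

REAL_MEAN, REAL_STD = 2.0, 0.5  # assumed distribution of the "training data"


def sigmoid(t):
    # Clamp the input so math.exp cannot overflow.
    t = max(-60.0, min(60.0, t))
    return 1.0 / (1.0 + math.exp(-t))


def train_toy_gan(steps=5000, batch=32, lr=0.1, seed=0):
    rng = random.Random(seed)
    b = 0.0          # generator ("forger"): G(z) = z + b
    w, c = 0.0, 0.0  # discriminator ("detective"): D(x) = sigmoid(w*x + c)
    for _ in range(steps):
        real = [rng.gauss(REAL_MEAN, REAL_STD) for _ in range(batch)]
        fake = [rng.gauss(0.0, 1.0) + b for _ in range(batch)]

        # Discriminator step: gradient ascent on log D(real) + log(1 - D(fake)).
        grad_w = grad_c = 0.0
        for x in real:
            d = sigmoid(w * x + c)
            grad_w += (1.0 - d) * x
            grad_c += 1.0 - d
        for x in fake:
            d = sigmoid(w * x + c)
            grad_w -= d * x
            grad_c -= d
        w += lr * grad_w / batch
        c += lr * grad_c / batch

        # Generator step: gradient ascent on log D(fake), i.e. try to fool D.
        grad_b = sum((1.0 - sigmoid(w * x + c)) * w for x in fake) / batch
        b += lr * grad_b
    return b, w, c


b, w, c = train_toy_gan()
print("generator offset:", round(b, 2), "target mean:", REAL_MEAN)
```

With these settings the generator's offset should drift toward the mean of the real data, because that is the only way to leave the classifier unsure. Note that the generator never reads the real samples; it only uses the discriminator's current parameters. Real GANs replace both toy models with deep neural networks and use automatic differentiation instead of hand-written gradients.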

For a great working example of a GAN in action, look no further than the popular demonstration This Person Does Not Exist. The network powering the website has learned to produce ultra-realistic images of human faces that, while they follow all of the rules regarding the way a human face should look, do not exist outside of the computer program.

While you might at first assume that the program builds images of faces by putting together pieces from a database of eyes, ears, mouths, and hair, this isn’t the case. The “input” data for the generative network is simply a string of random numbers – only the discriminative network sees the training data. The generative network improves its output based entirely on the feedback it receives from the discriminative network.
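To make the "string of numbers" idea concrete, here is a small sketch in which a generator turns a list of random numbers into a tiny 4 by 4 grid of pixel intensities. The sizes (`LATENT_DIM`, `IMAGE_SIDE`) and the single random, untrained weight layer are assumptions chosen for brevity; a real generator has many trained layers, but the interface is the same: numbers in, image out, with no database of face parts involved.

```python
import math
import random

LATENT_DIM = 8   # assumed length of the input "string of numbers"
IMAGE_SIDE = 4   # assumed output size: a 4x4 grid of pixel intensities


def make_generator(seed=0):
    """Build a stand-in generator with fixed random (untrained) weights."""
    rng = random.Random(seed)
    n_out = IMAGE_SIDE * IMAGE_SIDE
    weights = [[rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]
               for _ in range(n_out)]

    def generate(z):
        # One linear layer followed by tanh, so every pixel lies in [-1, 1].
        flat = [math.tanh(sum(wi * zi for wi, zi in zip(row, z)))
                for row in weights]
        # Reshape the flat list into IMAGE_SIDE rows.
        return [flat[r * IMAGE_SIDE:(r + 1) * IMAGE_SIDE]
                for r in range(IMAGE_SIDE)]

    return generate


def sample_latent(rng):
    """Draw the random input vector the generator starts from."""
    return [rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]


generate = make_generator()
rng = random.Random(42)
z1, z2 = sample_latent(rng), sample_latent(rng)
image1, image2 = generate(z1), generate(z2)
```

The same input vector always yields the same image, and a different vector yields a different one. Training changes only the weights, which is how the generator can improve without ever being shown a real photograph.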

Because the only feedback the generative network receives is the discriminative network’s yes/no “guesses” at whether its output matches the training data, it takes many, many attempts before the generative network starts to produce output that is acceptably close to the desired result – in this case, a realistic-looking image of a non-existent person.
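In common implementations the "guess" is not a hard yes or no but a probability between 0 and 1 that a sample is real, and both networks are scored with a cross-entropy loss built from those probabilities. The sketch below shows one widely used formulation (the so-called non-saturating generator loss); the function names and the small `EPS` constant are assumptions made for illustration.

```python
import math

EPS = 1e-12  # assumed small constant to avoid log(0)


def discriminator_loss(d_real, d_fake):
    """Low when real samples score near 1 and generated samples near 0.

    d_real / d_fake: probabilities the discriminator assigned to
    "this is real" for real and for generated samples respectively.
    """
    real_term = -sum(math.log(p + EPS) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p + EPS) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_loss(d_fake):
    """Low when the discriminator scores generated samples near 1."""
    return -sum(math.log(p + EPS) for p in d_fake) / len(d_fake)


# A generator that fools the discriminator is rewarded with a lower loss.
fooled = generator_loss([0.9, 0.9])
caught = generator_loss([0.1, 0.1])
```

Because the two losses pull the same probabilities in opposite directions, an improvement for one network is a setback for the other, which is why training takes so many rounds.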

(This example actually uses StyleGAN, a model that Nvidia published in late 2018. It builds on the company’s earlier proGAN, which works by gradually increasing the resolution of the image that the network generates, starting with a very low-resolution 4 pixel by 4 pixel image.)
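The resolution-growing idea can be pictured as a simple schedule in which the image side doubles at each stage. The helper below is only a sketch of that schedule: the function name and the 1024-pixel end point are assumptions, and it leaves out how layers are actually added and blended in during training.

```python
def resolution_schedule(start=4, final=1024):
    """Return the image side lengths visited when doubling from start to final."""
    sizes = []
    size = start
    while size <= final:
        sizes.append(size)
        size *= 2
    return sizes


stages = resolution_schedule()
print(stages)
```

Starting small lets the networks settle on the coarse layout of a face before they have to get fine detail right.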

The data used for training an adversarial network does not have to be labeled, as the discriminative network can make judgments on the output of the generative network based entirely on features of the training data itself. This means GANs have applications in unsupervised learning as well as in supervised learning (where the data is labeled) and reinforcement learning.

Another useful feature of GANs is that they can be used to efficiently create training datasets for other AI applications. Most current AI techniques, deep learning in particular, rely on access to large amounts of data for training purposes.

GANs can generate datasets that follow all of the rules of “natural” datasets and so, in theory, can be used for training deep learning models. A great example of where this would be useful might be medical images, which can be expensive and time-consuming to collect for real – requiring both patient consent and medical expertise to label them.
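One way this might look in practice is to top up a small real dataset with generated samples while keeping track of where each sample came from. The sketch below is hypothetical throughout: `augment_with_synthetic`, its `synthetic_fraction` parameter, and the stand-in `fake_generator` (which just draws random numbers in place of a trained GAN) are invented for illustration.

```python
import random


def augment_with_synthetic(real_samples, generate_sample,
                           synthetic_fraction=0.5, seed=0):
    """Mix real and generated samples, tagging each with its origin.

    synthetic_fraction is the share of the final dataset that is generated.
    """
    if not 0.0 <= synthetic_fraction < 1.0:
        raise ValueError("synthetic_fraction must be in [0, 1)")
    rng = random.Random(seed)
    n_synthetic = round(
        len(real_samples) * synthetic_fraction / (1.0 - synthetic_fraction)
    )
    combined = [(sample, "real") for sample in real_samples]
    combined += [(generate_sample(rng), "synthetic")
                 for _ in range(n_synthetic)]
    rng.shuffle(combined)
    return combined


def fake_generator(rng):
    # Stand-in for a trained generative network (assumption).
    return rng.gauss(0.0, 1.0)


real = [0.1 * i for i in range(10)]
dataset = augment_with_synthetic(real, fake_generator, synthetic_fraction=0.5)
```

Keeping the origin tag makes it possible to check later whether a model trained on the mixture still performs well on real data alone.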

GANs can be used for creating images, video, text, and even music. While there is a lot of hype surrounding the concept at the moment, it is one of the most interesting new ideas to emerge from the AI field in recent years, and we can expect to see many exciting new applications based on it in the near future.

Originally posted on Forbes.com by Bernard Marr