What AlphaGo Can Teach Us About How People Learn

David Silver of DeepMind, who helped create the program that defeated a Go champion, thinks rewards are central to how machines—and humans—acquire knowledge.

David Silver is responsible for several eye-catching demonstrations of artificial intelligence in recent years, working on advances that helped revive interest in the field after the last great AI Winter.

At DeepMind, a subsidiary of Alphabet, Silver has led the development of techniques that let computers learn for themselves how to solve problems that once seemed intractable.

Most famously, this includes AlphaGo, a program revealed in 2016 that taught itself to play the ancient board game Go to a grandmaster level. Go is too subtle and instinctive to be tamed using conventional programming, but AlphaGo learned to play through practice and positive reward—an AI technique known as “reinforcement learning.”

In 2018, Silver and colleagues developed a more general version of the program, called AlphaZero, capable of learning to play expert chess and shogi as well as Go. Then, in November 2019, DeepMind released details of MuZero, a version that learns to play these and other games—but crucially without needing to know the rules beforehand.

Silver met with senior writer Will Knight over Zoom from London to discuss MuZero, reinforcement learning, and the secret to making further progress in AI. This transcript has been edited for length and clarity.

WIRED: Your MuZero work is published in the journal Nature today. For the uninitiated, tell us why it is important.

David Silver: The big step forward with MuZero is that we don’t tell it the dynamics of the environment; it has to figure that out for itself in a way that still lets it plan ahead and work out what’s going to be the most effective strategy. We want to have algorithms that work in the real world, and the real world is complicated and messy and unknown. So you can’t just look ahead, like in a game of chess. You have to learn how the world works.
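To make that idea concrete, here is a deliberately tiny Python sketch. It is not DeepMind's code. It assumes a hypothetical toy world (a line of six states with a reward at the far end) whose rules are hidden from the planner. The agent first records what happens when it acts at random, building its own model of the dynamics, and then plans by looking ahead inside that learned model. MuZero itself learns its model with neural networks over abstract internal states and searches with Monte Carlo tree search; the lookup-table model and exhaustive depth-limited search here are simplifications chosen for readability.

```python
import random

# Hypothetical toy world: states 0..5 on a line, reward 1 for reaching state 5.
N_STATES = 6
GOAL = N_STATES - 1
ACTIONS = (-1, +1)  # step left or step right


def env_step(state, action):
    """The true rules. The planner never calls this; only the explorer does."""
    next_state = min(max(state + action, 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward


def learn_model(num_steps=2000, seed=0):
    """Act at random and remember each observed (state, action) -> (next state, reward)."""
    rng = random.Random(seed)
    model = {}
    state = 0
    for _ in range(num_steps):
        action = rng.choice(ACTIONS)
        next_state, reward = env_step(state, action)
        model[(state, action)] = (next_state, reward)  # deterministic world: one outcome
        state = 0 if next_state == GOAL else next_state  # restart after reaching the goal
    return model


def plan(model, state, depth, discount=0.9):
    """Imagine `depth` steps ahead using only the learned model; return (value, action)."""
    if depth == 0:
        return 0.0, None
    best_value, best_action = float("-inf"), None
    for action in ACTIONS:
        if (state, action) not in model:
            continue  # never experienced, so the model cannot predict it
        next_state, reward = model[(state, action)]
        future = 0.0 if next_state == GOAL else plan(model, next_state, depth - 1, discount)[0]
        value = reward + discount * future
        if value > best_value:
            best_value, best_action = value, action
    if best_action is None:
        return 0.0, None  # nothing known about this state yet
    return best_value, best_action


model = learn_model()
print(plan(model, 0, depth=5))  # the goal is exactly five steps away
print(plan(model, 0, depth=4))  # the goal is beyond the horizon
```

With five steps of lookahead, the planner picks the move to the right, valued at 0.9**4 (about 0.656) after discounting. With only four steps the goal lies beyond the horizon and every move looks worth 0.0, one simple way to see how spending more computation on imagined lookahead can buy better decisions.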

Some observers point out that MuZero, AlphaGo, and AlphaZero don’t really start from scratch. They use algorithms crafted by clever humans to learn how to perform a particular task. Does this miss the point?

I think it does, actually. You never truly have a blank slate. There’s even a theorem in machine learning—the no-free-lunch theorem—that says you have to start with something or you don’t get anywhere. But in this case, the slate is as blank as it gets. We’re providing it with a neural network, and the neural network has to figure out for itself, just from the feedback of the wins and losses in games or the score, how to understand the world.

One thing people picked up on is that we tell MuZero the legal moves in each situation. But if you take reinforcement learning, which is all about trying to solve problems in situations where the world is unknown, it’s normally assumed that you’re told what you can do. You have to tell the agent what choices it has available, and then it takes one of them.
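The convention Silver describes, where the agent is told its legal moves but must learn their value from wins and losses alone, can be sketched in a few lines of Python. Everything specific here is hypothetical: the two "situations", the move names, and the hidden win probabilities. It is a one-step (bandit) simplification of reinforcement learning using epsilon-greedy exploration and running averages, not how MuZero is trained, but it shows the division of labor: the choices are given, and their worth has to be discovered.

```python
import random

# Hypothetical situations. The agent is told the keys of each inner dict (the legal
# moves) but never the win probabilities, which stand in for the unknown world.
HIDDEN_WIN_PROB = {
    "opening": {"a": 0.2, "b": 0.8},
    "midgame": {"c": 0.7, "d": 0.1, "e": 0.3},
}


def legal_moves(situation):
    """What the environment reveals: the choices available right now."""
    return list(HIDDEN_WIN_PROB[situation])


def play(situation, move, rng):
    """Return 1.0 for a win or 0.0 for a loss; this is the agent's only feedback."""
    return 1.0 if rng.random() < HIDDEN_WIN_PROB[situation][move] else 0.0


def best_move(values, situation):
    """Greedy choice among the legal moves, by current estimated win rate."""
    return max(legal_moves(situation), key=lambda m: values.get((situation, m), 0.0))


def train(episodes=5000, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    values, counts = {}, {}  # (situation, move) -> running mean reward / sample count
    situations = list(HIDDEN_WIN_PROB)
    for _ in range(episodes):
        situation = rng.choice(situations)
        if rng.random() < epsilon:
            move = rng.choice(legal_moves(situation))  # explore a random legal move
        else:
            move = best_move(values, situation)  # exploit the current estimates
        reward = play(situation, move, rng)
        key = (situation, move)
        counts[key] = counts.get(key, 0) + 1
        old = values.get(key, 0.0)
        values[key] = old + (reward - old) / counts[key]  # incremental average
    return values


values = train()
for situation in HIDDEN_WIN_PROB:
    print(situation, best_move(values, situation))
```

The estimates stay noisy because outcomes are random, but with win rates this well separated the greedy choices should settle on the better moves. The agent never needs to be told why a move is good, only which moves exist.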

You might critique what we’ve done with it so far. The real world is massively complex, and we haven’t built something which is like a human brain that can adapt to all these things. So that’s a fair critique. But I think MuZero really is discovering for itself how to build a model and understand it just from first principles.

DeepMind recently announced that it had used the technology behind AlphaZero to solve an important practical problem—predicting the shape that a protein will fold into. Where do you think MuZero will have its first big impact?

We are, of course, looking at ways to apply MuZero to real-world problems, and there are some encouraging initial results. To give a concrete example, traffic on the internet is dominated by video, and a big open problem is how to compress those videos as efficiently as possible. You can think of this as a reinforcement learning problem because there are these very complicated programs that compress the video, but what you see next is unknown. But when you plug something like MuZero into it, our initial results look very promising in terms of saving significant amounts of data, maybe something like 5 percent of the bits that are used in compressing a video.

Longer term, where do you think reinforcement learning will have the biggest impact?

I think of a system that can help you as a user achieve your goals as effectively as possible. A really powerful system that sees all the things that you see, that has all the same senses that you have, which is able to help you achieve your goals in your life. I think that is a really important one. Another transformative one, looking long term, is something which could provide a personalized health care solution. There are privacy and ethical issues that have to be addressed, but it will have huge transformative value; it will change the face of medicine and people’s quality of life.

Is there anything you think machines will learn to do within your lifetime?

I don’t want to put a timescale on it, but I would say that everything that a human can achieve, I ultimately think that a machine can. The brain is a computational process; I don’t think there’s any magic going on there.

Can we reach the point where we can understand and implement algorithms as effective and powerful as the human brain? Well, I don’t know what the timescale is. But I think that the journey is exciting. And we should be aiming to achieve that. The first step in taking that journey is to try to understand what it even means to achieve intelligence. What problem are we trying to solve in solving intelligence?

Beyond practical uses, are you confident that you can go from mastering games like chess and Atari to real intelligence? What makes you think that reinforcement learning will lead to machines with common sense understanding?

There’s a hypothesis, we call it the reward-is-enough hypothesis, which says that the essential process of intelligence could be as simple as a system seeking to maximize its reward, and that process of trying to achieve a goal and trying to maximize reward is enough to give rise to all the attributes of intelligence that we see in natural intelligence. It’s a hypothesis; we don’t know whether it is true, but it kind of gives a direction to research.

If we take common sense specifically, the reward-is-enough hypothesis says, well, if common sense is useful to a system, that means it should actually help it to better achieve its goals.

It sounds like you think that your area of expertise—reinforcement learning—is in some sense fundamental to understanding, or “solving,” intelligence. Is that right?

I really see it as very essential. I think the big question is, is it true? Because it certainly flies in the face of how a lot of people view AI, which is that there’s this incredibly complex collection of mechanisms involved in intelligence, and each one of them has its own kind of problem that it’s solving or its own special way of working, or maybe there’s not even any clear problem definition at all for something like common sense. This theory says, no, actually there may be this one very clear and simple way to think about all of intelligence, which is that it’s a goal-optimizing system, and that if we find the way to optimize goals really, really well, then all of these other things will emerge from that process.

Reinforcement learning has been around for decades, but for a while it seemed like a dead end. One of your old advisers in fact told me that she tried to dissuade you from working on it. Why did you ignore her and keep going?

Many people view reinforcement learning as one of many hammers that you could apply to solve the many problems that we need to solve in AI. I don’t view it that way. I view reinforcement learning as the whole thing. If we want to try and describe intelligence as best as possible, I think reinforcement learning essentially characterizes what we really mean by intelligence. And once you start to see it that way, it’s like, how can I not work on this? If this really is the thing that is closest to what we mean by intelligence—if we solve it, we will crack that.

If you look at the work I’ve done, I’ve consistently tried to focus on that problem. When we tackle something like Go, we learn about what intelligence means in the process. You can think of reinforcement learning as the ability that enables an agent to acquire all other abilities—all the other pieces of intelligence that it needs. You see a little bit of that in something like AlphaGo, where all we asked it to do was to win games, and yet it learned all these things—endgames and openings—that people used to have specialized subsystems for.

Is there pressure at DeepMind to do another big demonstration, something like AlphaGo? Do you feel that at all?

That’s a great question. I feel that we’re in a really privileged position in the sense that we are secure in our positions, in our funding, all of these things are very, very secure.

The only pressure for trying to build a new, big demonstration is the drive to make progress towards general intelligence. It’s a real privilege that you don’t have when you’re either in a startup and trying to secure your funding, or in academia, where you’re trying to secure your grants and so forth.

Powerful AI systems now require enormous amounts of computer power to work. Are you worried that this will hold progress back?

To bring this back to MuZero, it is an example of an algorithm that scales very well and gracefully with computation. We ran an experiment in Atari, where we showed that even using a very modest amount of compute—roughly equivalent to one GPU for a couple of weeks—it works really, really well, and you get performance that far exceeds a human.

There are some figures that suggest if you add up all the compute power that you can leverage right now, we’re reaching something comparable to the human brain. So it’s probably more that we need to come up with smarter algorithms.

But the beauty of MuZero is that because it’s building its own model, it’s starting to understand how the world works—to imagine things. And that imagination is a way that you can actually leverage computation to start to look ahead, imagine what might happen next.

Some military contractors are using reinforcement learning to build better weapons systems. How do you feel about that? Do you ever think that some of your work should not be published openly?

I oppose the use of AI in any deadly weapon, and I wish we had made more progress toward a ban on lethal autonomous weapons. DeepMind and its co-founders are signatories of the Lethal Autonomous Weapons Pledge, which outlines the company’s belief in the principle that offensive technology should always remain under appropriate human control.

However, we continue to believe that the appropriate publication of our methods is a cornerstone of science and that the development of general-purpose AI algorithms will lead to greater overall societal benefit across a raft of positive applications.
