Flu + teaching AI to play games

Flu and strange ideas

I have a cold/flu thing at the moment and feel rotten. Because of the way even a mild cold or flu interacts with my general health, I get pain everywhere, and due to the level of painkillers I take normally, I just have to grin and bear it. The way I tend to cope is to keep my mind occupied to try and not think about it. Strangely, this is often a creative time in terms of random thoughts; I guess the body pushes more natural drugs into me to try and counteract the pain, leaving me a bit 'drugged'.

Thinking about teaching (not training) AIs

Last night I was deep in thought about deep AIs which, as you may have noticed from my recent blog posts, is something I'm really enjoying and, TBH, a field I can see myself working in. Before the fame of the recent AlphaGo wins, DeepMind were tackling simpler games on the Atari 2600. The paper "Human-level control through deep reinforcement learning" describes an AI that successfully learnt to play a number of Atari 2600 games through unsupervised iterations of playing. Essentially it learnt from nothing how the controls worked and what they did in terms of the game.

Playing Breakout, for example, it gradually learns that the left button moves the little graphic at the bottom of the screen (the bat) to the left. Every now and again the bat would randomly be under the ball, which bounced back and hit the blocks, scoring some points. The AI uses the game score as its learning signal, always aiming to improve its score.
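To make that loop concrete, here's a minimal tabular Q-learning sketch. It's purely illustrative: the DeepMind paper uses a deep network over raw screen pixels rather than a lookup table, and none of these names come from their code, but the shape of the update is the same, with the change in game score as the only feedback.

```python
import random

ALPHA = 0.1    # learning rate
GAMMA = 0.99   # discount factor for future score
EPSILON = 0.1  # chance of a random exploratory action

q = {}  # maps (state, action) -> estimated future score

def q_value(state, action):
    return q.get((state, action), 0.0)

def choose_action(state, actions):
    # Mostly pick the action with the best estimated score, but
    # occasionally act randomly so new moves can be discovered.
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: q_value(state, a))

def learn(state, action, reward, next_state, actions):
    # 'reward' is just the change in game score on this step.
    best_next = max(q_value(next_state, a) for a in actions)
    target = reward + GAMMA * best_next
    q[(state, action)] = q_value(state, action) + ALPHA * (target - q_value(state, action))
```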

If that sounds like how humans learn to play games without instructions, that's exactly the point!

This was published in the prestigious journal "Nature" last year and is generally considered a major step towards 'general AI', that is, AIs that learn things by observation and trial and error, which is one of the major parts of mammalian intelligence like our own. It does, however, have limits on the complexity of the games it can learn directly, as it relies on initial random chance and short-term goal rewards. It's a building block rather than the end of the line.

For example, take a more complex game, such as a graphic adventure like 'The Secret of Monkey Island' with its mouse-driven 'point and click' control system: the AI is unlikely to do well, as the odds of randomly moving a mouse and clicking on something that increases your score are tiny. Similarly, a hypothetical Minecraft AI would probably learn to explore and avoid/fight enemies, but is unlikely to ever craft anything (how likely is it to stumble on punching trees, making planks, making a crafting bench and then making a pickaxe?). The probability of a random string of actions producing a direct score increase limits the complexity of the games the system can handle as it stands.
AlphaGo is much more advanced, and included playing against itself to teach itself new strategies.

It dawned on me that a simple modification of the Q-learning AI for Atari 2600 games might improve learning rates and extend the complexity of the games it could play. It's a technique nature uses, so it's likely a good strategy: the idea of 'teaching'.
This is fundamentally different from supervised learning, even though at first glance the two seem the same.

Supplement Training with Teaching

Teaching is a form of showing inputs to the AI, versus supervised learning, which shows the input and the desired output. Just like in nature, many mammalian brains are shown how to do things whilst young. A litter of kittens will learn to hunt via play supervised by their parents; this increases the kittens' chances of surviving and breeding, even though it puts more stress on the parents, who have to feed and look after them until they are trained enough to go it alone.
It can be thought of as a form of indirect neural copying: the parent moulds their young's neural network towards known good strategies, but whether the kitten takes that advice as is, modifies it, or even rejects it is purely the result of its own neural network seeing the benefit. This has several evolutionary advantages:

  1. As the young will be indirectly replaying rather than directly copying, new mutations and modifications will occur as the child incorporates its own input stimuli.
  2. Obsolete knowledge will die out over generations, as it won't carry the reward it had for the parents.
  3. The young short-cut basic training by learning from their parents, which should allow quicker adaptation when a rare useful event occurs.
In direct terms of the game-playing AI, this postulates that we can improve the adaptation rate and/or the ability to perform complex actions by showing it how to play the game.

In the Breakout case, if for the first N games we, in the role of parental human, take over the controls (while fully observed by the AI) and move the bat back and forth, hitting the ball a few times, we should short-cut the generations the AI spends just learning how to control the bat and what the point of the game actually is.
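A minimal sketch of how that might look, assuming a standard DQN-style training loop. Everything here (env, agent, read_human_input, the value of N) is a hypothetical stand-in rather than DeepMind's actual code; the point is that the only change from normal play is where the action comes from, while the learning update is untouched.

```python
TEACHING_EPISODES = 50  # N: how many games the human 'parent' plays first

def run_episode(env, agent, read_human_input, episode_number):
    state = env.reset()
    done = False
    total_reward = 0.0
    while not done:
        if episode_number < TEACHING_EPISODES:
            # Teaching phase: the human picks the action, the AI observes.
            action = read_human_input()
        else:
            # Normal phase: the AI picks its own (epsilon-greedy) action.
            action = agent.choose_action(state)
        next_state, reward, done = env.step(action)
        # Either way the transition goes into replay memory and the same
        # Q-learning update runs; the AI never needs to know who acted.
        agent.remember(state, action, reward, next_state, done)
        agent.learn_from_replay()
        total_reward += reward
        state = next_state
    return total_reward
```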

I'm sure I'm not the only person to have thought of this idea, but so far I haven't found any published papers on it (please let me know if you know of some). It's similar to genetic algorithms in terms of generational transfer, but in a fundamentally different way: rather than passing down genes, this is passing down successful strategies to the young.
The next step is to hack together a small modification of the Atari 2600 Q-learning code to let me take over the controls for some number of initial games.

If the trends show the idea has merit, adding a more formalised teaching path (recording inputs, perhaps) may allow us to explore more complex games: for example, if we show it enough Minecraft sessions where you make a pickaxe, it's possible it might learn that and so move into a completely new phase.
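Recording could be as simple as dumping the teacher's transitions to disk and seeding a fresh agent's replay memory with them before it ever acts for itself. Another hedged sketch, using the same hypothetical names as above:

```python
import pickle

def record_session(env, read_human_input, filename):
    # Save one human-played game as a list of (s, a, r, s', done) transitions.
    transitions = []
    state = env.reset()
    done = False
    while not done:
        action = read_human_input()
        next_state, reward, done = env.step(action)
        transitions.append((state, action, reward, next_state, done))
        state = next_state
    with open(filename, "wb") as f:
        pickle.dump(transitions, f)

def teach_from_recording(agent, filename):
    # Replay a recorded session into the agent's memory, as if it had
    # played those moves itself.
    with open(filename, "rb") as f:
        for transition in pickle.load(f):
            agent.remember(*transition)
```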

More abstractly, it also brings in a social element to AIs that so far hasn't existed. In this model AIs have teachers, likely a form of parental figure that guides them until some point in their life...
