for Robot Artificial Intelligence

2. Defining Reinforcement Learning


What is Reinforcement Learning?

  • Humans and AIs alike never sense the entire world/universe at once
  • We have sensors that feed signals from the environment to our brain
  • We don't even know everything that's going on in a room
  • Thus, sensors limit the amount of information we get
  • The measurements we get from these sensors (e.g. sight, sound, touch) make up a "state"
  • We'll only discuss finite state spaces
  • State spaces with an infinite number of states are possible too

  • Example: tic-tac-toe — what's the number of states?
    • Simplify the problem so that players can keep adding X's and O's even after a player gets 3 in a row
    • Each location on the board has 3 possibilities: empty, X, O
    • There are 9 locations on the board
    • Therefore, # states = 3 × 3 × … × 3 = $ 3^9 $
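The state count above can be verified in a couple of lines of Python:

```python
# Count board configurations for the simplified game:
# each of the 9 cells is independently empty, X, or O.
num_cells = 9
values_per_cell = 3  # empty, X, O
num_states = values_per_cell ** num_cells
print(num_states)  # 19683
```

Note this overcounts reachable positions (e.g. boards with far more X's than O's can never occur in play), but as an upper bound it shows the state space is small enough to enumerate.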

Recap so far

4 Important Terms

  1. Agent: the thing that senses the environment; the thing we're trying to code intelligence/learning into
  2. Environment: the real or simulated world that the agent lives in
  3. State: a configuration of the environment that the agent can sense
  4. Reward
    • This is what differentiates RL from other types of ML
    • An agent tries to maximize not only its immediate reward but future rewards as well
    • RL algorithms will find novel ways of accomplishing this
    • AlphaGo: learned unique/unpredictable strategies that led to beating a world champion
    • Not intuitive to humans, but RL can figure it out

Unintended Consequences

  • A possible danger of RL: unintended consequences
  • A commonly repeated idea: AI could wipe out humanity if it decides that's the best thing for us (e.g. when told to minimize human deaths)
  • The AI might reason that since the number of humans grows exponentially, more people will die in the future, so it's best to destroy everyone now to minimize future deaths
  • A lower-level example: a robot trying to solve a maze
  • Reasonable goal: solve the maze
  • Reward = 1 if solved, reward = 0 if not solved
  • Possible solution: move randomly until the maze is solved
  • Is that a good strategy? No!
  • We never told the AI that it needs to solve the maze efficiently (we always get the reward in the end)
  • What about this: a reward of -1 for every step taken
  • In order to maximize total reward, the agent must minimize the number of steps
  • Note: a reward is always a real number
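The effect of the step penalty can be sketched with a toy maze (a 1-D corridor is an assumption here, used in place of a full maze for brevity) and a random agent:

```python
import random

def run_episode(length=5, step_reward=-1.0, goal_reward=0.0, max_steps=1000):
    """Random agent on a 1-D corridor: start at position 0, goal at `length`.
    Actions: move left or right at random. Returns (steps, total_reward)."""
    pos, total, steps = 0, 0.0, 0
    while pos < length and steps < max_steps:
        pos = max(0, pos + random.choice([-1, 1]))  # wall at position 0
        steps += 1
        total += step_reward  # -1 per step: wandering is now costly
    if pos >= length:
        total += goal_reward
    return steps, total

random.seed(0)
steps, total = run_episode()
# With step_reward = -1 and goal_reward = 0, total reward = -steps,
# so maximizing total reward is exactly minimizing the step count.
```

Under the original design (reward 1 at the goal, 0 otherwise) this random policy eventually earns the full reward, so nothing pushes the agent toward efficiency; the -1-per-step design makes shorter solutions strictly better.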

Terms

So far: agent, environment, state, reward

Next : actions

Actions are what an agent does in its environment. (Ex. agent = a 2-D video game character; actions = {up, down, left, right, jump}.) We look at finite sets of actions only.

SAR Triples

We often think about (state, action, reward) together as a triple, denoted (S, A, R).

Timing

  • Timing is important in RL
  • Every game is a sequence of states, actions, and rewards
  • Convention: start in state S(t), take action A(t), receive reward R(t+1)
  • A reward always results from the (s, a) pair taken at the previous time step
  • Taking A(t) in S(t) also brings us to a new state, S(t+1)
  • This also makes a triple: [S(t), A(t), S(t+1)]
  • Also denoted as (s, a, s')
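This timing convention can be sketched as an episode loop that collects transitions; the environment here is a made-up toy (states 0..3 on a line), not part of any particular library:

```python
import random

def play_episode(env_step, start_state, policy, max_steps=100):
    """Collect (S_t, A_t, R_{t+1}, S_{t+1}) transitions following the
    convention that the reward for (S_t, A_t) arrives at time t+1."""
    transitions = []
    s = start_state
    for _ in range(max_steps):
        a = policy(s)                 # A_t chosen from S_t
        s_next, r, done = env_step(s, a)  # environment returns R_{t+1}, S_{t+1}
        transitions.append((s, a, r, s_next))
        s = s_next
        if done:
            break
    return transitions

# Toy environment (an assumption for illustration): states 0..3,
# action -1/+1 moves along the line, reward 1 on reaching state 3.
def env_step(s, a):
    s_next = min(3, max(0, s + a))
    done = s_next == 3
    return s_next, (1.0 if done else 0.0), done

random.seed(1)
trajectory = play_episode(env_step, 0, lambda s: random.choice([-1, 1]))
```

Each element of `trajectory` is one (s, a, r, s') tuple, matching the (S(t), A(t), R(t+1), S(t+1)) convention above.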

Summary

  • Program the agent to be intelligent
  • The agent interacts with its environment by being in a state and taking an action based on that state, which brings it to a new state
  • The environment gives the agent a reward, which can be positive or negative (but must be a real number)
  • The reward is received in the next state

