for Robot Artificial Intelligence

1. Reinforcement Learning Introduction


What is Reinforcement Learning?

There are three big parts in AI:

  • Supervised Learning (e.g. spam detection, image classification)
  • Unsupervised Learning (e.g. topic modeling of web pages, clustering genetic sequences)
  • Reinforcement Learning (e.g. Tic-Tac-Toe, Go, Chess, walking, Super Mario, Doom, StarCraft)

1. Supervised/Unsupervised Interfaces

  • as we know from ML theory, both kinds of model share a simple function-like interface:
    class SupervisedModel:
        def fit(self, X, y): ...      # learn from inputs X and targets y
        def predict(self, X): ...     # predict targets for new inputs

    class UnsupervisedModel:
        def fit(self, X): ...         # learn structure from inputs X alone
        def transform(self, X): ...   # e.g. cluster assignments
    
  • common theme is “training data”
  • input data: X (an N x D matrix)
  • Targets: Y (an N x 1 vector)
  • “all data is the same”
  • Format doesn’t change whether you’re in biology, finance, economics, etc.
  • fits neatly into one library: scikit-learn (see the sketch after this list)
  • “simplistic”, but still useful:
    • Face Detection
    • Speech Recognition
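A minimal sketch of that shared interface, assuming scikit-learn and NumPy are installed; the model choices and toy data are illustrative only:

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.linear_model import LogisticRegression

    X = np.random.randn(100, 5)              # input data: N x D matrix
    y = np.random.randint(0, 2, size=100)    # targets: N x 1 vector

    clf = LogisticRegression()
    clf.fit(X, y)                            # supervised: learns from inputs + targets
    preds = clf.predict(X)                   # maps inputs to predicted targets

    km = KMeans(n_clusters=3, n_init=10)
    km.fit(X)                                # unsupervised: learns from inputs alone
    clusters = km.predict(X)                 # e.g. cluster assignments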

2. Reinforcement Learning

  • Not just a static table of data
  • An agent interacts with the world (environment)
  • The environment can be simulated or real (e.g. a vacuum robot)
  • Data comes from sensors
    • cameras, microphones, GPS, accelerometers
  • Continuous stream of data
    • must consider past and future
  • An RL agent is a “thing” with a lifetime (see the loop sketched after this list)
    • at each step, it decides what to do
  • An (un)supervised model is just a static object: input -> output
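To make the contrast concrete, here is a minimal sketch of the agent-environment loop; the toy Environment class and its reset/step methods are illustrative assumptions, not the API of any specific library:

    import random

    class Environment:                              # toy one-dimensional world
        def reset(self):
            self.state = 0
            return self.state

        def step(self, action):
            self.state += action                    # toy dynamics
            reward = 1 if self.state >= 5 else 0    # feedback comes from the environment
            done = self.state >= 5                  # the episode ends at state 5
            return self.state, reward, done

    env = Environment()
    state = env.reset()
    done = False
    while not done:                                 # the agent's "lifetime"
        action = random.choice([0, 1])              # at each step, decide what to do
        state, reward, done = env.step(action)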

3. Isn’t It Still Supervised Learning?

  • X can represent the state I’m in, Y the target (the ideal action to perform in that state)
    • state = sensor recordings from a self-driving car
    • state = video game screenshot
    • state = chess board position
  • In a sense, yes - but consider Go: $ N = 8 \times 10^{100} $
  • ImageNet, the image classification benchmark, has $ N = 10^6 $ images
    • Go is 94 orders of magnitude larger
    • training on ImageNet takes ~1 day with good hardware
  • 1 order of magnitude larger -> 10 days
  • 2 orders of magnitude larger -> 100 days (the arithmetic is spelled out below)
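Spelling out that scaling argument - a rough extrapolation which assumes training time grows linearly with dataset size:

$$ \frac{8 \times 10^{100}}{10^6} \approx 10^{94}, \qquad 1 \text{ day} \times 10^{94} = 10^{94} \text{ days} $$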

4. Rewards

  • sometimes you’ll see references to psychology; RL has been used to model animal behavior
  • An RL agent’s goal is in the future
    • in contrast, a supervised model simply tries to get good accuracy / minimize cost on the current input
  • Feedback signals (rewards) come from the environment (i.e. the agent experiences them)

5. Rewards vs Targets

  • you might think of supervised targets/labels as something like rewards, but these handmade labels are coded by humans - they do not come from the environment
  • Supervised inputs/targets are just database tables
  • A supervised model instantly knows if it is right or wrong, because inputs + targets are provided simultaneously
  • RL is dynamic - an agent in a maze only knows its decisions were correct once it eventually solves the maze (see the sparse-reward sketch below)
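A minimal sketch of that delayed feedback, assuming a sparse reward that is zero everywhere except at the goal (the 4x4 grid world below is purely illustrative):

    import random

    GOAL = (3, 3)                                   # goal cell of a 4x4 maze

    def step(pos, action):
        moves = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}
        dx, dy = moves[action]
        x = min(max(pos[0] + dx, 0), 3)             # stay inside the grid
        y = min(max(pos[1] + dy, 0), 3)
        new_pos = (x, y)
        reward = 1 if new_pos == GOAL else 0        # sparse: no signal until the maze is solved
        return new_pos, reward

    pos, steps = (0, 0), 0
    while pos != GOAL:                              # many decisions with zero feedback along the way
        pos, reward = step(pos, random.choice(["up", "down", "left", "right"]))
        steps += 1
    print(f"solved after {steps} steps; the only nonzero reward arrived at the end")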

6. On Unusual or Unexpected Strategies of RL

  • The goal of AlphaGo is to win at Go, and the goal of a video game agent is a high score / to live as long as possible
  • What is the goal of an animal/human?
  • Evolutionary psychologists believe in the “selfish gene” theory
    • Richard Dawkins - The Selfish Gene
  • Genes simply want to make more of themselves
  • We humans (conscious living beings) are totally unaware of this
  • we can’t ask our genes how they feel
  • we are simply vessels for our genes’ proliferation
  • is consciousness just an illusion?
  • Disconnect between what we think we want vs “true goal”

  • Like AlphaGo, we’ve found roundabout and unlikely ways of achieving our goal
  • The action taken doesn’t necessarily have an obvious/explicit relationship to the goal
  • we might desire riches/money - but why? Maybe natural selection favored it, or wealth leads to better health and social status; there are no laws of physics which connect riches to gene replication
  • it’s a novel solution to the problem
  • AI can also find such strange or unusual ways to achieve a goal
  • we can replace “getting rich” with any trait we want
    • being healthy and strong
    • having strong analytical skills
  • That’s a sociologist’s job
  • Our interest lies in the fact that there are multiple novel strategies for achieving the same goal (gene replication)
  • What is considered a good strategy can fluctuate
  • Ex. Sugar:
    • our brains run on sugar; it gives us energy
    • today, excess sugar causes disease and death
  • Thus, a Strategy that seems good right now may not be globally optimal

7. Speed of Learning and Adaptation
