17. OpenAI Gym Tutorial
03 Oct 2019 | Reinforcement Learning
OpenAI Gym Tutorial
- Tutorial on the basics of OpenAI Gym
- install gym: pip install gym
- what we’ll do:
- Connect to an environment
- Play an episode with purely random actions
- Purpose: Familiarize ourselves with the API
Import Gym
- First things first:
import gym
Get our environment
env = gym.make('CartPole-v0')
- Full list: https://gym.openai.com/envs
- has short blurbs (brief descriptions)
- what the task is
- whether it’s been solved
- leaderboard
Start an episode
# put ourselves in the start state
# it also returns the state
env.reset()
# out: array([-0.04533, -0.032314, -0.0146921, 0.04151])
What is the state?
- what do these numbers mean?
- https://github.com/openai/gym/wiki/CartPole-v0
- Documentation is somewhat sparse
In the console
box = env.observation_space
# In : box
# Out : Box(4,)
# In : box.<TAB>
# box.contains, box.high, box.sample, box.to_jsonable, box.from_jsonable, box.low, box.shape
- The observation_space defines the structure of the observations our environment will be returning. Learning agents usually need to know this before they start running, in order to set up the policy function. Some general-purpose learning agents can handle a wide range of observation types: Discrete, Box, or pixels (which is usually a Box(0, 255, [height, width, 3]) for RGB pixels).
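As a quick sanity check (a minimal sketch, assuming the classic Gym API used throughout this tutorial), the Box bounds can be inspected directly and an observation can be tested for membership:

import gym

env = gym.make('CartPole-v0')
box = env.observation_space

print(box.shape)  # (4,)
print(box.low)    # per-dimension lower bounds
print(box.high)   # per-dimension upper bounds (the velocity dimensions are effectively unbounded)

observation = env.reset()
print(box.contains(observation))  # True: a reset state lies inside the observation space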
Action Space
env.action_space
# In : env.action_space
# Out : Discrete(2)
# In : env.action_space.<TAB>
# env.action_space.contains, env.action_space.n, env.action_space.to_jsonable, env.action_space.from_jsonable, env.action_space.sample
Play an episode
observation, reward, done, info = env.step(action)
- Typically we ignore info, since it can't be used in a leaderboard submission (although it's possible it can help training)
Finish an episode
done = False
while not done:
    observation, reward, done, _ = env.step(env.action_space.sample())
- will end quickly since random actions can’t keep the pole up for long
Exercise
- determine how many steps, on average, are taken when actions are randomly sampled (see the sketch below)
- can be a benchmark to compare later algorithms
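One possible solution sketch (an illustration only, assuming the classic Gym API where env.step returns four values):

import gym
import numpy as np

env = gym.make('CartPole-v0')

def episode_length(env):
    # play one episode with purely random actions and count the steps
    env.reset()
    done = False
    steps = 0
    while not done:
        _, _, done, _ = env.step(env.action_space.sample())
        steps += 1
    return steps

lengths = [episode_length(env) for _ in range(1000)]
print(np.mean(lengths))  # typically on the order of ~20 steps for random play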
# https://deeplearningcourses.com/c/deep-reinforcement-learning-in-python
# https://www.udemy.com/deep-reinforcement-learning-in-python
import gym
# Wiki:
# https://github.com/openai/gym/wiki/CartPole-v0
# Environment page:
# https://gym.openai.com/envs/CartPole-v0
# get the environment
env = gym.make('CartPole-v0')
# put yourself in the start state
# it also returns the state
env.reset()
# Out[50]: array([-0.04533731, -0.03231478, -0.01469216, 0.04151 ])
# what do the state variables mean?
# Num  Observation            Min      Max
# 0    Cart Position          -2.4     2.4
# 1    Cart Velocity          -Inf     Inf
# 2    Pole Angle             ~-41.8°  ~41.8°
# 3    Pole Velocity At Tip   -Inf     Inf
box = env.observation_space
# In [53]: box
# Out[53]: Box(4,)
# In [54]: box.
# box.contains box.high box.sample box.to_jsonable
# box.from_jsonable box.low box.shape
env.action_space
# In [71]: env.action_space
# Out[71]: Discrete(2)
# In [72]: env.action_space.
# env.action_space.contains env.action_space.n env.action_space.to_jsonable
# env.action_space.from_jsonable env.action_space.sample
# pick an action
action = env.action_space.sample()
# do an action
observation, reward, done, info = env.step(action)
# run through an episode
done = False
while not done:
    observation, reward, done, _ = env.step(env.action_space.sample())
print(box.contains)  # <bound method Box.contains of Box(4,)> -- note: not called here; box.contains(observation) returns a bool
import numpy as np

# 10**4 = 10000 (e.g. 10 bins for each of the 4 state components, if we discretize)
num_states = 10**env.observation_space.shape[0]
num_actions = env.action_space.n

# e.g. a randomly initialized table with one row per discrete state
a = np.random.uniform(low=-1, high=1, size=(num_states, num_actions))
len(a)  # 10000
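# note: the velocity bounds of CartPole's observation space are set to the
# float32 max (~3.4e38), so samples drawn from the space can be astronomically large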
observation_examples = np.array([env.observation_space.sample() for x in range(10000)])
observation_examples.shape
#(10000, 4)
observation_examples
array([[-9.7157693e-01, -7.4095410e+37, -4.1871965e-01, 6.7507521e+37],
[-3.7798457e+00, -2.5458868e+38, 2.8346911e-01, 3.4012554e+38],
[ 2.6570508e+00, -3.3872902e+38, -4.0225080e-01, 4.4559389e+37],
...,
[-2.4882526e+00, 2.3085303e+38, -4.1779616e-01, -1.0087459e+38],
[-3.6637935e-01, 8.6948986e+37, -2.4446332e-01, -3.9084766e+37],
[ 2.8921950e+00, 3.1869669e+38, -2.1086818e-02, -8.7358295e+37]],
dtype=float32)
# a bounded alternative: draw example states uniformly in [-1, 1)
a = np.random.random((20000, 4)) * 2 - 1
a.shape  # (20000, 4)
a
array([[-0.24646925, -0.71345292, -0.13534508, 0.56783749],
[ 0.94458905, 0.51363389, 0.16503288, -0.42810724],
[ 0.09917716, -0.44244101, -0.46836587, -0.04689816],
...,
[-0.59025553, -0.39560999, 0.35734313, 0.43326763],
[ 0.03911455, -0.6548753 , -0.11969308, 0.91571025],
[-0.10430864, 0.15491529, -0.73753709, -0.05311987]])
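To make the connection to the 10**4-row table explicit (an illustrative sketch only, not part of the original code): one way to use such a table is to bin each of the 4 state components into 10 buckets and combine the bucket indices into a single integer state index.

import gym
import numpy as np

env = gym.make('CartPole-v0')

# hand-picked, illustrative cut points for each state component
# (cart position, cart velocity, pole angle, pole tip velocity);
# 9 cut points per component -> 10 buckets each
bins = [
    np.linspace(-2.4, 2.4, 9),
    np.linspace(-2.0, 2.0, 9),
    np.linspace(-0.4, 0.4, 9),
    np.linspace(-3.5, 3.5, 9),
]

def state_index(observation):
    # np.digitize maps each component to a bucket id in 0..9;
    # treat the 4 bucket ids as the digits of a base-10 number
    digits = [np.digitize(x, b) for x, b in zip(observation, bins)]
    return int(sum(d * 10**i for i, d in enumerate(digits)))

observation = env.reset()
idx = state_index(observation)  # an integer in [0, 10**4)
# idx could then index a row of the (num_states, num_actions) table built above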
Reference:
Artificial Intelligence Reinforcement Learning