for Robot Artificial Intelligence

17. OpenAI Gym Tutorial


  • Tutorial on the basics of OpenAI Gym
  • Install gym: pip install gym
  • What we’ll do:
    • Connect to an environment
    • Play an episode with purely random actions
  • Purpose: Familiarize ourselves with the API
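
As a quick preview, here is a minimal sketch of the random-episode loop we will build up step by step in the sections below (assuming the classic gym API, where step returns a 4-tuple):

import gym

env = gym.make('CartPole-v0')
observation = env.reset()   # put ourselves in the start state
done = False
while not done:
    action = env.action_space.sample()                  # purely random action
    observation, reward, done, info = env.step(action)  # advance one timestep
env.close()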

Import Gym

  • First things first:
    import gym
    

Get our environment

env = gym.make('CartPole-v0')
  • Full list: https://gym.openai.com/envs
  • has short blurbs (brief descriptions) covering:
    • what the task is
    • whether it’s been solved
    • leaderboard
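
You can also list every environment registered in your installed version from code. A small sketch (registry.all() is the interface in classic gym releases; newer versions expose the registry differently):

from gym import envs

# print the id of every registered environment
for spec in envs.registry.all():
    print(spec.id)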

Start an episode

# put ourselves in the start state
# it also returns the state
env.reset()
# out: array([-0.04533, -0.032314, -0.0146921, 0.04151])

What is the state?

  • what do these numbers mean?
  • https://github.com/openai/gym/wiki/CartPole-v0
  • Documentation is somewhat sparse

In the console

box = env.observation_space
# In : box
# Out : Box(4,)
# In : box.<TAB>
# box.contains  box.from_jsonable  box.high  box.low  box.sample  box.shape  box.to_jsonable
  • The observation_space defines the structure of the observations our environment will be returning. Learning agents usually need to know this before they start running, in order to set up the policy function. Some general-purpose learning agents can handle a wide range of observation types: Discrete, Box, or pixels (which is usually a Box(0, 255, [height, width, 3]) for RGB pixels).
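
For example, a general-purpose agent might inspect the type of the space before deciding how to represent states. A minimal sketch (the branching logic is illustrative, not from the tutorial):

from gym.spaces import Box, Discrete

space = env.observation_space
if isinstance(space, Discrete):
    # tabular case: a finite number of states
    print('discrete state space with', space.n, 'states')
elif isinstance(space, Box):
    # continuous case: a vector (or image) of real values
    print('continuous state space with shape', space.shape)
    print('low:', space.low)
    print('high:', space.high)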

Action Space

env.action_space
# In : env.action_space
# Out : Discrete(2)
# In : env.action_space.<TAB>
# env.action_space.contains  env.action_space.from_jsonable  env.action_space.n  env.action_space.sample  env.action_space.to_jsonable
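
A short sketch of how the action space is typically used: sample a valid action and check membership.

action = env.action_space.sample()          # a random valid action
print(action)                               # 0 or 1 for CartPole
print(env.action_space.contains(action))    # True
print(env.action_space.n)                   # 2 possible actions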

Play an episode

action = env.action_space.sample()  # pick an action first
observation, reward, done, info = env.step(action)
  • Typically we ignore info, since it can’t be used in a leaderboard submission (although it’s possible it can help during training)
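
To see what each returned value looks like, a quick sketch that prints the components of the tuple (the observation values shown are illustrative):

observation, reward, done, info = env.step(env.action_space.sample())
print(observation)  # e.g. array([-0.0459, 0.1620, -0.0138, -0.2517])  (illustrative)
print(reward)       # 1.0 for every step the pole stays up
print(done)         # True once the episode has ended
print(info)         # usually an empty dict for CartPole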

Finish an episode

done = False
while not done:
  observation, reward, done, _ = env.step(env.action_space.sample())
  • will end quickly since random actions can’t keep the pole up for long
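
If you want to watch the episode while it runs, you can render each frame inside the loop. A sketch (env.render() opens a window, so it needs a display):

env.reset()
done = False
while not done:
    env.render()  # draw the current frame
    observation, reward, done, _ = env.step(env.action_space.sample())
env.close()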

Exercise

  • Determine how many steps, on average, are taken when actions are randomly sampled (see the sketch below)
  • This can serve as a benchmark to compare later algorithms against
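One possible solution sketch (the episode_length helper and the choice of 100 episodes are my own, not from the course):

import gym
import numpy as np

def episode_length(env):
    # play one episode with random actions and count the steps
    env.reset()
    done = False
    steps = 0
    while not done:
        _, _, done, _ = env.step(env.action_space.sample())
        steps += 1
    return steps

env = gym.make('CartPole-v0')
lengths = [episode_length(env) for _ in range(100)]
print('average episode length:', np.mean(lengths))  # typically around 20-25 steps

The full script from the course follows: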
# https://deeplearningcourses.com/c/deep-reinforcement-learning-in-python
# https://www.udemy.com/deep-reinforcement-learning-in-python
import gym
# Wiki:
# https://github.com/openai/gym/wiki/CartPole-v0
# Environment page:
# https://gym.openai.com/envs/CartPole-v0

# get the environment
env = gym.make('CartPole-v0')

# put yourself in the start state
# it also returns the state
env.reset()
# Out[50]: array([-0.04533731, -0.03231478, -0.01469216,  0.04151   ])

# what do the state variables mean?
# Num Observation Min Max
# 0 Cart Position -2.4  2.4
# 1 Cart Velocity -Inf  Inf
# 2 Pole Angle  ~ -41.8°  ~ 41.8°
# 3 Pole Velocity At Tip  -Inf  Inf

box = env.observation_space

# In [53]: box
# Out[53]: Box(4,)

# In [54]: box.
# box.contains       box.high           box.sample         box.to_jsonable
# box.from_jsonable  box.low            box.shape

env.action_space

# In [71]: env.action_space
# Out[71]: Discrete(2)

# In [72]: env.action_space.
# env.action_space.contains       env.action_space.n              env.action_space.to_jsonable
# env.action_space.from_jsonable  env.action_space.sample

# pick an action
action = env.action_space.sample()

# do an action
observation, reward, done, info = env.step(action)


# run through an episode
done = False
while not done:
  observation, reward, done, _ = env.step(env.action_space.sample())
print(box.contains)  # <bound method Box.contains of Box(4,)>

# extra exploration: size of a table indexed by a crude discretization of the state
import numpy as np

num_states = 10 ** env.observation_space.shape[0]  # e.g. 10 bins per state variable -> 10^4 states
num_actions = env.action_space.n                   # 2 actions for CartPole

a = np.random.uniform(low=-1, high=1, size=(num_states, num_actions))
len(a)  # 10000
# sample 10,000 example states directly from the observation space
observation_examples = np.array([env.observation_space.sample() for x in range(10000)])
observation_examples.shape
# (10000, 4)
observation_examples
# array([[-9.7157693e-01, -7.4095410e+37, -4.1871965e-01,  6.7507521e+37],
#        [-3.7798457e+00, -2.5458868e+38,  2.8346911e-01,  3.4012554e+38],
#        [ 2.6570508e+00, -3.3872902e+38, -4.0225080e-01,  4.4559389e+37],
#        ...,
#        [-2.4882526e+00,  2.3085303e+38, -4.1779616e-01, -1.0087459e+38],
#        [-3.6637935e-01,  8.6948986e+37, -2.4446332e-01, -3.9084766e+37],
#        [ 2.8921950e+00,  3.1869669e+38, -2.1086818e-02, -8.7358295e+37]],
#       dtype=float32)
# note: the velocity dimensions are unbounded, so sampling from the Box produces
# astronomically large values -- not a useful way to generate realistic states
# instead, draw uniform samples in [-1, 1] for each of the 4 state variables
a = np.random.random((20000, 4)) * 2 - 1
a.shape  # (20000, 4)
a
# array([[-0.24646925, -0.71345292, -0.13534508,  0.56783749],
#        [ 0.94458905,  0.51363389,  0.16503288, -0.42810724],
#        [ 0.09917716, -0.44244101, -0.46836587, -0.04689816],
#        ...,
#        [-0.59025553, -0.39560999,  0.35734313,  0.43326763],
#        [ 0.03911455, -0.6548753 , -0.11969308,  0.91571025],
#        [-0.10430864,  0.15491529, -0.73753709, -0.05311987]])

Reference:

Artificial Intelligence Reinforcement Learning

Advanced AI: Deep Reinforcement Learning

Cutting-Edge Deep-Reinforcement Learning
