
25. Custom Exceptions


Example 1 (Basic)

#include <iostream>
#include <string>
using namespace std;

void mightGoWrong() {

	bool error1 = false;
	bool error2 = true;

	if(error1) {
		throw "Something went wrong.";
	}

	if(error2) {
		//throw string("Something else went wrong.");
		throw 4; // throw an int error code
	}

}

void usesMightGoWrong() {
	mightGoWrong();
}



int main() {

	try {
		usesMightGoWrong();
	}
	catch(int e) { // the thrown int is caught here and bound to e
		cout << "Error code: " << e << endl;
	}
	catch(char const * e) { // string literals like "Something went wrong." are thrown as char const*
		cout << "Error message: " << e << endl;
	}
	catch(string &e) { // catch by reference to avoid copying the string
	    // note: if a second exception is thrown while one is already active
	    // (e.g. from a destructor during stack unwinding), the program terminates immediately
		cout << "string error message: " << e << endl;
	}

	cout << "Still running" << endl;

	return 0;
}

Result

Example 2 (Custom Exception)

#include <iostream>
#include <exception>

using namespace std;

class MyException: public exception {
    // ": public exception" means MyException inherits publicly from std::exception
public:
	virtual const char* what() const throw() {
	// throw() declares that this function will not throw an exception;
	// modern C++ spells this "noexcept", and it lets exception handlers
	// call what() safely
		return "Something bad happened!";
	}
};

class Test {
public:
	void goesWrong() {
		throw MyException();
	}
};

int main() {

	Test test;

	try {
		test.goesWrong();
	}
	catch(MyException &e) {
		cout << e.what() << endl;
	}


	return 0;
}

Result
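
For real programs it is often more convenient to carry a runtime message inside the exception. Below is a minimal sketch (not part of the original lesson; the ParseError class is hypothetical) that derives from std::runtime_error, whose what() is already declared noexcept, so we don't have to override it:

#include <iostream>
#include <stdexcept>
#include <string>

using namespace std;

// hypothetical custom exception carrying a message built at throw time
class ParseError : public runtime_error {
public:
    explicit ParseError(const string& detail)
        : runtime_error("parse error: " + detail) {} // what() will return this text
};

int main() {
    try {
        throw ParseError("unexpected token at line 3");
    }
    catch(exception &e) { // the base-class handler also catches ParseError
        cout << e.what() << endl;
    }
    return 0;
}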

Example 3 (Standard Exceptions)

  • bad_alloc is available here through <iostream>; it is officially declared in the <new> header
#include <iostream>
using namespace std;

class CanGoWrong {
public:
	CanGoWrong() {
		char *pMemory = new char[999999999999999]; // deliberately request far more memory than is available
		// new char[n] allocates n bytes
		delete[] pMemory;
	}
};

int main() {

	try {
		CanGoWrong wrong;
	}
	catch(bad_alloc &e) {
	    // bad_alloc is the exception operator new throws when allocation fails
	    // (the class name follows the standard library's lowercase_with_underscores convention)
		cout << "Caught exception: " << e.what() << endl;
	}

	cout << "Still running" << endl;

	return 0;
}
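
As a side note, there is also a non-throwing form of new. This sketch (ours, not from the lesson) reports failure with a null pointer instead of a bad_alloc exception:

#include <iostream>
#include <new> // std::nothrow

using namespace std;

int main() {
    // nothrow new returns nullptr on failure instead of throwing bad_alloc
    char *pMemory = new (nothrow) char[999999999999999];

    if (pMemory == nullptr)
        cout << "Allocation failed, no exception thrown" << endl;
    else
        delete[] pMemory;

    return 0;
}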

Example 4 (Catching Subclass Exceptions)

#include <iostream>
#include <exception>
using namespace std;

void goeswrong (){
    bool error1Detected = true;
    bool error2Detected = false;

    if(error1Detected){
        throw bad_alloc();
    }
    if(error2Detected){
        throw exception();
    }

}
int main(){
    try{
        goeswrong();
    }
    catch(bad_alloc &e){ // the more derived class must be caught first,
        cout<<e.what()<<endl; // otherwise the base-class handler below would swallow it
    }
    catch(exception &e){ // bad_alloc derives from exception, so this handler
        cout<<e.what()<<endl; // would also match; e.what() returns a description of the exception
    }


    return 0;

}

Comment  Read more

7. The Value Function


The Value Function

1. Assigning Rewards

  • the programmers are like "coaches" to the AI
  • a pet owner is a good analogy
  • we define how to give rewards to the agent (Ex. give the same reward no matter what the agent does: the agent will just behave randomly) (Ex2. give our dog a treat when it behaves badly: this encourages the bad behavior)

2. Maze Example

  • Robot trying to solve a maze
  • Reward of 1 for finding the maze exit, else 0
  • But the robot is unlikely to solve the maze with this structure
  • if all the robot ever sees is a reward of 0, then 0 looks like the maximum achievable reward
  • better: every step yields a reward of -1
  • Now the robot is encouraged to solve the maze as quickly as possible

3. What to Reward (Chess Example)

  • be careful not to build our own prior knowledge into the AI (Ex. chess)
  • the agent should be rewarded for winning, not for taking the opponent's pieces
  • no reward for implementing a strategy we read about in a chess book
  • free the agent to find its own solution
  • it's OK to lose all but one piece and then win
  • tell the agent what we want it to achieve, not how we want it to be achieved

4. Planning

  • Scenario: we are thinking about studying for tomorrow's exam, but we would rather hang out with friends.
    • hang out with friends -> dopamine hit -> happy
    • study -> feel tired and bored
    • so why study?
  • because we don't think only about immediate rewards, but about future rewards too. we want to assign some value to the current state that reflects the future as well. we call this the "VALUE FUNCTION"

5. Credit Assignment Problem

  • we receive a reward: getting hired for our dream job
  • which previous action led to this success?
  • this is called the "Credit Assignment Problem"
  • ask the question: "What did I do in the past that led to the reward I'm receiving now?"
  • which action gets the credit?

6. Attribution

  • Related to the online advertising concept of attribution
  • if we show the user the same ad 10 times before they buy, which ad impression gets the credit?
  • In RL we don't just assign credit ad hoc like this

7. Delayed Rewards

  • Delayed rewards: another way of thinking about the same thing
  • Credit assignment looks backward: Present -> Past
  • Delayed rewards look in the other direction: Present -> Future
  • Related to the field known as "planning"
  • the idea of delayed rewards tells us that an AI needs the ability of foresight, or planning; planning is a field of study that crosses over with reinforcement learning

8. Scenario

  • 2 possible next states from A: B or C
  • 50% probability of ending up in either
  • reasonable value for A? assume V(B) = 1 and V(C) = 0
  • Value(A) = 0.5 x 1 + 0.5 x 0 = 0.5

  • 1 possible next state from A: B
  • 100% probability of ending up in B
  • reasonable value for A?
  • Value(A) = 1 x 1 = 1
  • Value tells us the "future goodness" of a state

9. Value Function

  • V(s): the value of a state, taking into account the probabilities of all possible future rewards
  • Value is a measure of the possible future rewards we may get from being in this state
  • reward is immediate (Ex. jumping on a Goomba will immediately increase our score) (Ex2. standing in front of a Goomba will not increase our score, but puts us in a position to jump in the next few states)
  • Estimating the value function is a central task in RL
  • Not all RL algorithms require it (Ex. an evolutionary algorithm that mutates & spawns offspring; only those who survive the longest make it to the next generation)
  • by pure evolution + natural selection we can breed better and better agents
  • but that's not the type of algorithm we are interested in for RL most of the time

10. Efficiency

  • the value function is a fast & efficient way of searching the game tree
  • it is time-consuming to enumerate every possible state transition and its probability of occurring
  • tic-tac-toe: 3^(3*3) = 19,683 states (upper bound)
  • Connect 4 (even on a small 4x4 board): 3^(4*4) is about 43 million
  • Exponential growth is never good
  • Curse of dimensionality
  • V(s) gives the answer instantly, an O(1) lookup (a quick counting sketch follows this list)
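
As a sanity check on those counts, a small sketch (ours, not from the lecture) that computes the 3^N upper bound on board states:

#include <cstdint>
#include <iostream>

using namespace std;

// each of the N cells is empty, X, or O: 3 choices per cell, so 3^N boards
uint64_t statesUpperBound(int cells) {
    uint64_t total = 1;
    for (int i = 0; i < cells; i++)
        total *= 3;
    return total;
}

int main() {
    cout << statesUpperBound(9)  << endl; // tic-tac-toe (3x3): 19683
    cout << statesUpperBound(16) << endl; // 4x4 board: 43046721
    return 0;
}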

11. Finding V(s)

  • V(s) = E[ all future rewards | S(t) = s ]
  • E[X] means the expected value (the average) of X
  • this definition is too generic to compute directly
  • so we need to introduce some constraints
  • iterative algorithm
  • initialize V(s):
    • V(s) = 1 if s = winning state
    • V(s) = 0 if s = losing or drawn state
    • V(s) = 0.5 otherwise
  • when we study "real" algorithms, we won't need such careful initialization
  • V(s) can be interpreted as the probability of winning after arriving in s (for this game only), i.e. V(s) is the winning rate
  • After we initialize V(s), we update it as follows
  • s = current state, s' = next state
  • s represents every state we encounter in an episode
  • this means we need to actually play an episode and keep track of the state history
  • the terminal state is never updated since it doesn't have a next state
  • we'll do this over many episodes

pseudocode

for t in range(max_iterations):
  state_history = play_game()
  for (s, s') in state_history from end to start:
    V(s) = V(s) + learning_rate * (V(s') - V(s))
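
The same update written as a runnable C++ sketch; the string state keys and the idea that some play_game() elsewhere returns the episode's state history are our assumptions for illustration:

#include <string>
#include <unordered_map>
#include <vector>

using namespace std;

// V maps an encoded board state (e.g. a serialized string) to its value;
// terminal states must be pre-seeded with 0 or 1 before training,
// all other unseen states fall back to the 0.5 initialization
using Value = unordered_map<string, double>;

double getV(const Value &V, const string &s) {
    auto it = V.find(s);
    return (it != V.end()) ? it->second : 0.5;
}

// one backward pass over an episode: V(s) += lr * (V(s') - V(s));
// walking from the end means V(s') is already more accurate than V(s)
void updateFromEpisode(Value &V, const vector<string> &history, double lr) {
    for (int i = (int)history.size() - 2; i >= 0; i--) {
        const string &s      = history[i];
        const string &sPrime = history[i + 1];
        V[s] = getV(V, s) + lr * (getV(V, sPrime) - getV(V, s));
    }
}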

12. Playing the Game

  • how do we actually play the game / generate an episode?
  • Take random actions? No!
  • we don't need to: we have the value function (pseudocode below, and a C++ sketch after the note)
maxV = 0
maxA = None
for a, s' in possible_next_states:
  if V(s') > maxV:
    maxV = V(s') # we picked the best value, so this s' becomes the next state
    maxA = a
perform action maxA

Note: would taking random actions even give us the right value function? No, because a game tree with random actions has different probabilities than a game tree with "best" actions.
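
A C++ sketch of the greedy selection above; the int action ids and the (action, next state) pair list are our assumptions, and unseen states fall back to the 0.5 initialization:

#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using namespace std;

using Value = unordered_map<string, double>;
using Moves = vector<pair<int, string>>; // (action id, resulting state)

int greedyAction(const Value &V, const Moves &moves) {
    int maxA = -1;      // sentinel: no action chosen yet
    double maxV = -1.0; // below any possible V(s'), so some action is always picked
    for (size_t i = 0; i < moves.size(); i++) {
        auto it = V.find(moves[i].second);
        double v = (it != V.end()) ? it->second : 0.5; // default for unseen states
        if (v > maxV) {
            maxV = v;
            maxA = moves[i].first;
        }
    }
    return maxA; // the game loop then performs this action
}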

13. Problem

  • Problem with the previous approach: the value function isn't accurate
  • if we had the true value function, we wouldn't need to do any of this work
  • this is an example of the explore-exploit dilemma
  • Random actions lead us to states we may not have otherwise visited
  • we can thus improve our value estimates for those states
  • but to win, we need to take the action that yields maximum value
  • we will use epsilon-greedy (a minimal sketch follows this list)
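
A minimal epsilon-greedy sketch using the same assumed types as the greedy one above: with probability epsilon we explore with a random legal action, otherwise we exploit the value function:

#include <random>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using namespace std;

using Value = unordered_map<string, double>;
using Moves = vector<pair<int, string>>; // (action id, resulting state)

int epsilonGreedy(const Value &V, const Moves &moves,
                  double epsilon, mt19937 &rng) {
    uniform_real_distribution<double> coin(0.0, 1.0);
    if (coin(rng) < epsilon) {
        // explore: pick any legal action uniformly at random
        uniform_int_distribution<size_t> pick(0, moves.size() - 1);
        return moves[pick(rng)].first;
    }
    // exploit: the action leading to the highest-valued next state
    int bestA = -1;
    double bestV = -1.0;
    for (size_t i = 0; i < moves.size(); i++) {
        auto it = V.find(moves[i].second);
        double v = (it != V.end()) ? it->second : 0.5;
        if (v > bestV) {
            bestV = v;
            bestA = moves[i].first;
        }
    }
    return bestA;
}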

14. Intuition

  • this should remind you of the low-pass-filter / average-value-finding equation we saw earlier (plus gradient descent, if we've seen that)
  • since we visit the states stochastically, V(s) will try to get close to V(s') for all possible next states s'
  • by playing infinitely many episodes, the proportion of time we spend in each s' will approach the true probabilities
  • Extremely important detail, hard to discern from the update equation alone:
  • in what order do we update V(s)?
  • Key: we're moving V(s) closer to V(s')
  • Therefore we want V(s') to be more accurate than V(s)
  • V(terminal) is fixed at 0 or 1
  • For all others, if V(s') is no better a guess than V(s), this update doesn't help
  • therefore, the update goes backwards along the state history

15. Summary

  • Credit assignment problem / delayed rewards
  • Value function for representing future reward
  • value function efficiency vs. searching the game tree
  • iterative algorithm to find the value function
  • warning! not the "formal" value function
  • Everything in the tic-tac-toe section is informal, designed to get us acquainted with solving an RL problem

Reference:

Artificial Intelligence Reinforcement Learning

Advance AI : Deep-Reinforcement Learning

Cutting-Edge Deep-Reinforcement Learning


24. Mini Database


Mini Database Exercise

1. conio.h

  • console input/output header
  • i.e., the header for input and output on the console
  • functions:
    • kbhit: checks whether a key has been pressed
    • getch: reads a single key from the console without buffering or echo (the typed key is not shown on screen)
    • getche: reads a single key from the console, with echo
    • ungetch: pushes one character back into the keyboard buffer
    • cscanf: reads formatted input from the console
    • cputs: writes a string directly to the console
    • cprintf: writes a formatted string to the console
    • clrscr: clears the screen

2. cstdlib

  • rand() and srand() are declared in this header (time() itself comes from <ctime>); a minimal usage sketch follows
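
A minimal usage sketch (ours, not part of the exercise) of those functions:

#include <cstdlib>  // rand, srand
#include <ctime>    // time
#include <iostream>

using namespace std;

int main() {
    srand((unsigned)time(0)); // seed the generator once at startup
    int roll = rand() % 6 + 1; // pseudo-random number in the range 1..6
    cout << "Dice roll: " << roll << endl;
    return 0;
}

The exercise program itself:
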
#include <iostream>
#include <conio.h>
#include <cstdlib>
#include <fstream>

using namespace std;

struct Person
{
    string name;
    string surname;
    short age;
    string telephone;
};
short peopleInDataBase;
Person people[20]; // the database can hold up to 20 people

void requireEnter();

void addPerson();
void showPeople();
void savePeopleToFile();
void loadPeopleFromFile();
void searchDatabase();
void removePersonFromDatabase();

int main()
{
    char test;
    loadPeopleFromFile(); // load the database when the program starts

    do
    {
        cout << "Number of People in Database: " << peopleInDataBase << endl;
        cout << "MENU: " << endl;

        cout << "1. Add Person" << endl;
        cout << "2. Show All People" << endl;
        cout << "3. Save People to File" << endl;
        cout << "4. Load People from File" << endl;
        cout << "5. Search for a Person" << endl;
        cout << "6. Remove Person from DataBase" << endl;

        cout << endl;

        test = getch();

        switch(test)
        {
            case '1':
                addPerson();
                break;
            case '2':
                showPeople();
                break;
            case '3':
                savePeopleToFile();
                break;
            case '4':
                loadPeopleFromFile();
                break;
            case '5':
                searchDatabase();
                break;
            case '6':
                removePersonFromDatabase();
                break;
        }


        requireEnter();
        system("cls");
    }while(test != 27); // 27 is the ASCII code for Esc, so the menu loops until Esc is pressed


    return 0;
}
void requireEnter()
{
    cout << "Click Enter to continue... " << endl;
    while(getch() != 13); // 13 is the ASCII code for Enter; loop until Enter is pressed
}
void addPerson()
{
    if (peopleInDataBase >= 20) // the people array holds at most 20 entries
    {
        cout << "Database is full." << endl;
        return;
    }

    cout << "Type name: ";
    cin >> people[peopleInDataBase].name;

    cout << "Type surname: ";
    cin >> people[peopleInDataBase].surname;

    cout << "Type age: ";
    cin >> people[peopleInDataBase].age;

    cout << "Type telephone: ";
    cin >> people[peopleInDataBase].telephone;

    peopleInDataBase++;
}
void showPeople()
{
    if(peopleInDataBase > 0)
    {
        for (int i = 0; i < peopleInDataBase; i++)
        {
            cout << "Person index: " << (i+1) << endl;
            cout << "Name: " << people[i].name << endl;
            cout << "Surname: " << people[i].surname << endl;
            cout << "Age: " << people[i].age << endl;
            cout << "Telephone: " << people[i].telephone << endl << endl;
        }
    }
    else
        cout << "There is nobody in database."  << endl;
}
void savePeopleToFile()
{
    ofstream file("database.txt"); // ofstream opens the file for writing (output) only

    if (file.is_open())
    {
        file << peopleInDataBase << endl;

        for (int i = 0; i < peopleInDataBase; i++)
        {
            file << people[i].name << endl;
            file << people[i].surname << endl;
            file << people[i].age << endl;
            file << people[i].telephone << endl;
        }

        file.close();
    }
    else
        cout << "I couldnt save to database" << endl;

}
void loadPeopleFromFile()
{
    ifstream file("database.txt"); // ifstream opens the file for reading only

    if (file.is_open())
    {
        file >> peopleInDataBase;

        if (peopleInDataBase > 0)
        {
            /*for (int i = 0; i < peopleInDataBase; i++)
            {
                file >> people[i].name;
                file >> people[i].surname;
                file >> people[i].age;
                file >> people[i].telephone;
            }*/
            int i = 0;
            do
            {
                file >> people[i].name; // stream >> variable reads the next
                file >> people[i].surname; // whitespace-separated token into the field
                file >> people[i].age;
                file >> people[i].telephone;

                i++;
            }while(!file.eof());

            cout << "People has been loaded properly. " << endl;
        }
        else
            cout << "Databse is Empty" << endl;
    }
    else
        cout << "The file database.txt doesnt exist" << endl;
}
void searchDatabase()
{
    if (peopleInDataBase > 0)
    {
        string name;
        cout << "Type a name of person you want to look for: ";
        cin >> name;

        for (int i = 0; i < peopleInDataBase; i++)
        {
            if (name == people[i].name)
            {
                cout << "Person index: " << (i+1) << endl;
                cout << "Name: " << people[i].name << endl;
                cout << "Surname: " << people[i].surname << endl;
                cout << "Age: " << people[i].age << endl;
                cout << "Telephone: " << people[i].telephone << endl << endl;
            }
        }
    }
    else
        cout << "There is nobody in database what do you want to look for??!" << endl;


}
void removePersonFromDatabase()
{
    if (peopleInDataBase > 0)
    {
        short index;
        cout << "Who do you want to remove? Type index: " << endl;
        cin >> index;

        if (index >= 1 && index <= peopleInDataBase) // validate the 1-based index
        {
            for (short k = index; k < peopleInDataBase; k++) // shift everyone after the removed person down one slot
            {
                people[k-1].name = people[k].name; // the array is 0-based,
                people[k-1].surname = people[k].surname; // so user index 1 means people[0]
                people[k-1].age = people[k].age;
                people[k-1].telephone = people[k].telephone;
            }
            }

            peopleInDataBase--;
            savePeopleToFile();
        }
        else
            cout << "There is nobody like that" << endl;
    }
    else
        cout << "There is nothing to remove" << endl;
}

Result


23. Fstream(3)


Peek

  • peeks at the next character (i.e., reads it without extracting it from the stream); unlike get, it does not remove the character from the stream, so the character simply stays there

Example 1

#include <iostream>
#include <string> // needed for std::string below

// peek: inspect the next character without removing it
using namespace std;
int main ()
{
    char c = cin.peek();


    if (c >= '0' && c <= '9') // a digit character
    {
        int number;

        cin >> number;

        cout << "Number is: " << number << endl;
    }
    else
    {
        string txt;

        cin >> txt;

        cout << "Text is: " << txt << endl;
    }

    return 0;
}

Result

Put

  • the output file is reset (truncated) each time it is opened
  • we can keep writing characters until a '.' is typed
#include <iostream>
#include <fstream>

using namespace std;

int main()
{
    /*
        put(character to put on stream)
    */

    string txt = "thistextisconnected";


    for (int i = 0; i < txt.length(); i++)
    {
        cout.put(txt[i]).put(' ');
    }

    fstream file;

    file.open("test.txt", ios::out | ios::binary);
    if (file.is_open())
    {
        char c;
        do
        {
            c = cin.get();

            file.put(c);
        }while(c!='.');
    }
    else
        cout << "Nie udalo sie poprawnie otworzyc pliku" << endl;


    return 0;
}

Result

Put back

#include <iostream>
#include <string> // needed for std::string below

// putback: push a character back onto the stream after reading it
using namespace std;
int main ()
{
    char c = cin.get();

    cin.putback(c);
    if (c >= '0' && c <= '9') // a digit character
    {
        int number;

        cin >> number;

        cout << "Number is: " << number << endl;
    }
    else
    {
        string txt;

        cin >> txt;

        cout << "Text is: " << txt << endl;
    }

    return 0;
}


Result

Write: writing bytes to a file

#include <iostream>
#include <fstream>

using namespace std;

int main()
{
    /*
        write
    */

    fstream file;

    file.open("sample.txt", ios::out | ios::binary);

    if (file.is_open())
    {
        char sample[] = "hello, this is sample file";

        cout << sizeof(sample) << endl;

        file.write(sample, sizeof(sample)-1 ); // '-1' because sizeof(sample) counts the terminating '\0'
        // that ends every C string literal, and we don't want to write that byte to the file
    }
    else
        cout << "I couldnt open the file" << endl;



    return 0;
}


6. Build an Intelligence


1. Naive Solution to Tic-Tac-Toe

  • Small set of rules to ensure we never lose
  • we could enumerate the rules of Tic-Tac-Toe, but we won't; still, consider what they might be (Ex. if the board is empty and it's our turn, place our piece in the center or a corner) (Ex2. if the opponent has 2 pieces in a row, block the 3rd position so they don't win) (Ex3. if we have 2 pieces in a row, add a 3rd so we win the game)
  • the code would look like this:
    if (...)
    {...}
    else if (...)
    {...}
    else if (...)
    {...}
    
  • Goes against the whole idea of ML
  • We want one algorithm that can generalize to many problems (e.g. a neural network that can classify pictures of animals can also classify music)
  • An agent made up of if statements will never be able to do anything other than play tic-tac-toe
  • Important: we have a model for how the game works. we know everything about tic-tac-toe
  • We know which states yield the highest reward (any state where we have 3 pieces in a row)
  • i.e., given the current state and an action, we know what the next state will be (we can predict it)
  • Seems trivial, but it is not always the case

Model

  • the idea of having a model of the environment will come into play later
  • some algorithms we'll learn will require us to have a model, as in tic-tac-toe
  • other algorithms won't require us to know anything; we'll just explore the world and consume information as we go along

2. Components of a RL systems

State

  • Note that state involves only what the agent can sense, not everything about the environment (Ex. a vacuum robot in Australia won't be affected by something happening in India)

Actions and Rewards

  • Actions:
    • Things the agent can do that will affect its state; in tic-tac-toe, that's placing a piece on the board (as a player)
    • Performing an action always brings us to the next state, which also comes with a possible reward
  • Rewards:
    • Rewards tell us "how good" our action was, not whether it was a correct/incorrect action
    • they don't tell us the best/worst action
    • a reward is just a number
    • the rewards we've gotten over the course of our existence don't necessarily represent the possible rewards we could get in the future (e.g. search a bad part of the state space, hit a local max of 10 pts while the global max is 1000 pts)
    • the agent doesn't know that
    • Much like life as an animal/human being
    • Rewards are only meaningful relative to each other

Funny Notation

  • S(t), A(t) -> R(t+1), S(t+1)
  • sometimes represented as the 4-tuple: (s, a, r, s')
  • oddly, the "prime" symbol doesn't strictly mean "at time t+1"
  • Instead:
    • s' = the state we go to when doing "a" from state "s"
    • r = the reward we get when we do "a" while in state "s" (a small struct sketch follows this list)
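
The 4-tuple maps naturally onto a struct; this is our own illustration, and the field types are assumptions, not anything from the lectures:

#include <string>

// one experience sample, matching the (s, a, r, s') notation above
struct Transition {
    std::string state;     // s : the state we acted from
    int action;            // a : the action taken in s
    double reward;         // r : the reward received for doing a in s
    std::string nextState; // s': the state we ended up in
};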

Episode

  • An episode represents one run of the game (Ex. start tic-tac-toe with an empty board)
  • As soon as one player gets 3 pieces in a row, that's the end of the episode
  • our RL agent will learn across many episodes (Ex. after playing 1,000, 10,000 or 100,000 episodes, we can possibly have trained an intelligent agent)
  • the number of episodes we use to train is a hyperparameter
  • playing tic-tac-toe is an episodic task because we play it again and again
  • Different from a continuous task, which never ends
  • when is the end of an episode?
  • Certain states in the state space tell us when the episode is over
  • There are states from which no more actions can be taken
  • They are called terminal states
  • For tic-tac-toe:
    • one player gets 3 in a row
    • the board is full (draw)

3. Other Games(Cart-pole/inverted pendulum/Groundhog Day)

  • Control systems call it the inverted pendulum
  • RL calls it cart-pole
  • An unstable system
  • The episode starts with the pole vertical; it soon falls
  • Agent: move the cart to keep the pole within a certain angle
  • Terminal: any angle past the point of no return
  • The angle requires a continuous state space
  • an infinite number of states
  • being able to learn from multiple episodes is like the movie Groundhog Day
  • The main character wakes up and lives through the same day, every day
  • This allows him to get better and better at making the most out of the day
  • An RL agent may suck at first, but will gradually learn with each passing episode; each episode is a fresh start

4. Summary

  • what have we learned so far?
  • Agent
  • environment
  • State, action, reward
  • Episodes
  • Terminal States

Reference:

Artificial Intelligence Reinforcement Learning

Advance AI : Deep-Reinforcement Learning

Cutting-Edge Deep-Reinforcement Learning
