for Robot Artificial Intelligence

Multi-Agent Path Finding (MAPF) & Reduction-Based Solvers


REDUCTION-BASED SOLVERS

  • How can we exploit the knowledge of others to solve our own problems?
    • by translating the problem P to another problem Q (i.e., by changing the problem)
  • Why is it useful?
    • If anybody improves the solver for Q, then we get an improved solver for P for free
    • Standing on the shoulders of giants
  • Reduction, compilation, re-formulation techniques
  • Boolean satisfiability (SAT)
    • fast SAT solvers
  • Constraint programming
    • global constraints for pruning the search space
  • Answer set programming
    • declarative framework
  • Combinatorial auctions

1. Introduction to SAT

  • Express (model) the problem as a SAT formula in conjunctive normal form (CNF)
  • Boolean satisfiability (SAT):
    Boolean variables (true/false values)
    clause = a disjunction of literals (variables and negated variables)
    formula = a conjunction of clauses
    solution = an instantiation of the variables such that the formula is satisfied
    

Example:

(X or Y) and (not X or not Y)
[exactly one of X and Y is true]
  • The SAT model is expressed as a CNF formula (conjunctive normal form)
  • We can go beyond CNF and use abstract expressions that are translated to CNF
  • We can even use numerical variables (and constraints).
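To make the CNF machinery concrete, here is a small brute-force check of the example formula above (my own illustration, not part of the original notes; the signed-integer clause encoding simply mimics the DIMACS convention):

from itertools import product

# Formula: (X or Y) and (not X or not Y), encoded as a list of clauses.
# Each clause is a list of literals; a positive int i means variable i,
# a negative int -i means "not variable i".
clauses = [[1, 2], [-1, -2]]          # variables: 1 = X, 2 = Y
variables = [1, 2]

def satisfies(assignment, clauses):
    """assignment maps variable -> bool; a clause is satisfied if any of its literals is true."""
    return all(
        any(assignment[abs(l)] == (l > 0) for l in clause)
        for clause in clauses
    )

# Enumerate all assignments and print the satisfying ones.
for values in product([False, True], repeat=len(variables)):
    assignment = dict(zip(variables, values))
    if satisfies(assignment, clauses):
        print(assignment)   # exactly one of X and Y is true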

2. SAT encoding: core idea

  • In MAPF, we do not know the lengths of the plans in advance (due to possible re-visits of nodes)!
  • We can encode plans of a known length using a layered graph (a time-expanded graph).
  • Each layer corresponds to one time step and indicates the positions of the agents at that time
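As a purely illustrative sketch of the layered-graph idea, the following Python snippet assigns one Boolean variable to every (time, agent, vertex) triple; the variable-numbering scheme is my own assumption, not part of the original encoding:

def make_mapf_variables(num_layers, num_agents, num_vertices):
    """Map each (time, agent, vertex) triple to a distinct SAT variable id (1-based)."""
    var = {}
    next_id = 1
    for t in range(num_layers):            # one layer per time step
        for a in range(num_agents):
            for v in range(num_vertices):
                var[(t, a, v)] = next_id   # B[t,a,v] is true iff agent a is at vertex v at time t
                next_id += 1
    return var

# Example: 3 time layers, 2 agents, 4 vertices -> 24 Boolean variables.
var = make_mapf_variables(3, 2, 4)
print(len(var), var[(0, 0, 0)], var[(2, 1, 3)])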

3. SAT encoding with all-different

  • Uses multi-valued state variables (logarithmic encoding) encoding the position of each agent in each layer (∧ = AND, ∨ = OR)
  • Agent waits or moves to a neighbor
  • No-train constraint (an agent cannot immediately follow another agent into the node it is just leaving)
  • No two agents are at the same node

4. Direct SAT encoding

  • Directly encodes positions of agents in layers
  • Agent is placed at exactly one node in each layer (¬ denotes logical negation)
  • No two agents are placed at the same node in each layer
  • Agent waits or moves to a neighbor
  • No-swap and no-train (nodes before and after move are empty)
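The "exactly one vertex per agent" and "at most one agent per vertex" rules of the direct encoding can be emitted as CNF clauses roughly as follows (a hedged sketch reusing the var map from the earlier snippet; the pairwise at-most-one encoding is just one possible choice):

from itertools import combinations

def exactly_one(literals):
    """CNF clauses forcing exactly one of the given literals to be true."""
    clauses = [list(literals)]                                   # at least one
    clauses += [[-a, -b] for a, b in combinations(literals, 2)]  # at most one (pairwise)
    return clauses

def direct_encoding_clauses(var, num_layers, num_agents, num_vertices):
    """var: the (time, agent, vertex) -> variable-id map from the sketch above."""
    clauses = []
    for t in range(num_layers):
        # each agent occupies exactly one vertex in layer t
        for a in range(num_agents):
            clauses += exactly_one([var[(t, a, v)] for v in range(num_vertices)])
        # no two agents are placed at the same vertex in layer t
        for v in range(num_vertices):
            for a1, a2 in combinations(range(num_agents), 2):
                clauses.append([-var[(t, a1, v)], -var[(t, a2, v)]])
    return clauses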

5. Comparison of SAT encodings

Finding makespan-optimal solutions (in combination with Operator Decomposition and Independence Detection, OD+ID)

6. Mixed model

  • Uses a layered graph describing agent positions at each time step
    • $B_{t,a,v}$ : agent a occupies vertex v at time t
  • Constraints:
    • each agent occupies exactly one vertex at each time
    • no two agents occupy the same vertex at any time
    • if agent a occupies vertex v at time t, then a occupies a neighboring vertex or stays at v at time t + 1
  • Preprocessing:
    • $B_{t,a,v} = 0$ if agent a cannot reach vertex v by time t, or cannot reach its destination in the remaining time from v at time t
import sat.

% N: number of vertices, As: list of (Start, Goal) pairs, one per agent
path(N, As) =>
  K = length(As),
  lower_upper_bounds(As, LB, UB),   % incremental generation of layers
  between(LB, UB, M),               % try plan lengths from LB up to UB
  B = new_array(M+1, K, N),         % B[T,A,V] = 1 iff agent A is at vertex V at time T
  B :: 0..1,

  % Initialize the first and last layers (start and destination locations)
  foreach (A in 1..K)
    (V, FV) = As[A],
    B[1,A,V] = 1,
    B[M+1,A,FV] = 1
  end,

  % Each agent occupies exactly one vertex in each layer
  foreach (T in 1..M+1, A in 1..K)
    sum([B[T,A,V] : V in 1..N]) #= 1
  end,

  % No two agents occupy the same vertex (no conflicts between agents)
  foreach (T in 1..M+1, V in 1..N)
    sum([B[T,A,V] : A in 1..K]) #=< 1
  end,

  % Every transition is valid (agent waits or moves to a neighboring vertex;
  % Neibs is assumed to contain V itself so that waiting is allowed)
  foreach (T in 1..M, A in 1..K, V in 1..N)
    neibs(V, Neibs),
    B[T,A,V] #=> sum([B[T+1,A,U] : U in Neibs]) #>= 1
  end,

  % K-robustness: a vertex used by an agent must not have been occupied by
  % another agent during the previous KRob steps (KRob is the robustness parameter)
  foreach (T in 1..M+1, A in 1..K, V in 1..N)
    B[T,A,V] #=> sum([B[Prev,A2,V] :
                      A2 in 1..K, A2 != A,
                      Prev in max(1, T-KRob)..T]) #= 0
  end,

  solve(B),
  output_plan(B).

  • Picat provides assignment and loop statements for programming everyday things.
  • Boolean satisfiability (SAT)
  • Answer Set Programming (ASP)

7. Objectives in SAT

  • Makespan (minimize the maximum end time)
    • incrementally add layers until a solution is found (see the sketch below)
  • Sum of costs (minimize the sum of end times)
    • incrementally add layers and look for the SOC-optimal solution in each iteration (makespan + SOC optimal)
    • generate more layers (an upper bound) and then optimize SOC (naïve)
    • incrementally add layers and increase the cost limit until a solution is found [Surynek et al., ECAI 2016]
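The makespan objective is usually handled by the incremental loop mentioned above; the sketch below only shows the control flow and assumes two hypothetical helpers, encode(m) and sat_solve(cnf), that are not defined in this post:

def find_makespan_optimal(encode, sat_solve, max_layers=100):
    """Incrementally add layers until the SAT solver finds a plan.

    encode(m)      -- hypothetical: build the CNF for a layered graph with m layers
    sat_solve(cnf) -- hypothetical: return a satisfying model, or None if unsatisfiable
    """
    for m in range(1, max_layers + 1):
        model = sat_solve(encode(m))
        if model is not None:
            return m, model        # m is the optimal makespan
    return None                     # no plan found within the layer bound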

8. Real World

  • Objective
    • How to develop robust and scalable AI technology for dealing with complex, dynamic application scenarios?
  • What's needed?
    • a model scenario
    • Robotic intra-logistics
  • Why?
    • rich: multi-faceted, full of variations
    • scalable: layout, objects, granularity
    • measurable: makespan, energy, quality of service
    • integrative: MAPF, data, constraints, decisions
    • relevant: Industry 4.0
  • Robotic systems for logistics and warehouse automation based on many mobile robots and movable shelves
  • Main tasks: order fulfillment (routing, order picking, replenishment)
  • Many competing industry solutions: Amazon, Dematic, Grenzebach, GreyOrange, Swisslog
  • What's in the environment?
    • Objects: floor, robots, shelves, products, people, etc.
    • Relations: positions, carries/carried, capacity, orientation, durations, etc.
    • Actions: move, pickup, putdown, pick, charge, restock, etc.
    • Objectives: deadlines, throughput, exploitation, energy management, human-machine interaction, etc.

9. Beyond MAPF

  • Classified by objects, measurability, constraints, decisions
    • MAPF (Multi-Agent Path Finding)
      • The simplest, most straightforward extension of single-agent path finding (APF)
      • Objects: only robots and the map
      • anonymous: n agents, n targets, any agent can be assigned to any target
      • non-anonymous: n agents, n targets, each agent is assigned a (pre-defined) target
    • TAPF (Combined Target Assignment and Path Finding)
      • Proposed in [2]: teams of robots
      • Multiple teams of robots (objects: only robots and the map)
      • Targets assigned to teams (constraint: one robot - one target)
      • Collision-free paths for robots to targets (no swapping), with minimal makespan
    • GTAPF (Generalized-TAPF)
      • Proposed in [3], inspired by online-store order-fulfillment requirements
      • Order #1
        • “Vintage LEGO Kit” and “Programming LEGO”
        • Rush order: 2/1/2019
      • Order #2
        • “Vintage LEGO Kit” and “Dancing with the Stars video”
        • International shipping
      • requirements
        • Group: an order might contain many items
        • Deadline: each order needs to be accomplished before a timestamp
        • Checkpoint: to fulfill certain item, some checkpoint needs to be visited
      • Multiple teams of robots (same as TAPF).
      • Sets of orders (multiple targets for an order, #robots ≠ #orders possible)
      • Checkpoints for robots/teams (certain locations must be visited before targets)
      • Deadlines for orders.
      • Group completion (one order at a time).
      • Collision-free paths for robots to targets, with minimal makespan.
      • ASP-based (Answer Set Programming) solutions.
    • Others
      • inspired by real-world applications, with different considerations:
        • Continuous vs. discrete movement
        • Online vs. offline
        • Checkpoints not to be (can be) revisited
        • Suboptimal solutions vs. scalability
        • Complex actions: transfers of items/targets between robots when pickup/putdown actions are considered
        • Multi-dimensional G-TAPF: on the ground (two dimensions, cars) vs. in the air (three dimensions, drones)

10. ASPRILO

  • Standardized benchmark domains
    • Concise problem specification
    • Domains ranging from MAPF to full order fulfillment
  • Formal specification
    • Formal elaboration
    • Correctness, completeness, optimality
  • Versatile instance generator
    • Rich set of customization options
    • Leverages multi-shot ASP for generation
  • Visualizer for problems and (candidate) solutions
    • Animated playback of plans
    • Graphical editor for instances
  • Solution checker with error feedback
    • Specific error descriptions
    • Modular design, easily extensible
  • Reference ASP encodings
    • High-level, elaboration-tolerant
    • Test bed for ASP and KRR technology

11. Conclusion

  • A real-world multi-agent application
  • A very challenging multi-agent planning problem
  • No clear dominant approach (yet)
    • Search-based vs. constraint programming vs. SAT vs. ...
  • Execution is bound to differ from the plan (integration…)
  • Challenge: MAPF with Self-Interested Agents
  • Incentives and mechanism design [Bnaya et al. '13, Amir '15]
  • What if the other agent is adversarial? Or, even worse, a human?
  • Challenges: Applying MAPF for Real Problems
    • Robotics
      • Kinematic constraints (Ma et al. ‘16)
      • Uncertainty is a first-class citizen
      • Continuous configuration space
      • Any-angle motion [Yakovlev et al. '17]
    • Traffic management
      • Flow-based approaches
      • No collisions, only traffic jams
      • Scale
  • Challenge: MAPF as Part of a System
    • Task allocation
    • Pick up and delivery tasks
    • Online settings
  • Challenge: Relation to General Multi-Agent Planning
    • Cross-fertilization seems natural
    • MAPF is a special case of MAP
  • MAP
    • Many models, rich literature
    • Much work on uncertainty
    • Poor scaling
  • MAPF
    • Fewer models, growing literature
    • Not much work on uncertainty
    • Scales well

Reference: Potassco


27. Computational Complexity


Complexity

  • In computer science, saying that a computation is "fast" is inherently subjective.
  • To express how quickly a task runs, we therefore need a mathematical description.
  • In computer science, the running cost of a task is called its complexity, and that complexity is written in Big-O notation.
  • This notation describes, as a function of N, how many operations are needed to carry out the task when N items of data are given.
  • The larger the complexity, the longer the task takes to run.
  • As an example, let's look at the complexity of bubble sort together with its code.
for (int i = 0; i < N; i++) {
  for (int j = i + 1; j < N; j++) {
    if (arr[i] > arr[j]) {
      std::swap(arr[i], arr[j]);
    }
  }
}
  • To sort an array arr with N elements, we need at least
  • (N-1) + (N-2) + ... + 1 = N(N-1)/2 iterations. Expressed in Big-O notation, this tells us how fast the sort can run:
  • In Big-O notation we normally write only the highest-order term (and, by convention, drop its coefficient), because once N becomes very large the lower-order terms hardly matter.
  • Therefore the complexity of this bubble-sort algorithm is O($N^2$).
  • In general, an O($N^2$) algorithm is not considered good, because even for N = 10,000 it already needs about $10^8$ operations.
  • Fortunately, for sorting there is the Quicksort algorithm, which runs in O(N log N) on average (a sketch follows after this list).
  • Of course, using Quicksort does not always mean the sort finishes faster than bubble sort.
  • That is because we do not know the constant factor hidden in front of each term. If bubble sort were O($N^2$) and Quicksort were O(100000·N·log N), then at N = 1000 bubble sort would actually run faster. (In practice it is never this extreme; Quicksort is almost always faster.)
  • However, as N grows very large, at some point Quicksort overtakes bubble sort.
  • Comparing the growth of each O(...) class shows how quickly the cost increases with N.
  • The ideal complexity is O(1), but that is almost never achievable (it would mean finishing the task without even looking at all the data); O(log n) is usually the fastest an algorithm can realistically achieve, followed by O(n) and then O(n log n).
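For comparison with the bubble-sort code above, here is a short Quicksort sketch (my own Python example, not from the original post); on average it does O(N log N) work:

def quicksort(arr):
    """Average-case O(N log N): each level of recursion splits the array around a pivot."""
    if len(arr) <= 1:
        return arr
    pivot = arr[len(arr) // 2]
    left = [x for x in arr if x < pivot]
    mid = [x for x in arr if x == pivot]
    right = [x for x in arr if x > pivot]
    return quicksort(left) + mid + quicksort(right)

print(quicksort([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]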

REFERENCE

모두의코드


Multi-Agent Path Finding (MAPF) & Search Algorithms


Multi-Agent Path Finding (MAPF)

  • Multiple agents (many robots) must find their target positions and move there
  • MAPF problem: find a collision-free plan (path) for each agent
  • Other names: Cooperative Path Finding (CPF), multi-robot path planning, pebble motion

1) Introduction to MAPF

  • a graph (directed or undirected)
  • a set of agents(robot), each agent is assigned to two locations(nodes) in the graph(start, destination)
  • Each agent can perform either move(to a neighbouring node) or wait(in the same node) actions

Typical Assumption: all move and wait actions have identical durations(plans for agents are synchronized)

  • Plan is a sequence of actions for the agent leading from its start location to its destination
    • the length of plan(for an agent) is defined by the time when the agent reaches its destination and does not leave it anymore.
  • Find Plans for all agents such that the plans do not collide in time and space (no two agents are at the same location at the same time)
  • some necessary conditions for plan existence:
    • no two agents are at the same start node
    • no two agents share the same destination node(unless an agent disappears when reaching its destination)
    • the number of agents is strictly smaller than the number of nodes
  • An agent at $v_i$ cannot move to $v_j$ at the same time as the agent at $v_j$ moves to $v_i$
  • That would mean the agents swap positions; since both would use the same edge at the same time, swapping is not allowed
  • An agent can approach a node that is currently occupied but will be free before arrival (in the stricter variant, an agent at $v_i$ cannot move to $v_j$ if there is another agent at $v_j$)
  • If any agent is delayed, such "trains" of closely following agents may collide during execution
  • To prevent such collisions we may introduce more space between agents

K-robustness

  • An agent can enter a node only if that node has not been occupied by another agent during the most recent k steps.
  • 1-robustness covers both the no-swap and no-train constraints
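A small Python sketch (my own illustration) of the k-robustness rule; plans maps each agent to the list of vertices it occupies over time, padded to a common length:

def is_k_robust(plans, k):
    """plans: {agent: [vertex at t=0, vertex at t=1, ...]}; all lists have equal length."""
    horizon = len(next(iter(plans.values())))
    for t in range(horizon):
        for agent, path in plans.items():
            v = path[t]
            for other, other_path in plans.items():
                if other == agent:
                    continue
                # node v must not have been occupied by another agent in steps t-k .. t
                for prev in range(max(0, t - k), t + 1):
                    if other_path[prev] == v:
                        return False
    return True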

Other constraints

  • No Plan(path) has a cycle
  • No Two plans(paths) visit the same location
  • waiting is not allowed
  • some specific locations must be visited

How to Measure Quality of plans?

There are two typical criteria(to minimize):

  • Makespan
    • time between the start of the first agent and the completion time of the last agent
    • maximum of the lengths of the plans (end times)
  • Sum of costs (SOC)
    • sum of the lengths of the plans (end times); a sketch computing both objectives follows after this list
  • Optimal single-agent path finding is tractable
    • e.g. Dijkstra's algorithm
  • Sub-optimal multi-agent path finding (with two free unoccupied nodes) is tractable
    • e.g. the Push and Rotate algorithm
  • MAPF where agents have joint goal nodes (it does not matter which agent reaches which goal) is tractable
    • reduction to a min-cost flow problem
  • Optimal (makespan or SOC) multi-agent path finding is NP-hard
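Here is the sketch of the two objectives referred to above (my own illustration). A plan is taken to be the list of vertices an agent visits over time; its cost is the time at which the agent reaches its goal and never leaves it again.

def plan_cost(path):
    """End time of a plan: the first time step after which the agent stays at its goal forever."""
    goal = path[-1]
    cost = 0
    for t, v in enumerate(path):
        if v != goal:
            cost = t + 1      # agent is still away from its goal at time t
    return cost

def makespan(plans):
    return max(plan_cost(p) for p in plans)

def sum_of_costs(plans):
    return sum(plan_cost(p) for p in plans)

plans = [[0, 1, 2, 2], [3, 3, 4, 4]]          # two toy plans over 4 time steps
print(makespan(plans), sum_of_costs(plans))    # 2 4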

Solving approaches

  1. Search-Based Techniques
    • state-space search(A*)
      • state = location of agents at nodes
      • transition = performing one action for each agent
    • Conflict-based search
  2. Reduction-based Techniques
    • Translate the problem to another formalism
      • SAT / CSP / ASP ...

2. Search-Based Solvers

  • Search is a general problem-solving technique
  • To expand or not to expand, that is the question
  • Why do we need search for MAPF? Because we want to find an optimal solution for hundreds of agents
  • This is a classical application of search
  • Solving Multi-Agent Path Finding with Search
  • From A* to prioritized planners
  • From prioritized planners back to A*
  • The Increasing Cost tree search(ICTS)
  • The Conflict-Based Search(CBS) framework
  • Approximately optimal search-based solvers
  • Single-Agent Search Problem Properties
    • Number of states = ? (it will be 4)
    • Branching factor = ? (4)
  • Two Agent Search Problem Properties
    • Number of States = ? ($20^2$)
    • Branching factor = ? ($5^2$)
  • Then, what about K agents?
    • Number of states = ($20^k$)
    • Branching factor = ($5^k$)
  • IT IS A VERY HARD PROBLEM
  • SO the key idea: plan for each agent separately
  • Challenge: Maintaining soundness, completeness, and optimality

Prioritized Planning (Silver 2005) Analysis: First Agent

Prioritized Planning (Silver 2005) Analysis: Second Agent

  • Complexity?
    • Polynomial in the grid size and max time
  • Soundness?
    • Yes
  • Complete? Optimal?
    • No
  • Smart Agent Prioritization
    • conflict oriented WHCA*[Bnaya and Felner ‘14]
    • Re-prioritization and safe intervals [Andreychuk and Yakovlev ‘18]
  • Integrate planning and execution
    • Windowed Hierarchical CA* [Silver ‘06]
  • High-level idea: reservation-based planning (e.g., number of states = 4 x 5 x maxTime); see the sketch below
  • FAST, requires almost no coordination
    • but incomplete and not optimal
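As an illustration of reservation-based prioritized planning, here is a minimal sketch (assumptions: a 4-connected grid with unit-cost moves, plain space-time BFS instead of the windowed hierarchical A* used by Silver, and only vertex conflicts are checked; swap conflicts and occupancy after the goal is reached are ignored):

from collections import deque

MOVES = [(0, 0), (0, 1), (0, -1), (1, 0), (-1, 0)]   # wait + 4-connected moves

def plan_with_reservations(grid, start, goal, reserved, max_time=50):
    """Space-time BFS that avoids (cell, time) pairs reserved by higher-priority agents."""
    queue = deque([(start, 0, [start])])
    seen = {(start, 0)}
    while queue:
        (r, c), t, path = queue.popleft()
        if (r, c) == goal:
            return path
        if t >= max_time:
            continue
        for dr, dc in MOVES:
            nr, nc = r + dr, c + dc
            if 0 <= nr < len(grid) and 0 <= nc < len(grid[0]) and grid[nr][nc] == 0:
                state = ((nr, nc), t + 1)
                if state not in reserved and state not in seen:
                    seen.add(state)
                    queue.append(((nr, nc), t + 1, path + [(nr, nc)]))
    return None

def prioritized_planning(grid, agents):
    """agents: list of (start, goal) pairs; earlier agents have higher priority."""
    reserved, plans = set(), []
    for start, goal in agents:
        path = plan_with_reservations(grid, start, goal, reserved)
        if path is None:
            return None           # incomplete: a lower-priority agent may be blocked
        for t, cell in enumerate(path):
            reserved.add((cell, t))
        plans.append(path)
    return plans

grid = [[0, 0, 0],
        [0, 0, 0]]                # 0 = free cell
print(prioritized_planning(grid, [((0, 0), (0, 2)), ((0, 2), (0, 0))]))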

      Search based solver overview

  • Can a MAPF algorithm be complete and efficient?

MAPF as a puzzle

  • MAPF is highly related to pebble motion problems
    • Each agent is a pebble
    • Need to move each pebble to its goal
    • cannot put two pebbles in one hole
  • Pebble motion can be solved in polynomial time
    • but far from optimally
    • complex formulation
  • Similar approaches
    • Slidable Multi-Agent Path Planning [Wang & Botea, IJCAI 2009]
    • Push and Swap [Luna & Bekris, IJCAI, 2011]
      • Parallel push and swap [Sajid, Luna, and Bekris, SoCS 2012]
      • Push and Rotate [de Wilde et al. AAMAS 2013]
    • Tree-based agent swapping strategy [Khorshid et al., SoCS 2011]

Examples

Search based solver Summary

Can a MAPF algorithm be complete and efficient and optimal?

On the Complexity of Optimal Parallel Cooperative Path-Finding, Surynek 2015; Planning Optimal Paths for Multiple Robots on Graphs, Yu and LaValle 2013

From Tiles to Agents

  • Can we adapt techniques from these extreme cases? The answer is YES (and we also invent some new techniques for optimal MAPF)
  1. Searching the k-agent search space
    • A*+OD+ID [Standley ‘10]
    • EPEA* [Felner ‘X, Goldenberg ‘Y]
    • M* [Wagner & Choset ‘Z]
  2. Other search-based approaches
    • ICTS [Sharon et al ‘13]
    • CBS [Sharon et al ’15]

Optimal MAPF with A*

  • A* expands nodes
  • A* gains efficiency by choosing which node to expand

    What is the complexity of expanding a single node in MAPF with 20 agents?

$5^{20}$ = 95,367,431,640,625

Search Tree Growth with Operator Decomposition

  • Pros
    • Branching factor is reduced to 5(= single agent)
    • with a perfect heuristic can solve the problem
  • Cons
    • Solution is deeper by a factor of k
    • More nodes may be expanded, due to intermediate states

Enhanced Partial Expansion A*

Independence Detection

  • Simple Independence Detection
    • Solve each agent optimally and separately
    • While some agents conflict:
        1. Merge the conflicting agents into one group
        2. Solve the new group optimally (a minimal sketch of this loop appears after this list)
  • Independence Detection
    • Solve each agent optimally and separately
    • While some agents conflict:
        1. Try to avoid the conflict, with the same cost
        2. Merge the conflicting agents into one group
        3. Solve the new group optimally
  • But what if we have an environment like this?
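Here is the compact sketch of the Simple Independence Detection loop described above; solve_group and find_conflict are hypothetical helpers standing in for an optimal group solver and a plan-conflict check:

def simple_independence_detection(agents, solve_group, find_conflict):
    """solve_group(group) -> optimal joint plan for that group (hypothetical helper)
    find_conflict(plans) -> two conflicting group keys, or None (hypothetical helper)"""
    groups = [(a,) for a in agents]            # start with singleton groups
    plans = {g: solve_group(g) for g in groups}
    while True:
        conflict = find_conflict(plans)
        if conflict is None:
            return plans                       # all group plans are mutually conflict-free
        g1, g2 = conflict
        merged = g1 + g2                       # merge the two conflicting groups
        del plans[g1], plans[g2]
        plans[merged] = solve_group(merged)    # re-solve the merged group optimally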

M* (Wagner & Choset ‘11,’14)

  • Find optimal path for each agent individually
  • Start the search. Generate only nodes on optimal paths
  • If conflict occurs – backtrack and consider all ignored actions

Recursive M* (Wagner & Choset ‘11,’14)

  • Recursive M*
    • Find optimal path for each agent individually
    • Start the search. Generate only nodes on optimal paths
    • If conflict occurs – backtrack and consider all ignored actions
      • Apply M* recursively after backtracking
  • But the problem is:
  • the joint path up to the bottleneck can be long

Other Search-based approaches(ICTS,CBS)

  1. Increasing Cost Tree Search (Sharon et al. ‘12)
    • Does it work? - Not always
  2. Conflict Based Search(CBS)
    • CBS only performs single-agent searches (but many of them, each under different constraints)
    • Conflict: agent 1 and agent 2 both plan to occupy C at time 2
    • Constraint: agent 1 must avoid C at time 2, or agent 2 must avoid C at time 2
    • The Constraint Tree
      • Nodes:
        • a set of individual constraints for each agent
        • a set of paths consistent with those constraints
      • Goal test: are the paths conflict-free?
  • Analysis: Example 1
    • How many states A* will expand?
    • How many states CBS will?
    • Motivation: case with bottlenecks:
    • A*
      • g+h=6: All $m^2$ combinations of ($A_i$,$B_j$) will be expanded
      • f=7 : 3 states are expanded
      • A*: $m^2$ = O($m^2$) states
      • CBS : 2m+14 = O(m) states
    • When m > 4 CBS will examine fewer states than A*
  • Analysis: Example 2
    • States expanded by CBS?
    • States expanded by A*?
    • 4 optimal solutions for each agent
    • Each pair of solutions has a conflict
    • Rough analysis:
      • CBS: exponential in #conflicts = 54 states
      • A*: exponential in #agents = 8 states
  • Trends observed
    • In open spaces: use A*
    • In bottlenecks: use CBS
  • What if we have both?

Meta-Agent CBS (MA-CBS)

  • Plan for each agent individually
  • Validate the plans
  • If the plans of agents A and B conflict:
      If should_merge(A, B):
        merge A and B into a meta-agent and solve with A*
      Else:
        constrain A to avoid the conflict, or constrain B to avoid the conflict
  • should_merge(A, B) (simple rule):
    • merge when more than T conflicts between A and B have been observed
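The simple merge rule can be implemented with a per-pair conflict counter, roughly as follows (a hypothetical sketch; the threshold T is the only parameter):

from collections import defaultdict

conflict_count = defaultdict(int)   # counts observed conflicts per agent pair

def should_merge(a, b, T):
    """Merge agents a and b into a meta-agent once they have conflicted more than T times."""
    pair = tuple(sorted((a, b)))
    conflict_count[pair] += 1
    return conflict_count[pair] > T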

CBS Enhancements

  • Which conflict to resolve? [Boyarski et al. ‘16]
  • What to do after merging? [Boyarski et al. ‘16]
  • Heuristics for the constraint tree search [Felner et al. ‘18]
  • Augmenting CBS with human knowledge [Cohen et al.]
  • Which low-level solver to use?
  • When to merge the agents ?

Summary

  • A* variants (M*, EPEA*, A*+OD+ID)
    • Main factors: #agents, graph size, heuristic accuracy
  • ICTS
    • Main factors: #agents, Δ, graph size
  • CBS and its variants
    • Main factors: #conflicts

Solving MAPF

Bounded Suboptimal Algorithms

  • An algorithm is bounded suboptimal iff
    • It accepts a parameter ε
    • It outputs a solution whose cost is at most (1 + ε) · Optimal
  • How to create a bounded suboptimal algorithm?
    • Different search algorithms
    • Inadmissible heuristics

Suboptimal ICTS

Suboptimal A*

Suboptimal M*

Suboptimal CBS

  • Observation: Suboptimality can be introduced in both levels
    • ECBS (Barer et al. ‘14)
    • ECBS+Highways (Cohen et al. ’15, ‘16)

Advanced Issues in Search-based MAPF Algorithms

  • When to use which algorithm? Ensembles?
  • Using knowledge about past plans (Cohen et al. ’15)
  • Stronger heuristics for all algorithms
  • Deeper analysis of algorithms’ complexity
  • Beyond grid worlds
    • Kinematic constraints (Ma et al. ‘16)
    • Any-angle planning (Yakovlev et al. ‘17)
    • Hierarchical environments (Walker et al. ’17)
    • Large agents (Li et al. ‘19)
  • Prioritized planning based on CBS (Ma et al. '19)
  • Planning & execution

Reference: Potassco


26. File


Parsing text in files

  • Parsing is like first moving to a spot where diamonds are plentiful, mining a lot of rock, and then picking out only the diamonds and polishing them into gems; in other words, extracting just the meaningful data from the raw text
#include <iostream>
#include <fstream>
#include <sstream>

// sstream provides std::stringstream (string streams)
using namespace std;

int main() {

	string filename = "stats.txt";
	ifstream input;

	input.open(filename);

	if(!input.is_open()) {
		return 1;
	}

	while(input) {
		string line;

		getline(input, line, ':');
		// read up to the ':' delimiter, one record at a time;
		// "line" receives the text before the ':',
		// and the number after the ':' is read into population below.

		int population;
		input >> population; // read the integer value into population

		input.get(); // consume the single character right after the number (usually whitespace)
		input >> ws;
		// ws extracts and discards characters from the stream as long as the
		// next available character is whitespace, or until no more characters are available.
		// Whitespace refers to characters used for formatting purposes;
		// in C++ this primarily means spaces, tabs and newlines.
		// The C++ compiler generally ignores whitespace.
		// the C++ compiler generally ignores whitespace.
		if(!input) {
			break;
		}

		cout << "'" << line << "'" << " -- '" << population << "'" << endl;
	}

	input.close();

	return 0;
}


Result

Reading and writing binary files

  • When a class or struct instance contains aggregate data (e.g. an array member), pass its address with & so the write call knows where the bytes live.
  • Likewise, when reading such a value back, pass a pointer so the call can fill in the value at that address.
#include <iostream>
#include <fstream>
#include <string>

using namespace std;

struct Data {
	char name[100];
	int age;
	double height;
};

int main() {

	string filename = "data.bin";
	// now we can open
	Data data = { "Pixie", 120, 0.8 };

	/*
	 data.name = "Pixie";
	 data.age = 120;
	 data.height = 0.8;
	 */



/* write binary file*/

	ofstream output;

	output.open(filename, ios::binary);
	// open the file in binary mode

	if (!output.is_open()) {
		cout << "Could not create " << filename << endl;
	}

	output.write((char *) &data, sizeof(data)); // write the raw bytes of the struct
	// &data passes the address of the struct;
	// (char *) reinterprets that address as a byte pointer, which write() expects;
	// sizeof(data) tells write() how many bytes to copy.


	if (!output) {
		cout << "Could not write data to file " << filename << endl;
	}

	output.close();


/* Read binary file*/


	ifstream input;

	input.open(filename, ios::binary);

	if (!input.is_open()) {
		cout << "Could not read " << filename << endl;
	}

	Data inputData;
	// struct that will receive the data we read back

	input.read((char *) &inputData, sizeof(inputData)); // fill inputData with the bytes from the file

	if (!input) {
		cout << "Could not read data from file " << filename << endl;
	}

	input.close();

	cout << inputData.name << ": " << inputData.age << ": " << inputData.height << endl;

	return 0;
}

Reading Text file

#include <iostream>
#include <fstream>
#include <sstream>

using namespace std;

int main() {

	string inFileName = "test.txt";

	ifstream inFile;
	// ifstream opens the file with ios::in, i.e. for reading only


	inFile.open(inFileName);

	if (inFile.is_open()) {

		string line;

		while (inFile) {
			getline(inFile, line);
			cout << line << endl;
		}
		// getline(inFile, line) reads the file one line at a time
		inFile.close();
	} else {
		cout << "Cannot open file: " << inFileName << endl;
	}

	return 0;
}


Binary files

  • #pragma pack(push, 1)
  • This keeps the struct size exact (no padding), regardless of the order in which the members are declared.
  • In pragma pack(1), the 1 means members are stored on 1-byte boundaries.
  • For example, two structs Person and Person1 with the same members can otherwise end up with different sizes.
  • So why not always use 1-byte packing and waste no memory at all?
  • Because a 32-bit CPU processes data fastest in 4-byte (32-bit) units.
  • In other words, if you declare #pragma pack(1) and never restore the default, you can pay a performance penalty.
  • You would therefore have to declare #pragma pack(4) again where the struct declaration ends.
  • But there is still a problem: what happens if this source is compiled on a machine whose CPU is not a 32-bit PC?
  • To handle that, C++ provides #pragma pack(push, 1)
  • and #pragma pack(pop):
  • push saves the current packing (padding) alignment on a stack and switches to 1-byte packing, and pop restores the original alignment where the struct declaration ends.
#include <iostream>
#include <fstream>
using namespace std;
// # introduces a preprocessor directive
#pragma pack(push, 1)
// Save the current packing alignment on a stack and switch to 1-byte packing,
// so the struct below has no padding regardless of member order; the matching
// #pragma pack(pop) after the struct restores the original alignment
// (see the explanation above).
// data packing (padding / alignment)
struct Person {
	char name[50];
	int age;
	double weight;
};

#pragma pack(pop)

int main() {

	cout << sizeof(Person) << endl;




	return 0;
}

writing Text file

#include <iostream>
#include <fstream>
#include <string>

//fstream is the file stream
//we can read and write

using namespace std;

int main() {

	// ofstream outFile;
	// ofstream is a convenience class: an "output file stream"
	// that is always opened for writing
    string filename = "text.txt";
	fstream file;

	//string outputFileName = "text.txt";

    file.open(filename, ios::out | ios::binary);
    //outFile.open(outputFileName, ios::out);
    //ios::out means open for writing
    // with fstream (unlike ofstream) we need to specify ios::out explicitly;
    // either class can be used here
	if(file.is_open()) {
		file << "Hello there" << endl;
		file << 123 << endl;
		file.close();
	}
	else {
		cout << "Could not create file: " << "text.txt" << endl;
	}

	return 0;
}

Result


8. Tic-Tac-Toe


1. Outline

  • play_game(p1,p2,env)
  • Use an OOP approach

2. representing States

  • In the last post, I said this would be an O(1) lookup (as in an array)
  • Possible solution: Convert the board to tuple of tuples, use as dictionary Key
  • Better: Map Each state to a number, use numpy array

3. Mapping state to a number

  • there are 3 state, “O”, “X”, “Empty”
  • 9 array so all possible state is 3^9 = 19683
  • overhead not a problem, these states are unreachable(e.g. 3x’s in a row and 3o’s in a row on the same board)
  • Should remind us of binary numbers(2 Possible symbols in each location vs 3)
  • binary to decimal(小数):
  • For us:

In code

k = 0
h = 0
for i in range(LENGTH):
  for j in range(LENGTH):
    if self.board[i,j] == 0:
      v = 0
    elif self.board[i,j] == self.x:
      v = 1
    elif self.board[i,j] == self.o:
      v = 2
    h += (3**k) * v
    k += 1

4. Initializing the value function

  • Recall, we initialize V(s) as :
    • 1 if s == winning terminal state
    • 0 if s == lose or draw terminal state
    • 0.5 otherwise
  • how do we find all the “s”?

5. Enumerating state

  • Let's think about which is better: 1) create a "game tree" of all possible plays of the game, or 2) create a permutation of all possible settings of all positions on the board?

Game Tree

  • Problem : redundant states
  • i.e. see the same state more than once in the tree

Start: 9 choices then: 8 choices then: 7 choices ….

9! = 362,880 ≫ 3^9 = 19,683

Permutations

  • How? Think binary first:
  • Generate the permutations of length N:

    '0' + each permutation of length N-1
    '1' + each permutation of length N-1

def generate_all_binary_numbers(N):
  if N == 0:
    return ['']   # base case: a single empty string
  results = []
  child_results = generate_all_binary_numbers(N - 1)
  for prefix in ('0', '1'):
    for suffix in child_results:
      results.append(prefix + suffix)
  return results
  • if don’t get about this please check in here

Enumerating States Recursively

  • that is, all combinations of programs and inputs that halt
  • Function: get_state_hash_and_winner
  • Returns: a list of triples (state, winner, ended)
  • state: the configuration of the board as a hashed int
  • winner is None if ended is False (i.e., we keep recursing until each state's outcome is determined)
  • inputs: env, i, j
max_length = 0
while True:
  max_length = max_length + 1
  for each program "P" of length <= max_length:   # P is finite
    for each input "I" of length <= max_length:
      if "P" halts on "I" after max_length steps:
        print P, I

Summary

  • In order to initialize V(s) to our desired values, we must enumerate all the states
  • we can either search the game tree(9!) or find board permutations($3^9$)
  • 9! is super-exponential
  • Enumerating game states Recursively
  • initializing V(s) is different for X and O

6. The Environment Class(Code)

Method: __init__() : # initialize important instance vars
is_empty(i,j): # returns true if (i,j) is empty
reward(symbol)
get_state()
game_over()
draw_board()

Constructor

import numpy as np

LENGTH = 3  # board dimension

class Environment:
  def __init__(self):
    self.board = np.zeros((LENGTH, LENGTH))
    self.x = -1 # represents an x on the board, player 1
    self.o = 1 # represents an o on the board, player 2
    self.winner = None
    self.ended = False
    self.num_states = 3**(LENGTH*LENGTH)

is_empty

def is_empty(self, i, j):
  return self.board[i,j] == 0

reward

def reward(self, sym):
  # no reward until game is over
  if not self.game_over():
    return 0
  # if we get here, game is over
  # sym will be self.x or self.o
  return 1 if self.winner == sym else 0

get_state

def get_state(self):
  # returns the current state, represented as an int
  # from 0...|S|-1, where S = set of all possible states
  # |S| = 3^(BOARD SIZE), since each cell can have 3 possible values - empty, x, o
  # some states are not possible, e.g. all cells are x, but we ignore that detail
  # this is like finding the integer represented by a base-3 number
  k = 0
  h = 0
  for i in range(LENGTH):
    for j in range(LENGTH):
      if self.board[i,j] == 0:
        v = 0
      elif self.board[i,j] == self.x:
        v = 1
      elif self.board[i,j] == self.o:
        v = 2
      h += (3**k) * v # Value Funtion
      k += 1
  return h

game_over

def game_over(self, force_recalculate=False):
  if not force_recalculate and self.ended:
    return self.ended

  # check rows
  for i in range(LENGTH):
    for player in (self.x, self.o):
      if self.board[i].sum() == player*LENGTH:
        self.winner = player
        self.ended = True
        return True

  # check columns
  for j in range(LENGTH):
    for player in (self.x, self.o):
      if self.board[:,j].sum() == player*LENGTH:
        self.winner = player
        self.ended = True
        return True

  # check diagonals
  for player in (self.x, self.o):
    # top-left -> bottom-right diagonal
    if self.board.trace() == player*LENGTH:
      self.winner = player
      self.ended = True
      return True
    # top-right -> bottom-left diagonal
    if np.fliplr(self.board).trace() == player*LENGTH:
      self.winner = player
      self.ended = True
      return True

  # check if draw
  if np.all((self.board == 0) == False):
    # winner stays None
    self.winner = None
    self.ended = True
    return True

  # game is not over
  self.winner = None
  self.ended = False
  return False

draw_board

def draw_board(self):
  for i in range(LENGTH):
    print("-------------")
    for j in range(LENGTH):
      print("  ", end="")
      if self.board[i,j] == self.x:
        print("x ", end="")
      elif self.board[i,j] == self.o:
        print("o ", end="")
      else:
        print("  ", end="")
    print("")
  print("-------------")

7. The Agent class

  • Contains the AI
  • different from supervised/unsupervised ML, because we don’t just feed it data
  • it has to interact with the Environment
  • one-line update for V(s) is a very small part of the agent, yet responsible for 100% of its intelligence

The Agent Class

class Agent:
  def __init__(self, eps=0.1, alpha=0.5):
    self.eps = eps # probability of choosing random action instead of greedy
    self.alpha = alpha # learning rate
    self.verbose = False
    self.state_history = []

  def setV(self, V): # SetV is the allow us to initialize the value function
    self.V = V

  def set_symbol(self, sym): # symbol is allow us to give this agent a symbol that it'll use to play on the board
    self.sym = sym

  def set_verbose(self, v):
    # if true, will print values for each position on the board
    self.verbose = v

  def reset_history(self):
    self.state_history = []

Take_action

def take_action(self, env):
  # choose an action based on epsilon-greedy strategy
  r = np.random.rand()
  best_state = None
  if r < self.eps:
    # take a random action
    if self.verbose:
      print("Taking a random action")

    possible_moves = []
    for i in range(LENGTH):
      for j in range(LENGTH):
        if env.is_empty(i, j):
          possible_moves.append((i, j))
    idx = np.random.choice(len(possible_moves))
    next_move = possible_moves[idx]
  else:
    # choose the best action based on current values of states
    # loop through all possible moves, get their values
    # keep track of the best value
    pos2value = {} # for debugging
    next_move = None
    best_value = -1
    for i in range(LENGTH):
      for j in range(LENGTH):
        if env.is_empty(i, j):
          # what is the state if we made this move?
          env.board[i,j] = self.sym
          state = env.get_state()
          env.board[i,j] = 0 # don't forget to change it back!
          pos2value[(i,j)] = self.V[state]
          if self.V[state] > best_value:
            best_value = self.V[state]
            best_state = state
            next_move = (i, j)

    # if verbose, draw the board w/ the values
    if self.verbose:
      print("Taking a greedy action")
      for i in range(LENGTH):
        print("------------------")
        for j in range(LENGTH):
          if env.is_empty(i, j):
            # print the value
            print(" %.2f|" % pos2value[(i,j)], end="")
          else:
            print("  ", end="")
            if env.board[i,j] == env.x:
              print("x  |", end="")
            elif env.board[i,j] == env.o:
              print("o  |", end="")
            else:
              print("   |", end="")
        print("")
      print("------------------")

  # make the move
  env.board[next_move[0], next_move[1]] = self.sym

update_state_history

def update_state_history(self, s):
  # cannot put this in take_action, because take_action only happens
  # once every other iteration for each player
  # state history needs to be updated every iteration
  # s = env.get_state() # don't want to do this twice so pass it in
  self.state_history.append(s)

update

def update(self, env):
  # we want to BACKTRACK over the states, so that:
  # V(prev_state) = V(prev_state) + alpha*(V(next_state) - V(prev_state))
  # where V(next_state) = reward if it's the most current state
  #
  # NOTE: we ONLY do this at the end of an episode
  # not so for all the algorithms we will study
  reward = env.reward(self.sym)
  target = reward
  for prev in reversed(self.state_history):
    value = self.V[prev] + self.alpha*(target - self.V[prev])
    self.V[prev] = value
    target = value
  self.reset_history()

8. The Human Class(Code)

Human class

class Human:
  def __init__(self):
    pass

  def set_symbol(self, sym):
    self.sym = sym

  def take_action(self, env):
    while True:
      # break if we make a legal move
      move = input("Enter coordinates i,j for your next move (i,j=0..2): ")
      i, j = move.split(',')
      i = int(i)
      j = int(j)
      if env.is_empty(i, j):
        env.board[i,j] = self.sym
        break

  def update(self, env):
    pass

  def update_state_history(self, s):
    pass


# recursive function that will return all
# possible states (as ints) and who the corresponding winner is for those states (if any)
# (i, j) refers to the next cell on the board to permute (we need to try -1, 0, 1)
# impossible games are ignored, i.e. 3x's and 3o's in a row simultaneously
# since that will never happen in a real game
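The functions get_state_hash_and_winner, initialV_x and initialV_o that the comment above and the main block below rely on are not listed in this post; the following is a minimal reconstruction consistent with the Environment class above (treat the details as assumptions rather than the original course code):

def get_state_hash_and_winner(env, i=0, j=0):
  # try every value (empty, x, o) for cell (i, j), then recurse on the next cell
  results = []
  for v in (0, env.x, env.o):
    env.board[i, j] = v
    if j == LENGTH - 1:
      if i == LENGTH - 1:
        # board fully filled in: record (state, winner, ended)
        state = env.get_state()
        ended = env.game_over(force_recalculate=True)
        winner = env.winner
        results.append((state, winner, ended))
      else:
        results += get_state_hash_and_winner(env, i + 1, 0)
    else:
      results += get_state_hash_and_winner(env, i, j + 1)
  return results

def initialV_x(env, state_winner_triples):
  # V(s) = 1 if x wins from state s, 0 if the game ended otherwise, 0.5 if not terminal
  V = np.zeros(env.num_states)
  for state, winner, ended in state_winner_triples:
    if ended:
      v = 1.0 if winner == env.x else 0
    else:
      v = 0.5
    V[state] = v
  return V

def initialV_o(env, state_winner_triples):
  # same as initialV_x, but from o's perspective
  V = np.zeros(env.num_states)
  for state, winner, ended in state_winner_triples:
    if ended:
      v = 1.0 if winner == env.o else 0
    else:
      v = 0.5
    V[state] = v
  return V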

play games

def play_game(p1, p2, env, draw=False):
  # loops until the game is over
  current_player = None
  while not env.game_over():
    # alternate between players
    # p1 always starts first
    if current_player == p1:
      current_player = p2
    else:
      current_player = p1

    # draw the board before the user who wants to see it makes a move
    if draw:
      if draw == 1 and current_player == p1:
        env.draw_board()
      if draw == 2 and current_player == p2:
        env.draw_board()

    # current player makes a move
    current_player.take_action(env)

    # update state histories
    state = env.get_state()
    p1.update_state_history(state)
    p2.update_state_history(state)

  if draw:
    env.draw_board()

  # do the value function update
  p1.update(env)
  p2.update(env)

Main

if __name__ == '__main__':
  # train the agent
  p1 = Agent()
  p2 = Agent()

  # set initial V for p1 and p2
  env = Environment()
  state_winner_triples = get_state_hash_and_winner(env)


  Vx = initialV_x(env, state_winner_triples)
  p1.setV(Vx)
  Vo = initialV_o(env, state_winner_triples)
  p2.setV(Vo)

  # give each player their symbol
  p1.set_symbol(env.x)
  p2.set_symbol(env.o)

  T = 10000 #train
  for t in range(T):
    if t % 200 == 0:
      print(t)
    play_game(p1, p2, Environment())

  # play human vs. agent
  # do you think the agent learned to play the game well?
  # human Verification
  human = Human()
  human.set_symbol(env.o)
  while True:
    p1.set_verbose(True)
    play_game(p1, human, Environment(), draw=2)
    # I made the agent player 1 because I wanted to see if it would
    # select the center as its starting move. If you want the agent
    # to go second you can switch the human and AI.
    answer = input("Play again? [Y/n]: ")
    if answer and answer.lower()[0] == 'n':
      break

9. Summary

  • A hard-coded (non-learning) agent would not be able to increase its intelligence by learning from experience
  • and would not be generalizable to other environments
  • Components of a RL system (Agent, Environment, state, action, reward)
  • Episodes, terminal states
  • Choosing rewards carefully
  • credit assignment (attributing the probability of winning to the moves that led to it)
  • Delayed Reward
  • Planning intelligently to maximize overall rewards
  • Value function
  • V(s) <- V(s) + α(V(s’)- V(s))

RL vs SL

  • why not use a supervised learning model that maps states -> actions directly
  • i.e model y = f(x) where x = state, y= action
  • how would we come up with labels?
  • very easy for state space to grow too large to enumerate
  • connect-4/ tic-tac-toe is probably the limit
  • imagine: HD graphics 60 FPS video game - we can’t even conceptualize how large the state space
  • thus, how could we create a label for every state?
  • Rewards are delayed (choosing an action immediately, with no accumulated data, would not be sensible, so actions are taken after data has been gathered and the reward arrives later)
  • but in Rl value function has the “future” built into it
  • if doing SL, and somehow manage to get around a humongous state-space, how would we choose the right actions?
  • instead of letting the value function take care of measuring the future, we need to somehow keep track of what’s best for all future possibilities
  • Rewards can have component of randomness. (E.g same state gives reward of 1 or 0 )(E.g2. we study for our exam tomorrow, then there is a snow storm and our exam is canceled)
  • Next states can also be random
  • In RL, we are never told if a state-action pair is good or not
  • the only way we can find out is if we “discover” it ourselves by performing the action
  • RL is online. after each episode of tic-tac-toe, we get better
  • compare to decision tree - a tree is made based on entire training set
  • if we want to incorporate new data, start training all over again

Reference:

Artificial Intelligence Reinforcement Learning

Advance AI : Deep-Reinforcement Learning

Cutting-Edge Deep-Reinforcement Learning
