for Robot Artificial Intelligence

Multi-Agent Path Finding (MAPF) & Reduction-Based Solvers


REDUCTION-BASED SOLVERS

  • How can we exploit the knowledge of others to solve our own problems?
    • by translating the problem P to another problem Q (i.e., by changing the problem)
  • Why is it useful?
    • If anybody improves the solver for Q, then we get an improved solver for P for free
    • Standing on the shoulders of giants
  • Reduction, compilation, re-formulation techniques
  • Boolean satisfiability (SAT)
    • fast SAT solvers
  • Constraint programming
    • global constraints for pruning the search space
  • Answer set programming
    • declarative framework
  • Combinatorial auctions

1. Introduction to SAT

  • Express (model) the problem as a SAT formula in conjunctive normal form (CNF)
  • Boolean satisfiability (SAT):
    Boolean variables (true/false values)
    clause = a disjunction of literals (variables and negated variables)
    formula = a conjunction of clauses
    solution = an instantiation of the variables such that the formula is satisfied
    

Example:

(X or Y) and (not X or not Y)
[exactly one of X and Y is true]
  • The SAT model is expressed as a CNF formula (conjunctive normal form)
  • We can go beyond CNF and use abstract expressions that are translated to CNF
  • We can even use numerical variables (and constraints).
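To make the CNF machinery concrete, here is a small brute-force check of the example formula above (my own illustration, not part of the original notes; the signed-integer clause encoding simply mimics the DIMACS convention):

from itertools import product

# Formula: (X or Y) and (not X or not Y), encoded as a list of clauses.
# Each clause is a list of literals; a positive int i means variable i,
# a negative int -i means "not variable i".
clauses = [[1, 2], [-1, -2]]          # variables: 1 = X, 2 = Y
variables = [1, 2]

def satisfies(assignment, clauses):
    """assignment maps variable -> bool; a clause is satisfied if any of its literals is true."""
    return all(
        any(assignment[abs(l)] == (l > 0) for l in clause)
        for clause in clauses
    )

# Enumerate all assignments and print the satisfying ones.
for values in product([False, True], repeat=len(variables)):
    assignment = dict(zip(variables, values))
    if satisfies(assignment, clauses):
        print(assignment)   # exactly one of X and Y is true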

2. SAT encoding: core idea

  • In MAPF, we do not know the lengths of the plans in advance (due to possible re-visits of nodes)!
  • We can encode plans of a known length using a layered graph (a time-expanded graph).
  • Each layer corresponds to one time step and indicates the positions of the agents at that time
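As a purely illustrative sketch of the layered-graph idea, the following Python snippet assigns one Boolean variable to every (time, agent, vertex) triple; the variable-numbering scheme is my own assumption, not part of the original encoding:

def make_mapf_variables(num_layers, num_agents, num_vertices):
    """Map each (time, agent, vertex) triple to a distinct SAT variable id (1-based)."""
    var = {}
    next_id = 1
    for t in range(num_layers):            # one layer per time step
        for a in range(num_agents):
            for v in range(num_vertices):
                var[(t, a, v)] = next_id   # B[t,a,v] is true iff agent a is at vertex v at time t
                next_id += 1
    return var

# Example: 3 time layers, 2 agents, 4 vertices -> 24 Boolean variables.
var = make_mapf_variables(3, 2, 4)
print(len(var), var[(0, 0, 0)], var[(2, 1, 3)])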

3. SAT encoding with all-different

  • Uses multi-valued state variables (logarithmic encoding) encoding the position of each agent in each layer (∧ = AND, ∨ = OR)
  • Agent waits or moves to a neighbor
  • No-train constraint (an agent cannot immediately follow another agent into the node it is just leaving)
  • No two agents are at the same node

4. Direct SAT encoding

  • Directly encodes positions of agents in layers
  • Agent is placed at exactly one node in each layer (¬ denotes logical negation)
  • No two agents are placed at the same node in each layer
  • Agent waits or moves to a neighbor
  • No-swap and no-train (nodes before and after move are empty)
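The "exactly one vertex per agent" and "at most one agent per vertex" rules of the direct encoding can be emitted as CNF clauses roughly as follows (a hedged sketch reusing the var map from the earlier snippet; the pairwise at-most-one encoding is just one possible choice):

from itertools import combinations

def exactly_one(literals):
    """CNF clauses forcing exactly one of the given literals to be true."""
    clauses = [list(literals)]                                   # at least one
    clauses += [[-a, -b] for a, b in combinations(literals, 2)]  # at most one (pairwise)
    return clauses

def direct_encoding_clauses(var, num_layers, num_agents, num_vertices):
    """var: the (time, agent, vertex) -> variable-id map from the sketch above."""
    clauses = []
    for t in range(num_layers):
        # each agent occupies exactly one vertex in layer t
        for a in range(num_agents):
            clauses += exactly_one([var[(t, a, v)] for v in range(num_vertices)])
        # no two agents are placed at the same vertex in layer t
        for v in range(num_vertices):
            for a1, a2 in combinations(range(num_agents), 2):
                clauses.append([-var[(t, a1, v)], -var[(t, a2, v)]])
    return clauses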

5. Comparison of SAT encodings

Finding makespan-optimal solutions (in combination with Operator Decomposition and Independence Detection, OD+ID)

6. Mixed model

  • Uses a layered graph describing agent positions at each time step
    • $B_{t,a,v}$ : agent a occupies vertex v at time t
  • Constraints:
    • each agent occupies exactly one vertex at each time
    • no two agents occupy the same vertex at any time
    • if agent a occupies vertex v at time t, then a occupies a neighboring vertex or stays at v at time t + 1
  • Preprocessing:
    • $B_{t,a,v} = 0$ if agent a cannot reach vertex v by time t, or cannot reach its destination in the remaining time from v at time t
import sat.

% N: number of vertices, As: list of (Start, Goal) pairs, one per agent
path(N, As) =>
  K = length(As),
  lower_upper_bounds(As, LB, UB),   % incremental generation of layers
  between(LB, UB, M),               % try plan lengths from LB up to UB
  B = new_array(M+1, K, N),         % B[T,A,V] = 1 iff agent A is at vertex V at time T
  B :: 0..1,

  % Initialize the first and last layers (start and destination locations)
  foreach (A in 1..K)
    (V, FV) = As[A],
    B[1,A,V] = 1,
    B[M+1,A,FV] = 1
  end,

  % Each agent occupies exactly one vertex in each layer
  foreach (T in 1..M+1, A in 1..K)
    sum([B[T,A,V] : V in 1..N]) #= 1
  end,

  % No two agents occupy the same vertex (no conflicts between agents)
  foreach (T in 1..M+1, V in 1..N)
    sum([B[T,A,V] : A in 1..K]) #=< 1
  end,

  % Every transition is valid (agent waits or moves to a neighboring vertex;
  % Neibs is assumed to contain V itself so that waiting is allowed)
  foreach (T in 1..M, A in 1..K, V in 1..N)
    neibs(V, Neibs),
    B[T,A,V] #=> sum([B[T+1,A,U] : U in Neibs]) #>= 1
  end,

  % K-robustness: a vertex used by an agent must not have been occupied by
  % another agent during the previous KRob steps (KRob is the robustness parameter)
  foreach (T in 1..M+1, A in 1..K, V in 1..N)
    B[T,A,V] #=> sum([B[Prev,A2,V] :
                      A2 in 1..K, A2 != A,
                      Prev in max(1, T-KRob)..T]) #= 0
  end,

  solve(B),
  output_plan(B).

  • Picat provides assignment and loop statements for programming everyday things.
  • Boolean satisfiability (SAT)
  • Answer Set Programming (ASP)

7. Objectives in SAT

  • Makespan (minimize the maximum end time)
    • incrementally add layers until a solution is found (see the sketch below)
  • Sum of costs (minimize the sum of end times)
    • incrementally add layers and look for the SOC-optimal solution in each iteration (makespan + SOC optimal)
    • generate more layers (an upper bound) and then optimize SOC (naïve)
    • incrementally add layers and increase the cost limit until a solution is found [Surynek et al., ECAI 2016]
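The makespan objective is usually handled by the incremental loop mentioned above; the sketch below only shows the control flow and assumes two hypothetical helpers, encode(m) and sat_solve(cnf), that are not defined in this post:

def find_makespan_optimal(encode, sat_solve, max_layers=100):
    """Incrementally add layers until the SAT solver finds a plan.

    encode(m)      -- hypothetical: build the CNF for a layered graph with m layers
    sat_solve(cnf) -- hypothetical: return a satisfying model, or None if unsatisfiable
    """
    for m in range(1, max_layers + 1):
        model = sat_solve(encode(m))
        if model is not None:
            return m, model        # m is the optimal makespan
    return None                     # no plan found within the layer bound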

8. Real World

  • Objective
    • How to develop robust and scalable AI technology for dealing with complex, dynamic application scenarios?
  • What's needed?
    • a model scenario
    • Robotic intra-logistics
  • Why?
    • rich: multi-faceted, full of variations
    • scalable: layout, objects, granularity
    • measurable: makespan, energy, quality of service
    • integrative: MAPF, data, constraints, decisions
    • relevant: Industry 4.0
  • Robotic systems for logistics and warehouse automation based on many mobile robots and movable shelves
  • Main tasks: order fulfillment (routing, order picking, replenishment)
  • Many competing industry solutions: Amazon, Dematic, Grenzebach, GreyOrange, Swisslog
  • What's in the environment?
    • Objects: floor, robots, shelves, products, people, etc.
    • Relations: positions, carries/carried, capacity, orientation, durations, etc.
    • Actions: move, pickup, putdown, pick, charge, restock, etc.
    • Objectives: deadlines, throughput, exploitation, energy management, human-machine interaction, etc.

9. Beyond MAPF

  • Classified by objects, measurability, constraints, decisions
    • MAPF (Multi-Agent Path Finding)
      • The simplest, most straightforward extension of single-agent path finding (APF)
      • Objects: only robots and the map
      • anonymous: n agents, n targets, any agent can be assigned to any target
      • non-anonymous: n agents, n targets, each agent is assigned a (pre-defined) target
    • TAPF (Combined Target Assignment and Path Finding)
      • Proposed in [2]: teams of robots
      • Multiple teams of robots (objects: only robots and the map)
      • Targets assigned to teams (constraint: one robot - one target)
      • Collision-free paths for robots to targets (no swapping), with minimal makespan
    • GTAPF (Generalized-TAPF)
      • Proposed in [3], inspired by online-store order-fulfillment requirements
      • Order #1
        • “Vintage LEGO Kit” and “Programming LEGO”
        • Rush order: 2/1/2019
      • Order #2
        • “Vintage LEGO Kit” and “Dancing with the Stars video”
        • International shipping
      • requirements
        • Group: an order might contain many items
        • Deadline: each order needs to be accomplished before a timestamp
        • Checkpoint: to fulfill certain item, some checkpoint needs to be visited
      • Multiple teams of robots (same as TAPF).
      • Sets of orders (multiple targets for an order, #robots ≠ #orders possible)
      • Checkpoints for robots/teams (certain locations must be visited before targets)
      • Deadlines for orders.
      • Group completion (one order at a time).
      • Collision-free paths for robots to targets, with minimal makespan.
      • ASP-based (Answer Set Programming) solutions.
    • Others
      • inspired by real-world applications, with different considerations:
        • Continuous vs. discrete movement
        • Online vs. offline
        • Checkpoints not to be (can be) revisited
        • Suboptimal solutions vs. scalability
        • Complex actions: transfers of items/targets between robots when pickup/putdown actions are considered
        • Multi-dimensional G-TAPF: on the ground (two dimensions, cars) vs. in the air (three dimensions, drones)

10. ASPRILO

  • Standardized benchmark domains
    • Concise problem specification
    • Domains ranging from MAPF to full order fulfillment
  • Formal specification
    • Formal elaboration
    • Correctness, completeness, optimality
  • Versatile instance generator
    • Rich set of customization options
    • Leverages multi-shot ASP for generation
  • Visualizer for problems and (candidate) solutions
    • Animated playback of plans
    • Graphical editor for instances
  • Solution checker with error feedback
    • Specific error descriptions
    • Modular design, easily extensible
  • Reference ASP encodings
    • High-level, elaboration-tolerant
    • Test bed for ASP and KRR technology

11. Conclusion

  • A real-world multi-agent application
  • A very challenging multi-agent planning problem
  • No clear dominant approach (yet)
    • Search-based vs. constraint programming vs. SAT vs. ...
  • Execution is bound to differ from the plan (integration…)
  • Challenge: MAPF with Self-Interested Agents
  • Incentives and mechanism design [Bnaya et al. '13, Amir '15]
  • What if the other agent is adversarial? Or, even worse, a human?
  • Challenges: Applying MAPF for Real Problems
    • Robotics
      • Kinematic constraints (Ma et al. ‘16)
      • Uncertainty is a first-class citizen
      • Continuous configuration space
      • Any-angle motion [Yakovlev et al. '17]
    • Traffic management
      • Flow-based approaches
      • No collisions, only traffic jams
      • Scale
  • Challenge: MAPF as Part of a System
    • Task allocation
    • Pick up and delivery tasks
    • Online settings
  • Challenge: Relation to General Multi-Agent Planning
    • Cross-fertilization seems natural
    • MAPF is a special case of MAP
  • MAP
    • Many models, rich literature
    • Much work on uncertainty
    • Poor scaling
  • MAPF
    • Fewer models, growing literature
    • Not much work on uncertainty
    • Scales well

Reference: Potassco


27. Computational Complexity


Complexity

  • In computer science, saying that a computation is "fast" is inherently subjective.
  • To express how quickly a task runs, we therefore need a mathematical description.
  • In computer science, the running cost of a task is called its complexity, and that complexity is written in Big-O notation.
  • This notation describes, as a function of N, how many operations are needed to carry out the task when N items of data are given.
  • The larger the complexity, the longer the task takes to run.
  • As an example, let's look at the complexity of bubble sort together with its code.
for (int i = 0; i < N; i++) {
  for (int j = i + 1; j < N; j++) {
    if (arr[i] > arr[j]) {
      std::swap(arr[i], arr[j]);
    }
  }
}
  • To sort an array arr with N elements, we need at least
  • (N-1) + (N-2) + ... + 1 = N(N-1)/2 iterations. Expressed in Big-O notation, this tells us how fast the sort can run:
  • In Big-O notation we normally write only the highest-order term (and, by convention, drop its coefficient), because once N becomes very large the lower-order terms hardly matter.
  • Therefore the complexity of this bubble-sort algorithm is O($N^2$).
  • In general, an O($N^2$) algorithm is not considered good, because even for N = 10,000 it already needs about $10^8$ operations.
  • Fortunately, for sorting there is the Quicksort algorithm, which runs in O(N log N) on average (a sketch follows after this list).
  • Of course, using Quicksort does not always mean the sort finishes faster than bubble sort.
  • That is because we do not know the constant factor hidden in front of each term. If bubble sort were O($N^2$) and Quicksort were O(100000·N·log N), then at N = 1000 bubble sort would actually run faster. (In practice it is never this extreme; Quicksort is almost always faster.)
  • However, as N grows very large, at some point Quicksort overtakes bubble sort.
  • Comparing the growth of each O(...) class shows how quickly the cost increases with N.
  • The ideal complexity is O(1), but that is almost never achievable (it would mean finishing the task without even looking at all the data); O(log n) is usually the fastest an algorithm can realistically achieve, followed by O(n) and then O(n log n).
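For comparison with the bubble-sort code above, here is a short Quicksort sketch (my own Python example, not from the original post); on average it does O(N log N) work:

def quicksort(arr):
    """Average-case O(N log N): each level of recursion splits the array around a pivot."""
    if len(arr) <= 1:
        return arr
    pivot = arr[len(arr) // 2]
    left = [x for x in arr if x < pivot]
    mid = [x for x in arr if x == pivot]
    right = [x for x in arr if x > pivot]
    return quicksort(left) + mid + quicksort(right)

print(quicksort([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]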

REFERENCE

모두의코드


Multi-Agent Path Finding (MAPF) & Search Algorithms


Multi-Agent Path Finding (MAPF)

  • Multiple agents (many robots) must find their target positions and move there
  • MAPF problem: find a collision-free plan (path) for each agent
  • Other names: Cooperative Path Finding (CPF), multi-robot path planning, pebble motion

1) Introduction to MAPF

  • a graph (directed or undirected)
  • a set of agents(robot), each agent is assigned to two locations(nodes) in the graph(start, destination)
  • Each agent can perform either move(to a neighbouring node) or wait(in the same node) actions

Typical Assumption: all move and wait actions have identical durations(plans for agents are synchronized)

  • Plan is a sequence of actions for the agent leading from its start location to its destination
    • the length of plan(for an agent) is defined by the time when the agent reaches its destination and does not leave it anymore.
  • Find Plans for all agents such that the plans do not collide in time and space (no two agents are at the same location at the same time)
  • some necessary conditions for plan existence:
    • no two agents are at the same start node
    • no two agents share the same destination node(unless an agent disappears when reaching its destination)
    • the number of agents is strictly smaller than the number of nodes
  • An agent at $v_i$ cannot move to $v_j$ at the same time as the agent at $v_j$ moves to $v_i$
  • That would mean the agents swap positions; since both would use the same edge at the same time, swapping is not allowed
  • An agent can approach a node that is currently occupied but will be free before arrival (in the stricter variant, an agent at $v_i$ cannot move to $v_j$ if there is another agent at $v_j$)
  • If any agent is delayed, such "trains" of closely following agents may collide during execution
  • To prevent such collisions we may introduce more space between agents

K-robustness

  • An agent can enter a node only if that node has not been occupied by another agent during the most recent k steps.
  • 1-robustness covers both the no-swap and no-train constraints
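A small Python sketch (my own illustration) of the k-robustness rule; plans maps each agent to the list of vertices it occupies over time, padded to a common length:

def is_k_robust(plans, k):
    """plans: {agent: [vertex at t=0, vertex at t=1, ...]}; all lists have equal length."""
    horizon = len(next(iter(plans.values())))
    for t in range(horizon):
        for agent, path in plans.items():
            v = path[t]
            for other, other_path in plans.items():
                if other == agent:
                    continue
                # node v must not have been occupied by another agent in steps t-k .. t
                for prev in range(max(0, t - k), t + 1):
                    if other_path[prev] == v:
                        return False
    return True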

Other constraints

  • No Plan(path) has a cycle
  • No Two plans(paths) visit the same location
  • waiting is not allowed
  • some specific locations must be visited

How to Measure Quality of plans?

There are two typical criteria(to minimize):

  • Makespan
    • time between the start of the first agent and the completion time of the last agent
    • maximum of the lengths of the plans (end times)
  • Sum of costs (SOC)
    • sum of the lengths of the plans (end times); a sketch computing both objectives follows after this list
  • Optimal single-agent path finding is tractable
    • e.g. Dijkstra's algorithm
  • Sub-optimal multi-agent path finding (with two free unoccupied nodes) is tractable
    • e.g. the Push and Rotate algorithm
  • MAPF where agents have joint goal nodes (it does not matter which agent reaches which goal) is tractable
    • reduction to a min-cost flow problem
  • Optimal (makespan or SOC) multi-agent path finding is NP-hard
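Here is the sketch of the two objectives referred to above (my own illustration). A plan is taken to be the list of vertices an agent visits over time; its cost is the time at which the agent reaches its goal and never leaves it again.

def plan_cost(path):
    """End time of a plan: the first time step after which the agent stays at its goal forever."""
    goal = path[-1]
    cost = 0
    for t, v in enumerate(path):
        if v != goal:
            cost = t + 1      # agent is still away from its goal at time t
    return cost

def makespan(plans):
    return max(plan_cost(p) for p in plans)

def sum_of_costs(plans):
    return sum(plan_cost(p) for p in plans)

plans = [[0, 1, 2, 2], [3, 3, 4, 4]]          # two toy plans over 4 time steps
print(makespan(plans), sum_of_costs(plans))    # 2 4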

Solving approaches

  1. Search-Based Techniques
    • state-space search(A*)
      • state = location of agents at nodes
      • transition = performing one action for each agent
    • Conflict-based search
  2. Reduction-based Techniques
    • Translate the problem to another formalism
      • SAT / CSP / ASP ...

2. Search-Based Solvers

  • Search is a general problem-solving technique
  • To expand or not to expand, that is the question
  • Why do we need search for MAPF? Because we want to find an optimal solution for hundreds of agents
  • This is a classical application of search
  • Solving Multi-Agent Path Finding with Search
  • From A* to prioritized planners
  • From prioritized planners back to A*
  • The Increasing Cost tree search(ICTS)
  • The Conflict-Based Search(CBS) framework
  • Approximately optimal search-based solvers
  • Single-Agent Search Problem Properties
    • Number of states = ? (it will be 4)
    • Branching factor = ? (4)
  • Two Agent Search Problem Properties
    • Number of States = ? ($20^2$)
    • Branching factor = ? ($5^2$)
  • Then, what about K agents?
    • Number of states = ($20^k$)
    • Branching factor = ($5^k$)
  • IT IS A VERY HARD PROBLEM
  • SO the key idea: plan for each agent separately
  • Challenge: Maintaining soundness, completeness, and optimality

Prioritized Planning (Silver 2005) Analysis: First Agent

Prioritized Planning (Silver 2005) Analysis: Second Agent

  • Complexity?
    • Polynomial in the grid size and max time
  • Soundness?
    • Yes
  • Complete? Optimal?
    • No
  • Smart Agent Prioritization
    • conflict oriented WHCA*[Bnaya and Felner ‘14]
    • Re-prioritization and safe intervals [Andreychuk and Yakovlev ‘18]
  • Integrate planning and execution
    • Windowed Hierarchical CA* [Silver ‘06]
  • High-level idea: reservation-based planning (e.g., number of states = 4 x 5 x maxTime); see the sketch below
  • FAST, requires almost no coordination
    • but incomplete and not optimal
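As an illustration of reservation-based prioritized planning, here is a minimal sketch (assumptions: a 4-connected grid with unit-cost moves, plain space-time BFS instead of the windowed hierarchical A* used by Silver, and only vertex conflicts are checked; swap conflicts and occupancy after the goal is reached are ignored):

from collections import deque

MOVES = [(0, 0), (0, 1), (0, -1), (1, 0), (-1, 0)]   # wait + 4-connected moves

def plan_with_reservations(grid, start, goal, reserved, max_time=50):
    """Space-time BFS that avoids (cell, time) pairs reserved by higher-priority agents."""
    queue = deque([(start, 0, [start])])
    seen = {(start, 0)}
    while queue:
        (r, c), t, path = queue.popleft()
        if (r, c) == goal:
            return path
        if t >= max_time:
            continue
        for dr, dc in MOVES:
            nr, nc = r + dr, c + dc
            if 0 <= nr < len(grid) and 0 <= nc < len(grid[0]) and grid[nr][nc] == 0:
                state = ((nr, nc), t + 1)
                if state not in reserved and state not in seen:
                    seen.add(state)
                    queue.append(((nr, nc), t + 1, path + [(nr, nc)]))
    return None

def prioritized_planning(grid, agents):
    """agents: list of (start, goal) pairs; earlier agents have higher priority."""
    reserved, plans = set(), []
    for start, goal in agents:
        path = plan_with_reservations(grid, start, goal, reserved)
        if path is None:
            return None           # incomplete: a lower-priority agent may be blocked
        for t, cell in enumerate(path):
            reserved.add((cell, t))
        plans.append(path)
    return plans

grid = [[0, 0, 0],
        [0, 0, 0]]                # 0 = free cell
print(prioritized_planning(grid, [((0, 0), (0, 2)), ((0, 2), (0, 0))]))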

      Search based solver overview

  • Can a MAPF algorithm be complete and efficient?

MAPF as a puzzle

  • MAPF is highly related to pebble motion problems
    • Each agent is a pebble
    • Need to move each pebble to its goal
    • cannot put two pebbles in one hole
  • Pebble motion can be solved in polynomial time
    • but far from optimally
    • complex formulation
  • Similar approaches
    • Slidable Multi-Agent Path Planning [Wang & Botea, IJCAI 2009]
    • Push and Swap [Luna & Bekris, IJCAI, 2011]
      • Parallel push and swap [Sajid, Luna, and Bekris, SoCS 2012]
      • Push and Rotate [de Wilde et al. AAMAS 2013]
    • Tree-based agent swapping strategy [Khorshid et al., SoCS 2011]

Examples

Search based solver Summary

Can a MAPF algorithm be complete and efficient and optimal?

On the Complexity of Optimal Parallel Cooperative Path-Finding, Surynek 2015; Planning Optimal Paths for Multiple Robots on Graphs, Yu and LaValle 2013

From Tiles to Agents

  • Can we adapt techniques from these extreme cases? The answer is YES (and we also invent some new techniques for optimal MAPF)
  1. Searching the k-agent search space
    • A*+OD+ID [Standley ‘10]
    • EPEA* [Felner ‘X, Goldenberg ‘Y]
    • M* [Wagner & Choset ‘Z]
  2. Other search-based approaches
    • ICTS [Sharon et al ‘13]
    • CBS [Sharon et al ’15]

Optimal MAPF with A*

  • A* expands nodes
  • A* gains efficiency by choosing which node to expand

    What is the complexity of expanding a single node in MAPF with 20 agents?

$5^{20}$ = 95,367,431,640,625

Search Tree Growth with Operator Decomposition

  • Pros
    • Branching factor is reduced to 5(= single agent)
    • with a perfect heuristic can solve the problem
  • Cons
    • Solution is deeper by a factor of k
    • More nodes may be expanded, due to intermediate states

Enhanced Partial Expansion A*

Independence Detection

  • Simple Independence Detection
    • Solve each agent optimally and separately
    • While some agents conflict:
        1. Merge the conflicting agents into one group
        2. Solve the new group optimally (a minimal sketch of this loop appears after this list)
  • Independence Detection
    • Solve each agent optimally and separately
    • While some agents conflict:
        1. Try to avoid the conflict, with the same cost
        2. Merge the conflicting agents into one group
        3. Solve the new group optimally
  • But what if we have an environment like this?
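Here is the compact sketch of the Simple Independence Detection loop described above; solve_group and find_conflict are hypothetical helpers standing in for an optimal group solver and a plan-conflict check:

def simple_independence_detection(agents, solve_group, find_conflict):
    """solve_group(group) -> optimal joint plan for that group (hypothetical helper)
    find_conflict(plans) -> two conflicting group keys, or None (hypothetical helper)"""
    groups = [(a,) for a in agents]            # start with singleton groups
    plans = {g: solve_group(g) for g in groups}
    while True:
        conflict = find_conflict(plans)
        if conflict is None:
            return plans                       # all group plans are mutually conflict-free
        g1, g2 = conflict
        merged = g1 + g2                       # merge the two conflicting groups
        del plans[g1], plans[g2]
        plans[merged] = solve_group(merged)    # re-solve the merged group optimally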

M* (Wagner & Choset ‘11,’14)

  • Find optimal path for each agent individually
  • Start the search. Generate only nodes on optimal paths
  • If conflict occurs – backtrack and consider all ignored actions

Recursive M* (Wagner & Choset ‘11,’14)

  • Recursive M*
    • Find optimal path for each agent individually
    • Start the search. Generate only nodes on optimal paths
    • If conflict occurs – backtrack and consider all ignored actions
      • Apply M* recursively after backtracking
  • But the problem is:
  • the joint path up to the bottleneck can be long

Other Search-based approaches(ICTS,CBS)

  1. Increasing Cost Tree Search (Sharon et al. ‘12)
    • Does it work? - Not always
  2. Conflict Based Search(CBS)
    • CBS only performs single-agent searches (but many of them, each under different constraints)
    • Conflict: agent 1 and agent 2 both plan to occupy C at time 2
    • Constraint: agent 1 must avoid C at time 2, or agent 2 must avoid C at time 2
    • The Constraint Tree
      • Nodes:
        • a set of individual constraints for each agent
        • a set of paths consistent with those constraints
      • Goal test: are the paths conflict-free?
  • Analysis: Example 1
    • How many states A* will expand?
    • How many states CBS will?
    • Motivation: case with bottlenecks:
    • A*
      • g+h=6: All $m^2$ combinations of ($A_i$,$B_j$) will be expanded
      • f=7 : 3 states are expanded
      • A*: $m^2$ = O($m^2$) states
      • CBS : 2m+14 = O(m) states
    • When m > 4 CBS will examine fewer states than A*
  • Analysis: Example 2
    • States expanded by CBS?
    • States expanded by A*?
    • 4 optimal solutions for each agent
    • Each pair of solutions has a conflict
    • Rough analysis:
      • CBS: exponential in #conflicts = 54 states
      • A*: exponential in #agents = 8 states
  • Trends observed
    • In open spaces: use A*
    • In bottlenecks: use CBS
  • What if we have both?

Meta-Agent CBS (MA-CBS)

  • Plan for each agent individually
  • Validate the plans
  • If the plans of agents A and B conflict:
      If should_merge(A, B):
        merge A and B into a meta-agent and solve with A*
      Else:
        constrain A to avoid the conflict, or constrain B to avoid the conflict
  • should_merge(A, B) (simple rule):
    • merge when more than T conflicts between A and B have been observed
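The simple merge rule can be implemented with a per-pair conflict counter, roughly as follows (a hypothetical sketch; the threshold T is the only parameter):

from collections import defaultdict

conflict_count = defaultdict(int)   # counts observed conflicts per agent pair

def should_merge(a, b, T):
    """Merge agents a and b into a meta-agent once they have conflicted more than T times."""
    pair = tuple(sorted((a, b)))
    conflict_count[pair] += 1
    return conflict_count[pair] > T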

CBS Enhancements

  • Which conflict to resolve? [Boyarski et al. ‘16]
  • What to do after merging? [Boyarski et al. ‘16]
  • Heuristics for the constraint tree search [Felner et al. ‘18]
  • Augmenting CBS with human knowledge [Cohen et al.]
  • Which low-level solver to use?
  • When to merge the agents ?

Summary

  • A* variants (M*, EPEA*, A*+OD+ID)
    • Main factors: #agents, graph size, heuristic accuracy
  • ICTS
    • Main factors: #agents, Δ, graph size
  • CBS and its variants
    • Main factors: #conflicts

Solving MAPF

Bounded Suboptimal Algorithms

  • An algorithm is bounded suboptimal iff
    • It accepts a parameter ε
    • It outputs a solution whose cost is at most (1 + ε) · Optimal
  • How to create a bounded suboptimal algorithm?
    • Different search algorithms
    • Inadmissible heuristics

Suboptimal ICTS

Suboptimal A*

Suboptimal M*

Suboptimal CBS

  • Observation: Suboptimality can be introduced in both levels
    • ECBS (Barer et al. ‘14)
    • ECBS+Highways (Cohen et al. ’15, ‘16)

Advanced Issues in Search-based MAPF Algorithms

  • When to use which algorithm? Ensembles?
  • Using knowledge about past plans (Cohen et al. ’15)
  • Stronger heuristics for all algorithms
  • Deeper analysis of algorithms’ complexity
  • Beyond grid worlds
    • Kinematic constraints (Ma et al. ‘16)
    • Any-angle planning (Yakovlev et al. ‘17)
    • Hierarchical environments (Walker et al. ’17)
    • Large agents (Li et al. ‘19)
  • Prioritized planning based on CBS (Ma et al. '19)
  • Planning & execution

Reference: Potassco


26. File


Parsing text in files

  • Parsing is like first moving to a spot where diamonds are plentiful, mining a lot of rock, and then picking out only the diamonds and polishing them into gems; in other words, extracting just the meaningful data from the raw text
#include <iostream>
#include <fstream>
#include <sstream>

// sstream provides std::stringstream (string streams)
using namespace std;

int main() {

	string filename = "stats.txt";
	ifstream input;

	input.open(filename);

	if(!input.is_open()) {
		return 1;
	}

	while(input) {
		string line;

		getline(input, line, ':');
		// read up to the ':' delimiter, one record at a time;
		// "line" receives the text before the ':',
		// and the number after the ':' is read into population below.

		int population;
		input >> population; // read the integer value into population

		input.get(); // consume the single character right after the number (usually whitespace)
		input >> ws;
		// ws extracts and discards characters from the stream as long as the
		// next available character is whitespace, or until no more characters are available.
		// Whitespace refers to characters used for formatting purposes;
		// in C++ this primarily means spaces, tabs and newlines.
		// The C++ compiler generally ignores whitespace.
		// the C++ compiler generally ignores whitespace.
		if(!input) {
			break;
		}

		cout << "'" << line << "'" << " -- '" << population << "'" << endl;
	}

	input.close();

	return 0;
}


Result

Reading and writing binary files

  • When a class or struct instance contains aggregate data (e.g. an array member), pass its address with & so the write call knows where the bytes live.
  • Likewise, when reading such a value back, pass a pointer so the call can fill in the value at that address.
#include <iostream>
#include <fstream>
#include <string>

using namespace std;

struct Data {
	char name[100];
	int age;
	double height;
};

int main() {

	string filename = "data.bin";
	// now we can open
	Data data = { "Pixie", 120, 0.8 };

	/*
	 data.name = "Pixie";
	 data.age = 120;
	 data.height = 0.8;
	 */



/* write binary file*/

	ofstream output;

	output.open(filename, ios::binary);
	// open the file in binary mode

	if (!output.is_open()) {
		cout << "Could not create " << filename << endl;
	}

	output.write((char *) &data, sizeof(data)); // write the raw bytes of the struct
	// &data passes the address of the struct;
	// (char *) reinterprets that address as a byte pointer, which write() expects;
	// sizeof(data) tells write() how many bytes to copy.


	if (!output) {
		cout << "Could not write data to file " << filename << endl;
	}

	output.close();


/* Read binary file*/


	ifstream input;

	input.open(filename, ios::binary);

	if (!input.is_open()) {
		cout << "Could not read " << filename << endl;
	}

	Data inputData;
	// struct that will receive the data we read back

	input.read((char *) &inputData, sizeof(inputData)); // fill inputData with the bytes from the file

	if (!input) {
		cout << "Could not read data from file " << filename << endl;
	}

	input.close();

	cout << inputData.name << ": " << inputData.age << ": " << inputData.height << endl;

	return 0;
}

Reading Text file

#include <iostream>
#include <fstream>
#include <sstream>

using namespace std;

int main() {

	string inFileName = "test.txt";

	ifstream inFile;
	// ifstream opens the file with ios::in, i.e. for reading only


	inFile.open(inFileName);

	if (inFile.is_open()) {

		string line;

		while (inFile) {
			getline(inFile, line);
			cout << line << endl;
		}
		// getline(inFile, line) reads the file one line at a time
		inFile.close();
	} else {
		cout << "Cannot open file: " << inFileName << endl;
	}

	return 0;
}


Binary files

  • #pragma pack(push, 1)
  • This keeps the struct size exact (no padding), regardless of the order in which the members are declared.
  • In pragma pack(1), the 1 means members are stored on 1-byte boundaries.
  • For example, two structs Person and Person1 with the same members can otherwise end up with different sizes.
  • So why not always use 1-byte packing and waste no memory at all?
  • Because a 32-bit CPU processes data fastest in 4-byte (32-bit) units.
  • In other words, if you declare #pragma pack(1) and never restore the default, you can pay a performance penalty.
  • You would therefore have to declare #pragma pack(4) again where the struct declaration ends.
  • But there is still a problem: what happens if this source is compiled on a machine whose CPU is not a 32-bit PC?
  • To handle that, C++ provides #pragma pack(push, 1)
  • and #pragma pack(pop):
  • push saves the current packing (padding) alignment on a stack and switches to 1-byte packing, and pop restores the original alignment where the struct declaration ends.
#include <iostream>
#include <fstream>
using namespace std;
// # introduces a preprocessor directive
#pragma pack(push, 1)
// Save the current packing alignment on a stack and switch to 1-byte packing,
// so the struct below has no padding regardless of member order; the matching
// #pragma pack(pop) after the struct restores the original alignment
// (see the explanation above).
// data packing (padding / alignment)
struct Person {
	char name[50];
	int age;
	double weight;
};

#pragma pack(pop)

int main() {

	cout << sizeof(Person) << endl;




	return 0;
}

writing Text file

#include <iostream>
#include <fstream>
#include <string>

//fstream is the file stream
//we can read and write

using namespace std;

int main() {

	// ofstream outFile;
	// ofstream is a convenience class: an "output file stream"
	// that is always opened for writing
    string filename = "text.txt";
	fstream file;

	//string outputFileName = "text.txt";

    file.open(filename, ios::out | ios::binary);
    //outFile.open(outputFileName, ios::out);
    //ios::out means open for writing
    // with fstream (unlike ofstream) we need to specify ios::out explicitly;
    // either class can be used here
	if(file.is_open()) {
		file << "Hello there" << endl;
		file << 123 << endl;
		file.close();
	}
	else {
		cout << "Could not create file: " << "text.txt" << endl;
	}

	return 0;
}

Result


8. Tic-Tac-Toe


1. Outline

  • play_game(p1,p2,env)
  • Use an OOP approach

2. representing States

  • In the last post, I said this would be an O(1) lookup (as in an array)
  • Possible solution: Convert the board to tuple of tuples, use as dictionary Key
  • Better: Map Each state to a number, use numpy array

3. Mapping state to a number

  • there are 3 state, “O”, “X”, “Empty”
  • 9 array so all possible state is 3^9 = 19683
  • overhead not a problem, these states are unreachable(e.g. 3x’s in a row and 3o’s in a row on the same board)
  • Should remind us of binary numbers(2 Possible symbols in each location vs 3)
  • binary to decimal(小数):
  • For us:

In code

k = 0
h = 0
for i in range(LENGTH):
  for j in range(LENGTH):
    if self.board[i,j] == 0:
      v = 0
    elif self.board[i,j] == self.x:
      v = 1
    elif self.board[i,j] == self.o:
      v = 2
    h += (3**k) * v
    k += 1

4. Initializing the value function

  • Recall, we initialize V(s) as :
    • 1 if s == winning terminal state
    • 0 if s == lose or draw terminal state
    • 0.5 otherwise
  • how do we find all the “s”?

5. Enumerating state

  • Let's think about which is better: 1) create a "game tree" of all possible plays of the game, or 2) create a permutation of all possible settings of all positions on the board?

Game Tree

  • Problem : redundant states
  • i.e. see the same state more than once in the tree

Start: 9 choices then: 8 choices then: 7 choices ….

9! = 362,880 ≫ 3^9 = 19,683

Permutations

  • How? Think binary first:
  • Generate the permutations of length N:

    '0' + each permutation of length N-1
    '1' + each permutation of length N-1

def generate_all_binary_numbers(N):
  if N == 0:
    return ['']   # base case: a single empty string
  results = []
  child_results = generate_all_binary_numbers(N - 1)
  for prefix in ('0', '1'):
    for suffix in child_results:
      results.append(prefix + suffix)
  return results
  • if don’t get about this please check in here

Enumerating States Recursively

  • that is, all combinations of programs and inputs that halt
  • Function: get_state_hash_and_winner
  • Returns: a list of triples (state, winner, ended)
  • state: the configuration of the board as a hashed int
  • winner is None if ended is False (i.e., we keep recursing until each state's outcome is determined)
  • inputs: env, i, j
max_length = 0
while True:
  max_length = max_length + 1
  for each program "P" of length <= max_length:   # P is finite
    for each input "I" of length <= max_length:
      if "P" halts on "I" after max_length steps:
        print P, I

Summary

  • In order to initialize V(s) to our desired values, we must enumerate all the states
  • we can either search the game tree(9!) or find board permutations($3^9$)
  • 9! is super-exponential
  • Enumerating game states Recursively
  • initializing V(s) is different for X and O

6. The Environment Class(Code)

Method: __init__() : # initialize important instance vars
is_empty(i,j): # returns true if (i,j) is empty
reward(symbol)
get_state()
game_over()
draw_board()

Constructor

import numpy as np

LENGTH = 3  # board dimension

class Environment:
  def __init__(self):
    self.board = np.zeros((LENGTH, LENGTH))
    self.x = -1 # represents an x on the board, player 1
    self.o = 1 # represents an o on the board, player 2
    self.winner = None
    self.ended = False
    self.num_states = 3**(LENGTH*LENGTH)

is_empty

def is_empty(self, i, j):
  return self.board[i,j] == 0

reward

def reward(self, sym):
  # no reward until game is over
  if not self.game_over():
    return 0
  # if we get here, game is over
  # sym will be self.x or self.o
  return 1 if self.winner == sym else 0

get_state

def get_state(self):
  # returns the current state, represented as an int
  # from 0...|S|-1, where S = set of all possible states
  # |S| = 3^(BOARD SIZE), since each cell can have 3 possible values - empty, x, o
  # some states are not possible, e.g. all cells are x, but we ignore that detail
  # this is like finding the integer represented by a base-3 number
  k = 0
  h = 0
  for i in range(LENGTH):
    for j in range(LENGTH):
      if self.board[i,j] == 0:
        v = 0
      elif self.board[i,j] == self.x:
        v = 1
      elif self.board[i,j] == self.o:
        v = 2
      h += (3**k) * v # Value Funtion
      k += 1
  return h

game_over

def game_over(self, force_recalculate=False):
  if not force_recalculate and self.ended:
    return self.ended

  # check rows
  for i in range(LENGTH):
    for player in (self.x, self.o):
      if self.board[i].sum() == player*LENGTH:
        self.winner = player
        self.ended = True
        return True

  # check columns
  for j in range(LENGTH):
    for player in (self.x, self.o):
      if self.board[:,j].sum() == player*LENGTH:
        self.winner = player
        self.ended = True
        return True

  # check diagonals
  for player in (self.x, self.o):
    # top-left -> bottom-right diagonal
    if self.board.trace() == player*LENGTH:
      self.winner = player
      self.ended = True
      return True
    # top-right -> bottom-left diagonal
    if np.fliplr(self.board).trace() == player*LENGTH:
      self.winner = player
      self.ended = True
      return True

  # check if draw
  if np.all((self.board == 0) == False):
    # winner stays None
    self.winner = None
    self.ended = True
    return True

  # game is not over
  self.winner = None
  self.ended = False
  return False

draw_board

def draw_board(self):
  for i in range(LENGTH):
    print("-------------")
    for j in range(LENGTH):
      print("  ", end="")
      if self.board[i,j] == self.x:
        print("x ", end="")
      elif self.board[i,j] == self.o:
        print("o ", end="")
      else:
        print("  ", end="")
    print("")
  print("-------------")

7. The Agent class

  • Contains the AI
  • different from supervised/unsupervised ML, because we don’t just feed it data
  • it has to interact with the Environment
  • one-line update for V(s) is a very small part of the agent, yet responsible for 100% of its intelligence

The Agent Class

class Agent:
  def __init__(self, eps=0.1, alpha=0.5):
    self.eps = eps # probability of choosing random action instead of greedy
    self.alpha = alpha # learning rate
    self.verbose = False
    self.state_history = []

  def setV(self, V): # SetV is the allow us to initialize the value function
    self.V = V

  def set_symbol(self, sym): # symbol is allow us to give this agent a symbol that it'll use to play on the board
    self.sym = sym

  def set_verbose(self, v):
    # if true, will print values for each position on the board
    self.verbose = v

  def reset_history(self):
    self.state_history = []

Take_action

def take_action(self, env):
  # choose an action based on epsilon-greedy strategy
  r = np.random.rand()
  best_state = None
  if r < self.eps:
    # take a random action
    if self.verbose:
      print("Taking a random action")

    possible_moves = []
    for i in range(LENGTH):
      for j in range(LENGTH):
        if env.is_empty(i, j):
          possible_moves.append((i, j))
    idx = np.random.choice(len(possible_moves))
    next_move = possible_moves[idx]
  else:
    # choose the best action based on current values of states
    # loop through all possible moves, get their values
    # keep track of the best value
    pos2value = {} # for debugging
    next_move = None
    best_value = -1
    for i in range(LENGTH):
      for j in range(LENGTH):
        if env.is_empty(i, j):
          # what is the state if we made this move?
          env.board[i,j] = self.sym
          state = env.get_state()
          env.board[i,j] = 0 # don't forget to change it back!
          pos2value[(i,j)] = self.V[state]
          if self.V[state] > best_value:
            best_value = self.V[state]
            best_state = state
            next_move = (i, j)

    # if verbose, draw the board w/ the values
    if self.verbose:
      print("Taking a greedy action")
      for i in range(LENGTH):
        print("------------------")
        for j in range(LENGTH):
          if env.is_empty(i, j):
            # print the value
            print(" %.2f|" % pos2value[(i,j)], end="")
          else:
            print("  ", end="")
            if env.board[i,j] == env.x:
              print("x  |", end="")
            elif env.board[i,j] == env.o:
              print("o  |", end="")
            else:
              print("   |", end="")
        print("")
      print("------------------")

  # make the move
  env.board[next_move[0], next_move[1]] = self.sym

update_state_history

def update_state_history(self, s):
  # cannot put this in take_action, because take_action only happens
  # once every other iteration for each player
  # state history needs to be updated every iteration
  # s = env.get_state() # don't want to do this twice so pass it in
  self.state_history.append(s)

update

def update(self, env):
  # we want to BACKTRACK over the states, so that:
  # V(prev_state) = V(prev_state) + alpha*(V(next_state) - V(prev_state))
  # where V(next_state) = reward if it's the most current state
  #
  # NOTE: we ONLY do this at the end of an episode
  # not so for all the algorithms we will study
  reward = env.reward(self.sym)
  target = reward
  for prev in reversed(self.state_history):
    value = self.V[prev] + self.alpha*(target - self.V[prev])
    self.V[prev] = value
    target = value
  self.reset_history()

8. The Human Class(Code)

Human class

class Human:
  def __init__(self):
    pass

  def set_symbol(self, sym):
    self.sym = sym

  def take_action(self, env):
    while True:
      # break if we make a legal move
      move = input("Enter coordinates i,j for your next move (i,j=0..2): ")
      i, j = move.split(',')
      i = int(i)
      j = int(j)
      if env.is_empty(i, j):
        env.board[i,j] = self.sym
        break

  def update(self, env):
    pass

  def update_state_history(self, s):
    pass


# recursive function that will return all
# possible states (as ints) and who the corresponding winner is for those states (if any)
# (i, j) refers to the next cell on the board to permute (we need to try -1, 0, 1)
# impossible games are ignored, i.e. 3x's and 3o's in a row simultaneously
# since that will never happen in a real game
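The functions get_state_hash_and_winner, initialV_x and initialV_o that the comment above and the main block below rely on are not listed in this post; the following is a minimal reconstruction consistent with the Environment class above (treat the details as assumptions rather than the original course code):

def get_state_hash_and_winner(env, i=0, j=0):
  # try every value (empty, x, o) for cell (i, j), then recurse on the next cell
  results = []
  for v in (0, env.x, env.o):
    env.board[i, j] = v
    if j == LENGTH - 1:
      if i == LENGTH - 1:
        # board fully filled in: record (state, winner, ended)
        state = env.get_state()
        ended = env.game_over(force_recalculate=True)
        winner = env.winner
        results.append((state, winner, ended))
      else:
        results += get_state_hash_and_winner(env, i + 1, 0)
    else:
      results += get_state_hash_and_winner(env, i, j + 1)
  return results

def initialV_x(env, state_winner_triples):
  # V(s) = 1 if x wins from state s, 0 if the game ended otherwise, 0.5 if not terminal
  V = np.zeros(env.num_states)
  for state, winner, ended in state_winner_triples:
    if ended:
      v = 1.0 if winner == env.x else 0
    else:
      v = 0.5
    V[state] = v
  return V

def initialV_o(env, state_winner_triples):
  # same as initialV_x, but from o's perspective
  V = np.zeros(env.num_states)
  for state, winner, ended in state_winner_triples:
    if ended:
      v = 1.0 if winner == env.o else 0
    else:
      v = 0.5
    V[state] = v
  return V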

play games

def play_game(p1, p2, env, draw=False):
  # loops until the game is over
  current_player = None
  while not env.game_over():
    # alternate between players
    # p1 always starts first
    if current_player == p1:
      current_player = p2
    else:
      current_player = p1

    # draw the board before the user who wants to see it makes a move
    if draw:
      if draw == 1 and current_player == p1:
        env.draw_board()
      if draw == 2 and current_player == p2:
        env.draw_board()

    # current player makes a move
    current_player.take_action(env)

    # update state histories
    state = env.get_state()
    p1.update_state_history(state)
    p2.update_state_history(state)

  if draw:
    env.draw_board()

  # do the value function update
  p1.update(env)
  p2.update(env)

Main

if __name__ == '__main__':
  # train the agent
  p1 = Agent()
  p2 = Agent()

  # set initial V for p1 and p2
  env = Environment()
  state_winner_triples = get_state_hash_and_winner(env)


  Vx = initialV_x(env, state_winner_triples)
  p1.setV(Vx)
  Vo = initialV_o(env, state_winner_triples)
  p2.setV(Vo)

  # give each player their symbol
  p1.set_symbol(env.x)
  p2.set_symbol(env.o)

  T = 10000 #train
  for t in range(T):
    if t % 200 == 0:
      print(t)
    play_game(p1, p2, Environment())

  # play human vs. agent
  # do you think the agent learned to play the game well?
  # human Verification
  human = Human()
  human.set_symbol(env.o)
  while True:
    p1.set_verbose(True)
    play_game(p1, human, Environment(), draw=2)
    # I made the agent player 1 because I wanted to see if it would
    # select the center as its starting move. If you want the agent
    # to go second you can switch the human and AI.
    answer = input("Play again? [Y/n]: ")
    if answer and answer.lower()[0] == 'n':
      break

9. Summary

  • A hard-coded (non-learning) agent would not be able to increase its intelligence by learning from experience
  • and would not be generalizable to other environments
  • Components of a RL system (Agent, Environment, state, action, reward)
  • Episodes, terminal states
  • Choosing rewards carefully
  • credit assignment (attributing the probability of winning to the moves that led to it)
  • Delayed Reward
  • Planning intelligently to maximize overall rewards
  • Value function
  • V(s) <- V(s) + α(V(s’)- V(s))

RL vs SL

  • why not use a supervised learning model that maps states -> actions directly
  • i.e model y = f(x) where x = state, y= action
  • how would we come up with labels?
  • very easy for state space to grow too large to enumerate
  • connect-4/ tic-tac-toe is probably the limit
  • imagine: HD graphics 60 FPS video game - we can’t even conceptualize how large the state space
  • thus, how could we create a label for every state?
  • Rewards are delayed (choosing an action immediately, with no accumulated data, would not be sensible, so actions are taken after data has been gathered and the reward arrives later)
  • but in Rl value function has the “future” built into it
  • if doing SL, and somehow manage to get around a humongous state-space, how would we choose the right actions?
  • instead of letting the value function take care of measuring the future, we need to somehow keep track of what’s best for all future possibilities
  • Rewards can have component of randomness. (E.g same state gives reward of 1 or 0 )(E.g2. we study for our exam tomorrow, then there is a snow storm and our exam is canceled)
  • Next states can also be random
  • In RL, we are never told if a state-action pair is good or not
  • the only way we can find out is if we “discover” it ourselves by performing the action
  • RL is online. after each episode of tic-tac-toe, we get better
  • compare to decision tree - a tree is made based on entire training set
  • if we want to incorporate new data, start training all over again

Reference:

Artificial Intelligence Reinforcement Learning

Advance AI : Deep-Reinforcement Learning

Cutting-Edge Deep-Reinforcement Learning
