for Robot Artificial Intelligence

Machine Learning Using Gaussian Methods

|

Who was Gauss?

Gauss was a German mathematician who lived from 1777 to 1855. He is said to have corrected arithmetic mistakes made by his father at the age of three (which sounds like an exaggeration...) and to have mastered arithmetic series by the age of seven. After entering university he worked on practical mathematics and contributed to many fields (the Fermat polygonal number theorem, Fermat's Last Theorem, Descartes's rule of signs, the Kepler conjecture, non-Euclidean geometries). Gauss was a perfectionist and a hard worker: he never published work that he did not consider completely finished.

The reason I wanted to know briefly who Gauss was is that, while studying, you run into a huge number of formulas carrying the Gaussian name. I wondered who this person was, and only today do I understand a little.

Gaussian Process for Machine learning

Gaussian processes are a powerful algorithm for both regression and classification. Their greatest practical advantage is that they can give a reliable estimate of their own uncertainty.

Before diving into Gaussian processes, let me introduce where they are mostly used these days, in particular machine learning.

What is machine learning?

Machine learning is using data we have (known as training data) to learn a function that we can use to make predictions about data we don't have yet. The simplest example of this is linear regression, where we learn the slope and intercept of a line so we can predict the vertical position of points from their horizontal position. This is shown below: the training data are the blue points and the learnt function is the red line.
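As a concrete illustration, here is a minimal least-squares fit of a slope and intercept in C++. The data points are made up for the example; the closed-form formulas are the standard ordinary-least-squares solution.

#include <iostream>
#include <vector>

// Minimal ordinary least squares: fit y = slope * x + intercept to toy data.
int main()
{
  std::vector<double> x = {1.0, 2.0, 3.0, 4.0, 5.0};   // horizontal positions (inputs)
  std::vector<double> y = {1.1, 1.9, 3.2, 3.9, 5.1};   // vertical positions (training targets)

  double n = static_cast<double>(x.size());
  double sum_x = 0, sum_y = 0, sum_xx = 0, sum_xy = 0;
  for (size_t i = 0; i < x.size(); ++i) {
    sum_x  += x[i];
    sum_y  += y[i];
    sum_xx += x[i] * x[i];
    sum_xy += x[i] * y[i];
  }

  // Closed-form least-squares solution for the two learned parameters.
  double slope     = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x * sum_x);
  double intercept = (sum_y - slope * sum_x) / n;

  std::cout << "slope = " << slope << ", intercept = " << intercept << "\n";
  std::cout << "prediction at x = 6: " << slope * 6.0 + intercept << "\n";
  return 0;
}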

Machine learning is an extension of linear regression in a few ways. The first is that modern ML deals with much more complicated data: instead of learning a function to calculate a single number from another number, as in linear regression, we might be dealing with different inputs and outputs such as:

  • The price of a house (output) dependent on its location, number of rooms, etc… (inputs)
  • The content of an image (output) based on the intensities and colours of pixels in the image (inputs)
  • The best move (output) based on the state of a board of Go (input)
  • A higher resolution image (output) based on a low resolution image (input)

Secondly, modern ML uses much more powerful methods for extracting patterns, of which deep learning is only one of many. Gaussian processes are another of these methods and their primary distinction is their relation to uncertainty.

Thinking about uncertainty

Uncertainty can be represented as a set of possible outcomes and their respective likelihoods — called a probability distribution (http://hyperphysics.phy-astr.gsu.edu/hbase/Math/gaufcn.html).

The world around us is filled with uncertainty — we do not know exactly how long our commute will take or precisely what the weather will be at noon tomorrow. Some uncertainty is due to our lack of knowledge, and some is intrinsic to the world no matter how much knowledge we have. Since we are unable to completely remove uncertainty from the universe, we had best have a good way of dealing with it. Probability distributions are exactly that, and it turns out that they are the key to understanding Gaussian processes.

The most obvious example of a probability distribution is that of the outcome of rolling a fair 6-sided dice, i.e. a one in six chance of any particular face.

This is an example of a discrete probability distribution, as there are a finite number of possible outcomes. In the discrete case a probability distribution is just a list of possible outcomes and the chance of them occurring. In many real-world scenarios a continuous probability distribution is more appropriate, as the outcome could be any real number; an example of one is explored in the next section. Another key concept that will be useful later is sampling from a probability distribution. This means going from a set of possible outcomes to just one real outcome — rolling the dice in this example.
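To make sampling concrete, here is a small sketch that rolls the fair die many times with the C++ standard library; the seed and number of rolls are arbitrary.

#include <iostream>
#include <map>
#include <random>

// Sampling from a discrete probability distribution: rolling a fair 6-sided die.
int main()
{
  std::mt19937 gen(42);                              // fixed seed so the run is reproducible
  std::uniform_int_distribution<int> die(1, 6);      // each face has probability 1/6

  std::map<int, int> counts;
  const int rolls = 60000;
  for (int i = 0; i < rolls; ++i)
    counts[die(gen)]++;                              // one sample = one concrete outcome

  for (const auto& kv : counts)
    std::cout << "face " << kv.first << ": "
              << static_cast<double>(kv.second) / rolls << "\n";   // each close to 1/6
  return 0;
}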

Bayesian inference

Bayesian inference might be an intimidating phrase but it boils down to just a method for updating our beliefs about the world based on evidence that we observe.

In Bayesian inference our beliefs about the world are typically represented as probability distributions and Bayes’ rule tells us how to update these probability distributions.

Bayesian statistics provides us the tools to update our beliefs (represented as probability distributions) based on new data

In short: we update our uncertainty, and then use the updated distribution to predict values.
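A minimal sketch of Bayes' rule over a discrete belief: the posterior is proportional to the likelihood times the prior. The two hypotheses and the numbers below are invented purely for illustration.

#include <iostream>
#include <vector>

// Discrete Bayes update: posterior[i] = likelihood[i] * prior[i] / P(data).
int main()
{
  // Two competing hypotheses, e.g. "the coin is fair" vs "the coin is biased towards heads".
  std::vector<double> prior      = {0.5, 0.5};
  // Probability of the observed evidence (say, 8 heads in 10 flips) under each hypothesis.
  std::vector<double> likelihood = {0.04, 0.30};

  double evidence = 0.0;                             // normalising constant P(data)
  for (size_t i = 0; i < prior.size(); ++i)
    evidence += likelihood[i] * prior[i];

  std::vector<double> posterior(prior.size());
  for (size_t i = 0; i < prior.size(); ++i)
    posterior[i] = likelihood[i] * prior[i] / evidence;

  std::cout << "P(fair   | data) = " << posterior[0] << "\n";
  std::cout << "P(biased | data) = " << posterior[1] << "\n";
  return 0;
}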

Let’s run through an illustrative example of Bayesian inference — we are going to adjust our beliefs about the height of Barack Obama based on some evidence.

Let’s consider that we’ve never heard of Barack Obama (bear with me), or at least we have no idea what his height is. However we do know he’s a male human being resident in the USA. Hence our belief about Obama’s height before seeing any evidence (in Bayesian terms this is our prior belief) should just be the distribution of heights of American males.

Now let’s pretend that Wikipedia doesn’t exist so we can’t just look up Obama’s height and instead observe some evidence in the form of a photo.

Our updated belief (posterior in Bayesian terms) looks something like this.

We can see that Obama is definitely taller than average, coming slightly above several other world leaders, however we can’t be quite sure how tall exactly. The probability distribution shown still reflects the small chance that Obama is average height and everyone else in the photo is unusually short.

Thinking about SLAM: on top of the prior state built from observations or the motion model at state n-1, the new observation data and motion data at state n form a new Gaussian distribution. We find the maximum-likelihood value from these two Gaussian distributions to estimate the n-th state, and then update the Gaussian distribution.
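As a sketch of that fusion step for 1-D Gaussians, the snippet below multiplies a prediction Gaussian and an observation Gaussian (the Kalman-style precision-weighted update); the means and variances are illustrative only.

#include <iostream>

// Fusing two 1-D Gaussians (e.g. a motion-model prediction and an observation).
// The product of the two densities is again Gaussian, with a smaller variance.
struct Gaussian { double mean; double var; };

Gaussian fuse(const Gaussian& a, const Gaussian& b)
{
  Gaussian out;
  out.var  = 1.0 / (1.0 / a.var + 1.0 / b.var);             // fused variance shrinks
  out.mean = out.var * (a.mean / a.var + b.mean / b.var);   // precision-weighted mean
  return out;
}

int main()
{
  Gaussian prior       {2.0, 1.0};   // belief carried over from state n-1 via the motion model
  Gaussian observation {2.6, 0.5};   // new measurement at state n

  Gaussian posterior = fuse(prior, observation);
  std::cout << "posterior mean = " << posterior.mean
            << ", variance = " << posterior.var << "\n";
  return 0;
}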

What is a Gaussian process?

Now that we know how to represent uncertainty over numeric values such as height or the outcome of a dice roll we are ready to learn what a Gaussian process is.

A Gaussian process is a probability distribution over possible functions.

Since Gaussian processes let us describe probability distributions over functions we can use Bayes’ rule to update our distribution of functions by observing training data.

To reinforce this intuition I’ll run through an example of Bayesian inference with Gaussian processes which is exactly analogous to the example in the previous section. Instead of updating our belief about Obama’s height based on photos we’ll update our belief about an unknown function given some samples from that function.

Our prior belief about the unknown function is visualized above. On the right is the mean and standard deviation of our Gaussian process — we don't have any knowledge about the function so the best guess for our mean is in the middle of the real numbers i.e. 0.

On the left each line is a sample from the distribution of functions and our lack of knowledge is reflected in the wide range of possible functions and diverse function shapes on display. Sampling from a Gaussian process is like rolling a dice but each time you get a different function, and there are an infinite number of possible functions that could result.

Instead of observing some photos of Obama we will instead observe some outputs of the unknown function at various points. For Gaussian processes our evidence is the training data.

Now that we’ve seen some evidence let’s use Bayes’ rule to update our belief about the function to get the posterior Gaussian process AKA our updated belief about the function we’re trying to fit.

Similarly to the narrowed distribution of possible heights of Obama, what you can see is a narrower distribution of functions. The updated Gaussian process is constrained to the possible functions that fit our training data — the mean of our function intercepts all training points and so does every sampled function. We can also see that the standard deviation is higher away from our training data, which reflects our lack of knowledge about these areas.
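To make the posterior computation concrete, here is a small sketch of GP regression at a single test point with an RBF kernel. The training points, hyperparameters, and noise level are arbitrary choices for illustration, and the linear solve is written out by hand to keep the example self-contained.

#include <cmath>
#include <iostream>
#include <vector>

// RBF (squared exponential) kernel with fixed hyperparameters.
double rbf(double x1, double x2, double length_scale = 1.0, double signal_var = 1.0)
{
  double d = x1 - x2;
  return signal_var * std::exp(-0.5 * d * d / (length_scale * length_scale));
}

// Solve A w = b for a small, well-conditioned dense system by Gauss-Jordan elimination.
std::vector<double> solve(std::vector<std::vector<double>> A, std::vector<double> b)
{
  const size_t n = b.size();
  for (size_t i = 0; i < n; ++i) {
    double piv = A[i][i];
    for (size_t j = 0; j < n; ++j) A[i][j] /= piv;
    b[i] /= piv;
    for (size_t k = 0; k < n; ++k) {
      if (k == i) continue;
      double f = A[k][i];
      for (size_t j = 0; j < n; ++j) A[k][j] -= f * A[i][j];
      b[k] -= f * b[i];
    }
  }
  return b;
}

int main()
{
  std::vector<double> X = {-2.0, 0.0, 1.5};      // training inputs
  std::vector<double> y = {-1.0, 0.3, 0.9};      // training outputs
  double noise  = 1e-4;                          // small observation-noise variance
  double x_star = 0.8;                           // test input

  const size_t n = X.size();
  std::vector<std::vector<double>> K(n, std::vector<double>(n));
  std::vector<double> k_star(n);
  for (size_t i = 0; i < n; ++i) {
    for (size_t j = 0; j < n; ++j) K[i][j] = rbf(X[i], X[j]) + (i == j ? noise : 0.0);
    k_star[i] = rbf(X[i], x_star);
  }

  // Posterior mean = k_*^T (K + noise*I)^(-1) y
  // Posterior var  = k(x_*, x_*) - k_*^T (K + noise*I)^(-1) k_*
  std::vector<double> alpha = solve(K, y);
  std::vector<double> v     = solve(K, k_star);
  double mean = 0.0, reduction = 0.0;
  for (size_t i = 0; i < n; ++i) { mean += k_star[i] * alpha[i]; reduction += k_star[i] * v[i]; }
  double var = rbf(x_star, x_star) - reduction;

  std::cout << "posterior mean at x* = " << x_star << " : " << mean << "\n";
  std::cout << "posterior std dev        : " << std::sqrt(var) << "\n";
  return 0;
}

Far from the training inputs the reduction term shrinks towards zero, so the predicted standard deviation grows back towards the prior value, which is exactly the widening uncertainty described above.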

Advantages and disadvantages of GPs

Gaussian processes know what they don’t know

This sounds simple but many, if not most, ML methods don't share this. A key benefit is that the uncertainty of a fitted GP increases away from the training data — this is a direct consequence of GPs' roots in probability and Bayesian inference.

Above we can see the classification functions learned by different methods on a simple task of separating blue and red dots. Note that two commonly used and powerful methods maintain high certainty of their predictions far from the training data — this could be linked to the phenomenon of adversarial examples, where powerful classifiers give very wrong predictions for strange reasons. This characteristic of Gaussian processes is particularly relevant for identity verification and security-critical uses, as you want to be completely certain your model's output is there for a good reason.

Gaussian processes let you incorporate expert knowledge.

When you’re using a GP to model your problem you can shape your prior belief via the choice of kernel.

This lets you shape your fitted function in many different ways. The observant among you may have been wondering how Gaussian processes are ever supposed to generalize beyond their training data given the uncertainty property discussed above. Well, the answer is that the generalization properties of GPs rest almost entirely within the choice of kernel.
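As a sketch of how the kernel encodes prior assumptions, the snippet below compares an RBF kernel with a periodic kernel on two inputs that are exactly one period apart; the hyperparameter values are illustrative.

#include <cmath>
#include <iostream>

const double kPi = 3.14159265358979323846;

// RBF kernel: nearby inputs have similar outputs; correlation decays with distance.
double rbf(double x1, double x2, double length_scale)
{
  double d = x1 - x2;
  return std::exp(-0.5 * d * d / (length_scale * length_scale));
}

// Periodic kernel: encodes the prior belief that the function repeats with a given period.
double periodic(double x1, double x2, double length_scale, double period)
{
  double s = std::sin(kPi * std::fabs(x1 - x2) / period);
  return std::exp(-2.0 * s * s / (length_scale * length_scale));
}

int main()
{
  // The periodic kernel still treats points one period apart as highly correlated,
  // while the RBF kernel treats them as nearly independent.
  std::cout << "rbf(0, 5)      = " << rbf(0.0, 5.0, 1.0) << "\n";
  std::cout << "periodic(0, 5) = " << periodic(0.0, 5.0, 1.0, 5.0) << "\n";
  return 0;
}

Choosing the periodic kernel would let a GP extrapolate a repeating pattern far beyond the training data, while the RBF kernel would quickly fall back to the prior mean.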

Gaussian processes are computationally expensive.

Gaussian processes are a non-parametric method. Parametric approaches distill knowledge about the training data into a set of numbers. For linear regression this is just two numbers, the slope and the intercept, whereas other approaches like neural networks may have 10s of millions. This means that after they are trained the cost of making predictions is dependent only on the number of parameters.

However as Gaussian processes are non-parametric (although kernel hyperparameters blur the picture) they need to take into account the whole training data each time they make a prediction. This means not only that the training data has to be kept at inference time but also means that the computational cost of predictions scales (cubically!) with the number of training samples.

The future of Gaussian processes

The world of Gaussian processes will remain exciting for the foreseeable future, as research is being done to bring their probabilistic benefits to problems currently dominated by deep learning — sparse and minibatch Gaussian processes increase their scalability to large datasets, while deep and convolutional Gaussian processes put high-dimensional and image data within reach. Watch this space.

Key processes for machine learning

  1. Classification: two data sets (train, valid) -> data input -> obtain mean -> variance -> standard deviation -> trained dataset -> trained Gaussian distribution (Gaussian model, with labels e.g. 0, 1) -> feed input data into the Gaussian model -> get a probability -> if over a certain threshold -> judge the point's label to be 0 or 1 (see the sketch after this list).

  2. Prediction: two data sets (train, valid) -> data input -> obtain mean -> variance -> standard deviation -> trained dataset -> trained Gaussian distribution -> feed input data into the Gaussian model -> get likelihood -> obtain parameters -> update parameters.

  3. My study case: data (height data on a grid map) -> variance (obtained by considering the sensor's height variance) -> mean -> variance -> covariance matrix -> matrix decomposition (eigenvalues, eigenvectors) -> ellipse rotation, ellipse length -> submap (ellipse) mean and weight -> center of ellipse -> probabilities (x, y) -> calculate weights from the probabilities -> update the upper-bound and lower-bound distributions by the weights -> update the mean.

In other words, we obtain probabilities from the uncertainty, compute weights from them, and update the mean with these uncertainty-aware weights.
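A minimal sketch of the classification pipeline in item 1 above, assuming 1-D inputs and two labels; the training values are invented for illustration.

#include <cmath>
#include <iostream>
#include <vector>

// Fit one 1-D Gaussian per label (mean, variance), then label a new point by whichever
// Gaussian assigns it the higher probability density.
struct Gaussian1D
{
  double mean = 0.0, var = 0.0;

  void fit(const std::vector<double>& data)
  {
    for (double v : data) mean += v;
    mean /= data.size();
    for (double v : data) var += (v - mean) * (v - mean);
    var /= data.size();                               // standard deviation = sqrt(var)
  }

  double pdf(double x) const
  {
    double d = x - mean;
    return std::exp(-0.5 * d * d / var) / std::sqrt(2.0 * 3.14159265358979323846 * var);
  }
};

int main()
{
  std::vector<double> class0 = {1.0, 1.2, 0.8, 1.1};   // training samples with label 0
  std::vector<double> class1 = {3.0, 3.3, 2.8, 3.1};   // training samples with label 1

  Gaussian1D g0, g1;
  g0.fit(class0);
  g1.fit(class1);

  double x = 2.6;                                      // new input to classify
  double p0 = g0.pdf(x), p1 = g1.pdf(x);
  std::cout << "p(x | label 0) = " << p0 << ", p(x | label 1) = " << p1 << "\n";
  std::cout << "predicted label: " << (p1 > p0 ? 1 : 0) << "\n";
  return 0;
}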

Summary

Using uncertainty means computing a probability distribution: the likelihood and the probability distribution function.

KNN -> nearest point -> put the current data into the Gaussian model (built from the previous data)

REFERENCE

https://www.mathsisfun.com/data/standard-deviation.html


Upcasting, downcasting and the relationship between virtual and override

|

If the base class and derived class in an inheritance relationship are thought of as one family, upcasting means casting from the derived class to the base class. For example:

#include <iostream>

class cal
{
protected:
  int a, b, c;
public:
  void init(int new_a, int new_b) { a = new_a; b = new_b; }
};

class add : public cal
{
public:
  void sum() { c = a + b; std::cout << c << std::endl; }
};

int main()
{
  add a;
  a.init(3, 5);  // the derived object calls the inherited base-class member directly
  a.sum();
  return 0;
}

When the derived class uses a base-class member like this, the upcast happens automatically.

However, downcasting does not happen automatically, so a different method must be used.

// assuming cal declares a virtual prn() that add overrides
int main()
{
  cal* calptr;
  calptr = new add();                  // upcasting: happens implicitly
  calptr->prn();

  add* addptr;
  addptr = static_cast<add*>(calptr);  // downcasting: must be written explicitly
  addptr->prn();

  delete addptr;
  return 0;
}

virtual and override are almost always used together.

With virtual, if the base class and derived class have a function with the same name, the base-class function is marked virtual. Then, when the derived class has a function with the same name, override is written on it.

In other words, virtual marks the base-class function, and override marks the derived-class function.

#include <iostream>
using namespace std;

class cal
{
protected:
  int a;
  int b;
public:
  cal();
  cal(int new_a, int new_b);
  virtual void prn();   // virtual marks the base-class function
};

cal::cal()
{
  a = 0;
  b = 0;
}

cal::cal(int new_a, int new_b)
{
  a = new_a;
  b = new_b;
}

void cal::prn()
{
  cout << a << endl;
}

class add : public cal
{
protected:
  int c;
public:
  add() : c(0) {}
  void prn() override   // override marks the same-named function in the derived class
  {
    cout << a + b << endl;
  }
};

A pure virtual function presents only the function's form in the base class, without any definition. Like a virtual function, it is declared with the keyword virtual at the front of the declaration, and "= 0" is appended at the end:

virtual return_type function_name() = 0;

When you use base and derived classes like this, there is one thing you must do: make the destructor virtual. Only then will the derived class's destructor also be called when the program ends.

If you do not make it virtual, the calls happen like this:

base class constructor
derived class constructor
base class destructor

Because the derived class is never destroyed, problems such as memory leaks can occur.

For this reason, when using base and derived classes, put virtual on the base-class destructor so that the derived-class destructor is also called:

base class constructor
derived class constructor
derived class destructor
base class destructor
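A small sketch showing this destruction order, reusing the cal/add names from above with a virtual base-class destructor.

#include <iostream>

// Deleting a derived object through a base-class pointer: because ~cal() is virtual,
// the derived destructor runs first and then the base destructor.
class cal
{
public:
  cal()          { std::cout << "base class constructor" << std::endl; }
  virtual ~cal() { std::cout << "base class destructor" << std::endl; }
};

class add : public cal
{
public:
  add()           { std::cout << "derived class constructor" << std::endl; }
  ~add() override { std::cout << "derived class destructor" << std::endl; }
};

int main()
{
  cal* p = new add();   // upcasting
  delete p;             // without "virtual ~cal()", only the base destructor would be called
  return 0;
}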


The this pointer and const member functions

|

Inside a class's member functions, this points to the object itself.

The this pointer is normally used implicitly, but it is written explicitly to minimize confusion when a variable inside a function has the same name as a member variable of the class.

class COMPLEX
{
public:
  COMPLEX(int real, int image);
  ~COMPLEX();
private:
  int real, image;
};
COMPLEX::COMPLEX(int real, int image)
{
  this->real = real;
  this->image = image;
}

const can also be applied to member functions. A function declared as a const member function cannot modify member variables inside the function. It is usually used for functions that only print or read values.

void COMPLEX::showComplex() const
{
  cout << " hello "<< image << real << endl;
}


Preprocessor directives, macro constants, and inline

|

Preprocessor directives are used, before the program is compiled and run, to select libraries depending on conditions or to define constants and functions.

#if
#elif
#else
#endif
#ifdef
#ifndef

Using directives, macro constants, and inline is a way to make a program a bit faster.

A macro constant is used to define a constant that does not change anywhere in the module, which improves the readability of that constant.

#define
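A small sketch of how these directives and a macro constant typically appear together in a header; the names CAL_H, DEBUG_MODE, and MAX_BUFFER are hypothetical.

// Hypothetical header fragment combining an include guard, a macro constant,
// and conditional compilation.
#ifndef CAL_H              // include guard: compile this block only if CAL_H is not defined yet
#define CAL_H

#define MAX_BUFFER 256     // macro constant: the name reads better than a bare 256

#ifdef DEBUG_MODE          // this branch exists only when DEBUG_MODE is defined (e.g. -DDEBUG_MODE)
  #define CHECK_SIZE(n) ((n) <= MAX_BUFFER)
#else
  #define CHECK_SIZE(n) (true)
#endif

#endif // CAL_H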

An inline function has the advantage of being called faster than an ordinary function. However, because its body is copied into every call site (increasing code size), it is more efficient to use it for simple computations that are called frequently rather than for heavy functions with long bodies.

inline int p()
{
  return 0;   // small, frequently called functions like this are good candidates for inline
}
...

Another way to make a function inline is to write the function body directly inside the class in the header file (instead of only declaring it in the header and writing the definition in the cpp file).


Ensembles in machine learning

|

https://untitledtblog.tistory.com/156

In machine learning, an ensemble, as the word suggests, is a method of combining several simple models to build an accurate model.

That is, the final model is built by running the data through many models, one if-like condition after another (as in tree-based ensembles).
