for Robot Artificial Intelligence

2. Optimization applications


Optimization in Signal and Image processing

Models arising in compressive sensing and the imaging sciences

  • Separable structures of problems

  • Convexity and easy subproblems.
  • Often involving large, nonsmooth convex functions with separable structure.
  • First-order methods can be very slow at producing high-accuracy solutions, but they also share many advantages (a generic scheme is sketched after this list):
    • They can handle non-differentiable objectives.
    • They use minimal information, e.g., (f, ∇f).
    • They often lead to very simple and “cheap” iterative schemes.
    • Complexity per iteration depends only mildly (e.g., linearly) on the problem’s dimension.
    • They are suitable when high accuracy is not crucial (in many large-scale applications the data is corrupted or known only roughly anyway).
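For composite problems min f(x) + h(x) with f smooth and h nonsmooth but “prox-friendly” (the structure shared by most models below), a standard cheap scheme, given here as a generic illustration rather than anything specific to these slides, is the proximal gradient iteration:

$$x^{k+1} = \mathrm{prox}_{t h}\big(x^k - t\,\nabla f(x^k)\big), \qquad \mathrm{prox}_{t h}(z) = \arg\min_x \Big\{ h(x) + \tfrac{1}{2t}\|x - z\|_2^2 \Big\}$$

Each iteration uses only first-order information (f, ∇f) and a simple proximal map, which is why the per-iteration cost grows only mildly with the dimension.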

Example: compressive sensing (a concept similar to a kernel)

  • Goal: Recover a sparse signal from very few linear measurements.
  • Basis pursuit model: min_x ‖x‖_1 subject to Ax = b.

Compressive Sensing

  • Find the sparsest solution
    • Given n = 256, m = 128.
      • n is the dimension of the original signal
      • m is the number of measurements
    • A = randn(m,n); u = sprandn(n, 1, 0.1); b = A*u; (a random Gaussian A, a sparse signal u with 10% density, and the measurements b; a recovery sketch follows this list)
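Continuing this setup, one way to actually solve the basis pursuit model is to cast it as a linear program via the standard splitting x = x⁺ − x⁻. A minimal sketch using MATLAB's linprog (Optimization Toolbox); the splitting and the error report are illustrative, not from the slides:

```matlab
n = 256; m = 128;
A = randn(m, n); u = sprandn(n, 1, 0.1); b = A*u;  % setup from above
f   = ones(2*n, 1);           % objective: sum(xp) + sum(xn) = ||x||_1
Aeq = [A, -A];                % constraint: A*(xp - xn) = b
lb  = zeros(2*n, 1);          % xp, xn >= 0
z   = linprog(f, [], [], Aeq, b, lb, []);
x   = z(1:n) - z(n+1:end);    % recovered signal
fprintf('relative recovery error: %.2e\n', norm(x - u)/norm(u));
```

With m = 128 measurements and roughly 26 nonzeros, the l1 solution typically recovers u exactly (up to solver tolerance).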

Sparse recovery models

  • Basis pursuit model: non-unique solution
  • The solution set {x : Ax = b} equals $x^*$ + N(A), since Av = 0 for every v ∈ N(A), so $x^*$ + v is also a solution
    • N(A) is the null space of A
  • In the l1-regularized variant min_x μ‖x‖_1 + ½‖Ax − b‖², μ is a given trade-off parameter (a constant); an ISTA sketch for this model follows this list
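A minimal ISTA (proximal gradient) sketch for the regularized model, reusing A, b, n from the compressive sensing example; the value of μ and the iteration count are illustrative assumptions:

```matlab
mu = 0.01;            % illustrative trade-off parameter
t  = 1/norm(A)^2;     % step size 1/L, with L = ||A||_2^2 the Lipschitz constant of the gradient
x  = zeros(n, 1);
for k = 1:500
    g = A' * (A*x - b);                    % gradient of 0.5*||A*x - b||^2
    z = x - t*g;                           % forward (gradient) step
    x = sign(z) .* max(abs(z) - t*mu, 0);  % soft-thresholding: prox of t*mu*||.||_1
end
```

The soft-thresholding step is exactly the proximal map of the l1 norm, which is what makes each iteration so cheap.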

Sparsity under a basis

  • Given a representing basis or dictionary, we want to recover a signal that is sparse under this basis.
  • Two types of models (sketched below):
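In standard notation (Φ a dictionary, Ψ an analysis operator; these symbols are my assumptions), the two models are usually written as:

$$\text{synthesis-based:}\quad \min_s \|s\|_1 \ \ \text{s.t.}\ \ A\Phi s = b, \qquad x = \Phi s$$

$$\text{analysis-based:}\quad \min_x \|\Psi x\|_1 \ \ \text{s.t.}\ \ Ax = b$$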

  • Commonly used dictionaries include both analytic and trained ones.
    • Analytic bases: Id (identity), DCT, wavelets, curvelets, Gabor, etc., and their combinations; they have analytic properties and are often easy to compute (for example, multiplying a vector takes O(n log n) instead of O($n^2$)).
    • Data-driven: dictionaries can also be numerically learned from training data or from partial signals/patches (offline or online learning).
    • The dictionary can be orthogonal, a frame, normalized, or general.
    • If Φ is orthogonal (or just invertible), the analysis-based model is equivalent to the synthesis-based one by a change of variable. When it is not, the model is harder to solve.

Parallel MRI reconstruction

Nonconvex sparsity models

  • min_x f(x) + h(x)
  • where either f or h is nonconvex, or both are. Some popular models:
    • Nonlinear least squares for nonlinear inverse problems: f(x) = ‖F(x) − y‖².
    • h(x) = ‖x‖_p with 0 <= p < 1, or a rank constraint.
    • Alternating minimization methods in many applications, such as blind deconvolution, matrix factorization, or dictionary learning (a sketch follows this list).
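A minimal alternating minimization sketch for the matrix factorization case, min_{X,Y} ‖M − XY‖_F²; the data M and the rank r are illustrative:

```matlab
M = randn(50, 5) * randn(5, 40);   % illustrative low-rank data
r = 5;                             % assumed target rank
X = randn(size(M,1), r);
Y = randn(r, size(M,2));
for k = 1:100
    X = M * pinv(Y);               % least-squares update of X with Y fixed
    Y = pinv(X) * M;               % least-squares update of Y with X fixed
end
fprintf('fit error: %.2e\n', norm(M - X*Y, 'fro'));
```

Each subproblem is a convex least-squares problem even though the joint problem is nonconvex; the same pattern underlies blind deconvolution and dictionary learning.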

Optimization of Matrices

The Netflix problem

Matrix Rank Minimization

Video separation

Sparse and low-rank matrix separation

  • Given a matrix M, we want to find a low-rank matrix W and a sparse matrix E such that W + E = M.
  • Convex approximation, known as robust PCA (a dimension-reduction tool): min_{W,E} ‖W‖_* + μ‖E‖_1 s.t. W + E = M, where ‖·‖_* is the nuclear norm (sum of singular values); its main building block is sketched below.
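Most robust PCA solvers combine two proximal maps: soft-thresholding for ‖E‖_1 (as in the ISTA sketch above) and singular value thresholding for the nuclear norm. A sketch of the latter; the function name is mine:

```matlab
function W = svt(Z, tau)
% Singular value thresholding: the proximal operator of tau*||.||_*.
% Soft-thresholds the singular values of Z and reassembles the matrix.
    [U, S, V] = svd(Z, 'econ');
    W = U * diag(max(diag(S) - tau, 0)) * V';
end
```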

Extension

Correlation Matrices

  • A correlation matrix satisfies:
    • X = X^T, X_ii = 1 for i = 1, . . . , n, and X ⪰ 0 (positive semidefinite).
  • Example: (low-rank) nearest correlation matrix estimation: given G, minimize ½‖X − G‖_F² over correlation matrices X (a projection sketch follows).
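A naive alternating-projections sketch for this problem; unlike Higham's method it omits the Dykstra correction, so it returns a nearby correlation matrix rather than the provably nearest one. The input G is illustrative:

```matlab
G = gallery('randcorr', 5);        % a valid correlation matrix...
G(1,2) = 1.1; G(2,1) = 1.1;        % ...corrupted so that it no longer is one
X = G;
for k = 1:200
    [V, D] = eig((X + X')/2);
    X = V * max(D, 0) * V';        % project onto the PSD cone
    X(1:size(X,1)+1:end) = 1;      % project onto {X : X_ii = 1}
end
```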

Optimization in Machine learning

Why Optimization in Machine Learning?

Loss functions in neural networks

Convolution operator

Optimization algorithms in Deep learning: stochastic gradient methods

  • pytorch/caffe2: adadelta, adagrad, adam, nesterov, rmsprop, YellowFin https://github.com/pytorch/pytorch/tree/master/caffe2/sgd
  • pytorch/torch: sgd, asgd, adagrad, rmsprop, adadelta, adam, adamax https://github.com/pytorch/pytorch/tree/master/torch/optim
  • tensorflow: Adadelta, AdagradDA, Adagrad, ProximalAdagrad, Ftrl, Momentum, Adam, CenteredRMSProp https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/kernels/training_ops.cc (a generic SGD sketch follows this list)
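All of these are variants of the same idea: replace the full gradient with a cheap estimate from a random sample or mini-batch. A generic SGD sketch on a least-squares toy problem; the data, learning rate, and iteration count are illustrative:

```matlab
A = randn(1000, 20); xtrue = randn(20, 1); b = A*xtrue;  % toy data
lr = 1e-3;
x  = zeros(20, 1);
for k = 1:50000
    i = randi(size(A, 1));                 % sample one data point
    g = A(i,:)' * (A(i,:)*x - b(i));       % stochastic gradient estimate
    x = x - lr*g;                          % SGD step
end
fprintf('relative error: %.2e\n', norm(x - xtrue)/norm(xtrue));
```

Momentum, Adam, RMSProp, etc. differ in how this raw stochastic gradient is accumulated and scaled before the update.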

Generative adversarial network (GAN)

Reinforcement Learning

Optimization in Finance

Portfolio optimization

Review of Risk Measures

Reference

Optimization method - Stanford University
