Contents

Goal of RL problems

Maximize the expected sum of rewards (the expected return)
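
In the usual discounted formulation (an assumption here, with discount factor $\gamma \in [0, 1]$), the objective can be written as:

```latex
% Objective: maximize the expected return under policy \pi
J(\pi) = \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t} R_{t+1}\right],
\qquad
\pi^{*} = \arg\max_{\pi} J(\pi)
```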

Key difficulties in learning → Feedback is sequential, evaluative, and sampled

Challenges

  1. Sequential

  2. Evaluative

  3. Sampled

    → need generalization


Countermeasures

  1. Sequential → estimate the values (expected returns)
  2. Evaluative → balance exploration and exploitation (a sketch covering 1 and 2 follows after this list)
  3. Sampled → generalize the value functions: use a neural network for the generalization
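
A minimal sketch of countermeasures 1 and 2 together: a tabular TD-style (SARSA) update estimates action values from sequential feedback, and an ε-greedy rule balances exploration with exploitation. The environment interface (`env.reset()` returning a state, `env.step(action)` returning `(next_state, reward, done)`) is a hypothetical assumption for illustration.

```python
import random
from collections import defaultdict

def epsilon_greedy(action_values, actions, epsilon=0.1):
    """With probability epsilon explore a random action, otherwise exploit the greedy one."""
    if random.random() < epsilon:
        return random.choice(actions)                      # exploration
    return max(actions, key=lambda a: action_values[a])    # exploitation

def run_episode(env, q, actions, alpha=0.1, gamma=0.99, epsilon=0.1):
    """One episode of tabular SARSA: estimate values (expected returns) by bootstrapping."""
    state = env.reset()
    action = epsilon_greedy(q[state], actions, epsilon)
    done = False
    while not done:
        next_state, reward, done = env.step(action)
        next_action = epsilon_greedy(q[next_state], actions, epsilon)
        # TD target: immediate reward plus the discounted estimate of the next value
        target = reward + (0.0 if done else gamma * q[next_state][next_action])
        q[state][action] += alpha * (target - q[state][action])
        state, action = next_state, next_action

# q[state][action] defaults to 0.0 and is filled in as states are visited
q = defaultdict(lambda: defaultdict(float))
```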

Generalize the Values

Because there are far too many states and they are too complex, sampling only ever covers a fraction of them; so when a new state appears, its value is predicted from previously seen, similar states → Regression Problem
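
Seen as regression, one fits a function from state features to value targets on the sampled states, then queries it for new, similar states. A minimal sketch with a linear least-squares fit; the feature vectors and return targets below are made up purely for illustration.

```python
import numpy as np

# Hypothetical training data: one feature vector per sampled state, sampled return as target
X = np.array([[0.1, 1.0],
              [0.4, 0.8],
              [0.9, 0.2]])          # features phi(s) of visited states
G = np.array([1.2, 0.9, 0.3])       # returns observed from those states

# Fit a linear value function v(s) ~= phi(s) . w by least squares (a regression problem)
w, *_ = np.linalg.lstsq(X, G, rcond=None)

# A previously unseen state is evaluated through the same features -> generalization
phi_new = np.array([0.5, 0.7])
print(phi_new @ w)                   # predicted value of the new state
```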


Parametric function approximation

Parametric class of functions → choose the approximation $\tilde v_t$ within that class by adjusting its parameters
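
One common way to choose within the parametric class (assuming a differentiable approximation $\tilde v(s; w)$ with parameters $w$, step size $\alpha$, and a sampled return $G_t$ as the regression target) is stochastic gradient descent on the squared prediction error:

```latex
% Gradient Monte Carlo update for a parametric value function \tilde v(s; w)
w_{t+1} = w_t + \alpha \,\bigl(G_t - \tilde v(S_t; w_t)\bigr)\, \nabla_{w}\, \tilde v(S_t; w_t)
```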