Contents

Dynamic Programming (DP)

Problem formulation

$$ \max_{a_t\in A_t(s_t)} E \left \{ \sum_t r_t(s_t, a_t) \right \} $$

DP algorithm (Backward induction)

Based on the principle of optimality (tail subproblem)

$$ v^*(s_t)=\max_{a_t\in A_t(s_t)} E \left[ r_t(s_t,a_t)+v^*(s_{t+1}) \mid s_t,a_t \right] \text{ for all possible }s_t $$
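
The recursion above can be sketched in code for a small finite-horizon problem. This is a minimal illustration, not a reference implementation: the function name `backward_induction`, the callback signatures, the zero terminal value, and the two-state toy problem are all assumptions introduced here, and the transition probabilities (the MDP kernel) are assumed to be known.

```python
def backward_induction(states, actions, reward, transition, horizon):
    """Finite-horizon DP by backward induction (illustrative sketch).

    actions(t, s)       -> iterable of feasible actions A_t(s)
    reward(t, s, a)     -> immediate reward r_t(s, a)
    transition(t, s, a) -> dict {next_state: probability}
    Returns (values, policy), both indexed by time step t.
    """
    # Assumed terminal condition: v_T(s) = 0 for every state.
    values = [dict() for _ in range(horizon + 1)]
    policy = [dict() for _ in range(horizon)]
    values[horizon] = {s: 0.0 for s in states}

    # Solve the tail subproblems from t = T-1 back to t = 0.
    for t in reversed(range(horizon)):
        for s in states:
            best_a, best_q = None, float("-inf")
            for a in actions(t, s):
                # r_t(s, a) + E[ v_{t+1}(s') | s, a ]
                q = reward(t, s, a) + sum(
                    p * values[t + 1][s2]
                    for s2, p in transition(t, s, a).items()
                )
                if q > best_q:  # ties keep the first action seen
                    best_a, best_q = a, q
            values[t][s] = best_q
            policy[t][s] = best_a
    return values, policy


# Hypothetical toy problem: two states, reward 1 for each step spent in state 1.
STATES = [0, 1]

def toy_actions(t, s):
    return ["stay", "move"]

def toy_reward(t, s, a):
    return 1.0 if s == 1 else 0.0

def toy_transition(t, s, a):
    if a == "stay":
        return {s: 1.0}
    # "move" switches state with probability 0.8, otherwise stays put.
    return {1 - s: 0.8, s: 0.2}

values, policy = backward_induction(
    STATES, toy_actions, toy_reward, toy_transition, horizon=3
)
print(values[0])
print(policy[0])
```

In this toy problem the computed policy at t = 0 moves out of state 0 and stays in state 1. Each stage loops over every state, which is what "for all possible s_t" in the recursion requires.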

However,

Prediction

Without MDP kernel