A discrete-time stochastic model → a sequence of possible events:
The probability of each event depends only on the state reached in the previous event
A stochastic process $\{X_n, n=0,1,2,...\}$
$X_n$ → state at (discrete) time step n
DTMC (Discrete-Time Markov Chain):
$$ \begin{align*} & P(X_{n+1}=j|X_n=i, X_{n-1}=i_{n-1}, ..., X_0=i_0) \\ & =P(X_{n+1}=j|X_n=i) \\ & = P_{ij} \end{align*} $$
where $P_{ij}$ is independent of the past history and of the time step (n)
Given $P=[P_{ij}]$, we can find the stationary probability distribution $\Pi_i$:
$\Pi_i=\text{Prob}\{\text{current state is } i\}$ = long-run fraction of time spent in state $i$
→ from $\Pi_i$ we can derive quantities such as the average time spent in state $i$ and the expected inter-visit (return) time to state $i$
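As a minimal sketch of the above, the stationary distribution of a DTMC can be computed as the left eigenvector of the transition matrix for eigenvalue 1 (the 3-state matrix below is a hypothetical example, not from the source):

```python
import numpy as np

# Hypothetical 3-state transition matrix; P[i, j] = P_ij, each row sums to 1.
P = np.array([
    [0.5, 0.3, 0.2],
    [0.1, 0.6, 0.3],
    [0.2, 0.2, 0.6],
])

# Stationary distribution Pi solves Pi @ P = Pi with sum(Pi) = 1,
# i.e. the left eigenvector of P for eigenvalue 1.
eigvals, eigvecs = np.linalg.eig(P.T)
idx = np.argmin(np.abs(eigvals - 1.0))
pi = np.real(eigvecs[:, idx])
pi = pi / pi.sum()

print(pi)        # long-run fraction of time spent in each state
print(1.0 / pi)  # expected inter-visit (return) time to each state
```

For an irreducible, positive-recurrent chain, the expected return time to state $i$ is $1/\Pi_i$, which is what the last line prints.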
In mathematics, an MDP (Markov Decision Process) is a discrete-time stochastic control process.
Its transition dynamics are expressed as p(next state | current state, action)
Formulating RL problems through MDP
Should satisfy the Markov property and the stationarity property
The kernel of an MDP describes the environment’s behavior
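A minimal sketch of an MDP kernel, assuming a toy 2-state, 2-action environment (all states, actions, and probabilities below are hypothetical): the kernel maps each (state, action) pair to a distribution over next states, depending only on the current state and action, not on the history.

```python
import random

# Hypothetical kernel: kernel[(state, action)] -> {next_state: probability}.
kernel = {
    (0, 'stay'): {0: 0.9, 1: 0.1},
    (0, 'move'): {0: 0.2, 1: 0.8},
    (1, 'stay'): {1: 0.95, 0: 0.05},
    (1, 'move'): {1: 0.3, 0: 0.7},
}

def step(state, action, rng=random):
    """Sample next_state ~ p(. | state, action) from the kernel."""
    dist = kernel[(state, action)]
    states = list(dist.keys())
    probs = list(dist.values())
    return rng.choices(states, weights=probs, k=1)[0]

next_state = step(0, 'move')
```

Stationarity here means the same `kernel` is used at every time step; the Markov property is reflected in `step` taking only the current state and action as inputs.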