15/01/2026 12:04 PM
Markov Chains
The distribution of the current state given all past states equals the distribution given only the last state: the current state is conditionally independent of all earlier states given the previous one.
We can further have time-homogeneous chains, where the distribution of the current state given the last one does not depend on the time $k$.
We have
$$P_{ij}=P(X_k=j | X_{k-1} = i)$$
as the transition probability matrix.
For a time-homogeneous chain, if we have a row vector of probabilities $p(k)=(\dots, P(X_k=i),\dots)$ we can right-multiply by $P$ to get $p(k+1) = p(k)P$.
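A quick numerical sketch of this (the 3-state matrix below is a made-up example, not from lecture):

```python
import numpy as np

P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])   # rows sum to 1: P[i, j] = P(X_k = j | X_{k-1} = i)

p = np.array([1.0, 0.0, 0.0])     # start in state 0 with probability 1
for k in range(50):
    p = p @ P                     # right-multiply the row vector by P
print(p)                          # distribution after 50 steps; it has settled down for this example
```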
This also gives us the stationary distribution: $\pi = \pi P$.
We can see that this is an eigenvector problem: we are looking for left eigenvectors of $P$ with eigenvalue 1, normalized to sum to 1. Since $P$ is a stochastic matrix, 1 is always an eigenvalue and no eigenvalue has magnitude greater than 1.
I think this also implies that any convex combination of stationary distributions is stationary (a general linear combination would need renormalizing to stay a probability distribution).
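A sketch of the eigenvector view, reusing the same made-up example matrix:

```python
import numpy as np

P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])          # same example matrix as above

eigvals, eigvecs = np.linalg.eig(P.T)    # left eigenvectors of P = right eigenvectors of P.T
idx = np.argmin(np.abs(eigvals - 1.0))   # pick the eigenvalue closest to 1
pi = np.real(eigvecs[:, idx])
pi = pi / pi.sum()                       # rescale to a probability distribution
print(pi, np.allclose(pi @ P, pi))       # pi satisfies pi = pi P
```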
Reachability:
Take a pair of states $i$, $j$. $j$ is reachable from $i$ if there exists a finite time $T$ such that $P(X_T = j \mid X_0 = i) > 0$. In other words, we can get from $i$ to $j$ in $T$ steps.
If we view the Markov chain as a graph, then $j$ is reachable from $i$ whenever there is a directed path from $i$ to $j$.
Irreducible:
For every pair of states $i$, $j$, $j$ is reachable from $i$.
Thm- Irreducible (finite-state) => unique stationary distribution.
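A sketch of checking irreducibility as a graph-reachability problem (the function names and the reducible example matrix are my own):

```python
import numpy as np
from collections import deque

def reachable_from(P, i):
    """States reachable from i, following edges where P[i, j] > 0."""
    seen, queue = {i}, deque([i])
    while queue:
        s = queue.popleft()
        for j in np.nonzero(P[s] > 0)[0]:
            if j not in seen:
                seen.add(int(j))
                queue.append(int(j))
    return seen

def is_irreducible(P):
    n = P.shape[0]
    return all(len(reachable_from(P, i)) == n for i in range(n))

# A chain where state 2 is absorbing: you can never leave it, so not irreducible.
P_reducible = np.array([[0.5, 0.3, 0.2],
                        [0.1, 0.6, 0.3],
                        [0.0, 0.0, 1.0]])
print(is_irreducible(P_reducible))  # False
```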
Now, if we start from a given state, does the distribution converge to the stationary distribution as $k \rightarrow \infty$? Not always. Think about a simple 2-state chain where you always swap states: it is irreducible, but you never settle into the stationary distribution.
Period: State $i$ has a period of $T$ if the MC can only return to $i$ at times $k$ that are multiples of $T$ (the period is the largest such $T$).
In our above example, both have a period of 2.
Aperiodic- All states have period 1.
So our example is not aperiodic.
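A quick numerical check of the swap example (my own encoding of it as a matrix):

```python
import numpy as np

swap = np.array([[0.0, 1.0],
                 [1.0, 0.0]])   # always jump to the other state
p = np.array([1.0, 0.0])        # start in state 0
for k in range(6):
    print(k, p)                 # alternates [1, 0], [0, 1], [1, 0], ...
    p = p @ swap
# The stationary distribution [0.5, 0.5] exists, but p(k) never converges to it.
```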
Thm- Every finite-state, aperiodic, irreducible MC converges to its unique stationary distribution as $k \rightarrow \infty$.
Do these go in the other direction? Can we know that we don't converge if we lack some property?
Any chain in which every state has a self-loop is aperiodic.
For an irreducible chain, as long as any one state has a self-loop the entire chain becomes aperiodic, since you can always loop for a step at that state to break the "multiple of $T$" condition.
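Sketch of the swap chain with a self-loop mixed in (the 0.1 self-loop probability is an arbitrary choice); the distribution now converges:

```python
import numpy as np

lazy = np.array([[0.1, 0.9],
                 [0.9, 0.1]])   # swap chain plus a 0.1 self-loop at each state
p = np.array([1.0, 0.0])
for k in range(30):
    p = p @ lazy
print(p)                        # approximately [0.5, 0.5]
```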
We were saying something about common divisors: presumably that, formally, the period of a state is the greatest common divisor of all possible return times.
Controlled Markov chain (Markov decision process):
Now we introduce the idea of an action $u$. Now $P$ depends on the action: $P_{ij}(u) = P(X_k = j \mid X_{k-1} = i, U_{k-1} = u)$.
Why do we have a discount factor? Mostly for convenience: it is difficult to define the value in a way that converges if we don't discount.
We would like to solve the undiscounted problem, but that doesn't work directly, so we just set the discount factor very close to 1.
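One way to see what the discount buys us (a made-up 2-state example for a fixed stationary policy, assuming the usual discounted-value recursion $V = r_\mu + \gamma P_\mu V$):

```python
import numpy as np

gamma = 0.95
P_mu = np.array([[0.9, 0.1],   # transitions under the fixed policy (made up)
                 [0.4, 0.6]])
r_mu = np.array([1.0, 0.0])    # per-state reward under the policy (made up)

# V = r_mu + gamma * P_mu @ V  =>  (I - gamma * P_mu) V = r_mu
V = np.linalg.solve(np.eye(2) - gamma * P_mu, r_mu)
print(V)
# With gamma = 1 the matrix I - P_mu is singular (P_mu has eigenvalue 1),
# so the system has no unique solution; the discount is what makes it well posed.
```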
Markov assumption: In a general stochastic control problem, your policy needs to depend on the entire history of states and actions. But we use a simplifying assumption to make problems solvable.
We use a Markov policy, so $u_k = \mu_k(x_k)$ instead of $u_k = \mu_k(x_k, u_{k-1}, x_{k-1}, \dots)$.
We can even have a stationary policy, where the policy is time-homogeneous: $u_k = \mu(x_k)$ with the same $\mu$ at every time.
We assume that one of the optimal policies is stationary for the MDPs we consider.
Why? Is it a theorem that there is a stationary policy as good as any time-varying policy?
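Not a proof, but a sketch of what that looks like: value iteration on a made-up 2-state, 2-action discounted MDP converges to $V^*$, and the greedy policy read off from $V^*$ is a single state-to-action map, i.e. stationary.

```python
import numpy as np

gamma = 0.95
# P[a] is the transition matrix under action a; R[a][i] is the reward for
# taking action a in state i. All numbers are illustrative only.
P = {0: np.array([[0.9, 0.1], [0.4, 0.6]]),
     1: np.array([[0.2, 0.8], [0.5, 0.5]])}
R = {0: np.array([1.0, 0.0]),
     1: np.array([0.0, 2.0])}

V = np.zeros(2)
for _ in range(2000):
    Q = np.stack([R[a] + gamma * (P[a] @ V) for a in (0, 1)])  # shape (action, state)
    V_new = Q.max(axis=0)                                      # Bellman optimality update
    if np.max(np.abs(V_new - V)) < 1e-10:
        V = V_new
        break
    V = V_new

policy = Q.argmax(axis=0)   # one action per state: a stationary policy
print(V, policy)
```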