15/01/2026 12:04 PM
Markov Chains
The distribution of the current state given all past states equals the distribution given only the last state: the current state is conditionally independent of all earlier states given the previous one.
We can further have time-homogeneous chains, where the distribution of the current state given the last one does not depend on the time $k$.
We have
$$P_{ij}=P(X_k=j | X_{k-1} = i)$$
as the transition probability matrix.
For a time-homogeneous chain, if we have a row vector of probabilities $p(k)=(\dots, P(X_k=i),\dots)$ we can right-multiply by $P$ to get $p(k+1) = p(k)P$.
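A quick numerical sketch of this (the 3-state matrix below is a made-up example, not from lecture):

```python
import numpy as np

P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])   # rows sum to 1: P[i, j] = P(X_k = j | X_{k-1} = i)

p = np.array([1.0, 0.0, 0.0])     # start in state 0 with probability 1
for k in range(50):
    p = p @ P                     # right-multiply the row vector by P
print(p)                          # distribution after 50 steps; it has settled down for this example
```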
This also gives us the stationary distribution: $\pi = \pi P$.
We can see that this is an eigenvector problem: we are looking for left eigenvectors of $P$ with eigenvalue 1, normalized to sum to 1. Since $P$ is a stochastic matrix, 1 is always an eigenvalue and no eigenvalue has magnitude greater than 1.
I think this also implies that any convex combination of stationary distributions is stationary (a general linear combination would need renormalizing to stay a probability distribution).
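A sketch of the eigenvector view, reusing the same made-up example matrix:

```python
import numpy as np

P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])          # same example matrix as above

eigvals, eigvecs = np.linalg.eig(P.T)    # left eigenvectors of P = right eigenvectors of P.T
idx = np.argmin(np.abs(eigvals - 1.0))   # pick the eigenvalue closest to 1
pi = np.real(eigvecs[:, idx])
pi = pi / pi.sum()                       # rescale to a probability distribution
print(pi, np.allclose(pi @ P, pi))       # pi satisfies pi = pi P
```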
Reachability:
Take a pair of states $i$, $j$. $j$ is reachable from $i$ if there exists a finite time $T$ such that $P(X_T = j \mid X_0 = i) > 0$. In other words, we can get from $i$ to $j$ in $T$ steps.
If we view the Markov chain as a graph, then $j$ is reachable from $i$ whenever there is a directed path from $i$ to $j$.
Irreducible:
For every pair of states $i$, $j$, $j$ is reachable from $i$.
Thm- Irreducible (finite-state) => unique stationary distribution.
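A sketch of checking irreducibility as a graph-reachability problem (the function names and the reducible example matrix are my own):

```python
import numpy as np
from collections import deque

def reachable_from(P, i):
    """States reachable from i, following edges where P[i, j] > 0."""
    seen, queue = {i}, deque([i])
    while queue:
        s = queue.popleft()
        for j in np.nonzero(P[s] > 0)[0]:
            if j not in seen:
                seen.add(int(j))
                queue.append(int(j))
    return seen

def is_irreducible(P):
    n = P.shape[0]
    return all(len(reachable_from(P, i)) == n for i in range(n))

# A chain where state 2 is absorbing: you can never leave it, so not irreducible.
P_reducible = np.array([[0.5, 0.3, 0.2],
                        [0.1, 0.6, 0.3],
                        [0.0, 0.0, 1.0]])
print(is_irreducible(P_reducible))  # False
```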
Now, if we start from a given state, does the distribution converge to the stationary distribution as $k \rightarrow \infty$? Not always. Think about a simple 2-state chain where you always swap states: it is irreducible, but you never settle into the stationary distribution.
Period: State $i$ has a period of $T$ if the MC can only return to $i$ at times $k$ that are multiples of $T$ (the period is the largest such $T$).
In our above example, both have a period of 2.
Aperiodic- All states have period 1.
So our example is not aperiodic.
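A quick numerical check of the swap example (my own encoding of it as a matrix):

```python
import numpy as np

swap = np.array([[0.0, 1.0],
                 [1.0, 0.0]])   # always jump to the other state
p = np.array([1.0, 0.0])        # start in state 0
for k in range(6):
    print(k, p)                 # alternates [1, 0], [0, 1], [1, 0], ...
    p = p @ swap
# The stationary distribution [0.5, 0.5] exists, but p(k) never converges to it.
```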
Thm- Every finite-state, aperiodic, irreducible MC converges to its unique stationary distribution as $k \rightarrow \infty$.
Do these go in the other direction? Can we know that we don't converge if we lack some property?
Any chain in which every state has a self-loop is aperiodic.
For an irreducible chain, as long as any one state has a self-loop the entire chain becomes aperiodic, since you can always loop for a step at that state to break the "multiple of $T$" condition.
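Sketch of the swap chain with a self-loop mixed in (the 0.1 self-loop probability is an arbitrary choice); the distribution now converges:

```python
import numpy as np

lazy = np.array([[0.1, 0.9],
                 [0.9, 0.1]])   # swap chain plus a 0.1 self-loop at each state
p = np.array([1.0, 0.0])
for k in range(30):
    p = p @ lazy
print(p)                        # approximately [0.5, 0.5]
```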
We were saying something about common divisors: presumably that, formally, the period of a state is the greatest common divisor of all possible return times.
Controlled Markov chain (Markov decision process):
Now we introduce the idea of an action $u$. Now $P$ depends on the action: $P_{ij}(u) = P(X_k = j \mid X_{k-1} = i, U_{k-1} = u)$.
Why do we have a discount factor? Mostly for convenience: it is difficult to define the value in a way that converges if we don't discount.
We would like to solve the undiscounted problem, but that doesn't work directly, so we just set the discount factor very close to 1.
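One way to see what the discount buys us (a made-up 2-state example for a fixed stationary policy, assuming the usual discounted-value recursion $V = r_\mu + \gamma P_\mu V$):

```python
import numpy as np

gamma = 0.95
P_mu = np.array([[0.9, 0.1],   # transitions under the fixed policy (made up)
                 [0.4, 0.6]])
r_mu = np.array([1.0, 0.0])    # per-state reward under the policy (made up)

# V = r_mu + gamma * P_mu @ V  =>  (I - gamma * P_mu) V = r_mu
V = np.linalg.solve(np.eye(2) - gamma * P_mu, r_mu)
print(V)
# With gamma = 1 the matrix I - P_mu is singular (P_mu has eigenvalue 1),
# so the system has no unique solution; the discount is what makes it well posed.
```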
Markov assumption: In a general stochastic control problem, your policy needs to depend on the entire history of states and actions. But we use a simplifying assumption to make problems solvable.
We use a Markov policy, so $u_k = \mu_k(x_k)$ instead of $u_k = \mu_k(x_k, u_{k-1}, x_{k-1}, \dots)$.
We can even have a stationary policy, where the policy is time-homogeneous: $u_k = \mu(x_k)$ with the same $\mu$ at every time.
We assume that one of the optimal policies is stationary for the MDPs we consider.
Why? Is it a theorem that there is a stationary policy as good as any time-varying policy?
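Not a proof, but a sketch of what that looks like: value iteration on a made-up 2-state, 2-action discounted MDP converges to $V^*$, and the greedy policy read off from $V^*$ is a single state-to-action map, i.e. stationary.

```python
import numpy as np

gamma = 0.95
# P[a] is the transition matrix under action a; R[a][i] is the reward for
# taking action a in state i. All numbers are illustrative only.
P = {0: np.array([[0.9, 0.1], [0.4, 0.6]]),
     1: np.array([[0.2, 0.8], [0.5, 0.5]])}
R = {0: np.array([1.0, 0.0]),
     1: np.array([0.0, 2.0])}

V = np.zeros(2)
for _ in range(2000):
    Q = np.stack([R[a] + gamma * (P[a] @ V) for a in (0, 1)])  # shape (action, state)
    V_new = Q.max(axis=0)                                      # Bellman optimality update
    if np.max(np.abs(V_new - V)) < 1e-10:
        V = V_new
        break
    V = V_new

policy = Q.argmax(axis=0)   # one action per state: a stationary policy
print(V, policy)
```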