Kalman Filter explained with derivation
The Kalman Filter (KF) is a recursive state estimator widely used in sensor/data fusion. It is the optimal Bayesian filter for linear Gaussian systems, in the sense that it minimizes the covariance of the estimation error, and it is also known as the Gaussian Filter. KF does not apply to discrete or hybrid state spaces. This post builds intuition for the Kalman Filter with simple examples and presents its mathematical derivation from two different perspectives.
Modeling Robot State Estimation
Suppose we want to estimate the state of a robot $\langle p, v \rangle$, where $p = [p_x, p_y, p_z]^T$ is position and $v = [v_x, v_y, v_z]^T$ is velocity. Given the state at time $k-1$, $x_{k-1}=(p_{k-1},v_{k-1})$, we can predict the state at time $k$ if the acceleration $a_k$ is known: $$ p_k = p_{k-1}+v_{k-1} \Delta t + \frac{1}{2}a_k (\Delta t)^2 \\ v_k = v_{k-1}+a_k \Delta t $$ In matrix form: $$ x_k = Ax_{k-1}+Ba_k $$ where $A$ is the state transition matrix and $B$ is the control input matrix, written here in block form with each entry standing for a $3\times 3$ block: $$A = \begin{bmatrix}1 & \Delta t \\ 0 & 1 \end{bmatrix}, \quad B = \begin{bmatrix} \frac{1}{2}(\Delta t)^2 \\ \Delta t \end{bmatrix} $$ This is the ideal case. In reality the robot is disturbed by so-called process noise, which models the uncertainty introduced by the state transition. We assume the process noise $\omega_k$ follows a zero-mean Gaussian distribution (it has to be Gaussian in KF), $\omega_k \sim N(0,Q_k)$, where $Q_k$ is its covariance matrix. A second source of information about the state is an observation $z_k$ from a sensor such as GPS, corrupted by Gaussian measurement noise $v_k \sim N(0, R_k)$. The state space model is: $$ \begin{aligned} &x_k = Ax_{k-1}+Ba_k+\omega_k \\ &z_k = Hx_k+v_k \end{aligned} $$ where $H$ is the measurement matrix that maps the state space into the observation space.
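To make the model concrete, here is a minimal NumPy sketch of this state-space model. The matrix names mirror the equations above; the time step, noise covariances, and acceleration values are illustrative assumptions, not values from the text:

```python
import numpy as np

dt = 0.1                      # time step (illustrative)
I3 = np.eye(3)

# Block-form matrices for the 6-D state x = [p, v]
A = np.block([[I3, dt * I3],
              [np.zeros((3, 3)), I3]])        # state transition
B = np.vstack([0.5 * dt**2 * I3, dt * I3])    # control input
H = np.hstack([I3, np.zeros((3, 3))])         # observe position only

Q = 0.01 * np.eye(6)          # process noise covariance (assumed)
R = 0.25 * np.eye(3)          # measurement noise covariance (assumed)

rng = np.random.default_rng(0)

def step(x, a):
    """One step of the state model: x_k = A x_{k-1} + B a_k + w_k."""
    w = rng.multivariate_normal(np.zeros(6), Q)
    return A @ x + B @ a + w

def measure(x):
    """Observation model: z_k = H x_k + v_k."""
    v = rng.multivariate_normal(np.zeros(3), R)
    return H @ x + v

x = np.zeros(6)                # start at rest at the origin
a = np.array([1.0, 0.0, 0.0])  # constant acceleration along x
for _ in range(5):
    x = step(x, a)
    z = measure(x)             # noisy position reading for the filter
```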
Mathematical Derivation
The Kalman Filter is the optimal filter for a linear Gaussian system, i.e. a linear system in which all noise terms (and hence all state beliefs) are Gaussian distributed. KF can be derived in many ways; this section presents two derivations to help build a fuller understanding.
Linear Optimal Estimation
How can we estimate the real value of $x$ if we observe $x=x_1$ from source 1 and $x=x_2$ from source 2? Theoretically we can never recover the exact value, but under suitable assumptions we can approach it, i.e. minimize the error between the estimate and the real value, by combining information from the different sources. One option is a weighted sum of $x_1$ and $x_2$. The Kalman Filter corrects the state prediction with the observed sensor value in exactly this weighted-sum fashion: $$ \tilde{x}_k = \tilde{x}^-_k + K(z_k-H\tilde{x}^-_k) $$ where $\tilde{x}_k$ is the optimal estimate, $\tilde{x}^-_k$ is the predicted state (source 1) and $z_k$ is the observation (source 2). $K$ is called the Kalman gain. To determine $K$, substitute the observation model $z_k = Hx_k+v_k$: $$ \begin{aligned} \tilde{x}_k = &\tilde{x}^-_k + K(Hx_k+v_k-H\tilde{x}^-_k) \\ = &\tilde{x}^-_k + KHx_k+Kv_k-KH\tilde{x}^-_k \end{aligned} $$ Subtract $x_k$ from both sides: $$ \tilde{x}_k-x_k = \tilde{x}^-_k-x_k + KH(x_k-\tilde{x}^-_k)+Kv_k $$ Define the estimation and prediction errors and simplify: $$ e_k=x_k-\tilde{x}_k\\ e^-_k=x_k-\tilde{x}^-_k\\ e_k=(I-KH)e^-_k-Kv_k $$ For the estimate $\tilde{x}_k$ to be optimal, we must find the $K$ that minimizes $||e_k||^2$, which is equivalent to minimizing the trace of the error covariance matrix, $tr(E[e_ke^T_k])$. Set: $$ P_k=E[e_ke^T_k]\\ P^-_k=E[e^-_ke^{-T}_k] $$ Substituting $e_k$ (the cross terms vanish because $v_k$ is zero-mean and independent of $e^-_k$): $$ \begin{aligned} P_k=&(I-KH)P^-_k(I-KH)^T+KRK^T\\ =&P^-_k -KHP^-_k-P^-_kH^TK^T+K(HP^-_kH^T+R)K^T \end{aligned} $$ To find the minimum of $tr(P_k)$, set its first derivative with respect to $K$ to zero: $$ \frac{\partial\, tr(P_k)}{\partial K}=-2(P^-_kH^T)+2K(HP^-_kH^T+R)=0\\ K=P^-_kH^T(HP^-_kH^T+R)^{-1} $$ Substituting $K(HP^-_kH^T+R)=P^-_kH^T$ back into the expression for $P_k$: $$ P_k=(I-KH)P^-_k $$ We still need to obtain $P^-_k$. Note that the process noise is independent of the robot state: $$ \begin{aligned} e^-_k=&x_k-\tilde{x}^-_k\\ =&Ax_{k-1}+Ba_k+\omega_k-A \tilde{x}_{k-1}-Ba_k\\ =&A(x_{k-1}-\tilde{x}_{k-1})+\omega_k\\ =&Ae_{k-1}+\omega_k \end{aligned}\\ \begin{aligned} P^-_k=&E[e^-_ke^{-T}_k]\\ =&E[(Ae_{k-1}+\omega_k)(Ae_{k-1}+\omega_k)^T]\\ =&E[(Ae_{k-1})(Ae_{k-1})^T]+E[\omega_k\omega_k^T]\\ =&AP_{k-1}A^T+Q \end{aligned} $$ Finally, we can run the Kalman Filter:
Step 1. Prediction $$ \tilde{x}^-_k = A\tilde{x}_{k-1}+Ba_k\\ P^-_k = AP_{k-1}A^T+Q $$
Step 2. Measurement update $$ K_k=P^-_kH^T(HP^-_kH^T+R)^{-1}\\ \tilde{x}_k = \tilde{x}^-_k + K_k(z_k-H\tilde{x}^-_k)\\ P_k=(I-K_kH)P^-_k $$ From the way $K$ was chosen (least squared error), the Kalman Filter is the optimal linear filter for a linear system whose noises are independently distributed with zero mean and valid covariances. However, if the system is not linear Gaussian, there are non-linear filters that perform better than KF, which is another appealing topic. In other words, KF is the optimal estimator for a linear Gaussian system, as the proof below shows.
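The two steps translate almost line-for-line into code. Below is a minimal sketch of one predict/update cycle; the function names are my own choices, and the matrices `A`, `B`, `H`, `Q`, `R` are assumed to come from the model sketch above:

```python
import numpy as np

def kf_predict(x_est, P, A, B, Q, a):
    """Step 1: propagate the estimate and its covariance through the model."""
    x_pred = A @ x_est + B @ a           # x^-_k = A x_{k-1} + B a_k
    P_pred = A @ P @ A.T + Q             # P^-_k = A P_{k-1} A^T + Q
    return x_pred, P_pred

def kf_update(x_pred, P_pred, H, R, z):
    """Step 2: correct the prediction with the measurement z_k."""
    S = H @ P_pred @ H.T + R             # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)  # Kalman gain K_k
    x_est = x_pred + K @ (z - H @ x_pred)
    P = (np.eye(P_pred.shape[0]) - K @ H) @ P_pred
    return x_est, P
```

Running `kf_predict` and `kf_update` in a loop against the noisy measurements `z` produced by the earlier simulation sketch should recover the position track with shrinking covariance.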
Maximum A Posteriori Estimation
Under the Bayesian interpretation, KF is a Maximum A Posteriori (MAP) estimator for a linear Gaussian system, following the same two-step scheme. The goal is to find the $x_k$ that maximizes the posterior distribution: $$ \tilde{x}_k={argmax}_{x_k}\ p(x_k|z_k,a_k) $$
Step 1. Prediction: predict $\bar{bel}(x_k)=P(x_k|z_{k-1},a_k)$, the belief of $x_k$ after receiving the control input $a_k$ but before the measurement $z_k$, using the state model, given $bel(x_{k-1})=P(x_{k-1}|z_{k-1},a_{k-1})$, the belief of $x_{k-1}$ (a belief reflects the robot's internal knowledge about the state of the environment): $$ \begin{aligned} \bar{bel}(x_k)=&P(x_k|z_{k-1},a_k)\\ =&P(x_k|z_{k-1},a_k,a_{k-1})\\ =& \frac{P(x_k,z_{k-1}|a_k,a_{k-1})}{P(z_{k-1}|a_{k-1})}\\ =& \frac{\int p(x_k,x_{k-1},z_{k-1}|a_k,a_{k-1})dx_{k-1}}{P(z_{k-1}|a_{k-1})}\\ =& \frac{\int p(x_k|x_{k-1},z_{k-1},a_k,a_{k-1})p(x_{k-1}|z_{k-1},a_{k-1})P(z_{k-1}|a_{k-1})dx_{k-1}}{P(z_{k-1}|a_{k-1})}\\ =& \int p(x_k|x_{k-1},a_k)bel(x_{k-1})dx_{k-1}\\ \end{aligned} $$ Note that $x_k$ depends only on $x_{k-1}$ and $a_k$ (Markov assumption), and $z_{k-1}$ does not depend on $a_{k-1}$ once $x_{k-1}$ is known (see the observation model).
Both densities in the integrand are Gaussian: $$ bel(x_{k-1}) \sim N(x_{k-1};\mu_{k-1},\Sigma_{k-1})\\ p(x_k|x_{k-1},a_k) \sim N(x_k;Ax_{k-1}+Ba_k,Q_k) $$ and the convolution of two Gaussian PDFs is again a Gaussian PDF, so: $$ \bar{bel}(x_{k}) \sim N(x_k;\bar{\mu}_k,\bar{\Sigma}_k)\\ \bar{\mu}_k=A\mu_{k-1}+Ba_k\\ \bar{\Sigma}_k = A\Sigma_{k-1}A^T+Q_k $$ Step 2. Correction / Update: update the prediction with the observation at time $k$ to obtain the estimate $P(x_k|z_k,a_k)$: $$ \begin{aligned} bel(x_k) =&P(x_k|z_k,a_k)\\ =&P(x_k|z_k,z_{k-1},a_k)\\ =& \frac{P(x_k,z_k|z_{k-1},a_k)}{P(z_k|z_{k-1},a_k)}\\ =& \frac{P(z_k|x_k,z_{k-1},a_k)P(x_k|z_{k-1},a_k)}{P(z_k|z_{k-1},a_k)}\\ =& \frac{P(z_k|x_k)\bar{bel}(x_k)}{P(z_k|z_{k-1},a_k)}\\ =& \eta P(z_k|x_k)\bar{bel}(x_k) \end{aligned} $$ assuming that $z_k$ is independent of previous observations $z_{1:k-1}$ and conditionally independent of $a_k$ given $x_k$. The normalization factor $\eta$ can be calculated: $$ \begin{aligned} P(z_k|z_{k-1},a_k)=& \int p(z_k,x_k|z_{k-1},a_k)dx_k\\ =& \int p(z_k|x_k,z_{k-1},a_k)p(x_k|z_{k-1},a_k)dx_k \\ =& \int p(z_k|x_k)p(x_k|z_{k-1},a_k)dx_k\\ =& \int p(z_k|x_k)\bar{bel}(x_k)dx_k \end{aligned} $$ This time the update is a genuine product of two Gaussian PDFs: $$ p(z_k|x_k) \sim N(z_k;Hx_k,R_k)\\ \bar{bel}(x_{k}) \sim N(x_k;\bar{\mu}_k,\bar{\Sigma}_k)\\ bel(x_{k}) =\eta \exp\{-J_k\} $$ where $$ J_k=\frac{1}{2}(z_k-Hx_k)^TR^{-1}_k(z_k-Hx_k)+\frac{1}{2}(x_k-\bar{\mu}_k)^T\bar{\Sigma}^{-1}_k(x_k-\bar{\mu}_k) $$ $J_k$ is quadratic in $x_k$, which means $bel(x_k)$ is a Gaussian. The mean of $bel(x_k)$ is the point where the quadratic $J_k$ attains its minimum, which is exactly where $bel(x_k)$ reaches its maximum. After some matrix algebra and simplification, we obtain: $$ bel(x_{k}) \sim N(x_k;\mu_k,\Sigma_k)\\ \mu_k=\bar{\mu}_k+K_k(z_k-H\bar{\mu}_k)\\ \Sigma_k = (I-K_kH)\bar{\Sigma}_k\\ K_k=\bar{\Sigma}_kH^T(R_k+H\bar{\Sigma}_kH^T)^{-1} $$ We can use MAP or full Bayesian inference; both yield the same parameters, since the full Bayesian posterior is exactly Gaussian in a linear Gaussian system.
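As a quick sanity check that the product-of-Gaussians update and the Kalman-gain form agree, here is a scalar (1-D) example; all numeric values are arbitrary choices for illustration:

```python
import numpy as np

# Scalar linear Gaussian case: prior bel^-(x) = N(mu_bar, sigma_bar2),
# likelihood p(z|x) = N(z; h*x, r2), with an observed z.
mu_bar, sigma_bar2 = 2.0, 0.5   # predicted mean and variance (arbitrary)
h, r2, z = 1.0, 0.2, 2.6        # measurement model and observation (arbitrary)

# Kalman-gain form of the update
K = sigma_bar2 * h / (h * sigma_bar2 * h + r2)
mu_kf = mu_bar + K * (z - h * mu_bar)
var_kf = (1 - K * h) * sigma_bar2

# Direct product of the two Gaussian densities in x (completing the
# square in J_k): precisions add, precision-weighted means add.
prec = 1 / sigma_bar2 + h**2 / r2
mu_bayes = (mu_bar / sigma_bar2 + h * z / r2) / prec
var_bayes = 1 / prec

assert np.isclose(mu_kf, mu_bayes) and np.isclose(var_kf, var_bayes)
print(mu_kf, var_kf)   # both routes give the same posterior N(mu, var)
```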