Kalman Filter explained with derivation

6 minute read

Published:

The Kalman Filter (KF) is a recursive state estimator widely used in sensor/data fusion. It is the optimal Bayesian filter for a linear Gaussian system, since it minimizes the estimated error covariance; it is also known as the Gaussian Filter. The KF is not applicable to discrete or hybrid state spaces. This blog explains the Kalman Filter with a simple example and presents its mathematical derivation from two different perspectives.

Modeling Robot State Estimation

Now we want to estimate the state of a robot $\langle p, v \rangle$, where $p = [p_x, p_y, p_z]^T$ is the position and $v = [v_x, v_y, v_z]^T$ is the velocity. Given the state at time $k-1$, $x_{k-1}=(p_{k-1},v_{k-1})$, we can predict the state at time $k$ if the acceleration $a_k$ is given:
$$ p_k = p_{k-1}+v_{k-1} \Delta t + \frac{1}{2}a_k (\Delta t)^2 \\ v_k = v_{k-1}+a_k \Delta t $$
In matrix form:
$$ x_k = Ax_{k-1}+Ba_k $$
where $A$ is the state transition matrix and $B$ is the control input matrix (written in block form, each entry scaling a $3\times 3$ identity):
$$A = \begin{bmatrix}1 & \Delta t \\ 0 & 1 \end{bmatrix}, \quad B = \begin{bmatrix} \frac{1}{2}(\Delta t)^2 \\ \Delta t \end{bmatrix} $$
This is the ideal case. In reality the robot is disturbed by so-called process noise, which models the uncertainty introduced by the state transition. Usually we assume the process noise $\omega_k$ follows a Gaussian distribution (it has to be Gaussian in the KF) with zero mean, $\omega_k \sim N(0,Q_k)$, where $Q_k$ is its covariance matrix. Another observation of the state, $z_k$, comes from a sensor such as GPS, with measurement noise $v_k \sim N(0, R_k)$. The state space model is:
$$ \begin{aligned} &x_k = Ax_{k-1}+Ba_k+\omega_k \\ &z_k = Hx_k+v_k \end{aligned} $$
where $H$ is the measurement matrix that maps the state space into the observation space.
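As a concrete illustration (not part of the original post), here is a minimal NumPy sketch of this state space model for the 6-D state $x = [p; v]$. The time step, the noise covariances, and the function name `simulate_step` are assumed values chosen for demonstration only.

```python
import numpy as np

dt = 0.1  # time step in seconds (illustrative value)
I3 = np.eye(3)

# Block form of the model matrices for the 6-D state x = [p; v]
A = np.block([[I3, dt * I3],
              [np.zeros((3, 3)), I3]])        # state transition matrix
B = np.vstack([0.5 * dt**2 * I3, dt * I3])    # control input matrix (acceleration)
H = np.hstack([I3, np.zeros((3, 3))])         # GPS-like sensor observes position only

Q = 1e-3 * np.eye(6)   # process noise covariance (assumed)
R = 1e-2 * np.eye(3)   # measurement noise covariance (assumed)

rng = np.random.default_rng(0)

def simulate_step(x_prev, a_k):
    """Propagate the true state one step and generate a noisy measurement z_k."""
    w_k = rng.multivariate_normal(np.zeros(6), Q)   # process noise ~ N(0, Q)
    v_k = rng.multivariate_normal(np.zeros(3), R)   # measurement noise ~ N(0, R)
    x_k = A @ x_prev + B @ a_k + w_k
    z_k = H @ x_k + v_k
    return x_k, z_k
```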


Mathematical Derivation

The Kalman Filter is the optimal filter for a linear Gaussian system, i.e. a linear system in which the noise terms (and hence the state estimates) are Gaussian distributed. The KF can be derived in many ways; this section introduces two derivations to help better understand it.

Linear Optimal Estimation

How can we guess the real value of $x$ if we observe $x=x_1$ from source 1 and $x=x_2$ from source 2? Theoretically we can never recover the real value, but we can approach it, i.e. minimize the error between the estimate and the real value, by combining the information from the different sources under certain assumptions. One option for estimating $x$ is a weighted sum of $x_1$ and $x_2$. The Kalman Filter corrects the predicted state with the value observed by the sensors to approach the real value, which is similar to a weighted sum:
$$ \tilde{x}_k = \tilde{x}^-_k + K(z_k-H\tilde{x}^-_k) $$
where $\tilde{x}_k$ is the optimal estimate, $\tilde{x}^-_k$ is the predicted state (source 1) and $z_k$ is the observation (source 2). $K$ is called the Kalman gain. To determine the value of $K$, substitute the observation model:
$$ \begin{aligned} \tilde{x}_k = &\tilde{x}^-_k + K(Hx_k+v_k-H\tilde{x}^-_k) \\ = &\tilde{x}^-_k + KHx_k+Kv_k-KH\tilde{x}^-_k \end{aligned} $$
Subtract $x_k$ from both sides:
$$ \tilde{x}_k-x_k = \tilde{x}^-_k-x_k + KH(x_k-\tilde{x}^-_k)+Kv_k $$
Define the estimation errors and simplify:
$$ e_k=x_k-\tilde{x}_k\\ e^-_k=x_k-\tilde{x}^-_k\\ e_k=(I-KH)e^-_k-Kv_k $$
To ensure the estimate $\tilde{x}_k$ is optimal, we have to find the $K$ that minimizes the expected squared error $E[\|e_k\|^2]$, which equals the trace of the error covariance matrix $tr(E[e_ke^T_k])$. Set:
$$ P_k=E[e_ke^T_k]\\ P^-_k=E[e^-_ke^{-T}_k] $$
Substitute $e_k$ (the cross terms vanish because $v_k$ is independent of $e^-_k$):
$$ \begin{aligned} P_k=&(I-KH)P^-_k(I-KH)^T+KRK^T\\ =&P^-_k -KHP^-_k-P^-_kH^TK^T+K(HP^-_kH^T+R)K^T \end{aligned} $$
To find the minimum of $tr(P_k)$, set its first derivative with respect to $K$ to 0:
$$ \frac{\partial\, tr(P_k)}{\partial K}=-2P^-_kH^T+2K(HP^-_kH^T+R)=0\\ K=P^-_kH^T(HP^-_kH^T+R)^{-1} $$
Since $K(HP^-_kH^T+R)=P^-_kH^T$, the last term of $P_k$ equals $P^-_kH^TK^T$ and cancels with $-P^-_kH^TK^T$, leaving:
$$ P_k=(I-K_kH)P^-_k $$
We also need to obtain $P^-_k$. Note that the process noise is independent of the robot state:
$$ \begin{aligned} e^-_k=&x_k-\tilde{x}^-_k\\ =&Ax_{k-1}+Ba_k+\omega_k-A \tilde{x}_{k-1}-Ba_k\\ =&A(x_{k-1}-\tilde{x}_{k-1})+\omega_k\\ =&Ae_{k-1}+\omega_k \end{aligned}\\ \begin{aligned} P^-_k=&E[e^-_ke^{-T}_k]\\ =&E[(Ae_{k-1}+\omega_k)(Ae_{k-1}+\omega_k)^T]\\ =&E[(Ae_{k-1})(Ae_{k-1})^T]+E[\omega_k\omega_k^T]\\ =&AP_{k-1}A^T+Q \end{aligned}\\ $$
A quick numerical check of the optimality of this gain is sketched below.
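As a sanity check (an addition, not part of the original derivation), the following sketch verifies numerically that the derived gain minimizes $tr(P_k)$ and that the Joseph form collapses to $P_k=(I-K_kH)P^-_k$ at the optimum. The matrices $P^-_k$, $H$ and $R$ are arbitrary illustrative values.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative prior covariance P^-_k, measurement matrix H and noise covariance R (assumed values)
P_prior = np.diag([0.5, 0.5, 0.5, 0.1, 0.1, 0.1])
H = np.hstack([np.eye(3), np.zeros((3, 3))])
R = 0.05 * np.eye(3)

def joseph_cov(K):
    """Posterior covariance (I-KH) P^- (I-KH)^T + K R K^T for an arbitrary gain K."""
    IKH = np.eye(6) - K @ H
    return IKH @ P_prior @ IKH.T + K @ R @ K.T

# Gain from the derivation: K = P^- H^T (H P^- H^T + R)^{-1}
K_opt = P_prior @ H.T @ np.linalg.inv(H @ P_prior @ H.T + R)

# tr(P_k) at the derived gain is never larger than at randomly perturbed gains
tr_opt = np.trace(joseph_cov(K_opt))
for _ in range(5):
    K_rand = K_opt + 0.1 * rng.standard_normal(K_opt.shape)
    assert tr_opt <= np.trace(joseph_cov(K_rand)) + 1e-12

# At the optimum, the Joseph form reduces to the simplified update P_k = (I - K H) P^-
assert np.allclose(joseph_cov(K_opt), (np.eye(6) - K_opt @ H) @ P_prior)
```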
Finally, we can run the Kalman Filter:
Step 1. Prediction $$ \tilde{x}^-_k = A\tilde{x}_{k-1}+Ba_k\\ P^-_k = AP_{k-1}A^T+Q $$
Step 2. Measurement update $$ K_k=P^-_kH^T(HP^-_kH^T+R)^{-1}\\ \tilde{x}_k = \tilde{x}^-_k + K_k(z_k-H\tilde{x}^-_k)\\ P_k=(I-K_kH)P^-_k $$ From the way we find $K$ for the optimal estimate (least squared error), the Kalman Filter is the optimal linear filter for a linear system as long as the noises are independently distributed with zero mean and valid covariances. However, if the system is not linear Gaussian, there are nonlinear filters that perform better than the KF, which is another appealing topic. In other words, the KF is the optimal estimator for a linear Gaussian system, as shown by the derivation below.
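Putting the two steps together, here is a minimal sketch of the recursion, reusing the $A$, $B$, $H$, $Q$, $R$ from the modeling section; the function names and the usage loop are assumptions for illustration.

```python
import numpy as np

def kf_predict(x_est, P, A, B, Q, a_k):
    """Step 1: propagate the estimate and its covariance through the motion model."""
    x_pred = A @ x_est + B @ a_k      # x^-_k = A x_{k-1} + B a_k
    P_pred = A @ P @ A.T + Q          # P^-_k = A P_{k-1} A^T + Q
    return x_pred, P_pred

def kf_update(x_pred, P_pred, H, R, z_k):
    """Step 2: correct the prediction with the measurement z_k."""
    S = H @ P_pred @ H.T + R                        # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)             # Kalman gain K_k
    x_est = x_pred + K @ (z_k - H @ x_pred)         # corrected state estimate
    P = (np.eye(len(x_pred)) - K @ H) @ P_pred      # corrected covariance
    return x_est, P

# Hypothetical usage with the model defined earlier:
# x_est, P = np.zeros(6), np.eye(6)
# for a_k, z_k in zip(accelerations, measurements):
#     x_pred, P_pred = kf_predict(x_est, P, A, B, Q, a_k)
#     x_est, P = kf_update(x_pred, P_pred, H, R, z_k)
```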


Maximum A Posteriori Estimation

Under the Bayesian interpretation, the KF is a Maximum A Posteriori (MAP) estimator in a linear Gaussian space, following the same two-step scheme. The goal is to find the $x_k$ that maximizes the posterior distribution: $\tilde{x}_k={\arg\max}_{x_k}\ p(x_k|z_k,a_k)$.
Step 1. Prediction: predict $\bar{bel}(x_k)=P(x_k|z_{k-1},a_k)$, the belief of $x_k$ after receiving the control input $a_k$ and before the measurement $z_k$, using the state model, given $bel(x_{k-1})=P(x_{k-1}|z_{k-1},a_{k-1})$, the belief of $x_{k-1}$ (a belief reflects the robot's internal knowledge about the state of the environment):
$$ \begin{aligned} \bar{bel}(x_k)=&P(x_k|z_{k-1},a_k)\\ =&P(x_k|z_{k-1},a_k,a_{k-1})\\ =& \frac{P(x_k,z_{k-1}|a_k,a_{k-1})}{P(z_{k-1})}\\ =& \frac{\int p(x_k,x_{k-1},z_{k-1}|a_k,a_{k-1})dx_{k-1}}{P(z_{k-1})}\\ =& \frac{\int p(x_k|x_{k-1},z_{k-1},a_k,a_{k-1})p(x_{k-1}|z_{k-1},a_{k-1})P(z_{k-1})dx_{k-1}}{P(z_{k-1})}\\ =& \int p(x_k|x_{k-1},a_k)bel(x_{k-1})dx_{k-1}\\ \end{aligned} $$
Note that $x_k$ only depends on $x_{k-1}$ and $a_k$ (Markov assumption) and that $z_{k-1}$ does not depend on $a_{k-1}$ if $x_{k-1}$ is known (see the observation model).
$$ bel(x_{k-1}) \sim N(x_{k-1};\mu_{k-1},\Sigma_{k-1})\\ p(x_k|x_{k-1},a_k) \sim N(x_k;Ax_{k-1}+Ba_k,Q_k) $$
Both densities are Gaussian, and integrating their product over $x_{k-1}$ again yields a Gaussian PDF:
$$ \bar{bel}(x_{k}) \sim N(x_k;\bar{\mu}_k,\bar{\Sigma}_k)\\ \bar{\mu}_k=A\mu_{k-1}+Ba_k\\ \bar{\Sigma}_k = A\Sigma_{k-1}A^T+Q_k $$

Step 2. Correction / Update: update the prediction using the observation at time $k$ to obtain the estimate $P(x_k|z_k,a_k)$:
$$ \begin{aligned} bel(x_k) =&P(x_k|z_k,a_k)\\ =&P(x_k|z_k,z_{k-1},a_k)\\ =& \frac{P(x_k,z_k|z_{k-1},a_k)}{P(z_k|z_{k-1},a_k)}\\ =& \frac{P(z_k|x_k,z_{k-1},a_k)P(x_k|z_{k-1},a_k)}{P(z_k|z_{k-1},a_k)}\\ =& \frac{P(z_k|x_k)\bar{bel}(x_k)}{P(z_k|z_{k-1},a_k)}\\ =& \eta P(z_k|x_k)\bar{bel}(x_k) \end{aligned} $$
Here we assume that $z_k$ is independent of the previous observations $z_{1:k-1}$ and conditionally independent of $a_k$ given $x_k$. The normalization factor $\eta$ can be calculated:
$$ \begin{aligned} P(z_k|z_{k-1},a_k)=& \int p(z_k,x_k|z_{k-1},a_k)dx_k\\ =& \int p(z_k|x_k,z_{k-1},a_k)p(x_k|z_{k-1},a_k)dx_k \\ =& \int p(z_k|x_k)p(x_k|z_{k-1},a_k)dx_k\\ =& \int p(z_k|x_k)\bar{bel}(x_k)dx_k \end{aligned} $$
Likewise, multiplying the two Gaussian PDFs:
$$ p(z_k|x_k) \sim N(z_k;Hx_k,R_k)\\ \bar{bel}(x_{k}) \sim N(x_k;\bar{\mu}_k,\bar{\Sigma}_k)\\ bel(x_{k}) =\eta \exp\{-J_k\} $$
where
$$ J_k=\frac{1}{2}(z_k-Hx_k)^TR^{-1}_k(z_k-Hx_k)+\frac{1}{2}(x_k-\bar{\mu}_k)^T\bar{\Sigma}^{-1}_k(x_k-\bar{\mu}_k) $$
$J_k$ is quadratic in $x_k$, so $bel(x_k)$ is a Gaussian. The mean of $bel(x_k)$ is the minimum of this quadratic function, where $bel(x_k)$ reaches its maximum. After some matrix operations and simplification (sketched below), we obtain:
$$ bel(x_{k}) \sim N(x_k;\mu_k,\Sigma_k)\\ \mu_k=\bar{\mu}_k+K_k(z_k-H\bar{\mu}_k)\\ \Sigma_k = (I-K_kH)\bar{\Sigma}_k\\ K_k=\bar{\Sigma}_kH^T(R_k+H\bar{\Sigma}_kH^T)^{-1} $$
We can use MAP estimation or full Bayesian inference; both yield the same parameters, since the full Bayesian posterior is exactly Gaussian in a linear Gaussian system.
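For completeness, here is a brief sketch of that simplification (an addition to the original text): setting the gradient of $J_k$ with respect to $x_k$ to zero gives the posterior in information (inverse-covariance) form, and the matrix inversion lemma converts it into the gain form above.
$$ \frac{\partial J_k}{\partial x_k} = -H^TR^{-1}_k(z_k-Hx_k)+\bar{\Sigma}^{-1}_k(x_k-\bar{\mu}_k)=0\\ \Sigma^{-1}_k = \bar{\Sigma}^{-1}_k+H^TR^{-1}_kH, \quad \mu_k=\Sigma_k\left(\bar{\Sigma}^{-1}_k\bar{\mu}_k+H^TR^{-1}_kz_k\right) $$
Applying the matrix inversion lemma to $\Sigma_k$ then yields $K_k=\bar{\Sigma}_kH^T(H\bar{\Sigma}_kH^T+R_k)^{-1}$, $\Sigma_k=(I-K_kH)\bar{\Sigma}_k$ and $\mu_k=\bar{\mu}_k+K_k(z_k-H\bar{\mu}_k)$, matching the Kalman update derived in the previous section.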