Direct Method SLAM - Jacobian Formulation

This post is mainly a re-formulation and detailed expansion for the section 8.4, "Direct Method" in "14 lectures on Visual SLAM".

When I was working on direct method SLAM, I found the notations used on the book is hard to understand and many details to derive the final equation are omitted. Hence, I wrote this post as a note and record for my own derivation of the Jacobian in Direct Method of SLAM.

Coordinate System and Symbol Table

Assumption

Cam1 and Cam2 have same intrinsic matrix KK and are pinhole camera with no distortion (or distortion is corrected for resulting image in pre-process step).

Coordinate System

There are two 3D coordinate systems: the camera 1 (where optic center O1O_1 is the same as world coordinate origin) and the camera 2 (with transformation of cam1 → cam2 is T12=ξT_1^2 = \xi)

There are two 2D coordinate systems (pixel coordinates): the image 1 and image 2. I will use uv coordinate when refer to them.

Symbol Table

SymbolDomainShapeMeaning
PiP_iR3\mathbb{R}^3(1, 3)i-th point in the 3-dimentional space (under cam1 coordinate)
p1,ip_{1,i}R2\mathbb{R}^2(1, 2)i-th point's projection on cam1's sensor plane
p2,ip_{2,i}R2\mathbb{R}^2(1, 2)i-th point's projection on cam2's sensor plane
KKR3×3\mathbb{R}^{3\times 3}(3, 3)Camera intrinsic matrix
ξ\xise(3)\mathfrak{se}(3)(1, 6)Cam2's pose w.r.t. world coordinate (cam1)
I1I_1R2R\mathbb{R^2} \to \mathbb{R}-Illuminance map from cam1 (w/ bilinear interpolation)
I2I_2R2R\mathbb{R^2} \to \mathbb{R}-Illuminance map from cam2 (w/ bilinear interpolation)
Exp()Exp(\cdot)se(3)R4×4\mathfrak{se}(3) \to \mathbb{R}^{4\times 4}-vector to matrix + Exponential Mapping (Lie Algebra → Transformation Matrix)
Log()Log(\cdot)R4×4se(3)\mathbb{R}^{4\times 4} \to \mathfrak{se}(3)-Logarithm Mapping + Matrix to vector (Transformation Matrix → Lie Algebra)
exp()exp(\cdot)se(3)SE(3)\mathfrak{se}(3) \to SE(3)-Exponential Mapping (Lie Algebra → Lie Group)
\cdot^\wedgeR6R4×4\mathbb{R}^{6}\to \mathbb{R}^{4\times 4}-se(3)\mathfrak{se}(3) vector representation to matrix representation
~\tilde{\cdot}--Homogenous coordinate system

Unless explicitly specified, all the coordinates are represented under heterogeneous coordinate.

Step 1. Reprojection Model

According to the pinhole camera model, we have p1,i~=KPi\tilde{p_{1, i}} = KP_i and p2,i~=K(Exp(ξ)Pi~)1:3\tilde{p_{2, i}} = K(Exp(\xi)\tilde{P_i})_{1:3}.

By converting homogenous coordinate to heterogeneous coordinate, we can retrieve the p1,ip_{1, i} and p2,ip_{2, i} from ξ\xi and KK.

Step 2. Photometric Error as Model Residual

In direct method, we do not do feature point matching as this process (extracting feature point, convert to feature descriptor, run descriptor matching with or without epipolar geo constraint) is computationally heavy and error-prone.

Instead, we use the "illuminance consistency assumption". That is, between two camera frames tt and t+1t + 1, the illuminance of a point in 3D space should be consistent. That is,

It(P)=It+1(P)I_t(P) = I_{t + 1}(P)

Naturally, the photometric error is the difference in illuminance between same point in 2 adjacent frames.

ei(ξ)=(I1(p1,i)I2(p2,i))2e_i(\xi) = \left(I_1(p_{1, i}) - I_2(p_{2, i})\right)^2

Direct SLAM make use of this assumption heavily. In the bundle adjustment process, we define the total residual of current state estimation as:

R(ξ)=i=1Nei(ξ)=i=1N(I1(p1,i)I2(p2,i))2\mathbf{R}(\xi) = \sum_{i = 1}^N{e_i(\xi)} = \sum_{i = 1}^N{\left(I_1(p_{1, i}) - I_2(p_{2, i})\right)^2}

Step 3. Optimization Problem Formulation

During the (local) bundle adjustment stage, we want to optimize the pose of cam2, namely ξse(3)\xi \in \mathfrak{se}(3), to minimize the residual of state estimation. The mathematical formulation of this optimization problem is:

ξ=argminξR(ξ)=argminξi=1Nei(ξ)=argminξi=1NI1(p1,i)I2(p2,i)22=argminξi=1NI1(KPi)I2(K(Exp(ξ)Pi~)1:322\begin{aligned} \xi^\star &= \arg\min_{\xi}{\mathbf{R}(\xi)}\\ &= \arg\min_{\xi}{\sum_{i = 1}^N{e_i(\xi)}}\\ &= \arg\min_{\xi}{\sum_{i = 1}^N{\|I_1(p_{1, i}) - I_2(p_{2, i})\|_2^2}}\\ &= \arg\min_{\xi}{\sum_{i = 1}^N{\|I_1(KP_i) - I_2(K(Exp(\xi)\tilde{P_i})_{1:3}\|_2^2}} \end{aligned}

Step 4. Perturbation Model and Jacobian for Direct Method

To solve this optimization problem, we can use optimizers like Gauss-Newton or Lagrange-Marquadt. These optimizers typically requires us to provide the jacobian of model. Hence, we need to derive the jacobian J\mathbf{J} for cam2 pose ξ\xi w.r.t. R\mathbf{R}. Since R\mathbf{R} is a simple summation from ei(ξ)e_i(\xi), we will only derive ei(ξ)/ξ\partial e_i(\xi) / \partial \xi here.

To calculate the jacobian of R\mathbf{R} w.r.t. ξ\xi, there are two main methods: the direct differentiation and perturbation model.

Direct Differentiation (which, we will not use)

ei(ξ)ξ=limδξ0ei(ξδξ)ei(ξ)δξ=limδξ0I(KPi)I2(K(Exp(ξδξ)Pi~)1:3)22ei(ξ)δξ\begin{aligned} \frac{\partial e_i(\xi)}{\partial \xi} &= \lim_{\delta\xi \to 0}\frac{e_i(\xi \oplus \delta\xi) - e_i(\xi)}{\delta\xi}\\ &= \lim_{\delta\xi\to 0}{\frac{\|I(KP_i) - I_2(K(Exp(\xi\oplus\delta\xi)\tilde{P_i})_{1:3})\|_2^2 - e_i(\xi)}{\delta\xi}} \end{aligned}

However, we have ξδξLog(Exp(ξ)Exp(δξ))\xi\oplus\delta\xi \neq Log(Exp(\xi)Exp(\delta\xi)) in Lie algebra. Instead, we need to follow the BCH formula with first-order approximation (when δξ\delta\xi is sufficiently small)

Log(Exp(δξ)Exp(ξ))Jl(ξ)1δξ+ξLog(Exp(\delta\xi)Exp(\xi)) \approx \mathbf{J}_l(\xi)^{-1}\delta\xi + \xi

However, using BCH formula requires us to calculate the Jl\mathbf{J}_l, which is undesired due to its computation complexity. Hence, we will use the perturbation model to calculate the differentiation instead.

Perturbation Model

Comparing to the direct differentiation, where BCH formula is required, perturbation model first add a small perturbation on target function eie_i, then calculate the derivative of residual w.r.t. the perturbation term.

That is, we will calculate ei(ξδξ)δξ\frac{\partial e_i(\xi \oplus \delta\xi)}{\partial \delta\xi} instead of ei(ξ)ξ\frac{\partial e_i(\xi)}{\partial \xi}.

First, we will derive ei(ξδξ)e_i(\xi \oplus \delta\xi) (using left-perturbation, the result is different from right-perturbation).

ei(ξδξ)=I1(KPi)I2(K(Exp(δξ)Exp(ξ)Pi~)1:3)\begin{aligned} e_i(\xi\oplus\delta\xi) &= I_1(KP_i) - I_2(K(Exp(\delta\xi)Exp(\xi)\tilde{P_i})_{1:3})\\ \end{aligned}

The taylor expansion for Exp(δξ)Exp(\delta \xi) is of form

Exp(δξ)=exp(δξ)=n=01n!(δξ)nExp(\delta\xi) = exp(\delta\xi^\wedge) = \sum_{n = 0}^\infty{\frac{1}{n!}(\delta\xi^\wedge)^n}

Using a first-order taylor approximation here (should be accurate as δξ\delta\xi is a minute perturbation term), we have

Exp(δξ)1+δξExp(\delta\xi) \approx 1 + \delta\xi^\wedge

Then, we have

ei(ξδξ)=I1(KPi)I2(K(Exp(δξ)Exp(ξ)P~i)1:3)I1(KPi)I2(K((1+δξ)Exp(ξ)P~i)1:3)=I1(KPi)I2(K(Exp(ξ)P~i)1:3+K(δξExp(ξ)P~i)1:3)\begin{aligned} e_i(\xi\oplus\delta\xi) &= I_1(KP_i) - I_2(K(Exp(\delta\xi)Exp(\xi)\tilde{P}_i)_{1:3})\\ &\approx I_1(KP_i) - I_2(K((1 + \delta\xi^\wedge)Exp(\xi)\tilde{P}_i)_{1:3}) \\ &= I_1(KP_i) - I_2(K(Exp(\xi)\tilde{P}_i)_{1:3} + K(\delta\xi^\wedge Exp(\xi)\tilde{P}_i)_{1:3}) \end{aligned}

Then we apply another first-order taylor approximation

I2(K(Exp(ξ)P~i)1:3+K(δξExp(ξ)P~i)1:3)I2(K(Exp(ξ)P~i)1:3)+I2(p2)p2p2ξδξ\begin{aligned} &\phantom{\approx}I_2(K(Exp(\xi)\tilde{P}_i)_{1:3} + K(\delta\xi^\wedge Exp(\xi)\tilde{P}_i)_{1:3}) \\ &\approx I_2(K(Exp(\xi)\tilde{P}_i)_{1:3}) + \frac{\partial I_2(p_2)}{\partial p_2} \frac{\partial p_2}{\partial \xi} \delta\xi \end{aligned}

And we have the fully expanded form of e(ξδξ)e(\xi\oplus\delta\xi)

e(ξδξ)=I1(KPi)(I2(K(Exp(δξ)Exp(ξ)Pi)1:3)+I2(p2)p2p2ξδξ)e(\xi \oplus \delta\xi) = I_1(KP_i) - \left(I_2(K(Exp(\delta\xi)Exp(\xi)P_i)_{1:3}) + \frac{\partial I_2(p_2)}{\partial p_2} \frac{\partial p_2}{\partial \xi} \delta\xi \right)

where we can derive the derivative under left perturbation

e(ξδξ)δξI2(p2)p2p2ξ\frac{\partial e(\xi \oplus \delta\xi)}{\partial \delta\xi} \approx -\frac{\partial I_2(p_2)}{\partial p_2}\frac{\partial p_2}{\partial \xi}