sift算法理解

SIFT(Scale-Invariant Feature Transform)

尺度不變特征轉換(Scale-invariant feature transform或SIFT)是一種電腦視覺的算法用來偵測與描述影像中的局部性特征，它在空間尺度中尋找極值點，并提取出其位置、尺度、旋轉不變量，此算法由 David Lowe在1999年所發表，2004年完善總結。在CNN用來圖像特征提取之前，SIFT算是手工設計特征提取算法的巔峰了。關于sift算法，有很多大佬講的很好了，參考：

SIFT原始論文翻譯：

http://www.rzrgm.cn/cuteshongshong/archive/2012/05/25/2506374.html
大佬講解文章：

https://blog.csdn.net/zddblog/article/details/7521424

https://blog.csdn.net/qq_37174526/article/details/100052271

http://www.rzrgm.cn/Alliswell-WP/p/SIFT.html

http://www.rzrgm.cn/wangguchangqing/p/9176103.html

http://www.rzrgm.cn/wangguchangqing/p/4853263.html
源碼學習：

vlFeat： vision Lab Feature library

? https://www.vlfeat.org/

? https://github.com/vlfeat/vlfeat

? https://www.vlfeat.org/overview/sift.html

opencv

https://github.com/opencv/opencv/blob/68d15fc62edad980f1ffa15ee478438335f39cc3/modules/features2d/src/sift.dispatch.cpp

python：

https://github.com/rmislam/PythonSIFT

首先看一下sift特征提取和匹配算法的步驟：

構建尺度空間，生成高斯差分金字塔（DOG金字塔）
空間極值點檢測(關鍵點的初步查探)
穩健關鍵點的精確定位和篩選
- 精確定位：離散空間的局部極值，通過迭代法尋找連續空間極值
- 閾值篩選：消除小于閾值的極值
- 消除邊緣效應：去除掉可能是圖片卷積過程中邊緣效應產生的極值點
穩定關鍵點的方向信息分配
關鍵點描述
特征點匹配

sift詳細解釋，大佬們已經講的很好了，這里只記錄下自己的一些困惑和理解，希望能幫助到和我一樣的同學。(個人理解不一定完全對，歡迎糾錯)

1. 高斯差分金字塔

sift為什么要建立高斯差分金字塔呢？弄清楚這個，得注意兩個關鍵詞：高斯，差分。

高斯

熟悉圖像處理的同學，都知道高斯卷積核能用來平滑圖像，使圖像變得模糊，如opencv中的GaussianBlur函數。而sift算法是為了找到尺度不變的特征點，就是圖片中物體的特征點不隨尺度變化，所以要得到物體不同尺度下的圖片，對圖片進行高斯卷積可以模擬物體在不同尺度下的圖片(想象下，我們把圖片縮小，圖片中的物體會變小，同時也變得模糊)

差分

得到物體在不同尺度下的圖片后，很明顯下一步要做的就是特征提取，所謂圖片特征就是特別突出的像素點，如比局部背景像素點都大的像素點，就是圖像處理中常說的邊緣，角點等。在圖像處理中，laplacian算子能用來提取物體邊緣，觀察下laplacian卷積核，就能發現laplacian算子找到的這些邊緣都是局部極值，如下是一個3x3的laplacian卷積核：(比周圍四個像素點都大或都小才會被提取出來)

\[\left[ \begin{array}{ll} 0& 1& 0& \\ 1& -4& 1& \\ 0& 1& 0& \\ \end{array} \right] \]

所以總結下，先對圖片進行高斯卷積，再進行拉普拉斯卷積，就能得到不同尺度下物體的特征，這一步驟常稱為高斯拉普拉斯(LoG, Laplacian of Gaussian)，但是這個計算比較耗時，大佬們發現了一種近似LoG的算法，但是計算更加簡單，就是高斯差分(DoG, Difference of Gaussian)。高斯差分DoG的步驟，就是采用不同sigma的高斯核分別對兩幅圖片進行卷積，然后求兩幅圖片相減，即差分。

采用LoG提取特征，LoG具有旋轉不變性：http://www.rzrgm.cn/Alliswell-WP/p/Laplace_Gaussian.html

關于Dog是LoG的近似，推導過程：http://www.rzrgm.cn/silence-cho/p/13621975.html

2. 穩健關鍵點的精確定位和篩選

sift算法在找到尺度空間中的極值點后，還進行了一步精確定位和篩選，從而獲得穩定的關鍵點，主要包括三步：

精確定位：離散空間的局部極值，通過迭代法尋找連續空間極值
閾值篩選：消除小于閾值的極值
消除邊緣效應：去除掉可能是圖片卷積過程中邊緣效應產生的極值點

2.1 精確定位

在確定極值點時，是通過與其鄰近的26個點作比較，如下圖所示，中間尺度圖片上的點，與同一尺度的8個點，上下兩個尺度各9個點，共26個點進行大小比較，確定其是不是極值點。如果我們建立一個三維坐標系，包括x， y， s三個維度，分別表示圖片的x軸方向，y軸方向，以及圖片的尺度，對于三個維度下的每個整數坐標點與其鄰域26個整數坐標點進行比較，很明顯我們得到是一個離散空間極值點，如果想要得到連續空間的極值點，則需要對這個極值點的位置進行修正，即精確定位極值點。

如何確定連續空間的極值點呢？一提到極值點，熟悉深度學習的同學很快就會想到梯度下降法和牛頓迭代法，sift這里采用的就是類似的牛頓迭代法。牛頓迭代法的公式如下：

\[x^{(t+1)} = x^{(t)} - H_t^{-1}g_t\\ 其中g_t為x_t點的梯度，H_t為海森矩陣 \]

上述公式含義就是, 對于函數\(f(x)\)，在坐標點\(x^{(t)}\)，沿著\(- H_t^{-1}g_t\)方向移動，就能得到函數值比其更小的值，只要我們進行不斷的迭代，坐標點就會收斂于極值點處。

所以回到我們的原始問題中，在圖像上某一點處，我們是能計算該像素點處的一階導數和二階導數，也就能得到梯度\(g\)和海森矩陣\(H\), 那么在離散空間的極值點處，通過牛頓迭代法，就能得到連續空間的極值點。可以看下sift算法源碼中是怎么計算的：

有x, y, s三個維度，所以梯度為三個維度的一階偏導數，海森矩陣是三個維度的二階偏導數，即：

\[梯度：g = [dx, dy, ds]\\ 海森矩陣：H = \left[ \begin{array}{ll} dxx& dxy& dxs& \\ dyx& dyy& dys& \\ dsx& dsy& dss& \\ \end{array} \right] \]

看下代碼實現：

梯度計算

def computeGradientAtCenterPixel(pixel_array):
    dx = 0.5 * (pixel_array[1, 1, 2] - pixel_array[1, 1, 0])
    dy = 0.5 * (pixel_array[1, 2, 1] - pixel_array[1, 0, 1])
    ds = 0.5 * (pixel_array[2, 1, 1] - pixel_array[0, 1, 1])
    return array([dx, dy, ds])
    
函數說明：
"""Approximate gradient at center pixel [1, 1, 1] of 3x3x3 array using central difference formula of order O(h^2), where h is the step size
    # With step size h, the central difference formula of order O(h^2) for f'(x) is (f(x + h) - f(x - h)) / (2 * h)
    # Here h = 1, so the formula simplifies to f'(x) = (f(x + 1) - f(x - 1)) / 2
    # NOTE: x corresponds to second array axis, y corresponds to first array axis, and s (scale) corresponds to third array axis
"""

海森矩陣計算

def computeHessianAtCenterPixel(pixel_array):
    center_pixel_value = pixel_array[1, 1, 1]
    dxx = pixel_array[1, 1, 2] - 2 * center_pixel_value + pixel_array[1, 1, 0]
    dyy = pixel_array[1, 2, 1] - 2 * center_pixel_value + pixel_array[1, 0, 1]
    dss = pixel_array[2, 1, 1] - 2 * center_pixel_value + pixel_array[0, 1, 1]
    dxy = 0.25 * (pixel_array[1, 2, 2] - pixel_array[1, 2, 0] - pixel_array[1, 0, 2] + pixel_array[1, 0, 0])
    dxs = 0.25 * (pixel_array[2, 1, 2] - pixel_array[2, 1, 0] - pixel_array[0, 1, 2] + pixel_array[0, 1, 0])
    dys = 0.25 * (pixel_array[2, 2, 1] - pixel_array[2, 0, 1] - pixel_array[0, 2, 1] + pixel_array[0, 0, 1])
    return array([[dxx, dxy, dxs], 
                  [dxy, dyy, dys],
                  [dxs, dys, dss]])
函數說明：
"""Approximate Hessian at center pixel [1, 1, 1] of 3x3x3 array using central difference formula of order O(h^2), where h is the step size
    """
    # With step size h, the central difference formula of order O(h^2) for f''(x) is (f(x + h) - 2 * f(x) + f(x - h)) / (h ^ 2)
    # Here h = 1, so the formula simplifies to f''(x) = f(x + 1) - 2 * f(x) + f(x - 1)
    # With step size h, the central difference formula of order O(h^2) for (d^2) f(x, y) / (dx dy) = (f(x + h, y + h) - f(x + h, y - h) - f(x - h, y + h) + f(x - h, y - h)) / (4 * h ^ 2)
    # Here h = 1, so the formula simplifies to (d^2) f(x, y) / (dx dy) = (f(x + 1, y + 1) - f(x + 1, y - 1) - f(x - 1, y + 1) + f(x - 1, y - 1)) / 4
    # NOTE: x corresponds to second array axis, y corresponds to first array axis, and s (scale) corresponds to third array axis

然后計算\(- H_t^{-1}g_t\)時，一般不直接計算H的逆矩陣，而是采用最小二乘法求解矩陣方程的方式，如下所示，矩陣方程\(Hx = g\)的解x正是我們需要的值，而矩陣方程的解可以采用最小二乘法求解。

\[Hx = g\\ x = H^{-1}g \]

計算處\(- H_t^{-1}g_t\)后，分別對x,y,s三個維度坐標值進行更新，sift算法中只對大于0.5的值進行更新，太小就不更新(小于0.5,說明此處就是極值點，不需要更新)，而且只進行了5次迭代,看下代碼如下：

    for attempt_index in range(num_attempts_until_convergence):  # num_attempts_until_convergence=5，表示5次迭代
        # need to convert from uint8 to float32 to compute derivatives and need to rescale pixel values to [0, 1] to apply Lowe's thresholds
        first_image, second_image, third_image = dog_images_in_octave[image_index-1:image_index+2]
        pixel_cube = stack([first_image[i-1:i+2, j-1:j+2],
                            second_image[i-1:i+2, j-1:j+2],
                            third_image[i-1:i+2, j-1:j+2]]).astype('float32') / 255.
        gradient = computeGradientAtCenterPixel(pixel_cube)  # 梯度值g
        hessian = computeHessianAtCenterPixel(pixel_cube)    # 海森矩陣H
        extremum_update = -lstsq(hessian, gradient, rcond=None)[0] # 最小二乘法求解H^-1 *g
        if abs(extremum_update[0]) < 0.5 and abs(extremum_update[1]) < 0.5 and abs(extremum_update[2]) < 0.5:
            break
        j += int(round(extremum_update[0]))  #更新x
        i += int(round(extremum_update[1]))  #更新y
        image_index += int(round(extremum_update[2]))  #更新s
        # make sure the new pixel_cube will lie entirely within the image
        if i < image_border_width or i >= image_shape[0] - image_border_width or j < image_border_width or j >= image_shape[1] - image_border_width or image_index < 1 or image_index > num_intervals:
            extremum_is_outside_image = True
            break

迭代結束后，精確定位了極值點位置，根據泰勒公式可以得到該位置對應極值點的近似值，泰勒公式的二次展開如下：

\[f(\vec X) = f(\vec X_0) + (\vec X-\vec X_0)^T g + \frac{1}{2}(\vec X-\vec X_0)^TH (\vec X-\vec X_0)\\ \]

上面式子中，\(\vec X = (x, y, s)\)表示尺度空間中某一點，即從 \(\vec X_0\) 移動到 \(\vec X\) 處，根據我們迭代公式知道，移動值\(\vec X-\vec X_0\)為\(- H_t^{-1}g_t\)，帶入上述公式得到:

\[f(\vec X) = f(\vec X_0) + (\vec X-\vec X_0)^T g + \frac{1}{2}(\vec X-\vec X_0)^TH (\vec X-\vec X_0)\\ =f(\vec X_0) + (- H^{-1}g)^Tg + \frac{1}{2}(- H^{-1}g)^TH(- H^{-1}g)\\ =f(\vec X_0) + g^T(-H^{-1})g + \frac{1}{2}g^T(-H^{-1})H(- H^{-1}g)\\ = f(\vec X_0) + g^T(-H^{-1})g + \frac{1}{2}g^T(H^{-1}g)\\ =f(\vec X_0) + \frac{1}{2}g^T(-H^{-1}g) \]

對應代碼中如下：

functionValueAtUpdatedExtremum = pixel_cube[1, 1, 1] + 0.5 * dot(gradient, extremum_update)  # extremum_update即為-H^-1g

牛頓迭代法參考： https://zhuanlan.zhihu.com/p/37588590
http://www.rzrgm.cn/happylion/p/4172632.html

2.2 閾值篩選

這一步比較簡單，在上述精確定位極值點位置后，根據泰勒公式得到該位置的對應極值，sift中設置了一個閾值(contrast_threshold)，如果小于閾值就刪除掉這個極值點。

2.3 消除邊緣效應

邊緣效應指對圖片進行卷積時，對于圖片上下左右邊界處的像素點，常需要對圖片補0，方便進行卷積運算，會導致邊界處產生較強的效應，得到錯誤的極值點。對于圖片邊界處錯誤的極值點，其x方向和y方向上二階導數，會出現一大一小。如下圖中，在圖片的上邊邊界，x方向上的二階導數dxx比較小(兩側都位于圖像中)，y方向上的二階導數dyy較大(一側位于圖像中，對應值為像素值，另一側位于圖像外面，對應值為填充的0)

所以邊緣效應產生的極值點，一個方向上二階導數大，另一個方向上二階導數小，那么只需過濾掉兩者比值較大的極值點。上面計算的海森矩陣包括了x方向和y方向的二階導數，

\[海森矩陣：H = \left[ \begin{array}{ll} dxx& dxy& dxs& \\ dyx& dyy& dys& \\ dsx& dsy& dss& \\ \end{array} \right] \]

假設\(dxx=\alpha, dyy=\beta, \alpha = r \beta\), 則可以構建下面的等式，很明顯r=1時，這個表達式取得最小值，r大于1，或者小于1時，表達式值增大。所以只需要設置一個閾值，計算下面表達式的值，是否超過閾值即可。

\[\frac{(\alpha + \beta)^2}{\alpha \beta} = \frac{(r\beta + \beta)^2}{r\beta^2}=\frac{(r + 1)^2}{r} \]

代碼中如下：

functionValueAtUpdatedExtremum = pixel_cube[1, 1, 1] + 0.5 * dot(gradient, extremum_update)
    if abs(functionValueAtUpdatedExtremum) * num_intervals >= contrast_threshold:
        xy_hessian = hessian[:2, :2]        # 海森矩陣
        xy_hessian_trace = trace(xy_hessian) # α+β
        xy_hessian_det = det(xy_hessian)    # αβ
        if xy_hessian_det > 0 and eigenvalue_ratio * (xy_hessian_trace ** 2) < ((eigenvalue_ratio + 1) ** 2) * xy_hessian_det:     # eigenvalue_ratio 即為r
            # Contrast check passed -- construct and return OpenCV KeyPoint object
            keypoint = KeyPoint()
            keypoint.pt = ((j + extremum_update[0]) * (2 ** octave_index), (i + extremum_update[1]) * (2 ** octave_index))
            keypoint.octave = octave_index + image_index * (2 ** 8) + int(round((extremum_update[2] + 0.5) * 255)) * (2 ** 16)
            keypoint.size = sigma * (2 ** ((image_index + extremum_update[2]) / float32(num_intervals))) * (2 ** (octave_index + 1))  # octave_index + 1 because the input image was doubled
            keypoint.response = abs(functionValueAtUpdatedExtremum)
            return keypoint, image_index

邊緣效應參考：http://www.rzrgm.cn/pegasus/archive/2011/05/19/2051416.html
https://blog.csdn.net/qq_34886403/article/details/83589108
https://zhuanlan.zhihu.com/p/366778678

3. 關鍵點描述

在生成關鍵點的描述子(128維向量)時，有個地方也值得注意下，sift算法采用了三線性插值(逆運算)。

如下圖所示，這里也有三個維度：x，y，angle，分別表示關鍵點的x值，y值，幅角，而對應的值f(x, y, angle)表示幅值。下面圖中紅色點是我們計算得到一個坐標點(x, y, angle), 這個坐標可能是小數，如(1.5, 1.5 1.5), 我們需要將其幅值分配到與其相鄰的八個整數坐標點上, 就需要采用三次插值逆運算，即分別在x維度，y維度，angle維度進行幅值分配。

(注意：sift中將360度分成了8個區間，每個區間45度，所以angle維度中：1表示45度，2表示90度，。。。7表示360度)

最后，sift中還有很多細節和參數值計算方法，多看幾遍代碼和講解文章應該就清楚了。

posted @ 2021-08-15 14:54 silence_cho 閱讀(2361) 評論(0) 收藏舉報

刷新頁面返回頂部

silence_cho