
[Deep Learning Series] Convolutional Neural Networks in Detail (2): Writing a Convolutional Neural Network by Hand

In the previous article we covered the basic principles of convolutional neural networks, including the definitions of the basic layers and their computation rules. This article walks through one complete training pass of a convolutional neural network, covering both forward propagation and backpropagation, and then implements a convolutional neural network by hand. If you are unfamiliar with the basics, read the previous article first: [Deep Learning Series] CNN Principles in Detail (1): The Basics.

       


Forward Propagation in a Convolutional Neural Network

Let's start with the simplest possible convolutional neural network:

       

1. Input layer ----> convolutional layer

Using the example from the previous section: the input is a 4*4 image, and after convolution with two 2*2 kernels it becomes two 3*3 feature maps.

Take kernel filter1 as an example (stride = 1):

       

Compute the input of the first convolutional-layer neuron o11:

\begin{equation}
\begin{aligned}
net_{o_{11}} &= conv(input, filter)\\
&= i_{11} \times h_{11} + i_{12} \times h_{12} + i_{21} \times h_{21} + i_{22} \times h_{22}\\
&= 1 \times 1 + 0 \times (-1) + 1 \times 1 + 1 \times (-1) = 1
\end{aligned}
\end{equation}

The output of neuron o11 (using the ReLU activation function here):

      \begin{equation}
      \begin{aligned}
      out_{o_{11}} &= activators(net_{o_{11}}) \\
      &=max(0,net_{o_{11}}) = 1
      \end{aligned}
      \end{equation}

The other neurons are computed in the same way.
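To make the arithmetic concrete, here is a quick numpy check of this first window (a sketch; the patch values i11=1, i12=0, i21=1, i22=1 and the kernel values h11=1, h12=-1, h21=1, h22=-1 are read off the example):

import numpy as np

# top-left 2*2 window of the image and the kernel filter1
patch = np.array([[1, 0],
                  [1, 1]])         # i11 i12 / i21 i22
filter1 = np.array([[1, -1],
                    [1, -1]])      # h11 h12 / h21 h22
net_o11 = (patch * filter1).sum()  # element-wise product, then sum
out_o11 = max(0, net_o11)          # ReLU
print(net_o11, out_o11)            # -> 1 1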

       

2. Convolutional layer ----> pooling layer

        

Compute the input of pooling-layer neuron m11 (window size 2 * 2); the pooling layer has no activation function:

\begin{equation}
\begin{aligned}
net_{m_{11}} &= max(o_{11},o_{12},o_{21},o_{22}) = 1\\
out_{m_{11}} &= net_{m_{11}} = 1
\end{aligned}
\end{equation}
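In code, m11 is just the maximum over the window; a minimal sketch (o11 = 1 from above, the other three values are made up for illustration):

import numpy as np

window = np.array([[1, 0],
                   [0, 1]])   # out_o11 out_o12 / out_o21 out_o22
net_m11 = window.max()        # no activation in the pooling layer
print(net_m11)                # -> 1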

       

3. Pooling layer ----> fully connected layer

The pooling layer's output goes through a flatten layer, which "flattens" all elements into a vector, and then into the fully connected layer.

4. Fully connected layer ----> output layer

From the fully connected layer to the output layer, neurons are connected in the ordinary dense fashion; the result is passed through a softmax function to produce a probability for each class, and the class with the highest probability is the predicted label for the image.
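A minimal softmax sketch, assuming some illustrative logits from the fully connected layer:

import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))   # subtract the max for numerical stability
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])   # made-up FC-layer outputs
probs = softmax(logits)
print(probs, probs.argmax())         # the highest-probability class wins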

       


Backpropagation in a Convolutional Neural Network

A traditional neural network is fully connected. To backpropagate, each layer simply takes partial derivatives with respect to the previous one; the chain rule yields each layer's error sensitivity terms, from which the gradients of the weights and biases follow, and the weights can then be updated. A convolutional neural network, however, has two special layers: the convolutional layer and the pooling layer. The pooling layer's output passes through no activation function; it is the maximum over a sliding window, a constant, so its partial derivative is 1. Pooling effectively compresses the previous layer's feature map, so computing its error sensitivities differs from traditional backpropagation. Likewise, since the feature map was produced in the forward pass by convolving with a kernel, propagating the error from the feature map back to the previous layer also differs from the traditional case, and the kernel's parameters must be updated. Below we look at how backpropagation works through the pooling and convolutional layers.

Before that, let's first review the traditional backpropagation procedure (a short code sketch follows the list):

1. Compute each layer's input values $net_{i,j}$ by forward propagation (e.g. the input of the first neuron of the convolved feature map: $net_{i_{11}}$).

2. Backpropagate to compute each neuron's error term $\delta_{i,j} = \frac{\partial E}{\partial net_{i,j}}$, where E is the total error given by the loss function, e.g. squared error or cross-entropy.

3. Compute the gradient of each weight $w_{i,j}$: $\eta_{i,j} = \frac{\partial E}{\partial net_{i,j}} \cdot \frac{\partial net_{i,j}}{\partial w_{i,j}} = \delta_{i,j} \cdot out_{i,j}$.

4. Update the weights: $w_{i,j} = w_{i,j} - \lambda \cdot \eta_{i,j}$ (where $\lambda$ is the learning rate).
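As a sketch, here are the four steps for a single fully connected layer with squared-error loss and identity activation (all names are illustrative):

import numpy as np

def backprop_step(x, w, target, lr=0.01):
    # 1. forward pass: net input and output
    net = w.dot(x)
    out = net                      # identity activation
    # 2. sensitivity: delta = dE/dnet with E = 0.5*||out - target||^2
    delta = out - target
    # 3. weight gradients: dE/dw = delta * dnet/dw = outer(delta, x)
    grad = np.outer(delta, x)
    # 4. gradient-descent update
    return w - lr * grad

w = np.random.randn(2, 3)
x = np.array([1.0, 0.5, -0.2])
w = backprop_step(x, w, target=np.zeros(2))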

Backpropagation through the convolutional layer

From the forward pass:

Each neuron takes the previous neuron's output as its input, applies the activation function, and passes its own output on as the next neuron's input. Here $i_{11}$ denotes the previous layer and $o_{11}$ the layer after it. So $net_{i_{11}}$ is neuron i11's input and $out_{i_{11}}$ its output; likewise $net_{o_{11}}$ is neuron o11's input and $out_{o_{11}}$ its output. Since the previous layer's output equals the next layer's input, $out_{i_{11}} = net_{o_{11}}$; to simplify notation, I write $out_{i_{11}}$ as just $i_{11}$.

       

\begin{equation}
\begin{aligned}
i_{11} &= out_{i_{11}} \\
&= activators(net_{i_{11}})\\
net_{o_{11}} &= conv(input, filter)\\
&= i_{11} \times h_{11} + i_{12} \times h_{12} + i_{21} \times h_{21} + i_{22} \times h_{22}\\
out_{o_{11}} &= activators(net_{o_{11}}) \\
&= max(0, net_{o_{11}})
\end{aligned}
\end{equation}

Here $net_{i_{11}}$ is the previous layer's input and $out_{i_{11}}$ the previous layer's output.

First compute the error term $\delta_{11}$ of the first element $i_{11}$ of the layer before the convolution:

      $$\delta_{11} = \frac{\partial E}{\partial net_{i_{11}}} =\frac{\partial E}{\partial out_{i_{11}}} \cdot \frac{\partial out_{i_{11}}}{\partial net_{i_{11}}} = \frac{\partial E}{\partial i_{11}} \cdot \frac{\partial i_{11}}{\partial net_{i_{11}}}$$

Start with $\frac{\partial E}{\partial i_{11}}$.

It is not yet obvious how to compute $\frac{\partial E}{\partial i_{11}}$, so let's first write out the full feature map produced by convolving the input layer with the kernel:

       

\begin{equation}
\begin{aligned}
net_{o_{11}} = i_{11} \times h_{11} + i_{12} \times h_{12} +i_{21} \times h_{21} + i_{22} \times h_{22} \\
net_{o_{12}} = i_{12} \times h_{11} + i_{13} \times h_{12} +i_{22} \times h_{21} + i_{23} \times h_{22} \\
net_{o_{13}} = i_{13} \times h_{11} + i_{14} \times h_{12} +i_{23} \times h_{21} + i_{24} \times h_{22} \\
net_{o_{21}} = i_{21} \times h_{11} + i_{22} \times h_{12} +i_{31} \times h_{21} + i_{32} \times h_{22} \\
net_{o_{22}} = i_{22} \times h_{11} + i_{23} \times h_{12} +i_{32} \times h_{21} + i_{33} \times h_{22} \\
net_{o_{23}} = i_{23} \times h_{11} + i_{24} \times h_{12} +i_{33} \times h_{21} + i_{34} \times h_{22} \\
net_{o_{31}} = i_{31} \times h_{11} + i_{32} \times h_{12} +i_{41} \times h_{21} + i_{42} \times h_{22} \\
net_{o_{32}} = i_{32} \times h_{11} + i_{33} \times h_{12} +i_{42} \times h_{21} + i_{43} \times h_{22} \\
net_{o_{33}} = i_{33} \times h_{11} + i_{34} \times h_{12} +i_{43} \times h_{21} + i_{44} \times h_{22} \\
\end{aligned}
\end{equation}

Then take partial derivatives with respect to each input element $i_{i,j}$ in turn.

Partial derivative with respect to $i_{11}$:

      \begin{equation}
      \begin{aligned}
      \frac{\partial E}{\partial i_{11}}&=\frac{\partial E}{\partial net_{o_{11}}} \cdot \frac{\partial net_{o_{11}}}{\partial i_{11}}\\
      &=\delta_{11} \cdot h_{11}
      \end{aligned}
      \end{equation}

Partial derivative with respect to $i_{12}$:

      \begin{equation}
      \begin{aligned}
      \frac{\partial E}{\partial i_{12}}&=\frac{\partial E}{\partial net_{o_{11}}} \cdot \frac{\partial net_{o_{11}}}{\partial i_{12}} +\frac{\partial E}{\partial net_{o_{12}}} \cdot \frac{\partial net_{o_{12}}}{\partial i_{12}}\\
      &=\delta_{11} \cdot h_{12}+\delta_{12} \cdot h_{11}
      \end{aligned}
      \end{equation}

Partial derivative with respect to $i_{13}$:

      \begin{equation}
      \begin{aligned}
      \frac{\partial E}{\partial i_{13}}&=\frac{\partial E}{\partial net_{o_{12}}} \cdot \frac{\partial net_{o_{12}}}{\partial i_{13}} +\frac{\partial E}{\partial net_{o_{13}}} \cdot \frac{\partial net_{o_{13}}}{\partial i_{13}}\\
      &=\delta_{12} \cdot h_{12}+\delta_{13} \cdot h_{11}
      \end{aligned}
      \end{equation}

Partial derivative with respect to $i_{21}$:

      \begin{equation}
      \begin{aligned}
      \frac{\partial E}{\partial i_{21}}&=\frac{\partial E}{\partial net_{o_{11}}} \cdot \frac{\partial net_{o_{11}}}{\partial i_{21}} +\frac{\partial E}{\partial net_{o_{21}}} \cdot \frac{\partial net_{o_{21}}}{\partial i_{21}}\\
      &=\delta_{11} \cdot h_{21}+\delta_{21} \cdot h_{11}
      \end{aligned}
      \end{equation}

Partial derivative with respect to $i_{22}$:

      \begin{equation}
      \begin{aligned}
      \frac{\partial E}{\partial i_{22}}&=\frac{\partial E}{\partial net_{o_{11}}} \cdot \frac{\partial net_{o_{11}}}{\partial i_{22}} +\frac{\partial E}{\partial net_{o_{12}}} \cdot \frac{\partial net_{o_{12}}}{\partial i_{22}}\\
      &+\frac{\partial E}{\partial net_{o_{21}}} \cdot \frac{\partial net_{o_{21}}}{\partial i_{22}}+\frac{\partial E}{\partial net_{o_{22}}} \cdot \frac{\partial net_{o_{22}}}{\partial i_{22}}\\
      &=\delta_{11} \cdot h_{22}+\delta_{12} \cdot h_{21}+\delta_{21} \cdot h_{12}+\delta_{22} \cdot h_{11}
      \end{aligned}
      \end{equation}

        

Observing the pattern in the expressions above and generalizing, we obtain the following (here $\ast$ denotes convolving the zero-padded sensitivity map with the flipped kernel, not a matrix product):

\begin{equation}
\left[ \begin{array}{ccccc}
0 & 0 & 0 & 0 & 0 \\
0 & \delta_{11} & \delta_{12} & \delta_{13} & 0 \\
0 & \delta_{21} & \delta_{22} & \delta_{23} & 0 \\
0 & \delta_{31} & \delta_{32} & \delta_{33} & 0 \\
0 & 0 & 0 & 0 & 0 \\
\end{array}
\right ]
\ast
\left[ \begin{array}{cc}
h_{22} & h_{21} \\
h_{12} & h_{11} \\
\end{array}
\right ] =
\left[ \begin{array}{cccc}
\frac{\partial E}{\partial i_{11}} & \frac{\partial E}{\partial i_{12}} & \frac{\partial E}{\partial i_{13}} & \frac{\partial E}{\partial i_{14}} \\
\frac{\partial E}{\partial i_{21}} & \frac{\partial E}{\partial i_{22}} & \frac{\partial E}{\partial i_{23}} & \frac{\partial E}{\partial i_{24}} \\
\frac{\partial E}{\partial i_{31}} & \frac{\partial E}{\partial i_{32}} & \frac{\partial E}{\partial i_{33}} & \frac{\partial E}{\partial i_{34}} \\
\frac{\partial E}{\partial i_{41}} & \frac{\partial E}{\partial i_{42}} & \frac{\partial E}{\partial i_{43}} & \frac{\partial E}{\partial i_{44}} \\
\end{array}
\right ]
\end{equation}

       

The kernel in the figure is rotated by 180°; convolving it with this layer's zero-padded sensitivity matrix $\delta_{i,j}$ yields $\frac{\partial E}{\partial i_{i,j}}$, i.e.

$$\frac{\partial E}{\partial i_{i,j}} = \sum_m \sum_n h_{m,n}\delta_{i+m,j+n}$$
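Here is a numpy sketch of this "zero-pad, rotate the kernel 180°, convolve" rule, cross-checked against the hand-derived partials above (the delta and kernel values are random placeholders):

import numpy as np

delta = np.random.randn(3, 3)   # this layer's sensitivity map
h = np.random.randn(2, 2)       # the 2*2 kernel

padded = np.pad(delta, 1)       # zero-pad by filter_size - 1 = 1
flipped = np.rot90(h, 2)        # rotate the kernel 180 degrees

dE_di = np.zeros((4, 4))        # gradient w.r.t. the 4*4 input
for i in range(4):
    for j in range(4):
        dE_di[i, j] = (padded[i:i+2, j:j+2] * flipped).sum()

# spot-check against the formulas derived above:
assert np.isclose(dE_di[0, 0], delta[0, 0] * h[0, 0])   # dE/di11
assert np.isclose(dE_di[0, 1],                          # dE/di12
                  delta[0, 0] * h[0, 1] + delta[0, 1] * h[0, 0])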

With the first factor done, we compute the second, $\frac{\partial i_{11}}{\partial net_{i_{11}}}$:

\begin{equation}
\begin{aligned}
\because i_{11} &= out_{i_{11}} \\
&= activators(net_{i_{11}})\\
\therefore \frac{\partial i_{11}}{\partial net_{i_{11}}}
&= f'(net_{i_{11}})\\
\therefore \delta_{11} &= \frac{\partial E}{\partial net_{i_{11}}} \\
&= \frac{\partial E}{\partial i_{11}} \cdot \frac{\partial i_{11}}{\partial net_{i_{11}}}\\
&= \sum_m \sum_n h_{m,n}\delta_{i+m,j+n} \cdot f'(net_{i_{11}})
\end{aligned}
\end{equation}

The sensitivity matrix is now complete, and with it we can compute the gradients of the weights.

Since the expressions relating the convolutional layer's inputs $net_{o_{11}}$ to the weights $h_{i,j}$ were already written out above, the gradient follows directly:

       

\begin{equation}
\begin{aligned}
\frac{\partial E}{\partial h_{11}}&=\frac{\partial E}{\partial net_{o_{11}}} \cdot \frac{\partial net_{o_{11}}}{\partial h_{11}}+\dots\\
&+\frac{\partial E}{\partial net_{o_{33}}} \cdot \frac{\partial net_{o_{33}}}{\partial h_{11}}\\
&=\delta_{11} \cdot i_{11} +\dots+ \delta_{33} \cdot i_{33}
\end{aligned}
\end{equation}

Generalizing, the gradient of the weights is:

\begin{equation}
\begin{aligned}
\frac{\partial E}{\partial h_{i,j}} = \sum_m\sum_n\delta_{m,n}\, i_{i+m-1,\, j+n-1}
\end{aligned}
\end{equation}

The gradient of the bias term:

\begin{equation}
\begin{aligned}
\frac{\partial E}{\partial b} &=\frac{\partial E}{\partial net_{o_{11}}} \frac{\partial net_{o_{11}}}{\partial b} +\frac{\partial E}{\partial net_{o_{12}}} \frac{\partial net_{o_{12}}}{\partial b}+\dots+\frac{\partial E}{\partial net_{o_{33}}} \frac{\partial net_{o_{33}}}{\partial b}\\
&=\delta_{11}+\delta_{12}+\dots+\delta_{33}\\
&=\sum_i\sum_j\delta_{i,j}
\end{aligned}
\end{equation}

As we can see, the partial derivative with respect to the bias equals the sum of all of this layer's sensitivity terms. With the gradients of the weights and the bias, the parameters can be updated by gradient descent.
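As a numpy sketch (random placeholder values): the kernel gradient is the valid convolution of the layer input with the sensitivity map, and the bias gradient is the sum of all deltas:

import numpy as np

inp = np.random.randn(4, 4)     # layer input i
delta = np.random.randn(3, 3)   # this layer's sensitivity map

grad_h = np.zeros((2, 2))       # gradient of the 2*2 kernel
for i in range(2):
    for j in range(2):
        grad_h[i, j] = (inp[i:i+3, j:j+3] * delta).sum()

grad_b = delta.sum()            # bias gradient: sum of all deltas

# spot-check dE/dh22 against the double sum written out explicitly:
check = sum(delta[m, n] * inp[m + 1, n + 1]
            for m in range(3) for n in range(3))
assert np.isclose(grad_h[1, 1], check)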

          

Backpropagation through the pooling layer

Backpropagation through the pooling layer is easier. In the figure below, the left side is the previous layer's output, i.e. the convolutional layer's feature map, and the right side is the pooling layer's input. As before, we first write out the forward-pass expressions to make the computation easier.

Suppose the maximum of this sliding window in the previous layer is $out_{o_{11}}$:
      \begin{equation}
      \begin{aligned}
      &\because net_{m_{11}} = max(out_{o_{11}},out_{o_{12}},out_{o_{21}},out_{o_{22}})\\
      &\therefore \frac{\partial net_{m_{11}}}{\partial out_{o_{11}}} = 1\\
      & \frac{\partial net_{m_{11}}}{\partial out_{o_{12}}}=\frac{\partial net_{m_{11}}}{\partial out_{o_{21}}}=\frac{\partial net_{m_{11}}}{\partial out_{o_{22}}} = 0\\
      &\therefore \delta_{11}^{l-1} = \frac{\partial E}{\partial out_{o_{11}}} = \frac{\partial E}{\partial net_{m_{11}}} \cdot \frac{\partial net_{m_{11}}}{\partial out_{o_{11}}} =\delta_{11}^l\\
      &\delta_{12}^{l-1} = \delta_{21}^{l-1} =\delta_{22}^{l-1} = 0
      \end{aligned}
      \end{equation}

This gives the pooling layer's sensitivity matrix. The gradient of each neuron follows in the same way, and the weights can then be updated.


       

Writing a Convolutional Neural Network by Hand

1. Defining a convolutional layer

First we implement a convolutional layer via ConvLayer and define its hyperparameters:

import numpy as np

class ConvLayer(object):
    '''
    Parameters:
    input_width: width of the input image
    input_height: height of the input image
    channel_number: number of channels, 3 for color, 1 for grayscale
    filter_width: width of each convolution kernel
    filter_height: height of each convolution kernel
    filter_number: number of convolution kernels
    zero_padding: amount of zero padding
    stride: stride
    activator: activation function
    learning_rate: learning rate
    '''
    def __init__(self, input_width, input_height,
                 channel_number, filter_width,
                 filter_height, filter_number,
                 zero_padding, stride, activator,
                 learning_rate):
        self.input_width = input_width
        self.input_height = input_height
        self.channel_number = channel_number
        self.filter_width = filter_width
        self.filter_height = filter_height
        self.filter_number = filter_number
        self.zero_padding = zero_padding
        self.stride = stride
        self.output_width = \
            ConvLayer.calculate_output_size(
            self.input_width, filter_width, zero_padding,
            stride)
        self.output_height = \
            ConvLayer.calculate_output_size(
            self.input_height, filter_height, zero_padding,
            stride)
        self.output_array = np.zeros((self.filter_number,
            self.output_height, self.output_width))
        self.filters = []
        for i in range(filter_number):
            self.filters.append(Filter(filter_width,
                filter_height, self.channel_number))
        self.activator = activator
        self.learning_rate = learning_rate
Here calculate_output_size computes the size of the feature map produced by the convolution:

@staticmethod
def calculate_output_size(input_size,
        filter_size, zero_padding, stride):
    return (input_size - filter_size +
        2 * zero_padding) / stride + 1
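For the 4*4 input and 2*2 kernel from the forward-pass example (zero_padding = 0, stride = 1) this gives the expected 3*3 feature map. Note that `/` is Python 2 integer division; under Python 3 it would need to be `//`:

# (input_size - filter_size + 2 * zero_padding) // stride + 1
print((4 - 2 + 2 * 0) // 1 + 1)   # -> 3, i.e. a 3*3 feature map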

       

2. Building an activation function

We use the ReLU activation function here, so we define it in activators.py; forward is the forward computation and backward is the derivative:

class ReluActivator(object):
    def forward(self, weighted_input):
        return max(0, weighted_input)

    def backward(self, output):
        return 1 if output > 0 else 0

       

Other common activation functions can also go into activators.py; the sigmoid function, for example, can be defined as:

import numpy as np

class SigmoidActivator(object):
    def forward(self, weighted_input):
        return 1.0 / (1.0 + np.exp(-weighted_input))

    # the derivative of sigmoid, expressed in terms of its output
    def backward(self, output):
        return output * (1 - output)

If we need other custom activation functions, we can simply define a class for each of them in activators.py.

       

3. Defining a class that holds the convolutional layer's parameters and gradients

class Filter(object):
    def __init__(self, width, height, depth):
        # initial weights
        self.weights = np.random.uniform(-1e-4, 1e-4,
            (depth, height, width))
        # initial bias
        self.bias = 0
        self.weights_grad = np.zeros(
            self.weights.shape)
        self.bias_grad = 0

    def __repr__(self):
        return 'filter weights:\n%s\nbias:\n%s' % (
            repr(self.weights), repr(self.bias))

    def get_weights(self):
        return self.weights

    def get_bias(self):
        return self.bias

    def update(self, learning_rate):
        self.weights -= learning_rate * self.weights_grad
        self.bias -= learning_rate * self.bias_grad

       

4. Forward propagation for the convolutional layer

1). Getting the convolution region

# get the convolution region
def get_patch(input_array, i, j, filter_width,
              filter_height, stride):
    '''
    Extract the region for this convolution step from the input
    array; handles both 2D and 3D inputs automatically.
    '''
    start_i = i * stride
    start_j = j * stride
    if input_array.ndim == 2:
        return input_array[
            start_i : start_i + filter_height,
            start_j : start_j + filter_width]
    elif input_array.ndim == 3:
        return input_array[:,
            start_i : start_i + filter_height,
            start_j : start_j + filter_width]

       

2). Performing the convolution

def conv(input_array,
         kernel_array,
         output_array,
         stride, bias):
    '''
    Compute the convolution; handles 2D and 3D inputs automatically.
    '''
    output_width = output_array.shape[1]
    output_height = output_array.shape[0]
    kernel_width = kernel_array.shape[-1]
    kernel_height = kernel_array.shape[-2]
    for i in range(output_height):
        for j in range(output_width):
            output_array[i][j] = (
                get_patch(input_array, i, j, kernel_width,
                    kernel_height, stride) * kernel_array
                ).sum() + bias
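A quick end-to-end check of conv(), assuming numpy and the get_patch above are in scope (only the top-left 2*2 patch of the image is given in the text, so the remaining entries here are made up):

import numpy as np

image = np.array([[1, 0, 1, 0],
                  [1, 1, 0, 1],
                  [0, 1, 1, 0],
                  [1, 0, 0, 1]], dtype=np.float64)
filter1 = np.array([[1, -1],
                    [1, -1]], dtype=np.float64)  # h11 h12 / h21 h22
output = np.zeros((3, 3))
conv(image, filter1, output, 1, 0)   # stride 1, bias 0
print(output[0, 0])   # 1*1 + 0*(-1) + 1*1 + 1*(-1) = 1 = net_o11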

        

3). Adding zero padding

# add zero padding
def padding(input_array, zp):
    '''
    Add zero padding around the array; handles both 2D and 3D
    inputs automatically.
    '''
    if zp == 0:
        return input_array
    else:
        if input_array.ndim == 3:
            input_width = input_array.shape[2]
            input_height = input_array.shape[1]
            input_depth = input_array.shape[0]
            padded_array = np.zeros((
                input_depth,
                input_height + 2 * zp,
                input_width + 2 * zp))
            padded_array[:,
                zp : zp + input_height,
                zp : zp + input_width] = input_array
            return padded_array
        elif input_array.ndim == 2:
            input_width = input_array.shape[1]
            input_height = input_array.shape[0]
            padded_array = np.zeros((
                input_height + 2 * zp,
                input_width + 2 * zp))
            padded_array[zp : zp + input_height,
                zp : zp + input_width] = input_array
            return padded_array
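A quick usage check (assuming numpy is imported as np): padding a 4*4 array with zp = 1 yields a 6*6 array with the original values centered:

print(padding(np.ones((4, 4)), 1).shape)   # -> (6, 6)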

        

4). Running the forward pass

def forward(self, input_array):
    '''
    Compute the convolutional layer's output; the result is
    stored in self.output_array.
    '''
    self.input_array = input_array
    self.padded_input_array = padding(input_array,
        self.zero_padding)
    for f in range(self.filter_number):
        filter = self.filters[f]
        conv(self.padded_input_array,
            filter.get_weights(), self.output_array[f],
            self.stride, filter.get_bias())
    element_wise_op(self.output_array,
                    self.activator.forward)

Here element_wise_op applies an operation to every element of a numpy array in place:

# apply op to every element of a numpy array, in place
def element_wise_op(array, op):
    for i in np.nditer(array,
                       op_flags=['readwrite']):
        i[...] = op(i)

       

5. Backpropagation for the convolutional layer

1). Passing the error to the previous layer

def bp_sensitivity_map(self, sensitivity_array,
                       activator):
    '''
    Compute the sensitivity map to pass to the previous layer.
    sensitivity_array: this layer's sensitivity map
    activator: the previous layer's activation function
    '''
    # handle the stride: expand the original sensitivity map
    expanded_array = self.expand_sensitivity_map(
        sensitivity_array)
    # full convolution: zero-pad the sensitivity map.
    # The original zero-padding cells also receive a residual,
    # but it does not need to be propagated further, so we
    # simply don't compute it.
    expanded_width = expanded_array.shape[2]
    zp = (self.input_width +
          self.filter_width - 1 - expanded_width) / 2
    padded_array = padding(expanded_array, zp)
    # initialize delta_array, which stores the sensitivity
    # map passed to the previous layer
    self.delta_array = self.create_delta_array()
    # for a convolutional layer with multiple filters, the
    # sensitivity map passed to the previous layer is the sum
    # of all the filters' sensitivity maps
    for f in range(self.filter_number):
        filter = self.filters[f]
        # rotate the filter weights by 180 degrees
        flipped_weights = np.array(map(
            lambda i: np.rot90(i, 2),
            filter.get_weights()))
        # compute the delta_array for this filter
        delta_array = self.create_delta_array()
        for d in range(delta_array.shape[0]):
            conv(padded_array[f], flipped_weights[d],
                delta_array[d], 1, 0)
        self.delta_array += delta_array
    # element-wise multiply the result by the derivative
    # of the activation function
    derivative_array = np.array(self.input_array)
    element_wise_op(derivative_array,
                    activator.backward)
    self.delta_array *= derivative_array
       

2). The array that holds the sensitivity map passed to the previous layer

def create_delta_array(self):
    return np.zeros((self.channel_number,
        self.input_height, self.input_width))

        

3). Computing the gradients

def bp_gradient(self, sensitivity_array):
    # handle the stride: expand the original sensitivity map
    expanded_array = self.expand_sensitivity_map(
        sensitivity_array)
    for f in range(self.filter_number):
        # compute the gradient of each weight
        filter = self.filters[f]
        for d in range(filter.weights.shape[0]):
            conv(self.padded_input_array[d],
                 expanded_array[f],
                 filter.weights_grad[d], 1, 0)
        # compute the gradient of the bias term
        filter.bias_grad = expanded_array[f].sum()

       

4). Updating the parameters by gradient descent

def update(self):
    '''
    Update the weights by gradient descent.
    '''
    for filter in self.filters:
        filter.update(self.learning_rate)

        

6. Training the max-pooling layer

1). Defining the MaxPooling class

class MaxPoolingLayer(object):
    def __init__(self, input_width, input_height,
                 channel_number, filter_width,
                 filter_height, stride):
        self.input_width = input_width
        self.input_height = input_height
        self.channel_number = channel_number
        self.filter_width = filter_width
        self.filter_height = filter_height
        self.stride = stride
        self.output_width = (input_width -
            filter_width) / self.stride + 1
        self.output_height = (input_height -
            filter_height) / self.stride + 1
        self.output_array = np.zeros((self.channel_number,
            self.output_height, self.output_width))

       

       

2). The forward pass

# forward pass
def forward(self, input_array):
    for d in range(self.channel_number):
        for i in range(self.output_height):
            for j in range(self.output_width):
                self.output_array[d,i,j] = (
                    get_patch(input_array[d], i, j,
                        self.filter_width,
                        self.filter_height,
                        self.stride).max())

       

3). The backward pass

# backward pass
def backward(self, input_array, sensitivity_array):
    self.delta_array = np.zeros(input_array.shape)
    for d in range(self.channel_number):
        for i in range(self.output_height):
            for j in range(self.output_width):
                patch_array = get_patch(
                    input_array[d], i, j,
                    self.filter_width,
                    self.filter_height,
                    self.stride)
                k, l = get_max_index(patch_array)
                self.delta_array[d,
                    i * self.stride + k,
                    j * self.stride + l] = \
                    sensitivity_array[d,i,j]
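A quick check of the pooling layer under Python 2 (the `/` in the constructor is integer division there), assuming numpy plus get_patch and get_max_index from cnn.py are in scope; the input values are made up:

x = np.array([[[1, 2, 0, 1],
               [3, 1, 1, 0],
               [0, 2, 2, 4],
               [1, 0, 3, 1]]], dtype=np.float64)
mpl = MaxPoolingLayer(4, 4, 1, 2, 2, 2)   # 2*2 window, stride 2
mpl.forward(x)
print(mpl.output_array)                   # [[[3. 1.] [2. 4.]]]
mpl.backward(x, np.ones(mpl.output_array.shape))
print(mpl.delta_array)                    # 1s only at the argmax positions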

The full code is in cnn.py (https://github.com/huxiaoman7/PaddlePaddle_code/blob/master/1.mnist/cnn.py); an excerpt:

       

#coding:utf-8
'''
Created by huxiaoman 2017.11.22

'''

import numpy as np
from activators import ReluActivator, IdentityActivator

class ConvLayer(object):
    def __init__(self, input_width, input_height,
             channel_number, filter_width,
             filter_height, filter_number,
             zero_padding, stride, activator,
             learning_rate):
        self.input_width = input_width
        self.input_height = input_height
        self.channel_number = channel_number
        self.filter_width = filter_width
        self.filter_height = filter_height
        self.filter_number = filter_number
        self.zero_padding = zero_padding
        self.stride = stride  # could be split into stride_x, stride_y here
        self.output_width = ConvLayer.calculate_output_size(
                self.input_width, filter_width, zero_padding,
                stride)
        self.output_height = ConvLayer.calculate_output_size(
                self.input_height, filter_height, zero_padding,
                stride)
        self.output_array = np.zeros((self.filter_number,
                self.output_height, self.output_width))
        self.filters = []
        for i in range(filter_number):
            self.filters.append(Filter(filter_width,
                filter_height, self.channel_number))
        self.activator = activator
        self.learning_rate = learning_rate

    def forward(self, input_array):
        '''
        Compute the convolutional layer's output; the result
        is stored in self.output_array.
        '''
        self.input_array = input_array
        self.padded_input_array = padding(input_array,
            self.zero_padding)
        for f in range(self.filter_number):
            filter = self.filters[f]
            conv(self.padded_input_array,
                 filter.get_weights(), self.output_array[f],
                 self.stride, filter.get_bias())
        element_wise_op(self.output_array,
                self.activator.forward)

def get_patch(input_array, i, j, filter_width, filter_height, stride):
    '''
    Extract the region for this convolution step from the input
    array; handles both 2D and 3D inputs automatically.
    '''
    start_i = i * stride
    start_j = j * stride
    if input_array.ndim == 2:
        return input_array[
            start_i : start_i + filter_height,
            start_j : start_j + filter_width]
    elif input_array.ndim == 3:
        return input_array[:,
            start_i : start_i + filter_height,
            start_j : start_j + filter_width]

# get the index of the maximum of a 2D region
def get_max_index(array):
    max_i = 0
    max_j = 0
    max_value = array[0,0]
    for i in range(array.shape[0]):
        for j in range(array.shape[1]):
            if array[i,j] > max_value:
                max_value = array[i,j]
                max_i, max_j = i, j
    return max_i, max_j

def conv(input_array, kernel_array,
    output_array, stride, bias):
    '''
    Compute the convolution; handles 2D and 3D inputs automatically.
    '''
    output_width = output_array.shape[1]
    output_height = output_array.shape[0]
    kernel_width = kernel_array.shape[-1]
    kernel_height = kernel_array.shape[-2]
    for i in range(output_height):
        for j in range(output_width):
            output_array[i][j] = (
                get_patch(input_array, i, j, kernel_width,
                    kernel_height, stride) * kernel_array).sum() + bias


def element_wise_op(array, op):
    for i in np.nditer(array,
               op_flags=['readwrite']):
        i[...] = op(i)


class ReluActivators(object):
    def forward(self, weighted_input):
        # ReLU: max(0, input)
        return max(0, weighted_input)

    def backward(self, output):
        return 1 if output > 0 else 0

class SigmoidActivator(object):

    def forward(self, weighted_input):
        return 1.0 / (1.0 + np.exp(-weighted_input))

    def backward(self, output):
        return output * (1 - output)

       

       

Finally, we check the result of one full forward and backward pass through the convolutional layer. init_test() builds a 5*5, 3-channel input a, a sensitivity array b, and a ConvLayer with two 3*3 kernels:

def init_test():
    a = np.array(
        [[[0,1,1,0,2],
          [2,2,2,2,1],
          [1,0,0,2,0],
          [0,1,1,0,0],
          [1,2,0,0,2]],
         [[1,0,2,2,0],
          [0,0,0,2,0],
          [1,2,1,2,1],
          [1,0,0,0,0],
          [1,2,1,1,1]],
         [[2,1,2,0,0],
          [1,0,0,1,0],
          [0,2,1,0,1],
          [0,1,2,2,2],
          [2,1,0,0,1]]])
    b = np.array(
        [[[0,1,1],
          [2,2,2],
          [1,0,0]],
         [[1,0,2],
          [0,0,0],
          [1,2,1]]])
    cl = ConvLayer(5,5,3,3,3,2,1,2,IdentityActivator(),0.001)
    cl.filters[0].weights = np.array(
        [[[-1,1,0],
          [0,1,0],
          [0,1,1]],
         [[-1,-1,0],
          [0,0,0],
          [0,-1,0]],
         [[0,0,-1],
          [0,1,0],
          [1,-1,-1]]], dtype=np.float64)
    cl.filters[0].bias=1
    cl.filters[1].weights = np.array(
        [[[1,1,-1],
          [-1,-1,1],
          [0,-1,1]],
         [[0,1,0],
          [-1,0,-1],
          [-1,1,0]],
         [[-1,0,0],
          [-1,0,1],
          [-1,0,0]]], dtype=np.float64)
    return a, b, cl

        

Run it:

def test():
    a, b, cl = init_test()
    cl.forward(a)
    print "forward pass output:", cl.output_array
    # backward (defined in cnn.py) computes the sensitivity map
    # and the gradients, with b as the sensitivity array
    cl.backward(a, b, IdentityActivator())
    cl.update()
    print "filter1 after the backward update:", cl.filters[0]
    print "filter2 after the backward update:", cl.filters[1]

if __name__ == "__main__":
    test()

        

The output:

forward pass output: [[[ 6.  7.  5.]
  [ 3. -1. -1.]
  [ 2. -1.  4.]]

 [[ 2. -5. -8.]
  [ 1. -4. -4.]
  [ 0. -5. -5.]]]
filter1 after the backward update: filter weights:
array([[[-1.008,  0.99 , -0.009],
        [-0.005,  0.994, -0.006],
        [-0.006,  0.995,  0.996]],

       [[-1.004, -1.001, -0.004],
        [-0.01 , -0.009, -0.012],
        [-0.002, -1.002, -0.002]],

       [[-0.002, -0.002, -1.003],
        [-0.005,  0.992, -0.005],
        [ 0.993, -1.008, -1.007]]])
bias:
0.99099999999999999
filter2 after the backward update: filter weights:
array([[[  9.98000000e-01,   9.98000000e-01,  -1.00100000e+00],
        [ -1.00400000e+00,  -1.00700000e+00,   9.97000000e-01],
        [ -4.00000000e-03,  -1.00400000e+00,   9.98000000e-01]],

       [[  0.00000000e+00,   9.99000000e-01,   0.00000000e+00],
        [ -1.00900000e+00,  -5.00000000e-03,  -1.00400000e+00],
        [ -1.00400000e+00,   1.00000000e+00,   0.00000000e+00]],

       [[ -1.00400000e+00,  -6.00000000e-03,  -5.00000000e-03],
        [ -1.00200000e+00,  -5.00000000e-03,   9.98000000e-01],
        [ -1.00200000e+00,  -1.00000000e-03,   0.00000000e+00]]])
bias:
-0.0070000000000000001

       

       


A Look at PaddlePaddle's Convolutional Neural Network Source Code

The convolutional layer

In the previous article we gave a brief introduction to the PaddlePaddle functions for implementing a convolutional neural network. When designing the CNN for handwritten-digit recognition, we called simple_img_conv_pool (the link in the previous article has gone stale, since the framework has since moved to fluid; it updates fast = =). It is used like this:

conv_pool_1 = paddle.networks.simple_img_conv_pool(
        input=img,
        filter_size=5,
        num_filters=20,
        num_channel=1,
        pool_size=2,
        pool_stride=2,
        act=paddle.activation.Relu())

       

This function bundles the convolutional layer and the pooling layer together, so a single call does it all, which is very convenient. If you only need the convolutional layer on its own, you can call img_conv_layer instead:

conv = img_conv_layer(input=data, filter_size=1, filter_size_y=1,
                      num_channels=8,
                      num_filters=16, stride=1,
                      bias_attr=False,
                      act=ReluActivation())

Let's look at the function's parameters in detail (the docstring explains what each parameter means and how to use it):

def img_conv_layer(input,
                   filter_size,
                   num_filters,
                   name=None,
                   num_channels=None,
                   act=None,
                   groups=1,
                   stride=1,
                   padding=0,
                   dilation=1,
                   bias_attr=None,
                   param_attr=None,
                   shared_biases=True,
                   layer_attr=None,
                   filter_size_y=None,
                   stride_y=None,
                   padding_y=None,
                   dilation_y=None,
                   trans=False,
                   layer_type=None):
    """
    Convolution layer for images. Paddle supports both square and
    rectangular input sizes.

    It can also be used for transposed convolution on images
    (convolution transpose, i.e. deconv), again with square or
    rectangular inputs.

    num_channels: number of channels of the input image: 1 or 3, or the
        channel count of the previous layer (number of kernels * number
        of groups). Each group processes some of the image's channels.
        For example, if num_channels of an input image is 256 and you
        set 4 groups with 32 kernels each, 32 * 4 = 128 kernels are
        created to process the input. The channels are split into four
        blocks; the first 32 kernels process 64 (256 / 4) channels, and
        the remaining kernel groups handle the remaining channels.

    name: name of the layer. Optional, user-defined.
    type: basestring

    input: the input to this layer.
    type: LayerOutput

    filter_size: the x dimension of the kernel, i.e. its width. A
        tuple/list can also be passed to give both dimensions at once.
    type: int / tuple / list

    filter_size_y: the y dimension of the kernel, i.e. its height.
        PaddlePaddle supports rectangular kernels, so the kernel size
        is (filter_size, filter_size_y).
    type: int / None

    act: activation type. Defaults to ReLU.
    type: BaseActivation

    groups: number of kernel groups.
    type: int

    stride: horizontal stride; alternatively a tuple, meaning the
        horizontal and vertical strides are equal.
    type: int / tuple / list

    stride_y: vertical stride.
    type: int

    padding: horizontal zero padding; a tuple means the horizontal and
        vertical padding are equal.
    type: int / tuple / list

    padding_y: vertical zero padding.
    type: int

    dilation: horizontal dilation; a tuple likewise means equal
        horizontal and vertical dilation.
    type: int / tuple / list

    dilation_y: vertical dilation.
    type: int

    bias_attr: bias attribute.
        False: no bias.  True: bias initialized to 0.
    type: ParameterAttribute / None / bool / Any

    num_channels: number of channels of the input image. If None, it is
        taken from the previous layer's output.
    type: int

    param_attr: attributes of the convolution parameters; None means
        the default attributes.
    type: ParameterAttribute

    shared_biases: whether the bias term is shared across kernels.
    type: bool

    layer_attr: the layer's Extra Attribute.
    type: ExtraLayerAttribute

    trans: set to True for a convTransLayer, False for a convLayer.
    type: bool

    layer_type: specifies the layer type explicitly; defaults to None.
        If trans=True it must be "exconvt" or "cudnn_convt"; otherwise
        "exconv" or "cudnn_conv". With the default, Paddle automatically
        picks ExpandConvLayer for CPU or CudnnConvLayer for GPU, but we
        can also choose the type ourselves.
    type: string
    return: LayerOutput object
    rtype: LayerOutput
    """

    if num_channels is None:
        assert input.num_filters is not None
        num_channels = input.num_filters

    if filter_size_y is None:
        if isinstance(filter_size, collections.Sequence):
            assert len(filter_size) == 2
            filter_size, filter_size_y = filter_size
        else:
            filter_size_y = filter_size

    if stride_y is None:
        if isinstance(stride, collections.Sequence):
            assert len(stride) == 2
            stride, stride_y = stride
        else:
            stride_y = stride

    if padding_y is None:
        if isinstance(padding, collections.Sequence):
            assert len(padding) == 2
            padding, padding_y = padding
        else:
            padding_y = padding

    if dilation_y is None:
        if isinstance(dilation, collections.Sequence):
            assert len(dilation) == 2
            dilation, dilation_y = dilation
        else:
            dilation_y = dilation

    if param_attr.attr.get('initial_smart'):
        # special initial for conv layers.
        init_w = (2.0 / (filter_size**2 * num_channels))**0.5
        param_attr.attr["initial_mean"] = 0.0
        param_attr.attr["initial_std"] = init_w
        param_attr.attr["initial_strategy"] = 0
        param_attr.attr["initial_smart"] = False

    if layer_type:
        if dilation > 1 or dilation_y > 1:
            assert layer_type in [
                "cudnn_conv", "cudnn_convt", "exconv", "exconvt"
            ]
        if trans:
            assert layer_type in ["exconvt", "cudnn_convt"]
        else:
            assert layer_type in ["exconv", "cudnn_conv"]
        lt = layer_type
    else:
        lt = LayerType.CONVTRANS_LAYER if trans else LayerType.CONV_LAYER

    l = Layer(
        name=name,
        inputs=Input(
            input.name,
            conv=Conv(
                filter_size=filter_size,
                padding=padding,
                dilation=dilation,
                stride=stride,
                channels=num_channels,
                groups=groups,
                filter_size_y=filter_size_y,
                padding_y=padding_y,
                dilation_y=dilation_y,
                stride_y=stride_y),
            **param_attr.attr),
        active_type=act.name,
        num_filters=num_filters,
        bias=ParamAttr.to_bias(bias_attr),
        shared_biases=shared_biases,
        type=lt,
        **ExtraLayerAttribute.to_kwargs(layer_attr))
    return LayerOutput(
        name,
        lt,
        parents=[input],
        activation=act,
        num_filters=num_filters,
        size=l.config.size)

Knowing what these parameters mean, and comparing with the CNN we wrote by hand, several advantages of PaddlePaddle stand out:

• Supports both square and rectangular image sizes
• Supports different horizontal and vertical values for the stride, zero padding, and dilation
• Supports sharing the bias term across kernels
• Automatically picks the convolution implementation suited to CPU or GPU

Our hand-written CNN supports only square images and would fail on rectangular ones; likewise, the stride, zero padding, and so on must be the same horizontally and vertically. Now that we understand the convolutional layer's parameters, let's look at how the underlying source implements it: ConvBaseLayer.py. Interested readers can follow that link to see how ConvLayer is written in C++.

The pooling layer can be analyzed in the same spirit; if you're interested, you can follow the chain all the way down to the low-level implementation. We'll go into detail another time. (Note to self: fill in the TensorFlow source analysis tomorrow.)

       


Summary

This article covered the finer points of backpropagation in convolutional neural networks, including how backpropagation through convolutional and pooling layers differs from the traditional case, and implemented a complete CNN. You can modify the code further yourself, for example to handle different horizontal and vertical strides. Finally, we examined how PaddlePaddle implements the CNN's convolutional layer, compared it with our own CNN, and summarized four advantages; the low-level implementation is in C++ for those who want to dig deeper. This write-up is rough in places; questions are welcome in the comments :)

       

       

       

       


posted @ 2017-11-22 17:20  Charlotte77