Yihui He 何宜晖 edited this page Jan 9, 2018
Our 3C approach applies 3 methods sequentially. Given conv weights `W`:
- Spatial Decomposition produces `W_v` and `W_h`.
- Channel Decomposition decomposes `W_h` and outputs `W_h'` and `W_p`.
- Channel Pruning prunes `W_p`.
Initially, we adopted Filter Reconstruction in the Spatial Decomposition step, which is data independent.
![](https://raw.githubusercontent.com/yihui-he/images/master/Screenshot%20from%202018-01-09%2011-03-26.png)
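As a rough illustration of a data-independent reconstruction, the sketch below factors a `k x k` kernel into a vertical `k x 1` kernel and a horizontal `1 x k` kernel with a truncated SVD of the reshaped weights. This is only a minimal sketch under assumptions: the function names `spatial_decompose` and `recompose`, the `(N, C, k, k)` weight layout, and the even split of singular values between the two factors are illustrative choices, not necessarily what our code does.

```python
import numpy as np

def spatial_decompose(W, rank):
    """Approximate a (N, C, k, k) kernel by a vertical (rank, C, k, 1)
    kernel W_v and a horizontal (N, rank, 1, k) kernel W_h.
    Hypothetical helper: uses only the weights, no data."""
    N, C, kh, kw = W.shape
    # Rows index (input channel, vertical tap);
    # columns index (output channel, horizontal tap).
    M = W.transpose(1, 2, 0, 3).reshape(C * kh, N * kw)
    U, S, Vt = np.linalg.svd(M, full_matrices=False)
    root = np.sqrt(S[:rank])          # split singular values evenly (assumption)
    A = U[:, :rank] * root            # (C*kh, rank)
    B = root[:, None] * Vt[:rank]     # (rank, N*kw)
    W_v = A.reshape(C, kh, rank).transpose(2, 0, 1)[:, :, :, None]
    W_h = B.reshape(rank, N, kw).transpose(1, 0, 2)[:, :, None, :]
    return W_v, W_h

def recompose(W_v, W_h):
    """Effective k x k kernel of W_v followed by W_h:
    W[n, c, i, j] ~= sum_d W_v[d, c, i, 0] * W_h[n, d, 0, j]."""
    return np.einsum('dci,ndj->ncij', W_v[:, :, :, 0], W_h[:, :, 0, :])

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3, 3, 3))
W_v, W_h = spatial_decompose(W, rank=5)
err = np.linalg.norm(recompose(W_v, W_h) - W) / np.linalg.norm(W)
print(W_v.shape, W_h.shape, err)
```

A smaller `rank` gives a cheaper pair of layers at the cost of a larger reconstruction error on the weights.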
We found that the whole model performance can be improved by minimizing the error on the output feature map after ReLU with `W_h` (namely, data dependent). The method is from nonlinear case 3.2 in Channel Decomposition. The corresponding function in our code is `nonlinear_fc`.
It involves two alternating steps.
First, minimize the error on the feature map before ReLU with linear least squares:
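A minimal sketch of such a linear least-squares fit is shown below. It assumes the layer is written as a matrix product on flattened samples; the names `fit_linear`, `X` (inputs to the layer) and `Z` (target feature map before ReLU) are hypothetical and are not taken from `nonlinear_fc`.

```python
import numpy as np

def fit_linear(X, Z):
    """Solve min over (W, b) of ||X @ W + b - Z||_F^2.
    X: (samples, in_dim) layer inputs.
    Z: (samples, out_dim) target responses before ReLU."""
    # Append a column of ones so the bias is fitted jointly with W.
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])
    sol, *_ = np.linalg.lstsq(X1, Z, rcond=None)
    return sol[:-1], sol[-1]   # W: (in_dim, out_dim), b: (out_dim,)

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 6))
Z = X @ rng.standard_normal((6, 4)) + rng.standard_normal(4)
W, b = fit_linear(X, Z)
print(np.abs(X @ W + b - Z).max())
```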
Second, minimize the error on the feature map after ReLU:
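One common way to handle the ReLU step, as we understand the nonlinear case of Channel Decomposition, is to introduce an auxiliary response `z` per element and minimize `(relu(y) - relu(z))^2 + lam * (z - y_pred)^2`, where `y` is the original response, `y_pred` is the current linear prediction, and `lam` is a penalty weight. The sketch below solves this one-dimensional problem in closed form by comparing the best `z <= 0` with the best `z >= 0`. The names `solve_aux`, `objective` and `lam` are assumptions for illustration and may differ from what `nonlinear_fc` does.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def objective(z, y, y_pred, lam):
    """Element-wise cost: error after ReLU plus a penalty tying z to y_pred."""
    return (relu(y) - relu(z)) ** 2 + lam * (z - y_pred) ** 2

def solve_aux(y, y_pred, lam):
    """Element-wise minimizer of objective() over z."""
    # Best candidate on z <= 0, where relu(z) = 0.
    z_neg = np.minimum(0.0, y_pred)
    # Best candidate on z >= 0, where relu(z) = z (a clipped quadratic minimum).
    z_pos = np.maximum(0.0, (lam * y_pred + relu(y)) / (lam + 1.0))
    take_neg = objective(z_neg, y, y_pred, lam) <= objective(z_pos, y, y_pred, lam)
    return np.where(take_neg, z_neg, z_pos)

rng = np.random.default_rng(0)
y = rng.standard_normal(5)
y_pred = y + 0.3 * rng.standard_normal(5)
z = solve_aux(y, y_pred, lam=0.5)
print(objective(z, y, y_pred, 0.5))
```

In an alternating scheme, the resulting `z` would serve as the regression target for the next linear fit, and the two steps repeat for a fixed number of iterations.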