My intern and I set out to implement an affine transformation distance layer in Caffe, inside which we nested two fully connected layers to simulate the affine distance matrix. What was strange is that the learned parameters of the two nested fully connected layers were not stored in the final caffemodel. This was really frustrating: the two fully connected layers would randomly re-initialize their weights, producing erroneous outputs. I therefore read the convolution and fully connected layer source code to see how they store their learned parameters, and finally figured out where the bug was.
How are learnable parameters stored?
All concrete layers (e.g. Convolution, FC, Batch Normalization) inherit from
layer.hpp. It has a protected member,
vector<shared_ptr< Blob<Dtype> > > blobs_, that stores the learnable parameters. If
blobs_.size() > 0, there are parameters to be stored in the caffemodel; otherwise nothing is saved. Naturally, the convolution and fully connected layers have to explicitly set the
blobs_ size to be larger than 0, depending on how many parameter blobs they have to learn, in their
LayerSetUp() functions. This is exactly how they work: for example, ConvolutionLayer's
LayerSetUp() resizes
blobs_ to 1 (weights only) or to 2 (weights plus bias), depending on whether a bias term is configured.
Obviously, a self-implemented layer has to explicitly set its
blobs_ size in order for its learned parameters to be saved. For my problem, I can simply resize
blobs_ to 2, and let those two entries point directly at the two nested fully connected layers' parameter blobs.
This is a pitfall that deserves careful attention. I hope others read this before attempting to implement similar layers.