Yuhang He's Blog

Some birds are not meant to be caged, their feathers are just too bright.

Pitfall in Caffe Layer Nesting

My intern and I planned to implement an affine transformation distance layer in Caffe, inside which we nested two fully connected layers to simulate the affine distance matrix. What was strange is that the learned parameters of the two nested fully connected layers never got stored in the final caffemodel. It was really frustrating, because the two fully connected layers would randomly re-initialize their weight parameters, resulting in erroneous outputs. I therefore read the convolution and fully connected layer source code to see how they store their learned parameters, and finally figured out where the bug was.

How do learnable parameters get stored?

All concrete layers (e.g. Convolution, InnerProduct, Batch Normalization) inherit from layer.hpp, which declares a protected member vector<shared_ptr< Blob<Dtype> > > blobs_ holding the learnable parameters. If blobs_.size() > 0, those parameters are written into the caffemodel; if blobs_ is empty, nothing is saved. Naturally, the convolution and fully connected layers have to explicitly resize blobs_ to match the number of parameter blobs they learn, and they do so in their LayerSetUp() functions. For example, the convolution layer initializes the size of blobs_ in base_conv_layer.cpp:

if (this->blobs_.size() > 0) {
  // Parameters already exist (e.g. restored from a snapshot): skip initialization.
  LOG(INFO) << "Skipping parameter initialization";
} else {
  // Reserve one blob for the weights and, if a bias term is used, a second
  // blob for the bias, so that both get serialized into the caffemodel.
  if (bias_term_) {
    this->blobs_.resize(2);
  } else {
    this->blobs_.resize(1);
  }
}

Obviously, self-implemented layers have to explicitly resize their blobs_ in order for the learned parameters to be saved. As for my problem, I can simply resize blobs_ to 2 and let its two entries directly share the blobs_ of the two nested fully connected layers, as sketched below.
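Here is a minimal sketch of that fix, assuming a hypothetical AffineDistanceLayer wrapping two InnerProductLayer instances (the member names fc1_layer_, fc2_layer_, num_output_ and the intermediate top vectors are made up for illustration): in LayerSetUp() the wrapper points its own blobs_ entries at the nested layers' blobs, so the solver snapshots and restores them together with the caffemodel.

template <typename Dtype>
void AffineDistanceLayer<Dtype>::LayerSetUp(
    const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top) {
  // Configure the two nested fully connected (InnerProduct) layers.
  LayerParameter fc_param;
  fc_param.mutable_inner_product_param()->set_num_output(num_output_);
  fc_param.mutable_inner_product_param()->set_bias_term(false);
  fc_param.mutable_inner_product_param()->mutable_weight_filler()->set_type("xavier");
  fc1_layer_.reset(new InnerProductLayer<Dtype>(fc_param));
  fc2_layer_.reset(new InnerProductLayer<Dtype>(fc_param));
  fc1_layer_->SetUp(bottom, fc1_top_vec_);        // fc1_top_vec_ is an assumed member
  fc2_layer_->SetUp(fc1_top_vec_, fc2_top_vec_);  // fc2_top_vec_ is an assumed member

  // The key step: expose the nested layers' learnable blobs through this
  // layer's own blobs_, so they are saved into (and restored from) the
  // caffemodel instead of being randomly re-initialized every run.
  if (this->blobs_.size() == 0) {
    this->blobs_.resize(2);
    this->blobs_[0] = fc1_layer_->blobs()[0];  // share the pointer, do not copy
    this->blobs_[1] = fc2_layer_->blobs()[0];
  }
}

Because blobs_ holds shared_ptrs, sharing the pointers is enough: the nested layers and the wrapper then see exactly the same parameter memory during training and snapshotting.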

This is a pitfall that deserves careful attention. I hope others will read this before attempting to implement similar nested layers.