Yuhang He's Blog

Some birds are not meant to be caged, their feathers are just too bright.

How to instantiate or ignore Caffe label_ blob

Caffe stores the input data of each mini-batch in two blobs, data_ and label_, both of which are declared in the Batch class:

template <typename Dtype>
class Batch{
  public:
    Blob<Dtype> data_, label_;
};

However, the label_ blob is not always required: not every input sample comes with a label (or several labels). This raises a question: how do you instantiate or ignore the label_ blob?

Good question. Let’s dive into the source code. The BaseDataLayer class holds a member variable, bool output_labels_, that controls whether the data layer outputs label_. Furthermore, the value of output_labels_ is determined automatically from the number of top blobs of the data layer, via the following code:

// top is the vector of top blobs handed to the layer's setup function
if (top.size() == 1) {
  output_labels_ = false;
} else {
  output_labels_ = true;
}
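
To see the rule in isolation, here is a minimal, self-contained sketch. ToyBlob, ToyDataLayer and OutputsLabels are names made up for this illustration and are not Caffe's real classes; only the top-count check follows the snippet above.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for caffe::Blob; it only needs to exist here.
struct ToyBlob {};

// Hypothetical stand-in for a data layer that applies the same rule:
// exactly one top means "data only", anything else means "data + label".
class ToyDataLayer {
 public:
  void LayerSetUp(const std::vector<ToyBlob*>& top) {
    output_labels_ = (top.size() != 1);
  }
  bool output_labels() const { return output_labels_; }

 private:
  bool output_labels_ = false;
};

// Helper: set up a toy layer with num_tops top blobs and report the flag.
inline bool OutputsLabels(std::size_t num_tops) {
  std::vector<ToyBlob> storage(num_tops);
  std::vector<ToyBlob*> top;
  for (ToyBlob& blob : storage) top.push_back(&blob);
  ToyDataLayer layer;
  layer.LayerSetUp(top);
  return layer.output_labels();
}
```

With this sketch, OutputsLabels(1) reports that no label is produced, while OutputsLabels(2) reports that one is.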

That is, you have to state explicitly how many top blobs you want the data layer to produce from each input datum. A typical data layer looks like this:

layer {
  name: "data_input"
  type: "ImageData"
  top: "data"
  top: "label"
  include{
    phase: TRAIN
  }
  transform_param {
    crop_size: 224
    mean_value: 104
    mean_value: 117
    mean_value: 123
  }
  image_data_param {
    source: ""
    batch_size: 100
    new_height: 256
    new_width: 256
    root_folder: ""
  }
}

Here you declare two top blobs, data and label, so Caffe assumes you need the label_ blob and fills it in. Problem solved: if you declare two tops in your data layer, Caffe outputs label_; if you declare only one, Caffe ignores it.
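
What "ignoring" the label means in practice can be sketched as a guarded copy. This is a simplified illustration under assumptions: MiniBlob, MiniBatch and ForwardBatch are hypothetical names, and the real layer works with Caffe blobs rather than plain vectors. The point is only that the label blob is never touched when the flag is false, so a second top does not need to exist.

```cpp
#include <vector>

// Hypothetical, simplified blob: just a flat buffer of values.
struct MiniBlob {
  std::vector<float> values;
};

// Same layout as the Batch class in this post, built from toy blobs.
struct MiniBatch {
  MiniBlob data_, label_;
};

// Copies a loaded batch into the layer's tops. top[1] is accessed only
// when output_labels is true, so a single-top layer is safe.
inline void ForwardBatch(const MiniBatch& batch, bool output_labels,
                         const std::vector<MiniBlob*>& top) {
  top[0]->values = batch.data_.values;
  if (output_labels) {
    top[1]->values = batch.label_.values;
  }
}
```

Calling ForwardBatch with output_labels set to false and a one-element top vector fills only the data top; with two tops and the flag set to true, both are filled.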

A Little Bit More

A natural extension of the question above: what if the data layer has to output more than two blobs, say three, four, or more? Great question, but be prepared to write a fair amount of code. First, take a look at the Batch class definition: it currently holds only two blobs. To involve more, you have to modify it to hold more. One example might look like this:

template <typename Dtype>
class Batch{
  public:
    Blob<Dtype> data1_, data2_, label_;
};

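With the newly defined Batch class, you have much more flexibility in arranging your dataset. Keep in mind that the code that fills the batch and copies it to the top blobs has to be updated to handle the extra blobs as well.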
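
As a rough idea of the extra code involved, the sketch below forwards a three-blob batch to three tops. It is an illustration under assumptions, not a patch against Caffe: PlainBlob, WideBatch and ForwardWideBatch are hypothetical names, and the blobs are plain vectors instead of Caffe blobs.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical, simplified blob: just a flat buffer of values.
struct PlainBlob {
  std::vector<float> values;
};

// Toy version of the extended Batch class with three blobs.
struct WideBatch {
  PlainBlob data1_, data2_, label_;
};

// Copies every blob of the batch to the matching top, in declaration order.
// Throws if the layer was not configured with exactly three tops.
inline void ForwardWideBatch(const WideBatch& batch,
                             const std::vector<PlainBlob*>& top) {
  const PlainBlob* sources[] = {&batch.data1_, &batch.data2_, &batch.label_};
  const std::size_t num_sources = sizeof(sources) / sizeof(sources[0]);
  if (top.size() != num_sources) {
    throw std::invalid_argument("expected exactly three top blobs");
  }
  for (std::size_t i = 0; i < num_sources; ++i) {
    top[i]->values = sources[i]->values;
  }
}
```

The design choice here is to fail loudly when the number of tops does not match the number of blobs in the batch, rather than silently dropping one of them.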

Hope you enjoy it!