Yuhang He's Blog

Some birds are not meant to be caged, their feathers are just too bright.

How an image gets transformed to blob in Caffe

Several weeks ago, I got bogged down adding data augmentation jitter to images, such as rotation, mirroring, and scaling. While I was reading the source code line by line, a question suddenly popped into my mind: how does Caffe convert an image to a blob? After some discussions with my colleagues, I finally got the answer:

The official Caffe documentation says that a conventional blob stores data as a four-dimensional, row-major array: Batch * Channel * Height * Width (N*C*H*W). This means that in a 4D blob N*C*H*W, the width index changes fastest, followed by the height, the channel, and finally the batch index N. Thus, given an image, we store it channel by channel (the R, G, and B planes one after another). Within each channel plane, we scan row by row (again, the width changes fastest).
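To make the layout concrete, here is a minimal sketch of the offset arithmetic for a row-major N*C*H*W buffer. The helper name BlobOffset and its parameters are my own, chosen for illustration; it is not a Caffe function.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of Caffe): flat offset of element
// (n, c, h, w) in a row-major N*C*H*W buffer that has C channels,
// H rows, and W columns per image.
inline std::size_t BlobOffset(std::size_t n, std::size_t c,
                              std::size_t h, std::size_t w,
                              std::size_t C, std::size_t H, std::size_t W) {
  // The indices must lie inside one image's C*H*W extent.
  assert(c < C && h < H && w < W);
  // Width varies fastest, then height, then channel, then batch.
  return ((n * C + c) * H + h) * W + w;
}
```

With C = 3, H = 4, W = 5, moving one step in w advances the offset by 1, one step in h by 5, one step in c by 20, and one step in n by 60.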

From the 4D blob's point of view, the intuitive way to write the code for a single image is:

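  Dtype* blob_data;  // points at the blob's C*H*W storage
  for (int c = 0; c < channel_num; ++c) {
    for (int h = 0; h < height_num; ++h) {
      for (int w = 0; w < width_num; ++w) {
        // Blob offset of (c, h, w); index_id is the matching image index.
        blob_data[(c * height_num + h) * width_num + w] = img.at<uchar>(index_id);
      }
    }
  }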

However, it is somewhat awkward to compute index_id for the image, because index_id jumps around in large strides instead of advancing one element at a time. Note that in most situations we start from an image, so it is more natural to scan the image row by row, with the channel as the last (fastest-changing) index. The official Caffe code takes this approach. The relevant code is in data_transformer.cpp:

  Dtype* transformed_data = transformed_blob->mutable_cpu_data();
  int top_index;
  for (int h = 0; h < height; ++h) {
    const uchar* ptr = cv_cropped_img.ptr<uchar>(h);
    int img_index = 0;
    for (int w = 0; w < width; ++w) {
      for (int c = 0; c < img_channels; ++c) {
        if (do_mirror) {
          top_index = (c * height + h) * width + (width - 1 - w);
        } else {
          top_index = (c * height + h) * width + w;
        }
        Dtype pixel = static_cast<Dtype>(ptr[img_index++]);
        if (has_mean_file) {
          int mean_index = (c * img_height + h_off + h) * img_width + w_off + w;
          transformed_data[top_index] =
            (pixel - mean[mean_index]) * scale;
        } else {
          if (has_mean_values) {
            transformed_data[top_index] =
              (pixel - mean_values_[c]) * scale;
          } else {
            transformed_data[top_index] = pixel * scale;
          }
        }
      }
    }
  }

It is worth noting that, when mirroring is off, the pixel at (h, w, c) in the source image lands at offset (c * height + h) * width + w in the 4D blob. Of course, we can also write the loops in (c, h, w) order, and the offset in the 4D blob is still (c * height + h) * width + w.
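As a quick check of that claim, here is a small sketch that fills the blob in both loop orders and should give identical results. It uses plain std::vector buffers instead of cv::Mat and Caffe blobs, assumes the image is a contiguous interleaved H*W*C buffer, and the function names are hypothetical, not Caffe's.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Both helpers are hypothetical illustrations, not Caffe functions.
// `img` is assumed to be a contiguous, interleaved H*W*C buffer.

// Image order (h, w, c): the image is read sequentially, as in Caffe's loop.
std::vector<float> HwcToChwImageOrder(const std::vector<std::uint8_t>& img,
                                      int height, int width, int channels) {
  assert(img.size() == static_cast<std::size_t>(height) * width * channels);
  std::vector<float> blob(img.size());
  std::size_t img_index = 0;
  for (int h = 0; h < height; ++h) {
    for (int w = 0; w < width; ++w) {
      for (int c = 0; c < channels; ++c) {
        blob[(c * height + h) * width + w] =
            static_cast<float>(img[img_index++]);
      }
    }
  }
  return blob;
}

// Blob order (c, h, w): the blob is written sequentially instead, and the
// image index has to be computed for every element.
std::vector<float> HwcToChwBlobOrder(const std::vector<std::uint8_t>& img,
                                     int height, int width, int channels) {
  assert(img.size() == static_cast<std::size_t>(height) * width * channels);
  std::vector<float> blob(img.size());
  std::size_t top_index = 0;
  for (int c = 0; c < channels; ++c) {
    for (int h = 0; h < height; ++h) {
      for (int w = 0; w < width; ++w) {
        blob[top_index++] =
            static_cast<float>(img[(h * width + w) * channels + c]);
      }
    }
  }
  return blob;
}
```

The two versions differ only in which side is accessed sequentially. The image-order version reads the source with a running counter, which is probably why it feels more natural when we start from an image.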