Example: Segmenting Tiles

This is an example tutorial for using Bilateral Convolution Layers (BCL), as illustrated in Sec. 6.1 of the main paper. The code for this tutorial is available in the $bilateralNN/bilateralnn_code/examples/tile_segmentation/ folder of the source code repository. To try the example code, change the CAFFE_ROOT value in config.py to point to your Caffe root directory.

To illustrate the usefulness of BCL, we construct the following example. A randomly colored tile of size 20x20 is placed on a randomly colored background of size 64x64, with additional Gaussian noise. The task is to segment the smaller tile from the background. Some sample tile images are shown below:

Sample Tile Images

We created a dataset of 10K train, 1K validation, and 1K test images. The Python script for generating sample tile images is get_tile_data.py.
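The data generation can be sketched as follows. This is a minimal illustration of the process described above, not the actual get_tile_data.py script; the function name, noise level, and RNG usage are assumptions.

```python
import numpy as np

def make_tile_sample(img_size=64, tile_size=20, noise_std=0.1, rng=None):
    """Place a randomly colored tile on a randomly colored background."""
    rng = np.random.default_rng() if rng is None else rng
    img = np.empty((img_size, img_size, 3), dtype=np.float32)
    img[:] = rng.uniform(0, 1, size=3)                # random background color
    # Random top-left corner so the tile fits inside the image.
    y, x = rng.integers(0, img_size - tile_size + 1, size=2)
    img[y:y + tile_size, x:x + tile_size] = rng.uniform(0, 1, size=3)  # random tile color
    img += rng.normal(0, noise_std, img.shape)        # additive Gaussian noise
    # Binary ground-truth mask: 1 on the tile, 0 on the background.
    label = np.zeros((img_size, img_size), dtype=np.int64)
    label[y:y + tile_size, x:x + tile_size] = 1
    return img, label

img, label = make_tile_sample()
```

Because both colors are drawn uniformly at random, no fixed color threshold can separate tile from background, which is what makes this task interesting.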

A per-pixel classifier cannot distinguish foreground from background since the colors are random. We use three-layer networks for this task: a CNN with standard spatial convolutions and a BNN with learnable bilateral convolutions (BCL). The schematics of these networks are shown below. Convolutions are interleaved with ReLU non-linearities, and the intermediate layers have 32 and 16 channels.

Three-layer CNN (left) and BNN (right) architectures used for segmenting tiles. Conv9 corresponds to a 9x9 spatial convolution layer and BCL1 corresponds to a 1-neighborhood permutohedral layer.

The following is the deploy prototxt for the CNN (cnn_deploy.prototxt); the training prototxt is cnn_train.prototxt. We use 9x9 filters for the spatial convolutions.


name: "TILES"
input: "data"
input_shape {
dim: 1000
dim: 3
dim: 64
dim: 64
}
force_backward: true

layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 0.1
  }
  convolution_param {
    num_output: 32
    pad: 4
    kernel_size: 9
    weight_filler {
      type: "gaussian"
      std: 0.001
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  bottom: "conv1"
  top: "conv1"
  name: "relu1"
  type: "ReLU"
}

layer {
  name: "conv2"
  type: "Convolution"
  bottom: "conv1"
  top: "conv2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 0.1
  }
  convolution_param {
    num_output: 16
    pad: 4
    kernel_size: 9
    weight_filler {
      type: "gaussian"
      std: 0.001
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  bottom: "conv2"
  top: "conv2"
  name: "relu2"
  type: "ReLU"
}

layer {
  name: "conv3"
  type: "Convolution"
  bottom: "conv2"
  top: "conv_result"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 0.1
  }
  convolution_param {
    num_output: 2
    pad: 4
    kernel_size: 9
    weight_filler {
      type: "gaussian"
      std: 0.001
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}

The corresponding deploy prototxt for the BNN is shown below (bnn_deploy.prototxt); the training prototxt is bnn_train.prototxt. The features used are RGBXY, and the filters have a neighborhood size of 1. This BNN has around 40K parameters in total, compared to 52K for the above CNN with 9x9 convolutions.


name: "TILES"
input: "data"
input_shape {
  dim: 1000
  dim: 3
  dim: 64
  dim: 64
}
force_backward: true

layer {
 name: "features"
 type: "PixelFeature"
 bottom: "data"                 # Input data blob
 top: "bilateral_features"      # Output feature blob
 pixel_feature_param{
   type: POSITION_AND_RGB       # Feature type (others: RGB, POSITION)
   pos_scale: 0.05              # Position feature scale
   color_scale: 10              # Color feature scale
 }
}

layer {
  name: "permutohedral1"
  type: "Permutohedral"
  bottom: "data"                # Input blob
  bottom: "bilateral_features"  # Input features
  bottom: "bilateral_features"  # Output features

  top: "conv1"                  # Output filtered blob
  top: "lattice"                # Outputs the lattice that can be used by other permutohedral layer

  permutohedral_param {
    num_output: 32              # Number of filter banks == dimension of the output signal.
    group: 1                    # Number of convolutional groups (default is 1).
    neighborhood_size: 1        # Filter neighborhood size
    bias_term: true             # Whether to use bias term or not
    norm_type: AFTER            # SYMMETRIC (default): Applies the signal normalization before and after the filtering;
                                # AFTER:                Applies the signal normalization after the filtering.
    offset_type: NONE           # FULL (default): Full Gaussian Offset;
                                # DIAG:           Diagonal Gaussian offset;
                                # NONE:           No offset.
    filter_filler {
      type: "gaussian"
      std: 0.001
    }
    bias_filler {
      type: "constant"
      value: 0
    }
 }
}
layer {
  bottom: "conv1"
  top: "conv1"
  name: "relu1"
  type: "ReLU"
}


layer {
  name: "permutohedral2"
  type: "Permutohedral"
  bottom: "conv1"
  bottom: "bilateral_features"
  bottom: "bilateral_features"
  bottom: "lattice"

  top: "conv2"

  permutohedral_param {
    num_output: 16
    group: 1
    neighborhood_size: 1
    bias_term: true
    norm_type: AFTER
    offset_type: NONE

    repeated_init: false
    filter_filler {
      type: "gaussian"
      std: 0.001
    }
    bias_filler {
      type: "constant"
      value: 0
    }
 }
}
layer {
  bottom: "conv2"
  top: "conv2"
  name: "relu2"
  type: "ReLU"
}


layer {
  name: "permutohedral3"
  type: "Permutohedral"
  bottom: "conv2"
  bottom: "bilateral_features"
  bottom: "bilateral_features"
  bottom: "lattice"

  top: "conv_result"

  permutohedral_param {
    num_output: 2
    group: 1
    neighborhood_size: 1
    bias_term: true
    norm_type: AFTER
    offset_type: NONE

    repeated_init: false
    filter_filler {
      type: "gaussian"
      std: 0.001
    }
    bias_filler {
      type: "constant"
      value: 0
    }
 }
}
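The parameter counts quoted above can be checked with a little arithmetic. For a BCL with neighborhood size n over d-dimensional features, the filter has (n+1)^(d+1) - n^(d+1) taps (here d=5 for RGBXY and n=1, giving 63); this sketch assumes that filter-size formula from the main paper.

```python
# (in_channels, out_channels) for the three layers of both networks.
channels = [(3, 32), (32, 16), (16, 2)]

cnn_taps = 9 * 9                               # 9x9 spatial filter -> 81 taps
bcl_taps = (1 + 1) ** (5 + 1) - 1 ** (5 + 1)   # 1-neighborhood, 5D features -> 63 taps

# Weights plus one bias per output channel, summed over the three layers.
cnn_params = sum(ci * co * cnn_taps + co for ci, co in channels)
bnn_params = sum(ci * co * bcl_taps + co for ci, co in channels)
print(cnn_params, bnn_params)   # -> 51890 40370, i.e. ~52K vs ~40K
```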

In this task, color is a discriminative feature for the label, so filtering in the high-dimensional RGBXY space should make the task easier. In other words, bilateral convolutions already see the color difference: nearby points are pre-grouped in the permutohedral lattice, and the task reduces to assigning a label to each of the two groups. Let's see whether this is really the case by checking whether the BNN converges better than the CNN.

Training is performed with the train.py script. The syntax for this script is: python train.py <base_lr> <train_prototxt> <snapshot_prefix> <init_caffemodel(optional)>. For the CNN, the optimal learning rate for SGD was found to be 0.01. The CNN is trained with:

mkdir snapshot_models
python train.py 0.01 cnn_train.prototxt snapshot_models/cnn_train

Similarly, the BNN is trained with:

python train.py 1.0 bnn_train.prototxt snapshot_models/bnn_train

The above training commands save the intermediate trained models in the ./snapshot_models/ folder. Next, we test and plot the intersection-over-union (IoU) score for all the intermediate trained models using the test_and_plot.py script:

python test_and_plot.py
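The IoU metric reported by the script can be sketched as follows. This is an illustrative implementation, not code from test_and_plot.py; the function name and the argmax decision rule are assumptions.

```python
import numpy as np

def foreground_iou(pred_scores, label):
    """IoU of the foreground class, given a (2, H, W) score map
    and an (H, W) binary ground-truth mask."""
    pred = pred_scores.argmax(axis=0)                      # per-pixel class decision
    inter = np.logical_and(pred == 1, label == 1).sum()    # correctly predicted tile pixels
    union = np.logical_or(pred == 1, label == 1).sum()     # predicted or true tile pixels
    return inter / union if union > 0 else 1.0
```

A score of 1.0 means the predicted tile mask matches the ground truth exactly; predicting everything as background gives 0.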

This results in the following visualization of the training progress of CNN vs. BNN:

Training progress in terms of test IoU vs. Iterations.

The above plot indicates that the BNN converges better than a CNN with a similar number of parameters. The plots may look slightly different for you due to the random initialization in Caffe. You might have noticed from the training logs that each BNN iteration is approximately 6-10 times slower than a CNN iteration (if you use a GPU). This is due to the 5-dimensional filtering in the BNN, compared to 2D filtering in the CNN. Some parts of the permutohedral layer run on the CPU even in GPU mode and could be sped up. We observe that the CNN can also converge to a near-perfect solution when bigger filter sizes are used. Refer to Fig. 5 and Sec. 6.1 of the main paper for more details.