Abstract

Bilateral filters are in widespread use due to their edge-preserving properties. The common practice is to manually choose a parametric filter type, usually a Gaussian filter. In this paper, we generalize the parametrization and, in particular, derive a gradient descent algorithm so the filter parameters can be learned from data. This derivation allows us to learn high-dimensional linear filters that operate in sparsely populated feature spaces. We build on the permutohedral lattice construction for efficient filtering. The ability to learn more general forms of high-dimensional filters can be used in several diverse applications. First, we demonstrate its use in applications where a single filter application is desired for runtime reasons. Further, we show how this algorithm can be used to learn the pairwise potentials in densely connected conditional random fields and apply these to different image segmentation tasks. Finally, we introduce layers of bilateral filters in CNNs and propose bilateral neural networks for processing high-dimensional sparse data. This view provides new ways to encode model structure into network architectures. A diverse set of experiments empirically validates the usage of general forms of filters.

Schematic of the learnable permutohedral convolution. Left: splatting the input points (orange) onto the lattice corners (black). Middle: the extent of a filter on the lattice with a 2-neighborhood (white circles); for reference we show a Gaussian filter, with its values color-coded. The general case has a free scalar/vector parameter per circle. Right: the result of the convolution at the lattice corners (black) is projected back to the output points (blue). Note that in general the output and input points may differ.

In a follow-up work, we showed how one could use Bilateral Neural Networks for fast information propagation across video frames. See the ‘Video Propagation Networks’ website for further details and the corresponding code.

Paper

Please consider citing the following papers if you make use of this work and/or the corresponding code:

@inproceedings{jampani:cvpr:2016,
	title = {Learning Sparse High Dimensional Filters: Image Filtering, Dense CRFs and Bilateral Neural Networks},
	author = {Jampani, Varun and Kiefel, Martin and Gehler, Peter V.},
	booktitle = {IEEE Conf. on Computer Vision and Pattern Recognition (CVPR)},
	month = jun,
	year = {2016}
}

A preliminary version of this work was presented at the ICLR 2015 workshop: PDF

@inproceedings{kiefel:iclr:2015,
  title = {Permutohedral Lattice CNNs},
  author = {Kiefel, Martin and Jampani, Varun and Gehler, Peter V.},
  booktitle = {International Conference on Learning Representations (ICLR) Workshop},
  month = may,
  year = {2015}
}

Usage

The learnable bilateral convolution layer is implemented as the Permutohedral layer in the Caffe neural network framework. The filter format closely follows that of the standard spatial convolution: the filter blob has shape (n, c, 1, f), where n = num_output, c = input_channels / groups, and f = filter_size. Below is a sample prototxt for the layer, along with explanations of its parameters.
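The filter_size f is not set directly; it follows from the feature dimensionality and the neighborhood size. As a sketch (assuming the lattice-neighbor count given in the paper), a filter of neighborhood size n over d-dimensional features covers (n+1)^(d+1) - n^(d+1) lattice points:

```python
# Number of filter taps per (input, output) channel pair for a
# permutohedral convolution: for d-dimensional features the lattice is a
# (d+1)-simplex tessellation, and a neighborhood of size n contains
# (n+1)^(d+1) - n^(d+1) lattice points (per the paper).
def permutohedral_filter_size(d, n):
    return (n + 1) ** (d + 1) - n ** (d + 1)

# RGBXY features (d = 5) with a 1-neighborhood give 63 taps,
# compared with 81 taps for a 9x9 spatial filter.
print(permutohedral_filter_size(5, 1))  # -> 63
print(permutohedral_filter_size(5, 2))  # -> 665
```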

layer {
  name: "permutohedral"

  type: "Permutohedral"

  bottom: "input"              # Input blob
  bottom: "in_features"        # Input features
  bottom: "out_features"       # Output features
  bottom: "in_lattice"         # (Optional) Use the lattice information from another permutohedral layer
                               # with the same features.

  top: "output"                # Output filtered blob
  top: "out_lattice"           # (Optional) Outputs the lattice information that can be used by another
                               # permutohedral layer with the same features.

  permutohedral_param {
    num_output: 1              # Number of filter banks == dimension of the output signal.

    group:      1              # Number of convolutional groups (default: 1). The input channels are split
                               # into this many groups; each group of (input_channels / groups) channels
                               # produces one of the (num_output / groups) output groups.

    neighborhood_size: 2       # Filter neighborhood size

    bias_term: true            # Whether to use bias term or not

    norm_type: SYMMETRIC       # SYMMETRIC (default): Applies the signal normalization before and after the
                               #                      filtering;
                               # AFTER:               Applies the signal normalization after the filtering.

    offset_type: FULL          # FULL (default): Full Gaussian Offset;
                               # DIAG:           Diagonal Gaussian offset;
                               # NONE:           No offset.

    visualize_lattice: false   # false (default): normal operation;
                               # true:            makes the barycentric coordinates uniform (useful for
                               #                  lattice visualization).

    do_skip_blur: false        # false (default): normal operation;
                               # true:            skips the blur step and runs only splat and slice.

    repeated_init: false       # false (default): constructs a new lattice from the in and out features;
                               # true:            reuses the in_lattice given as the fourth bottom
                               #                  (required for this option).
 }
}

We also add a PixelFeature layer to Caffe to extract features from the input image. The following is a sample prototxt using this layer.

layer {
  name: "positional_rgb_features"
  type: "PixelFeature"
  bottom: "image_blob"            # Input data blob
  top: "positional_rgb_features"  # Output feature blob

  pixel_feature_param {
    type: POSITION_AND_RGB        # Feature type (others: RGB, POSITION, RGB_AND_POSITION)
    pos_scale: 0.1                # Position feature scale (default: 1.0)
    color_scale: 0.2              # Color feature scale (default: 1.0)
  }
}
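To make the feature blob concrete, here is a hypothetical sketch of what POSITION_AND_RGB produces: per pixel, its scaled (x, y) position followed by its scaled color channels. The function name, channel ordering, and coordinate convention are illustration-only assumptions, not the layer's exact implementation:

```python
import numpy as np

# Illustration-only sketch of POSITION_AND_RGB features: 2 scaled
# position channels stacked on top of 3 scaled color channels, giving a
# 5-channel feature blob that the Permutohedral layer then filters in.
def position_and_rgb_features(image, pos_scale=1.0, color_scale=1.0):
    """image: (3, H, W) array -> (5, H, W) feature blob."""
    _, h, w = image.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float32)
    pos = pos_scale * np.stack([xs, ys])          # 2 position channels
    col = color_scale * image.astype(np.float32)  # 3 color channels
    return np.concatenate([pos, col], axis=0)

feats = position_and_rgb_features(np.zeros((3, 64, 64)),
                                  pos_scale=0.1, color_scale=0.2)
print(feats.shape)  # (5, 64, 64)
```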

Example: Segmenting Tiles

This is an example tutorial for using Bilateral Convolution Layers (BCL), as illustrated in Sec. 6.1 of the main paper. The tutorial code is available in the $bilateralNN/bilateralnn_code/examples/tile_segmentation/ folder of the source code repository. To try the example code, change the CAFFE_ROOT value in config.py to point to your Caffe root directory.

To illustrate the usefulness of BCL, we construct the following example: a randomly colored 20x20 tile is placed on a randomly colored 64x64 background, with additional Gaussian noise. The task is to segment the smaller tile from the background. Some sample tile images are shown below:

Sample Tile Images

We created a dataset of 10K training, 1K validation and 1K test images. The Python script for generating sample tile images is get_tile_data.py.
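The data generation described above can be sketched as follows (the actual generator is get_tile_data.py; the function below and its noise level are illustrative assumptions):

```python
import numpy as np

# Minimal sketch of the tile data: a randomly colored 20x20 tile on a
# randomly colored 64x64 background, plus Gaussian noise, with a binary
# mask marking tile pixels as foreground.
def make_tile_sample(rng, size=64, tile=20, noise_std=0.05):
    image = np.empty((3, size, size), dtype=np.float32)
    image[:] = rng.uniform(0, 1, size=(3, 1, 1))         # background color
    y, x = rng.integers(0, size - tile, size=2)          # tile position
    image[:, y:y + tile, x:x + tile] = rng.uniform(0, 1, size=(3, 1, 1))
    image += rng.normal(0, noise_std, size=image.shape).astype(np.float32)
    label = np.zeros((size, size), dtype=np.int64)
    label[y:y + tile, x:x + tile] = 1                    # 1 = tile, 0 = background
    return image, label

img, lbl = make_tile_sample(np.random.default_rng(0))
print(img.shape, lbl.sum())  # (3, 64, 64) and 400 foreground pixels
```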

A pixel classifier cannot distinguish foreground from background since the colors are random. We use three-layer networks for this task: a CNN with ordinary spatial convolutions and a BNN with learnable bilateral convolutions (BCL). The schematic of these networks is shown below. We use convolutions interleaved with ReLU non-linearities; the intermediate layers have 32 and 16 channels.

Three-layer CNN (left) and BNN (right) architectures used for segmenting tiles. Conv9 denotes a 9x9 spatial convolution layer and BCL1 a 1-neighborhood permutohedral layer.

The following is the corresponding deploy prototxt for the CNN (cnn_deploy.prototxt); the training prototxt is cnn_train.prototxt. We use 9x9 filters for the spatial convolutions.


name: "TILES"
input: "data"
input_shape {
  dim: 1000
  dim: 3
  dim: 64
  dim: 64
}
force_backward: true

layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 0.1
  }
  convolution_param {
    num_output: 32
    pad: 4
    kernel_size: 9
    weight_filler {
      type: "gaussian"
      std: 0.001
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  bottom: "conv1"
  top: "conv1"
  name: "relu1"
  type: "ReLU"
}

layer {
  name: "conv2"
  type: "Convolution"
  bottom: "conv1"
  top: "conv2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 0.1
  }
  convolution_param {
    num_output: 16
    pad: 4
    kernel_size: 9
    weight_filler {
      type: "gaussian"
      std: 0.001
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  bottom: "conv2"
  top: "conv2"
  name: "relu2"
  type: "ReLU"
}

layer {
  name: "conv3"
  type: "Convolution"
  bottom: "conv2"
  top: "conv_result"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 0.1
  }
  convolution_param {
    num_output: 2
    pad: 4
    kernel_size: 9
    weight_filler {
      type: "gaussian"
      std: 0.001
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}

The corresponding deploy prototxt for the BNN is shown below (bnn_deploy.prototxt); the training prototxt is bnn_train.prototxt. The features used are RGBXY and the filters have a neighborhood size of 1. The total number of parameters in this BNN is around 40K, compared to 52K for the above CNN with 9x9 convolutions.


name: "TILES"
input: "data"
input_shape {
  dim: 1000
  dim: 3
  dim: 64
  dim: 64
}
force_backward: true

layer {
 name: "features"
 type: "PixelFeature"
 bottom: "data"                 # Input data blob
 top: "bilateral_features"      # Output feature blob
 pixel_feature_param{
   type: POSITION_AND_RGB       # Feature type (others: RGB, POSITION)
   pos_scale: 0.05              # Position feature scale
   color_scale: 10              # Color feature scale
 }
}

layer {
  name: "permutohedral1"
  type: "Permutohedral"
  bottom: "data"                # Input blob
  bottom: "bilateral_features"  # Input features
  bottom: "bilateral_features"  # Output features

  top: "conv1"                  # Output filtered blob
  top: "lattice"                # Outputs the lattice that can be used by other permutohedral layer

  permutohedral_param {
    num_output: 32              # Number of filter banks == dimension of the output signal.
    group: 1                    # Number of convolutional groups (default is 1).
    neighborhood_size: 1        # Filter neighborhood size
    bias_term: true             # Whether to use bias term or not
    norm_type: AFTER            # SYMMETRIC (default): Applies the signal normalization before and after the filtering;
                                # AFTER:                Applies the signal normalization after the filtering.
    offset_type: NONE           # FULL (default): Full Gaussian Offset;
                                # DIAG:           Diagonal Gaussian offset;
                                # NONE:           No offset.
    filter_filler {
      type: "gaussian"
      std: 0.001
    }
    bias_filler {
      type: "constant"
      value: 0
    }
 }
}
layer {
  bottom: "conv1"
  top: "conv1"
  name: "relu1"
  type: "ReLU"
}


layer {
  name: "permutohedral2"
  type: "Permutohedral"
  bottom: "conv1"
  bottom: "bilateral_features"
  bottom: "bilateral_features"
  bottom: "lattice"

  top: "conv2"

  permutohedral_param {
    num_output: 16
    group: 1
    neighborhood_size: 1
    bias_term: true
    norm_type: AFTER
    offset_type: NONE

    repeated_init: false
    filter_filler {
      type: "gaussian"
      std: 0.001
    }
    bias_filler {
      type: "constant"
      value: 0
    }
 }
}
layer {
  bottom: "conv2"
  top: "conv2"
  name: "relu2"
  type: "ReLU"
}


layer {
  name: "permutohedral3"
  type: "Permutohedral"
  bottom: "conv2"
  bottom: "bilateral_features"
  bottom: "bilateral_features"
  bottom: "lattice"

  top: "conv_result"

  permutohedral_param {
    num_output: 2
    group: 1
    neighborhood_size: 1
    bias_term: true
    norm_type: AFTER
    offset_type: NONE

    repeated_init: false
    filter_filler {
      type: "gaussian"
      std: 0.001
    }
    bias_filler {
      type: "constant"
      value: 0
    }
 }
}
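The ~40K vs. ~52K parameter counts quoted above can be checked with a quick back-of-the-envelope calculation; this sketch assumes a 1-neighborhood filter over 5-dimensional RGBXY features has 2^6 - 1 = 63 taps, versus 81 for a 9x9 spatial filter:

```python
# Rough parameter counts for the two three-layer networks above
# (weights + biases per layer). Both use channel sizes 3 -> 32 -> 16 -> 2;
# they differ only in the number of filter taps: 81 for a 9x9 spatial
# convolution, 63 for a 1-neighborhood permutohedral filter in 5-D.
def conv_params(c_in, c_out, taps):
    return c_in * c_out * taps + c_out  # weights + biases

layers = [(3, 32), (32, 16), (16, 2)]
cnn = sum(conv_params(ci, co, 9 * 9) for ci, co in layers)
bnn = sum(conv_params(ci, co, 63) for ci, co in layers)
print(cnn, bnn)  # -> 51890 40370  (i.e. ~52K vs. ~40K)
```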

In this task, color is a discriminative feature for the label, so filtering in the high-dimensional RGBXY space should make the task easier. In other words, the bilateral convolutions already see the color difference: nearby points are pre-grouped in the permutohedral lattice, and the task reduces to assigning a label to the two groups. Let's see if this is really the case by checking whether the BNN converges better than the CNN.

Training is performed with the train.py script. The syntax is: python train.py <base_lr> <train_prototxt> <snapshot_prefix> <init_caffemodel(optional)>. For the CNN, the optimal SGD learning rate was found to be 0.01. The CNN is trained with:

mkdir snapshot_models
python train.py 0.01 cnn_train.prototxt snapshot_models/cnn_train

Similarly, the BNN is trained with:

python train.py 1.0 bnn_train.prototxt snapshot_models/bnn_train

The above training commands save the intermediate trained models in the ./snapshot_models/ folder. Next, we test and plot the intersection-over-union (IoU) score for all the intermediate trained models using the test_and_plot.py script:

python test_and_plot.py

This produces the following visualization of the training progress of the CNN vs. the BNN:

Training progress in terms of test IoU vs. Iterations.

The above plot indicates that the BNN converges better than a CNN with a similar number of parameters. The plots may look slightly different for you due to random initialization in Caffe. You might have noticed from the training logs that each BNN iteration is approximately 6-10 times slower than a CNN iteration (if you use a GPU). This is due to the 5-dimensional filtering in the BNN compared to the 2D filtering in the CNN; some parts of the permutohedral layer run on the CPU even in GPU mode and could be sped up. We observe that the CNN can also converge to a near-perfect solution when bigger filter sizes are used. Refer to Fig. 5 and Sec. 6.1 of the main paper for more details.