## Abstract

Bilateral filters are in widespread use due to their edge-preserving properties. The common practice is to manually choose a parametric filter type, usually a Gaussian filter. In this paper, we generalize this parametrization and, in particular, derive a gradient descent algorithm so that the filter parameters can be learned from data. This derivation makes it possible to learn high-dimensional linear filters that operate in sparsely populated feature spaces. We build on the permutohedral lattice construction for efficient filtering. The ability to learn more general forms of high-dimensional filters can be used in several diverse applications. First, we demonstrate its use in applications where a single filter application is desired for runtime reasons. Further, we show how this algorithm can be used to learn the pairwise potentials in densely connected conditional random fields, and we apply these to different image segmentation tasks. Finally, we introduce layers of bilateral filters in CNNs and propose bilateral neural networks for use on high-dimensional, sparse data. This view provides new ways to encode model structure into network architectures. A diverse set of experiments empirically validates the use of general forms of filters.

In follow-up work, we showed how Bilateral Neural Networks can be used for fast information propagation across video frames. See the ‘Video Propagation Networks’ website for further details and the corresponding code.

## Paper

Please consider citing the following papers if you make use of this work and/or the corresponding code:

```
@inproceedings{jampani:cvpr:2016,
  title     = {Learning Sparse High Dimensional Filters: Image Filtering, Dense CRFs and Bilateral Neural Networks},
  author    = {Jampani, Varun and Kiefel, Martin and Gehler, Peter V.},
  booktitle = {IEEE Conf. on Computer Vision and Pattern Recognition (CVPR)},
  month     = jun,
  year      = {2016}
}
```

A preliminary version of this work was presented at the ICLR 2015 workshop: PDF

```
@inproceedings{kiefel:iclr:2015,
  title     = {Permutohedral Lattice CNNs},
  author    = {Kiefel, Martin and Jampani, Varun and Gehler, Peter V.},
  booktitle = {International Conference on Learning Representations Workshop},
  month     = may,
  year      = {2015}
}
```

## Code

We integrated our bilateral filter learning into the Caffe neural network framework. The code is available at https://github.com/MPI-IS/bilateralNN.

## Usage

The learnable bilateral convolution layer is implemented as the `Permutohedral` layer in the Caffe neural network framework. The filter format closely follows that of a standard spatial convolution: the filter has shape (n, c, 1, f), with n = num_output, c = input_channels / groups, and f = filter_size. Below is a sample prototxt for the layer, along with explanations of its parameters.

```
layer {
  name: "permutohedral"
  type: "Permutohedral"
  bottom: "input"         # Input blob
  bottom: "in_features"   # Input features
  bottom: "out_features"  # Output features
  bottom: "in_lattice"    # (Optional) Use the lattice information from another
                          # permutohedral layer with the same features.
  top: "output"           # Output filtered blob
  top: "out_lattice"      # (Optional) Outputs the lattice information that can be
                          # used by another permutohedral layer with the same features.
  permutohedral_param {
    num_output: 1            # Number of filter banks == dimension of the output signal.
    group: 1                 # Number of convolutional groups (default: 1). The input
                             # signal is cut into this many groups to compute the
                             # filtered result. Each of the (input_channels / groups)
                             # channels is responsible for one element in the
                             # (num_output / groups) output groups.
    neighborhood_size: 2     # Filter neighborhood size
    bias_term: true          # Whether to use a bias term or not
    norm_type: SYMMETRIC     # SYMMETRIC (default): applies the signal normalization
                             # before and after the filtering;
                             # AFTER: applies the signal normalization after the filtering.
    offset_type: FULL        # FULL (default): full Gaussian offset;
                             # DIAG: diagonal Gaussian offset;
                             # NONE: no offset.
    visualize_lattice: false # false (default): nothing changes, works as usual;
                             # true: makes barycentric coordinates uniform
                             # (useful for lattice visualization)
    do_skip_blur: false      # false (default): nothing changes, works as usual;
                             # true: skips the blur step and only runs splat and slice
    repeated_init: false     # false (default): constructs a new lattice from the
                             # in and out features;
                             # true: uses the in_lattice given as an additional bottom
                             # (required for this option)
  }
}
```
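For intuition on how the filter size relates to the layer parameters, here is a small sketch (our own illustration, not part of the released code; `permutohedral_filter_shape` is a hypothetical helper). It assumes the filter covers every lattice point within `neighborhood_size` of the center in the permutohedral lattice built from d-dimensional features:

```python
# Hypothetical helper (not part of the released code): computes the filter blob
# shape (n, c, 1, f), assuming a filter with neighborhood size s over
# d-dimensional features covers (s + 1)^(d + 1) - s^(d + 1) lattice points.
def permutohedral_filter_shape(num_output, input_channels, groups,
                               feature_dim, neighborhood_size):
    s, d = neighborhood_size, feature_dim
    filter_size = (s + 1) ** (d + 1) - s ** (d + 1)
    return (num_output, input_channels // groups, 1, filter_size)

# With 5-D RGBXY features and neighborhood_size 1, each filter has
# 2^6 - 1 = 63 taps:
print(permutohedral_filter_shape(32, 3, 1, 5, 1))  # (32, 3, 1, 63)
```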

We also add `PixelFeature`

layer in Caffe to extract features from the input image. Following is a sample prototxt to use that layer.

```
layer {
  name: "positional_rgb_features"
  type: "PixelFeature"
  bottom: "image_blob"            # Input data blob
  top: "positional_rgb_features"  # Output feature blob
  pixel_feature_param {
    type: POSITION_AND_RGB  # Feature type (others: RGB, POSITION, RGB_AND_POSITION)
    pos_scale: 0.1          # Position feature scale (default: 1.0)
    color_scale: 0.2        # Color feature scale (default: 1.0)
  }
}
```
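In NumPy terms, the POSITION_AND_RGB features of the layer above amount to a 5-D vector per pixel. This is our illustrative sketch, not the layer's actual implementation; in particular, the x/y ordering is an assumption:

```python
import numpy as np

def position_and_rgb_features(image, pos_scale=1.0, color_scale=1.0):
    """image: (3, H, W) array -> (5, H, W) feature blob of
    (pos_scale*x, pos_scale*y, color_scale*r, color_scale*g, color_scale*b)."""
    _, h, w = image.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    return np.concatenate([pos_scale * np.stack([xs, ys]),
                           color_scale * image], axis=0)

feats = position_and_rgb_features(np.zeros((3, 64, 64)),
                                  pos_scale=0.1, color_scale=0.2)
print(feats.shape)  # (5, 64, 64)
```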

## Example: Segmenting Tiles

This is an example tutorial for using Bilateral Convolution Layers (BCL), as illustrated in Sec. 6.1 of the main paper. The tutorial code is available in the `$bilateralNN/bilateralnn_code/examples/tile_segmentation/` folder of the source code repository. To run the example code, change the `CAFFE_ROOT` value in the *config.py* file to point to your Caffe root directory.

To illustrate the usefulness of BCLs, we construct the following example. A randomly colored tile of size 20x20 is placed on a randomly colored background of size 64x64, with additional Gaussian noise. The task is to segment the smaller tile from the background. Some sample tile images are shown below:

We created a dataset of 10K train, 1K validation, and 1K test images. The Python script for generating sample tile images is get_tile_data.py.
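For reference, the data generation can be sketched as follows (our illustration; the actual get_tile_data.py may differ in details such as the noise level):

```python
# Sketch of the tile data described above: a randomly colored 20x20 tile on a
# randomly colored 64x64 background, with Gaussian noise, plus the ground-truth
# segmentation mask.
import numpy as np

def make_tile_sample(rng, noise_std=0.1):
    image = np.empty((3, 64, 64))
    image[:] = rng.uniform(size=3)[:, None, None]    # background color
    y, x = rng.integers(0, 64 - 20, size=2)          # tile's top-left corner
    image[:, y:y + 20, x:x + 20] = rng.uniform(size=3)[:, None, None]
    label = np.zeros((64, 64), dtype=np.int64)
    label[y:y + 20, x:x + 20] = 1                    # 1 = tile, 0 = background
    return image + rng.normal(scale=noise_std, size=image.shape), label

image, label = make_tile_sample(np.random.default_rng(0))
print(image.shape, label.sum())  # (3, 64, 64) 400
```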

A pixel classifier cannot distinguish the foreground from the background, since the colors are random. We use two 3-layer networks for this task: a CNN with standard spatial convolutions, and a BNN with learnable bilateral convolutions (BCLs). The schematic of these networks is shown below. Both use convolutions interleaved with ReLU non-linearities, and the intermediate layers have 32 and 16 channels.

The following is the corresponding deploy prototxt for the CNN (cnn_deploy.prototxt); the training prototxt is cnn_train.prototxt. We use 9x9 filters for the spatial convolutions.

```
name: "TILES"
input: "data"
input_shape {
  dim: 1000
  dim: 3
  dim: 64
  dim: 64
}
force_backward: true
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 0.1
  }
  convolution_param {
    num_output: 32
    pad: 4
    kernel_size: 9
    weight_filler {
      type: "gaussian"
      std: 0.001
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  bottom: "conv1"
  top: "conv1"
  name: "relu1"
  type: "ReLU"
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "conv1"
  top: "conv2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 0.1
  }
  convolution_param {
    num_output: 16
    pad: 4
    kernel_size: 9
    weight_filler {
      type: "gaussian"
      std: 0.001
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  bottom: "conv2"
  top: "conv2"
  name: "relu2"
  type: "ReLU"
}
layer {
  name: "conv3"
  type: "Convolution"
  bottom: "conv2"
  top: "conv_result"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 0.1
  }
  convolution_param {
    num_output: 2
    pad: 4
    kernel_size: 9
    weight_filler {
      type: "gaussian"
      std: 0.001
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
```

The corresponding deploy prototxt for the BNN is shown below (bnn_deploy.prototxt); the training prototxt is bnn_train.prototxt. The features used are RGBXY and the filter has a neighborhood size of 1. The total number of parameters in this BNN is around 40K, compared to 52K for the above CNN with 9x9 convolutions.

```
name: "TILES"
input: "data"
input_shape {
  dim: 1000
  dim: 3
  dim: 64
  dim: 64
}
force_backward: true
layer {
  name: "features"
  type: "PixelFeature"
  bottom: "data"             # Input data blob
  top: "bilateral_features"  # Output feature blob
  pixel_feature_param {
    type: POSITION_AND_RGB  # Feature type (others: RGB, POSITION)
    pos_scale: 0.05         # Position feature scale
    color_scale: 10         # Color feature scale
  }
}
layer {
  name: "permutohedral1"
  type: "Permutohedral"
  bottom: "data"                # Input blob
  bottom: "bilateral_features"  # Input features
  bottom: "bilateral_features"  # Output features
  top: "conv1"                  # Output filtered blob
  top: "lattice"                # Outputs the lattice that can be used by other
                                # permutohedral layers
  permutohedral_param {
    num_output: 32        # Number of filter banks == dimension of the output signal.
    group: 1              # Number of convolutional groups (default: 1).
    neighborhood_size: 1  # Filter neighborhood size
    bias_term: true       # Whether to use a bias term or not
    norm_type: AFTER      # SYMMETRIC (default): applies the signal normalization
                          # before and after the filtering;
                          # AFTER: applies the signal normalization after the filtering.
    offset_type: NONE     # FULL (default): full Gaussian offset;
                          # DIAG: diagonal Gaussian offset;
                          # NONE: no offset.
    filter_filler {
      type: "gaussian"
      std: 0.001
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  bottom: "conv1"
  top: "conv1"
  name: "relu1"
  type: "ReLU"
}
layer {
  name: "permutohedral2"
  type: "Permutohedral"
  bottom: "conv1"
  bottom: "bilateral_features"
  bottom: "bilateral_features"
  bottom: "lattice"
  top: "conv2"
  permutohedral_param {
    num_output: 16
    group: 1
    neighborhood_size: 1
    bias_term: true
    norm_type: AFTER
    offset_type: NONE
    repeated_init: false
    filter_filler {
      type: "gaussian"
      std: 0.001
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  bottom: "conv2"
  top: "conv2"
  name: "relu2"
  type: "ReLU"
}
layer {
  name: "permutohedral3"
  type: "Permutohedral"
  bottom: "conv2"
  bottom: "bilateral_features"
  bottom: "bilateral_features"
  bottom: "lattice"
  top: "conv_result"
  permutohedral_param {
    num_output: 2
    group: 1
    neighborhood_size: 1
    bias_term: true
    norm_type: AFTER
    offset_type: NONE
    repeated_init: false
    filter_filler {
      type: "gaussian"
      std: 0.001
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
```
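The 40K vs. 52K figures can be checked with a quick back-of-the-envelope script (our own sketch; it assumes that a BCL with 5-D features and neighborhood size 1 has 2^6 - 1 = 63 lattice taps per filter):

```python
# Parameter counts for the two 3-layer networks above.
# Spatial 9x9 conv: in_ch * out_ch * 81 weights + out_ch biases.
# BCL (5-D RGBXY features, neighborhood_size 1): assumed 63 taps per filter,
# so in_ch * out_ch * 63 weights + out_ch biases.
def conv_params(in_ch, out_ch, taps):
    return in_ch * out_ch * taps + out_ch  # weights + bias

channels = [(3, 32), (32, 16), (16, 2)]   # the three layers of each network
cnn = sum(conv_params(i, o, 9 * 9) for i, o in channels)
bnn = sum(conv_params(i, o, 63) for i, o in channels)
print(cnn, bnn)  # 51890 40370
```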

In this task, color is a discriminative feature for the label, so filtering in the high-dimensional RGBXY space makes the task easier. In other words, bilateral convolutions already *see* the color difference: nearby points are pre-grouped in the permutohedral lattice, and the remaining task is to assign a label to the two groups. Let's check whether this is really the case by seeing whether the BNN converges better than the CNN.

Training is performed with the train.py script. The syntax is:
`python train.py <base_lr> <train_prototxt> <snapshot_prefix> <init_caffemodel(optional)>`

For the CNN, the optimal learning rate for SGD was found to be 0.01. The CNN is trained with:

```
mkdir snapshot_models
python train.py 0.01 cnn_train.prototxt snapshot_models/cnn_train
```

Similarly, BNN is trained with:

```
python train.py 1.0 bnn_train.prototxt snapshot_models/bnn_train
```

The above training commands save the intermediate trained models in the `./snapshot_models/` folder. Next, we test and plot the intersection-over-union (IoU) scores for all intermediate trained models using the test_and_plot.py script:

```
python test_and_plot.py
```
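The metric itself is a standard intersection-over-union; a minimal sketch (our illustration; test_and_plot.py may differ in details such as class averaging):

```python
# IoU for the binary tile/background labels, averaged over the two classes.
import numpy as np

def iou(pred, gt, num_classes=2):
    """Mean IoU over classes; pred and gt are integer label maps."""
    scores = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:
            scores.append(inter / union)
    return float(np.mean(scores))

gt = np.zeros((64, 64), dtype=int); gt[10:30, 10:30] = 1
pred = np.zeros((64, 64), dtype=int); pred[10:30, 20:40] = 1  # half-overlapping guess
print(iou(pred, gt))
```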

This results in the following visualization of the training progress of CNN vs. BNN:

The above plot indicates that the BNN converges better than a CNN with a similar number of parameters. Your plots may look slightly different due to random initialization in Caffe. You might have noticed from the training logs that each BNN iteration is approximately 6-10 times slower than a CNN iteration (on a GPU). This is due to the 5-dimensional filtering in the BNN compared to the 2D filtering in the CNN; some parts of the permutohedral layer run on the CPU even in GPU mode and could be sped up. We observe that the CNN can also converge to a near-perfect solution when larger filter sizes are used. Refer to Fig. 5 and Sec. 6.1 of the main paper for more details.