From left to right: (1) input image, (2) boundary strength from the baseline method based on Canny edge detection, (3) sample "pb-lite" result, (4) gpb (state of the art). Notice that texture edges are less likely to be mistaken for boundaries by (3) and (4).

Project 2: pb-lite: Boundary Detection
CS 143: Introduction to Computer Vision

Brief

Starter Code

The starter code provides an outline of the pipeline and handles image loading, saving, visualization, and, most significantly, quantitative evaluation. The starter code will quantify the performance of your boundary detector on a subset of the human-annotated BSDS500 dataset. The evaluation process is non-trivial and calls mex-compiled code. We have precompiled this code for Linux (32 and 64 bit) and MacOS (64 bit). The code seems difficult to compile for Windows platforms, although you are welcome to try and let us know if you have success. The code can be compiled by running "source build.sh" from the proj2/stencil/bench/source directory in your shell, or by running build.m from the same directory in Matlab. The starter code also provides a baseline boundary detector based on the average of Canny edge detection with different parameter settings. This is a non-trivial baseline, but you can beat it!

Overview

Boundary detection is an important, well-studied computer vision problem. Clearly it would be nice to have algorithms which know where one object stops and another starts. But boundary detection from a single image is fundamentally difficult. Determining boundaries could require object-specific reasoning, arguably making the task "vision hard".

Classical edge detection algorithms, including the Canny and Sobel baselines we will compare against, look for intensity discontinuities. The more recent pb boundary detectors significantly outperform these classical methods by considering texture and color gradients in addition to intensity. Qualitatively, much of this performance jump comes from the ability of the pb algorithm to suppress false positives that the classical methods produce in textured regions.

In this project, you will develop a simplified version of pb, which finds boundaries by examining brightness, color, and texture information across multiple scales. The output of your algorithm will be a per-pixel probability of boundary. Several papers from Berkeley describe their algorithms and how their methods evolved over time. Their source code is also available for reference (don't use it). Here we investigate a simplified version of the recent work of Arbelaez, Maire, Fowlkes, and Malik, TPAMI 2011 (pdf). Our simplified boundary detector will still significantly outperform the well-regarded Canny edge detector. Evaluation is carried out against human annotations (ground truth) from a subset of the Berkeley Segmentation Data Set 500 (BSDS500).


Details

The main steps in Arbelaez et al. 2011 are:

- computing brightness, color, and texture gradients at multiple scales and orientations
- combining these multiscale cues into a single boundary strength
- globalization via spectral clustering
- non-maximum suppression to produce thin, well-localized boundaries

The focus of pb-lite will be on the representation of brightness, color and texture gradients (bullet point 1). This is covered (briefly) in Section 3.1 of Arbelaez et al. 2011. We will trivialize the multiscale cue combination. We will get a simple form of non-maximum suppression by combining our pb-lite estimates with a classical edge detector. We will skip spectral clustering. You will implement the following pipeline:

The major steps are to pre-define:

- the filter bank used to compute textons
- the half-disc masks used to build local texton and brightness histograms

and then, for every image:

- compute a texton map by filtering the image and clustering the filter responses
- compute texture and brightness gradients (tg and bg) by comparing half-disc histograms with the chi-square distance
- combine these gradients with a baseline Sobel/Canny edge map to produce a per-pixel probability of boundary

Finally, the output for all of the images is evaluated using the Berkeley Segmentation Data Set 500 (BSDS500) (code already provided).

Texture Representation

The key distinguishing element between pb-lite and classical edge detection is the ability to measure texture gradients in addition to intensity gradients. The texture gradient at any pixel should summarize how quickly the texture is changing at that point. The key technical challenges are how to represent local texture distributions and how to measure distances between them. As in pb, we will represent texture as a local distribution of textons, where textons are discrete texture elements generated by clustering filter bank responses. Texture and brightness gradients will be measured by comparing the distributions of textons or brightnesses within half-discs centered on a pixel of interest.

Filter Bank and Half-disc Masks

Filtering is at the heart of building the low-level features we are interested in. We will use filtering both to measure texture properties and to aggregate regional texture and brightness distributions. There are two sets of filters in this project. The first is an oriented filter bank for computing the textons. A simple but effective filter bank is a collection of oriented derivative of Gaussian filters. These filters can be created by convolving a simple Sobel filter with a Gaussian kernel and then rotating the result. Suppose we want o orientations (from 0 to 360 degrees) and s scales; we should end up with a total of o*s filters. A sample filter bank of size 16 x 2 is shown below:
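
As a rough sketch, one way to build such a filter bank is shown next (the filter size, sigmas, and variable names are illustrative choices, not requirements of the assignment):

% Sketch only: oriented derivative-of-Gaussian filter bank.
num_orient = 16;                       % o orientations spanning 0 to 360 degrees
sigmas     = [1 2];                    % s scales
fsize      = 21;                       % odd support so filters stay centered
sobel      = [-1 0 1; -2 0 2; -1 0 1]; % simple horizontal derivative filter
filter_bank = cell(numel(sigmas), num_orient);
for s = 1:numel(sigmas)
    gauss = fspecial('gaussian', fsize, sigmas(s));
    dog   = conv2(gauss, sobel, 'same');          % derivative of Gaussian
    for o = 1:num_orient
        angle = (o - 1) * 360 / num_orient;
        filter_bank{s,o} = imrotate(dog, angle, 'bilinear', 'crop');
    end
end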

The half-disc masks are simply (pairs of) binary images of half-discs. These are very important because they will allow us to compute the chi-square distances using a filtering operation, which is much faster than looping over each pixel neighborhood and aggregating counts for histograms. Once you get the above filter bank right, forming the masks will be trivial. A sample set of masks (8 orientations, 3 scales) is shown below:
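
A rough sketch of generating mask pairs (the radii and orientation count here are illustrative):

% Sketch only: pairs of half-disc masks at several radii and orientations.
radii      = [5 10 20];                      % 3 scales
num_orient = 8;                              % 8 orientations
masks = cell(numel(radii), num_orient, 2);   % {scale, orientation, left/right}
for r = 1:numel(radii)
    rad = radii(r);
    [x, y] = meshgrid(-rad:rad, -rad:rad);
    disc = (x.^2 + y.^2) <= rad^2;
    base = double(disc & (y < 0));           % upper half-disc as the base mask
    for o = 1:num_orient
        angle = (o - 1) * 180 / num_orient;
        masks{r,o,1} = imrotate(base, angle,       'nearest', 'crop');  % 0/1 "left" mask
        masks{r,o,2} = imrotate(base, angle + 180, 'nearest', 'crop');  % opposing "right" mask
    end
end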

The filter banks and masks only need to be defined once and then they will be used on all images. You have some discretion to experiment with different filter banks and masks. You can earn extra credit by trying richer sets of filters and masks.

Hint: some useful Matlab functions include: fspecial, imrotate, conv2, imfilter, cell, reshape.

Generating a Texton Map

Filtering an input image with each element of your filter bank results in a vector of filter responses centered on each pixel. For instance, if your filter bank has 16 orientations and 2 scales, you would have 32 filter responses at each pixel. A distribution of these 32-dimensional filter responses (e.g. all of the vectors in an image region) could be thought of as encoding texture properties. We will simplify this representation by replacing each 32-dimensional vector with a discrete texton id. We will do this by clustering the filter responses at all pixels in the image into K textons using kmeans (kmeans code is provided). Each pixel is then represented by a one-dimensional, discrete cluster id instead of a vector of high-dimensional, real-valued filter responses. This can be represented as a single-channel image with values in the range [1, 2, 3, ..., K]. K=64 seems to work well, but feel free to experiment. To visualize a texton map, you can try imagesc(tmap); colormap(jet);
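
As a sketch, the texton map could be built along these lines (assuming filter_bank is the cell array from the previous section and using Matlab's built-in kmeans; the provided kmeans code can be substituted):

% Sketch only: img is the grayscale input image as a double array.
K = 64;
[n, m] = size(img);
num_filters = numel(filter_bank);
responses = zeros(n, m, num_filters);
for f = 1:num_filters
    responses(:,:,f) = imfilter(img, filter_bank{f}, 'replicate');
end
data = reshape(responses, n*m, num_filters);   % one response vector per pixel
ids  = kmeans(data, K, 'MaxIter', 200);        % cluster responses into K textons
tmap = reshape(ids, n, m);                     % discrete texton id per pixel
imagesc(tmap); colormap(jet);                  % quick visualization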

Local Texton Distributions

We will represent local texture distributions by building K-dimensional texton histograms over regions of interest. These histograms count how often each texton is observed. The regions of interest we will use are the half-disc masks previously discussed.

Analogous local histograms could be built for brightness or color, after brightness and color have been quantized into some number of clusters.

Computing Texture Gradient and Brightness Gradient

The local texton gradient (tg) and brightness gradient (bg) encode how much the texture and brightness distributions are changing at a pixel. We compute tg and bg by comparing the distributions in left/right half-disc pairs centered at a pixel. If the distributions are similar, the gradient should be small. If the distributions are dissimilar, the gradient should be large. Because our half-discs span multiple scales and orientations, we will end up with a series of local gradient measurements encoding how quickly the texture or brightness distributions are changing at different scales and angles.

We will compare texton and brightness distributions with the chi-square measure. The chi-square distance is a frequently used metric for comparing two histograms. It is defined as follows:

chi_sqr(g, h) = 0.5 * sum_{i=1..K} (g_i - h_i)^2 / (g_i + h_i), where g and h are histograms with the same binning scheme, and i indexes through the bins. Note that the numerator of each term is simply the squared difference between corresponding histogram elements. The denominator adds a "soft" normalization to each bin so that less frequent elements still contribute to the overall distance.
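
For concreteness, a direct one-line Matlab version of this distance for two K-bin histogram vectors might look like the following (the eps guard for empty bins is an addition, not part of the formula above):

% Sketch only: chi-square distance between histogram vectors g and h.
chi_sqr = 0.5 * sum((g - h).^2 ./ (g + h + eps));   % eps avoids 0/0 for empty bins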

To efficiently compute tg and bg, filtering can be used to avoid nested loops over pixels. In addition, the linear nature of the formula above can be exploited. Suppose we have a function get_gradient(img, masks, bins), where img is a single-channel image of discrete values, such as a texton map. At a single orientation and scale, we can use a particular pair of masks to aggregate the counts in a histogram via a filtering operation, and compute the chi-square distance (gradient) in one loop over the bins according to the following outline:

chi_sqr_dist = img*0;
for i = 1:num_bins
    tmp = double(img == i);                 % 1 where img is in bin i, 0 elsewhere
    g_i = conv2(tmp, left_mask, 'same');    % counts of bin i inside the left half-disc
    h_i = conv2(tmp, right_mask, 'same');   % counts of bin i inside the right half-disc
    chi_sqr_dist = chi_sqr_dist + 0.5*(g_i - h_i).^2 ./ (g_i + h_i + eps);  % eps avoids 0/0
end

The above procedure generates a 2D matrix of gradient values. Simply repeat this for all orientations and scales, and you should end up with a 3D matrix of size n x m x (o*s), where n and m are the dimensions of the image.
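
One way to package this into the get_gradient(img, masks, bins) helper mentioned above is sketched below (e.g., saved as get_gradient.m); the cell-array layout of masks, indexed by scale, orientation, and left/right, is carried over from the earlier mask sketch and is only one possible convention:

% Sketch only: gradient stack over all scales and orientations.
function grad = get_gradient(img, masks, num_bins)
    [num_scales, num_orient, ~] = size(masks);
    [n, m] = size(img);
    grad = zeros(n, m, num_scales * num_orient);
    idx = 1;
    for r = 1:num_scales
        for o = 1:num_orient
            left_mask  = masks{r,o,1};
            right_mask = masks{r,o,2};
            chi_sqr_dist = zeros(n, m);
            for i = 1:num_bins
                tmp = double(img == i);                  % membership in bin i
                g_i = conv2(tmp, left_mask,  'same');    % left half-disc counts
                h_i = conv2(tmp, right_mask, 'same');    % right half-disc counts
                chi_sqr_dist = chi_sqr_dist + ...
                    0.5 * (g_i - h_i).^2 ./ (g_i + h_i + eps);
            end
            grad(:,:,idx) = chi_sqr_dist;
            idx = idx + 1;
        end
    end
end

With this in place, tg = get_gradient(tmap, masks, K) would give the texture gradient stack, and bg can be computed the same way on a quantized brightness image.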

Hint: you might want fewer than 256 bins when computing bg.
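
For example, a simple uniform quantization of brightness into 32 bins (an arbitrary illustrative choice; clustering the brightness values would also work) could look like:

% Sketch only: quantize brightness to a small number of discrete ids.
num_brightness_bins = 32;
gray = im2double(rgb2gray(im));                                        % im: input RGB image
bmap = min(num_brightness_bins, floor(gray*num_brightness_bins) + 1);  % ids in 1..32
bg   = get_gradient(bmap, masks, num_brightness_bins);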

Output pb-lite

The final step is to combine information from the features with a baseline method (based on Sobel or Canny edge detection). The magnitude of the features represents the strength of boundaries; hence, a simple mean of the feature vector at each pixel should be roughly proportional to pb. Of course, fancier ways to combine the features can be explored for better performance and extra credit. As a starting point, you can simply use an element-wise product of the baseline output and the mean feature strength to form the final pb value; this should work reasonably well.
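
A minimal sketch of this combination (variable names are illustrative; baseline stands for the provided Canny/Sobel strength map):

% Sketch only: combine mean gradient strength with the baseline edge map.
mean_grad = mean(cat(3, tg, bg), 3);       % per-pixel mean of all gradient channels
pb = mean_grad .* baseline;                % element-wise product with baseline strength
pb = pb / (max(pb(:)) + eps);              % rescale to [0, 1]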

Evaluating Boundary Detection

The goal of this project is to have a boundary detector that beats the provided baseline method, which is based on Sobel/Canny edge detection. The evaluation is based on human annotations (ground truth) collected as part of the BSDS500 dataset. A detailed description of the evaluation scheme is presented in Arbelaez et al. 2011 (pdf). A good performance measure is the F-score of the precision-recall curve. The stencil code does automatic evaluation for you. A sample precision-recall curve is shown below:


(dotted lines are copied from figure 17 in Arbelaez et al. 2011 (pdf), solid lines come from actual implementations in MATLAB)

The above diagram is generated using all 200 test images in the dataset. If you are using a subset of the test set, you might get better or worse performance because some images are "easier" or "harder". But if you do better than gpb (dotted red line), something is either wrong or you can publish a top paper.

Extra Credit Suggestions

This project has many opportunities for deeper exploration. For instance, one could implement some of the more complex strategies of Arbelaez et al. 2011 (pdf) which have been simplified for the base requirements. You are free to try any extensions you can think of and we may find them worthy of extra credit, especially if you can show quantitative improvement in performance. In particular, the student or students (in case of a tie) who achieve the best ODS F-score on the full 200-image test set will get recognition and extra credit.

Below are several specific extra credit suggestions:

Graduate Credit

To get graduate credit on this project you must do 10 points worth of extra credit. Those 10 points will not be added to your grade, but additional extra credit will be.

Writeup

For this project, and all other projects, you must do a project report in HTML. In the report you will describe your algorithm and any decisions you made to write your algorithm a particular way. Then you will show and discuss the results of your algorithm. Discuss any extra credit you did, and clearly show what contribution it made to the results (e.g. performance with and without each extra credit component). Feel free to add any other information you feel is relevant.

Rubric

Web-Publishing Results

All the results for each project will be put on the course website so that the students can see each other's results. In class we will have presentations of the projects and the students will vote on who got the best results. If you do not want your results published to the web, you can choose to opt out. If you want to opt out, email cs143tas[at]cs.brown.edu saying so.

Handing in

This is very important as you will lose points if you do not follow instructions. Every time after the first that you do not follow instructions, you will lose 5 points. The folder you hand in must contain the following:

Then run: cs143_handin proj2
If it is not in your path, you can run it directly: /course/cs143/bin/cs143_handin proj2

Credits

Project and some code adapted from Berkeley Computer Vision Group. Project description and code by Libin "Geoffrey" Sun and James Hays.