Housing problem
The hypothesis of the project: given images of several contextual classes (houses indoors, houses outdoors, and non-houses), we can extract features from them, cluster the features, and store the clusters as a vocabulary. If every image gets a histogram of feature frequencies, we can train a classifier such as an SVM and build a model. The model should be able to assign a class label to images it has never seen before. After that, regression makes it possible to predict the price of the house depicted in a new image, based on its class and the known prices of images of the same class.
Scene recognition is one of the core vision tasks. For this project we build a classifier based on the bag-of-words approach, where the words are visual features. The spatial arrangement of features within an image does not matter in this paradigm; the only thing that matters is the distribution of features among images of the same class.
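A concrete illustration of this representation, as a minimal MATLAB sketch with hypothetical data (the vocabulary size and the word assignments are made up):

    K = 200;                              % assumed vocabulary size
    wordIds = randi(K, 1, 500);           % stand-in for 500 feature-to-word assignments
    h = accumarray(wordIds(:), 1, [K 1]); % count how often each visual word occurs
    h = h / max(h);                       % normalize by the largest bin, as the pipeline below does

This histogram, not the locations of the features, is what represents the image from here on.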
First we collect the features from the images. The bigger the vocabulary we build, the easier it becomes to discriminate between classes. After the vocabulary is built from all available training images, we build a histogram for each of them, describing which features from the vocabulary they contain. We use an SVM to build a classification model from the histograms belonging to particular classes. After that we plug in new, never-before-seen images and try to classify them into one of the predefined classes: house outdoors, house indoors, non-house. The prediction accuracy varies from 0.43 to 0.72. Then, using regression analysis, we try to predict a price for a house given its SIFT features and its class. The price prediction error varies and is presented in Table 1. The data set is not big enough to say whether the model actually works, but the results look promising.
In this project we classify images containing houses, either indoors or outdoors. The image data set and price labels were collected from real estate sites. The largest prediction failures happen for houses that are underrepresented in the training set, where the error reaches an incredible $100M; on houses priced under $1M the median error is about $180K, and the median for the whole test set is about $215K. Additional statistics on the error rate are available here.
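The text does not pin down the form of the regression, so the following is only a minimal sketch, assuming ordinary least squares (MATLAB's backslash operator) from per-class histograms to prices; the sizes, histograms, and prices are hypothetical stand-ins:

    Htrain = rand(20, 40);            % hypothetical 20-bin histograms for 40 houses of one class
    prices = 1e5 + 9e5 * rand(40, 1); % hypothetical known prices for the same houses
    X = [Htrain', ones(40, 1)];       % one row per house, plus a bias column
    beta = X \ prices;                % least-squares fit of price on histogram bins
    hNew = rand(20, 1);               % histogram of a new image assigned to this class
    estimate = [hNew', 1] * beta;     % predicted price for the new image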
Discussion:
To prove that the algorithm works, a larger data set is needed. This would reduce the classification error and improve the price predictions. Instead of ordinary regression analysis, a support vector regression algorithm could be used (a sketch follows below). Lastly, SIFT may not be the best way to characterize image features, but establishing this would require separate comparative research.
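A hedged sketch of that suggestion, using fitrsvm from MATLAB's Statistics and Machine Learning Toolbox (not part of the original pipeline; the data below are hypothetical):

    Htrain = rand(40, 20);                % hypothetical training histograms, one per row
    prices = 1e5 + 9e5 * rand(40, 1);     % hypothetical known prices
    mdl = fitrsvm(Htrain, prices, 'KernelFunction', 'gaussian', 'Standardize', true);
    estimate = predict(mdl, rand(1, 20)); % price estimate for a new histogram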
The algorithm of the program (MATLAB/VLFeat sketches of the main steps follow the outline):

    set VLFeat paths to be able to call the SIFT, k-means, and kd-tree related functions
    read all file names
    set the vocabulary size
    build the vocabulary:
        for every image:
            read it
            convert to single precision
            extract SIFT features
            append to the feature matrix
        convert the feature matrix to single precision
        cluster using k-means
        return the vocabulary
    calculate parameters for the regression
    build a forest out of the vocabulary using a kd-tree
    for every class and every image in it:
        make a histogram:
            read the image
            convert to grayscale
            extract SIFT features
            query the forest with the extracted features
            build a histogram out of the results
            normalize it by dividing by the largest value it contains
    use SVM to classify the histograms and obtain models
    for every new test image:
        extract SIFT features
        classify it
        predict the price
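A minimal MATLAB/VLFeat sketch of the vocabulary and histogram steps in the outline. The VLFeat path, the file names, and the vocabulary size are assumptions; the calls themselves (vl_sift, vl_kmeans, vl_kdtreebuild, vl_kdtreequery) are the VLFeat functions the outline refers to:

    run('vlfeat/toolbox/vl_setup');                % set VLFeat paths (location assumed)
    K = 200;                                       % assumed vocabulary size
    files = {'house1.jpg', 'house2.jpg'};          % hypothetical training images

    % Collect SIFT descriptors from every training image.
    allDescr = [];
    for i = 1:numel(files)
        I = im2single(rgb2gray(imread(files{i}))); % assumes RGB input
        [~, d] = vl_sift(I);                       % d is a 128 x N descriptor matrix
        allDescr = [allDescr, single(d)];          % append in single precision
    end

    % Cluster the descriptors with k-means to obtain the vocabulary.
    vocab = vl_kmeans(allDescr, K);

    % Build a kd-tree forest over the vocabulary for fast word lookup.
    forest = vl_kdtreebuild(vocab);

    % Histogram for one image: assign each descriptor to its nearest word.
    I = im2single(rgb2gray(imread('house1.jpg')));
    [~, d] = vl_sift(I);
    idx = vl_kdtreequery(forest, vocab, single(d));
    h = accumarray(double(idx(:)), 1, [K 1]);
    h = h / max(h);                                % normalize by the largest value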
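The classification step can be sketched with VLFeat's linear SVM, vl_svmtrain, trained one-vs-rest over the three classes; the histograms, labels, and regularization below are hypothetical stand-ins for those produced above:

    Hist = rand(200, 60);                     % hypothetical training histograms (K x M)
    labels = randi(3, 1, 60);                 % 1 = house outdoors, 2 = house indoors, 3 = non-house
    testHist = rand(200, 10);                 % hypothetical test histograms
    lambda = 0.01;                            % assumed SVM regularization parameter
    scores = zeros(3, size(testHist, 2));
    for c = 1:3
        y = 2 * double(labels == c) - 1;      % +1 for class c, -1 for the rest
        [w, b] = vl_svmtrain(Hist, y, lambda);
        scores(c, :) = w' * testHist + b;     % decision values for each test image
    end
    [~, predictedClass] = max(scores, [], 1); % the highest-scoring class wins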
Image 1. Program output with classification accuracy and house price prediction (if applicable)
Images 2-7. Sample training images