Housing problem
The hypothesis of the project: given images of several contextual classes (houses indoors, houses outdoors, and non-houses), we can extract features from them, cluster the features, and store the clusters as a vocabulary. If every image gets a histogram of feature frequencies, we can train a classifier such as an SVM and build a model. The model should be able to assign a class label to images it has never seen before. After that, regression makes it possible to predict the price of the house depicted in a new image, based on its class and the known prices of images of the same class.
Scene recognition is one of the core vision tasks. For this project we build a classifier based on the bag-of-words approach, where the words are visual features. The spatial arrangement of features within an image does not matter in this paradigm; the only thing that matters is the distribution of features among images of the same class.
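A concrete illustration of this representation, as a minimal MATLAB sketch with hypothetical data (the vocabulary size and the word assignments are made up):

    K = 200;                              % assumed vocabulary size
    wordIds = randi(K, 1, 500);           % stand-in for 500 feature-to-word assignments
    h = accumarray(wordIds(:), 1, [K 1]); % count how often each visual word occurs
    h = h / max(h);                       % normalize by the largest bin, as the pipeline below does

This histogram, not the locations of the features, is what represents the image from here on.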
First we collect the features from the images. The bigger the vocabulary we build, the easier it becomes to discriminate between classes. After the vocabulary is built from all available training images, we build a histogram for each of them, describing which features from the vocabulary they contain. We use an SVM to build a classification model from the histograms belonging to particular classes. After that we plug in new, never-before-seen images and try to classify them into one of the predefined classes: house outdoors, house indoors, non-house. The prediction accuracy varies from 0.43 to 0.72. Then, using regression analysis, we try to predict a price for a house given its SIFT features and its class. The price prediction error varies and is presented in Table 1. The data set is not big enough to say whether the model actually works, but the results look promising.
In this project we classify images containing houses, either indoors or outdoors. The image data set and price labels were collected from real estate sites. The largest prediction failures happen for houses that are underrepresented in the training set, where the error reaches an incredible $100M; on houses priced under $1M the median error is about $180K, and the median for the whole test set is about $215K. Additional statistics on the error rate are available here.
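The text does not pin down the form of the regression, so the following is only a minimal sketch, assuming ordinary least squares (MATLAB's backslash operator) from per-class histograms to prices; the sizes, histograms, and prices are hypothetical stand-ins:

    Htrain = rand(20, 40);            % hypothetical 20-bin histograms for 40 houses of one class
    prices = 1e5 + 9e5 * rand(40, 1); % hypothetical known prices for the same houses
    X = [Htrain', ones(40, 1)];       % one row per house, plus a bias column
    beta = X \ prices;                % least-squares fit of price on histogram bins
    hNew = rand(20, 1);               % histogram of a new image assigned to this class
    estimate = [hNew', 1] * beta;     % predicted price for the new image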
Discussion:
To prove that the algorithm works, a larger data set is needed. This would reduce the classification error and improve the price predictions. Instead of ordinary regression analysis, a support vector regression algorithm could be used (a sketch follows below). Lastly, SIFT may not be the best way to characterize image features, but establishing this would require separate comparative research.
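A hedged sketch of that suggestion, using fitrsvm from MATLAB's Statistics and Machine Learning Toolbox (not part of the original pipeline; the data below are hypothetical):

    Htrain = rand(40, 20);                % hypothetical training histograms, one per row
    prices = 1e5 + 9e5 * rand(40, 1);     % hypothetical known prices
    mdl = fitrsvm(Htrain, prices, 'KernelFunction', 'gaussian', 'Standardize', true);
    estimate = predict(mdl, rand(1, 20)); % price estimate for a new histogram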
The algorithm of the program (MATLAB/VLFeat sketches of the main steps follow the outline):

    set VLFeat paths to be able to call the SIFT, k-means, and kd-tree related functions
    read all file names
    set the vocabulary size
    build the vocabulary:
        for every image:
            read it
            convert to single precision
            extract SIFT features
            append to the feature matrix
        convert the feature matrix to single precision
        cluster using k-means
        return the vocabulary
    calculate parameters for the regression
    build a forest out of the vocabulary using a kd-tree
    for every class and every image in it:
        make a histogram:
            read the image
            convert to grayscale
            extract SIFT features
            query the forest with the extracted features
            build a histogram out of the results
            normalize it by dividing by the largest value it contains
    use SVM to classify the histograms and obtain models
    for every new test image:
        extract SIFT features
        classify it
        predict the price
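A minimal MATLAB/VLFeat sketch of the vocabulary and histogram steps in the outline. The VLFeat path, the file names, and the vocabulary size are assumptions; the calls themselves (vl_sift, vl_kmeans, vl_kdtreebuild, vl_kdtreequery) are the VLFeat functions the outline refers to:

    run('vlfeat/toolbox/vl_setup');                % set VLFeat paths (location assumed)
    K = 200;                                       % assumed vocabulary size
    files = {'house1.jpg', 'house2.jpg'};          % hypothetical training images

    % Collect SIFT descriptors from every training image.
    allDescr = [];
    for i = 1:numel(files)
        I = im2single(rgb2gray(imread(files{i}))); % assumes RGB input
        [~, d] = vl_sift(I);                       % d is a 128 x N descriptor matrix
        allDescr = [allDescr, single(d)];          % append in single precision
    end

    % Cluster the descriptors with k-means to obtain the vocabulary.
    vocab = vl_kmeans(allDescr, K);

    % Build a kd-tree forest over the vocabulary for fast word lookup.
    forest = vl_kdtreebuild(vocab);

    % Histogram for one image: assign each descriptor to its nearest word.
    I = im2single(rgb2gray(imread('house1.jpg')));
    [~, d] = vl_sift(I);
    idx = vl_kdtreequery(forest, vocab, single(d));
    h = accumarray(double(idx(:)), 1, [K 1]);
    h = h / max(h);                                % normalize by the largest value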
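The classification step can be sketched with VLFeat's linear SVM, vl_svmtrain, trained one-vs-rest over the three classes; the histograms, labels, and regularization below are hypothetical stand-ins for those produced above:

    Hist = rand(200, 60);                     % hypothetical training histograms (K x M)
    labels = randi(3, 1, 60);                 % 1 = house outdoors, 2 = house indoors, 3 = non-house
    testHist = rand(200, 10);                 % hypothetical test histograms
    lambda = 0.01;                            % assumed SVM regularization parameter
    scores = zeros(3, size(testHist, 2));
    for c = 1:3
        y = 2 * double(labels == c) - 1;      % +1 for class c, -1 for the rest
        [w, b] = vl_svmtrain(Hist, y, lambda);
        scores(c, :) = w' * testHist + b;     % decision values for each test image
    end
    [~, predictedClass] = max(scores, [], 1); % the highest-scoring class wins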
Image 1. Program output with classification accuracy and house price prediction (if applicable)
Images 2-7. Sample training images