I am trying to implement the classifier discussed in this paper. I have implemented everything except the feature extraction. In section 5.1, the author writes:
"For each superpixel, two types of features are extracted: dense SURF descriptors, which are transformed using the signed square root, and Lab color values. In our experiments, it also proved useful to extract features around the superpixel, namely within its bounding box, to include more context. Both SURF and color values are encoded using improved Fisher vectors, as implemented in VLFeat, and a GMM with 64 modes. We perform PCA-whitening on both feature channels. In the end, the two encoded feature vectors are concatenated, producing a dense vector with 8576 values."
A lot is going on here, and I am confused about the order in which the steps should be performed, and on which part of the dataset.
Here is my interpretation in pseudo-Python:
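To make two of the quoted steps concrete, here is how I currently read the signed square root and the PCA-whitening, as a minimal numpy sketch. The function names and the random stand-in data are my own, not from the paper, and fitting the whitening on a sample of descriptors is an assumption on my part.

```python
import numpy as np

def signed_sqrt(descriptors):
    """Signed square root: keep the sign, take the root of the magnitude."""
    descriptors = np.asarray(descriptors, dtype=np.float64)
    return np.sign(descriptors) * np.sqrt(np.abs(descriptors))

def fit_pca_whitening(samples, eps=1e-8):
    """Estimate the mean and a whitening matrix from an (n_samples, dim) array."""
    samples = np.asarray(samples, dtype=np.float64)
    mean = samples.mean(axis=0)
    cov = np.cov(samples - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Divide each eigenvector (column) by the root of its eigenvalue
    # so that every principal direction ends up with unit variance.
    whitener = eigvecs / np.sqrt(eigvals + eps)
    return mean, whitener

def apply_pca_whitening(descriptors, mean, whitener):
    """Center the descriptors and project them onto the whitened axes."""
    return (np.asarray(descriptors, dtype=np.float64) - mean) @ whitener

# Random stand-in for 64-dimensional SURF descriptors (hypothetical data).
rng = np.random.default_rng(0)
fake_surf = rng.normal(size=(500, 64)) * rng.uniform(0.5, 2.0, size=64)

rooted = signed_sqrt(fake_surf)
mean, whitener = fit_pca_whitening(rooted)
whitened = apply_pca_whitening(rooted, mean, whitener)
```

After this, the covariance of `whitened` should be close to the identity matrix. Whether the whitening belongs before or after the GMM is exactly what I am unsure about.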
    import random

    def getFeatures(images):
        surfs_arr = []
        colors_arr = []
        for image in images:
            superpixels = findSuperpixels(image)
            for superpixel in superpixels:
                box = boundingBox(superpixel)
                surfs = findDenseSURFs(box)
                colors = findColorValues(box)
                surfs_arr.append(surfs)
                colors_arr.append(colors)
        # X and Y are placeholders for the number of samples to draw
        surfs_sample = random.sample(surfs_arr, X)
        colors_sample = random.sample(colors_arr, Y)
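For the sampling step at the end, this is roughly what I have in mind, as a small numpy sketch. It assumes each entry of `surfs_arr` is an array of shape (number of descriptors, 64); `sample_rows` and the fake data are hypothetical names I made up for illustration.

```python
import numpy as np

def sample_rows(per_superpixel, n_samples, seed=0):
    """Stack per-superpixel descriptor arrays and draw random rows without replacement."""
    stacked = np.vstack(per_superpixel)
    rng = np.random.default_rng(seed)
    # Never ask for more rows than exist.
    n = min(n_samples, stacked.shape[0])
    idx = rng.choice(stacked.shape[0], size=n, replace=False)
    return stacked[idx]

# Three fake superpixels with 5, 3 and 7 descriptors each (hypothetical data).
fake = [np.ones((5, 64)), np.zeros((3, 64)), np.ones((7, 64))]
sample = sample_rows(fake, 10)
```

The sample would then be what the GMM is fitted on, if I understand the pipeline correctly.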
My questions:
I. Should PCA-whitening be performed before creating the GMM? (as in this example)
II. Do I have to remove surfs_sample and colors_sample from surfs_arr and colors_arr, respectively, before they are encoded as Fisher vectors?
III. As for describing the color values, is it better to leave them as they are or to build a histogram?
IV. The author says he uses dense SURF but does not mention how dense. Do you recommend a specific starting point? 4x4, 16x16? Or am I misunderstanding this?
V. Any idea how the author arrives at a "dense vector with 8576 values"? To get a consistent number of features from superpixels of different sizes, it seems to me that it would require
1) using a histogram to represent color values, and either
2a) resizing each superpixel or
2b) changing the density of its SURF grid.
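That said, a quick arithmetic check suggests another possible reading, which I have not confirmed. As far as I know, an improved Fisher vector has 2 * K * D values for a GMM with K modes and D-dimensional descriptors, regardless of how many descriptors are encoded. The sketch below assumes 64-dimensional SURF descriptors and 3-dimensional Lab values, with PCA keeping all dimensions; those dimensions are my assumption, not something the paper states.

```python
def fisher_vector_length(n_modes, descriptor_dim):
    # One gradient block for the means and one for the variances of each mode.
    return 2 * n_modes * descriptor_dim

surf_len = fisher_vector_length(64, 64)   # 8192
color_len = fisher_vector_length(64, 3)   # 384
total = surf_len + color_len              # 8576
```

If this is right, the length would not depend on the superpixel size at all, and neither a histogram nor resizing would be needed to reach 8576.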
I am working in Python with numpy, opencv, scikit-learn, mahotas, and a Fisher vector implementation ported from VLFeat.
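For question IV, this is how I picture "dense" sampling inside a bounding box: a regular grid of sample centers with a fixed step. It is a plain-Python sketch; `dense_grid` and the step of 4 pixels are my own choices, and turning the points into keypoints for a SURF descriptor is left out.

```python
def dense_grid(x0, y0, width, height, step=4):
    """Return (x, y) sample centers on a regular grid inside a bounding box."""
    half = step // 2
    return [(x, y)
            for y in range(y0 + half, y0 + height, step)
            for x in range(x0 + half, x0 + width, step)]

# A 16x8 bounding box whose top-left corner is at (10, 20).
points = dense_grid(10, 20, 16, 8, step=4)
```

With a fixed step, larger superpixels simply yield more descriptors than smaller ones.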
Thanks.