Vocabulary is a compact way to extract images. There are three ways to implement this algorithm, and it depends heavily on other computer vision methods, for example. SIFT.
The first step is to build a kmeans tree using sifting descriptors. The leaf nodes of this tree contain a “bag” of screening descriptors. The second step is to create an image database using the dictionary tree that you create in the first step. You can view this process as quantizing an image into a vector space. Then the third step is to query the image for the image database. Of course, there are some detailed methods, such as an inverted list, etc.
Here is a good vocabulary tree implementation - libvot . It basically performs the three steps described above. It uses the standard C ++ 11 multi-threaded library to speed up the build process, so it works pretty fast.
paper . .