I am building an image search system (similar to Google reverse image search) for the cataloging system used inside my company. We already use Elasticsearch successfully for our regular search, so I plan to hash all of our images, store the hashes in a separate index, and search against that index. The system contains many elements, each element can have several images associated with it, and a reverse image search must be able to find an element from any one of its associated images.
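The question does not say which hash is used. A cryptographic hash (e.g. SHA-256) only matches byte-identical files, so reverse image search usually relies on a perceptual hash instead. The sketch below assumes an average hash (aHash), one simple perceptual hash, computed from an image that has already been resized to 8x8 and converted to grayscale. That preprocessing step is not shown, and the function names are hypothetical.

```python
from typing import Sequence


def average_hash(pixels: Sequence[Sequence[int]]) -> int:
    """Compute a 64-bit average hash (aHash) from an 8x8 grayscale grid.

    Assumes the image was already downscaled to 8x8 and converted to
    grayscale values in 0-255; that preprocessing is not shown here.
    """
    flat = [p for row in pixels for p in row]
    if len(flat) != 64:
        raise ValueError("expected an 8x8 grid of grayscale values")
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        # Each bit records whether this pixel is brighter than the mean.
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count differing bits; a small distance suggests similar images."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    gradient = [[r * 8 + c for c in range(8)] for r in range(8)]
    brighter = [[p + 10 for p in row] for row in gradient]
    # A uniform brightness change shifts the mean too, so the hash is unchanged.
    print(hamming_distance(average_hash(gradient), average_hash(brighter)))
```

If such a hash is stored in a `keyword` field, a `term` query finds only exact hash matches. Matching near-duplicates by Hamming distance needs additional work that this sketch does not cover.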
We have come up with two possible schemes:
1. Create one document per image, containing only the image hash and the identifier of the element it belongs to. This results in about 7 million documents, but each one is small because it holds only one hash and one identifier.
2. Create one document per element and store the hashes of all its associated images in an array field. This results in only about 100,000 documents, but each document can be quite large, since some elements have hundreds of images associated with them.
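The two schemes can be sketched as plain Python dicts: a mapping, the documents built for one element, and an exact-hash query body for each scheme. The field names (`image_hash`, `image_hashes`, `element_id`) and helper functions are hypothetical. The mappings rely on the fact that any Elasticsearch field can hold an array of values, so scheme 2 needs no special array type. No Elasticsearch client is used, so sending these bodies to a cluster is left out.

```python
from typing import Dict, List

# Scheme 1: one small document per image (hypothetical field names).
PER_IMAGE_MAPPING = {
    "properties": {
        "image_hash": {"type": "keyword"},
        "element_id": {"type": "keyword"},
    }
}

# Scheme 2: one document per element; a keyword field can hold many values.
PER_ELEMENT_MAPPING = {
    "properties": {
        "element_id": {"type": "keyword"},
        "image_hashes": {"type": "keyword"},
    }
}


def per_image_docs(element_id: str, hashes: List[str]) -> List[Dict[str, str]]:
    """Scheme 1: emit one document for every image of the element."""
    return [{"image_hash": h, "element_id": element_id} for h in hashes]


def per_element_doc(element_id: str, hashes: List[str]) -> Dict[str, object]:
    """Scheme 2: emit a single document holding all of the element's hashes."""
    return {"element_id": element_id, "image_hashes": list(hashes)}


def per_image_query(h: str) -> Dict[str, object]:
    """Exact-hash lookup for scheme 1.

    Field collapsing on element_id returns each element at most once,
    even when several of its images share the searched hash.
    """
    return {
        "query": {"term": {"image_hash": h}},
        "collapse": {"field": "element_id"},
    }


def per_element_query(h: str) -> Dict[str, object]:
    """Exact-hash lookup for scheme 2; matches if any array value equals h."""
    return {"query": {"term": {"image_hashes": h}}}


if __name__ == "__main__":
    hashes = ["a1", "b2", "c3"]
    print(per_image_docs("element-42", hashes))
    print(per_element_doc("element-42", hashes))
```

One design consequence worth weighing: Elasticsearch documents are immutable, so adding or removing one image under scheme 2 reindexes the whole element document, including all of its hashes. Under scheme 1 the same change touches a single small document.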
Which of these schemes will be more effective?