The key to understanding this is that CloudSearch doesn't really delete. Instead, the "delete" function keeps the identifiers in the index but clears all fields in the deleted documents, including setting uint fields to 0. This works well for positive queries, which won't match any text in the cleared, "deleted" documents. Negative queries, however, can still match those emptied documents.
A workaround is to add a uint field to your documents, called "updated" below, and use it as a filter for queries that may return deleted identifiers, such as negative queries.
(The following examples use the Boto interface library for CloudSearch, with many steps omitted for brevity.)
When you add documents, set this field to the current timestamp.
doc['updated'] = now_utc
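The snippet above does not show where `now_utc` comes from. A minimal sketch, assuming a Unix timestamp in whole seconds is what you want in the uint field (the helper name `utc_timestamp` is hypothetical, not part of Boto):

```python
from datetime import datetime, timezone


def utc_timestamp(moment=None):
    """Return whole seconds since the Unix epoch as a non-negative int.

    `moment` is assumed to be a timezone-aware datetime; it defaults to
    the current time in UTC.
    """
    if moment is None:
        moment = datetime.now(timezone.utc)
    return int(moment.timestamp())


# Always greater than 0, so it can never be confused with the 0 that
# a cleared ("deleted") document carries in its uint fields.
now_utc = utc_timestamp()
```

Any strictly positive integer would do for the filter; a timestamp is just a convenient choice.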
When you delete, CloudSearch sets the uint fields to 0:
doc_service.delete(id, now_utc)
conn.commit()
Now you can distinguish between deleted and active documents in a negative query. The samples below run against a test index of 86 documents, roughly half of which have been deleted.
# negative query that shows both active and deleted IDs
neg_query = "title:'-foobar'"
results = search_service.search(bq=neg_query)
results.hits  # 86 docs in a test index

# deleted items
deleted_query = "updated:0"
results = search_service.search(bq=deleted_query)
results.hits  # 46 of them have been deleted

# negative, filtered query that lists only active
filtered_query = "(and updated:1.. title:'-foobar')"
results = search_service.search(bq=filtered_query)
results.hits  # 40 active docs
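If you want to see why the filter works without touching a real domain, here is a toy in-memory model of the behaviour described above. It is only a sketch: `ToyIndex` and its methods are made up for illustration and are not the CloudSearch or Boto API, and the assumption that a delete leaves the title empty and `updated` at 0 is taken from the description in this answer.

```python
class ToyIndex:
    """Illustrative stand-in for an index that never truly deletes."""

    def __init__(self):
        self.docs = {}  # document id -> dict of fields

    def add(self, doc_id, fields):
        self.docs[doc_id] = dict(fields)

    def delete(self, doc_id):
        # The identifier stays in the index; the text field is cleared
        # and the uint field is reset to 0.
        self.docs[doc_id] = {"title": "", "updated": 0}

    def search(self, predicate):
        # Return the sorted ids of all documents whose fields match.
        return sorted(d for d, f in self.docs.items() if predicate(f))


index = ToyIndex()
now_utc = 1700000000  # any positive value works for the filter

for i in range(6):
    title = "foobar" if i == 0 else "doc %d" % i
    index.add("id%d" % i, {"title": title, "updated": now_utc})

index.delete("id4")
index.delete("id5")

# Negative query: matches active AND deleted ids, since cleared text
# never contains "foobar".
negative = index.search(lambda f: "foobar" not in f["title"])

# Deleted documents are the ones whose uint field is 0.
deleted = index.search(lambda f: f["updated"] == 0)

# Negative query filtered on updated >= 1: active documents only.
active = index.search(
    lambda f: f["updated"] >= 1 and "foobar" not in f["title"]
)

print(negative)  # ['id1', 'id2', 'id3', 'id4', 'id5']
print(deleted)   # ['id4', 'id5']
print(active)    # ['id1', 'id2', 'id3']
```

The `updated >= 1` predicate plays the role of `updated:1..` in the filtered query above.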
larham1