How can I get all searchable (non-deleted) documents in Amazon CloudSearch?

I want to get all of my searchable documents from CloudSearch.

I tried a negative search like this:

search-[mySearchEndPoint].cloudsearch.amazonaws.com/2011-02-01/search?bq=(not keywords: '!!!testtest!!!') 

It works, but it also returns all deleted documents.

So how can I get only the active documents?

+7
2 answers

The key to understanding this is that CloudSearch doesn't really delete documents. Instead, a "delete" keeps the document ID in the index but clears all fields of the deleted document, including setting uint fields to 0. This works fine for positive queries, which cannot match text in the cleared, "deleted" documents. A negative query, however, matches those documents precisely because their fields are empty.

A workaround is to add a uint field to your documents, called "updated" below, and use it as a filter for queries that may return deleted IDs, such as negative queries.
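To make the mechanism concrete, here is a toy in-memory model in plain Python. It is not CloudSearch or Boto code: the `index` dict and the `add`, `delete`, and `negative_search` helpers are hypothetical names that only mimic the behaviour described above (a delete keeps the ID, clears text fields, and zeroes the uint field).

```python
import time

# Toy model only: a dict standing in for the search index.
index = {}

def add(doc_id, fields):
    """Store a document and stamp it with a non-zero 'updated' value."""
    doc = dict(fields)
    doc["updated"] = int(time.time())  # unix time in seconds, always > 0
    index[doc_id] = doc

def delete(doc_id):
    """Mimic the described delete: keep the ID, clear text, zero the uint."""
    index[doc_id] = {"title": "", "updated": 0}

def negative_search(term, only_active=False):
    """Return IDs whose title does NOT contain `term`."""
    hits = []
    for doc_id, doc in index.items():
        if term in doc["title"]:
            continue  # positive match, so excluded from a negative query
        if only_active and doc["updated"] < 1:
            continue  # cleared ("deleted") document filtered out
        hits.append(doc_id)
    return sorted(hits)

add("a", {"title": "hello world"})
add("b", {"title": "foobar news"})
add("c", {"title": "old post"})
delete("c")

print(negative_search("foobar"))                    # ['a', 'c']
print(negative_search("foobar", only_active=True))  # ['a']
```

The unfiltered negative search returns the deleted ID "c" because its cleared title cannot contain the term; the `updated` filter is what removes it.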

(The following examples use the Boto interface library for CloudSearch, with many steps omitted for brevity.)

When you add documents, set the current timestamp for this field.

    doc['updated'] = now_utc  # unix time in seconds; useful for 'version' also.
    doc_service.add(id, now_utc, doc)
    conn.commit()

When you delete, CloudSearch sets the uint fields to 0:

    doc_service.delete(id, now_utc)
    conn.commit()
    # CloudSearch sets doc 'updated' field = 0

Now you can distinguish between deleted and active documents in a negative query. The examples below query a test index of 86 documents, roughly half of which have been deleted.

    # negative query that shows both active and deleted IDs
    neg_query = "title:'-foobar'"
    results = search_service.search(bq=neg_query)
    results.hits  # 86 docs in a test index

    # deleted items
    deleted_query = "updated:0"
    results = search_service.search(bq=deleted_query)
    results.hits  # 46 of them have been deleted

    # negative, filtered query that lists only active
    filtered_query = "(and updated:1.. title:'-foobar')"
    results = search_service.search(bq=filtered_query)
    results.hits  # 40 active docs
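If you call the HTTP endpoint directly, as in the question, the same filtered query has to be URL-encoded. The following is a minimal sketch using only the Python standard library; `build_search_url` is a hypothetical helper, the endpoint name is a placeholder, and the `https` scheme is an assumption (the question shows no scheme).

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def build_search_url(endpoint, bq):
    """Build a 2011-02-01 search URL carrying the boolean query `bq`."""
    # urlencode escapes spaces, quotes, colons, and parentheses in the query.
    return "https://{}/2011-02-01/search?{}".format(endpoint, urlencode({"bq": bq}))

# Placeholder endpoint; substitute your own search endpoint.
endpoint = "search-example-domain.cloudsearch.amazonaws.com"
filtered_query = "(and updated:1.. title:'-foobar')"

url = build_search_url(endpoint, filtered_query)
print(url)

# Round-trip check: decoding the URL gives back the original query string.
decoded = parse_qs(urlsplit(url).query)["bq"][0]
print(decoded == filtered_query)
```

This only constructs the URL; sending the request and parsing the response are left out.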
+4

I think you can do it like this:

 search-[mySearchEndPoint].cloudsearch.amazonaws.com/2011-02-01/search?bq=-impossibleTermToSearch 

Note the '-' at the beginning of the term.

+1