Elasticsearch: recommending authors from books, how to limit to a maximum of 3 books per author?

I use Elasticsearch to recommend authors (my Elasticsearch documents represent books, with a title, summary, and list of author IDs).

A user queries my index with some text (for example, Georgia or Paris), and I need to aggregate the scores of individual books at the author level (which means: recommend an author who writes about Paris).

I started with a simple aggregation; however, experimentally (by cross-checking), it works better to stop aggregating the score of each author after a maximum of 4 books per author. That way, an author with 200 books cannot "dominate" the results. Let me explain in pseudocode:

from collections import defaultdict

# the aggregated score of each author
author_scores = defaultdict(float)
# the number of books (hits) that contributed to each author
author_cnt = defaultdict(int)

# iterate over the ES query results (best-scoring hits first)
for doc in hits:
    # a book can have several authors, so credit each of them
    for author_id in doc.author_ids:
        # stop aggregating once 4 books from this author have been counted
        if author_cnt[author_id] < 4:
            author_scores[author_id] += doc.score
            author_cnt[author_id] += 1

# authors sorted by aggregated score, highest first
the_result = sorted(author_scores.items(), key=lambda kv: kv[1], reverse=True)

Is there a way to do this with the Elasticsearch query DSL?


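In ES, the closest fit I know of is a "top_hits" sub-aggregation. Nested under a "terms" aggregation on the author field, it returns only the top X hits for each author, in whatever sort order you give it.

In the query below, each bucket is one author, and "top_hits" keeps that author's 3 best-scoring books. ES returns those hits with their scores but does not add them up, so the per-author total is computed from the response afterwards.

For example: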

{
  "query": {
    "match": {
      "title": "Paris"
    }
  },
  "aggs": {
    "top-authors": {
      "terms": {
        "field": "author_ids"
      },
      "aggs": {
        "top_books_hits": {
          "top_hits": {
            "sort": [
              {
                "_score": {
                  "order": "desc"
                }
              }
            ],
            "_source": {
              "includes": [
                "title"
              ]
            },
            "size": 3
          }
        }
      }
    }
  }
}
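To turn that response into a ranking of authors, the scores of the hits in each bucket can be added up on the client. The following is only a minimal sketch in Python: the function name rank_authors and the sample response are hypothetical, and the sample is trimmed to the fields the code actually reads (the aggregation names "top-authors" and "top_books_hits" come from the query above).

```python
def rank_authors(response: dict) -> list:
    """Sum the scores of the top hits in each author bucket, best author first."""
    buckets = response["aggregations"]["top-authors"]["buckets"]
    totals = {}
    for bucket in buckets:
        # top_hits already capped this list at "size" books per author
        hits = bucket["top_books_hits"]["hits"]["hits"]
        totals[bucket["key"]] = sum(hit["_score"] for hit in hits)
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical response fragment (invented data, for illustration only).
sample_response = {
    "aggregations": {
        "top-authors": {
            "buckets": [
                {
                    "key": 17,
                    "doc_count": 2,
                    "top_books_hits": {"hits": {"hits": [
                        {"_score": 3.0, "_source": {"title": "Paris by Night"}},
                        {"_score": 0.5, "_source": {"title": "A Walk in Paris"}},
                    ]}},
                },
                {
                    "key": 42,
                    "doc_count": 200,  # prolific author, still capped at 3 hits
                    "top_books_hits": {"hits": {"hits": [
                        {"_score": 2.5, "_source": {"title": "Paris I"}},
                        {"_score": 1.5, "_source": {"title": "Paris II"}},
                        {"_score": 1.0, "_source": {"title": "Paris III"}},
                    ]}},
                },
            ]
        }
    }
}

ranked = rank_authors(sample_response)
print(ranked)  # [(42, 5.0), (17, 3.5)]
```

One thing to keep in mind: by default the "terms" aggregation returns only the 10 buckets with the most documents, so its own "size" may need to be raised if authors with few matching books should also be ranked.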