Elasticsearch from 0 to sky: Relevance in Queries
Relevance in Queries
When we run a query we want the most relevant documents. Relevance is measured in terms of recall, precision, and ranking.
- Recall: the ratio of true positives to all documents that should have been returned (high recall usually means many results).
- Precision: the ratio of true positives to all documents that were actually returned (high precision usually means few results).
- Ranking: the order in which the documents are returned, according to relevance.
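To make the two ratios concrete, here is a minimal sketch that computes them for one query. The function name and the document IDs are made up for illustration; they are not part of Elasticsearch.

```python
def precision_recall(returned, relevant):
    """Compute precision and recall for one query from two collections of document IDs."""
    returned, relevant = set(returned), set(relevant)
    # True positives: documents that were returned and are actually relevant.
    true_positives = len(returned & relevant)
    precision = true_positives / len(returned) if returned else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical document IDs, for illustration only.
returned_ids = ["d1", "d2", "d3", "d4"]  # what the search returned
relevant_ids = ["d1", "d2", "d5"]        # what should have been returned

precision, recall = precision_recall(returned_ids, relevant_ids)
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.50 recall=0.67
```

Two of the four returned documents are relevant (precision 0.50), and two of the three relevant documents were found (recall 0.67).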
Ideally we find a balance between recall and precision. Elasticsearch orders the results by calculating a score for each document.
A full-text query such as match returns documents after analyzing the query text; by default it uses “or” logic between multiple terms.
By default the documents are returned ordered by _score. To improve performance, Elasticsearch only counts hits accurately up to 10000; the “relation” value in the response shows whether the reported total is exact (“eq”) or a lower bound (“gte”).
If the parameter “track_total_hits” is true, the total is always counted exactly; the number of documents returned per request does not change.
In this example we search with the “or” operator. More than 10000 documents match, so we set “track_total_hits” to get the exact total.
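A request of this kind could look like the sketch below, written as a Python dictionary that mirrors the JSON search body. The field name “description”, the search text, and the sample totals are assumptions made for illustration; only “track_total_hits”, “match”, “operator”, and the “value”/“relation” pair come from the Elasticsearch search API.

```python
import json

# Search body with exact hit counting enabled.
# "description" and the search text are hypothetical.
search_body = {
    "track_total_hits": True,
    "query": {
        "match": {
            "description": {
                "query": "wireless noise cancelling headphones",
                "operator": "or",  # the default, written out for clarity
            }
        }
    },
}


def describe_total(total):
    """Turn a hits.total object from a search response into a readable string."""
    if total["relation"] == "eq":
        return f"exactly {total['value']} hits"
    # "gte" means the real total is at least this value.
    return f"at least {total['value']} hits"


print(json.dumps(search_body, indent=2))

# Made-up hits.total values showing both cases.
print(describe_total({"value": 10000, "relation": "gte"}))  # without track_total_hits
print(describe_total({"value": 48213, "relation": "eq"}))   # with track_total_hits: true
```

The dictionary can be sent as the body of a search request with whichever client we use.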
Using the “or” operator (the default) returns many results. To narrow them we can use the “and” operator, which improves precision and lowers recall.
The parameter “minimum_should_match” sets the minimum number of terms that must match.
In this example we use the “and” operator to obtain a single result.
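As a sketch of the options above, the helper below builds three variants of the same match query: the default “or”, the stricter “and”, and a middle ground with “minimum_should_match”. The helper name, the field “description”, and the search text are hypothetical; the query parameters themselves are the ones discussed in the text.

```python
def match_query(field, text, operator="or", minimum_should_match=None):
    """Build a search body containing a single match query (illustrative helper)."""
    clause = {"query": text, "operator": operator}
    if minimum_should_match is not None:
        clause["minimum_should_match"] = minimum_should_match
    return {"query": {"match": {field: clause}}}


text = "wireless noise cancelling headphones"  # hypothetical search text

# Any term may match: highest recall, lowest precision.
broad = match_query("description", text)

# Every term must match: highest precision, lowest recall.
strict = match_query("description", text, operator="and")

# At least three of the four terms must match: a compromise between the two.
middle = match_query("description", text, minimum_should_match=3)

for body in (broad, strict, middle):
    print(body)
```

With four terms in the query, requiring three of them sits between the two extremes, which is often a reasonable starting point when “and” returns too little and “or” returns too much.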
How the score works
Elasticsearch uses the BM25 algorithm to calculate the score, which is returned in a field called _score.
There are three factors for the score:
- TF (term frequency): the more often a term appears in a field, the more important it is.
- IDF (inverse document frequency): the more documents contain a term, the less important it is.
- Field length: a match in a shorter field is more relevant than a match in a longer one.
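The three factors can be seen working together in a small sketch of the textbook BM25 formula for a single term. This is not Elasticsearch's own code, and the exact numbers Elasticsearch reports may differ; the function name and the sample values are made up. The constants k1 = 1.2 and b = 0.75 are the defaults Elasticsearch uses for BM25.

```python
import math


def bm25_term_score(tf, field_len, avg_field_len, n_docs, docs_with_term,
                    k1=1.2, b=0.75):
    """Textbook BM25 score of one term in one field (illustrative sketch)."""
    # IDF: terms that appear in fewer documents get a higher weight.
    idf = math.log(1 + (n_docs - docs_with_term + 0.5) / (docs_with_term + 0.5))
    # Field length: fields longer than average are penalized, shorter ones rewarded.
    length_norm = 1 - b + b * (field_len / avg_field_len)
    # TF: more occurrences raise the score, with diminishing returns.
    return idf * (tf * (k1 + 1)) / (tf + k1 * length_norm)


# Made-up baseline: the term appears twice in a 100-word field of average length,
# and 50 of 1000 documents contain the term.
base = bm25_term_score(tf=2, field_len=100, avg_field_len=100, n_docs=1000, docs_with_term=50)

more_tf = bm25_term_score(tf=5, field_len=100, avg_field_len=100, n_docs=1000, docs_with_term=50)
common = bm25_term_score(tf=2, field_len=100, avg_field_len=100, n_docs=1000, docs_with_term=500)
shorter = bm25_term_score(tf=2, field_len=20, avg_field_len=100, n_docs=1000, docs_with_term=50)

print("higher TF raises the score:      ", more_tf > base)
print("more common term lowers it:      ", common < base)
print("shorter field raises the score:  ", shorter > base)
```

Changing one input at a time shows each factor in isolation: a higher term frequency and a shorter field raise the score, while a term that appears in many documents lowers it.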