Elasticsearch from 0 to sky: Relevance in Queries

Relevance in Queries

When we run a query, we are interested in the most relevant documents. Relevance is measured in terms of recall, precision, and ranking.

  • Recall: the ratio of true positives to all documents that should have been returned. Favoring recall yields many results.
  • Precision: the ratio of true positives to all documents that were actually returned. Favoring precision yields few results.
  • Ranking: the order in which the documents are returned, according to relevance.
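
To make the two ratios concrete, here is a minimal sketch in plain Python. The function name and the document IDs are made up for illustration; nothing here calls Elasticsearch.

```python
def precision_recall(returned, relevant):
    """Return (precision, recall) for the returned document IDs
    measured against the IDs that are actually relevant."""
    returned, relevant = set(returned), set(relevant)
    true_positives = len(returned & relevant)
    # Guard against empty sets to avoid dividing by zero
    precision = true_positives / len(returned) if returned else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical data: 4 documents returned, 3 of them relevant,
# out of 6 relevant documents in the whole index.
returned_ids = ["d1", "d2", "d3", "d4"]
relevant_ids = ["d1", "d2", "d3", "d5", "d6", "d7"]

precision, recall = precision_recall(returned_ids, relevant_ids)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Returning fewer documents tends to raise precision and lower recall, which is the trade-off the rest of the article works with.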

The ideal is to find a balance between recall and precision. Elasticsearch orders the results by calculating a score for each document.

Match searches

A match query returns the documents that match the given text. The text is analyzed before matching, and by default multiple terms are combined with “or” logic.

By default, the documents are returned ordered by _score. To improve performance, Elasticsearch counts hits accurately only up to 10,000; the “relation” field in the response shows whether the reported total is exact (“eq”) or a lower bound (“gte”).

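If the parameter “track_total_hits” is set to true, the response reports the exact total number of matching documents. It does not change how many documents are returned.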
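
As a rough illustration of how the total can be read, the sketch below interprets the hits.total object of a search response held as a Python dict. The helper name and the two sample responses are hypothetical, and the responses are trimmed down to the one field of interest.

```python
def describe_total(response):
    """Summarize hits.total from a search response given as a dict."""
    total = response["hits"]["total"]
    if total["relation"] == "eq":
        return f"exactly {total['value']} matching documents"
    # "gte" means the real count is at least the reported value
    return f"at least {total['value']} matching documents"


# Hypothetical responses, not taken from a real cluster
capped = {"hits": {"total": {"value": 10000, "relation": "gte"}}}
exact = {"hits": {"total": {"value": 12345, "relation": "eq"}}}

print(describe_total(capped))
print(describe_total(exact))
```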

In this example we run a search with the “or” operator. More than 10,000 documents match, so we set “track_total_hits” to get the exact count.
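
A request of this kind could be built as follows. This is only a sketch of the request body: the field name "title" and the search text are assumptions, and the body would still need to be sent to the _search endpoint of an index with whatever client you use.

```python
import json


def match_or_body(field, text, exact_total=True):
    """Build a search body for a match query with the default "or" logic."""
    return {
        # Ask for the exact hit count instead of the capped one
        "track_total_hits": exact_total,
        "query": {"match": {field: {"query": text}}},
    }


# "title" and the search text are placeholders for illustration
body = match_or_body("title", "quick brown fox")
print(json.dumps(body, indent=2))
```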

Improving precision

The “or” operator, which is the default, returns many results. To narrow them down we can use the “and” operator, which improves precision at the cost of lower recall.

The parameter “minimum_should_match” sets the minimum number of terms that must match.

In this example we use the “and” operator to obtain a single result.
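
Both options are set inside the match query itself. The sketch below builds two request bodies, one with the “and” operator and one with “minimum_should_match”; the helper name, the field "title" and the search text are hypothetical.

```python
import json


def match_body(field, text, operator="or", minimum_should_match=None):
    """Build a match query body with optional precision controls."""
    options = {"query": text, "operator": operator}
    if minimum_should_match is not None:
        options["minimum_should_match"] = minimum_should_match
    return {"query": {"match": {field: options}}}


# Every term must match: highest precision, lowest recall
strict = match_body("title", "quick brown fox", operator="and")
# At least 2 of the 3 terms must match: a middle ground
relaxed = match_body("title", "quick brown fox", minimum_should_match=2)

print(json.dumps(strict, indent=2))
print(json.dumps(relaxed, indent=2))
```

With three terms, a “minimum_should_match” of 2 sits between the plain “or” query (any one term) and the “and” query (all three).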

How the score works

Elasticsearch uses the BM25 algorithm to calculate the score, which is returned in a field called _score.

There are three factors for the score:

  • TF (term frequency): the more often a term appears in a field, the more relevant the document is.
  • IDF (inverse document frequency): the more documents contain a term, the less weight that term carries.
  • Field length: a match in a shorter field is more relevant than a match in a longer one.
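
The effect of the three factors can be seen in a textbook form of BM25 for a single term. This is a sketch, not Elasticsearch's own code: the function name and the sample numbers are made up, and the exact formula inside Elasticsearch may differ in details. The values k1 = 1.2 and b = 0.75 are the usual defaults.

```python
import math


def bm25_term_score(tf, doc_len, avg_doc_len, n_docs, doc_freq, k1=1.2, b=0.75):
    """Textbook BM25 score of one term in one field.

    tf          -- how often the term occurs in the field
    doc_len     -- length of the field, in terms
    avg_doc_len -- average field length across the index
    n_docs      -- number of documents in the index
    doc_freq    -- number of documents containing the term
    """
    # IDF: terms that occur in many documents get a lower weight
    idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # Field length: longer than average fields are penalized
    length_norm = 1 - b + b * doc_len / avg_doc_len
    # TF: more occurrences raise the score, with diminishing returns
    return idf * tf * (k1 + 1) / (tf + k1 * length_norm)


# Hypothetical numbers, varying one factor at a time
base = bm25_term_score(tf=1, doc_len=100, avg_doc_len=100, n_docs=1000, doc_freq=10)
more_tf = bm25_term_score(tf=3, doc_len=100, avg_doc_len=100, n_docs=1000, doc_freq=10)
common = bm25_term_score(tf=1, doc_len=100, avg_doc_len=100, n_docs=1000, doc_freq=500)
longer = bm25_term_score(tf=1, doc_len=400, avg_doc_len=100, n_docs=1000, doc_freq=10)

print(more_tf > base, common < base, longer < base)
```

Each comparison changes one input relative to the base case: a higher term frequency raises the score, while a more common term or a longer field lowers it.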

Data Engineer Elastic Stack

Iván Frías Molina
