Introduction
In previous posts, we explored using boost scores to balance the signal between text matches scored with BM25 and vector matches scored with cosine similarity. Because the scores produced by these two methods can differ by orders of magnitude, a positive boost on the lower-scoring cosine similarity is necessary for our vector matches to have a noticeable effect on the final ranking of documents.
There are, however, some drawbacks to this method. The first is that while we can get the scores of the vector match into the same ballpark with boosting, the scores generated by the BM25 algorithm still have high variance. A second drawback is that these BM25 scores are not necessarily stable as the product set we are searching across changes. If, for instance, we suddenly add a ton of backyard grills to our product set, the terms ‘backyard’ and ‘grill’ will appear in a larger share of documents, so their inverse document frequency (and with it their BM25 scores) will decrease.
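To make that second drawback concrete, here is a small sketch (with made-up catalog counts) using the BM25 IDF formula that Lucene, and therefore Elasticsearch, uses. As the share of documents containing a term grows, its IDF, and with it its BM25 contribution, shrinks:

```python
import math

def bm25_idf(doc_freq, num_docs):
    """Lucene's BM25 IDF, as used by Elasticsearch."""
    return math.log(1 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5))

# Hypothetical numbers: 'grill' appears in 100 of 10,000 products...
print(bm25_idf(100, 10_000))    # ~4.60
# ...then we add 2,000 grills to the catalog
print(bm25_idf(2_100, 12_000))  # ~1.74
```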
Reciprocal Rank Fusion tries to mitigate these drawbacks by treating the text and vector searches as separate information retrieval tasks and then combining them based on their rankings. The algorithm uses a naive scoring function: each document receives a score of 1 / (k + rank) from each result list it appears in, where k is a constant. Any document that is retrieved by multiple systems has the scores from each system summed up for a total score. The constant k in the denominator is in place to “mitigate the impact of high rankings from outlier systems”.
The original paper on reciprocal rank fusion (Cormack, Clarke, and Büttcher, SIGIR 2009) is here, and there is a terrific talk by Philipp Krenn that shows some examples.
Here is a quick toy example of how this might work:
Rank | BM25 | Vector |
---|---|---|
1 | doc 1 | doc 6 |
2 | doc 6 | doc 4 |
3 | doc 3 | doc 1 |
4 | doc 4 | doc 3 |
5 | doc 2 | doc 5 |

Document | RRF (k=1) | Score |
---|---|---|
doc 1 | 1/(1+1) + 1/(1+3) | 0.750 |
doc 2 | 1/(1+5) + 0 | 0.167 |
doc 3 | 1/(1+3) + 1/(1+4) | 0.450 |
doc 4 | 1/(1+4) + 1/(1+2) | 0.533 |
doc 5 | 0 + 1/(1+5) | 0.167 |
doc 6 | 1/(1+2) + 1/(1+1) | 0.833 |

Rank | Document | Score |
---|---|---|
1 | doc 6 | 0.833 |
2 | doc 1 | 0.750 |
3 | doc 4 | 0.533 |
4 | doc 3 | 0.450 |
5 | doc 2 | 0.167 |
6 | doc 5 | 0.167 |
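As a sanity check, here is a minimal, self-contained sketch of that fusion logic (not the production code from this series); running it reproduces the final ranking table above:

```python
from collections import defaultdict

def rrf(rankings, k=1):
    """Fuse ranked lists: each document scores sum of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranked_docs in rankings:
        for rank, doc in enumerate(ranked_docs, start=1):
            scores[doc] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

bm25 = ["doc 1", "doc 6", "doc 3", "doc 4", "doc 2"]
vector = ["doc 6", "doc 4", "doc 1", "doc 3", "doc 5"]
print(rrf([bm25, vector], k=1))
# [('doc 6', 0.833), ('doc 1', 0.75), ('doc 4', 0.533), ('doc 3', 0.45), ...]
```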
Running RRF Searches
I should note that, as of this writing, the RRF rank feature is available in Elasticsearch, but only for paying customers; here’s the documentation, and a rough sketch of the native query is below. Fortunately, RRF is relatively simple to implement on our own, which we did in the function shown after the sketch.
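This sketch follows the 8.x retriever API; the exact syntax depends on your Elasticsearch and client versions, and the field names are assumptions based on the index used elsewhere in this series, so treat it as an approximation rather than a tested query:

```python
# Sketch of Elasticsearch's native RRF (a licensed feature). Assumes an
# 8.14+ cluster and client; 'title' and 'product_text_string_vector' are
# assumed field names. Reuses es_client, index_name, search_text, and
# search_vector from the function below.
response = es_client.search(
    index=index_name,
    retriever={
        "rrf": {
            "retrievers": [
                # Lexical (BM25) leg
                {"standard": {"query": {"match": {"title": search_text}}}},
                # Vector leg
                {"knn": {
                    "field": "product_text_string_vector",
                    "query_vector": search_vector,
                    "k": 50,
                    "num_candidates": 100,
                }},
            ],
            "rank_constant": 60,
            "rank_window_size": 100,
        }
    },
)
```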
```python
import pandas as pd

# Note: query_elasticsearch_hybrid is the hybrid query function from the
# earlier posts in this series.

def query_elasticsearch_rrf_multi(
    es_client,
    index_name,
    search_text=None,
    search_vector=None,
    num_results=10,
    num_query_results=50,
    k=60,
    boost_values=None,
):
    """Combines a vector search and a text search with RRF. Allows for multiple vector search fields.

    Args:
        es_client: An Elasticsearch client object.
        index_name: The name of the index to query.
        search_text: The text to search for.
        search_vector: The vector to search with.
        num_results: The maximum number of results to return (default 10).
        num_query_results: The maximum number of results to return from each query type (default 50).
        k: The RRF ranking constant.
        boost_values: A dictionary containing the boost values for title, description, and attributes.

    Returns:
        A dataframe of results from Elasticsearch ranked by RRF score.
    """
    # Default boost values
    boost_values = boost_values or {
        "title_boost": 1,
        "description_boost": 1,
        "attributes_boost": 1,
        "product_text_string_vector_boost": 1,
    }
    # Collect all our hits and the score column names
    hits_list = []
    hits_score_cols = []
    # Text search
    text_results = query_elasticsearch_hybrid(
        es_client,
        index_name,
        search_text=search_text,
        num_results=num_query_results,
        boost_values=boost_values,
    )
    text_hits = pd.DataFrame(text_results['hits']['hits'])
    if len(text_hits) > 0:
        text_hits['product_uid'] = [x['product_uid'] for x in text_hits['_source']]
        # RRF contribution: 1 / (rank + k), ranking by descending BM25 score
        text_hits['text_score'] = 1 / (text_hits['_score'].rank(ascending=False) + k)
        hits_list.append(text_hits)
        hits_score_cols.append('text_score')
    # Vector search: one query per vector field in the boost dictionary
    for key in boost_values.keys():
        if "vector" in key:
            vector_boost_value = {key: boost_values[key]}
            vector_results = query_elasticsearch_hybrid(
                es_client,
                index_name,
                search_vector=search_vector,
                num_results=num_query_results,
                boost_values=vector_boost_value,
            )
            vector_hits = pd.DataFrame(vector_results['hits']['hits'])
            if len(vector_hits) > 0:
                vector_hits['product_uid'] = [x['product_uid'] for x in vector_hits['_source']]
                vector_hits[key + '_score'] = 1 / (vector_hits['_score'].rank(ascending=False) + k)
                # Prefix the raw Elasticsearch columns so they don't collide on
                # the merge below; '_score' gets a name distinct from the RRF
                # score column computed above
                vector_hits.rename(columns={'_index': key + '_index',
                                            '_id': key + '_id',
                                            '_score': key + '_es_score',
                                            '_source': key + '_source',
                                            }, inplace=True)
                hits_list.append(vector_hits)
                hits_score_cols.append(key + '_score')
    if not hits_list:
        return pd.DataFrame()
    # Combine all hits; a document missing from a result set scores 0 there
    all_hits = hits_list[0]
    for i in range(1, len(hits_list)):
        all_hits = all_hits.merge(hits_list[i], how='outer', on='product_uid')
    all_hits[hits_score_cols] = all_hits[hits_score_cols].fillna(0)
    all_hits['rrf_score'] = all_hits[hits_score_cols].sum(axis=1)
    all_hits.sort_values('rrf_score', inplace=True, ascending=False)
    all_hits.reset_index(inplace=True, drop=True)
    return all_hits[:num_results]
```
Reading this function, you can see that we’re actually running multiple queries and then combining the results by merging the dataframes together. This is going to be significantly slower than letting Elasticsearch handle the hybrid search, though there are surely clever ways to speed it up. For the sake of example, we’ll stick with this method for now.
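For completeness, here is what calling the function might look like. The index name and the embedding step are placeholders, not names from this series; in practice the query vector has to come from the same embedding model used at index time:

```python
# Hypothetical usage -- 'products' and 'embedding_model' are placeholders.
query = "stainless steel backyard grill"
query_vector = embedding_model.encode(query)  # embed with the indexing model

results = query_elasticsearch_rrf_multi(
    es_client,
    index_name="products",
    search_text=query,
    search_vector=query_vector,
    num_results=10,
    num_query_results=50,
    k=60,
)
print(results[["product_uid", "rrf_score"]].head(10))
```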
RRF Results
We ran this RRF function two different ways: once using the text search plus the embedding of the entire product text, and a second time using the text search plus vector searches across the three separately embedded text fields. Let’s look at the results:
Name | MRR | MAP | NDCG | Run Time |
---|---|---|---|---|
textsearch | 0.261 | 0.113 | 0.170 | 178.5 |
textsearch_boosted | 0.318 | 0.149 | 0.218 | 207.4 |
vectorsearch | 0.331 | 0.159 | 0.237 | 509.6 |
vectorsearch_multifield | 0.241 | 0.097 | 0.156 | 623.0 |
vectorsearch_multifield_boosted | 0.255 | 0.106 | 0.168 | 662.4 |
hybrid | 0.325 | 0.163 | 0.238 | 662.6 |
hybrid_boosted | 0.342 | 0.170 | 0.251 | 716.3 |
rrf | 0.361 | 0.174 | 0.260 | 4026.2 |
rrf_multi | 0.328 | 0.148 | 0.227 | 7013.1 |
Our Reciprocal Rank Fusion method gives us our best results across all three of our evaluation metrics. It’s worth noting that RRF over just the text search and the single embedding gave us better performance than using all three embeddings. This aligns with our earlier testing on vector search alone, where the single-embedding vector search outperformed our tuned multifield vector search.
Conclusion
Throughout this series, we’ve explored Elasticsearch’s capabilities in text search and vector-based search, as well as strategies for optimizing search performance. We’ve examined methods for tuning weights between fields and different match types in hybrid search scenarios.
Our experiments have yielded promising results. By fine-tuning our initial text query across three fields and then incorporating vector search capabilities with Reciprocal Rank Fusion (RRF) for final rankings, we achieved a significant improvement in our evaluation metrics. Specifically, we saw increases of 14-20% across various performance indicators.
As next steps, consider running A/B tests to quantify the real-world impact of these improvements on user satisfaction and business metrics. Additionally, explore optimization techniques to reduce the computational overhead of RRF, potentially making it a more viable option for a broader range of applications.
If you want to dive deeper into the code, the notebooks for all of the work above can be found here: