Introduction

Welcome to the third post in our search relevance series! In our previous posts, we explored text-based search and vector-based search separately. We saw how each approach has its strengths: text search excels at exact matches and handling specific terms, while vector search captures semantic meaning and handles synonyms and misspellings well. In this post, we’ll combine these approaches to create a hybrid search system that leverages the best of both worlds.

Our hybrid approach will aim to:

Combine the precision of text-based search with the semantic understanding of vector-based search.
Improve our ability to handle a wider range of queries, including those with misspellings or without exact matches.
Further enhance our search relevance metrics.

We’ll cover the following key topics:

Implementing a basic hybrid search by combining text and vector queries
Tuning the hybrid search through boosting
Evaluating the performance of our hybrid approach
Exploring the advantages of hybrid search, particularly for handling edge cases

Let’s dive in and see how we can create a more robust and effective search system through hybridization.

Running Hybrid Searches

With Elasticsearch’s kNN query, we can simply add a kNN clause to the list of text match clauses that we originally used for our text search. One thing to note is that the text match scores are often orders of magnitude larger than the vector match scores. To give the vector match a fair chance in the final rankings, we need to boost its score. Similar to how we tuned the text scores, we will iterate over boost values for the vector score in search of the best evaluation metrics.
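To see why the boost matters, recall that a bool query adds up the scores of its matching should clauses. The following is a minimal sketch with made-up scores (not taken from our dataset) and hypothetical helpers, `combined_score` and `rank`. It shows how an unboosted vector score barely affects the ordering, while a boosted one can change it.

```python
# Hypothetical clause scores for three products; the values are illustrative only.
text_scores = {"A": 85.0, "B": 80.0, "C": 0.0}     # C has no text match
vector_scores = {"A": 0.0, "B": 0.70, "C": 0.72}   # A is not among the kNN hits


def combined_score(doc_id, vector_boost):
    """Add the text score and the boosted vector score, as a bool/should query would."""
    return text_scores[doc_id] + vector_boost * vector_scores[doc_id]


def rank(vector_boost):
    """Return product IDs ordered by combined score, best first."""
    return sorted(
        text_scores,
        key=lambda doc_id: combined_score(doc_id, vector_boost),
        reverse=True,
    )


unboosted = rank(1.0)   # vector scores are too small to change the text ordering
boosted = rank(20.0)    # vector scores now compete with the text scores
print(unboosted)
print(boosted)
```

With these illustrative numbers, product B overtakes product A only once the vector score is scaled up enough to compete with the text scores.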

We can create our new hybrid search queries in the following manner:

python
query_body = {
    "size": num_results,
    "query": {
        "bool": {
            "should": [
                {
                    "match": {
                        "product_title": {
                            "query": search_text,
                        }
                    }
                },
                {
                    "match": {
                        "product_description": {
                            "query": search_text,
                        }
                    }
                },
                {
                    "nested": {
                        "path": "product_attributes",
                        "query": {
                            "match": {
                                "product_attributes.name_value": {
                                    "query": search_text,
                                }
                            }
                        }
                    }
                },
                {
                    "knn": {
                        "field": "product_vector",
                        "query_vector": search_vector,
                        "num_candidates": 50,
                    }
                }
            ]
        }
    }
}

By tuning the vector match boost, we can balance the quality of the text matches between the query and the product fields against the semantic relationship captured by the vector embeddings.
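The tuning loop can be sketched as follows. This assumes the boost is applied through the `boost` parameter of the kNN clause. The helper names `build_hybrid_query` and `tune_vector_boost`, the `evaluate` callable, and the example boost value are all hypothetical and are not taken from the notebook.

```python
def build_hybrid_query(search_text, search_vector, num_results, vector_boost):
    """Build the hybrid query body, scaling the kNN clause score by vector_boost."""
    text_clauses = [
        {"match": {"product_title": {"query": search_text}}},
        {"match": {"product_description": {"query": search_text}}},
        {
            "nested": {
                "path": "product_attributes",
                "query": {
                    "match": {"product_attributes.name_value": {"query": search_text}}
                },
            }
        },
    ]
    knn_clause = {
        "knn": {
            "field": "product_vector",
            "query_vector": search_vector,
            "num_candidates": 50,
            "boost": vector_boost,  # assumption: boost set on the kNN clause itself
        }
    }
    return {"size": num_results, "query": {"bool": {"should": text_clauses + [knn_clause]}}}


def tune_vector_boost(candidate_boosts, evaluate):
    """Return the (boost, score) pair with the highest score.

    `evaluate` is a hypothetical callable that runs the evaluation queries
    with the given boost and returns a single metric (for example NDCG).
    """
    scored = [(boost, evaluate(boost)) for boost in candidate_boosts]
    return max(scored, key=lambda pair: pair[1])


# Placeholder vector; a real one must match the dimensions of product_vector.
example_query = build_hybrid_query("hanging shelves", [0.1, 0.2, 0.3], 10, vector_boost=50.0)
print(example_query["query"]["bool"]["should"][-1])
```

In practice, `evaluate` would send each query built by `build_hybrid_query` to the index and score the returned rankings against the relevance labels.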

Hybrid Search Results

Let’s take a look at a sample query across our three methods so far. The user query in this case is “hanging shelves”.

Text

Position	Score	Product ID	Product Title	Relevance
1	115.4625	107182	Ameriwood Wardrobe Storage Closet with Hanging Rod and 2-Shelves in American Cherry	2.67
2	87.12547	117672	Whitmor 19.50 in. x 45.38 in. x 68.00 in. Double Rod Closet Shelves
3	84.42163	174420	4-Shelves Tier Pole Caddy in Bronze
4	83.25287	106280	Rolling Shelves 17 in. Express Pullout Shelf
5	83.25287	114168	Rolling Shelves 21 in. Express Pullout Shelf
6	81.89681	116589	Zenith Premium Bathtub and Shower Pole Caddy with 4 Shelves in White
7	80.95274	193107	Stack-On 42 in. DIY Workbench with Full Length Steel Shelves
8	80.55455	192343	Command Picture Hanging Solution Kit
9	80.42712	190110	Martha Stewart Living 24 in. Espresso Shelves (2-Pack)
10	80.26959	181845	Fresca Allier 16 in. W Bathroom Linen Cabinet with 2 Glass Shelves in White

The top result here, the “Ameriwood Wardrobe Storage Closet with Hanging Rod and 2-Shelves”, comes in with a relatively high relevance score of 2.67 (out of 3). Most of the remaining results are shelves that are missing our “hanging” component.

Vector

Position	Score	Product ID	Product Title	Relevance
1	0.702164	128711	Honey-Can-Do 8-Shelf PEVA hanging organizer	3
2	0.700011	104901	4D Concepts Hanging Wall Corner Shelf Storage	3
3	0.699165	197453	Martha Stewart Living Solutions 70 in. Silhouette 2-entryway Shelf with Hooks
4	0.699132	119985	Design House 12 in. x 10-3/16 in. White Shelf-Hanging Rod Bracket
5	0.699011	139090	4D Concepts Hanging Wall Corner Shelf Storage	2.67
6	0.696995	134901	Prepac 36 in. W Hanging Entryway Shelf	3
7	0.696304	136407	Prepac 60 in. Wall-Mounted Coat Rack in White
8	0.696301	188206	New Age Industrial 15 in. D x 48 in. L 12-Gauge Aluminum Wall Shelf
9	0.695971	136535	Prepac 48.5 in. x 19.25 in. Floating Entryway Shelf and Coat Rack in Black
10	0.695865	161429	Houseworks 34 in. x 5-1/4 in. Unfinished Wood Decor Shelf with Pegs

Our vector query embeds “hanging shelves” as a single representation, and because of that it returns more products that look like a hanging shelf solution.

Hybrid

Position	Score	Product ID	Product Title	Relevance
1	115.4625	107182	Ameriwood Wardrobe Storage Closet with Hanging Rod and 2-Shelves in American Cherry	2.67
2	110.6256	163204	ClosetMaid 54 in. Canteen 8-Shelf Hanging Organizer
3	110.371	119437	ClosetMaid 54 in. Mocha 8-Shelf Hanging Organizer	2.67
4	109.9637	186249	ClosetMaid 54 in. Canteen 10-Shelf Hanging Organizer
5	109.8537	141074	ClosetMaid 54 in. Mocha 10-Shelf Hanging Organizer	3
6	107.4734	124425	ClosetMaid 24 in. White Versatile Hanging Shelf	3
7	106.2367	109858	ClosetMaid 24 in. Hanging Wire Shelf	3
8	104.0418	154365	Home Decorators Collection 2-Shelves and Towel Rack in Chrome
9	103.2512	128711	Honey-Can-Do 8-Shelf PEVA hanging organizer	3
10	101.6648	212563	Martha Stewart Living Garage 6 in. H x 24 in. W White Metal Shelves

Our hybrid solution shows us that we really can get the best of both worlds. The top result is our top result from the text search. The 9th result is our top result from the vector search. In between these two are four other high-relevance products that weren’t in the top ten results of our text or vector searches alone.

Evaluation

We’ve run a lot of additional queries up to this point, so let’s check back in on the evaluation scores to see how we’re shaping up. Recall from the first post that we are using Mean Reciprocal Rank (MRR), Mean Average Precision (MAP), and Normalized Discounted Cumulative Gain (NDCG).

Name	MRR	MAP	NDCG	Run Time
textsearch	0.261	0.113	0.170	178.5
textsearch_boosted	0.318	0.149	0.218	207.4
vectorsearch	0.331	0.159	0.237	509.6
vectorsearch_multifield	0.241	0.097	0.156	623.0
vectorsearch_multifield_tuned	0.255	0.106	0.168	662.4
hybrid	0.325	0.163	0.238	662.6
hybrid_boosted	0.342	0.170	0.251	716.3
Increase over Text	7.7%	14.1%	15.5%	245.4%
Increase over Vector	3.4%	6.9%	6.2%	40.6%

We can see that our tuned hybrid search really does give us that “best of both worlds”, with the best performance across all three metrics. Compared with the tuned text search, it gains 7.7% to 15.5% across the three metrics; compared with the vector search, it gains 3.4% to 6.9%. It is worth noting, however, that these queries take about 245% longer to run than the tuned text search, and about 41% longer than the vector search.
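The relative gains can be recomputed from the table with a few lines of Python. This is a small sketch: the helper names `percent_increase` and `increase_over` are hypothetical, and the inputs are the rounded values from the table. For that reason the MRR and NDCG percentages can differ by a few tenths of a point from the “Increase” rows, which were presumably computed from unrounded metrics.

```python
# (MRR, MAP, NDCG, Run Time) copied from the evaluation table.
results = {
    "textsearch_boosted": (0.318, 0.149, 0.218, 207.4),
    "vectorsearch": (0.331, 0.159, 0.237, 509.6),
    "hybrid_boosted": (0.342, 0.170, 0.251, 716.3),
}
columns = ("MRR", "MAP", "NDCG", "Run Time")


def percent_increase(baseline, candidate):
    """Relative change from baseline to candidate, in percent."""
    return (candidate - baseline) / baseline * 100


def increase_over(baseline_name, candidate_name="hybrid_boosted"):
    """Percent increase of the candidate run over the baseline run, per column."""
    baseline = results[baseline_name]
    candidate = results[candidate_name]
    return {
        column: round(percent_increase(b, c), 1)
        for column, b, c in zip(columns, baseline, candidate)
    }


print(increase_over("textsearch_boosted"))
print(increase_over("vectorsearch"))
```

The run-time figure of 245.4% is reproduced when the baseline is the `textsearch_boosted` row, which indicates that the “Increase over Text” row compares against the tuned text search.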

Conclusion

In this post, we added vector-based search to our text searches across the search relevance dataset. We tuned our hybrid queries using a method similar to the one we used to tune our text search.

This added capability allowed us to surface better results across our set of queries, and even to surface results where none were previously available due to the limitations of text-based search.

While hybrid approaches offer clear advantages in terms of relevance and semantic understanding, it’s once again important to note that these techniques often come with increased computational complexity and indexing costs. The trade-off between performance and relevance should be carefully evaluated based on the specific requirements of the search application.

In the next post, we’ll take our exploration of hybrid search one step further by exploring an additional ranking method called Reciprocal Rank Fusion.

If you want to dive deeper into the code, the notebooks for all of the work above can be found here:

3-hybrid.ipynb

📚↗️ Search Relevance: Hybrid Search

Combining The Best of Text and Vector Approaches
