When you search on Google, it finds and presents the most relevant results from billions of pages within seconds. Behind this is the concept of "embedding." In short, embedding is a method of translating the meaning of words, numbers, and images into numerical vectors that computers can work with. An AI model assigns each piece of content a coordinate in a virtual space, so that semantically similar content ends up close together. The search algorithm then finds the most relevant results by measuring how close these coordinates are to the coordinate of the query.
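
The idea of "closeness" can be illustrated with a small sketch. The three-dimensional vectors and the `nearest` helper below are made up for illustration (real embeddings have hundreds of dimensions and come from a trained model), and cosine similarity is assumed here as one common way to measure proximity.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; closer to 1.0 means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings, invented for this example.
embeddings = {
    "how to bake bread": [0.9, 0.1, 0.0],
    "sourdough recipe": [0.8, 0.3, 0.1],
    "car engine repair": [0.0, 0.2, 0.9],
}

def nearest(query_vec, k=2):
    """Return the k content titles whose vectors are closest to the query."""
    ranked = sorted(
        embeddings,
        key=lambda title: cosine_similarity(query_vec, embeddings[title]),
        reverse=True,
    )
    return ranked[:k]

# A query vector that sits near the two baking-related items.
query = [0.85, 0.2, 0.05]
print(nearest(query))
```

The two baking-related items come back as the nearest neighbours, while the unrelated item is left out.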

Until recently, each piece of content was represented by a single vector. This was fast, but accuracy suffered because a single vector cannot capture the depth of the content. More advanced multi-vector models then began to emerge. Instead of collapsing the content into one vector, these models represent each word of the content with its own vector, so the content as a whole is described by a set of vectors. This produced more accurate results, but at the cost of speed. MUVERA takes the strengths of both methods, addresses their drawbacks, and aims to deliver the most accurate results in the shortest time.

Embedding Technology - Speed or Accuracy?

For years, the working principle of search engines was based on single-vector embedding technology. We can think of it like summarizing a large book with a single word. When we search for that word, we can easily find that book among billions of others, but we miss out on many other things described in the book.

To address this shortcoming, multi-vector embedding models such as ColBERT were developed. To continue the analogy, these models summarize each chapter of a book separately and keep all of the summaries. When someone searches for something covered in one chapter, the model compares the query against every summary and can still consider the book as a whole. This finds what we are looking for more accurately, but we may have to wait a bit: when hundreds of vectors are created instead of one, the amount of data to process increases, and the similarity calculation between a query and a document becomes more complex. For these reasons, search time and cost go up.
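
To make the extra cost concrete, here is a minimal sketch of the kind of multi-vector scoring used by ColBERT-style models, often called Chamfer or MaxSim similarity: each query vector is matched to its best document vector, and the best matches are summed. The two-dimensional vectors are invented for illustration.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def chamfer_similarity(query_vecs, doc_vecs):
    """For each query vector, take its best-matching document vector, then sum.

    This needs len(query_vecs) * len(doc_vecs) inner products per document,
    compared with a single inner product for single-vector embeddings.
    """
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)

# Hypothetical token vectors, made up for this example.
query_vecs = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[0.9, 0.1], [0.1, 0.8], [0.5, 0.5]]  # covers both query tokens
doc_b = [[0.9, 0.1], [0.7, 0.2]]              # covers only the first one

score_a = chamfer_similarity(query_vecs, doc_a)
score_b = chamfer_similarity(query_vecs, doc_b)
print(score_a > score_b)
```

Document A scores higher because it has a good match for every part of the query, which is the kind of detail a single vector tends to lose.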

This is where MUVERA comes in and combines the strengths of both embedding technologies. It does so by reducing the multi-vector set to a single vector, which is far cheaper to search. This method is called Fixed Dimensional Encoding, or FDE for short. MUVERA takes the complex yet semantically rich set of vectors that a multi-vector model produces for a piece of content and compresses it into a single FDE vector.

Compression is the first of MUVERA's two steps. The search does not rely on the compressed FDE vector alone; it narrows down the candidates in stages to find the best result.

First, it performs a very fast search over the FDEs to build a small group of the most likely candidates. Then it runs a slower analysis on just those candidates, this time examining their original multi-vector sets. This second stage, called "re-ranking," lets MUVERA combine the efficiency of fast single-vector search with the accuracy that comes from the richer multi-vector data.
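
The two stages can be sketched as follows. This is a simplified illustration, not MUVERA's actual implementation: the `two_stage_search` function, the index layout, and all vector values (including the "fde" entries, which are placeholders rather than real encodings) are assumptions made for the example, and stage 1 scans every document where a production system would use a fast nearest-neighbour index.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def chamfer(query_vecs, doc_vecs):
    """Multi-vector score: best document match per query vector, summed."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)

def two_stage_search(query_fde, query_vecs, index, num_candidates, top_k):
    # Stage 1: cheap single-vector scoring to shortlist candidates.
    by_fde = sorted(
        index,
        key=lambda doc_id: dot(query_fde, index[doc_id]["fde"]),
        reverse=True,
    )
    candidates = by_fde[:num_candidates]
    # Stage 2: re-rank only the shortlist with the slower multi-vector score.
    reranked = sorted(
        candidates,
        key=lambda doc_id: chamfer(query_vecs, index[doc_id]["vectors"]),
        reverse=True,
    )
    return reranked[:top_k]

# Hypothetical index: placeholder single vectors plus the original vector sets.
index = {
    "d1": {"fde": [0.9, 0.5], "vectors": [[0.9, 0.1], [0.2, 0.5]]},
    "d2": {"fde": [0.8, 0.5], "vectors": [[0.8, 0.0], [0.1, 0.9]]},
    "d3": {"fde": [0.3, 0.2], "vectors": [[0.3, 0.1], [0.1, 0.2]]},
    "d4": {"fde": [0.1, 0.1], "vectors": [[0.1, 0.0]]},
}
query_fde = [1.0, 1.0]
query_vecs = [[1.0, 0.0], [0.0, 1.0]]

results = two_stage_search(query_fde, query_vecs, index, num_candidates=2, top_k=2)
print(results)
```

In this toy data the fast stage shortlists d1 and d2, and the re-ranking stage then promotes d2 above d1 because its original vectors match the query better. Only two of the four documents ever go through the expensive comparison.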

One of the things that makes the FDE method so efficient is that it treats queries and documents differently. MUVERA performs an asymmetric encoding: it sums the vectors of a user's search query, while it averages the vectors of the documents being searched. Thanks to this technique, it can determine whether what the query is looking for exists in a document. And because the FDE transformation does not depend on any particular dataset, it adapts well to data that is constantly changing and being added.
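
A heavily simplified sketch of this asymmetric encoding is shown below. The text above only specifies the sum-versus-average distinction; splitting the space into buckets with random hyperplanes, and the names `make_hyperplanes`, `bucket_of`, `encode_query`, and `encode_document`, are assumptions made for illustration, and the sketch leaves out refinements a real system would need.

```python
import random

def make_hyperplanes(num_planes, dim, seed=0):
    """Random hyperplanes chosen without looking at any data (assumption)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_planes)]

def bucket_of(vec, hyperplanes):
    """Bucket index built from which side of each hyperplane the vector lies on."""
    index = 0
    for plane in hyperplanes:
        side = sum(x * p for x, p in zip(vec, plane)) > 0
        index = (index << 1) | int(side)
    return index

def encode(vectors, hyperplanes, average):
    """Combine the vectors in each bucket, then concatenate all buckets."""
    dim = len(vectors[0])
    num_buckets = 2 ** len(hyperplanes)
    sums = [[0.0] * dim for _ in range(num_buckets)]
    counts = [0] * num_buckets
    for vec in vectors:
        b = bucket_of(vec, hyperplanes)
        counts[b] += 1
        for i, x in enumerate(vec):
            sums[b][i] += x
    fde = []
    for b in range(num_buckets):
        # Documents are averaged per bucket; queries keep the plain sum.
        scale = 1.0 / counts[b] if average and counts[b] else 1.0
        fde.extend(x * scale for x in sums[b])
    return fde

def encode_query(vectors, hyperplanes):
    return encode(vectors, hyperplanes, average=False)

def encode_document(vectors, hyperplanes):
    return encode(vectors, hyperplanes, average=True)

planes = make_hyperplanes(num_planes=2, dim=2)
query_vecs = [[1.0, 0.0], [0.0, 1.0]]
doc_vecs = [[0.9, 0.1], [0.1, 0.8], [0.5, 0.5]]

q_fde = encode_query(query_vecs, planes)
d_fde = encode_document(doc_vecs, planes)
# One inner product between two fixed-length vectors stands in for the
# full multi-vector comparison.
approx_score = sum(x * y for x, y in zip(q_fde, d_fde))
print(len(q_fde), len(d_fde))
```

Both encodings have the same length (number of buckets times vector dimension) no matter how many vectors went in, which is what makes them "fixed dimensional." Because the hyperplanes are drawn at random rather than learned from the documents, newly added content can be encoded without retraining anything.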

MUVERA in Statistics

  • Compared to the previous state-of-the-art system, MUVERA answers search queries with about 90% lower latency on average while improving accuracy by about 10% on average.

  • Compared to traditional embedding methods, it scans 5 to 20 times fewer candidate documents to achieve the same accuracy rate.

  • A key factor in MUVERA's performance gain is that its FDE vectors can be compressed by a factor of 32 without a significant drop in search quality. This allows it to handle 20 times more queries per second.

  • With single-vector embedding, 300 unique candidate documents were needed to reach an 80% accuracy rate. MUVERA reaches the same rate with 60 candidates. In other words, it needs to process only one fifth as many candidates (300 / 60 = 5).

MUVERA is much more than a technical development in the digital world. It is an innovation that will fundamentally change how, and how easily, people access information, and it deserves a place in the strategies of digital marketers. From now on, instead of asking, "What searches is my target audience performing?" professionals should ask, "What problem is my target audience trying to solve, and what information do they want to access?" In this new era, strategies should shift from a keyword focus to a context focus. To succeed, you must place user intent at the center of your strategy and produce content that covers a topic in full detail and from different perspectives.