Happy to have Roi Krakovski joining us to talk about search. Roi is the Founder and CEO of Usearch, “The World’s First Search Engine Based on Synthetic AI-Generated Data.” I have followed Usearch over the past couple of years and am excited to have the opportunity to ask Roi some questions about what he’s working on.
1. Hi Roi, how do you describe Usearch in layperson’s terms so that non-search people understand what you are working on? What sets it apart from other search engines?
Conventional enterprise search engines cannot answer user queries with accuracy and coverage anywhere near that of the two behemoths of internet search, Google and Bing. On the other hand, Google and Bing cannot be customized to a sector or an industry, and they lack domain knowledge.
As an example, let’s consider the search query “Silver Oak”. There is Silver Oak Services Partners (a leading lower-middle-market private equity firm), a Silver Oak winery, a Silver Oak golf course, a Silver Oak street, a Silver Oak tree, a Silver Oak college and many more. All are relevant in the broad sense and can be found by scanning Google’s results. For a financial organization, however, only the first interpretation is likely to be relevant.
We bring the ability to build a Google-like search engine for every sector or industry (real estate, finance, health, media, etc.) so that organizations can have a customized Google-like experience at scale. Our search engines offer the accuracy and scalability of internet search engines like Google and Bing, but unlike these general-purpose search engines, they are highly customizable, leverage domain knowledge and come with a B2B business model.
2. I know that Usearch uses its own technology to create query logs, basically mapping queries to documents. This offers huge benefits for protecting user privacy because Usearch is not reliant on real query logs. I’m wondering if this is the final approach for Usearch, or if it’s a stop-gap, bootstrap approach to training an engine without having extensive query logs and user data?
An exhaustive query log is the starting point for building an internet search engine. It is the only way to retrieve the most relevant web pages for every search query. The quality and volume of a query log (that is, the quality of the associations between the queries and the relevant web pages) is a key factor in a search engine’s quality, and it dominates the engine’s performance and accuracy. The richness of a query log enables us to train our AI algorithms to understand user intent and predict what a user would most likely want to see.
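The query log described above can be pictured as a table of query-to-page associations, each with a strength. The following is only a minimal sketch of that idea, not Usearch's implementation: the function names `build_query_log` and `top_pages`, the use of a simple count as the association strength, and the `example.com` URLs are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical (query, page) associations; the URLs are placeholders.
associations = [
    ("silver oak", "https://example.com/private-equity-firm"),
    ("silver oak", "https://example.com/private-equity-firm"),
    ("silver oak", "https://example.com/winery"),
    ("silver oak winery", "https://example.com/winery"),
]


def build_query_log(pairs):
    """Count how often each page is associated with each query."""
    log = defaultdict(lambda: defaultdict(int))
    for query, page in pairs:
        log[query][page] += 1
    return log


def top_pages(log, query, k=3):
    """Return up to k pages for a query, strongest association first."""
    pages = log.get(query, {})
    # Sort by descending count, then by URL so ties are deterministic.
    return sorted(pages, key=lambda p: (-pages[p], p))[:k]


query_log = build_query_log(associations)
ranked = top_pages(query_log, "silver oak")
```

In this toy log, a finance-oriented engine would simply hold stronger associations between “silver oak” and the private equity page than between “silver oak” and the winery page.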
Our benchmarks prove that we can get very close to Google with our synthetic query log and without using any real query logs or tracking users. That’s the only way to build a fully private search engine.
3. As a follow-up question, how do you build query logs this way? My understanding is that you have based your query generation system on how the human brain works. Just as the brain uses bite-sized chunks to remember things, Usearch breaks a document into bite-sized chunks by creating a list of terms and tokenized phrases that are deemed important in the document. Do I have that right? Is this done in isolation at each document level, or is there an overlaying graph that leverages terms and phrases across your larger index and thereby provides weighting for terms and phrases?
There is a learning process that extracts the important features of a document or, phrased differently, the minimal set of entities that best describes the document. This process is called context encoding and is similar to what our brain does when it indexes our memories. That’s where Autoassociative Memory Networks come into play. Autoassociative Memory Networks are capable of remembering data by storing a small portion of that data (the context). Extracting the important parts of a single document requires knowledge gathered from a large volume of data. Our first algorithm was a large-scale implementation of an Autoassociative Memory Network that processes millions of web pages together and outputs their important parts.
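To give a feel for what an autoassociative memory does, here is a minimal sketch of a textbook Hopfield-style network, which stores patterns and then recovers a full pattern from a damaged cue. This is a classic toy model, not Usearch's algorithm: the function names `train` and `recall`, the Hebbian learning rule, and the two 8-element patterns are all assumptions chosen for illustration.

```python
def train(patterns):
    """Build a Hebbian weight matrix from a list of +1/-1 patterns."""
    n = len(patterns[0])
    weights = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:  # no self-connections
                    weights[i][j] += p[i] * p[j]
    return weights


def recall(weights, cue, max_sweeps=10):
    """Update units one at a time until the state stops changing."""
    state = list(cue)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(state)):
            total = sum(w * s for w, s in zip(weights[i], state))
            new_value = 1 if total >= 0 else -1
            if new_value != state[i]:
                state[i] = new_value
                changed = True
        if not changed:
            break
    return state


# Two hypothetical patterns to memorize.
stored = [
    [1, 1, 1, 1, -1, -1, -1, -1],
    [1, -1, 1, -1, 1, -1, 1, -1],
]
weights = train(stored)

# Damage the first pattern in one position, then let the network repair it.
cue = list(stored[0])
cue[0] = -cue[0]
recovered = recall(weights, cue)
```

Here `recovered` equals the first stored pattern: the network fills in the damaged element from the rest of the pattern, which is the sense in which such a memory can recall the whole from a part.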
4. I have tremendous respect for you starting a new search engine. Can you tell us about the business side of Usearch? Usearch offers general web search and tabs for images, news, and trending news. Usearch also offers custom search, a search API, and a search query API. Is the goal to become a consumer destination search engine or to be a B2B platform for other engines to be built on?
Our goal is to give any organization or company an entry point to the web. We focus on vertical search engines, such as real estate search engines, investment management search engines, financial search engines, etc.
Our vision is to create a unified B2B entry point to the web, both for supporting other search engines and for providing a scalable solution for organizations to find the most relevant data in the vastness of the web.
5. Lastly, what is the search industry like in Israel these days?
Israel is famous for its enterprise search solutions, like the popular open-source Elasticsearch. We know that technologies like Elasticsearch, Solr and Lucene couldn’t scale up to support an internet search engine while providing such a high level of accuracy. That is why we took a completely different approach.