Wayfair has tens of millions of products that serve the needs of over 33 million customers. Each of our customers has nuanced preferences as they buy products to help them create their own unique sense of home. Our search and recommender systems play an important role in helping customers find these products as quickly and conveniently as possible.
For example, when a customer searches for “outdoor seating” on Wayfair, our search system should surface the most relevant products at the top of the search results. Is this customer looking for an outdoor dining table? Or should the algorithm move wooden benches to the top of the list?
State-of-the-art approaches to this problem rely on mining patterns in the behavior of a particular customer – or the behaviors of similar customers – to identify and display relevant products. Wayfair’s query classifier learns from historical customer behavior, drawing on signals such as items added to a shopping cart during a search session, past searches, and past purchases. The model takes the raw text of the customer’s search query as input. For each existing class of products at Wayfair – couches, dining tables, and so on – it outputs a confidence score reflecting that class’s relevance to the query.
Our query classifier model has evolved over the years. The first version of the classifier was a set of logistic regression models built on n-gram features: contiguous sequences of up to n words drawn from the query. We found that these simpler models fell short of capturing the complexities and nuances of search queries. While they performed adequately on simpler queries, they failed to understand more complex customer searches: for example, whether a customer searching for a “desk bed” meant a bunk bed with a desk underneath, or a tray desk used in bed.
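The n-gram approach can be sketched in a few lines of plain Python. The vocabulary, weights, and bias below are made up for illustration – a real system would learn one trained weight vector per product class – but the mechanics are the same: extract n-grams, sum their weights, and squash to a confidence score.

```python
import math

def ngrams(tokens, max_n=2):
    """All contiguous word n-grams up to length max_n."""
    return [" ".join(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)]

# Illustrative weights for a single hypothetical class ("outdoor dining");
# a trained model would learn these from historical customer behavior.
WEIGHTS = {"outdoor": 0.8, "dining": 0.5, "table": 0.7,
           "outdoor dining": 1.2, "dining table": 1.5}
BIAS = -2.0

def score(query):
    """Logistic regression over n-gram features: sum weights, apply sigmoid."""
    z = BIAS + sum(WEIGHTS.get(g, 0.0) for g in ngrams(query.lower().split()))
    return 1.0 / (1.0 + math.exp(-z))  # confidence in [0, 1]

print(round(score("outdoor dining table"), 3))  # high confidence
print(round(score("wooden bench"), 3))          # low confidence
```

Because each class is scored independently from a fixed bag of n-grams, the model has no way to reason about what a rare composition like “desk bed” actually means – which is exactly where this version fell short.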
To respond to more complex queries effectively, we adopted a Convolutional Neural Network (CNN) model for the second version of our classifier. This architecture allows us to infer the class information in a search query within a few milliseconds. Latency – the time it takes for a model to return results for a search query – is an especially important criterion for e-commerce recommender systems. In a widely cited study, Amazon found that every 100 milliseconds of latency cost the company one percent in sales. We found that the CNN model delivered a far superior semantic understanding of customer queries, in part due to pre-trained embeddings. Crucially, the latency of the model was a mere 3 milliseconds.
To further improve model performance, the third version of our classifier added multiple filters of varying sizes to the model. In this most recent version, multiple filters work in parallel before applying pooling – essentially summarizing the presence of features – to the concatenated results.
Because the model has multiple filter sizes, we capture dependencies between words in search queries extending to as many as four terms. As a result, the third version of our classifier is more sensitive to the complexities in customer queries. However, with greater complexity comes higher latency – our newest classifier registered a 4x increase in latency over the older model (12 milliseconds versus 3).
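The multi-filter design can be sketched in numpy. Everything here – the tiny vocabulary, the 8-dimensional embeddings, one filter per width – is an illustrative toy, not our production model, but it shows the pattern: filters of widths 2 through 4 slide over the embedded query in parallel, each result is max-pooled, and the pooled values are concatenated into one feature vector.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: a 5-word vocabulary with random "pre-trained" 8-d embeddings,
# and a single random filter per width (real models use many per width).
vocab = {"desk": 0, "bed": 1, "bunk": 2, "tray": 3, "wooden": 4}
emb = rng.normal(size=(len(vocab), 8))
filters = {w: rng.normal(size=(w, 8)) for w in (2, 3, 4)}

def features(query):
    """Run each filter over the embedded query, max-pool, concatenate."""
    idx = [vocab[t] for t in query.split()]
    x = emb[idx]                                # (seq_len, 8)
    pooled = []
    for w, f in filters.items():
        if len(idx) < w:                        # query shorter than the window
            pooled.append(0.0)
            continue
        # Slide the width-w filter across every position, then max-pool.
        acts = [float(np.sum(x[i:i + w] * f)) for i in range(len(idx) - w + 1)]
        pooled.append(max(acts))
    return np.array(pooled)                     # one feature per filter

print(features("bunk bed desk"))
```

A final softmax layer (not shown) would map this concatenated feature vector to per-class confidence scores. The width-4 filter is what lets the model react to phrases as long as four terms, at the cost of the extra computation described above.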
We didn’t want to give up the improvements in semantic understanding. At the same time, we wanted to drive down the latency to prior levels. We were able to get the best of both worlds by following one of the guiding tenets at Wayfair’s science organization: collaborate with the broader scientific community to solve problems at scale.
Mimicking the sparsity in the human brain
We were able to train our version three classifier model on commodity CPUs, while at the same time achieve a markedly lower latency rate by working with artificial intelligence startup ThirdAI (pronounced Third Eye). The startup’s BOLT AI technology uses hash-based processing algorithms to accelerate both the training and inference of neural networks.
Transformers consist of encoders, which map an input sequence into vector representations, and decoders, which use that context to produce a meaningful output. ThirdAI’s Universal Deep Transformers (UDT) can convert feature vectors in a wide variety of formats into a prediction. UDTs are designed for extreme classification tasks, where an algorithm must choose among a very large number of possible labels under uncertainty. Especially relevant to us is the low latency enabled by their solution: UDT provides inference latency of just a few milliseconds on traditional CPUs, irrespective of model size.
Our team was particularly impressed by how ThirdAI’s technology tackles this problem by developing a framework that mimics the sparsity in the human brain.
The human brain has over 100 billion neurons. Each of these neurons is connected on average to over 70,000 other neurons. Even though the brain is a densely connected system, most stimuli activate only a small number of neurons – a few “spikes” – at any given time. To date, the promise of such sparse coding has been difficult to replicate in artificial neural networks, because it is hard to train a network to know in advance exactly which neurons to spike.
The scientists at ThirdAI solved this problem with a simple and elegant method. Instead of exhaustively evaluating every neuron and then sorting the activations, they organize the neurons in computer memory so that neurons with similar activation patterns are stored close together. As neurons are updated during training, they adjust their locations.
During training, the UDT takes the activations from the previous layer and queries the computer memory for a cluster of neurons that match the current activation pattern. Only those returned neurons spike, with no exhaustive search required. This efficient associative-memory step makes the neural network dramatically faster: it takes orders of magnitude fewer FLOPs on a standard CPU to achieve the same accuracy as traditional algorithms on a GPU.
The path forward
With ThirdAI, we can train larger models that capture nuances in customer queries at lower latencies. We are currently running a series of A/B experiments comparing the ThirdAI model with our earlier approach. The initial results are promising – the ThirdAI model reduces inference time while creating a more relevant, faster browsing experience for our customers.
You might have heard the phrase, “It takes a village.” This is especially true for scientists. Solving complex problems that make a real-world impact on the lives of millions of people is hard. We are excited to continue our journey of collaborating with the larger community to develop solutions that scale. If you have an idea on how you can help fulfill our mission of helping people create their own unique sense of home, give us a shout at