Many approaches to ranking a list of products or search results are based on assigning a score to each item and sorting in descending order—in other words, greedy sorting approaches. In e-commerce, predictive models place the product that the customer is most likely to be interested in at the top, followed by the second most likely product, and so forth. But shopkeepers in physical stores know that shelf arrangement is key, and that product appeal is not a fixed quality: it can actually change depending on the context in which a product appears. For example, placing a cheaper item next to an expensive item of the same category could show them both off to their best advantage, highlighting the thriftiness of one and the luxuriousness of the other. When shelves are perfectly arranged, even if some individual products sell less, the store as a whole will make more sales. This phenomenon has been studied in marketing psychology under the heading of context effects, including such phenomena as the attraction, compromise, and similarity effects.
At Wayfair, we use a state-of-the-art deep learning model that predicts, for each product, the probability that a given customer will add it to their shopping cart (ATC) at a given time. The ATC probabilities of items on a page are used to sort that page in real time. This approach has been very successful for us, but it assumes that the local context of the product doesn’t matter. In practice, products on the website are almost always shown with neighbors, either above and below in a list or all around in a product grid. By optimizing the arrangement of the page as a whole, we can jointly improve the customer experience and site performance even further. We can train a model that is sensitive to the properties of a product’s neighbors, including price, sale flag saturation, and visual diversity, without any assumptions about the direction of impact. For example, diverse choices might help sales of wall art (where variety would make it easier to scan a wide assortment of looks), while in the category of kitchen utensils people might prefer to compare similar items (where similarity would make it easier to find exactly the right style of kitchen knife).
To accomplish this, we developed a new model—called TopShelf—that acts as an extension to this existing product ranking algorithm, and leverages the juxtaposition of products to make the list or grid as a whole shine. It thinks of browse pages as a series of “shelves,” and uses a predictive model to rearrange items between the top few shelves on the page for maximum appeal. The rest of this article explains how it was developed.
The First Challenge: Defining Context
To train a model for arranging browse pages that is sensitive to local context, we first had to define how that context would be represented, including how far it would reach. Let’s say we defined the context of a sofa on Wayfair’s sofa browse page in terms of the placement of every other couch around it. Given the 16,000 sofas in our catalogue, the number of possible contexts would quickly become unmanageable! Even if we limited the context to the 48 products that could appear on a browse page, that would still leave 48!, or about 1.24 x 10^61, possible arrangements. That is more than the number of atoms that make up the Earth!
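The arrangement count above is simply the number of orderings of 48 products. As a quick sanity check, here is a minimal sketch; the figure used for atoms in the Earth is a commonly quoted rough estimate and is an assumption, not a number from this article.

```python
import math

PAGE_SIZE = 48  # products on one browse page (from the text)

# Every ordering of the page is a distinct arrangement: 48! permutations.
arrangements = math.factorial(PAGE_SIZE)

# Rough order-of-magnitude estimate of atoms in the Earth (assumption).
ATOMS_IN_EARTH = 1.3e50

print(f"{arrangements:.2e}")          # 1.24e+61
print(arrangements > ATOMS_IN_EARTH)  # True
```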
While looking for a way to tame the combinatorial explosion of possibilities to evaluate, we noticed that most monitors show only one row of products at a time. We therefore assumed that the most relevant local context was the set of products that appear on the same row (or “shelf”) of a product grid, and that the exact order on the shelf was less important. Although other, more wide-reaching definitions of context are possible, we needed to recommend lists of products efficiently, so we chose the narrowest definition of context that could still capture the important effects of neighbors.
Scoring a Product in Context
After defining this context, the first step we took towards optimizing the page as a whole was to build a new ranking model to score products, not just for a customer at a certain point in their browsing journey, but also for all the possible contexts in which the product could appear. Of course the training data did not contain every possible context for every product, but there are enough variations in Wayfair’s product grids that we were still able to generalize the effect of a neighbor’s properties.
The training dataset we built for this model contains columns not only for product and customer properties, but also for new features derived from the other two products found on the same shelf. These features include the price of the neighbors, their sale status (did they have a “sale” flag in the corner), and how distant their product images were in a visual embedding space from the images of the target product. Using this dataset, we then trained a deep neural network, just as we do for our regular personalized ranking algorithm.
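To make the idea of neighbor-derived columns concrete, here is a minimal sketch of how such features might be computed for one target product. Everything in it is hypothetical: the `Product` record, the feature names, the choice to average over the two neighbors, and the use of Euclidean distance in the embedding space are illustrative assumptions, not a description of Wayfair's actual pipeline.

```python
import math
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class Product:
    # Hypothetical record; field names are illustrative only.
    sku: str
    price: float
    on_sale: bool
    image_embedding: Sequence[float]


def visual_distance(a: Sequence[float], b: Sequence[float]) -> float:
    # Euclidean distance between two image embeddings (assumed metric).
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def context_features(target: Product, neighbors: Sequence[Product]) -> Dict[str, float]:
    # Summarize the other products on the target's shelf by simple averages.
    n = len(neighbors)
    return {
        "neighbor_mean_price": sum(p.price for p in neighbors) / n,
        "neighbor_sale_fraction": sum(p.on_sale for p in neighbors) / n,
        "neighbor_mean_visual_distance": sum(
            visual_distance(target.image_embedding, p.image_embedding)
            for p in neighbors
        ) / n,
    }


# Toy shelf with made-up values.
bed = Product("bed-1", 400.0, False, [0.0, 0.0])
shelf_mates = [
    Product("bed-2", 500.0, True, [3.0, 4.0]),
    Product("bed-3", 700.0, False, [0.0, 0.0]),
]
features = context_features(bed, shelf_mates)
print(features)
```

In a training table, these values would sit alongside the usual product and customer columns for the target product.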
In the resulting model, the score—which represents the value of a product to a customer at a point in their journey—varies depending on the neighbors (see Figure 1 for an example). Some products are very sensitive to context, and some are not. In this example, the score for the target bed (on the left) is much higher in a case when the neighbors are more expensive, more visually distant, and have fewer sale flags.
In fact, in this example there is a steady linear relationship between visual distance of neighbors and the score assigned by the model, as well as between neighbor price and score. See Figure 2 for an illustration of this relationship.
Although it is hard to generalize about the effects of contextual features, since an effect can be modulated by other features, we often see this phenomenon of higher price and diversity translating to higher predicted ATC rates. But the neural network can capture all the higher order interactions and category-specific phenomena that alter contextual effects. Adding these contextual features provided a notable improvement in model test performance on both MSE and AUC.
TopShelf: Using the Contextual Reranker to Optimize the Page
Having a contextual ranking model allowed us to predict how well each product would do in a particular local context. However, we still needed to translate those predictions into a better page arrangement. Unlike a basic ranking algorithm that ignores context, optimizing the page as a whole is no longer as simple as sorting in descending order of score. Products must be evaluated in many possible contexts, and once placed they in turn provide the context for other products. Although the assumptions listed earlier help to cut down the possibilities, each potential shelf requires three neural network evaluations (one per product on the shelf). With 48 products per page, that comes to around 50,000 evaluations for every single browse page load, which could add unacceptable lag. By focusing on the top 15 products on each page (the top 5 shelves of 3 products), we were able to cut our search space down to 455 shelves, or 1,365 evaluations, and this number can be scaled up or down depending on performance needs.
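The counts in this paragraph follow from treating a shelf as an unordered set of three products, with one model evaluation per product on the shelf. A small sketch of the arithmetic (the function name is ours, for illustration only):

```python
import math

SHELF_SIZE = 3  # products per shelf (from the text)


def search_space(n_products, shelf_size=SHELF_SIZE):
    # Order within a shelf is assumed not to matter, so count combinations.
    shelves = math.comb(n_products, shelf_size)
    # Each product on each candidate shelf is scored once in that context.
    evaluations = shelves * shelf_size
    return shelves, evaluations


print(search_space(48))  # (17296, 51888): full page, roughly 50,000 evaluations
print(search_space(15))  # (455, 1365): top 15 products only
```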
Wayfair’s new algorithm, TopShelf, uses this context-aware model of customer behavior to arrange products in the most critical locations at the top of the page. TopShelf resembles a basic ranking algorithm, in that it greedily picks the item with the best score, and then selects the best of what’s left until no options remain. But instead of ranking products, TopShelf ranks possible shelves. As illustrated in Figure 3, when a shelf is chosen, all other shelves that contain those products are eliminated from consideration for the next round of scoring.
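The greedy procedure described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the production implementation: `greedy_shelves`, `score_in_context`, and the toy scoring function are hypothetical names, and the contextual neural network is replaced by a made-up formula that rewards pricier neighbors. Because a shelf's score depends only on its own members, scoring every candidate once and then walking the sorted list while skipping shelves that reuse a placed product is equivalent to repeated pick-and-eliminate rounds.

```python
from itertools import combinations


def greedy_shelves(products, score_in_context, shelf_size=3, aggregate=max):
    # Score every candidate shelf: each product is scored with the
    # remaining shelf members as its context.
    scored = []
    for shelf in combinations(products, shelf_size):
        item_scores = [
            score_in_context(p, [q for q in shelf if q != p]) for p in shelf
        ]
        scored.append((aggregate(item_scores), shelf))

    # Best shelves first; ties keep enumeration order (sort is stable).
    scored.sort(key=lambda pair: pair[0], reverse=True)

    placed, page = set(), []
    for _, shelf in scored:
        # Skip shelves that contain an already placed product.
        if placed.isdisjoint(shelf):
            page.append(shelf)
            placed.update(shelf)
    return page


# Toy data and a toy stand-in for the contextual model (both made up).
prices = {"a": 10, "b": 20, "c": 30, "d": 40, "e": 50, "f": 60}


def toy_score(product, neighbors):
    mean_neighbor_price = sum(prices[n] for n in neighbors) / len(neighbors)
    return 0.1 + 0.001 * (mean_neighbor_price - prices[product])


page = greedy_shelves(list(prices), toy_score)
print(page)  # [('a', 'e', 'f'), ('b', 'c', 'd')]
```

Products must be unique and hashable for the set-based elimination step to work; the `aggregate` parameter controls how per-product scores are combined into a shelf score.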
One challenge we faced while developing the algorithm was determining how to score a shelf as a whole from the scores of each product in context. A natural first approach would be to take the mean of the scores on each shelf. However, offline evaluation (described below) showed that to maximize sales for the page as a whole, it might be better to pick the shelves whose component items have the best maximum score (this concept is illustrated in Figure 4).
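The difference between the two aggregation rules is easy to see on a toy example. The scores below are invented for illustration: one shelf is uniformly decent, the other has a single standout product. The helper name `pick_shelf` is ours.

```python
from statistics import mean

# Made-up contextual scores for the three products on each candidate shelf.
even_shelf = [0.30, 0.30, 0.30]      # uniformly decent
standout_shelf = [0.60, 0.10, 0.10]  # one very strong product


def pick_shelf(shelves, aggregate):
    # Choose the shelf whose aggregated product scores are highest.
    return max(shelves, key=aggregate)


candidates = [even_shelf, standout_shelf]
print(pick_shelf(candidates, mean))  # [0.3, 0.3, 0.3]
print(pick_shelf(candidates, max))   # [0.6, 0.1, 0.1]
```

Mean aggregation favors the even shelf, while max aggregation favors the shelf with the standout item, which is the behavior the offline evaluation suggested might serve the page better.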
Offline Evaluation of Pagewide Performance Boost
The recommendation teams at Wayfair use rigorous offline evaluation to compare new models to the previous baseline before any test launch, computing metrics such as nDCG and recall from historical data. However, these metrics reward algorithms only for placing the item that was actually selected by the customer as high in the ranking as possible, which ignores the effect of the nearby products on customer behavior. A metric that captures these effects and rewards boosting the overall ATC rate of the page should better match the in-practice performance of the KPIs we care most about—overall ATC rate for a page of products.
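To see why a metric like nDCG is blind to context, consider the standard definition with a single relevant item (the one the customer added to cart). The sketch below uses the common linear-gain form of DCG; it is a textbook formulation, not necessarily the exact variant used by our teams.

```python
import math


def dcg(relevances):
    # Discounted cumulative gain: each relevance is discounted by log2(rank + 1).
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances, start=1)
    )


def ndcg(relevances):
    # Normalize by the DCG of the ideal (descending relevance) ordering.
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


# 1 marks the product the customer added to cart, 0 marks everything else.
carted_first = ndcg([1, 0, 0])
carted_second = ndcg([0, 1, 0])
print(carted_first, carted_second)
```

The score depends only on the rank of the carted item. Two pages that place it in the same slot receive the same nDCG no matter which products surround it, which is exactly the effect this metric cannot capture.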
As such, we trained a model to predict pagewide ATC rate from a combination of individual product contextual ranker scores, positional effects, and product category, since positions higher on a page receive much more attention and ATC rate differs greatly between categories. In the high-traffic categories we examined, TopShelf boosted predicted pagewide ATC rates to approximately 5% above the baseline set by the basic ranking algorithm (see Figure 5 for a visualization by class).
During this development period we also worked to optimize the runtime in production, since whatever time the algorithm takes to run is added on top of the time needed for the existing recommendation system.
We Are Never Done
TopShelf is one approach to the cutting-edge field of slate optimization, and there are a number of emerging technologies that we hope to evaluate as alternative approaches in the future. TopShelf leverages our existing recommendation engine to efficiently optimize the most profitable slots at the top of browse pages, and it will provide a crucial first test of the benefit of considering context. It is significant to the company as a new direction in recommendation strategy: a shift from optimizing one product at a time to optimizing the page as a whole. Once the concept has been proven for browse pages, it could be applied to every part of the website that displays products in context: sales pages, emails, “customer also viewed” carousels, and so on. Many other features of the neighboring products can also be added. Combined with the power of deep neural networks to capture interactions among features, this lets us capitalize on every context effect, not just the ones discovered and named in marketing psychology, whether those effects are powerful or subtle, personalized or generic, positive or negative, direct or interactive with other effects. Context matters!