How data scientists at Wayfair build scalable ML systems to programmatically optimize marketing decisions.
As an eCommerce company, Wayfair runs many ad campaigns each day in a variety of formats, including digital ads on display networks, social media, and online video, as well as physical ads such as direct mail postcards and catalogs. We also use our marketing technology to optimize organic ways of reaching our customers, such as email and push notifications. Every day we make numerous marketing decisions: what message or content to show, where to show it, whom to show it to, and how much to pay. In this blog post, I’ll explain a few common machine learning (ML) approaches in marketing data science, and how data scientists at Wayfair build scalable ML systems to programmatically optimize marketing decisions.
General Propensity Modeling
To improve marketing performance, one common ML approach is to build general propensity models, which predict how likely a person is to buy or engage with certain products. These models are usually trained on observational data, so they can be retrained and updated in production without running randomized experiments. This approach is highly scalable and has relatively low model maintenance costs. Once a robust ML scoring system is built on foundational propensity models, adding more models or supporting different marketing use cases takes relatively little effort (Figure 1). We have successfully applied general propensity models to improve the business performance of multiple marketing campaigns at Wayfair. That said, general propensity models may not perform well for every use case. In addition, rolling out the same models across different marketing programs might lead to over-messaging the same audience, which can hurt our customers’ experience and lead to cross-program competition and cannibalization. These problems can be solved by adding a treatment-optimization system on top of the general propensity modeling system, which will be discussed later in this article.
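As a rough illustration of what a propensity model does (not Wayfair's production code), the sketch below fits a tiny logistic regression with plain gradient descent and scores two customers. The two features (site visits in the last week, past orders) and all data values are made up for the example; a real system would use an ML library and far richer features.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_propensity(rows, labels, lr=0.1, epochs=2000):
    """Fit a logistic regression with batch gradient descent.

    rows: list of feature lists; labels: 0/1 purchase outcomes.
    """
    n = len(rows)
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            # Prediction error for this row under the current parameters.
            err = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def score(weights, bias, x):
    """Predicted purchase probability for one customer."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

# Hypothetical observational data: [site visits last week, past orders]
X = [[0, 0], [1, 0], [2, 1], [5, 2], [0, 1], [6, 3], [1, 0], [4, 1]]
y = [0, 0, 1, 1, 0, 1, 0, 1]

weights, bias = fit_propensity(X, y)
high = score(weights, bias, [5, 2])  # an engaged customer
low = score(weights, bias, [0, 0])   # an inactive customer
print(f"engaged: {high:.2f}, inactive: {low:.2f}")
```

Because the labels come from observed behavior rather than an experiment, a model like this can be refit whenever new data arrives, which is what makes the approach cheap to maintain.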
Specialized Response/Uplift Modeling
In contrast to the general solution of propensity modeling, a different approach is to develop specialized solutions for each marketing program. In this approach, a number of marketing programs (or “channels”) are designed according to the customer journey, and audience eligibility for each program is defined by business rules. For example, a retargeting program aims to drive conversions from people who visited the site in the past X days; a loyalty program aims to drive repeat orders from customers who made a purchase in the past Y months. Data scientists can then build channel-specific response models to target people who are most likely to respond to a specific type of ad and convert. To drive true incremental revenue, we also build uplift models to target “persuadables”, whose conversions are caused by ad treatments [1, 2]. Channel-specific models generally perform well for well-designed marketing programs, since each model is developed and back-tested with data that is most representative of that channel’s operations. Note that the success of channel-specific models relies on a well-scoped business strategy that outlines the different marketing programs. For example, it is impossible to build a working model for a program with ill-defined audience eligibility or success metrics. Compared with general propensity models, channel-specific models are more costly to maintain and less generalizable outside their initial scope. In particular, building robust uplift models requires regular data collection through randomized controlled trial (RCT) experiments on each program (Figure 2), which incurs both operational costs and the opportunity cost of not showing ads to some customers. Consequently, uplift modeling is hard to scale to hundreds of marketing campaigns while adapting to constant changes in the ad environment, e.g. updates in business operations and privacy updates in the advertising industry.
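To make the difference between response and uplift concrete, here is a minimal sketch that estimates uplift per segment from RCT data as the treated conversion rate minus the control conversion rate. This is a simple difference-in-means estimate, not a learned uplift model, and the segment names and counts are invented for illustration.

```python
from collections import defaultdict

def uplift_by_segment(records):
    """Estimate uplift per segment from RCT records.

    records: iterable of (segment, treated, converted) tuples, where
    treated and converted are booleans.
    """
    # Per segment and arm: [conversions, customers]
    counts = defaultdict(lambda: {True: [0, 0], False: [0, 0]})
    for segment, treated, converted in records:
        cell = counts[segment][treated]
        cell[0] += int(converted)
        cell[1] += 1
    uplift = {}
    for segment, arms in counts.items():
        (conv_t, n_t), (conv_c, n_c) = arms[True], arms[False]
        if n_t == 0 or n_c == 0:
            continue  # uplift is undefined without both arms
        uplift[segment] = conv_t / n_t - conv_c / n_c
    return uplift

def make_arm(segment, treated, n, conversions):
    """Helper that builds n synthetic records, the first `conversions` converting."""
    return [(segment, treated, i < conversions) for i in range(n)]

# Hypothetical experiment results for two audience segments.
records = (
    make_arm("recent_visitor", True, 100, 30)
    + make_arm("recent_visitor", False, 100, 10)
    + make_arm("loyal_buyer", True, 100, 40)
    + make_arm("loyal_buyer", False, 100, 38)
)

uplift = uplift_by_segment(records)
best_segment = max(uplift, key=uplift.get)
print(uplift, best_segment)
```

In this made-up data, loyal buyers convert more often when shown the ad (40% versus 30%), so a response model would favor them. Most of them would have bought anyway, though, so the incremental effect is larger for recent visitors. The control arm that makes this estimate possible is also the source of the opportunity cost described above.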
Multi-layer Marketing ML Platform
Both general propensity models and specialized response/uplift models leverage supervised ML algorithms to learn from historical data what the characteristics of a given customer are and whether we should send a message to that customer. These models do not provide the best “global optimization”, since they do not take into account the full context of “customer” x “ad treatment” interactions when optimizing multichannel marketing decisions, e.g. which treatment options are available for a given customer, and how to assign different treatments across all customers under budget allocation constraints. In addition, always assigning the “best ad” predicted for a given customer may not be the best solution, since it can cause ad fatigue that leads to a drop in campaign performance over time. To solve these challenges, we are developing a multi-layer ML platform to optimize the aggregated rewards (e.g. measured by a business KPI like ROAS) across marketing campaigns. For example, our Paid Media ML platform (“WayLift”) can leverage reinforcement learning algorithms to balance the tradeoff between exploiting the “best ads” and exploring alternative ad treatments. By constantly observing the rewards for recent treatments, WayLift can quickly adapt to changes in the environment (e.g. new campaigns, refreshed creatives, updated vendor processes) and update the optimal treatment decisions (Figure 3).
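The explore/exploit tradeoff can be illustrated with a small multi-armed bandit. The sketch below uses Thompson sampling with a Beta-Bernoulli model, chosen here only as a familiar example; this post does not say which algorithms WayLift uses, and the three conversion rates are invented.

```python
import random

def thompson_sampling(true_rates, rounds, seed=0):
    """Simulate Thompson sampling over ads with Bernoulli rewards.

    true_rates: hidden conversion rate of each ad (unknown to the learner).
    Returns how many times each ad was shown.
    """
    rng = random.Random(seed)
    n_arms = len(true_rates)
    successes = [0] * n_arms
    failures = [0] * n_arms
    for _ in range(rounds):
        # Draw a plausible conversion rate for each ad from its Beta posterior.
        sampled = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(n_arms)
        ]
        # Show the ad whose sampled rate is highest. Uncertain ads still win
        # sometimes, which is where exploration comes from.
        arm = max(range(n_arms), key=sampled.__getitem__)
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return [s + f for s, f in zip(successes, failures)]

# Hypothetical conversion rates for three ad treatments.
pulls = thompson_sampling([0.05, 0.10, 0.20], rounds=3000)
print(pulls)
```

Over time most impressions go to the strongest ad, while the others keep receiving occasional traffic. Because the posterior is updated after every observed reward, the same loop would shift traffic if the underlying rates changed.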
More details of WayLift will be explained in future blog posts. Briefly, a multi-layer marketing ML platform is composed of three types of models (Figure 4):
(1) Customer Scoring: A group of general propensity models provides the fundamental knowledge about our customers. Each model is trained on observational data to help us understand what resonates with a customer, from affinity for different styles, to interest in particular products or categories (beds, sofas, etc.), to where in the purchase cycle the customer is likely to be and what price points will resonate with them. The customer scoring models can be regularly retrained, usually at a quarterly or yearly cadence, without interrupting business-as-usual (BAU) operations, to capture the newest data trends. Each model can be directly integrated into marketing tests (as mentioned above), or used as a “base model” in the WayLift platform. An interpretable low-dimensional customer vector can be constructed from the outputs of all the base models to provide the full context around a customer, which the “decision optimization” models use to determine the optimal treatments.
(2) Decision Optimization: A set of rules and/or ML models is applied on top of the customer scoring models to improve the performance of marketing campaigns. The implementation details are case dependent (see examples in Figure 4). In some cases, we develop online and batch learning systems to constantly update and deploy “optimization models” that algorithmically optimize treatment decisions. Each optimization model takes the output(s) of one or more prediction models as customer context inputs, and decides the best treatment for similar customers based on their past engagement and the outcomes of various treatments. Since the system is set to regularly refresh and deploy models with newly available data (usually at a daily or weekly cadence), it will automatically adapt to industry and business updates to maintain strong performance.
(3) Feedback Generation: A set of forecasting models produce “delayed rewards” as feedback for the treatment optimization models to learn and improve decisions. Each model predicts the longer term financial impact of our decisions as early as possible, based on near-term engagement metrics. For example, we have some models to forecast business KPIs (e.g. 60-day GRS) from short-term upper-funnel metrics (e.g. clicks, product page views), and other models to predict changes in customer lifetime value (CLV) upon certain events (e.g. app installs, service signups). These models allow the reinforcement learner to learn quickly while still optimizing for longer term impact.
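The three layers above can be sketched together in a few lines. Everything here is a simplified assumption for illustration: the base-model names, the single-feature least-squares forecaster, and the greedy per-segment policy are stand-ins, not a description of how WayLift is implemented.

```python
from collections import defaultdict

# Hypothetical base-model names; each model outputs a score in [0, 1].
BASE_MODELS = ("style_affinity", "category_interest", "purchase_stage")

def customer_vector(scores):
    """Layer 1: stack base-model outputs in a fixed, interpretable order."""
    return [scores[name] for name in BASE_MODELS]

def fit_reward_forecaster(clicks, revenue):
    """Layer 3: least-squares line from early clicks to long-term revenue."""
    n = len(clicks)
    mean_x = sum(clicks) / n
    mean_y = sum(revenue) / n
    var_x = sum((x - mean_x) ** 2 for x in clicks)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(clicks, revenue))
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return lambda c: intercept + slope * c

class SegmentPolicy:
    """Layer 2: tracks mean forecast reward per (segment, treatment)."""

    def __init__(self, treatments):
        self.treatments = list(treatments)
        self.totals = defaultdict(float)
        self.counts = defaultdict(int)

    @staticmethod
    def segment(vector):
        # Coarse context: index of the customer's strongest base-model score.
        return max(range(len(vector)), key=vector.__getitem__)

    def update(self, vector, treatment, reward):
        key = (self.segment(vector), treatment)
        self.totals[key] += reward
        self.counts[key] += 1

    def best_treatment(self, vector):
        seg = self.segment(vector)

        def mean_reward(treatment):
            n = self.counts[(seg, treatment)]
            return self.totals[(seg, treatment)] / n if n else 0.0

        # Purely greedy; a real system would also explore.
        return max(self.treatments, key=mean_reward)

# Made-up history: early clicks versus revenue observed much later.
forecast = fit_reward_forecaster(clicks=[0, 1, 2, 3], revenue=[0.0, 20.0, 40.0, 60.0])

policy = SegmentPolicy(["display", "email"])
customer = customer_vector(
    {"style_affinity": 0.9, "category_interest": 0.4, "purchase_stage": 0.2}
)
# Feedback arrives as forecast rewards computed from early clicks.
policy.update(customer, "display", forecast(1))
policy.update(customer, "email", forecast(3))
choice = policy.best_treatment(customer)
print(choice)
```

The point of the forecaster is timing: the policy can be updated as soon as clicks are observed, instead of waiting weeks for the long-term outcome it is ultimately optimizing.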
In this blog post, we explained a few common ML approaches for improving marketing campaign performance. The pros and cons of each approach are summarized in Table 1 for the readers’ reference. We also introduced WayLift as a multi-layer ML platform that can achieve both scalability of ML solutions and strong performance of marketing campaigns. In the next blog post, we’ll explain in more detail how Wayfair data scientists collaborate with ad tech engineers and marketers to build WayLift. Stay tuned to learn more!
Table 1. Pros and Cons of different ML approaches for improving marketing campaign performance
If you find our work interesting, please connect with us! We’re looking for talented data scientists, machine learning engineers, and managers to join our team and lead innovations in marketing data science and ad tech at Wayfair! Please find job descriptions below: