Wayfair Tech Blog

An Inside Look at Wayfair’s Data Engine, Scribe

Today, many companies rely heavily on third-party event tracking platforms for collecting event data. Perhaps the most well-known of these is Google Analytics. However, at Wayfair, we chose to go another route, largely eschewing the third-party vendor model to develop our own first-party event tracking platform. It’s called Scribe.

Scribe was developed to serve as the primary event data platform at Wayfair: a place where teams can register, publish, stream, and consume event data. Scribe currently processes as much as 20 terabytes of data every day, with peaks of 300,000 events per second. The platform also provides monitoring capabilities that give data producers real-time visibility into their data publication patterns. Today, Scribe powers teams across Wayfair, including those delivering shoppers’ experiences, managing suppliers, and optimizing marketing to drive revenue for Wayfair.
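
To put those volumes in perspective, here is some back-of-envelope arithmetic. It is a sketch under two assumptions that do not come from this post: a terabyte is taken as 10^12 bytes (decimal units), and the daily volume and the peak rate are assumed to describe the same event stream.

```python
# Back-of-envelope arithmetic for the Scribe volumes quoted above.
# Assumptions (not from the original post): 1 TB = 10**12 bytes, and the
# 20 TB/day figure covers the same events counted in the peak rate.

SECONDS_PER_DAY = 24 * 60 * 60          # 86,400 seconds
DAILY_BYTES = 20 * 10**12               # "as much as 20 terabytes of data every day"
PEAK_EVENTS_PER_SECOND = 300_000        # "peaks of 300,000 events per second"


def average_ingest_bytes_per_second(daily_bytes: int) -> float:
    """Average bandwidth if the daily volume were spread evenly over 24 hours."""
    return daily_bytes / SECONDS_PER_DAY


def min_average_event_size(daily_bytes: int, peak_eps: int) -> float:
    """Lower bound on the mean event size in bytes.

    Even if events arrived at the peak rate all day long, at most
    peak_eps * 86,400 events could carry daily_bytes, so the mean event
    must be at least daily_bytes divided by that count.
    """
    max_events_per_day = peak_eps * SECONDS_PER_DAY
    return daily_bytes / max_events_per_day


if __name__ == "__main__":
    bandwidth = average_ingest_bytes_per_second(DAILY_BYTES)
    size = min_average_event_size(DAILY_BYTES, PEAK_EVENTS_PER_SECOND)
    print(f"average ingest: {bandwidth / 10**6:.0f} MB/s")
    print(f"max events/day at peak rate: {PEAK_EVENTS_PER_SECOND * SECONDS_PER_DAY:,}")
    print(f"implied mean event size: at least {size:.0f} bytes")
```

Under those assumptions, the script reports an average of about 231 MB/s and at most 25,920,000,000 events per day, which implies a mean event size of at least roughly 772 bytes.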

Scribe enables teams across Wayfair to monitor the actions taking place on the site and within Wayfair’s internal systems. For example, business teams may look for answers to questions such as these:

  • Did a customer visit a particular page on the Storefront website, were they served the control or variant arm of an A/B test, and if so, what was the end impact on engagement?
  • Did Wayfair’s systems make a decision and serve it to the customer, and if so, was the decision in line with our expectations?
  • Did our notification system decide to send an email to a new customer, and if not, what other event led to that decision?

Figure: High-Level Scribe Platform Diagram
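
The first question above comes down to joining exposure events with engagement events. The following is a minimal sketch of that join, not Scribe’s actual consumer API. The `Event` shape, the `ab_test_exposure` event name, and the choice of `add_to_cart` as the engagement signal are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Event:
    # Hypothetical minimal event shape; field names are illustrative only.
    event_name: str
    customer_id: str
    properties: dict


def engagement_rate_by_arm(events, test_id, engagement_event):
    """For one A/B test, return {arm: fraction of exposed customers who engaged}.

    A customer counts as engaged if they emitted `engagement_event` at any
    point. This is a simplification: a real analysis would usually require
    the engagement to happen after the exposure.
    """
    arm_of = {}      # customer_id -> arm the customer was served
    engaged = set()  # customer_ids that emitted the engagement event
    for e in events:
        if e.event_name == "ab_test_exposure" and e.properties.get("test_id") == test_id:
            arm_of.setdefault(e.customer_id, e.properties["arm"])  # first exposure wins
        elif e.event_name == engagement_event:
            engaged.add(e.customer_id)

    exposed = defaultdict(int)
    converted = defaultdict(int)
    for customer, arm in arm_of.items():
        exposed[arm] += 1
        if customer in engaged:
            converted[arm] += 1
    return {arm: converted[arm] / exposed[arm] for arm in exposed}
```

For example, if one control customer and two variant customers are exposed and only one variant customer adds to cart, the function returns `{"control": 0.0, "variant": 0.5}`.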

Why First-Party Versus Third-Party?

Earlier, we touched on our decision to develop a first-party platform. Implementing and maintaining an in-house event platform carries additional technology costs, but the pattern also offers significant benefits.

At Wayfair, we focus on providing a seamless experience for our customers, which means that data must be at the root of many of our decisions. Here’s how Scribe delivers.

Single View of the Customer

Scribe enables teams across applications and domains to publish data to one source of truth. By doing so, teams within one organization can view the end-to-end journey of a customer. This is primarily enabled through data standardization concepts that we’ve enforced at the ingestion layer to ensure a consistent set of events across domains. For example, when referencing data related to a user, it’s important to have consistent identifiers that can be easily understood. Setting a standard that can be enforced at registration and ingestion helps encourage teams to publish data that can be consumed for valuable insights.
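
As an illustration of enforcing standards at registration and ingestion, here is a small sketch of a schema registry. It assumes a hypothetical set of standard identifier fields (`customer_id`, `event_time_ms`) that every event must carry. The class and field names are illustrative and are not Scribe’s actual conventions.

```python
from dataclasses import dataclass

# Hypothetical standard fields every event must carry, so that consumers can
# join events from different domains on the same identifiers.
STANDARD_FIELDS = {"customer_id": str, "event_time_ms": int}


@dataclass
class EventSchema:
    name: str
    fields: dict  # event-specific field name -> expected Python type

    def all_fields(self):
        """Standard fields first, then the event-specific ones."""
        return {**STANDARD_FIELDS, **self.fields}


class SchemaRegistry:
    def __init__(self):
        self._schemas = {}

    def register(self, schema):
        # Registration-time check: an event may not redefine a shared identifier.
        clash = STANDARD_FIELDS.keys() & schema.fields.keys()
        if clash:
            raise ValueError(f"{schema.name} redefines standard fields: {sorted(clash)}")
        self._schemas[schema.name] = schema

    def validate(self, event_name, payload):
        """Ingestion-time check. Returns a list of problems; empty means accept."""
        schema = self._schemas.get(event_name)
        if schema is None:
            return [f"unregistered event: {event_name}"]
        expected = schema.all_fields()
        problems = [f"missing field: {f}" for f in expected if f not in payload]
        problems += [f"unexpected field: {f}" for f in payload if f not in expected]
        problems += [
            f"wrong type for {f}: expected {t.__name__}"
            for f, t in expected.items()
            if f in payload and not isinstance(payload[f], t)
        ]
        return problems
```

Returning a list of problems, rather than raising on the first one, is a design choice here: it lets a monitoring view show producers everything that is wrong with an event at once.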

As a result, a data consumer, such as a data scientist working to improve the customer experience, can build models with high confidence, knowing that data drawn from different domains shares consistent attributes, such as a unique identifier for each customer.

Control Over the Data

In addition to providing a single view, the first-party platform lets us maintain ownership of the data and infrastructure that gathers the event information. This provides peace of mind that our data is only collected in-house and stored in databases and storage technologies that we control.

Next is the event data itself. Third-party platforms can capture event data and provide details in areas such as customer page views, clicks, the number of visitors over a given period, and other boilerplate signals. This data is valuable, but these third-party offerings are preconfigured, which means you have little control over things such as how a customer is defined or how the data is organized and stored. Teams also have limited control over which signals are captured in the first place.

As you can infer, this control is important because with it comes greater data richness, granularity, and quality. This is crucial, especially when introducing concepts about who our users are. For example, a third-party system may have a general concept of who a user is and allow for variations, but it might not be as specific as Wayfair would like. With Scribe, we can define a visitor type attribute and provide more detailed types of users, such as an active customer versus a non-active customer. By maintaining this platform in-house, we also enable other teams within Wayfair to build supplemental platforms that are interoperable with Scribe. Some examples include user segmentation, identity services, and more.
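
A visitor type attribute like the one described above could be modeled as an enumeration attached to each event. The sketch below is only an illustration: the `NEW_VISITOR` value, the 365-day activity window, and the rule that a customer is active if their last order falls within that window are all hypothetical and do not describe Wayfair’s actual definitions.

```python
from datetime import date, timedelta
from enum import Enum
from typing import Optional


class VisitorType(Enum):
    # ACTIVE_CUSTOMER and NON_ACTIVE_CUSTOMER mirror the examples in the text;
    # NEW_VISITOR is a hypothetical extra value for visitors with no orders.
    NEW_VISITOR = "new_visitor"
    ACTIVE_CUSTOMER = "active_customer"
    NON_ACTIVE_CUSTOMER = "non_active_customer"


def classify_visitor(last_order: Optional[date], today: date,
                     active_window_days: int = 365) -> VisitorType:
    """Hypothetical rule: active if the last order is within the window."""
    if last_order is None:
        return VisitorType.NEW_VISITOR
    if today - last_order <= timedelta(days=active_window_days):
        return VisitorType.ACTIVE_CUSTOMER
    return VisitorType.NON_ACTIVE_CUSTOMER


def with_visitor_type(event: dict, last_order: Optional[date], today: date) -> dict:
    """Return a copy of the event enriched with a visitor_type attribute."""
    return {**event, "visitor_type": classify_visitor(last_order, today).value}
```

Because the definition lives in one place, every team that publishes or consumes events sees the same meaning of "active," which a preconfigured third-party user model would not guarantee.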

How Does Scribe Handle Privacy?

As a company focused on delivering relevant experiences to our customers, we work hard to ensure their privacy preferences are respected across the data lifecycle. As you might expect, because the platform collects and stores customer data, this is a significant area of focus for Scribe. In fact, many of the benefits of developing a first-party platform outlined above also apply to privacy.

Control of the Data

By having complete control over the event creation process, we can define the exact schema for each event and set event-specific data retention periods. This means we capture only the data required to fulfill the purpose for which it is collected, and we retain it only as long as it is needed; both are key data minimization principles. For example, we could configure a page view event so that it captures only customer ID, page ID, time viewed, and IP address, and then set its retention to 1.5 years. It’s unlikely we could get this level of control with third-party tools, which often have preconfigured event schemas and retention periods.
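
The page view example can be expressed as an event definition that pairs an allow-list of fields with a retention period. The sketch below assumes hypothetical names (`EventDefinition`, `minimize`, `expires_at`) and approximates 1.5 years as 547.5 days. It shows the idea only and does not reproduce Scribe’s implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class EventDefinition:
    name: str
    allowed_fields: frozenset  # the only fields this event may collect
    retention: timedelta       # how long a collected event may be kept

    def minimize(self, payload: dict) -> dict:
        """Drop any field the event is not registered to collect."""
        return {k: v for k, v in payload.items() if k in self.allowed_fields}

    def expires_at(self, collected_at: datetime) -> datetime:
        """Moment after which the event should be deleted."""
        return collected_at + self.retention


# The page view example from the text; field names are illustrative.
PAGE_VIEW = EventDefinition(
    name="page_view",
    allowed_fields=frozenset({"customer_id", "page_id", "time_viewed", "ip_address"}),
    retention=timedelta(days=365 * 1.5),  # about 1.5 years (547.5 days)
)
```

Silently dropping unregistered fields is one option. Rejecting the whole event instead would surface producer mistakes sooner, at the cost of losing the event.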

Single View of the Customer

Developing Scribe as the core event data platform at Wayfair and having it support multiple domains (from Storefront to Logistics) also delivers privacy benefits. Specifically, it allows us to apply privacy protections, such as those outlined above, consistently across every domain that uses Scribe as its event storage platform. When event data from different domains is stored in separate platforms across the business, applying the same protections becomes more complex and requires greater coordination across teams and platforms.
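
To illustrate why a single platform simplifies this, the sketch below applies one central retention table to events from several domains in a single pass. The `StoredEvent` shape, the `shipment_scan` event, its two-year retention, and the choice to purge events with no registered policy are all hypothetical assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class StoredEvent:
    domain: str            # e.g. "storefront" or "logistics" (illustrative)
    event_name: str
    collected_at: datetime
    payload: dict


# Hypothetical central retention policy keyed by event name. Because every
# domain publishes through one platform, this one table governs all of them.
RETENTION = {
    "page_view": timedelta(days=365 * 1.5),
    "shipment_scan": timedelta(days=365 * 2),
}


def purge_expired(events, now):
    """Split events into (kept, purged) using the same policy for every domain.

    Events whose name has no registered retention are purged rather than
    kept, a conservative default chosen for this sketch.
    """
    kept, purged = [], []
    for e in events:
        limit = RETENTION.get(e.event_name)
        if limit is not None and now - e.collected_at <= limit:
            kept.append(e)
        else:
            purged.append(e)
    return kept, purged
```

With separate per-domain stores, each store would need its own copy of this policy and its own purge job, and keeping those copies in sync is exactly the coordination cost described above.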

Conclusion

We are thrilled with our decision to develop our first-party platform. Since its launch, Scribe’s impact has exceeded our expectations while serving as yet another shining example of Wayfair’s commitment to innovation and growing a team of talented and visionary technologists. If you’ve been seeking this type of challenge and environment, Wayfair is the place for you. We encourage you to reach out now to hear more.
