How Wayfair Tests Software for the Most-Shopped Catalog of Home Goods in the World

The online retail market is competitive and demanding, so you need to deliver new features to your customers as quickly as possible. In this environment, every wasted minute can cost you thousands of dollars.

At Wayfair, we have dozens of different tools and applications (internal and external) and more than a hundred teams that support them. We release our code to production hundreds of times a day with confidence in the quality of the products. In this article, I focus on the Wayfair Catalog Engineering team.

In Catalog Engineering, we create a flexible, scalable, and user-centric marketplace platform. We perpetually curate the largest, most engaging, and most shopped catalog of home goods in the world. We efficiently collect, transform, and optimize product data to power a market-defining consumer catalog experience so that consumers discover the products they’ll love.

This article is about testing, so let's meet the Catalog Quality Engineering team.

Catalog Quality Engineering enables Catalog Engineering teams with excellence in test and development strategy through automation and test pipeline infrastructure. We partner with many Wayfair teams and working groups to advance quality training, tooling, and automated frameworks.

Let’s start our journey from the very beginning of a feature life-cycle, where testing starts when a Product Manager (PM) comes up with the idea of a new feature and shares it with the team.

We analyze requirements

The cost to fix bugs found during the testing phase could be 15x more than the cost of fixing those found during the requirements or design phase. We train the teams on what to look for during requirements analysis and share best practices (acceptance-test-driven development, behavior-driven development) to make sure everyone is on the same page. Through conversations and collaboration between key stakeholders, we discover the value of the proposed functionality and build the right software.

Quality Engineers (QEs) educate teams on how to write acceptance criteria in BDD style where applicable, and on how to analyze requirements for completeness, correctness, consistency, clarity, and testability.
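To make the BDD style concrete, here is a minimal sketch of one acceptance criterion written as Given/When/Then and turned into an executable check. The Product class, its fields, and the is_publishable rule are hypothetical names invented for illustration; they are not Wayfair's actual data model or business rules.

```python
from dataclasses import dataclass, field


@dataclass
class Product:
    """Hypothetical catalog product; field names are illustrative only."""
    sku: str
    title: str = ""
    image_urls: list = field(default_factory=list)


def is_publishable(product: Product) -> bool:
    """Assumed rule: a product needs a non-blank title and at least one image."""
    return bool(product.title.strip()) and len(product.image_urls) > 0


def test_product_without_images_is_not_publishable():
    # Given a product that has a title but no images
    product = Product(sku="SKU-1", title="Oak Coffee Table")
    # When we check whether it can be published
    result = is_publishable(product)
    # Then it is rejected
    assert result is False


def test_product_with_title_and_image_is_publishable():
    # Given a product that has a title and one image
    product = Product(
        sku="SKU-2",
        title="Oak Coffee Table",
        image_urls=["https://example.com/table.jpg"],
    )
    # When we check whether it can be published
    result = is_publishable(product)
    # Then it is accepted
    assert result is True
```

Writing the criterion this way forces the requirement to be testable: if the team cannot fill in the Given, When, and Then, the requirement is probably incomplete or ambiguous.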

We use static code analysis, vulnerability scanners, and code reviews

“Quality comes not from inspection, but from the improvement of the production process.”

- W. Edwards Deming

Quality is not an afterthought. It must go beyond the product, and we cannot add it at or after release or inspect it into the product. That is why, in our department, we heavily invest in the tools and processes that help us to prevent defects or at least identify issues as early as possible in the process.

We actively use tools and platforms for continuous inspection of code quality to perform automatic reviews with static analysis of code to detect bugs, code smells, and security vulnerabilities. These approaches help us to catch tricky bugs, prevent undefined behavior (impacting end-users), fix vulnerabilities that compromise our apps, and make sure our codebase is clean and maintainable with minimal tech debt.

We write unit tests

Unit testing gives developers a mechanism for producing self-documenting code, gives us a higher level of quality in our software, and uncovers problems early. We cultivate a culture of writing unit tests for new features within the sprint and making them part of the Definition of Done. We set quality gates on the unit test coverage for newly added functionality and visualize our progress in expanding coverage.
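As a rough illustration of a quality gate on coverage for newly added code, the sketch below computes the share of changed lines that tests executed and compares it with a threshold. The function names and the 80% default are assumptions made for this example; real gates are typically configured in a code-quality platform rather than hand-written.

```python
def new_code_coverage(changed_lines: set, covered_lines: set) -> float:
    """Percentage of changed lines that were executed by the test suite."""
    if not changed_lines:
        # Nothing new to cover, so the gate has nothing to object to.
        return 100.0
    covered_new = changed_lines & covered_lines
    return 100.0 * len(covered_new) / len(changed_lines)


def passes_quality_gate(
    changed_lines: set, covered_lines: set, threshold: float = 80.0
) -> bool:
    """True when coverage of the new code meets the (assumed) threshold."""
    return new_code_coverage(changed_lines, covered_lines) >= threshold


# Example: four lines were added; tests executed three of them.
changed = {10, 11, 12, 13}
covered = {10, 11, 12, 99}
print(new_code_coverage(changed, covered))    # 75.0
print(passes_quality_gate(changed, covered))  # False with the 80% default
```

Gating on new code only, rather than on the whole codebase, lets a team raise the bar for fresh work without being blocked by legacy code that was never covered.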

We automate only what matters most

At Wayfair we take care to differentiate automated tests and test automation. Automated tests are just scripts that help you to avoid testing manually, while Test Automation is about how we write and apply tests as a key element of the software development life cycle (SDLC).

Most of the teams do not produce in-sprint test automation for newly added features, especially for GUI-related functionality. This is intentional; we automate where the return on investment is high.

We focus on isolated unit tests and isolated component integration tests. GUI and integration tests are slow, and they do not put pressure on design. They are expensive to write and maintain, and they are also fragile. With so many teams and applications supported, it is complex to have a stable Development environment for comprehensive testing. We build tools that allow us to run tests locally within our own instances of shared services so we can manipulate data any way we want and be confident in the stability of the environment and tests. I can’t emphasize this enough: tests are either trustworthy or useless.
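One common way to get this kind of isolation is to replace a shared service with an in-memory fake whose data the test fully controls. The sketch below assumes a hypothetical pricing service and a cart_total function; both are invented for illustration and do not describe Wayfair's internal tooling.

```python
class InMemoryPriceService:
    """Test double for a hypothetical shared pricing service.

    Prices are in cents so the arithmetic stays exact.
    """

    def __init__(self, prices: dict):
        self._prices = dict(prices)

    def get_price(self, sku: str) -> int:
        return self._prices[sku]


def cart_total(skus: list, price_service) -> int:
    """Code under test: depends only on the service's get_price method."""
    return sum(price_service.get_price(sku) for sku in skus)


def test_cart_total_uses_prices_from_service():
    # The test owns its data, so no shared environment can make it flaky.
    fake = InMemoryPriceService({"CHAIR": 1000, "LAMP": 250})
    assert cart_total(["CHAIR", "LAMP", "CHAIR"], fake) == 2250
```

Because the code under test receives its dependency as an argument, the same function runs unchanged against the real service in production and against the fake in tests.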

We visualize the health of the product and test automation

We acquire, aggregate, and analyze automated test results and metrics (code coverage, passing rate, performance metrics, etc.) from development and production environments to visualize the product's health. Based on that data, we create dashboards with charts and graphs and display them on TVs across our office so that everyone feels engaged in perfecting the quality of our products.
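As a small example of the aggregation step, the sketch below turns raw test results into a passing rate per suite, the kind of number a dashboard might chart. The input shape (a list of suite name and pass/fail pairs) is an assumption for illustration; real pipelines would read this from CI reports.

```python
from collections import defaultdict


def pass_rate_by_suite(results: list) -> dict:
    """Map each suite name to its passing rate in percent.

    `results` is a list of (suite_name, passed) tuples.
    """
    totals = defaultdict(lambda: [0, 0])  # suite -> [passed, total]
    for suite, passed in results:
        totals[suite][1] += 1
        if passed:
            totals[suite][0] += 1
    return {suite: 100.0 * p / t for suite, (p, t) in totals.items()}


results = [
    ("unit", True), ("unit", True), ("unit", False), ("unit", True),
    ("api", True),
]
print(pass_rate_by_suite(results))  # {'unit': 75.0, 'api': 100.0}
```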

We enable continuous delivery through deployment pipelines

Automated tests are almost useless without proper integration into the delivery process. We design and build CI/CD pipelines for all of our tools to make it possible to deliver new features to our customers with a minimum of manual manipulation.

We have infrastructure that allows us to deliver features one by one. We can potentially start developing functionality in the morning and release it to production the same day with confidence in its quality.

We use incremental rollout for “big” features

Rolling out a new tool or new features to millions of customers could be a risky venture. Fortunately, we have tools and techniques that help us to decrease the risk:

We use Feature Toggles to easily turn features on or off in production or to make features available only on the Wayfair internal network.
We use Supplier Toggles so we can turn the feature on for some of the suppliers that want to contribute to making our tools even better. Our suppliers are eager to try new tools and features that allow them to boost their sales, so they are happy to participate in beta-tests of some of the features and provide us with valuable feedback.
Wayfair helps people around the world to find the best items for their homes. We can take advantage of this geographical diversity by, for example, releasing a feature for the European audience first and then, based on the results, releasing it to the North American audience (or the other way around). We can do that by using Deploy-level controls (servers).
We conduct A/B Testing. We give one version of the feature to one group of our users and the other version to another group. Then we measure the difference in performance and collect feedback.
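The rollout controls above can be sketched as a single toggle object plus a deterministic A/B assignment. Everything here is a hypothetical illustration: the FeatureToggle class, its fields, and the ab_variant function are assumed names, not Wayfair's actual toggle system, and real region-based releases may be controlled at the deployment level rather than in application code.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureToggle:
    """Hypothetical toggle combining the controls described above."""
    name: str
    enabled: bool = False                        # global on/off switch
    internal_only: bool = False                  # internal network only
    supplier_allowlist: frozenset = frozenset()  # empty means all suppliers
    regions: frozenset = frozenset()             # empty means all regions

    def is_on(self, *, supplier_id=None, region=None, is_internal=False) -> bool:
        if not self.enabled:
            return False
        if self.internal_only and not is_internal:
            return False
        if self.supplier_allowlist and supplier_id not in self.supplier_allowlist:
            return False
        if self.regions and region not in self.regions:
            return False
        return True


def ab_variant(experiment: str, user_id: str) -> str:
    """Assign a user to variant 'A' or 'B', stable across calls."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return "A" if int(digest, 16) % 2 == 0 else "B"


# Beta feature: on for two opted-in suppliers, in Europe only.
beta = FeatureToggle(
    name="bulk-upload",
    enabled=True,
    supplier_allowlist=frozenset({"supplier-1", "supplier-2"}),
    regions=frozenset({"EU"}),
)
print(beta.is_on(supplier_id="supplier-1", region="EU"))  # True
print(beta.is_on(supplier_id="supplier-9", region="EU"))  # False
print(beta.is_on(supplier_id="supplier-1", region="NA"))  # False
```

Hashing the experiment name together with the user ID keeps each user in the same variant on every visit, while letting different experiments split the audience independently. Setting enabled to False acts as an immediate kill switch without a redeploy.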

We constantly monitor our applications in production

Slow and "buggy" website pages can be very costly and are a bad experience for customers. We always keep an eye on key metrics for our applications, such as usage, performance, system health, and errors. This is doubly important when we release something “big” to our customers. We also work closely with the support team to gather feedback and complaints faster.

If something goes wrong, we roll back the changes

We have infrastructure that allows us to easily roll back deployments or shut off code at a feature level, so we can quickly remove or "hide" features if a high-impact defect is identified after deployment. This can be achieved by using the Feature Toggles discussed above, rolling back changes, or delivering a hotfix in seconds.

Optimizing for a sense of urgency in getting code to our customers necessarily increases our risk of releasing the occasional bug. We accept this risk, balance it with the criticality of the system under development, and fix released faults in the next planned release.

A final note: we did not transition to our current team structure (without embedded Quality Engineers within the scrum teams) in one day. First, we had to get our team members test-infected (in a good way), inspired by good role models from adjacent teams, and engaged in the test automation process. Only then could we cultivate the culture in which the whole team is responsible for the test automation and quality in general.

Now we empower autonomous teams across Catalog Engineering to deliver value rapidly, repeatedly, and reliably through building tools for testing, defining standards/best practices, and training teams on test automation.