When a Wayfair customer shops for a product that is just right for her home, color plays a big role. For example, a customer might say, “I am looking for a green chair.” Or she might say, “I am looking for a teal chair.” There may be various versions of these customer color search stories. As data scientists at Wayfair, we want to solve these customer problems by algorithmically extracting the color information from the products and assigning customer-friendly color names.
Describing the color of a product accurately is challenging. Colors can be defined precisely by a set of numeric values (RGB, for instance, describes a color by the intensities of its red, green, and blue components), but describing them in natural language is not straightforward.
Some RGBs can be described by multiple similar color names. For example, a hex-code (the hexadecimal format of an RGB value) #0F385C can be described as “navy”, “dark blue”, “midnight blue” or just “blue”. On the other hand, some RGBs can be described by multiple different color names. For example, a hex-code #9999FF in the second figure above can be described as both “blue” and “purple”.
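As a small illustration of the hex-code notation used above, the sketch below converts a hex-code into its red, green, and blue intensities. The function name `hex_to_rgb` is ours and is not part of any Wayfair tooling.

```python
def hex_to_rgb(hex_code: str) -> tuple[int, int, int]:
    """Convert a hex-code such as '#0F385C' into an (R, G, B) tuple of 0-255 ints."""
    digits = hex_code.lstrip("#")
    if len(digits) != 6:
        raise ValueError(f"expected 6 hex digits, got {hex_code!r}")
    # Each pair of hex digits encodes one channel.
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


print(hex_to_rgb("#0F385C"))  # (15, 56, 92)
print(hex_to_rgb("#9999FF"))  # (153, 153, 255)
```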
Solving this problem will provide accurate, complete and granular color names tagged to each product. Accuracy will allow us to reduce the number of customers leaving the website due to sub-par color filtering. Driving completeness by tagging the color of the millions of products in the Wayfair catalog will give our customers a richer selection to shop from, not missing out on quality products that don’t have any color tagged. Granularity of color names will allow our customers to quickly filter to the specific shade they have in mind. Focusing on these three pillars will make it so that a customer searching for a “Blue Sofa” will only see blue sofas, not black. They’ll see all the blue sofas that Wayfair sells, and if they want to narrow their search to “Teal Sofa”, they’ll be able to filter further to get the exact shade they have in mind.
In order to solve this challenging problem of describing the color of a product, we first capture the relationship between RGBs and the more frequently used color names by algorithmically defining a color taxonomy. We then build an algorithm that takes a product image as an input, extracts the RGB values from it, and leverages the color taxonomy to assign color names to the product. In the rest of the post, we will talk about how we build the color taxonomy, how we operationalize the assignment of color tags to products in the Wayfair catalog, associated challenges, and future work.
Wayfair’s Color Taxonomy
We developed Wayfair’s Color Taxonomy, an algorithm-defined color palette that captures the hierarchical relationship between RGBs of products in the Wayfair catalog and their associated color names. The taxonomy consists of two main elements: 1) RGB hierarchy, which describes the relationship between RGB values (e.g. #000080, “navy”, is a child of “blue”), and 2) RGB naming, which assigns human-friendly names to RGB values (e.g. #0c1f5e can be described as “navy blue” and “dark blue”).
RGB hierarchy and naming algorithm
We build an RGB hierarchy that contains groups of similar RGB values at a particular level (e.g. the RGB groups associated with “midnight blue”, “dark blue” and “navy blue” are on the same level) and parent-child relationships between RGB groups at different levels (e.g. RGBs associated with “teal” and RGBs associated with “navy” share the same parent group, “blue”).
The RGB hierarchy algorithm quantifies the difference between two colors using a metric called delta-E, which takes human color perception into account. The larger the value of delta-E, the more distinguishable the two colors are: at delta-E < 2 the difference is negligible to the human eye, and at delta-E > 10 it is distinguishable to the human eye.
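To make the delta-E idea concrete, here is a minimal sketch. It is not the CIEDE2000 formula cited in the references: it uses the older and simpler CIE76 definition, which is the Euclidean distance between two colors in CIELAB space, so its values will differ somewhat from those used in the taxonomy. The function names and the assumption of sRGB input with a D65 white point are ours.

```python
import math


def srgb_to_lab(rgb):
    """Convert an (R, G, B) tuple of 0-255 sRGB values to CIELAB (D65 white point)."""
    def linearize(v):
        c = v / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linearize(v) for v in rgb)
    # Linear sRGB -> CIE XYZ
    x = 0.4124564 * r + 0.3575761 * g + 0.1804375 * b
    y = 0.2126729 * r + 0.7151522 * g + 0.0721750 * b
    z = 0.0193339 * r + 0.1191920 * g + 0.9503041 * b

    def f(t):
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    # Normalize by the D65 reference white before applying f.
    fx, fy, fz = f(x / 0.95047), f(y / 1.0), f(z / 1.08883)
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))


def delta_e_cie76(rgb_a, rgb_b):
    """CIE76 delta-E: Euclidean distance in CIELAB (a stand-in for CIEDE2000)."""
    return math.dist(srgb_to_lab(rgb_a), srgb_to_lab(rgb_b))


# Two of the navy-like hex-codes mentioned in this post, as RGB tuples.
print(delta_e_cie76((15, 56, 92), (12, 31, 94)))
```

Working in CIELAB rather than raw RGB matters because equal steps in RGB do not correspond to equal perceived differences.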
The algorithm groups the RGBs based on pairwise delta-E distances and organizes them into the hierarchy using a bottom-up approach, as illustrated in the figure below.
After building the RGB hierarchy, we assign color names to the RGB groups on each level. We first assign color names to the lowest level of RGB groups, leveraging an open-source mapping of color names to RGB values (Wikipedia and ColorHexa). We then use a bottom-up strategy to roll the color names up from the lowest to the highest level.
We will next explain how we build each level in the hierarchy and how we assign color names at each of those levels.
Level 4 -- Most granular (RGB) level
The lowest level (level 4) of the hierarchy consists of the RGBs that we extracted and sampled from the bounding boxes of products across the Wayfair catalog (bedding, outdoor, rugs, upholstery, etc.). Bounding boxes are drawn onto product images by human annotators to mark a specific product attribute (e.g. upholstery, leg, frame). We run K-means clustering on the extracted RGBs to derive the level 4 color groups. We choose k = 4055 so that the minimum pairwise distance between cluster centroids stays above delta-E = 2. This ensures all RGBs in level 4 are visually distinguishable.
Once the color groups at level 4 are derived, our algorithm assigns color names to them by leveraging ~1200 unique pairs of RGB values and color names that were scraped from public sources such as Wikipedia and ColorHexa. We compute the pairwise delta-E distances between level 4 RGBs and the web-scraped RGBs. If the distance is < 3, the name corresponding to the web-scraped RGB is assigned to the level 4 RGB group. For example, “medium turquoise”, “blizzard blue”, “middle blue” are assigned to three different RGB groups at level 4, as their shades are different enough to be assigned separate color names based on the pairwise delta-E distance.
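The naming step at level 4 can be sketched as a thresholded lookup. This is an illustration only: the function name, the dictionary layout, and the CIELAB values are made up, and `math.dist` on CIELAB tuples stands in for the delta-E metric. The post does not say what happens to groups with no reference color within the cutoff; in this sketch they simply end up with an empty name list.

```python
import math

NAME_THRESHOLD = 3.0  # delta-E cutoff stated in the post


def assign_names(level4_centroids, named_colors, distance=math.dist):
    """Map each level 4 group id to the reference names within the cutoff."""
    assigned = {}
    for group_id, centroid in level4_centroids.items():
        assigned[group_id] = sorted(
            name
            for name, reference in named_colors.items()
            if distance(centroid, reference) < NAME_THRESHOLD
        )
    return assigned


# Made-up CIELAB values, for illustration only.
level4 = {"g1": (75.0, -38.0, -9.0), "g2": (30.0, 5.0, -40.0)}
reference = {
    "medium turquoise": (76.0, -37.0, -8.0),
    "navy": (13.0, 47.0, -64.0),
}
names = assign_names(level4, reference)
print(names)  # {'g1': ['medium turquoise'], 'g2': []}
```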
Level 3 -- Granular colors with narrow spectrum
We obtain level 3 color groups by further grouping visually similar RGB values from level 4 using Birch clustering. The distance from an RGB to its cluster centroid is kept within 5 delta-E. This ensures that the clusters are cohesive and that the RGBs within a cluster differ by no more than what is distinguishable at a glance.
In order to assign the color names to the color groups at level 3, our algorithm adopts a bottom-up strategy. A particular level 3 group takes the aggregated color names from level 4 groups that were clustered together to form that level 3 group. For example, one level 3 RGB group contains three level 4 color groups and hence has three color names: “medium turquoise”, “blizzard blue”, “middle blue”.
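The bottom-up roll-up described above amounts to taking the union of the children's names. A minimal sketch, with hypothetical group ids and a function name of our choosing:

```python
def roll_up_names(children_by_parent, child_names):
    """Give each parent group the de-duplicated union of its children's names."""
    parent_names = {}
    for parent, children in children_by_parent.items():
        merged = []
        for child in children:
            for name in child_names.get(child, []):
                if name not in merged:  # keep first-seen order, drop repeats
                    merged.append(name)
        parent_names[parent] = merged
    return parent_names


# Hypothetical ids; the names are the ones used in the example above.
level4_names = {
    "a": ["medium turquoise"],
    "b": ["blizzard blue"],
    "c": ["middle blue"],
}
level3_members = {"L3-1": ["a", "b", "c"]}
level3_names = roll_up_names(level3_members, level4_names)
print(level3_names)
# {'L3-1': ['medium turquoise', 'blizzard blue', 'middle blue']}
```

The same function can be applied again to roll level 3 names up to level 2.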
Level 2 -- Granular colors with wide spectrum
We obtain level 2 color groups by using graph algorithms on level 3 groups. We build a graph in which level 3 color groups are the nodes and two nodes are connected if the delta-E distance between the colors of their centroids is < 10 (where the colors are more similar than opposite). We then run a graph-based algorithm that iteratively identifies the maximum-size cliques as our level 2 color groups until the smallest clique is of size 3. This allows a level 3 color group to be tied to two different colors (e.g. RGBs for “teal” can be tied to both “green” and “blue”). It also enables a wide spectrum of RGBs within a major color (e.g. “turquoise”, “aqua”, “teal” RGBs are tied to “blue”). The example below highlights a level 3 color group in the center, which can be tied to both blue and green.
Similar to level 3 color naming, level 2 color groups also take the aggregated color names from level 3 color groups that were clustered together. For example: “medium turquoise”, “blizzard blue”, “middle blue”, “teal”, “dark turquoise” describe one color group at level 2.
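The graph construction and clique search for level 2 can be sketched as follows. This is one possible reading of the procedure, not the production algorithm: it enumerates all maximal cliques with the classic Bron-Kerbosch recursion and keeps those with at least 3 nodes, which lets a node belong to several groups. The centroid coordinates are made-up CIELAB values, and `math.dist` stands in for delta-E.

```python
import math
from itertools import combinations

EDGE_THRESHOLD = 10.0  # connect two level 3 groups if their centroids are closer than this
MIN_CLIQUE_SIZE = 3    # smallest clique kept as a level 2 group


def build_graph(centroids):
    """Adjacency sets: an edge joins two nodes whose centroid distance is below the threshold."""
    adj = {node: set() for node in centroids}
    for u, v in combinations(centroids, 2):
        if math.dist(centroids[u], centroids[v]) < EDGE_THRESHOLD:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def maximal_cliques(adj):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    found = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            found.append(frozenset(current))
            return
        for node in list(candidates):
            expand(current | {node}, candidates & adj[node], excluded & adj[node])
            candidates = candidates - {node}
            excluded = excluded | {node}

    expand(set(), set(adj), set())
    return found


def level2_groups(centroids):
    """Keep cliques of at least MIN_CLIQUE_SIZE nodes, largest first."""
    cliques = maximal_cliques(build_graph(centroids))
    keep = [c for c in cliques if len(c) >= MIN_CLIQUE_SIZE]
    return sorted(keep, key=lambda c: (-len(c), sorted(c)))


# Made-up level 3 centroids in CIELAB.
centroids = {
    "A": (50, 0, 0), "B": (50, 6, 0), "C": (50, 3, 5),
    "D": (50, 9, 5), "E": (50, 12, 0), "F": (80, 0, 0),
}
groups = level2_groups(centroids)
print([sorted(g) for g in groups])
# [['A', 'B', 'C'], ['B', 'C', 'D'], ['B', 'D', 'E']]
```

In this toy graph, node B lands in all three groups, which mirrors how a “teal” group can be tied to more than one wider color.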
Level 1 -- Most basic & unambiguous colors
For level 1, we utilize a predefined list of colors (“red”, “green”, “blue”, “yellow”, “purple”, “pink”, “black”, “white”, “orange”, “brown”, “gray”) identified through research as the most unambiguous basic colors. In addition, we include “beige” due to its popularity in the Wayfair catalog. Each of these level 1 colors has ~30 RGB values associated with it, which we obtained by asking a team of designers to curate them. We define the relationship between level 2 and level 1 via k-nearest neighbor search, treating all level 1 RGBs as points in space and every RGB in a level 2 group as a query point. We choose k = 9 and assign the level 2 RGB to the level 1 colors that have similarity above a threshold (30%). The threshold is introduced to allow one level 2 RGB to be tied to different level 1 color names.
Once the RGB relationship between level 1 and level 2 color groups is defined, the associated level 1 color names are linked to the level 2 color names (e.g. blue from level 1 is linked to “turquoise”, “blizzard blue” etc.).
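The level 2 to level 1 assignment can be sketched as a k-nearest neighbor vote. The post does not spell out how “similarity” is computed, so this sketch assumes it is the share of the k = 9 nearest level 1 points that carry a given name; the function name and the CIELAB points are made up for illustration.

```python
import math
from collections import Counter


def level1_parents(query, level1_points, k=9, threshold=0.30):
    """Return {level 1 name: neighbor share} for names at or above the threshold.

    level1_points is a list of (color, name) pairs; the share is the fraction
    of the k nearest points that carry the name (an assumed reading of
    "similarity").
    """
    nearest = sorted(level1_points, key=lambda item: math.dist(query, item[0]))[:k]
    votes = Counter(name for _, name in nearest)
    return {
        name: count / len(nearest)
        for name, count in votes.items()
        if count / len(nearest) >= threshold
    }


# Made-up curated points: 9 "blue" and 9 "green" colors in CIELAB.
curated = [((50, 0, -10 - i), "blue") for i in range(9)]
curated += [((50, -10 - i, 0), "green") for i in range(9)]

print(level1_parents((50, 0, -14), curated))   # only "blue" passes the threshold
print(level1_parents((50, -6, -7), curated))   # both "blue" and "green" pass
```

With a 30% threshold and k = 9, a name needs at least 3 of the 9 neighbors, so a query point can be tied to at most three level 1 colors.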
Wayfair’s Color Tagging Pipeline
Once we define the taxonomy, we can turn our attention to the main problem: ‘What color is this product?’ We extract the RGB information from the product image and leverage the developed taxonomy in order to provide a color tag. The color tagging algorithm consists of two main components: 1) dominant color extraction from product imagery, and 2) mapping of the extracted colors to the taxonomy to obtain string tags.
We leverage human annotators to draw bounding boxes onto product images to mark a specific product attribute (e.g. upholstery, leg, frame). We then cluster the RGBs within the bounding boxes using mini-batch k-means with k = 5, and extract up to 5 dominant colors along with their corresponding color volumes.
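The dominant color extraction step can be illustrated with a small, dependency-free sketch. Note that it uses plain (full-batch) k-means with a deterministic farthest-point initialization rather than the mini-batch variant named above, and it treats "color volume" as the share of pixels assigned to each cluster, which is our assumption. The pixel values are made up.

```python
import math


def dominant_colors(pixels, k=5, iters=20):
    """Cluster RGB pixels and return up to k (center, volume) pairs, largest first."""
    # Farthest-point initialization: deterministic, and it stops early when the
    # image has fewer than k distinct colors.
    centers = [pixels[0]]

    def gap(p):
        return min(math.dist(p, c) for c in centers)

    while len(centers) < k:
        far = max(pixels, key=gap)
        if gap(far) == 0:
            break
        centers.append(far)

    groups = [[] for _ in centers]
    for _ in range(iters):
        # Assignment step: each pixel joins its nearest center.
        groups = [[] for _ in centers]
        for p in pixels:
            idx = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            groups[idx].append(p)
        # Update step: move each center to the mean of its pixels.
        centers = [
            tuple(sum(ch) / len(g) for ch in zip(*g)) if g else c
            for g, c in zip(groups, centers)
        ]

    volumes = [len(g) / len(pixels) for g in groups]
    return sorted(zip(centers, volumes), key=lambda cv: -cv[1])


# Made-up pixels from a bounding box: mostly blue, some near-white, a bit of brown.
pixels = [(10, 20, 200)] * 60 + [(240, 240, 240)] * 30 + [(120, 60, 20)] * 10
result = dominant_colors(pixels)
for center, volume in result:
    print(center, volume)
```

For real images, a mini-batch implementation from a numerical library would be the practical choice; the loop above is only meant to show what the centers and volumes represent.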
Once we extract the colors from the step above, we want to map those values to the closest neighbor within level 4 colors in the taxonomy. We leverage a popular package called faiss, a similarity search library that is optimized for speed and memory usage and supports GPU acceleration for index search. After finding the nearest neighbor in level 4, we utilize our RGB hierarchy to get names of the color at four levels of granularity.
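The lookup described above can be sketched without any external library. This sketch replaces the faiss index with a brute-force nearest-neighbor search, which is only reasonable for a toy example, and it assumes a hypothetical representation of the taxonomy as two dictionaries: `parents` (group id to parent group ids) and `names` (group id to color names). All ids and CIELAB values are made up.

```python
import math


def tag_color(color, level4_centroids, parents, names):
    """Find the nearest level 4 group, then collect names level by level up to level 1."""
    nearest = min(level4_centroids, key=lambda gid: math.dist(color, level4_centroids[gid]))
    tags, frontier = {}, [nearest]
    for level in (4, 3, 2, 1):
        tags[level] = sorted({n for gid in frontier for n in names.get(gid, [])})
        # A group may have several parents, so the frontier can widen as we go up.
        frontier = sorted({p for gid in frontier for p in parents.get(gid, [])})
    return tags


# Hypothetical toy taxonomy.
level4 = {"l4-teal": (48, -28, -8), "l4-navy": (15, 10, -35)}
parents = {
    "l4-teal": ["l3-teal"], "l3-teal": ["l2-bluegreen"],
    "l2-bluegreen": ["l1-blue", "l1-green"],
    "l4-navy": ["l3-navy"], "l3-navy": ["l2-darkblue"],
    "l2-darkblue": ["l1-blue"],
}
names = {
    "l4-teal": ["teal"], "l3-teal": ["teal"], "l2-bluegreen": ["teal", "turquoise"],
    "l4-navy": ["navy"], "l3-navy": ["navy"], "l2-darkblue": ["navy"],
    "l1-blue": ["blue"], "l1-green": ["green"],
}

tags = tag_color((50, -27, -9), level4, parents, names)
print(tags)
# {4: ['teal'], 3: ['teal'], 2: ['teal', 'turquoise'], 1: ['blue', 'green']}
```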
The color tagging pipeline provides the color names for a product. However, we need to make sure that the color tags predicted by our algorithms are accurate. Using a human-in-the-loop framework allows us to evaluate the accuracy of our models. In the next section, we describe our evaluation setup and the results.
There are two main challenges in evaluating the color tags: 1) the lack of ground truth data, and 2) the definition of correctness. We used the following strategies to overcome these challenges.
- Supplier tags followed by human judgement as ground truth: When suppliers add products to the Wayfair catalog, they also provide color tags for each product. These tags are not always complete, accurate or granular. However, we observed that they are mostly accurate when the product has only one color and the supplier-provided tag is a level 1 color such as “blue”. We treated these tags as our pseudo ground truth and compared the model-predicted tags against them. When the model-predicted tags did not match the supplier-provided tags, we used human judgement as the source of truth.
- Acceptance rate as a metric for evaluation: Color correctness is very subjective and, thus, hard to define. For this reason, we choose ‘acceptance rate’ as a metric for evaluation. We consider a color tag acceptable if the model predicted color and supplier tagged color are the same. If the model and supplier colors differ, we let a human evaluator determine if the model predicted color is acceptable.
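The two-step acceptance metric described in the list above can be sketched as follows. The record layout (`model`, `supplier`, `human_accepts`) is a hypothetical one chosen for illustration, and the records are made up rather than taken from the evaluation.

```python
def acceptance_rate(records):
    """Share of products whose model tag matches the supplier tag or is accepted by a human."""
    accepted = 0
    for rec in records:
        if rec["model"] == rec["supplier"]:
            accepted += 1  # step 1: agreement with the supplier tag
        elif rec.get("human_accepts"):
            accepted += 1  # step 2: human evaluator accepts the model tag
    return accepted / len(records)


# Made-up records, for illustration only.
records = [
    {"model": "blue", "supplier": "blue"},
    {"model": "gray", "supplier": "gray"},
    {"model": "green", "supplier": "blue", "human_accepts": True},
    {"model": "gray", "supplier": "white", "human_accepts": False},
]
rate = acceptance_rate(records)
print(rate)  # 0.75
```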
While evaluating the model, we found that it achieves 63% agreement with supplier tags. We sent the remaining 37% of the data for human-in-the-loop (HITL) evaluation. Combining the two evaluation steps, the model achieves an 88% acceptance rate. Below are some examples where human agents preferred model predictions over supplier tags.
We will continue to iterate and improve our color tagging algorithm. One of the biggest limitations currently is that our prediction is not robust to noise in the input image. For example, we observed “white” products being predicted as “gray” due to shadows in the images. This, if not solved, will result in a poor customer experience when customers place an order based on color tags and receive white products instead of gray ones. To solve this challenge, we are going to explore other data sources (e.g. supplier descriptions, digital swatches, etc.), as well as other features derived from product imagery (e.g. HSV spectrum, color histogram, etc.).
We plan to expand our scope to describe metallic-like colors to keep improving the experience of Wayfair customers. Our reflective model, which covers metallic colors and metallic finishes using convolutional neural networks, is under development and will come in the near future.
If you find our work interesting, please connect with us! We’re looking for talented data scientists, machine learning engineers, and product managers to join our team and lead innovations in Visual Information Extraction at Wayfair!
- Luo, Ming. (2002). The CIE 2000 colour difference formula: CIEDE2000. Color Research & Application. 4421. 10.1117/12.464549.
- Berlin B, Kay P (1969) Basic Color Terms: Their Universality and Evolution (Univ of California Press, Berkeley, CA).
- Jeff Johnson, Matthijs Douze, and Hervé Jégou. Billion-scale similarity search with GPUs. arXiv preprint arXiv:1702.08734, 2017.