Wayfair Tech Blog

Guide to Seamless Kubernetes Deploys


Search Technologies is a highly impactful team at Wayfair looking for experienced engineers. We build fast, reliable and powerful customer-facing search systems across all Wayfair platforms. Our infrastructure provides the most enjoyable browsing and shopping experience for our customers and partners. Search Technologies is very collaborative, as we partner with storefront and app teams, data science, infrastructure teams and operational engineering teams to create a holistic search experience for Wayfair’s customers.

Our Product Search service serves all search traffic across Wayfair's websites. Most pages at Wayfair use this service to power at least one feature, whether it's the full search results page or a small product carousel. The service is hosted in Kubernetes (k8s) as a stateless service.

Problem Statement

We were facing connection-draining issues while deploying new code to our search service: about 10% of requests per minute were being dropped with HTTP 503 (Service Unavailable) responses.

Note: A checkmark (✓) below marks a change we implemented as part of our solution.

We were well provisioned with CPU and RAM in k8s, so we knew resources were not the problem. We noticed that some pods were destroyed before their replacements started up at each deployment. This happens because the default value of maxUnavailable is 25%, which tells k8s that it's acceptable for 25% of pods to be unavailable during a deploy; any requests already routed to the pods k8s terminated are affected. We therefore tuned the k8s configs to maxSurge = 25% and maxUnavailable = 0%, which ensures that old pods are not removed until new ones come up.

What is Max Unavailable?

  • spec.strategy.rollingUpdate.maxUnavailable is an optional field that specifies the maximum number of Pods that can be unavailable during the update process.
  • The value can't be 0 if maxSurge is 0.
  • The default value is 25%.

What is Max Surge?

  • spec.strategy.rollingUpdate.maxSurge is an optional field that specifies the maximum number of Pods that can be created over the desired number of Pods.
  • The value can't be 0 if maxUnavailable is 0.
  • The default value is 25%.

You can specify maxUnavailable and maxSurge to control the rolling update process. The value can be an absolute number (for example, 5) or a percentage of desired Pods (for example, 10%). The absolute number is calculated from the percentage by rounding down for maxUnavailable and rounding up for maxSurge. For example, with 8 replicas, maxSurge = 25% allows up to 2 extra pods during the rollout, while maxUnavailable = 0% means no old pod is taken down before a replacement is Ready.

✓ We implemented the following k8s configs at the deployment level:

mainDeployment:
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 0%
    type: RollingUpdate
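
The mainDeployment block above comes from our internal deployment config format. In a plain Kubernetes Deployment manifest, the equivalent settings live under spec.strategy; below is a minimal sketch, with the name, image, and replica count as placeholders:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: product-search            # placeholder name
spec:
  replicas: 8                     # placeholder replica count
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 25%               # up to 25% extra pods may be created during the rollout
      maxUnavailable: 0%          # never remove an old pod before its replacement is Ready
  selector:
    matchLabels:
      app: product-search
  template:
    metadata:
      labels:
        app: product-search
    spec:
      containers:
        - name: main
          image: product-search:latest   # placeholder image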

After this change, pods were still dropping ~0.75% of requests per minute during deploys due to the service being unavailable.

So we went back to basics to understand the Kubernetes termination lifecycle and used it to our advantage. We will walk through each step in detail below.

Kubernetes Termination Lifecycle


Step 1: Pod is set to the “Terminating” State and removed from the endpoints list of all Services.

At this point, the pod stops getting new traffic. Containers running in the pod will not be affected. There is not much we can do to modify this step.

Step 2: preStop hook is executed

What is a preStop hook?

  • The preStop hook is a special command or HTTP request that is sent to the containers in the pod.
  • The preStop hook is called immediately before a container is terminated.
  • The preStop hook fails if the container is already in a terminated or completed state.
  • It is blocking (synchronous): it must complete before the signal to stop the container can be sent.
  • No parameters are passed to the handler.

When to use a preStop hook?

  • The preStop hook is a great way to trigger a graceful shutdown without modifying the application.
  • Use it if your application doesn't shut down gracefully when receiving a SIGTERM (most programs do).
  • Use it if you are using third-party code or managing a system you don't have control over.

Example 1: an HTTP request sent to the container to trigger shutdown (issued via curl in an exec command).

lifecycle:
  preStop:
    exec:
      command: ["curl", "-XPOST", "http://URL"]

Example 2: a command run inside the container (here, a sleep to delay shutdown).

lifecycle:
  preStop:
    exec:
      command: ["/bin/bash", "-c", "sleep 15"]

Step 3 - SIGTERM signal is sent to the pod

At this point, Kubernetes will send a SIGTERM signal to the containers in the pod. 

What is SIGTERM?

  • This signal lets the containers know that they are going to be shut down soon.
  • Your code should listen for this event and start shutting down cleanly at this point. 
  • This may include stopping any long-lived connections (like a database connection or WebSocket stream), saving the current state, or anything like that.

When to use SIGTERM?

Even if you are using the preStop hook, it is important that you test what happens to your application if you send it a SIGTERM signal, so you are not surprised in production!

We looked into our application and tried to figure out what happens when SIGTERM is sent to it. We noticed that we weren't allowing enough time for the application to terminate, so we looked into options for intercepting the SIGTERM signal and giving the search application additional time to close connections and terminate gracefully. Searching for such an option led us to discover DelayedShutdownHandler from dropwizard-health.
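
For illustration only (this is not the approach we shipped, and closeConnections() below is a hypothetical stand-in for your own cleanup logic), intercepting SIGTERM in plain Java can be done with a JVM shutdown hook, which runs when the process receives SIGTERM:

public final class GracefulShutdown {

    public static void main(String[] args) {
        // Runs when the JVM receives SIGTERM (e.g. from the kubelet during pod termination).
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            System.out.println("SIGTERM received, draining in-flight requests...");
            try {
                // Give the load balancer / Service endpoints time to stop
                // sending new traffic before tearing down connections.
                Thread.sleep(15_000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            closeConnections(); // hypothetical cleanup: close DB pools, HTTP clients, etc.
            System.out.println("Shutdown complete.");
        }));

        // ... start the application and serve traffic ...
    }

    private static void closeConnections() {
        // placeholder for application-specific cleanup
    }
}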

✓ We implemented DelayedShutdownHandler from dropwizard-health.

We modified our code to intercept the SIGTERM signal as part of our application. Our microservice is a Java service that uses the Dropwizard framework.

Adding DelayedShutdownHandler from dropwizard-health

  • DelayedShutdownHandler sets a healthy flag to false and delays shutdown to allow a load balancer to determine that the instance should no longer receive requests.
  • We were able to test the DelayedShutdownHandler with kill -15 [pid], which sends the same SIGTERM that k8s sends as one of the steps in its termination lifecycle.
  • You can verify that your handler is working correctly by looking for log lines like the following from your application:

INFO  [2020-11-20 20:59:51,312] io.dropwizard.health.shutdown.DelayedShutdownHandler: delayed shutdown: started (waiting 30 seconds)
INFO  [2020-11-20 21:00:21,318] io.dropwizard.health.shutdown.DelayedShutdownHandler: delayed shutdown: finished
INFO  [2020-11-20 21:00:21,326] org.eclipse.jetty.server.AbstractConnector: Stopped application@47d4e28a{HTTP/1.1, [http/1.1]}{0.0.0.0:8080}
INFO  [2020-11-20 21:00:21,327] org.eclipse.jetty.server.AbstractConnector: Stopped admin@177068db{HTTP/1.1, [http/1.1]}{0.0.0.0:8081}
INFO  [2020-11-20 21:00:21,329] org.eclipse.jetty.server.handler.ContextHandler: Stopped i.d.j.MutableServletContextHandler@7eb774c3{/,null,UNAVAILABLE}
INFO  [2020-11-20 21:00:21,338] org.eclipse.jetty.server.handler.ContextHandler: Stopped i.d.j.MutableServletContextHandler@5357de0e{/,null,UNAVAILABLE}

Process finished with exit code 130 (interrupted by signal 2: SIGINT)
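
For reference, the delayed shutdown behaviour in dropwizard-health is driven by the module's configuration. The sketch below is an assumption about what that section of the Dropwizard config file can look like; the field names (delayedShutdownHandlerEnabled, shutdownWaitPeriod) should be verified against the version of dropwizard-health you are using:

# Dropwizard configuration (config.yml) -- field names are assumptions, not verified here
health:
  delayedShutdownHandlerEnabled: true   # assumed field: wires in DelayedShutdownHandler
  shutdownWaitPeriod: 30s               # assumed field: matches the "waiting 30 seconds" in the log above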

What if you use Spring Boot? Don't worry, you can follow the steps below:

Spring Boot - Max Wait Time Before Termination

  • Even though we've configured the executor to wait for ongoing and queued-up tasks to complete, Spring continues with the shutdown of the rest of the container. This could release resources needed by our task executor and cause the tasks to fail.
  • In order to block the shutdown of the rest of the container, we can specify a max wait time on the ThreadPoolTaskExecutor, e.g. taskExecutor.setAwaitTerminationSeconds(30); (see the sketch below).
  • This ensures that the shutdown process at the container level is blocked for up to the specified time.
  • When we set the setWaitForTasksToCompleteOnShutdown flag to true, we need to specify a significantly higher timeout so that all remaining tasks in the queue are also executed.

Reference: (link)
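
A minimal sketch of how this can be wired in a Spring Boot configuration class (pool and queue sizes are placeholders):

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;

@Configuration
public class ExecutorConfig {

    @Bean
    public ThreadPoolTaskExecutor taskExecutor() {
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.setCorePoolSize(8);        // placeholder pool size
        executor.setQueueCapacity(100);     // placeholder queue size
        // Wait for in-flight and queued tasks instead of interrupting them on shutdown.
        executor.setWaitForTasksToCompleteOnShutdown(true);
        // Block container shutdown for up to 30 seconds while remaining tasks finish.
        executor.setAwaitTerminationSeconds(30);
        return executor;
    }
}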

Step 4 - Kubernetes waits for a grace period

What is the termination grace period?

  1. At this point, Kubernetes waits for a specified time called the termination grace period. By default, this is 30 seconds. It’s important to note that this happens in parallel to the preStop hook and the SIGTERM signal. Kubernetes does not wait for the preStop hook to finish.
  2. If your app finishes shutting down and exits before the terminationGracePeriod is done, Kubernetes moves to the next step immediately.

When to use grace period?

If a pod does not terminate after the SIGTERM, you will see messages in the k8s logs saying it was SIGKILLed. If your pod usually takes longer than 30 seconds to shut down, make sure you increase the grace period by setting the terminationGracePeriodSeconds option in the pod YAML.

✓ We changed terminationGracePeriodSeconds from the default 30 seconds to 60 seconds, which you can add in your k8s configs at the deployment level:

Example:

containers:
  mainContainer:
    terminationGracePeriodSeconds: 60
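
In a plain Kubernetes manifest, terminationGracePeriodSeconds is a field on the pod spec (alongside containers) rather than on an individual container; a minimal sketch with placeholder names:

spec:
  template:
    spec:
      terminationGracePeriodSeconds: 60   # give the app up to 60s to exit after SIGTERM
      containers:
        - name: main                      # placeholder container name
          image: product-search:latest    # placeholder image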

Step 5 - SIGKILL signal is sent to pod, and the pod is removed

The SIGKILL signal is sent to a process to cause it to terminate immediately (kill). In contrast to SIGTERM and SIGINT, this signal cannot be caught or ignored, and the receiving process cannot perform any clean-up upon receiving this signal.

We implemented the following strategies to achieve more seamless deployments:

  • Rolling update deployment in k8s with maxSurge = 25% and maxUnavailable = 0%
  • Added DelayedShutdownHandler from dropwizard-health to our code
  • Changed terminationGracePeriodSeconds from the default 30 seconds to 60 seconds

After these changes, pods were dropping ~0% of requests per minute during deploys due to HTTP 503 (Service Unavailable) errors. We reached our goal of seamless deploys of the product search service in Kubernetes. Your solution may differ from ours, but these guidelines are helpful for understanding the basics so that you can tweak configs and code to your advantage.

We have a couple of openings on our Search Technologies team; if you would like to join, please check out our career page.

Please reach out if you have any questions.

Special thanks to John Castillo, Conrad Liu, and Upendra Penegalapati for helping to work on this effort and refine this blog.
