Part 1 of a series on Code Deployment at Wayfair
Code Deployment at Wayfair has always been about creating the
fastest and most friction-free deployment process possible.
Wayfair.com Architecture History
For the last 9 years our platform has been primarily a Classic ASP environment on a Windows Server stack. To facilitate code deployment in that environment, we wrote a script that replicated file changes out to the web farm once they were FTPed to a central server. Unfortunately, as the web farm grew, so did the time it took to push code and, in the event of an issue, the time it took to roll code back. In this environment it took about 15 minutes to replicate code out to all of our servers.
From the chatter on the inter-webs and talks at Velocity and other conferences, I'm sure a lot of people would think that is "fast enough". For us it was unacceptable: changes that had to go out together were not guaranteed to replicate at the same time, and with upwards of 50 deployments a day we needed to see our changes more quickly.
Wayfair.com New Architecture
Luckily, a large project came along that allowed us to change things substantially instead of just slapping a Band-Aid on the problem.
As part of a major site layout redesign we decided to switch coding languages and web serving infrastructure. This was definitely not a small undertaking, but it was a huge opportunity for advancement in the architecture and infrastructure of the site.
After a long debate and lots of pro/con discussions, we decided on PHP as the language of our next platform. This gave us a lot more flexibility in the architecture we were going to use to host the new platform, and provided lots of other benefits (perhaps we can write a blog article about that later!). As we designed the new architecture, one of the requirements was a sub-2-minute SLA for deploying code to all servers, and the same SLA for rolling back.
We researched and tested a large number of different methods and systems before we set out to write our own. We looked at everything from Puppet to Murder and from rsync to Capistrano, but each one turned out not to be a good fit for one reason or another.
Deployment System Requirements
- Deploy code to N servers in less than 2 minutes.
- Allow any developer to deploy to production with the push of a button.
- Only require a systems engineer's time if the deployment is out of SLA.
- Ability to handle multiple applications on different pools of servers.
- Extensible and able to leverage the skills of our existing engineers.
- Application servers don't need access to the code repository.
To meet these requirements we decided on a client pull model, in which each application and server type handles the deployment of its own code. The deployment server is used mainly to serve up the code to be deployed and to act as a traffic control system. This avoids the bottleneck of the deployment server having to loop through every server and deploy to each one in turn.
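To make the client pull idea concrete, here is a minimal sketch of it. It is an illustration only, not our actual system (which is written in PHP): it is in Python, everything lives in memory instead of going over the network, and the names DeployServer, AppServerAgent, Release, publish, roll_back and poll are all made up for this example. The point it shows is that the deployment server only records which release is current and hands it out on request, while each application server decides for itself when it needs to pull.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Release:
    """A packaged build; payload is a stand-in for a real code bundle."""
    version: str
    payload: bytes


@dataclass
class DeployServer:
    """Hypothetical deployment server: it publishes releases but never pushes them."""
    releases: Dict[str, Dict[str, Release]] = field(default_factory=dict)
    current: Dict[str, str] = field(default_factory=dict)

    def publish(self, app: str, release: Release) -> None:
        # Store the release and mark it as the one clients should run.
        self.releases.setdefault(app, {})[release.version] = release
        self.current[app] = release.version

    def roll_back(self, app: str, version: str) -> None:
        # A rollback just re-points "current" at a release already stored.
        if version not in self.releases.get(app, {}):
            raise KeyError(f"unknown release {version!r} for {app!r}")
        self.current[app] = version

    def current_version(self, app: str) -> Optional[str]:
        return self.current.get(app)

    def fetch(self, app: str, version: str) -> Release:
        return self.releases[app][version]


@dataclass
class AppServerAgent:
    """Hypothetical agent on one application server; it pulls its own code."""
    name: str
    app: str
    installed: Optional[str] = None

    def poll(self, server: DeployServer) -> bool:
        """Pull and install the current release if it differs; return True if changed."""
        target = server.current_version(self.app)
        if target is None or target == self.installed:
            return False
        release = server.fetch(self.app, target)
        # A real agent would unpack release.payload and switch the live code here.
        self.installed = release.version
        return True


if __name__ == "__main__":
    server = DeployServer()
    pool = [AppServerAgent(name=f"web{i}", app="storefront") for i in range(3)]
    server.publish("storefront", Release("v1", b"bundle-1"))
    for agent in pool:  # in practice each agent polls on its own schedule
        agent.poll(server)
    print([(agent.name, agent.installed) for agent in pool])
```

Note that the server never iterates over the pool: the loop in the demo only stands in for each agent polling independently. The agents also talk only to the deployment server, so they need no access to the code repository, and a rollback is the same pull in the other direction.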
We wrote the system in PHP so that anyone on the Engineering team could contribute to it and enhance it.
That's the high-level history of what we had and what we replaced it with.
I'll dive deeper into the architecture of the new Wayfair deployment system in the next installment.