PHP Static Analysis with HHVM and Hussar

Wayfair Engineering places special emphasis on software testing as a means of maintaining stability in production. The DevTools team, which I am a member of, has built and integrated a number of tools into our development and deploy process in order to catch errors as early as possible, especially before they land in master. If you missed it, last week we released sp-phpunit, a script to manage running PHPUnit tests in parallel.

Today we're open-sourcing hussar, another tool we've been using as part of our deploy process, that performs PHP static analysis using HHVM. The name comes partly from the cavalry unit in Age of Empires II — a classic strategy game where a few of us on DevTools still fight to the end — but mainly from the fact that it's a good, open name that shares a few letters with the tool's main feature: HHVM static analysis.

Put simply, hussar builds and compares HHVM static analysis reports. It maintains a project workspace and shows errors introduced by applying patches or merging branches. With hussar, projects that cannot yet run on HHVM are able to realize the benefits of static analysis and catch potentially fatal errors prior to runtime. Here is a list of errors hussar can catch. The tool displays these errors in a formatted report. This means our engineers get the safety of strong typing and static code analysis in addition to the flexibility of development they're accustomed to.

We wrote hussar as a preparatory step toward possibly running Wayfair on HHVM. When we first tried to use HHVM, we discovered that it lacked support for a number of features and extensions used throughout our codebase. Recognizing both the performance and code quality benefits it could provide, we hacked together a script that would get at least the code analysis component working. Over the past few months this script has gone through several iterations as we worked on edge cases and ironed out false-positives to increase its accuracy and utility. The result is a tool that reliably reports legitimate errors.

We're using hussar as part of our deploy process in addition to our integration and unit tests. Since we've started using the tool it has made us aware of a number of errors that slipped through our other tests. This multi-faceted approach to testing has allowed us to be more confident in deployments while keeping productivity high.

Our engineers are also able to run hussar against their code in advance of the deployment process, so ideally any errors are caught even before code review. We're using a remotely triggered Jenkins job to coordinate running hussar builds on a dedicated testing cluster. HHVM is a bit heavy, so each machine has 6gb RAM, and reports are written to a shared filesystem to avoid repeating work. Run time is usually five minutes or less.

We also generate a full report nightly and are working through the backlog of existing errors. Each resolved error improves our codebase and brings us one step closer to the possibility of running our sites on the HHVM platform.

We think hussar solves the "backsliding" problem likely faced by all project maintainers with large PHP codebases when considering a migration onto HHVM. That is, it's usually impractical to address all the static analysis issues at once, since tech debt continues to accumulate as developers work through the backlog. This is solved by hussar's focus on preventing new errors, which allows momentum to build in the efforts to clean the rest of the codebase. For us, the proof of this is that the number of errors found by static analysis across our codebase has been steadily declining since we started using hussar.

For more details on how hussar works and information on how you can start using it to analyze your own code, head over to the project's GitHub page. We hope you find it useful, and welcome any contributions!