Better three-word searches with SOLR

We use the Apache SOLR search platform behind the scenes at Wayfair. Sometimes, when vanilla SOLR doesn't quite do what we want, we improve it for our purposes. When we suspect that others might have the same purposes, and we think that we have solved our problems in a generally useful way, we contribute our solutions back to the open source community, either on GitHub or through a more project-specific distribution channel. SOLR is an Apache project, so for SOLR, this means attaching a patch to a 'Jira' issue. This blog post is about one such issue, SOLR-1093.

We were having a problem with the quality of the searches that were executed when our customers typed three words into the search box on the site. The best vanilla SOLR treatments we could find for these searches were either too narrow or too broad: they gave us either a small number of high-quality results (sometimes no results at all), or a very large number of medium-quality results that matched two out of three of the terms. So we wanted to run the restrictive, high-quality search first and set a threshold for an acceptable minimum number of results. Then, if the number of results from the first search fell short of the threshold, we would run the broader search for backfill.

Why not just have the front-end client code issue both queries in parallel? A good question, with a simple answer: pagination. It is always better to let SOLR or Elasticsearch handle pagination for you if you can. Otherwise your client code will have to work unnecessarily hard to interleave or sequence results.
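
To make the idea concrete, here is a minimal sketch of the fallback logic in Python. It is not the code from our patch, and it does not talk to SOLR at all: the in-memory INDEX, the match() helper, and the search_with_fallback() function are all hypothetical names invented for illustration. It also assumes that backfill results are appended after the strict results with duplicates removed, which is one reasonable reading of 'backfill' rather than a description of what the patch does.

```python
# Toy in-memory "index": document id -> set of terms (illustrative data only).
INDEX = {
    "d1": {"red", "leather", "sofa"},
    "d2": {"red", "leather", "chair"},
    "d3": {"blue", "leather", "sofa"},
    "d4": {"red", "fabric", "sofa"},
    "d5": {"green", "wool", "rug"},
}


def match(terms, min_match):
    """Return ids of documents containing at least `min_match` of `terms`,
    ordered by number of matching terms (descending), then by id."""
    wanted = set(terms)
    hits = []
    for doc_id, doc_terms in INDEX.items():
        matched = len(doc_terms & wanted)
        if matched >= min_match:
            hits.append((matched, doc_id))
    hits.sort(key=lambda hit: (-hit[0], hit[1]))
    return [doc_id for _, doc_id in hits]


def search_with_fallback(terms, threshold, start=0, rows=10):
    """Run the strict search first; if it returns fewer than `threshold`
    results, backfill with a broader search that allows one missing term.
    Pagination is applied once, to the combined list, on the 'server' side."""
    strict = match(terms, min_match=len(terms))
    if len(strict) >= threshold:
        combined = strict
    else:
        broad = match(terms, min_match=len(terms) - 1)
        seen = set(strict)
        # Assumption: strict hits come first, backfill hits follow, no duplicates.
        combined = strict + [doc_id for doc_id in broad if doc_id not in seen]
    return combined[start:start + rows]


if __name__ == "__main__":
    query = ["red", "leather", "sofa"]
    print(search_with_fallback(query, threshold=1))
    print(search_with_fallback(query, threshold=3))
    print(search_with_fallback(query, threshold=3, start=1, rows=2))
```

Because the sequencing happens in one place, the caller only ever passes a start offset and a row count, and never has to know which of the two searches a given page of results came from. That is the pagination benefit described above.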

We found that other people had been asking for this feature, and related features, in SOLR-1093, since 2009. The comments indicated a good amount of interest in the community. The original feature request was for a parallel executor of more than one query. Grant Ingersoll quickly commented that 'fallback queries' should be handled as well, and Lance Norskog added that serial execution would also be desirable. It turned out we only needed the serial executor for fallback queries, so that's what Karthick from our search team implemented. It's been in production for us for a couple of months now, and it works very well. Karthick checked with the SOLR mailing list and then posted the patch. It's an extension to the SearchHandler class called MultiSearchHandler.

Can others benefit from this patch? I think so, but perhaps not indefinitely. Will it be accepted into SOLR? In the end, I think it will not, because the core SOLR people are working on a more general solution in trunk, similar in design to the ScriptUpdateProcessor. That solution takes a completely different approach, based on scripted queries. When it's in general use, we'll probably switch to it. But it's not in SOLR 3.6, which is the most up-to-date general release as of this writing, and it's not going to make it into SOLR 4.0. For now we have all our patches applied to the very safe 3.6 branch, and we're preparing to re-apply everything to 4.0 when it gets out of beta. It might be a while before the other approach is easily available. In the meantime, if you want to do this, get out your patch kits and enjoy!