Faceted Search with Solr
Faceted search has become a critical feature for enhancing findability and the user search experience for all types of search applications. In this article, Solr creator Yonik Seeley gives an introduction to faceted search with Solr.
By Yonik Seeley
Faceted search is the dynamic clustering of items or search results into categories that let users drill into search results (or even skip searching entirely) by any value in any field. Each facet displayed also shows the number of hits within the search that match that category. Users can then “drill down” by applying specific constraints to the search results. Faceted search is also called faceted browsing, faceted navigation, guided navigation and sometimes parametric search.
It’s easiest to understand what faceted search is through an example, appropriately from CNET Reviews, the first website to use Solr, even before CNET had contributed it to Apache.
This example is actually faceted browsing because it started with all digital cameras and not a user search. Manufacturer is a facet of these search results, and the facet constraints or facet values for this facet include Canon USA, Sony, and Nikon. A previous screen also included Price and Digital camera type facets, but they are no longer displayed because we already selected the $400-$500 and SLR constraints from those facets. Due to those constraints, the resulting facet counts and camera listings reflect only digital SLR cameras between $400 and $500 in price. Any of the displayed facet constraints can be clicked to drill down and further narrow the search results by that constraint. Applied constraints can be removed by clicking on them within the breadcrumb trail.
Faceted search provides an effective way to allow users to refine search results, continually drilling down until the desired items are found. The benefits include:
Superior feedback – users can see at a glance a summary of the search results and how those results break down by different criteria.
No surprises or dead ends – users know how many results match before they click. Values with zero counts are normally removed to reduce visual noise and eliminate the possibility of a user accidentally selecting a constraint that would lead to no results.
No selection hierarchy is imposed – users are generally free to add or remove constraints in any order.
It’s relatively simple to get faceting information from Solr, as there are few prerequisites. Solr offers the following types of faceting, all of which can be requested with no prior configuration:
Field faceting – retrieve the counts for all terms, or just the top terms in any given field. The field must be indexed.
Query faceting – return the number of documents in the current search results that also match the given query.
Date faceting – return the number of documents that fall within certain date ranges.
Faceting commands are added to any normal Solr query request, and the faceting counts come back in the same query response. If you are not familiar with the details of making requests in Solr, see the tutorial.
To implement the Manufacturer facet, I send a field faceting command to Solr. This example assumes a field named “manu” exists in the schema and that it has the manufacturer indexed as a single token. The “string” type in the Solr schema is an appropriate field type to meet these needs.
Let’s assume for a moment that the user typed “camera” into the search box. The Solr query to retrieve the top “camera” matches would be:
http://localhost:8983/solr/select?q=camera
To retrieve facet counts for the “manu” field, we would simply add the following parameters to that query request:
&facet=true&facet.field=manu
Any number of facet commands can be added to a single query request. To facet on both the “manu” field and the “camera_type” field, we would add the following parameters:
&facet=true&facet.field=manu&facet.field=camera_type
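As a rough illustration, a client could assemble such a request programmatically. The sketch below uses Python's standard library; the base URL is the one used in this article's example requests, and the helper name build_facet_url is hypothetical, not part of Solr.

```python
from urllib.parse import urlencode

# Assumed endpoint, taken from the example requests in this article.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_facet_url(query, facet_fields):
    """Return a select URL that also asks for field facet counts."""
    params = [("q", query), ("facet", "true")]
    # One facet.field parameter per field; repeating the key is allowed.
    params.extend(("facet.field", field) for field in facet_fields)
    return SOLR_SELECT_URL + "?" + urlencode(params)


print(build_facet_url("camera", ["manu", "camera_type"]))
```

Passing the parameters as a list of pairs rather than a dictionary is what lets the same key (facet.field) appear more than once.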
The query response will now contain facet count information for the given fields in addition to the top matches for the query.
<lst name="facet_fields">
  <lst name="manu">
    <int name="Canon USA">17</int>
    <int name="Olympus">12</int>
    <int name="Sony">12</int>
    <int name="Panasonic">9</int>
    <int name="Nikon">4</int>
  </lst>
  <lst name="camera_type">
    <int name="Compact">17</int>
    <int name="Ultracompact">11</int>
    <int name="SLR">17</int>
    <int name="Full body">9</int>
  </lst>
</lst>
Facet counts returned are always in the context of the current query. For example, there may be 100 cameras manufactured by Canon in the search index, but only 17 that match the current search results. Now it’s up to the presentation layer to display this information to the user in the form of clickable links with counts displayed next to them.
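One way a presentation layer might read these counts is sketched below in Python, using the standard library XML parser. The function name parse_facet_fields and the shortened sample response are illustrative assumptions; the element and attribute names follow the response excerpt shown above.

```python
import xml.etree.ElementTree as ET

# Shortened sample in the same shape as the facet_fields excerpt above.
SAMPLE = """<lst name="facet_fields">
  <lst name="manu">
    <int name="Canon USA">17</int>
    <int name="Nikon">4</int>
  </lst>
</lst>"""


def parse_facet_fields(xml_text):
    """Map each faceted field to a list of (value, count) pairs."""
    root = ET.fromstring(xml_text)
    # Accept either a whole response or just the facet_fields element.
    if root.tag == "lst" and root.get("name") == "facet_fields":
        container = root
    else:
        container = root.find(".//lst[@name='facet_fields']")
    result = {}
    if container is None:
        return result
    for field in container.findall("lst"):
        result[field.get("name")] = [
            (entry.get("name"), int(entry.text))
            for entry in field.findall("int")
        ]
    return result


# Print each constraint with its count, as a link label might show it.
for value, count in parse_facet_fields(SAMPLE)["manu"]:
    print(f"{value} ({count})")
```

Keeping the pairs in a list preserves the order in which Solr returned them, which the display will usually want to keep.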
If we request field faceting on the “price” field, we get back counts for individual prices. However, we want price ranges, not individual prices. One workaround is to index another field that contains the ranges that the prices fall into (for example 100_200, 200_300, 300_400) and use field faceting on that field. A more flexible solution is to utilize query facets that provide the ability to retrieve facet counts for arbitrary queries.
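For the first workaround, the range label has to be computed before the document is indexed. A minimal Python sketch follows; the function name price_bucket and the fixed bucket width of 100 are assumptions chosen to reproduce the 100_200 style labels mentioned above.

```python
def price_bucket(price, width=100):
    """Return a label such as '100_200' for the range containing price.

    Buckets are half-open: a price of exactly 200 falls in '200_300'.
    """
    low = int(price // width) * width
    return f"{low}_{low + width}"


# The label would be stored in a separate indexed field (name is up to you).
print(price_bucket(249.99))
```

The drawback the article points to is visible here: the bucket boundaries are fixed at index time, so changing them means reindexing.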
Let’s assume that we have an indexed “price” field and we want to get the facet counts for the following ranges of prices: $100 or less, $100-$200, $200-$300, $300-$400, $400-$500, and $500 or over. We simply add a facet.query command to our query request for each desired range:
&facet=true
&facet.query=price:[* TO 100]
&facet.query=price:[100 TO 200]
&facet.query=price:[200 TO 300]
&facet.query=price:[300 TO 400]
&facet.query=price:[400 TO 500]
&facet.query=price:[500 TO *]
In addition to the standard query results and any field faceting counts requested, the query response will also contain a facet count for each facet query that was specified.
<lst name="facet_queries">
  <int name="price:[* TO 100]">28</int>
  <int name="price:[100 TO 200]">54</int>
  <int name="price:[200 TO 300]">98</int>
  <int name="price:[300 TO 400]">84</int>
  <int name="price:[400 TO 500]">73</int>
  <int name="price:[500 TO *]">56</int>
</lst>
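Writing each range by hand is error-prone, so a client might generate the facet.query values from a list of boundaries. The following Python sketch does that; the function name price_range_queries is hypothetical, and the boundaries are the ones used in this article.

```python
from urllib.parse import urlencode


def price_range_queries(boundaries, field="price"):
    """Build range queries: open-ended at both extremes, closed in between."""
    queries = [f"{field}:[* TO {boundaries[0]}]"]
    for low, high in zip(boundaries, boundaries[1:]):
        queries.append(f"{field}:[{low} TO {high}]")
    queries.append(f"{field}:[{boundaries[-1]} TO *]")
    return queries


queries = price_range_queries([100, 200, 300, 400, 500])
# Each range becomes its own facet.query parameter; urlencode escapes
# the brackets, colons, and spaces so the URL is safe to send.
params = [("facet", "true")] + [("facet.query", q) for q in queries]
print(urlencode(params))
```

Note that square-bracket ranges include both endpoints, so a camera priced at exactly $200 is counted in both the $100-$200 and $200-$300 ranges.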
Now that we’ve learned how to retrieve facet counts, how do we allow the user to drill down and narrow the search results with any of those constraints? The answer is standard Solr filter queries, where search results are filtered by any number of arbitrary queries.
Let’s assume again that the user typed “camera” into the search box and we have queried Solr for the top matching documents, and also requested some facet counts to be displayed to the user.
http://localhost:8983/solr/select?q=camera
&facet=on&facet.field=manu&facet.field=camera_type
&facet.query=price:[* TO 100]
&facet.query=price:[100 TO 200]
&facet.query=price:[200 TO 300]
&facet.query=price:[300 TO 400]
&facet.query=price:[400 TO 500]
&facet.query=price:[500 TO *]
Now let’s assume that the user wants to drill down on the constraint $400-$500 from the Price facet to get a new set of results that include only cameras in that price range. For this we use the fq (filter query) parameter, which allows one to filter by a query. We’ll also send the relevant faceting commands again since we also want to update the facet counts.
http://localhost:8983/solr/select?q=camera
&facet=on&facet.field=manu&facet.field=camera_type
&fq=price:[400 TO 500]
The fq command can appear anywhere in the query request – parameter order does not matter. Notice that we no longer request the facet counts for the prices since we are constraining results to one price range and thus already know that the counts for other ranges would be zero.
Now that we’ve received the Solr response and updated the facet display and results listing in our web front-end (including adding a “$400-$500” breadcrumb so the user can remove the constraint if desired), let’s assume that the user clicks on “SLR” in the camera_type facet. To filter our search results further, we simply add another fq parameter.
http://localhost:8983/solr/select?q=camera
&facet=on&facet.field=manu
&fq=price:[400 TO 500]
&fq=camera_type:SLR
When the web front-end receives the Solr response to this request, we can add an “SLR” breadcrumb and once again update our facet counts and top search results.
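The drill-down flow amounts to keeping a list of applied filter queries and rebuilding the request whenever the list changes. Here is a minimal Python sketch under that assumption; the helper name build_drilldown_url is hypothetical, and the base URL is the one used in the example requests above.

```python
from urllib.parse import urlencode

# Assumed endpoint, taken from the example requests in this article.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_drilldown_url(query, facet_fields, filters):
    """Build a request with one fq parameter per applied constraint."""
    params = [("q", query), ("facet", "on")]
    params += [("facet.field", field) for field in facet_fields]
    params += [("fq", fq) for fq in filters]
    return SOLR_SELECT_URL + "?" + urlencode(params)


# Constraints applied so far; each one corresponds to a breadcrumb.
filters = ["price:[400 TO 500]", "camera_type:SLR"]
url = build_drilldown_url("camera", ["manu"], filters)
print(url)

# Clicking the "SLR" breadcrumb removes that constraint again.
remaining = [fq for fq in filters if fq != "camera_type:SLR"]
print(build_drilldown_url("camera", ["manu", "camera_type"], remaining))
```

Because every constraint is an independent fq parameter, removing a breadcrumb is just a matter of dropping one entry and reissuing the request, which matches the point that no selection hierarchy is imposed.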