Configuring Solr to provide search suggestions

I needed to provide search term suggestions based on the characters a user has typed into the search box. This is pretty easy to do with Solr, an open source enterprise search platform written in Java and built on Apache Lucene.

If you’re using a version prior to 4.8, this can be accomplished using the SpellCheckComponent. See this document for details.

As of 4.8 a new component is available: solr.SuggestComponent. This post goes through the steps to configure an index to provide search suggestions using this component. In my case I created a separate index to handle this, but it could be combined into an existing index such as sitecore_web_index (or any other custom index you may be using), depending on your needs.

Define the schema for the index:

In order to create smaller documents I trimmed the fields down to the bare minimum. This is done in schema.xml.

<fields>
    <field name="_content" type="text_general" indexed="true" stored="false" />
    <field name="_database" type="string" indexed="true" stored="true" />
    <field name="_uniqueid" type="string" indexed="true" stored="true" required="true" />
    <field name="_name" type="text_general" indexed="true" stored="true" />
    <field name="_indexname" type="string" indexed="true" stored="true" />
    <field name="_version" type="string" indexed="true" stored="true" />
    <field name="_version_" type="long" indexed="true" stored="true" />
</fields>

Then I added two fields that will be used by the suggester: one to store the suggestion text and another to store the weight of that suggestion. The suggestion field should be a text type and the weight field should be a float type. Both need to be stored in the index. In this case these fields get their values from corresponding fields in our Sitecore instance. They can be added to documents based on your specific indexing strategy.

<field name="term" type="text_general" indexed="true" stored="true" />
<field name="weight" type="float" indexed="true" stored="true" />
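If you are not relying on Sitecore to populate these fields, documents carrying them can be built by hand. The following is a minimal sketch, assuming you feed the index yourself with JSON; the function name and the sample values are hypothetical, and only the field names (_uniqueid, term, weight) come from the schema above.

```python
import json


def build_suggestion_docs(items):
    """Turn (unique_id, term, weight) tuples into a JSON array of Solr documents.

    Only the fields the suggester needs are set; _uniqueid is included
    because the schema above marks it as required.
    """
    docs = []
    for unique_id, term, weight in items:
        docs.append({
            "_uniqueid": unique_id,
            "term": term,
            # The weight field is a float type in the schema.
            "weight": float(weight),
        })
    return json.dumps(docs)


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    print(build_suggestion_docs([("item-1", "leather shoes", 10),
                                 ("item-2", "gold jewellery", 7)]))
```

The resulting JSON array is the kind of payload that Solr's update handler accepts when posted with a JSON content type; how you actually send it depends on your indexing strategy.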

Define a custom field type for the suggest component:

Next we need to add a new type that the suggester will use to analyze and build the suggestion fields. This particular type replaces every non-alphanumeric character with a space, tokenizes the contents of the field on whitespace and lowercases the tokens, which makes matching case-insensitive. This is not strictly necessary; existing types may be used. Again, this is done in schema.xml.

<types>
...
<fieldType name="suggestType" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
        <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[^a-zA-Z0-9]" replacement=" " />
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
</fieldType>
...
</types>
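To get a feel for what this analysis chain does to a value, here is a rough Python approximation of the three steps (character filter, whitespace tokenizer, lowercase filter). It is only a sketch for reasoning about the output; Solr runs the real analysis in Java, and the function name is made up for this example.

```python
import re


def suggest_type_analyze(text):
    """Approximate the suggestType analyzer defined above."""
    # PatternReplaceCharFilterFactory: replace non-alphanumerics with a space.
    cleaned = re.sub(r"[^a-zA-Z0-9]", " ", text)
    # WhitespaceTokenizerFactory: split on runs of whitespace.
    tokens = cleaned.split()
    # LowerCaseFilterFactory: lowercase every token.
    return [token.lower() for token in tokens]


if __name__ == "__main__":
    print(suggest_type_analyze("Wi-Fi Router (5GHz)!"))
```

Note that punctuation inside a word splits it in two, so "Wi-Fi" becomes the tokens "wi" and "fi".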

Define the suggest component for the index:

Now that we have the schema set up, we need to define a searchComponent that will do the suggesting. This is done in solrconfig.xml.

Add the following to the <config> node:

<searchComponent name="suggest" class="solr.SuggestComponent">
    <lst name="suggester">
        <str name="name">fuzzySuggester</str>
        <str name="lookupImpl">FuzzyLookupFactory</str>
        <str name="storeDir">fuzzy_suggestions</str>
        <str name="dictionaryImpl">DocumentDictionaryFactory</str>
        <str name="field">term</str>
        <str name="weightField">weight</str>
        <str name="suggestAnalyzerFieldType">suggestType</str>
        <str name="buildOnStartup">false</str>
        <str name="buildOnCommit">false</str>
    </lst>
    <lst name="suggester">
        <str name="name">infixSuggester</str>
        <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
        <str name="indexPath">infix_suggestions</str>
        <str name="dictionaryImpl">DocumentDictionaryFactory</str>
        <str name="field">term</str>
        <str name="weightField">weight</str>
        <str name="suggestAnalyzerFieldType">suggestType</str>
        <str name="buildOnStartup">false</str>
        <str name="buildOnCommit">false</str>
    </lst>
</searchComponent>

lookupImpl

In this case we’re setting up a suggest component that has two suggester data sources available to it.

  • The first uses the FuzzyLookupFactory: an FST-based (Finite State Transducer) suggester which will match terms starting with the provided characters while accounting for potential misspellings. This lookup implementation will not find terms where the provided characters are in the middle.
  • The second uses the AnalyzingInfixLookupFactory, which will look inside the terms for matches. The results will also have <b> highlights around the provided characters inside the suggestions.
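The difference between the two lookups can be illustrated with a toy example. This is not how Lucene implements either suggester (there is no FST, no fuzzy matching and no weighting here); it is just a sketch of "match at the start of the term" versus "match at the start of any word in the term, with highlighting". The function names and sample terms are hypothetical.

```python
def prefix_matches(terms, typed):
    """Toy stand-in for a prefix lookup: the whole term must start with the input."""
    q = typed.lower()
    return [t for t in terms if t.lower().startswith(q)]


def infix_matches(terms, typed):
    """Toy stand-in for an infix lookup: any word in the term may start with the input.

    The matched characters are wrapped in <b> tags, similar in spirit to the
    highlighting described above.
    """
    q = typed.lower()
    hits = []
    for term in terms:
        words = term.split()
        if not any(w.lower().startswith(q) for w in words):
            continue
        highlighted = [
            f"<b>{w[:len(q)]}</b>{w[len(q):]}" if w.lower().startswith(q) else w
            for w in words
        ]
        hits.append(" ".join(highlighted))
    return hits


if __name__ == "__main__":
    terms = ["leather shoes", "black leather shoes", "gold jewellery"]
    print(prefix_matches(terms, "lea"))
    print(infix_matches(terms, "lea"))
```

With these sample terms the prefix version only finds "leather shoes", while the infix version also finds "black leather shoes".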

Using a combination of methods, we can get more complete results. Additional suggester implementations are available:

  • WFSTLookup: offers more fine-grained control over results ranking than FST
  • TSTLookup: “a simple, compact trie-based lookup”. Whatever that means.
  • JaspellLookup: see the Jaspell source.

See the Suggester Documentation for more details on the different types of Lookup Implementations. They each have properties unique to their implementation.

storeDir and indexPath

These parameters define the directory where the suggester structure will be stored after it’s built. One of them should be set (storeDir for the fuzzy suggester, indexPath for the infix suggester in this example) so the data is available on disk without rebuilding.

field

The field to get the suggestions from. This could be a computed or a copy field.

weightField

As of Solr 5.1 this field is optional. In previous versions it was required. If no proper weight value is available, a workaround is to define a float field in your schema and use that. Even if this field is never added to a document the code will compensate.

threshold (not used in this example)

A percentage of the documents a term must appear in. This can be useful for reducing the number of garbage returns due to misspellings if you haven’t scrubbed the input.
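As a rough illustration of the idea, the sketch below keeps only the terms that appear in at least a given fraction of the documents. It is a toy model of the concept, not Solr's code, and the function name and sample documents are hypothetical.

```python
from collections import Counter


def terms_above_threshold(docs, threshold):
    """Keep terms that appear in at least `threshold` (0 to 1) of the documents.

    `docs` is a list of token lists, one list per document.
    """
    doc_freq = Counter()
    for tokens in docs:
        # Count each term once per document, however often it repeats.
        doc_freq.update(set(tokens))
    minimum = threshold * len(docs)
    return sorted(term for term, count in doc_freq.items() if count >= minimum)


if __name__ == "__main__":
    docs = [
        ["leather", "shoes"],
        ["leather", "boots"],
        ["lether", "shoes"],  # misspelling that only occurs once
        ["gold", "ring"],
    ]
    print(terms_above_threshold(docs, 0.5))
```

In this sample the misspelled "lether" appears in only one of four documents, so a threshold of 0.5 filters it out along with the other rare terms.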

suggestAnalyzerFieldType

This parameter is set to the fieldType that will process the information in the defined ‘field’. I suggest starting simple and adding complexity as the need arises.

  • This fieldType is completely independent from the analysis chain applied to the field you specify for your suggester. It’s perfectly reasonable to have the two fieldTypes be much different.
  • The “string” fieldType should probably not be used. If a “string” type is appropriate for the use case, the TermsComponent will probably serve as well and it is much simpler.

buildOnStartup and buildOnCommit

Building the suggester data involves re-reading, decompressing and adding the field from every document to the suggester. These two settings should both generally be set to “false”. buildOnStartup rebuilds every time Solr is started; buildOnCommit rebuilds every time documents are committed. With a smaller list of potential suggestions, the latter is acceptable.

Define a requestHandler for the Suggest Component

<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy" >
    <lst name="defaults">
        <str name="suggest">true</str>
        <str name="suggest.dictionary">infixSuggester</str>
        <str name="suggest.dictionary">fuzzySuggester</str>
        <str name="suggest.onlyMorePopular">true</str>
        <str name="suggest.count">10</str>
        <str name="suggest.collate">true</str>
    </lst>
    <arr name="components">
        <str>suggest</str>
    </arr>
</requestHandler>

The “name” of the requestHandler defines the url that will be used to request suggestions. In this case it will be http://localhost:8983/solr/index_name/suggest. Your port number may be different.

The requestHandler definition contains two parts:

defaults

These are settings that you would like to apply to each request. They may be provided in the querystring if different values are necessary.

Multiple “suggest.dictionary” values may be used. Each one will have its own section of results. The values are the names of the suggesters that were defined in the Suggest Component.

components

The name of the Suggest Component is set here. This connects the handler to the component.

See the documentation for more details on configuring search components and request handlers.

Actually getting suggestions

Once all of this is set up, using it is very simple. Assuming a Solr index url like this:
http://localhost:8983/solr/index_name

  • Build the suggester:
    Issue http://localhost:8983/solr/index_name/suggest?suggest.build=true.

    • Until you do this step, no suggestions are returned.
    • The two build settings (buildOnStartup and buildOnCommit) can be used to avoid this, but consider the size of your index and the time and CPU that will be required to build the suggest index automatically.
  • Ask for suggestions:
    Issue http://localhost:8983/solr/index_name/suggest?suggest.q=whatever

    • Additional parameters can be included, such as the count, the desired format (json or xml) or a specific suggest.dictionary.
    • Use “wt” and “indent” parameters to format your results into json or xml and apply indenting. e.g.: &wt=json&indent=true
    • The response will contain a “suggest” field. This field will contain a field for each of the suggest.dictionary values that were used. Each of these dictionary fields will have a “numFound” field as well as a “suggestions” field containing an array of the found suggestions and their weights.
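The request urls above can be assembled in code rather than by hand. Here is a small sketch using only the Python standard library; the function name and its arguments are my own, while the parameter names (suggest.q, suggest.dictionary, suggest.count, suggest.build, wt) are the ones used in this post.

```python
from urllib.parse import urlencode


def build_suggest_url(base_url, query, dictionaries=(), count=None, build=False):
    """Build a url for the /suggest handler defined above.

    base_url is the index url, e.g. http://localhost:8983/solr/index_name.
    """
    params = [("suggest.q", query), ("wt", "json")]
    # suggest.dictionary may be repeated, once per suggester to consult.
    params += [("suggest.dictionary", name) for name in dictionaries]
    if count is not None:
        params.append(("suggest.count", str(count)))
    if build:
        params.append(("suggest.build", "true"))
    return f"{base_url.rstrip('/')}/suggest?{urlencode(params)}"


if __name__ == "__main__":
    print(build_suggest_url("http://localhost:8983/solr/index_name", "lea",
                            dictionaries=["infixSuggester", "fuzzySuggester"],
                            count=5))
```

Using urlencode takes care of escaping whatever the user typed, which matters as soon as the input contains spaces or punctuation.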

Response Format:

{
  suggest: {
    suggester_name: {
      suggest_query: { numFound: .., suggestions: [ {term: .., weight: .., payload: ..}, .. ] }
    }
  }
}
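Since each dictionary gets its own section, the consumer usually has to combine them. The sketch below is one possible way to do that, assuming the response has already been parsed from JSON into a Python dict with the shape shown above: it strips the <b> highlights, removes duplicates across dictionaries and orders by weight. The function name and the sample response are hypothetical.

```python
import re


def merge_suggestions(response, query):
    """Merge the suggestions of all dictionaries into one list of terms.

    Duplicates keep their highest weight; results are ordered by weight
    (highest first) and then alphabetically.
    """
    best = {}
    for per_query in response.get("suggest", {}).values():
        for suggestion in per_query.get(query, {}).get("suggestions", []):
            # Remove the <b> highlights added by the infix suggester.
            term = re.sub(r"</?b>", "", suggestion["term"])
            weight = suggestion.get("weight", 0)
            if term not in best or weight > best[term]:
                best[term] = weight
    return sorted(best, key=lambda term: (-best[term], term))


if __name__ == "__main__":
    # Hypothetical response, for illustration only.
    sample = {"suggest": {
        "infixSuggester": {"lea": {"numFound": 2, "suggestions": [
            {"term": "<b>lea</b>ther shoes", "weight": 10, "payload": ""},
            {"term": "black <b>lea</b>ther shoes", "weight": 4, "payload": ""}]}},
        "fuzzySuggester": {"lea": {"numFound": 1, "suggestions": [
            {"term": "leather shoes", "weight": 10, "payload": ""}]}},
    }}
    print(merge_suggestions(sample, "lea"))
```

Because the merge sorts by weight, the order in which Solr returns the dictionaries does not matter to the caller.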

I hope you find this information useful. See the Suggester documentation for more details.

Thanks for reading!

12 Responses to “Configuring Solr to provide search suggestions”

  1. Get response from custom Solr requestHandler with .NET | Horizontal Integration Says:

    […] This is a follow-up to my previous post: Configuring Solr to provide search suggestions […]

  2. Singh Says:

    After Implementing on the Solr 5.3 exactly the same steps I am getting “Store Lookup build failed” and not able to retrieve any term.

    • Nate Hase Says:

      Try removing the storeDir from the fuzzy suggester. This will build the fuzzy suggestions into the default index.
      I’m looking into the cause of this, but so far it appears to be a bug.

  3. Misugi Says:

    Hi,
    When the search condition (suggest.q) is blank, is there any way to retrieve all suggestions by default?
    Currently it is not able to retrieve any terms when the search condition is blank.

    Regards,

    • Nate Hase Says:

      Suggestions are based on user input. eg: Start typing in a search box and get a list of options based on what has been typed. I’m not sure why suggestions would be needed if we don’t have any input from the user.

      However, if the default solrconfig.xml was left alone and the suggestion component and handler were added to it, the /select and /query handlers will still be available. You can use these with a wildcard to get all results from the suggest index.
      eg: /suggest_index/query?q=*:*

      The default row count is 10, so if you have more than 10 suggestions you’ll need to ask for more rows in the query with &rows=[number]. An arbitrarily large number can be used, or you can do one query and read the response.numFound value and do a second query using this value to get all of the rows in the index.

  4. jimbotsov Says:

    Is it possible to change the order in which multiple dictionaries are returned? Let’s say I want suggester_2 to be positioned before suggester_1. I ask this because I need to dynamically iterate through all the suggesters to actually retrieve and combine the terms. Unfortunately I cannot hardcode the order in an array like [suggester_2, suggester_1, …]. In any case, thank you for the support.
    {
    suggest: {
    suggester_1: {
    suggest_query: { …}
    },
    suggester_2: {
    suggest_query: {…}
    },
    }

  5. Yogesh Says:

    Hi,

    Is there any way to configure fields which we want to see in the response ?

    For example, rather than the response below
    {
    suggest: {
    suggester_name: {
    suggest_query: { numFound: .., suggestions: [ {term: .., weight: .., payload: ..}, .. ]}
    }
    }

    i would like to see

    {
    suggest: {
    suggester_name: {
    suggest_query: { numFound: .., suggestions: [ {name: ..}, .. ]}
    }
    }

  6. Raj Malhotra Says:

    Hi, was browsing the net for solr auto suggestions and came across your blog. It’s interesting and helpful. I was looking for a bit more intelligent solution for suggestions and I am not sure if that’s possible with Solr. We have a ecommerce site where we would like to suggest to the users (as desired and mentioned in your blog). We have various categories of products (like clothing shoes jewellery etc) and associated attributes like colors, type of work, material etc. Now if a user starts typing for “gold”, then we should be able to suggest “gold jewellery” and not “gold shoes”. But if user types for “leather”, then we should be able to suggest “leather shoes”. Now if user types few more words like “black leather”, then suggestion would be “black leather shoes” whereas “black” can also be there in “clothing”. The permutations of attributes and categories can be at various level. I tried implementing this with Solr suggester but was not able to come up with a good schema or how suggested could be used. Is this possible using Solr?

    • Nate Hase Says:

      The terms that are suggested come from a specific field in the index that is used to get the suggestions, so all terms in that field are available as suggestions, based on the input.

      The WeightField:
      This can be used to order the results, placing higher valued results closer to the top of the list. A custom analyzer can be created to provide the appropriate weighting.

      Context Field (category/type filter):
      Some Lookup Factories have a “contextField” property. This can be used to filter the results by that field. e.g.:
      https://cwiki.apache.org/confluence/display/solr/Suggester
      See the “Context Filtering” section. The example there sets the contextField to “cat” which will allow the suggest query to take a “suggest.cfq” parameter set to a specific category name and filter the suggestions accordingly.
      Note that this is currently only supported by the AnalyzingInfixLookupFactory and the BlendedInfixLookupFactory

      Multiple Dictionaries:
      You can define multiple “suggester” dictionaries (my example has the fuzzySuggester and the infixSuggester) for a suggest component and choose which ones to use for each query by using one or more suggest.dictionary parameters.

      Custom Lookup Implementation:
      This can be done to get the business rules into Lucene (and Solr).

      Business rules can be also implemented on the consumption side by getting the suggestions and processing them before returning them for use.

      I think that a combination of Weight, Multiple Dictionaries and server-side processing is the best way to go. I hope that helps!

