Sitecore ContentSearch Fails for Lucene Reserved Keywords like and/or

Problem:

We are using item buckets for a client, together with a custom URL item resolver that extends HttpRequestProcessor to find items inside the bucket.

Here is the code for that:


using System.Linq;
using Sitecore;
using Sitecore.Buckets.Managers;
using Sitecore.ContentSearch;
using Sitecore.ContentSearch.SearchTypes;
using Sitecore.Pipelines.HttpRequest;

public override void Process(HttpRequestArgs args)
{
    // Only step in when the default resolvers have not found an item
    if (Context.Item == null)
    {
        var requestUrl = args.Url.ItemPath;
        // remove last element from path and see if resulting path is a bucket
        var index = requestUrl.LastIndexOf('/');
        if (index > 0)
        {
            var bucketPath = requestUrl.Substring(0, index);
            var bucketItem = args.GetItem(bucketPath);
            if (bucketItem != null && BucketManager.IsBucket(bucketItem))
            {
                // Turn the URL segment back into an item name
                var itemName = requestUrl.Substring(index + 1).Replace("-", " ").Replace("%20", " ");
                using (var searchContext = ContentSearchManager.GetIndex(bucketItem as IIndexable).CreateSearchContext())
                {
                    // Look the item up by name in the index
                    var result = searchContext.GetQueryable<SearchResultItem>().Where(x => x.Name == itemName).FirstOrDefault();
                    if (result != null)
                    {
                        Context.Item = result.GetItem();
                    }
                }
            }
        }
    }
}

The problem was this line:

var result = searchContext.GetQueryable<SearchResultItem>().Where(x => x.Name == itemName).FirstOrDefault();

It works for items whose names do not contain any of the stop words defined in Lucene.Net.Analysis.StopAnalyzer, but fails for items whose names do. After checking the Lucene index with Luke, we found that the item name was being stored correctly in the “_name” field. So the problem was in the way Sitecore.ContentSearch was searching for the Lucene entry.

Fix:

In App_Config/Include/Sitecore.ContentSearch.Lucene.DefaultIndexConfiguration.config, we added the entry:

<fieldMap>
  <fieldNames>
    <fieldType fieldName="_name" storageType="YES" indexType="TOKENIZED" vectorType="NO" boost="1f" type="System.String" settingType="Sitecore.ContentSearch.LuceneProvider.LuceneSearchFieldConfiguration, Sitecore.ContentSearch.LuceneProvider">
      <Analyzer type="Sitecore.ContentSearch.LuceneProvider.Analyzers.LowerCaseKeywordAnalyzer, Sitecore.ContentSearch.LuceneProvider" />
    </fieldType>
  </fieldNames>
</fieldMap>

Now, in Sitecore, go to Control Panel, Indexing, Indexing Manager and rebuild the sitecore_master_index and sitecore_web_index so that existing items are re-indexed with the new analyzer.

How it works:

The defaultAnalyzer is Lucene.Net.Analysis.Standard.StandardAnalyzer in the Lucene.Net DLL. It applies a StandardFilter, a LowerCaseFilter, and a StopFilter. The StopFilter removes common English stop words such as “and” and “or” from the analyzed text. Once these words are removed, the search that is performed is no longer the same. For instance, if I have an item named “Sitecore CMS and DMS”, then after it is processed by the StandardAnalyzer the search ends up comparing “Sitecore CMS and DMS” to “Sitecore CMS DMS”, which returns no results. With the fix applied, the “_name” field is handled by the LowerCaseKeywordAnalyzer instead, which as far as we can tell keeps the whole name as a single lower-cased term, so the lookup now works.
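
To make the difference between the two analyzers easier to see, here is a minimal sketch in plain C#. It does not call Lucene.Net or Sitecore at all: StandardLike and KeywordLike are hypothetical stand-ins written only for this illustration, and the stop word list is an assumed small subset of Lucene's default English list. It only approximates what happens to an item name under each kind of analyzer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AnalyzerSketch
{
    // Assumption: a small subset of Lucene's default English stop words.
    private static readonly HashSet<string> StopWords =
        new HashSet<string> { "a", "an", "and", "or", "the", "of", "to", "in" };

    // Rough stand-in for a standard-style analyzer:
    // split on spaces, lower-case each token, drop stop words.
    public static List<string> StandardLike(string text)
    {
        return text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
                   .Select(t => t.ToLowerInvariant())
                   .Where(t => !StopWords.Contains(t))
                   .ToList();
    }

    // Rough stand-in for a lower-case keyword analyzer:
    // the whole value becomes a single lower-cased token.
    public static List<string> KeywordLike(string text)
    {
        return new List<string> { text.ToLowerInvariant() };
    }

    public static void Main()
    {
        const string name = "Sitecore CMS and DMS";
        Console.WriteLine(string.Join("|", StandardLike(name))); // sitecore|cms|dms
        Console.WriteLine(string.Join("|", KeywordLike(name)));  // sitecore cms and dms
    }
}
```

With the standard-style analysis the word “and” is gone from the indexed terms, so an exact match on the full name has nothing to match against. With the keyword-style analysis the full name survives as one term.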

If you would like to fix this without replacing the StandardAnalyzer for the field, please see this blog post by Sheetal Jain.

Thanks Sheetal Jain and Alistair Deneys for your help on this one!


4 Responses to “Sitecore ContentSearch Fails for Lucene Reserved Keywords like and/or”

  1. Paul Says:

    It seems to me that the bug is in Lucene, not in Sitecore.
    https://issues.apache.org/jira/browse/LUCENE-2202

  2. Item Buckets and URLs | Coffee => Coder => Code Says:

    […] Please also read https://blog.horizontalintegration.com/2013/07/30/sitecore-contentsearch-fails-for-lucene-reserved-ke… to solve issues if you have common stopwords in your item names such as “and” and […]

  3. Sitecore Standard Analyzer : Managing you own stop words filter | Horizontal Integration Says:

    […] Colleague Brent Svac also blogged  an other way to solve the stop words […]

  4. Unique Item Name per Bucket | Coffee => Coder => Code Says:

    […] As we’re using Content Search to locate the items, if you’re using Lucene as your search provider (the default) you’ll need to consider if the default analyzer is adequate. The default analyzer (named the standard analyzer) removes common words from English phrases such as “and” and “or”. You can read more about that at https://blog.horizontalintegration.com/2013/07/30/sitecore-contentsearch-fails-for-lucene-reserved-ke…. […]

