Sitecore ContentSearch Fails for Item Names Containing Lucene Stop Words like "and"/"or"

Problem:

We are using item buckets for a client, along with a custom URL item resolver that inherits from HttpRequestProcessor to find items in the bucket.

Here is the code for that:


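    using System.Linq;
    using Sitecore;
    using Sitecore.Buckets.Managers;
    using Sitecore.ContentSearch;
    using Sitecore.ContentSearch.SearchTypes;
    using Sitecore.Pipelines.HttpRequest;

    // The original post shows only Process(); the class name here is hypothetical.
    public class BucketItemResolver : HttpRequestProcessor
    {
        public override void Process(HttpRequestArgs args)
        {
            // Only try the bucket lookup when normal item resolution found nothing
            if (Context.Item == null)
            {
                var requestUrl = args.Url.ItemPath;

                // Remove the last element from the path and see if the resulting path is a bucket
                var index = requestUrl.LastIndexOf('/');
                if (index > 0)
                {
                    var bucketPath = requestUrl.Substring(0, index);
                    var bucketItem = args.GetItem(bucketPath);
                    if (bucketItem != null && BucketManager.IsBucket(bucketItem))
                    {
                        // Turn the URL segment back into an item name ("my-item" -> "my item")
                        var itemName = requestUrl.Substring(index + 1).Replace("-", " ").Replace("%20", " ");

                        // Item does not implement IIndexable, so wrap it in SitecoreIndexableItem
                        using (var searchContext = ContentSearchManager.GetIndex((SitecoreIndexableItem)bucketItem).CreateSearchContext())
                        {
                            var result = searchContext.GetQueryable<SearchResultItem>().Where(x => x.Name == itemName).FirstOrDefault();
                            if (result != null)
                            {
                                Context.Item = result.GetItem();
                            }
                        }
                    }
                }
            }
        }
    }

The problem was this line:

    var result = searchContext.GetQueryable<SearchResultItem>().Where(x => x.Name == itemName).FirstOrDefault();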

It works for items whose names contain none of the stop words defined in Lucene.Net.Analysis.StopAnalyzer, but fails for items whose names do (for example, "and" or "or"). After inspecting the Lucene index with Luke, we found that the item name was stored correctly in the "_name" field, so the problem lay in how Sitecore.ContentSearch queried that field.

Fix:

In App_Config/Include/Sitecore.ContentSearch.Lucene.DefaultIndexConfiguration.config, we added this entry:

    <fieldMap>
      <fieldNames>
        <fieldType fieldName="_name" storageType="YES" indexType="TOKENIZED" vectorType="NO" boost="1f" type="System.String" settingType="Sitecore.ContentSearch.LuceneProvider.LuceneSearchFieldConfiguration, Sitecore.ContentSearch.LuceneProvider">
          <Analyzer type="Sitecore.ContentSearch.LuceneProvider.Analyzers.LowerCaseKeywordAnalyzer, Sitecore.ContentSearch.LuceneProvider" />
        </fieldType>
      </fieldNames>
    </fieldMap>

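Then, in Sitecore, open Control Panel > Indexing > Indexing Manager and rebuild the sitecore_master_index and sitecore_web_index indexes. The analyzer is applied when items are indexed, so existing entries keep their old analysis until the indexes are rebuilt.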

How it works:

The defaultAnalyzer in the Lucene index configuration is Lucene.Net.Analysis.Standard.StandardAnalyzer from the Lucene.Net DLL. It passes the output of the StandardTokenizer through a StandardFilter, a LowerCaseFilter, and a StopFilter. The StopFilter removes common English stop words such as "and", "or", and "the". Once those words are removed, the search term no longer matches the stored name. For instance, for an item named "Sitecore CMS and DMS", the StandardAnalyzer reduces the search term to "sitecore cms dms", and comparing that with "Sitecore CMS and DMS" returns no results. After the fix, the _name field uses the LowerCaseKeywordAnalyzer, which keeps the whole name as a single lowercase token without removing stop words, so the lookup succeeds.
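To make the difference concrete, here is a simplified C# model of the two analysis chains. It does not call Lucene.Net: AnalyzerSketch, StandardLike, KeywordLike and NameMatches are hypothetical names, the regex split is only a rough stand-in for the StandardTokenizer, and the stop word list is Lucene's default English set. NameMatches models the comparison as described above (analyzed search term against the stored name); it is a sketch under those assumptions, not how Sitecore or Lucene actually execute the query.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Simplified model of the two analyzers; it does NOT use Lucene.Net.
public static class AnalyzerSketch
{
    // Lucene's default English stop words (StopAnalyzer.ENGLISH_STOP_WORDS_SET)
    private static readonly HashSet<string> StopWords = new HashSet<string>
    {
        "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into",
        "is", "it", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then",
        "there", "these", "they", "this", "to", "was", "will", "with"
    };

    // Rough approximation of StandardAnalyzer: split on runs of non-letter/non-digit
    // characters, lowercase each token, then drop stop words
    public static List<string> StandardLike(string text) =>
        Regex.Split(text, @"[^\p{L}\p{N}]+")
            .Where(t => t.Length > 0)
            .Select(t => t.ToLowerInvariant())
            .Where(t => !StopWords.Contains(t))
            .ToList();

    // Approximation of LowerCaseKeywordAnalyzer: the whole value is one lowercase token
    public static List<string> KeywordLike(string text) =>
        new List<string> { text.ToLowerInvariant() };

    // Hypothetical helper: joins the analyzed search term back together and compares it
    // (case-insensitively) with the stored item name, as in the example above
    public static bool NameMatches(string storedName, string searchTerm, Func<string, List<string>> analyze) =>
        string.Join(" ", analyze(searchTerm)) == storedName.ToLowerInvariant();
}
```

With these definitions, NameMatches("Sitecore CMS and DMS", "Sitecore CMS and DMS", AnalyzerSketch.StandardLike) is false because "and" is dropped, while the same call with AnalyzerSketch.KeywordLike is true. A name without stop words, such as "Sitecore CMS", matches under both, which mirrors the behaviour described in the problem section.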

If you would like to fix this without changing the StandardAnalyzer, see this blog post by Sheetal Jain.

Thanks Sheetal Jain and Alistair Deneys for your help on this one!