Predicting Item Names of Media Assets Uploaded in Sitecore

A while ago I wrote an F# script to rename image references in CSS to match the URLs of media items uploaded to Sitecore. This ended up being both more challenging and much simpler than I expected. It was more challenging because of how Sitecore handles item names of media assets uploaded in the content editor, specifically how it proposes valid item names. It was much simpler because Sitecore provides utility classes for dealing with this scenario, so you really don’t have to understand why media items are named what they are. Nevertheless, since I spent a fair amount of time figuring out how Sitecore proposes item names, I thought I would share my findings.

One of the things I love about Sitecore is that it gives developers control over almost every aspect of the system, including how a content editor is allowed to name an item. Four settings in the Web.config affect how media items are named: ItemNameValidation, InvalidItemNameChars, Media.WhitespaceReplacement and Media.IncludeExtensionsInItemNames. Note that I left out MaxItemNameLength, which is only used when creating items with the UI or creating items programmatically using any of the Item.Add overloads. This means you can upload media assets with arbitrarily long filenames, limited only by your operating system. The key settings in determining how media assets will be named, though, are ItemNameValidation and InvalidItemNameChars. Depending on what you have for these settings, media items may not end up with the names you expect.

Here’s an example:

Start with the default Web.config values for ItemNameValidation, InvalidItemNameChars, Media.WhitespaceReplacement and Media.IncludeExtensionsInItemNames:

<setting name="ItemNameValidation" value="^[\w\*\$][\w\s\-\$]*(\(\d{1,}\)){0,1}$" />
<setting name="InvalidItemNameChars" value="\/:?&quot;&lt;>|[]" />
<setting name="Media.WhitespaceReplacement" value=" " />
<setting name="Media.IncludeExtensionsInItemNames" value="false" />

Upload an image named ‘test_1.gif’. The item name of the media item will be ‘test_1’, which is consistent with the regular expression found in the ItemNameValidation setting. Next, upload an image named ‘te.st_1.gif’. The item name of this media item will be ‘test1’.

Why did both the period and the underscore get removed from the second item name? To answer this, we must first understand how item names for uploaded media items are determined. When you upload an image to the media library, Sitecore tries to generate a valid item name based on the filename of the asset uploaded. An item name is considered valid if it meets the following requirements:

  1. Passes the ItemNameValidation regular expression
  2. Contains none of the characters in the InvalidItemNameChars setting
  3. Doesn’t start or end with whitespace
  4. Doesn’t end with a period

If the uploaded asset’s filename, minus the file extension, does not meet the above criteria, Sitecore loops through each character in the InvalidItemNameChars setting and removes that character from the item name. If the modified item name still doesn’t pass the ItemNameValidation regular expression, Sitecore then goes through each character in the item name and removes every character that isn’t a letter, number or whitespace.
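
The fallback steps above can be sketched in a few lines of Python. Treat this as my reading of the behaviour rather than Sitecore’s implementation: propose_item_name and meets_criteria are hypothetical names, the constants are the default setting values, and Python’s str.isalnum and str.isspace stand in for the .NET character checks.

```python
import os
import re

# Defaults from the Web.config snippet above; Python's regex engine stands in for .NET's.
ITEM_NAME_VALIDATION = re.compile(r"^[\w\*\$][\w\s\-\$]*(\(\d{1,}\)){0,1}$")
INVALID_ITEM_NAME_CHARS = '\\/:?"<>|[]'


def meets_criteria(name: str) -> bool:
    """Hypothetical check covering the four validity rules."""
    return (
        bool(ITEM_NAME_VALIDATION.match(name))
        and not any(ch in INVALID_ITEM_NAME_CHARS for ch in name)
        and name == name.strip()
        and not name.endswith(".")
    )


def propose_item_name(filename: str) -> str:
    """Hypothetical helper following the fallback steps described above."""
    # Media.IncludeExtensionsInItemNames is false, so drop the extension.
    name = os.path.splitext(filename)[0]
    if meets_criteria(name):
        return name
    # Step 1: remove every character listed in InvalidItemNameChars.
    for ch in INVALID_ITEM_NAME_CHARS:
        name = name.replace(ch, "")
    if ITEM_NAME_VALIDATION.match(name):
        return name
    # Step 2: keep only letters, digits and whitespace.
    return "".join(ch for ch in name if ch.isalnum() or ch.isspace())


print(propose_item_name("test_1.gif"))   # test_1
print(propose_item_name("te.st_1.gif"))  # test1
```

Note that the second step is all or nothing: once it runs, every character that is not a letter, digit or whitespace goes, not just the one that broke the pattern.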

We can now better understand what happened in our example scenario above. The first image we uploaded had a filename of ‘test_1.gif’. Since the underscore is part of the ‘\w’ character class in the majority of regular expression engines, including .NET’s, the filename matched the ItemNameValidation regular expression. There were also no characters from the InvalidItemNameChars setting in the name, so the item name ended up being ‘test_1’. The second asset, ‘te.st_1.gif’, did not pass the ItemNameValidation regular expression, so Sitecore looped through each character in the InvalidItemNameChars setting and tried to remove it from the name. The item name remained unchanged, however, since a period is not in InvalidItemNameChars, so it once again failed the ItemNameValidation regular expression. At that point every character that is not a letter, number or whitespace was removed. The underscore is none of these (.NET’s System.Char.IsWhiteSpace does not treat it as whitespace), so it was stripped along with the period, leaving ‘test1’ as the item name.

Ideally, the InvalidItemNameChars setting would include every character that is not permitted somewhere in the item name, but even then it may not be sufficient, depending on what characters a content editor chooses to use in the filenames of his or her media assets. For instance, in order to support wildcard items, the asterisk needs to be permitted as the first character of an item name, but should not appear after the first character. So if you upload an image named ‘test_*1.jpg’, the resulting item name will be ‘test1’, and there is no way around this unless you choose to never use wildcard items and add ‘*’ to the InvalidItemNameChars setting.
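
Here is a small Python sketch of that trade-off, comparing the default InvalidItemNameChars value with one that also lists the asterisk. It only models the steps as I have described them, with a hypothetical helper named propose, and I have not verified the second result against a real Sitecore instance.

```python
import re

# Default ItemNameValidation pattern; Python's regex engine stands in for .NET's.
ITEM_NAME_VALIDATION = re.compile(r"^[\w\*\$][\w\s\-\$]*(\(\d{1,}\)){0,1}$")
DEFAULT_INVALID_CHARS = '\\/:?"<>|[]'


def propose(name: str, invalid_chars: str) -> str:
    """Hypothetical helper: strip invalid characters, then fall back to
    letters, digits and whitespace if the pattern still fails."""
    for ch in invalid_chars:
        name = name.replace(ch, "")
    if ITEM_NAME_VALIDATION.match(name):
        return name
    return "".join(ch for ch in name if ch.isalnum() or ch.isspace())


print(propose("test_*1", DEFAULT_INVALID_CHARS))        # test1
print(propose("test_*1", DEFAULT_INVALID_CHARS + "*"))  # test_1
```

In this model, listing ‘*’ as an invalid character lets the first step remove it on its own, so the underscore survives. The cost, as noted above, is giving up wildcard items.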

Below are some general guidelines regarding how item names will be determined for uploaded media assets:

  1. The item name of an uploaded media asset will never contain characters that are listed in the InvalidItemNameChars setting.
  2. If the filename of a media asset does not pass the ItemNameValidation regular expression, and removing all of the characters found in the InvalidItemNameChars setting does not remove the character that violates the pattern, then the item name will contain only letters, numbers and whitespace. Underscores are removed as well.
  3. If some combination of letters, numbers or whitespace is not allowed by the ItemNameValidation regular expression, and those characters are not specifically listed in the InvalidItemNameChars setting, then the item name can still contain that combination. Therefore, in this case you cannot rely on the ItemNameValidation regular expression alone.
  4. All whitespace characters contained in the media item name will be replaced with what is in the Web.config setting for Media.WhitespaceReplacement.
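
The last guideline is the easiest one to model. The Python sketch below assumes each whitespace character is swapped one for one with the Media.WhitespaceReplacement value; the function name is hypothetical and the ‘-’ replacement is just an example of a non-default setting.

```python
def apply_whitespace_replacement(item_name: str, replacement: str = " ") -> str:
    """Swap each whitespace character for the Media.WhitespaceReplacement value.

    Assumption: the replacement is applied per character. The default value
    of the setting is a single space.
    """
    return "".join(replacement if ch.isspace() else ch for ch in item_name)


print(apply_whitespace_replacement("summer sale banner"))       # summer sale banner
print(apply_whitespace_replacement("summer sale banner", "-"))  # summer-sale-banner
```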

As I mentioned at the beginning of this post, even though the logic for determining what Sitecore names an item is somewhat complicated, Sitecore makes it easy for you to leverage the work they’ve done to ensure proper item names, as pointed out in the comments here. So above all else, if you want your code to use the same logic as Sitecore when deciding media item names, use the Sitecore.Resources.Media.MediaPathManager.ProposeValidMediaPath or Sitecore.Data.Items.ItemUtil.ProposeValidItemName method, depending on whether you are working with an entire path or just an item name.
