Monday, April 30, 2012

Almost Excluding Specific Search Results in SharePoint 2010

Sometimes you want to hide certain content from being exposed through search in certain SharePoint web-applications, even if the user actually has access to the information in the content source. A typical scenario is an openly used intranet search in which you want to prevent accidental information exposure. Think of a group working together on recruiting, where the HR manager uses the search center to look for information - you wouldn't want even excerpts of confidential information to be exposed in the search results.

So you carefully plan your content sources and crawl rules to index as little information as possible. Still, even with crawl rules, you will often need to tweak the query scope rules to exclude content at a more fine-grained level, or even add new scopes for providing search-driven content to users. Such configuration typically involves using exclude rules on content types or content sources. This is a story of how SharePoint can throw you a search results curveball, leading to accidental information disclosure.

In this scenario, I had created a new content source JobVault for crawling the HR site-collection in another SharePoint web-application, to be exposed only through a custom shared scope. So I tweaked the rules of the existing scopes such as "All Sites" to exclude the Puzzlepart JobVault content source, and added a new JobReqruiting scope that required the JobVault content source, included the content type JobHired, and excluded the content type JobFired.
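A rough PowerShell sketch of adding such a scope with content type rules looks like this (illustrative only; the names are from this scenario, I did the content source require/exclude rules through the SSA UI, and the rule parameters may need adjusting in your farm):

$ssa = Get-SPEnterpriseSearchServiceApplication
$scope = New-SPEnterpriseSearchQueryScope -Name "JobReqruiting" -Description "Recruiting results" -SearchApplication $ssa -DisplayInAdminUI $true
# Include the JobHired content type, exclude the confidential JobFired content type
New-SPEnterpriseSearchQueryScopeRule -RuleType PropertyQuery -ManagedProperty ContentType -PropertyValue "JobHired" -FilterBehavior Include -Url http://puzzlepart -Scope $scope -SearchApplication $ssa
New-SPEnterpriseSearchQueryScopeRule -RuleType PropertyQuery -ManagedProperty ContentType -PropertyValue "JobFired" -FilterBehavior Exclude -Url http://puzzlepart -Scope $scope -SearchApplication $ssa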

So no shared scopes defined in the Search Service Application (SSA) included JobFired information, as all scopes either excluded the HR content source or excluded the confidential content type. To my surprise, our SharePoint search center would still find and expose such pages and documents when searching for "you're fired!!!".

Knowing that the search center by default uses the "All Sites" scope when no specific scope is configured or defined in the keyword query, it was back to the SSA to verify the scope. It was all in order, and doing a property search on Scope:"All Sites" got me the expected results with no confidential data in them. The same went for Scope:"JobReqruiting"; no information exposure there either. It looked very much like a best bet, but there were no best bet keywords defined for the site-collection.


The search center culprit was the Top Federated Results web-part in our basic search site, which by default shows results from the local search index, very much like best bets. That was the same location as defined in the core results web-part, so why the difference?

Looking into the details of the "Local Search Results" federated location, the reason became clear: "This location provides unscoped results from the Local Search index". The keyword here is "unscoped".


The solution is to add the "All Sites" scope to the federated location to ensure that results that you want to hide are also excluded from the federated results web-part. Add it to the "Query Template" and optionally also to the "More Results Link Template" under the "Location Information" section in "Edit Federated Location".
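With that in place, the Query Template for the "Local Search Results" location would read something like this, where {searchTerms} is the token for the user's query:

{searchTerms} Scope:"All Sites"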


Now the content is hidden when searching - not through query security trimming, but through query filtering. Forgetting to add the filter somewhere can expose the information, but then only to users that have permission to see the content anyway. The results are still security trimmed, so there is no actual information disclosure risk.

Note that this approach is no replacement for real information security; if that is what you need, don't crawl confidential information from an SSA that is exposed through openly available SharePoint search, even on your intranet.

Saturday, April 21, 2012

Migrate SharePoint 2010 Term Sets between MMS Term Stores

When using SharePoint 2010 managed metadata fields connected to termsets stored in the Managed Metadata Service (MMS) term store in your solutions, you should have a designated master MMS that is reused across all your SharePoint environments, such as the development, test, staging and production farms. Having a single master term store across all farms gives you the same termsets and terms with the same identifiers everywhere, allowing you to move content and content types from staging to production without invalidating all the fields and data connected to the MMS term store.

You'll find a lot of termset tools on CodePlex, some that use the standard SharePoint 2010 CSV import file format (which is without identifiers), and some that on paper do what you need but don't fully work. Some of the better tools are SolidQ Managed Metadata Exporter for export and import of termsets (CSV-style), SharePoint Term Store Powershell Utilities for fixing orphaned terms, and finally SharePoint Taxonomy and TermStore Utilities for real migration.

There are, however, standard SP2010 PowerShell cmdlets that allow you to migrate the complete term store with full fidelity between Managed Metadata Service applications across farms. The drawback is that you can't do selective migration of specific termsets; the whole term store will be overwritten by the migration.

This script exports the term store to a backup file:

# MMS Application Proxy ID has to be passed for -Identity parameter

Export-SPMetadataWebServicePartitionData -Identity "12810c05-1f06-4e35-a6c3-01fc485956a3" -ServiceProxy "Managed Metadata Service" -Path "\\Puzzlepart\termstore\pzl-staging.bak"

This script imports the backup by overwriting the term store:

# MMS Application Proxy ID has to be passed for -Identity parameter
# NOTE: overwrites all existing termsets from MMS
# NOTE: overwrites the MMS content type HUB URL - must be reconfigured on target MMS proxy after restoring

Import-SPMetadataWebServicePartitionData -Identity "53150c05-1f06-4e35-a6c3-01fc485956a3" -ServiceProxy "Managed Metadata Service" -path "\\Puzzlepart\termstore\pzl-staging.bak" -OverwriteExisting

Getting the MMS application proxy ID and the ServiceProxy object:

$metadataApp = Get-SPServiceApplication | ? {$_.TypeName -eq "Managed Metadata Service"}
$mmsAppId = $metadataApp.Id
$mmsproxy = Get-SPServiceApplicationProxy | ? {$_.TypeName -eq "Managed Metadata Service Connection"}
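
With those objects at hand, the export call above becomes something like this sketch (assuming, as the script comments say, that it is the proxy ID that goes into -Identity):

Export-SPMetadataWebServicePartitionData -Identity $mmsproxy.Id -ServiceProxy $mmsproxy -Path "\\Puzzlepart\termstore\pzl-staging.bak"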

Tajeshwar Singh has posted several posts on using these scripts, including how to solve typical issues. In addition to those, I've run into this error:

The Managed Metadata Service or Connection is currently not available. The Application Pool or Managed Metadata Web Service may not have been started. Please Contact your Administrator. 

The cause of this error was neither the app-pool nor the 'service on server' being stopped, but rather that the service account used in the production farm was not available in the staging farm. Look through the user accounts listed in the ECMPermission table in the MMS database and correct the "wrong" accounts. Note that updating the MMS database directly might not be supported.

Note that after the term store migration, the MMS content type HUB URL configuration will also have been overwritten. You may not notice for some time, but the content type HUB publishing and subscriber timer jobs will stop working. What you will notice is that if you try to republish a content type in the HUB, you'll get a "No valid proxy can be found to do this operation" error. See How to change the Content Type Hub URL by Michal Pisarek for the steps to rectify this.

Set-SPMetadataServiceApplication -Identity "Managed Metadata Service" -HubURI "http://puzzlepart:8181/"

After resetting this MMS configuration, you should verify that the content type publishing works correctly by republishing and running the timer jobs. Use "Site Collection Administration > Content Type Publishing" as shown on page 2 in Chris Geier's article to verify that the correct HUB is set and that HUB content types are pushed to the subscribers.
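
A quick way to trigger those timer jobs from PowerShell instead of waiting for their schedule is something like this sketch (the display name filter is an assumption; check the actual job names in your farm):

Get-SPTimerJob | ? {$_.DisplayName -like "Content Type*"} | % { Start-SPTimerJob $_ }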

Monday, April 16, 2012

Getting Elevated Search Results in SharePoint 2010

I often use the SharePoint 2010 search CoreResultsWebPart in combination with scopes, content types and managed properties defined in the Search Service Application (SSA) for having dynamic search-driven content in pages. Sometimes the users might need to see some excerpt of content that they really do not have access to, and that you don't want to grant them access to either; e.g. to show a summary to anonymous visitors on your public web-site from selected content that is really stored in the extranet web-application in the SharePoint farm.

What is needed then is to execute the search query with elevated privileges using a custom core results web-part. As my colleague Mikael Svenson shows in Doing blended search results in SharePoint–Part 2: The Custom CoreResultsWebPart Way, it is quite easy to get at the search results code and use the SharedQueryManager object that actually runs the query. Create a web-part that inherits the ootb web-part and override the GetXPathNavigator method like this:

namespace Puzzlepart.SharePoint.Presentation
{
    [ToolboxItemAttribute(false)]
    public class JobPostingCoreResultsWebPart : CoreResultsWebPart
    {
        protected override void CreateChildControls()
        {
            base.CreateChildControls();
        }
 
        protected override XPathNavigator GetXPathNavigator(string viewPath)
        {
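            // Runs the shared query and hands the result XML to the standard rendering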
            XmlDocument xmlDocument = null;
            QueryManager queryManager = 
              SharedQueryManager.GetInstance(Page, QueryNumber)
                .QueryManager;
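            // Elevate so the query runs as the app-pool identity rather than the current user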
            SPSecurity.RunWithElevatedPrivileges(delegate()
            {
                xmlDocument = queryManager.GetResults(queryManager[0]);
            });
            XPathNavigator xPathNavigator = xmlDocument.CreateNavigator();
            return xPathNavigator;
        }
    }
}

Running the query with elevated privileges means that it can return any content that the app-pool identity has access to. Thus, it is important that you grant that account read permissions only on content that you would want just any user to see. Remember that the security trimming is done at query time, not at crawl time, with standard SP2010 server search. It is the credentials passed to the location's SSA proxy that is used for the security trimming. Use WindowsIdentity.GetCurrent() from the System.Security.Principal namespace if you need to get at the app-pool account from your code.

You would want to add a scope and/or some fixed keywords to the query in the code before getting the results, in order to prevent malicious or accidental misuse of the elevated web-part to search for just anything in the crawled content of the associated SSA that the app-pool identity has access to. Another alternative is to run the query under another identity than the app-pool account by using real Windows impersonation in combination with the Secure Store Service (see this post for all the needed code) as this allows for using a specific content query account.

The nice thing about using the built-in query manager this way, rather than running your own KeywordQuery and providing your own result XML local to the custom web-part instance, is that the shared QueryManager's Location object will get its Result XML document populated. This is important for the correct behavior for the other search web-parts on the page using the same QueryNumber / UserQuery, such as the paging and refiners web-parts.

The result XmlDocument will also be in the correct format, with lower-case column names, correct hit highlighting data, correct date formatting, duplicate trimming, <path> mapped to <url> and <urlEncoded>, the correct additional managed and crawled properties in the result such as <FileExtension> and <ows_MetadataFacetInfo>, and with the row <id> element and <imageUrl> added to each result. If you override by using a replacement KeywordQuery, you must also implement code to apply the appended query, fixed query, scope, result properties, sorting and paging yourself to gain full fidelity for your custom query web-part configuration.

If you don't get the expected elevated result set in your farm (I've only tested this on STS claims based web-apps; also see ForceClaimACLs for the SSA by my colleague Ole Kristian Mørch-Storstein), then the sure thing is to create a new QueryManager instance within the RWEP block as shown in How to: Use the QueryManager class to query SharePoint 2010 Enterprise Search by Corey Roth. This will give you correctly formatted XML results, but note that the search web-parts might set the $ShowMessage xsl:param to true, tricking the XSLT rendering into showing the "no results" message and advice texts. Just change the XSLT to call either the dvt_1.body or dvt_1.empty template based on the TotalResults count in the XML rather than on the parameter. Use the <xmp> trick to validate that there are results in the XML that all the search web-parts consume, including core results and refinement panel.

The formatting and layout of the search results is as usual controlled by overriding the result XSLT. This includes data such as any links in the results, as you don't want users to click on links that will just give them access denied errors.

When using the search box web-part, use the contextual scope option for the scopes dropdown with care. The ContextualScopeUrl (u=) parameter will default to the current web-application, causing an empty result set when using the custom core results web-part against a content source from another SharePoint web-application.