Adjust dependencies in Readiness probe

It is important to understand the liveness and readiness probes when you run Sitecore on Kubernetes. The Kubernetes documentation describes the two probes in more detail:

  • Liveness: Indicates whether the container is running. If the liveness probe fails, the kubelet kills the container, and the container is subjected to its restart policy.
  • Readiness: Indicates whether the container is ready to respond to requests. If the readiness probe fails, the endpoints controller removes the Pod’s IP address from the endpoints of all Services that match the Pod.

In Sitecore these probes can be found under /healthz/live and /healthz/ready. The rest of this post will focus on the readiness probe. There is a great article by Vitalii Tylyk which discusses these probes at a high level. By default, a Sitecore XP install checks a variety of xDB services as well as Solr as part of this probe. If any of these services is unhealthy, the probe fails and no requests are sent to this pod.

It is important to understand this default behavior in the context of your Sitecore solution. If having xDB and Solr up is critical to a solution, then this default behavior does not need to be changed. The risk with this setup is that an issue with Solr or xDB can pull all pods from the load balancer at once, taking the site completely down. If these services are not critical to the Sitecore site, then they can be removed from the readiness probe.

The patch file below shows how to remove all these checks so the readiness probe still returns healthy even if Solr and xDB are completely down. In many real-world scenarios only a subset of these will have to be removed, for example removing the xDB services but leaving Solr because the solution has a critical dependency on it.

<?xml version="1.0"?>
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <services>
      <configurator type="Sitecore.ContentSearch.SolrProvider.DependencyInjection.ContentSearchServicesConfigurator, Sitecore.ContentSearch.SolrProvider">
        <patch:delete />
      </configurator>
      <configurator type="Sitecore.XConnect.Client.Configuration.HealthCheckServicesConfigurators.XConnectCollectionHealthCheckServicesConfigurator, Sitecore.XConnect.Client.Configuration">
        <patch:delete />
      </configurator>
      <configurator type="Sitecore.XConnect.Client.Configuration.HealthCheckServicesConfigurators.XConnectConfigurationHealthCheckServicesConfigurator, Sitecore.XConnect.Client.Configuration">
        <patch:delete />
      </configurator>
      <configurator type="Sitecore.XConnect.Client.Configuration.HealthCheckServicesConfigurators.XConnectSearchHealthCheckServicesConfigurator, Sitecore.XConnect.Client.Configuration">
        <patch:delete />
      </configurator>
      <configurator type="Sitecore.Xdb.Common.Web.Xmgmt.XdbEnabledHealthCheckServicesConfigurator`2[[Sitecore.Reporting.Service.Http.XConnectClient.XdbReportingWebClient, Sitecore.Reporting.Service.Http.XConnectClient],[Sitecore.Reporting.Service.Http.Abstractions.Routes, Sitecore.Reporting.Service.Http.Abstractions]], Sitecore.Xdb.Common.Web.Xmgmt">
        <patch:delete />
      </configurator>      
      <configurator type="Sitecore.Xdb.Common.Web.Xmgmt.XdbEnabledHealthCheckServicesConfigurator`2[[Sitecore.Xdb.ReferenceData.Client.ReferenceDataHttpClient, Sitecore.Xdb.ReferenceData.Client],[Sitecore.Xdb.ReferenceData.Client.Routes, Sitecore.Xdb.ReferenceData.Client]], Sitecore.Xdb.Common.Web.Xmgmt">
        <patch:delete />
      </configurator>
      <configurator type="Sitecore.Xdb.Common.Web.Xmgmt.XdbEnabledHealthCheckServicesConfigurator`2[[Sitecore.Xdb.ReferenceData.Client.ReadOnlyReferenceDataHttpClient, Sitecore.Xdb.ReferenceData.Client],[Sitecore.Xdb.ReferenceData.Client.Routes, Sitecore.Xdb.ReferenceData.Client]], Sitecore.Xdb.Common.Web.Xmgmt">
        <patch:delete />
      </configurator>
      <configurator type="Sitecore.Xdb.Common.Web.Xmgmt.XdbEnabledHealthCheckServicesConfigurator`2[[Sitecore.Xdb.MarketingAutomation.ReportingClient.AutomationReportingClient, Sitecore.Xdb.MarketingAutomation.ReportingClient],[Sitecore.Xdb.MarketingAutomation.ReportingClient.ReportingRoutes, Sitecore.Xdb.MarketingAutomation.ReportingClient]], Sitecore.Xdb.Common.Web.Xmgmt">
        <patch:delete />
      </configurator>
      <configurator type="Sitecore.Xdb.Common.Web.Xmgmt.XdbEnabledHealthCheckServicesConfigurator`2[[Sitecore.Xdb.MarketingAutomation.OperationsClient.AutomationOperationsClient, Sitecore.Xdb.MarketingAutomation.OperationsClient],[Sitecore.Xdb.MarketingAutomation.OperationsClient.OperationRoutes, Sitecore.Xdb.MarketingAutomation.OperationsClient]], Sitecore.Xdb.Common.Web.Xmgmt">
        <patch:delete />
      </configurator>      
    </services>
  </sitecore>
</configuration>
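
As a sketch of the subset scenario, the patch below removes only the Solr check and leaves all xDB checks in the readiness probe. It reuses the Solr configurator type string from the full patch above unchanged, and it assumes that type name matches your Sitecore version. The reverse case (keep Solr, remove xDB) follows the same pattern with the other configurator entries.

```xml
<?xml version="1.0"?>
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <services>
      <!-- Remove only the Solr check; the xDB checks stay part of /healthz/ready -->
      <configurator type="Sitecore.ContentSearch.SolrProvider.DependencyInjection.ContentSearchServicesConfigurator, Sitecore.ContentSearch.SolrProvider">
        <patch:delete />
      </configurator>
    </services>
  </sitecore>
</configuration>
```

After deploying a patch like this, requesting /healthz/ready while the removed dependency is down is a quick way to confirm the probe no longer depends on it.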

Fix NullReferenceException in CompositeDataProvider

Sitecore 10 comes with a new data provider which merges items from disk and the database. Jeremy Davis already wrote a great article about this. Sitecore uses this new approach to deploy its own items through item resource files.

TDS now also supports creating item resource files, which makes it convenient to deploy your own items this way too. There are a few uncommon scenarios where this results in an uncaught NullReferenceException, for example when there is an item without a version. If this happens, many areas of Sitecore and your custom solution will most likely be broken. The stack trace will look something like this:

FATAL Uncaught application error 
Exception: System.NullReferenceException 
Message: Object reference not set to an instance of an object. 
Source: Sitecore.Kernel 
   at Sitecore.Data.DataProviders.CompositeDataProvider.GetItemVersions(ItemDefinition itemDefinition, CallContext context) 
   at Sitecore.Data.DataProviders.DataProvider.GetItemVersions(ItemDefinition item, CallContext context, DataProviderCollection providers) 
   at Sitecore.Data.DataSource.LoadVersions(ItemDefinition definition, Language language) 
   at Sitecore.Data.DataSource.GetVersions(ItemInformation itemInformation, Language language) 
   at Sitecore.Data.DataSource.GetLatestVersion(ItemInformation itemInformation, Language language) 
   at Sitecore.Data.DataSource.GetItemData(ID itemID, Language language, Version version) 
   at Sitecore.Nexus.Data.DataCommands.GetItemCommand.GetItem(ID itemId, Language language, Version version, Database database) 

One way to solve this is to identify all the items which cause the issue and fix each of them. Another way is to override the CompositeDataProvider and wrap a try/catch block around this logic. The advantage of this approach is that it guards against every affected item, including ones deployed later, so the issue will not recur. The following code can be used to catch the exception:

using Sitecore.Collections;
using Sitecore.Configuration;
using Sitecore.Data;
using Sitecore.Data.DataProviders;
using Sitecore.Data.DataProviders.ReadOnly;
using Sitecore.Diagnostics;
using System;
using System.Collections.Generic;

namespace Foundation.Providers
{
    public class SafeCompositeDataProvider : CompositeDataProvider
    {
        public SafeCompositeDataProvider(IEnumerable<ReadOnlyDataProvider> readOnlyDataProviders, DataProvider headProvider) : base(readOnlyDataProviders, headProvider) { }

        public SafeCompositeDataProvider(ObjectList readOnlyDataProviders, DataProvider headProvider) : base(readOnlyDataProviders, headProvider) { }

        public override VersionUriList GetItemVersions(ItemDefinition itemDefinition, CallContext context)
        {
            try
            {
                return base.GetItemVersions(itemDefinition, context);
            }
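            catch (Exception ex)
            {
                // Log and continue instead of letting the exception break the request.
                // itemDefinition may be null here, so access it null-safely to avoid a second exception.
                Log.Error($"SafeCompositeDataProvider: Caught exception for item {itemDefinition?.ID} {itemDefinition?.Name}", ex, this);
                return null;
            }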
        }
    }
}

This new provider can be patched in through a configuration file like this:

<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/" xmlns:set="http://www.sitecore.net/xmlconfig/set/" xmlns:role="http://www.sitecore.net/xmlconfig/role/">
  <sitecore>
    <databases>
      <!-- Target CM or standalone only: there is no master database on CD -->
      <database id="master" role:require="ContentManagement or StandAlone">
        <dataProviders>
          <dataProvider>
            <patch:attribute name="type">Foundation.Providers.SafeCompositeDataProvider, Foundation.Providers</patch:attribute>
          </dataProvider>
        </dataProviders>
      </database>
    </databases>
  </sitecore>
</configuration>