Check Redis connectivity inside heartbeat.aspx health check

Getting your health check right is important when designing a highly available and elastic Sitecore solution. For years Sitecore has shipped with a built-in health check page at /sitecore/service/heartbeat.aspx, which checks the status of the SQL databases. It can be used as a load balancer or Docker health check. A few quick notes on heartbeat.aspx:

  • In some versions of Sitecore the heartbeat.aspx will throw an error, and you will have to exclude some connection strings from it as described in a different article on this blog
  • Starting in Sitecore 9.3 a new health check mechanism is used based on the Microsoft.Extensions.Diagnostics.HealthChecks namespace. Here is a great article describing how to customize this. The same code from below can be used in the updated health check mechanism.

There are several different approaches to setting up a health check in Sitecore. In most cases I recommend keeping the health check small to prevent it from going unhealthy during heavy load. This technique can be combined with the Application Initialization feature in IIS to warm up the solution after the site starts.

The code for heartbeat.aspx lives in Sitecore.Web.Services.HeartbeatCode in the Sitecore.Client assembly. The important methods are virtual, so they can be overridden to implement additional checks that ensure all critical components of the solution are healthy.

There are many Sitecore solutions where the private session state is stored in Redis and its availability is critical. In such scenarios it makes sense to ping Redis from the health check to ensure the server can access it. The code sample below shows how to check the Redis database that is set up for private session state:

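using System.Configuration;        // ConfigurationManager
using System.Web.Configuration;    // WebConfigurationManager, SessionStateSection
using Sitecore.Diagnostics;        // Log
using StackExchange.Redis;         // ConnectionMultiplexer

public class CustomHeartbeat : Sitecore.Web.Services.HeartbeatCode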
{
    protected BeatResults CheckRedis(BeatResults beatresult)
    {
        //get connection details for private Redis session database
        //same pattern can be used to check shared session database
        var sessionSection = (SessionStateSection)WebConfigurationManager.GetSection("system.web/sessionState");
        var connString = sessionSection.Providers["Redis"].Parameters.Get("connectionString");
        string redisConnection = ConfigurationManager.ConnectionStrings[connString].ConnectionString;

        using (ConnectionMultiplexer connection = ConnectionMultiplexer.Connect(redisConnection))
        {
            var subscriber = connection.GetSubscriber();
            var timespan = subscriber.Ping();

            Log.Info($"Successfully pinged Redis from healthcheck in: {timespan}", this);
        }

        return beatresult;
    }

    protected override BeatResults DoBeat()
    {
        //this checks the SQL databases
        var beatResults = base.DoBeat();

        beatResults = CheckRedis(beatResults);

        return beatResults;
    }
}
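
The sample above opens a new Redis connection on every health check. Since the recommendation is to keep the health check small, one possible variation is to reuse a single connection across checks, which is also what StackExchange.Redis recommends for ConnectionMultiplexer in general. The sketch below is only an illustration under that assumption; the RedisHealthProbe class and its Ping method are hypothetical names and not part of Sitecore.

```csharp
using System;
using StackExchange.Redis;

// Hypothetical helper (not a Sitecore API): keeps one multiplexer alive so
// that each health check only costs a single PING round trip.
public static class RedisHealthProbe
{
    private static readonly object SyncRoot = new object();
    private static volatile ConnectionMultiplexer _connection;

    // Returns the PING latency. Any exception from Connect or Ping is left
    // to propagate so the caller can treat it as an unhealthy result.
    public static TimeSpan Ping(string redisConnection)
    {
        if (_connection == null)
        {
            lock (SyncRoot)
            {
                if (_connection == null)
                {
                    _connection = ConnectionMultiplexer.Connect(redisConnection);
                }
            }
        }

        return _connection.GetDatabase().Ping();
    }
}
```

Inside CheckRedis the using block could then be replaced by a call such as var timespan = RedisHealthProbe.Ping(redisConnection); the rest of the method stays the same.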

Turn off Session State locking in Sitecore MVC pages

The default implementation of the ASP.NET Session State Module uses exclusive locking for each request from the same session. This means ASP.NET will only execute one request at a time from the same browser. Any other request is blocked by the Session State Module and is not executed until the previous request is complete and it can obtain the exclusive lock. This can cause performance issues in many real-world scenarios.

The screenshot below from IIS shows 6 concurrent requests to the homepage from the same browser. Sitecore is only executing the bottom request, which is in the ExecuteRequestHandler state. The other 5 requests are in the RequestAcquireState state and will only be fulfilled one at a time after the bottom request is complete. Each request in the RequestAcquireState state checks the session store every 0.5 seconds to see if it can obtain the lock.

This can put pressure on the session state store when many requests take some time to execute. Depending on the session store, it is common to see messages like the ones below in the log:

Common errors with session state in Redis:

Exception type: TimeoutException
Exception message: Timeout performing EVAL, inst: ....
   at StackExchange.Redis.ConnectionMultiplexer.ExecuteSyncImpl[T](Message message, ResultProcessor`1 processor, ServerEndPoint server)
   at StackExchange.Redis.RedisBase.ExecuteSync[T](Message message, ResultProcessor`1 processor, ServerEndPoint server)
   at StackExchange.Redis.RedisDatabase.ScriptEvaluate(String script, RedisKey[] keys, RedisValue[] values, CommandFlags flags)
   at Sitecore.SessionProvider.Redis.StackExchangeClientConnection.<>c__DisplayClass7.b__6()
   at Sitecore.SessionProvider.Redis.StackExchangeClientConnection.RetryForScriptNotFound(Func`1 redisOperation)
   at Sitecore.SessionProvider.Redis.StackExchangeClientConnection.RetryLogic(Func`1 redisOperation)
   at Sitecore.SessionProvider.Redis.StackExchangeClientConnection.Eval(String script, String[] keyArgs, Object[] valueArgs)
   at Sitecore.SessionProvider.Redis.RedisConnectionWrapper.TryTakeWriteLockAndGetData(String sessionId, DateTime lockTime, Object& lockId, ISessionStateItemCollection& data, Int32& sessionTimeout)
   at Sitecore.SessionProvider.Redis.RedisSessionStateProvider.GetItemFromSessionStore(Boolean isWriteLockRequired, HttpContext context, String id, Boolean& locked, TimeSpan& lockAge, Object& lockId, SessionStateActions& actions)
   at Sitecore.SessionProvider.Redis.RedisSessionStateProvider.GetItemExclusive(HttpContext context, String id, Boolean& locked, TimeSpan& lockAge, Object& lockId, SessionStateActions& actions)
   at System.Web.SessionState.SessionStateModule.GetSessionStateItem()

Common errors with session state in SQL:

Message: Timeout expired. The timeout period elapsed prior to obtaining a connection from the pool. This may have occurred because all pooled connections were in use and max pool size was reached.
Source: System.Data
   at System.Data.ProviderBase.DbConnectionFactory.TryGetConnection(DbConnection owningConnection, TaskCompletionSource`1 retry, DbConnectionOptions userOptions, DbConnectionInternal oldConnection, DbConnectionInternal& connection)
   ... 
   at System.Web.SessionState.SqlSessionStateStore.SqlStateConnection..ctor(SqlPartitionInfo sqlPartitionInfo, TimeSpan retryInterval)

Common errors with session state in Mongo:

ERROR Application error.
Exception: System.TimeoutException
Message: Timeout waiting for a MongoConnection.
Source: MongoDB.Driver
   at MongoDB.Driver.Internal.MongoConnectionPool.AcquireConnection(AcquireConnectionOptions options)
   ...
   at Sitecore.SessionProvider.MongoDB.MongoSessionStateProvider.GetItemExclusive(HttpContext context, String id, Boolean& locked, TimeSpan& lockAge, Object& lockId, SessionStateActions& actions)
   at System.Web.SessionState.SessionStateModule.GetSessionStateItem()
   at System.Web.SessionState.SessionStateModule.BeginAcquireState(Object source, EventArgs e, AsyncCallback cb, Object extraData)
   at System.Web.HttpApplication.AsyncEventExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute()
   at System.Web.HttpApplication.ExecuteStep(IExecutionStep step, Boolean& completedSynchronously)

Too many locked requests from a single session:

ERROR Application error.
Exception: System.Web.HttpException
Message: The request queue limit of the session is exceeded.
Source: System.Web
   at System.Web.SessionState.SessionStateModule.QueueRef()
   at System.Web.SessionState.SessionStateModule.PollLockedSession()
   at System.Web.SessionState.SessionStateModule.GetSessionStateItem()
   at System.Web.SessionState.SessionStateModule.BeginAcquireState(Object source, EventArgs e, AsyncCallback cb, Object extraData)
   at System.Web.HttpApplication.AsyncEventExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute()
   at System.Web.HttpApplication.ExecuteStepImpl(IExecutionStep step)
   at System.Web.HttpApplication.ExecuteStep(IExecutionStep step, Boolean& completedSynchronously)

Sitecore has a good KB article that describes this in more detail, which can be found here. The article recommends setting the session state to read-only and describes how to do this for 2 scenarios:

  • Custom MVC routes: Set the session state to read-only on the controller. This can be done by decorating the controller with this attribute: [SessionState(SessionStateBehavior.ReadOnly)]
  • ASP.NET Web Forms pages: Set EnableSessionState="ReadOnly" in the Page directive
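
As an illustration of the first scenario, a controller behind a custom MVC route could look like the sketch below. The controller name, action and session key are made up for this example; only the SessionState attribute and the SessionStateBehavior enum come from ASP.NET MVC.

```csharp
using System.Web.Mvc;
using System.Web.SessionState;

// Hypothetical controller for a custom MVC route. With ReadOnly behavior the
// request no longer takes the exclusive session lock.
[SessionState(SessionStateBehavior.ReadOnly)]
public class StockController : Controller
{
    public ActionResult Level(string sku)
    {
        // Session values can still be read; "segment" is an illustrative key.
        var segment = Session["segment"] as string;

        return Json(new { sku, segment }, JsonRequestBehavior.AllowGet);
    }
}
```

Values written to the session from a read-only request should not be relied upon, so an action like this should only read from it.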

This article does not mention how to fix this for Sitecore MVC pages. The solution provided below describes how to address this for Sitecore MVC pages.

Solution

For Sitecore MVC pages, the SitecoreControllerFactory returns the Default session state behavior from its GetControllerSessionBehavior method. This method is virtual, so it can be overridden to change the session state behavior:

using Sitecore.Diagnostics;
using Sitecore.Mvc.Controllers;
using Sitecore.Mvc.Extensions;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Web;
using System.Web.Mvc;
using System.Web.Routing;
using System.Web.SessionState;

namespace Foundation.Extensions.Factory
{
    public class ReadOnlySessionStateSitecoreControllerFactory : SitecoreControllerFactory
    {
        public ReadOnlySessionStateSitecoreControllerFactory(IControllerFactory innerFactory) : base(innerFactory)
        {
        }

        public override SessionStateBehavior GetControllerSessionBehavior(RequestContext requestContext, string controllerName)
        {
            Assert.ArgumentNotNull(requestContext, "requestContext");
            Assert.ArgumentNotNull(controllerName, "controllerName");

            if (controllerName.EqualsText(SitecoreControllerName))
            {
                return SessionStateBehavior.ReadOnly;
            }

            return InnerFactory.GetControllerSessionBehavior(requestContext, controllerName);
        }
    }
}

An initialize pipeline processor needs to be created to set our new controller factory:

using Foundation.Extensions.Factory;
using Sitecore.Mvc.Controllers;
using Sitecore.Mvc.Pipelines.Loader;
using Sitecore.Pipelines;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Web;

namespace Foundation.Extensions.Processors.Initialize
{
    public class InitializeReadOnlySessionStateSitecoreControllerFactory : InitializeControllerFactory
    {
        protected Func<System.Web.Mvc.ControllerBuilder> ControllerBuilder = () => System.Web.Mvc.ControllerBuilder.Current;

        protected override void SetControllerFactory(PipelineArgs args)
        {
            System.Web.Mvc.ControllerBuilder controllerBuilder = ControllerBuilder();
            var controllerFactory = new ReadOnlySessionStateSitecoreControllerFactory(controllerBuilder.GetControllerFactory());
            controllerBuilder.SetControllerFactory(controllerFactory);
        }
    }
}

The XML patch file below replaces the default InitializeControllerFactory processor with the new one:

<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <pipelines>
      <initialize>
        <processor type="Foundation.Extensions.Processors.Initialize.InitializeReadOnlySessionStateSitecoreControllerFactory, Foundation.Extensions" patch:instead="*[@type='Sitecore.Mvc.Pipelines.Loader.InitializeControllerFactory, Sitecore.Mvc']"/>
      </initialize>
    </pipelines>
  </sitecore>
</configuration>

The screenshot below shows the same scenario as at the beginning of this post, but now all 8 requests are executed at the same time.

Setting the session state to read-only for Sitecore MVC pages can significantly improve performance and will help reduce the load on the session store, as described in Sitecore’s KB article. Before doing this it is important to understand the following considerations:

  • Multiple requests from the same browser will execute at the same time. Your application should be able to handle this without causing any unintended issues by multiple threads modifying shared objects at the same time.
  • Custom objects cannot be stored in the session state anymore when it is set to ReadOnly, except when the session state is in process. Using a custom cache as already suggested in Sitecore’s article is a good solution.
  • This issue might not occur when a site is running smoothly, but can turn a small issue into an overall site stability issue. The session store can get under a lot of load for example if some pages in your site start being slow or in case of an app pool recycle. This can impact the overall stability of the site as it can overload SQL, Redis or Mongo.
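
To make the first two considerations more concrete, below is a minimal sketch of a thread-safe, in-memory store keyed by session ID that could hold per-visitor values instead of the session. It is an assumption of what such a custom cache might look like, not the implementation from Sitecore’s article; the PerSessionCache class and its members are hypothetical.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical per-session cache (not a Sitecore API). ConcurrentDictionary
// makes concurrent requests from the same browser safe to read and write.
public static class PerSessionCache
{
    private static readonly ConcurrentDictionary<string, ConcurrentDictionary<string, object>> Store =
        new ConcurrentDictionary<string, ConcurrentDictionary<string, object>>();

    public static void Set(string sessionId, string key, object value)
    {
        var bucket = Store.GetOrAdd(sessionId, _ => new ConcurrentDictionary<string, object>());
        bucket[key] = value;
    }

    // Returns the stored value, or the fallback when nothing of type T is stored.
    public static T Get<T>(string sessionId, string key, T fallback = default(T))
    {
        ConcurrentDictionary<string, object> bucket;
        object value;
        if (Store.TryGetValue(sessionId, out bucket) && bucket.TryGetValue(key, out value) && value is T)
        {
            return (T)value;
        }

        return fallback;
    }

    // Drops everything stored for one session.
    public static void Remove(string sessionId)
    {
        ConcurrentDictionary<string, object> bucket;
        Store.TryRemove(sessionId, out bucket);
    }
}
```

This sketch lives in the memory of a single server and never expires entries on its own, so a real solution would still need eviction and, in a multi-server setup, a shared store.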

Sitecore and Redis lessons learned

I noticed that my previous post about Redis is one of the most popular on my blog. Since I’ve been using Redis for a while I decided to write another post with some of the lessons learned.

Sitecore connectivity to Redis

The first step in getting Redis to work with Sitecore is to ensure there is connectivity between them. When Sitecore starts up it will ping Redis. The Sitecore log will contain something like the lines below when connectivity to Redis is established successfully. Notice the Redis response to the PING and the message that the endpoint returned with success.

11056 11:06:22 INFO  Sending critical tracer: Interactive/jeroen.redis.cache.windows.net:6380
11056 11:06:22 INFO  Writing to Interactive/jeroen.redis.cache.windows.net:6380: ECHO
11056 11:06:22 INFO  Flushing outbound buffer
11056 11:06:22 INFO  Starting read
11056 11:06:22 INFO  Connect complete: jeroen.redis.cache.windows.net:6380
11056 11:06:22 INFO  Response from Interactive/jeroen.redis.cache.windows.net:6380 / ECHO: BulkString: 16 bytes
WIN-RCJOA5J2MOL:Write 11:06:22 INFO  Writing to Interactive/jeroen.redis.cache.windows.net:6380: GET __Booksleeve_TieBreak
WIN-RCJOA5J2MOL:Write 11:06:22 INFO  Writing to Interactive/jeroen.redis.cache.windows.net:6380: PING
8912 11:06:22 INFO  Response from Interactive/jeroen.redis.cache.windows.net:6380 / GET __Booksleeve_TieBreak: (null)
8912 11:06:22 INFO  Response from Interactive/jeroen.redis.cache.windows.net:6380 / PING: SimpleString: PONG
1068 11:06:22 INFO  All tasks completed cleanly, IOCP: (Busy=0,Free=800,Min=800,Max=800), WORKER: (Busy=43,Free=757,Min=789,Max=800)
1068 11:06:22 INFO  jeroen.redis.cache.windows.net:6380 returned with success

A variety of issues can prevent Sitecore from connecting to Redis:

  • Wrong Redis engine version: Sitecore does not work with Redis engine version 4 or 5. This is easy to get wrong especially if using AWS ElastiCache which currently defaults to version 5.0.3. When using AWS ElastiCache make sure to select version 3.2.6. This issue is not obvious from the log. When using the wrong version the log might show something like this:
INFO name.cache.amazonaws.com: 6380 failed to nominate (Faulted)
INFO > UnableToResolvePhysicalConnection on GET 33488
  • AccessKey missing in connection string: The access key might need to be put inside the connectionString value. I have blogged about this issue before; see here.
  • Intermittent timeout issues: There might be intermittent timeout issues when Sitecore is connected to Redis. This KB article provides a good start for resolving these kinds of issues. If this happens the log will show something like this:
Exception: System.TimeoutException
Message: Timeout performing EVAL, inst: 1, mgr: Inactive, err: never, queue: 24, qu: 0, qs: 24, qc: 0, wr: 0, wq: 0, in: 12544, ar: 0, IOCP: (Busy=5,Free=395,Min=200,Max=400), WORKER: (Busy=4,Free=396,Min=88,Max=400), clientName: client
Source: StackExchange.Redis.StrongName
   at StackExchange.Redis.ConnectionMultiplexer.ExecuteSyncImpl[T](Message message, ResultProcessor`1 processor, ServerEndPoint server)
   at StackExchange.Redis.RedisBase.ExecuteSync[T](Message message, ResultProcessor`1 processor, ServerEndPoint server)
   at StackExchange.Redis.RedisDatabase.ScriptEvaluate(String script, RedisKey[] keys, RedisValue[] values, CommandFlags flags)
   at Sitecore.SessionProvider.Redis.StackExchangeClientConnection.<>c__DisplayClass12_0.<Eval>b__0()
   at Sitecore.SessionProvider.Redis.StackExchangeClientConnection.RetryForScriptNotFound(Func`1 redisOperation)
   at Sitecore.SessionProvider.Redis.StackExchangeClientConnection.RetryLogic(Func`1 redisOperation)
   at Sitecore.SessionProvider.Redis.StackExchangeClientConnection.Eval(String script, String[] keyArgs, Object[] valueArgs)
   at Sitecore.SessionProvider.Redis.RedisConnectionWrapper.TryTakeWriteLockAndGetData(String sessionId, DateTime lockTime, Object& lockId, ISessionStateItemCollection& data, Int32& sessionTimeout)
   at Sitecore.SessionProvider.Redis.RedisSessionStateProvider.GetItemFromSessionStore(Boolean isWriteLockRequired, HttpContext context, String id, Boolean& locked, TimeSpan& lockAge, Object& lockId, SessionStateActions& actions)
   at Sitecore.SessionProvider.Redis.RedisSessionStateProvider.GetItemExclusive(HttpContext context, String id, Boolean& locked, TimeSpan& lockAge, Object& lockId, SessionStateActions& actions)
   at System.Web.SessionState.SessionStateModule.GetSessionStateItem()
   at System.Web.SessionState.SessionStateModule.BeginAcquireState(Object source, EventArgs e, AsyncCallback cb, Object extraData)
   at System.Web.HttpApplication.AsyncEventExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute()
   at System.Web.HttpApplication.ExecuteStep(IExecutionStep step, Boolean& completedSynchronously)

Designing for performance

There are many factors that impact the performance of Redis. The only way to determine the best configuration for a given site is to run a load test with a load that is similar to production traffic. Based on my experience I recommend exploring the following options:

  • Enable Clustering: It is often more effective to create a Redis cluster with multiple instances than to increase the size of a single non-clustered Redis instance. A single Redis instance can only be scaled vertically by allocating more resources to it. With a cluster, Redis will create multiple instances and divide the data over the instances based on its key. This technique is also referred to as sharding and is supported by Redis, which makes it transparent to Sitecore. Therefore no changes are needed on Sitecore’s side; it just needs to have its Redis connection string pointed to the endpoint of the cluster.
    • Important note: Sitecore uses StackExchange.Redis.StrongName to access Redis. “Move” exceptions can occur with versions of this library below 1.1.603 when clustering is enabled. A little more information about this issue can be found here. This link only describes the issue in Azure, but the same issue can occur anywhere else as well. As the table below shows, all Sitecore 9.0 versions use a version of the StackExchange Redis driver below 1.1.603 and might throw “Move” exceptions when configured to use a Redis cluster.
      Sitecore version               StackExchange Redis version
      9.0 Initial Release (171002)   1.0.488
      9.0 Update-1 (171219)          1.0.488
      9.0 Update-2 (180604)          1.0.488
      9.1 Initial Release (001564)   1.2.6
      9.1 Update-1 (002459)          1.2.6
  • Keep compression enabled: The Redis server is single-threaded. This makes it perform well with small key-value pairs, but performance decreases as the size of the stored data goes up. The advantage of disabling compression is that Sitecore does not need to spend CPU time compressing and decompressing the data. However, the amount of data that needs to be sent to Redis goes up; we have seen the amount of data sent to Redis triple without compression. This had a significant adverse impact on Redis’ performance and on the performance of the entire site, while the extra CPU time with compression enabled was negligible compared to overall CPU usage. The image below, taken from Redis.io, shows how throughput decreases with increased data size.
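
To illustrate how a Redis cluster divides data over its instances based on the key, as mentioned under Enable Clustering, the sketch below computes the hash slot the way the Redis Cluster specification describes it: CRC16 of the key modulo 16384, where only the part between { and } is hashed when the key contains a non-empty hash tag. The RedisClusterSlot class is a hypothetical helper for illustration only; Sitecore and the StackExchange Redis driver do this internally, and the key format Sitecore uses for session entries is not assumed here.

```csharp
using System;
using System.Text;

// Hypothetical helper that mirrors the key-to-slot mapping of Redis Cluster.
public static class RedisClusterSlot
{
    private const int SlotCount = 16384;

    // CRC16 (XMODEM variant): polynomial 0x1021, initial value 0, no reflection.
    public static int Crc16(byte[] data)
    {
        int crc = 0;
        foreach (byte b in data)
        {
            crc ^= b << 8;
            for (int bit = 0; bit < 8; bit++)
            {
                crc = (crc & 0x8000) != 0 ? (crc << 1) ^ 0x1021 : crc << 1;
                crc &= 0xFFFF; // keep the value within 16 bits
            }
        }

        return crc;
    }

    // Returns the slot (0..16383) that owns the key.
    public static int SlotFor(string key)
    {
        // Hash tag: hash only the text between the first '{' and the next '}'
        // when there is at least one character between them.
        int open = key.IndexOf('{');
        if (open >= 0)
        {
            int close = key.IndexOf('}', open + 1);
            if (close > open + 1)
            {
                key = key.Substring(open + 1, close - open - 1);
            }
        }

        return Crc16(Encoding.UTF8.GetBytes(key)) % SlotCount;
    }
}
```

Each instance in the cluster owns a range of slots, so two keys that share the same hash tag, for example {abc}_Data and {abc}_Internal (made-up key names), always end up on the same instance.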