Check Redis connectivity inside heartbeat.aspx health check

Getting your health check right is important when designing a highly available and elastic Sitecore solution. For years Sitecore has shipped with a built-in health check page at /sitecore/service/heartbeat.aspx, which checks the status of the SQL databases. This page can be used as a load balancer or Docker health check. A few quick notes on heartbeat.aspx:

  • In some versions of Sitecore heartbeat.aspx will throw an error, and you will have to exclude some connection strings from it, as described in a different article on this blog.
  • Starting in Sitecore 9.3 a new health check mechanism is used, based on the Microsoft.Extensions.Diagnostics.HealthChecks namespace. Here is a great article describing how to customize this. The Redis check shown below can also be used in the updated health check mechanism.

There are several different approaches when setting up a health check in Sitecore. In most cases I recommend keeping the health check small to prevent it from going unhealthy during heavy load. This technique can be combined with the Application Initialization feature in IIS to warm up the solution after the site starts.

The code for heartbeat.aspx lives in Sitecore.Web.Services.HeartbeatCode in the Sitecore.Client assembly. The important methods are virtual, so they can be overridden to implement additional checks which ensure all critical components of the solution are healthy.

There are many Sitecore solutions where the private session state is stored in Redis and its availability is critical. In such scenarios it makes sense to ping Redis from the health check to ensure the server can access it. The code sample below shows how to check the Redis database which is set up for private session state:

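using System.Configuration;
using System.Web.Configuration;
using Sitecore.Diagnostics;
using StackExchange.Redis;

public class CustomHeartbeat : Sitecore.Web.Services.HeartbeatCode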
{
    protected BeatResults CheckRedis(BeatResults beatresult)
    {
        //get connection details for private Redis session database
        //same pattern can be used to check shared session database
        var sessionSection = (SessionStateSection)WebConfigurationManager.GetSection("system.web/sessionState");
        var connString = sessionSection.Providers["Redis"].Parameters.Get("connectionString");
        string redisConnection = ConfigurationManager.ConnectionStrings[connString].ConnectionString;

        using (ConnectionMultiplexer connection = ConnectionMultiplexer.Connect(redisConnection))
        {
            var subscriber = connection.GetSubscriber();
            var timespan = subscriber.Ping();

            Log.Info($"Successfully pinged Redis from healthcheck in: {timespan}", this);
        }

        return beatresult;
    }

    protected override BeatResults DoBeat()
    {
        //this checks the SQL databases
        var beatResults = base.DoBeat();

        beatResults = CheckRedis(beatResults);

        return beatResults;
    }
}

Turn off Session State locking in Sitecore MVC pages

The default implementation of the ASP.NET Session State Module uses exclusive locking for each request from the same session. This means ASP.NET will only execute one request at a time from the same browser. Any other request will be locked by the Session State Module and will not be executed until the previous request is complete and it can obtain the exclusive lock. This can cause performance issues in many real-world scenarios.

The screenshot below from IIS shows 6 concurrent requests to the homepage from the same browser. Sitecore is only executing the bottom request, which is in the ExecuteRequestHandler state. The other 5 requests are in the RequestAcquireState state and will only be fulfilled one at a time after the bottom request is complete. Each of the requests in the RequestAcquireState state will check the session store every 0.5 seconds to see if it can obtain a lock.
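
To get a feel for how quickly this polling adds up, here is a rough back-of-the-envelope sketch in JavaScript. It assumes a simplified model that is not taken from the ASP.NET source: requests from one session complete strictly one after another, every request takes the same time, and each waiting request polls the session store exactly every 0.5 seconds. The function name and its parameters are made up for this illustration.

```javascript
// Simplified model (assumption): requests from one session run one at a time,
// each takes `requestMs`, and every waiting request polls the session store
// once per `pollMs` while it sits in RequestAcquireState.
function countLockPolls(requestCount, requestMs, pollMs = 500) {
    let totalPolls = 0;
    for (let position = 0; position < requestCount; position++) {
        // The request at `position` waits for all requests ahead of it.
        const waitMs = position * requestMs;
        totalPolls += Math.floor(waitMs / pollMs);
    }
    return totalPolls;
}

// 6 concurrent requests from one browser, 2 seconds each.
console.log(countLockPolls(6, 2000));
```

Under these assumptions, 6 concurrent requests of 2 seconds each result in 60 lock polls against the session store for a single browser session.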

This can put pressure on the session state store when many requests take some time to execute. Depending on the session store, it is common to see messages like the ones below in the log:

Common errors with session state in Redis:

Exception type: TimeoutException
Exception message: Timeout performing EVAL, inst: ....
at StackExchange.Redis.ConnectionMultiplexer.ExecuteSyncImpl[T](Message message, ResultProcessor`1 processor, ServerEndPoint server)
at StackExchange.Redis.RedisBase.ExecuteSync[T](Message message, ResultProcessor`1 processor, ServerEndPoint server)
at StackExchange.Redis.RedisDatabase.ScriptEvaluate(String script, RedisKey[] keys, RedisValue[] values, CommandFlags flags)
at Sitecore.SessionProvider.Redis.StackExchangeClientConnection.<>c__DisplayClass7.b__6()
at Sitecore.SessionProvider.Redis.StackExchangeClientConnection.RetryForScriptNotFound(Func`1 redisOperation)
at Sitecore.SessionProvider.Redis.StackExchangeClientConnection.RetryLogic(Func`1 redisOperation)
at Sitecore.SessionProvider.Redis.StackExchangeClientConnection.Eval(String script, String[] keyArgs, Object[] valueArgs)
at Sitecore.SessionProvider.Redis.RedisConnectionWrapper.TryTakeWriteLockAndGetData(String sessionId, DateTime lockTime, Object& lockId, ISessionStateItemCollection& data, Int32& sessionTimeout)
at Sitecore.SessionProvider.Redis.RedisSessionStateProvider.GetItemFromSessionStore(Boolean isWriteLockRequired, HttpContext context, String id, Boolean& locked, TimeSpan& lockAge, Object& lockId, SessionStateActions& actions)
at Sitecore.SessionProvider.Redis.RedisSessionStateProvider.GetItemExclusive(HttpContext context, String id, Boolean& locked, TimeSpan& lockAge, Object& lockId, SessionStateActions& actions)
at System.Web.SessionState.SessionStateModule.GetSessionStateItem()

Common errors with session state in SQL:

Message: Timeout expired. The timeout period elapsed prior to obtaining a connection from the pool. This may have occurred because all pooled connections were in use and max pool size was reached.
Source: System.Data
   at System.Data.ProviderBase.DbConnectionFactory.TryGetConnection(DbConnection owningConnection, TaskCompletionSource`1 retry, DbConnectionOptions userOptions, DbConnectionInternal oldConnection, DbConnectionInternal& connection)
   ... 
   at System.Web.SessionState.SqlSessionStateStore.SqlStateConnection..ctor(SqlPartitionInfo sqlPartitionInfo, TimeSpan retryInterval)

Common errors with session state in Mongo:

ERROR Application error.
Exception: System.TimeoutException
Message: Timeout waiting for a MongoConnection.
Source: MongoDB.Driver
   at MongoDB.Driver.Internal.MongoConnectionPool.AcquireConnection(AcquireConnectionOptions options)
   ...
   at Sitecore.SessionProvider.MongoDB.MongoSessionStateProvider.GetItemExclusive(HttpContext context, String id, Boolean& locked, TimeSpan& lockAge, Object& lockId, SessionStateActions& actions)
   at System.Web.SessionState.SessionStateModule.GetSessionStateItem()
   at System.Web.SessionState.SessionStateModule.BeginAcquireState(Object source, EventArgs e, AsyncCallback cb, Object extraData)
   at System.Web.HttpApplication.AsyncEventExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute()
   at System.Web.HttpApplication.ExecuteStep(IExecutionStep step, Boolean& completedSynchronously)

Too many locked requests from a single session:

ERROR Application error.
Exception: System.Web.HttpException
Message: The request queue limit of the session is exceeded.
Source: System.Web
   at System.Web.SessionState.SessionStateModule.QueueRef()
   at System.Web.SessionState.SessionStateModule.PollLockedSession()
   at System.Web.SessionState.SessionStateModule.GetSessionStateItem()
   at System.Web.SessionState.SessionStateModule.BeginAcquireState(Object source, EventArgs e, AsyncCallback cb, Object extraData)
   at System.Web.HttpApplication.AsyncEventExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute()
   at System.Web.HttpApplication.ExecuteStepImpl(IExecutionStep step)
   at System.Web.HttpApplication.ExecuteStep(IExecutionStep step, Boolean& completedSynchronously)

Sitecore has a good KB article that describes this in more detail; it can be found here. The article recommends setting the session state to read-only and describes how to do this for 2 scenarios:

  • Custom MVC Routes: Set the session state to readonly on the controller. This can be done by decorating the controller with this attribute: [SessionState(SessionStateBehavior.ReadOnly)]
  • ASP.NET Web Forms pages: Set EnableSessionState="ReadOnly" in the @ Page directive

This article does not mention how to fix this for Sitecore MVC pages. The solution provided below describes how to address this for Sitecore MVC pages.

Solution

For Sitecore MVC pages, the SitecoreControllerFactory returns the Default session state behavior from its GetControllerSessionBehavior method. This method is virtual, so it can be overridden to change the session state behavior:

using Sitecore.Diagnostics;
using Sitecore.Mvc.Controllers;
using Sitecore.Mvc.Extensions;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Web;
using System.Web.Mvc;
using System.Web.Routing;
using System.Web.SessionState;

namespace Foundation.Extensions.Factory
{
    public class ReadOnlySessionStateSitecoreControllerFactory : SitecoreControllerFactory
    {
        public ReadOnlySessionStateSitecoreControllerFactory(IControllerFactory innerFactory) : base(innerFactory)
        {
        }

        public override SessionStateBehavior GetControllerSessionBehavior(RequestContext requestContext, string controllerName)
        {
            Assert.ArgumentNotNull(requestContext, "requestContext");
            Assert.ArgumentNotNull(controllerName, "controllerName");

            if (controllerName.EqualsText(SitecoreControllerName))
            {
                return SessionStateBehavior.ReadOnly;
            }

            return InnerFactory.GetControllerSessionBehavior(requestContext, controllerName);
        }
    }
}

An initialize pipeline processor needs to be created to set our new controller factory:

using Foundation.Extensions.Factory;
using Sitecore.Mvc.Controllers;
using Sitecore.Mvc.Pipelines.Loader;
using Sitecore.Pipelines;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Web;

namespace Foundation.Extensions.Processors.Initialize
{
    public class InitializeReadOnlySessionStateSitecoreControllerFactory : InitializeControllerFactory
    {
        protected Func<System.Web.Mvc.ControllerBuilder> ControllerBuilder = () => System.Web.Mvc.ControllerBuilder.Current;

        protected override void SetControllerFactory(PipelineArgs args)
        {
            System.Web.Mvc.ControllerBuilder controllerBuilder = ControllerBuilder();
            var controllerFactory = new ReadOnlySessionStateSitecoreControllerFactory(controllerBuilder.GetControllerFactory());
            controllerBuilder.SetControllerFactory(controllerFactory);
        }
    }
}

The XML file below can be used to patch in this new pipeline processor:

<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <pipelines>
      <initialize>
        <processor type="Foundation.Extensions.Processors.Initialize.InitializeReadOnlySessionStateSitecoreControllerFactory, Foundation.Extensions" patch:instead="*[@type='Sitecore.Mvc.Pipelines.Loader.InitializeControllerFactory, Sitecore.Mvc']"/>
      </initialize>
    </pipelines>
  </sitecore>
</configuration>

Below screenshot shows the same scenario as in the beginning of this post, but now all 8 requests are getting executed at the same time.

Setting the session state to read-only for Sitecore MVC pages can yield significant performance improvements and will help reduce the load on the session store, as described in Sitecore’s KB article. Before doing this it is important to understand the following considerations:

  • Multiple requests from the same browser will execute at the same time. Your application should be able to handle this without causing any unintended issues by multiple threads modifying shared objects at the same time.
  • Custom objects cannot be stored in the session state anymore when it is set to ReadOnly, except when the session state is in process. Using a custom cache as already suggested in Sitecore’s article is a good solution.
  • This issue might not occur when a site is running smoothly, but can turn a small issue into an overall site stability issue. The session store can get under a lot of load for example if some pages in your site start being slow or in case of an app pool recycle. This can impact the overall stability of the site as it can overload SQL, Redis or Mongo.

Deploying Sitecore with Terraform Part 1

During last month’s Sitecore symposium I had the pleasure to present with my colleague Paula Simontacchi on deploying Sitecore through Terraform. This is the first post in a series of 2 and will discuss Terraform features which are beneficial when deploying Sitecore.

Terraform introduction

Terraform is an Infrastructure as Code (IaC) tool developed by HashiCorp. A Terraform solution is written in HCL (HashiCorp Configuration Language), a configuration language created by HashiCorp. The small code sample below shows how to create a Resource Group and Virtual Network in Azure:

resource "azurerm_resource_group" "tfsimple" {
  name     = "tf-resources"
  location = "${var.location}"
}

resource "azurerm_virtual_network" "tfsimple" {
  name                = "tf-network"
  address_space       = ["10.0.0.0/16"]
  location            = "${azurerm_resource_group.tfsimple.location}"
  resource_group_name = "${azurerm_resource_group.tfsimple.name}"
}

There are many good resources to learn Terraform in more detail, for example the Terraform docs site or this PluralSight training. A few key concepts will be covered here. It is encouraged to have a solid understanding of Terraform before using it in production Sitecore deployments.

Multi-provider based model

Terraform uses a provider-based model and has providers for almost everything you would want to deploy to. For example, it supports all major cloud providers but also has providers for solutions like Cloudflare, Akamai, Docker or F5. A more elaborate list can be found on Terraform’s site here.

Terraform workflow

Below diagram shows the typical Terraform workflow:

  1. Init: Initializes working directory and downloads providers
  2. Plan: Creates and displays the execution plan
  3. Apply: Makes the changes to the underlying platform
  4. Destroy: Deletes the changes made in step 3

Step 1 is not necessary when you have already initialized the working directory and have downloaded all providers.

Step 4 is optional as well. However it is recommended to always destroy and reprovision infrastructure at least when changes are made to it, to ensure Terraform stays up-to-date and can continue to be used reliably to stand up infrastructure.

Plan/Dry run

The plan phase, sometimes also referred to as a dry run, is the most interesting phase. During this phase Terraform compares the code to the underlying deployment and displays the difference, i.e. the updates it will make during the apply phase. This is useful for the following reasons:

  1. The result of the deployment can be validated without actually running it. This can save a lot of time and money in most Sitecore deployments, as deploying all infrastructure is a time-consuming process and doing this many times can result in significant cost.
  2. The result of the plan phase can be saved for execution later, for example by a different team or during a maintenance window. Running the saved plan will avoid any surprises and will provision the infrastructure exactly as per the plan.
  3. In an enterprise scenario infrastructure will be provisioned from CI/CD and not from a developer’s local machine. Performing a dry run is a good validation before pushing changes to source control.

Modularity

Terraform natively supports modules. Modules can be used to create reusable infrastructure. Modules can reference other modules as well. Common resources to create through modules are Subnets, VNets, Security Groups or VMs.

A module typically creates resources based on some values passed in through variables. Outputs can be used to pass information about the created resources back to the calling code. Below is a sample module call which will create a Windows VM. It will use some of the variables to determine the correct settings and it will return the public IP as an output.

module "windowsservers" {
  source         = "Azure/compute/azurerm"
  version        = "1.2.0"
  location       = "${var.location}"
  remote_port    = "3389"
  vm_size        = "${lookup(var.vm_size, var.environment)}"
  vnet_subnet_id = "${module.network.vnet_subnets[0]}"
}

output "windows_vm_public_ip" {
  value = "${module.windowsservers.public_ip_address}"
}

Dependency Tracking

Providers in Terraform are aware of dependencies between resources. This provides some key benefits:

  • Create resources in parallel: any independent resources are created in parallel. In a Sitecore scenario this means that all the databases can be stood up at the same time as the VMs
  • Visualize architectural dependencies: Terraform can generate a dependency graph which will show dependencies between all the resources in the deployment
  • IDE support: popular IDEs have plugins for Terraform which show where each resource or variable is used, similar to “find references” in Visual Studio. This is helpful when understanding the impact of changes made to the Terraform solution.

Terraform vs. ARM

ARM is a popular solution for Sitecore deployments in Azure, however there are some benefits to using Terraform even in Azure. Below comparison lists some key differences:

Terraform | ARM
plan/dry-run: Validates with the deployment in place and calculates delta. This delta can be saved for later use | Validation: Will validate syntax, but does not compare to underlying deployment
HCL language: supports features like interpolation, attributes, and comments | JSON language: Powerful, but missing features like interpolation and does not support comments
Modules: Modules are first-class citizens | Modules: Modules can be created through nested templates, but not supported natively
Usage: Terraform has a provider for almost any resource. Supports hybrid cloud or Azure in combination with other infra, e.g. Azure with Cloudflare CDN | Usage: specific to Azure

Sitecore and Redis lessons learned

I noticed that my previous post about Redis is one of the most popular on my blog. Since I’ve been using Redis for a while I decided to write another post with some of the lessons learned.

Sitecore connectivity to Redis

The first step in getting Redis to work with Sitecore is to ensure there is connectivity between them. When Sitecore starts up it will ping Redis. The Sitecore log will contain something like below when connectivity to Redis is established successfully. Notice the Redis response to the Ping and the message that the endpoint returned with success.

11056 11:06:22 INFO  Sending critical tracer: Interactive/jeroen.redis.cache.windows.net:6380
11056 11:06:22 INFO  Writing to Interactive/jeroen.redis.cache.windows.net:6380: ECHO
11056 11:06:22 INFO  Flushing outbound buffer
11056 11:06:22 INFO  Starting read
11056 11:06:22 INFO  Connect complete: jeroen.redis.cache.windows.net:6380
11056 11:06:22 INFO  Response from Interactive/jeroen.redis.cache.windows.net:6380 / ECHO: BulkString: 16 bytes
WIN-RCJOA5J2MOL:Write 11:06:22 INFO  Writing to Interactive/jeroen.redis.cache.windows.net:6380: GET __Booksleeve_TieBreak
WIN-RCJOA5J2MOL:Write 11:06:22 INFO  Writing to Interactive/jeroen.redis.cache.windows.net:6380: PING
8912 11:06:22 INFO  Response from Interactive/jeroen.redis.cache.windows.net:6380 / GET __Booksleeve_TieBreak: (null)
8912 11:06:22 INFO  Response from Interactive/jeroen.redis.cache.windows.net:6380 / PING: SimpleString: PONG
1068 11:06:22 INFO  All tasks completed cleanly, IOCP: (Busy=0,Free=800,Min=800,Max=800), WORKER: (Busy=43,Free=757,Min=789,Max=800)
1068 11:06:22 INFO  jeroen.redis.cache.windows.net:6380 returned with success

There can be a variety of issues which prevent Sitecore from connecting to Redis:

  • Wrong Redis engine version: Sitecore does not work with Redis engine version 4 or 5. This is easy to get wrong especially if using AWS ElastiCache which currently defaults to version 5.0.3. When using AWS ElastiCache make sure to select version 3.2.6. This issue is not obvious from the log. When using the wrong version the log might show something like this:
INFO name.cache.amazonaws.com: 6380 failed to nominate (Faulted)
INFO > UnableToResolvePhysicalConnection on GET 33488
  • AccessKey missing in connection string: The access key might need to be put inside the connectionString value. I have blogged about this issue before, see here.
  • Intermittent timeout issues: There might be intermittent timeout issues when Sitecore is connected to Redis. This KB article provides a good start to resolve these kinds of issues. If this happens the log will show something like this:
Exception: System.TimeoutException
Message: Timeout performing EVAL, inst: 1, mgr: Inactive, err: never, queue: 24, qu: 0, qs: 24, qc: 0, wr: 0, wq: 0, in: 12544, ar: 0, IOCP: (Busy=5,Free=395,Min=200,Max=400), WORKER: (Busy=4,Free=396,Min=88,Max=400), clientName: client
Source: StackExchange.Redis.StrongName
   at StackExchange.Redis.ConnectionMultiplexer.ExecuteSyncImpl[T](Message message, ResultProcessor`1 processor, ServerEndPoint server)
   at StackExchange.Redis.RedisBase.ExecuteSync[T](Message message, ResultProcessor`1 processor, ServerEndPoint server)
   at StackExchange.Redis.RedisDatabase.ScriptEvaluate(String script, RedisKey[] keys, RedisValue[] values, CommandFlags flags)
   at Sitecore.SessionProvider.Redis.StackExchangeClientConnection.<>c__DisplayClass12_0.<Eval>b__0()
   at Sitecore.SessionProvider.Redis.StackExchangeClientConnection.RetryForScriptNotFound(Func`1 redisOperation)
   at Sitecore.SessionProvider.Redis.StackExchangeClientConnection.RetryLogic(Func`1 redisOperation)
   at Sitecore.SessionProvider.Redis.StackExchangeClientConnection.Eval(String script, String[] keyArgs, Object[] valueArgs)
   at Sitecore.SessionProvider.Redis.RedisConnectionWrapper.TryTakeWriteLockAndGetData(String sessionId, DateTime lockTime, Object& lockId, ISessionStateItemCollection& data, Int32& sessionTimeout)
   at Sitecore.SessionProvider.Redis.RedisSessionStateProvider.GetItemFromSessionStore(Boolean isWriteLockRequired, HttpContext context, String id, Boolean& locked, TimeSpan& lockAge, Object& lockId, SessionStateActions& actions)
   at Sitecore.SessionProvider.Redis.RedisSessionStateProvider.GetItemExclusive(HttpContext context, String id, Boolean& locked, TimeSpan& lockAge, Object& lockId, SessionStateActions& actions)
   at System.Web.SessionState.SessionStateModule.GetSessionStateItem()
   at System.Web.SessionState.SessionStateModule.BeginAcquireState(Object source, EventArgs e, AsyncCallback cb, Object extraData)
   at System.Web.HttpApplication.AsyncEventExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute()
   at System.Web.HttpApplication.ExecuteStep(IExecutionStep step, Boolean& completedSynchronously)

Designing for performance

There are many factors which impact the performance of Redis. The only way to determine the best configuration for a certain site is to perform a load test with a load that is similar to production traffic. Based on my experience I recommend exploring the following options:

  • Enable Clustering: It is often more effective to create a Redis cluster with multiple instances than to increase the size of a single non-clustered Redis instance. Each Redis instance can only be scaled vertically by allocating more resources to it. With a cluster, Redis will create multiple instances and divide the data over the instances based on its key. This technique is also referred to as sharding and is supported by Redis, which makes it transparent to Sitecore. Therefore no changes are needed on Sitecore’s side; it just needs to have its Redis connection string pointed to the endpoint of the cluster.
    • Important note: Sitecore is using StackExchange.Redis.StrongName to access Redis. “Move” exceptions can occur below version 1.1.603 of this library when clustering is enabled. A little more information about this issue can be found here. This link only describes the issue in Azure, but the same issue can occur anywhere else as well. Per the table below, all Sitecore 9.0 versions use a version of the StackExchange Redis driver below 1.1.603 and might throw “Move” exceptions when configured to use a Redis cluster.
      Sitecore | StackExchange Redis
      9.0 Initial Release (171002) | 1.0.488
      9.0 Update-1 (171219) | 1.0.488
      9.0 Update-2 (180604) | 1.0.488
      9.1 Initial Release (001564) | 1.2.6
      9.1 Update-1 (002459) | 1.2.6
  • Keep compression enabled: the Redis server is single-threaded. This makes it perform well with small key-value pairs, but performance decreases as the size of the stored data goes up. The advantage of disabling compression is that Sitecore does not need to spend CPU time compressing and decompressing the data. However, the amount of data that needs to be sent to Redis goes up; we have seen the amount of data sent to Redis triple without compression. This had a significant adverse impact on Redis’ performance and the performance of the entire site. The extra CPU time with compression enabled was negligible compared to overall CPU usage. Below image taken from Redis.io shows how throughput decreases with increased data size.
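
The sharding mentioned under “Enable Clustering” above can be made a bit more concrete with a small sketch. Redis Cluster maps every key to one of 16384 hash slots by taking the CRC16 (XMODEM variant) of the key modulo 16384, and each instance in the cluster owns a range of those slots. The JavaScript below computes the slot for a key. It is a simplified illustration that ignores hash tags (the {...} syntax), and the function names are made up for this example.

```javascript
// CRC16, XMODEM variant (polynomial 0x1021, initial value 0), computed bit by bit.
function crc16(bytes) {
    let crc = 0;
    for (const byte of bytes) {
        crc ^= byte << 8;
        for (let bit = 0; bit < 8; bit++) {
            if (crc & 0x8000) {
                crc = ((crc << 1) ^ 0x1021) & 0xffff;
            } else {
                crc = (crc << 1) & 0xffff;
            }
        }
    }
    return crc;
}

// Hash slot for a key, ignoring hash tags (simplification).
function hashSlot(key) {
    return crc16(Buffer.from(key, 'utf8')) % 16384;
}

console.log(hashSlot('foo'));
```

Because the slot depends only on the key, the client library and the cluster agree on where each session lives, which is why nothing has to change in Sitecore beyond the connection string.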

Solve caching issues when rendering is on page multiple times

HTML caching is arguably the best way to improve Sitecore performance. Sometimes you can run into issues when you enable HTML cache on a rendering and the rendering has been added to the same page multiple times. This will only happen if the renderings do not have a datasource or share the same datasource, but will still render different content. This could happen for example when the renderings have different rendering parameters or have some custom logic which changes the content.

This can be fixed in a generic way by overriding the GenerateKey method of the GenerateCacheKey RenderRendering processor. The code below adds the UniqueId of each rendering to the cache key, which ensures the cached output is unique for each rendering.

using Sitecore.Mvc.Pipelines.Response.RenderRendering;
using Sitecore.Mvc.Presentation;

namespace Foundation.Pipelines.RenderRendering
{
    public class GenerateCustomCacheKey : GenerateCacheKey
    {
        protected override string GenerateKey(Rendering rendering, RenderRenderingArgs args)
        {
            var cacheKey = base.GenerateKey(rendering, args);

            cacheKey += rendering.UniqueId;

            return cacheKey;
        }
    }
}

 

Integrate Sitecore with Alexa

During last month’s Sitecore symposium I had the pleasure to present with my colleague Ben Adamski on expanding the reach of your Sitecore content with voice-activated assistants through an Alexa skill. This blog post will describe the integration discussed during this presentation and will provide some additional details.

Sitecore 9 omnichannel foundation

Sitecore has a solid omnichannel foundation which enables it to act as a headless CMS. Below diagram shows the main integration points exposed by Sitecore out of the box.

Omnichannel Foundation

  1. OData Item Service: this service can be used to query any Sitecore item and retrieve it in JSON format.
  2. SXA Layout Service: the SXA layout service supports modelling content as JSON. This is done in the Experience Editor and uses the same layout engine as regular Sitecore pages. This allows content authors to use the tools they are already familiar with, and personalization is supported. Analytics and tracking also work like on a regular Sitecore page, as the layout engine is used to render.
  3. xConnect Client API: the xConnect client API needs to be used to retrieve a customer’s previous interactions with Sitecore.
  4. Commerce 9 OData API: any data which resides in Sitecore Commerce can be retrieved using this API.

Integration with the Sitecore services

There are several options to call the Sitecore services mentioned above. They were called from AWS Lambda in our demo during Symposium but there are some other options too:

  1. AWS Lambda: this is AWS’ serverless computing platform. Here are some key considerations for hosting this in Lambda:
    Pro:
    – Relatively simple integration with Alexa. Alexa runs in AWS and integration with Lambda takes just a few clicks and there are many examples online.
    – Little effort required to include Alexa SDK which simplifies integration with Alexa
    Con:
    – Most Sitecore developers are not familiar with Lambda and will need to spend some time getting up to speed
  2. Azure/on-premise: Alexa can call any RESTful endpoint, so the integration layer can be hosted anywhere that is accessible by AWS, for example in Azure or your existing on-premise data center:
    Pro:
    – No need to get up to speed with a new platform
    Con:
    – Will require more effort to integrate securely with Alexa

Alexa Skill Kit SDK

There is an Alexa Skill Kit SDK available which makes working with Alexa significantly easier. The SDK is available in Node.js, Java and Python. Getting started with the Node.js SDK is surprisingly simple for C# developers as the new version 2 of the SDK is using async/await and promises instead of the callback based style which was previously used. Below is a code sample which runs when the user performs a search in the Alexa Skill. There are a few things to note about this:

  • This uses the Item Service to perform the search. The query is built in the query constant at the start of the handle method.
  • The call to execute and await the search query is the await httpGet(query) statement, and the httpGet method is defined after the handler. This calls the Item Service.
  • Methods from the Alexa Skill Kit SDK are used extensively, for example to send the output speech to Alexa.

const SearchIntentHandler = {
    canHandle(handlerInput) {
        return handlerInput.requestEnvelope.request.type === 'IntentRequest'
            && handlerInput.requestEnvelope.request.intent.name === 'SearchIntent';
    },
    async handle(handlerInput) {
        const searchTerm = handlerInput.requestEnvelope.request.intent.slots.SearchTerm.value;

        // Encode the spoken term so spaces and special characters survive in the URL
        const query = '/item/search?term=' + encodeURIComponent(searchTerm);

        const response = await httpGet(query);

        var searchResult = "";
        var cnt = 0;

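        // NOTE: the rest of this handler is missing from the listing: the loop
        // over the search results (for (var i = 0; i < ...)) and the SDK calls
        // that send the output speech to Alexa.
    }
};

function httpGet(query) {
    return new Promise((resolve, reject) => {
        // NOTE: the definition of `options` is also missing from the listing.
        const request = http.request(options, (response) => {
            response.setEncoding('utf8');
            let returnData = '';

            // Reject on any non-2xx status code
            if (response.statusCode < 200 || response.statusCode >= 300) {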
                return reject(new Error(`${response.statusCode}: ${response.req.getHeader('host')} ${response.req.path}`));
            }

            response.on('data', (chunk) => {
                returnData += chunk;
            });

            response.on('end', () => {
                resolve(JSON.parse(returnData));
            });

            response.on('error', (error) => {
                reject(error);
            });
        });
        request.end();
    });
}
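
As a rough illustration of the first and last steps of such a handler (building the query and turning the search results into output speech), here is a minimal sketch that avoids SDK and network calls so it can be run on its own. The `Results` and `ItemName` field names, the `maxResults` parameter and both function names are assumptions for illustration, not the author's code; check them against the response your Item Service endpoint actually returns.

```javascript
// Build the Item Service search path. The term is URL-encoded so that
// spoken phrases containing spaces or punctuation form a valid request path.
function buildSearchPath(searchTerm) {
    return '/item/search?term=' + encodeURIComponent(searchTerm);
}

// Turn a search response into a sentence Alexa can speak.
// ASSUMPTION: the response has a `Results` array whose entries carry an
// `ItemName` property; adjust this to the shape your endpoint returns.
function buildSpeech(searchTerm, response, maxResults = 3) {
    const results = (response && Array.isArray(response.Results)) ? response.Results : [];
    if (results.length === 0) {
        return `I could not find anything for ${searchTerm}.`;
    }
    const noun = results.length === 1 ? 'result' : 'results';
    const names = results.slice(0, maxResults).map((r) => r.ItemName);
    return `I found ${results.length} ${noun} for ${searchTerm}, including ${names.join(', ')}.`;
}
```

Inside a handler, the resulting string would typically be passed to `handlerInput.responseBuilder.speak(speech).getResponse()`.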

Alexa Skill Interaction Model

The focus of this blog post is on the Sitecore integration with Alexa, but it is important to understand that there is some Alexa work as well, specifically setting up the interaction model. There are three main entities in the interaction model:

  • Intents: the intent defines what the user is trying to achieve. The code above is handling the search intent.
  • Utterances: these are phrases likely spoken by the user to invoke the intent. Most intents will have multiple utterances. In the example above, an utterance mapped to the search intent could be “please search for {SearchTerm}”.
  • Custom slot types: slot types hold the values for phrases the user says but which cannot be part of the utterance. In the example above, the “search term” is an example of a slot, and Alexa will automatically populate it with the search term spoken by the user.

The interaction model is stored as JSON; below is the JSON for the search intent. More information about the interaction model can be found here.

{
  "name": "SearchIntent",
  "slots": [
    {
      "name": "SearchTerm",
      "type": "AMAZON.SearchQuery"
    }
  ],
  "samples": [
    "please search for {SearchTerm}",
    "search for {SearchTerm}",
    "what is {SearchTerm}"
  ]
}
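
To show how the interaction model connects to the handler code, the sketch below reads the `SearchTerm` slot from a request. The envelope is hand-written and heavily trimmed (a real request carries many more fields), and `readSlot` is a hypothetical helper, not part of the SDK.

```javascript
// Hand-written, trimmed example of the request Alexa sends
// when the user says "search for sitecore forms".
const sampleEnvelope = {
    request: {
        type: 'IntentRequest',
        intent: {
            name: 'SearchIntent',
            slots: {
                SearchTerm: { name: 'SearchTerm', value: 'sitecore forms' }
            }
        }
    }
};

// Hypothetical helper: read a slot value without throwing when the
// slot is missing or was left empty by the user.
function readSlot(requestEnvelope, slotName) {
    const intent = requestEnvelope.request && requestEnvelope.request.intent;
    const slot = intent && intent.slots && intent.slots[slotName];
    return slot && slot.value ? slot.value : null;
}

console.log(readSlot(sampleEnvelope, 'SearchTerm'));
```

Returning `null` for an empty slot lets the handler ask the user to repeat the search term instead of sending an empty query to Sitecore.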

Integrating Alexa with other channels

It is important to build an Alexa Skill which is integrated with your brand’s other channels. A user is not going to have a good voice experience with a disconnected Alexa skill, as such a skill cannot leverage customer interaction information from other channels to deliver a relevant and personalized experience. It is also important to understand customers’ behavior across all channels to get a single view of the customer and to provide relevant content to each user.

With the customer’s permission Alexa can return the location of the customer. This can be used to provide more relevant location-based content to the user. During our presentation we showed location-based personalization with Sitecore and Alexa. The location cannot be used to integrate between channels, as Amazon does not allow use of the location to associate the user with a customer who has the same address. Amazon can reject or suspend your skill if they find out this is being done. More information about the use of location can be found here.

Account linking is the feature which should be used to connect Alexa with other channels. Account linking connects the identity of the Alexa user to an identity in a third-party system through OAuth 2.0. Setting this up will be easier if the Sitecore solution runs on version 9, since this version supports federated authentication. More information about account linking can be found here.
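
Once account linking is configured, Alexa passes the OAuth access token for a linked user in each request at `context.System.user.accessToken`. The sketch below shows one way a skill might branch on that token. `getLinkedAccessToken` and `nextStep` are hypothetical helpers, and the returned action names are made up for illustration.

```javascript
// Hypothetical helper: return the OAuth access token Alexa passes along
// for a linked account, or null when the user has not linked one yet.
function getLinkedAccessToken(requestEnvelope) {
    const system = requestEnvelope.context && requestEnvelope.context.System;
    const user = system && system.user;
    return user && user.accessToken ? user.accessToken : null;
}

// Decide what the skill should do next based on the token.
function nextStep(requestEnvelope) {
    const token = getLinkedAccessToken(requestEnvelope);
    return token
        ? { action: 'callBackend', authorization: `Bearer ${token}` }
        : { action: 'promptAccountLinking' };
}
```

For the `promptAccountLinking` case, a handler built on the Node.js SDK could return a response that includes `withLinkAccountCard()`, which asks the user to link their account in the Alexa app.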

Sitecore 9 fix heartbeat.aspx

The heartbeat page is a useful page in Sitecore as it shows whether Sitecore can connect to its databases. If it can, the page returns a 200 status. It can be found at /sitecore/service/heartbeat.aspx, and it is good practice to point the load balancer’s health check at this page. This prevents traffic from being sent to a server which cannot connect to its backend databases.

Sitecore 9 introduced a number of new connection strings with xConnect, and the heartbeat page will fail on these. This can be avoided by adding the new connection strings to the excluded connections, so the heartbeat page does not return an error while Sitecore’s databases are online. Below is the value which can be used to get the heartbeat page to work in Sitecore 9.

<setting name="Sitecore.Services.Heartbeat.ExcludeConnection" value="LocalSqlServer|xconnect.collection|xconnect.collection.certificate|xdb.referencedata.client|xdb.referencedata.client.certificate|xdb.marketingautomation.reporting.client|xdb.marketingautomation.reporting.client.certificate|xdb.marketingautomation.operations.client|xdb.marketingautomation.operations.client.certificate|EXM.CryptographicKey|EXM.AuthenticationKey|Session|sharedSession" />

Deploying Sitecore 9 in AWS RDS

Using RDS to host Sitecore databases can be a good option when you want to deploy Sitecore 9 in AWS. RDS is a managed database service, so you do not need to set up and maintain VMs or SQL Server. However, you might run into a few issues when trying to do so, which are related to contained database authentication.

Enabling contained database authentication

Sitecore 9 uses contained database authentication by default, which avoids having to manage logins outside the database. However, this is turned off by default in RDS, and trying to enable it through SQL as shown below throws an error saying you do not have permission to run the RECONFIGURE statement.

--this will not work in RDS
sp_configure 'contained database authentication', 1;
GO
RECONFIGURE;
GO

Instead, go to the database instance’s parameter group and set the contained database authentication parameter to 1, see screenshot below. The instance might need to be restarted for this change to take effect.

RDS enable contained database authentication
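
The same change can also be scripted. The sketch below only builds the input for the RDS ModifyDBParameterGroup API call, so it runs without AWS credentials. It assumes a custom parameter group is attached to the instance (default parameter groups cannot be modified); the function name and the group name `sitecore-sqlserver` are hypothetical.

```javascript
// Build the input for the RDS ModifyDBParameterGroup API call.
// ASSUMPTION: `parameterGroupName` is a custom parameter group that you
// created and attached to the SQL Server instance.
function buildContainedAuthInput(parameterGroupName) {
    return {
        DBParameterGroupName: parameterGroupName,
        Parameters: [
            {
                ParameterName: 'contained database authentication',
                ParameterValue: '1',
                // 'pending-reboot' applies the change at the next restart
                ApplyMethod: 'pending-reboot'
            }
        ]
    };
}

// With the AWS SDK for JavaScript v3 installed, the input could be sent like this:
// const { RDSClient, ModifyDBParameterGroupCommand } = require('@aws-sdk/client-rds');
// await new RDSClient({}).send(
//     new ModifyDBParameterGroupCommand(buildContainedAuthInput('sitecore-sqlserver')));
```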

Fix errors with SIF

The Sitecore Installation Framework might throw some errors as well because some of the Sitecore web deploy packages (.scwdp) try to enable contained database authentication through the above SQL code. This can be fixed by:

  1. renaming the package to .zip
  2. unzipping it
  3. removing the SQL code
  4. zipping it again, making sure to keep the original folder structure
  5. renaming it back to .scwdp and deploying

How to fix “RedisConnectionException: No connection is available to service this operation” in Sitecore

Redis is a great choice for Sitecore’s shared session database. Sitecore has a good article which describes how to set this up, and it links to this article to explain all options. I was running into issues when setting this up with my Azure Redis Cache, which uses an access key. The “accessKey” attribute in the provider node was populated with the access key from the Azure portal. Initially I was seeing something like this in the log:

INFO  redisname.redis.cache.windows.net:6380,abortConnect=False
INFO
INFO  Connecting redisname.redis.cache.windows.net:6380/Interactive...
...
INFO  redisname.redis.cache.windows.net:6380 faulted: UnableToResolvePhysicalConnection on PING

Then the log would be full of errors like below:

ERROR GetItemFromSessionStore => StackExchange.Redis.RedisConnectionException: No connection is available to service this operation: EVAL
   at StackExchange.Redis.ConnectionMultiplexer.ExecuteSyncImpl[T](Message message, ResultProcessor`1 processor, ServerEndPoint server)
   at StackExchange.Redis.RedisBase.ExecuteSync[T](Message message, ResultProcessor`1 processor, ServerEndPoint server)
   at StackExchange.Redis.RedisDatabase.ScriptEvaluate(String script, RedisKey[] keys, RedisValue[] values, CommandFlags flags)
   at Sitecore.SessionProvider.Redis.StackExchangeClientConnection.c__DisplayClass12_0.b__0()
   at Sitecore.SessionProvider.Redis.StackExchangeClientConnection.RetryForScriptNotFound(Func`1 redisOperation)
   at Sitecore.SessionProvider.Redis.StackExchangeClientConnection.RetryLogic(Func`1 redisOperation)
   at Sitecore.SessionProvider.Redis.StackExchangeClientConnection.Eval(String script, String[] keyArgs, Object[] valueArgs)
   at Sitecore.SessionProvider.Redis.RedisConnectionWrapper.TryTakeWriteLockAndGetData(String sessionId, DateTime lockTime, Object& lockId, ISessionStateItemCollection& data, Int32& sessionTimeout)
   at Sitecore.SessionProvider.Redis.RedisSessionStateProvider.GetItemFromSessionStore(Boolean isWriteLockRequired, HttpContext context, String id, Boolean& locked, TimeSpan& lockAge, Object& lockId, SessionStateActions& actions)

The only way I was able to get this to work was by not putting the access key in the provider but instead specifying it in the connection string (in ConnectionStrings.config):

<add name="sharedSession" connectionString="redisname.redis.cache.windows.net:6380,password=rediskey,ssl=True,abortConnect=False" />

Sitecore is now able to connect to Redis and all errors are gone from the log. The lines below from the log file show the successful connection:

INFO  redisname.redis.cache.windows.net:6380,password=rediskey,ssl=True,abortConnect=False
INFO  Connecting redisname.redis.cache.windows.net:6380/Interactive...
....
INFO  Connect complete: redisname.redis.cache.windows.net:6380

Include files in TDS package that are not in Visual Studio solution

A Sitecore solution is often deployed by installing packages that are built by TDS. A lot of information can be found about the different options for including files in a TDS package, as long as the files are included in the Visual Studio solution. However, there are many valid reasons to exclude files from a solution, for example CSS or JavaScript files which are built by gulp or webpack. In these scenarios it is often better to keep these files outside Visual Studio, so developers update the source files and do not accidentally update the build artifacts.

TDS File Replacements

File replacements are an often overlooked feature of TDS, but they are powerful for including files built by external tools which you do not want to include in Visual Studio or source control. The screenshot below shows how to include all files that are in the /dist/scripts/mysite folder. Notice that both the source and target location are relative paths. This ensures the source and target location always point to the correct path, even if the solution gets built in a different path, for example on the build server.

File replacements