Marc Lognoul's IT Infrastructure Blog

Cloudy with a Chance of On-Prem


SharePoint 2013: AppFabric Caching service crashed. Lease with external store expired

SharePoint 2013

Description of the Problem

Do you experience poor performance when browsing SharePoint 2013-based sites or when consuming the User Profile Service? Take a look at the event viewer on your SharePoint servers; it might be full of the error below:

AppFabric Caching service crashed.{Lease with external store expired:
Microsoft.Fabric.Federation.ExternalRingStateStoreException: Lease already expired
at Microsoft.Fabric.Data.ExternalStoreAuthority.UpdateNode(NodeInfo nodeInfo, TimeSpan timeout)
at Microsoft.Fabric.Federation.SiteNode.PerformExternalRingStateStoreOperations(Boolean& canFormRing, Boolean isInsert, Boolean isJoining)}

Like many distributed services and applications, AppFabric (standalone or packaged with SharePoint 2013) depends heavily on accurate time synchronization between servers. A discrepancy of a few seconds to a minute may therefore cause the AppFabric service to crash repeatedly.

Possible Causes

To name but a few possible causes of time discrepancies between Windows hosts:

  • Incorrect Windows Time configuration on Domain Controllers and/or member servers
  • Network connectivity issues between member servers and their authenticating Domain Controllers
  • External mechanisms interfering with Windows Time such as VMware, Hyper-V, OS deployment solutions…

Solution (or at least, some guidance)

  1. Fix the external causes preventing correct time sync (network issues, 3rd party software…). Be particularly careful with virtual machines.
  2. Make sure the Domain Controller holding the PDC Emulator role is configured to acquire its time from an authoritative source AND member servers are configured to use the Windows domain hierarchy (NOT an authoritative time source). This configuration suits 99% of implementations.
  3. Once 1 and 2 are fixed, run the command w32tm /resync /rediscover or restart the Windows Time service (w32time). In both cases this will force a time sync.
  4. Make sure the time is also valid on the SQL Server used for SharePoint, because some stored procedures require time accuracy to work properly as well.
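The steps above can be sketched with the standard w32tm switches. This is a minimal sketch, not a definitive procedure: time.example.org is a placeholder for your own authoritative source, and which command applies depends on the role of the server you run it on.

```powershell
# Run from an elevated prompt.

# On any server: show the current time source and the sync status
w32tm /query /source
w32tm /query /status

# On member servers (SharePoint and SQL Server hosts included): follow the domain hierarchy
w32tm /config /syncfromflags:domhier /update

# On the PDC Emulator only: use an external source.
# time.example.org is a placeholder, replace it with your authoritative source.
w32tm /config /manualpeerlist:time.example.org /syncfromflags:manual /reliable:yes /update

# Step 3: force a resynchronization
w32tm /resync /rediscover
```

Running w32tm /query /status again afterwards should show the expected source and a recent last successful sync time.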

Other SharePoint-Related Service Impacted by Windows Time

  • All timer jobs, but in particular the one responsible for refreshing the configuration. Improper time sync may lead to a stale timer job cache
  • The same applies to timer jobs responsible for (un)deploying solutions (WSP) in a multi-server farm. Out-of-sync servers may prevent proper WSP handling
  • Incorrect time may also prevent SharePoint Alerts from being sent
  • Custom timer jobs might not be started at the correct time
  • And finally, if you use Kerberos, the “clock skew” issue is one of the most common, preventing pre-authentication

Goodies

Here’s a PowerShell snippet to query every SharePoint server in a farm in order to retrieve their local time and time zone.

# Run from the SharePoint Management Shell: lists local time and time zone offset per server
$SPServerInFarm = Get-SPServer | Where-Object { $_.Role -eq "Application" }

foreach ($Server in $SPServerInFarm)
{
    # A single WMI query returns both the local time and the time zone offset
    $OS = Get-WmiObject -ComputerName $Server.Name -Query "select LocalDateTime, CurrentTimeZone from Win32_OperatingSystem"

    # LocalDateTime is a WMI (DMTF) string, convert it to a .NET DateTime before formatting
    $DateTimeFormatted = $OS.ConvertToDateTime($OS.LocalDateTime).ToString("MM/dd/yyyy HH:mm:ss")

    # CurrentTimeZone is the offset from UTC, in minutes
    $TimeZoneOffset = $OS.CurrentTimeZone.ToString()

    Write-Host $Server.Name $DateTimeFormatted $TimeZoneOffset
}

Note: since servers are not queried at exactly the same time, there might be a small time difference. This is obviously harmless in this case.

Note2: While a difference in the time zone will not influence AppFabric, it is not recommended to operate a SharePoint farm on servers operating across different time zones.
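Once the times are collected, comparing them against a reference makes out-of-sync servers easier to spot. The function below is a hypothetical helper, not part of SharePoint or AppFabric: the name Get-ClockSkew, the server names and the 5-second tolerance are all assumptions chosen for illustration.

```powershell
# Hypothetical helper: reports how far each server's clock is from a reference time.
function Get-ClockSkew {
    param(
        [hashtable]$ServerTimes,       # server name -> [datetime], all in the same time zone
        [datetime]$Reference,          # e.g. the time reported by the PDC Emulator
        [int]$ToleranceSeconds = 5     # assumed tolerance, not an AppFabric constant
    )
    foreach ($name in ($ServerTimes.Keys | Sort-Object)) {
        $skew = ($ServerTimes[$name] - $Reference).TotalSeconds
        [pscustomobject]@{
            Server      = $name
            SkewSeconds = [math]::Round($skew, 1)
            OutOfSync   = [math]::Abs($skew) -gt $ToleranceSeconds
        }
    }
}

# Example with made-up values
$reference = [datetime]'2013-05-01T10:00:00'
$times = @{
    'SP-WFE01' = $reference.AddSeconds(1)
    'SP-APP01' = $reference.AddSeconds(-42)
}
$report = @(Get-ClockSkew -ServerTimes $times -Reference $reference)
$report | Format-Table -AutoSize
```

As noted above, a second or two of apparent skew can simply come from the servers being queried one after the other.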

Additional Information

Conclusions

SharePoint requires a healthy underlying Windows to run smoothly. Keep your Windows Server and AD in good shape and, if you’re not in charge of them, make sure the colleague who is does the job right.

Happy caching!

Marc


SharePoint: PAL from the Field

Introduction

Performance Analysis of Logs (PAL) is a free tool designed to analyze Windows Perfmon-based logs against predefined thresholds. The thresholds are defined in configuration files, usually mapped to an MS technology (.Net, IIS…) or product (SQL Server, SharePoint…). It produces reports in HTML or XML format, the former also including eye-catching charts.

In a nutshell, PAL almost completely removes the hassle of reading and interpreting performance logs.

However, making sense of PAL reports in real life may also require time for experimenting and, unfortunately, very little guidance can be found on the web. Therefore I wanted to close the gap a little.

This post assumes you are at least minimally familiar with PAL. If that is not the case, many other blogs detail the installation and usage basics. The CodePlex project also includes a useful introduction.

What to Expect from PAL

PAL is the perfect tool when you investigate mostly infrastructure-related performance problems impacting Microsoft products and technologies.

It helps translate Perfmon logs into human-readable reports, with added value brought by charts, recommended thresholds and generic guidance. A report is roughly made of 2 sections: chronologically ordered alerts, and statistical figures enhanced with their matching charts.

In my opinion, PAL is not designed to help you with trending or with building up your capacity planning in the long run. For this purpose, a product such as SCOM should be preferred. Likewise, PAL should not be used as a performance monitoring tool. Finally, PAL will not help you drill down into the code and will not cover end-to-end performance monitoring or troubleshooting. For this purpose, a real APM or tracing tool should be preferred.

Prerequisites

Make sure your performance counters are healthy. I can’t remember the number of times I had to fix broken counters before anything else could take place.

Practice a little with Perfmon captures and PAL in a test environment. It seems obvious, but many organizations I worked for started directly in their production environment with a full counter set, a high capture frequency and abnormally long capture periods. This leads to time lost generating reports and to a lot of frustration and confusion, since the reports contain too much information to actually be helpful.

Decide if you will generate PAL reports on a computer dedicated to this purpose or if you prefer to do it on the monitored server during off-peak hours. Keep in mind that while capturing counters has very little to no effect on performance, performing a PAL analysis is extremely CPU and disk I/O intensive.

Although PAL does it for you, make sure you understand what each counter really means and what it means in your own environment. Avg. Disk Queue Length/Current Disk Queue Length is a good example of a misleading/misinterpreted counter.

Correctly identify your environment: which processes are running (at least, the ones that matter), what the physical/logical disks are and their purpose, how the memory is sized (physical and virtual) and of course the CPU characteristics.

In Perfmon/Perflogs, preferably identify processes by their PID instead of their instance ID. This is particularly useful with SharePoint and IIS, where you can have multiple IIS Worker Processes (W3WP.exe) running, even in the most basic implementations.
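One way to get PIDs into the Process counter instance names is the ProcessNameFormat registry value documented by Microsoft for the PerfProc service. The snippet below is a minimal sketch, assuming you are allowed to change this setting on the monitored server; test it outside production first, and note that Perfmon may need to be restarted before the new instance names appear.

```powershell
# Run elevated on the server being captured.
$key = 'HKLM:\SYSTEM\CurrentControlSet\Services\PerfProc\Performance'

# 1 = default instance naming (w3wp, w3wp#1, ...); 2 = append the PID (w3wp_1234)
New-ItemProperty -Path $key -Name 'ProcessNameFormat' -PropertyType DWord -Value 2 -Force | Out-Null

# Read the value back to confirm the change
Get-ItemProperty -Path $key -Name 'ProcessNameFormat'
```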

While some SharePoint counters refer directly to SharePoint applications, others don’t. Therefore, it is always useful to have scripts at hand that map worker processes to their application pools.

On Server 2003/IIS6, using a command prompt:

cd %windir%\System32
cscript.exe iisapp.vbs

From Server 2008/IIS7, using a command prompt:

cd %windir%\System32\inetsrv
appcmd list wp

Using PowerShell:

gwmi win32_process -filter "name='w3wp.exe'" | Select ProcessId, CommandLine

Be watchful with process IDs: they may change during the capture, since when a process crashes, a new one with its own ID is usually started. The same happens to a worker process when it recycles.

Also take time to benchmark PAL:

  • Estimate the storage used by captures
  • Estimate the time taken by PAL to produce reports
  • Estimate the storage used by PAL reports

A 2-hour capture using the default SharePoint 2010 template will generate a BLG file of 30 to 50 MB and take about 10 minutes to process. With longer captures, these numbers quickly start to add up.

Some counters (like the ones related to processes and SharePoint’s publishing cache) can inflate the size and the time needed to generate reports, because they are multiplied by the number of running processes or existing Site Collections.

And finally, download and install PAL on the computer you selected for this purpose. Remember, PAL is only used to generate reports, not to capture logs or read reports. Therefore there is no strict requirement to install it on every SharePoint server.
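The figures above can be turned into a rough planning estimate. The sketch below simply extrapolates linearly from the 2-hour baseline; linear scaling and the InstanceFactor multiplier are assumptions made for illustration, and Get-CaptureEstimate is a hypothetical name. Replace the baseline with your own benchmark as soon as you have one.

```powershell
# Hypothetical estimator based on the baseline quoted above:
# 2 hours of capture -> 30 to 50 MB of BLG and about 10 minutes of PAL processing.
function Get-CaptureEstimate {
    param(
        [double]$Hours,
        [double]$InstanceFactor = 1.0   # assumed multiplier for extra processes or Site Collections
    )
    $scale = ($Hours / 2.0) * $InstanceFactor
    [pscustomobject]@{
        MinSizeMB         = 30 * $scale
        MaxSizeMB         = 50 * $scale
        ProcessingMinutes = 10 * $scale
    }
}

# Example: a full-day capture
$estimate = Get-CaptureEstimate -Hours 24
$estimate
```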

Planning Performance Captures

To make your life easier, generate the Perfmon configuration files directly from PAL: start PAL, go to the tab Threshold File, select the threshold file corresponding to the workload and finally click on the button Export to Perfmon Template File.

Select the format according to the operating system version captures will be taken from. LOGMAN format is the best choice if your goal is superior automation of the capture process.

Carefully plan the capture period. Usually, the warm-up of ASP.Net/SharePoint applications generates a lot of noise that is not really relevant to your performance troubleshooting; it is therefore preferable to start capturing when your application is already in cruise mode, unless of course the performance problem occurs at compilation time. The same applies to crawl performance troubleshooting: preferably start capturing when the crawl is effectively running, not when it is starting.

Keep the sampling interval between 5 and 15 seconds. Less than 5 does not help because it tends to make things look worse than they actually are (very short CPU peaks or intensive disk I/O…), while more than 15 may make the capture inaccurate because short events fall between samples. In most cases, 15 seconds will do fine.

Keep the format binary (BLG): although not human-readable, it is far more compact and directly usable by PAL. Note: some tools can convert Perfmon logs whenever needed; I will discuss that at a later time.

Finally, if you run a multi-server farm (remote SQL for example), decide if you prefer to put the captures from the various servers into the same log file or if you wish to use separate logs. Remember that in most cases, the footprint of Perfmon is negligible. If you choose per-server captures, make sure you have sufficient control to run them simultaneously.
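For per-server captures, generating the logman command lines from a single place helps keep the collector name, interval and format identical everywhere. The sketch below only builds the command strings; New-LogmanCommand, the server names and the file paths are hypothetical, and it assumes PAL exported a plain counter list file that logman can read with -cf.

```powershell
# Hypothetical helper: builds one "logman create counter" command line per server.
function New-LogmanCommand {
    param(
        [string[]]$Servers,
        [string]$CollectorName = 'PAL_Capture',
        [string]$CounterFile = 'C:\PerfLogs\PAL_counters.txt',   # counter list exported from PAL (assumed path)
        [int]$IntervalSeconds = 15
    )
    if ($IntervalSeconds -lt 5 -or $IntervalSeconds -gt 15) {
        Write-Warning "Interval of $IntervalSeconds seconds is outside the 5 to 15 second range."
    }
    foreach ($server in $Servers) {
        # -s targets a remote computer, so the output path refers to that computer
        $output = "C:\PerfLogs\${server}_${CollectorName}.blg"
        "logman create counter $CollectorName -s $server -cf `"$CounterFile`" -si $IntervalSeconds -f bin -o `"$output`""
    }
}

# Example with made-up server names
$commands = @(New-LogmanCommand -Servers 'SP-WFE01', 'SQL01')
$commands
```

Once the collectors exist, logman start PAL_Capture -s SP-WFE01 (and the same for each server) starts them; scripting those calls back to back keeps the captures close to simultaneous.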

Happy performance troubleshooting!

Marc


Windows Server 2012: Performance Tuning Guidelines

Windows Server 2012

You might have missed the release of the Performance Tuning Guidelines document, recently updated for Windows Server 2012.

As a reminder, the following guides are still available:

Not to mention their perfect companions, which might require an update as well:

Marc


Windows Server: Memory Pressure on Windows Server 2008 R2 File Server due to System Cache

Recent questions in the TechNet Forums reminded me of an issue faced when building large file servers running on Windows Server 2008 R2. By large I mean serving a lot of files, from thousands to millions or more.

To improve performance, Windows Server makes intensive use of the file system cache. With files located on an NTFS-formatted partition, this also means caching the additional information associated with each file or folder, such as attributes, permissions and so on. Since with NTFS everything is a file, this information is stored in the form of metafiles, which are hidden from the user. For each file/folder, the matching metafile entry can have a memory footprint of at least 1 KB. Multiplied by the number of files cached, it starts to add up on larger file servers. Thanks to the Sysinternals RAMMap utility, you can witness this behavior by looking at the line Metafile in the tab Use Counts:

[Screenshot: RAMMap, tab Use Counts, line Metafile]

There is very little you can do to work around this issue except adding more RAM to the server. Since the amount of memory used depends on the size of the files served and on the number of files (metadata), the amount of RAM needed can be calculated fairly easily, although only roughly.
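As a rough illustration of that calculation, the sketch below multiplies the number of files by the 1 KB minimum per metafile entry mentioned above. Get-MetafileFootprintGB is a hypothetical name, and the result is a lower bound for the metadata only: it ignores the cached file content itself.

```powershell
# Hypothetical estimator: lower bound of the memory used by cached metafile entries.
function Get-MetafileFootprintGB {
    param(
        [long]$FileCount,
        [int]$BytesPerEntry = 1KB   # "at least 1K" per file/folder, as stated above
    )
    [math]::Round(($FileCount * $BytesPerEntry) / 1GB, 2)
}

# Example: a file server holding 5 million files
$gb = Get-MetafileFootprintGB -FileCount 5000000
"Metafile entries alone: at least $gb GB"
```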

While you can control the amount of memory used by the file system cache, you can’t prevent the metafiles from being cached.

Finally, a safe way not to get caught by surprise by this behavior once your file server is running in production is to benchmark it beforehand using the File Server Capacity Tool (FSCT).

[UPDATE] While file servers are the most likely to be affected by this issue, web servers serving large numbers of files or workstations used for large development projects might be too…

More Information



Storage: White Paper on NTFS CHKDSK Best Practices and Performance

Windows Server 2008 R2

Microsoft has recently published this very useful white paper providing best practices and guidance on sizing volumes and on CHKDSK execution times: http://www.microsoft.com/downloads/en/details.aspx?FamilyID=35A658CB-5DC7-4C46-B54C-8F3089AC097A

A few facts that caught my attention:

  • The performance gain from Server 2008 to Server 2008 R2 is huge (I can’t imagine compared to Server 2003)
  • Available memory also greatly improves performance
  • Turning off 8.3 name generation has a positive though limited impact on performance
  • Server 2008 R2 shows more linearity in performance than Server 2008
  • Running CHKDSK in read-only mode is very helpful in predicting the duration of a downtime

Definitely a must-read for Windows admins and IT pros in charge of designing file services!

Marc



SharePoint: SharePoint Designer, IIS Compression and Eugene Tooms

You all (should) know that since 1st April, SharePoint Designer is free. Since then, there has also been a lot of buzz around it, as well as interesting blog posts like the one from Mark Rackley, SharePoint Designer – A definite Maybe, echoed by SharePoint Mogul Joel Oleson. While both were very interesting reads, one statement from Mark caught my attention: using SharePoint Designer would break IIS compression. I don’t totally agree with this statement, but the truth is (not only out there but) more complex than it seems.

Before diving directly into the bits and bytes, some background information on how HTTP compression works.

Effective HTTP compression is the combination of two things: a client that sends a request stating that it supports compression, and a server that receives the request and is configured to return a compressed response. The compressed response depends on the compression schemes supported by both the client and the server (there are currently two: Deflate and Gzip) and on the server’s configuration: compress all file extensions, or only some of them. IIS also distinguishes two kinds of file extensions: static (simple files placed on the file system, such as a CSS file, an image or an HTML page) and dynamic (dynamically generated content such as ASP, ASP.Net pages, ASMX…). Additionally, IIS includes more granular settings such as the minimum size for a file to be compressed (there is no gain in compressing very small files).

The important point here is: if the client does not explicitly say it supports compression, the server will never generate a compressed response, even if it was configured to do so!

While all modern browsers do include the magic header informing the server that they support compression, other client types (I mean other usual SharePoint client types) do not.

As a comparison, here is a request from a browser:

And here is one from the WebDav Mini Redirector (aka Explorer view in SharePoint aka Web Folders aka Web Client):

The conclusion is: most of the time, only standard browsers support compression. Clients like the Windows WebDAV client, Office applications (using FPRPC, an MS proprietary extension of WebDAV), Colligo Reader and even the SharePoint indexing process (aka Crawler) never include the necessary header in their requests.
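The “magic header” is Accept-Encoding. The sketch below illustrates the negotiation described above in a simplified form; it is not how IIS is implemented. Get-ResponseEncoding and the sample header values are assumptions made for illustration, and q-values in the header are deliberately ignored.

```powershell
# Simplified illustration: which encoding would a server pick for a given request?
function Get-ResponseEncoding {
    param(
        [hashtable]$RequestHeaders,
        [string[]]$ServerSchemes = @('gzip', 'deflate')   # schemes enabled on the server, in order of preference
    )
    $accept = $RequestHeaders['Accept-Encoding']
    if ([string]::IsNullOrEmpty($accept)) {
        # No header: the client never asked for compression, so none is applied
        return 'identity'
    }
    # Keep only the scheme names, dropping any ";q=..." parameter
    $offered = @($accept -split ',' | ForEach-Object { ($_ -split ';')[0].Trim().ToLower() })
    foreach ($scheme in $ServerSchemes) {
        if ($offered -contains $scheme) { return $scheme }
    }
    return 'identity'
}

# A browser typically advertises both schemes; the WebDAV-style request sends no such header
$browser = @{ 'User-Agent' = 'Mozilla/4.0'; 'Accept-Encoding' = 'gzip, deflate' }
$webDav  = @{ 'User-Agent' = 'Microsoft-WebDAV-MiniRedir' }

$browserResult = Get-ResponseEncoding -RequestHeaders $browser
$webDavResult  = Get-ResponseEncoding -RequestHeaders $webDav
"Browser: $browserResult / WebDAV client: $webDavResult"
```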

The extra complexity brought by SharePoint: is the resource on the file system or in the content database?

With SharePoint, the resources requested by clients may be physically stored in two different locations: the file system (usually under the 12 Hive) and the content database(s). When they reside in the content databases, SharePoint uses its internal API to fetch them and present them to the client. In this case, the questions are 1) are those files compressible or not, and 2) how are they considered: as static or dynamic files (even if their extension is that of a static file, like html for example).

Subsequently, a very good remark from MVP Eric Shupps on Mark Rackley’s blog was: when you customize a page using SharePoint Designer, it is no longer stored in the file system but in the CDB instead, and this makes it impossible to compress.

I am a bit surprised by this statement because for me, compression is an IIS function, not a SharePoint function. Therefore, as long as IIS receives something to compress, it does not care and does the job. The truth (which is still out there) is different: none of us was right, the reality is more complex than that!

Getting the resources stored in content database compressed

For extensions considered as “dynamic”, it is pretty simple: configure them as usual in IIS and it will work.

For extensions considered as “static”, there is more fun. Let’s take the example of a PDF file stored in a CDB.

Out of the box, it will not get compressed. If you configure it as a “static” file, it won’t compress either.

How to get it compressed then? First, you need to configure it as a “dynamic” extension (yeah, it sounds strange, I know) in both compression providers (Deflate and GZip).

Then, you need to enable Blob Caching for the web application where the content is stored. To do this, edit the BlobCache element in the matching Web.config.

Finally, perform an IISRESET and attempt to download the PDF file. The PDF will get blob-cached, then compressed.

Configuration Summary Table for IIS Compression to Function

Content Type                    Content Location    Compress Static Ext. (IIS)    Compress Dynamic Ext. (IIS)    ASP.Net Blob Cache
Static (html, css, doc, pdf…)   File System         Yes                           No                             No
Dynamic (aspx…)                 File System         No                            Yes                            No
Static (html, css, doc, pdf…)   Content Database    No                            Yes                            Yes
Dynamic (aspx…)                 Content Database    No                            Yes                            No
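The summary table boils down to two questions: is the extension static, and does the content live in the content database? The sketch below encodes the four rows as a small lookup; Get-CompressionSetup and its property names are hypothetical and only restate the table, they do not configure anything.

```powershell
# Hypothetical lookup that restates the summary table above.
function Get-CompressionSetup {
    param(
        [ValidateSet('Static', 'Dynamic')][string]$ContentType,
        [ValidateSet('FileSystem', 'ContentDatabase')][string]$Location
    )
    $isStatic   = $ContentType -eq 'Static'
    $inDatabase = $Location -eq 'ContentDatabase'
    [pscustomobject]@{
        # Only static files served from the file system use the static extension list
        CompressAsStaticExtension  = $isStatic -and (-not $inDatabase)
        # Everything else must be declared as a dynamic extension
        CompressAsDynamicExtension = (-not $isStatic) -or $inDatabase
        # Blob caching is only needed for static content stored in the content database
        EnableBlobCache            = $isStatic -and $inDatabase
    }
}

# Example: the PDF stored in a content database
$pdfInDatabase = Get-CompressionSetup -ContentType Static -Location ContentDatabase
$pdfInDatabase
```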

Note: there is no use in activating compression for the native MS Office 2007 formats (docx, xlsx, pptx etc…) since they are already natively compressed. See Introducing the Office (2007) Open XML File Formats.

Conclusions

Content edited with SharePoint Designer can be compressed as long as the appropriate configuration is in place, regardless of its location.

Note: all the tests above were made on a standard WSS3 running on IIS6; it may behave differently on IIS7.

But who (the F or H, depending on your level of politeness) is Eugene Tooms then?

Eugene Victor Tooms is a fictional character appearing in two episodes of the TV series “The X-Files”. Tooms is able to squeeze and elongate his body in ways that are impossible for a normal human. This makes him similar to IIS in some ways 🙂

And Cut!