Marc Lognoul's IT Infrastructure Blog

Cloudy with a Chance of On-Prem


Windows: High CPU Utilization due to Access-Based Enumeration

Good evening,

On Windows Server 2008 R2 file servers with Access-Based Enumeration (ABE) enabled, you might notice abnormally high CPU usage when many users are opening sessions or browsing through shared folders (and sub-folders) at the same time. This is caused by ABE evaluating permissions to determine which folders the active user(s) are actually granted access to. It may become problematic when the underlying folder structure involves many sub-folders.

In an attempt to improve the situation (it is not a real performance fix), Microsoft has released a fix enabling more granular control over the number of folder levels ABE will take into account when processing a request. This fix, and the details about the extra control it brings, are available for download at the following location:

Installing the fix requires a reboot. You will also have to restart the LanmanServer service for the ABELevel parameter to be taken into account if you set it to a value higher than 0 (the default when omitted). Note: you can actually set a value higher than 2.
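As a minimal sketch, assuming the ABELevel value lives under the LanmanServer Parameters key as described in the hotfix article (check the article for the exact location and supported values on your build), setting it and restarting the service could look like this:

REM Limit ABE processing to the first two folder levels (value location assumed from the hotfix article)
reg add "HKLM\SYSTEM\CurrentControlSet\Services\LanmanServer\Parameters" /v ABELevel /t REG_DWORD /d 2 /f
REM Restart the Server service so the new value is taken into account
net stop lanmanserver /y && net start lanmanserver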

As a reminder, the article hereunder will help you keep your Windows Server 2008 R2 file server up to date with hotfixes:

Marc



Windows Server: Memory Pressure on Windows Server 2008 R2 File Server due to System Cache

Recent questions in the TechNet forums reminded me of an issue I faced when building large file servers running on Windows Server 2008 R2. By large, I mean serving a lot of files, from thousands to millions or more.

To improve performance, Windows Server makes intensive use of the file system cache. With files located on an NTFS-formatted partition, this also means caching the additional information associated with each file or folder, such as attributes, permissions and so on. Since with NTFS everything is a file, this information is stored in the form of metafiles, which are hidden from the user. For each file/folder, the matching metafile entry can have a memory footprint of at least 1 KB. Multiplied by the number of files cached, it adds up on larger file servers. Thanks to Sysinternals’ RAMMap utility, you can witness this behavior by looking at the Metafile line on the Use Counts tab:

[Screenshot: RAMMap’s Use Counts tab, showing the Metafile line]

There is very little you can do to work around this issue except adding more RAM to the server. Since the amount of memory used depends on the size of the files served and their number (metadata), the amount of RAM needed can be calculated relatively easily, although only roughly. For example, 5 million cached files at roughly 1 KB of metafile data each already account for about 5 GB of RAM.

While you can control the amount of memory used by the file system cache, you can’t prevent the metafiles from being cached.

Finally, a safe way to avoid being caught by surprise by this behavior once your file server is running in production is to benchmark it beforehand using the File Server Capacity Tool (FSCT).

[UPDATE] While file servers are the most likely to be affected by this issue, web servers serving large amounts of files or workstations used for large development projects might be too…

More Information


Windows Server: Multiple Names for a File and Print Server Running Windows Server 2008 R2

This post was inspired by this one by Jose Barreto (MSFT).

When migrating a server to new hardware or a newer Windows Server version, or restructuring your file and print services infrastructure, performing the switch with limited user impact is a primary goal. This is most often achieved by re-using the server’s original name so that users point to the new server. This post discusses the two main ways of doing this with Windows Server 2008 R2 and their implications.

1) (Re)Configuring Name Resolution Mechanism(s)

This method is the simplest: all you have to do is (depending upon your infrastructure):

  • Adding a DNS CNAME record that points your old server name to the A record of the new one (see the sketch after this list)
  • Optionally, if you use WINS too, adding a static entry for the old server’s name pointing to the new server’s IP address
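As a minimal sketch, assuming an Active Directory-integrated zone named mydom.local and the example server names used throughout this post, the CNAME record could be created like this:

REM Point the old name at the new server's A record (zone and names are this post's examples)
dnscmd /recordadd mydom.local OLDSERVER CNAME NEWSERVER.mydom.local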

While this is pretty straightforward, the name change will not be taken into account by the following underlying Windows mechanisms:

Authentication

Kerberos authentication is based solely on names (server names, service names and so on), and the Kerberos name database (Active Directory in the Windows world) is not informed that the old server’s name is now associated with the new server. On 99% of systems, the problem will not surface because Windows will silently fall back to NTLM. Anyway, if you want to maintain a correct Kerberos configuration, you will have to register the appropriate Service Principal Names (SPNs): using tools such as SETSPN or simply the ADUC console (thanks to the Attribute Editor), the following SPNs must be added to the server’s computer account MYDOM\NEWSERVER$ (an example follows the list):

  • CIFS/OLDSERVER
  • CIFS/OLDSERVER.mydom.local
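As a minimal sketch with SETSPN, using this post’s example names, the registration could look like this:

REM Register the old name's CIFS SPNs on the new server's computer account
setspn -A CIFS/OLDSERVER NEWSERVER
setspn -A CIFS/OLDSERVER.mydom.local NEWSERVER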

While you may think falling back to NTLM is not a big issue, take into account that NTLM is set to disappear from Windows’ future: Windows 7 and Windows Server 2008 R2 both come with security options preventing its use (off by default, though).

CIFS/SMB Strict Name Checking

The protocol responsible for file & print sharing (CIFS or SMB) includes a security mechanism that will, by default, refuse to serve requests if the target server name is not the server’s actual one. To disable this feature, you’ll have to modify the LanManServer service configuration; see http://support.microsoft.com/kb/281308 for the details.
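As a sketch of what the KB article describes (value name taken from KB281308; double-check it against the article before applying), disabling strict name checking boils down to one registry value plus a restart of the Server service:

REM Disable strict name checking, per KB281308 (requires restarting the Server service)
reg add "HKLM\SYSTEM\CurrentControlSet\Services\LanmanServer\Parameters" /v DisableStrictNameChecking /t REG_DWORD /d 1 /f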

Finally, you may also be affected by human mistakes, such as DNS or WINS admins regularly cleaning up records and considering the CNAME or static entries as stale… Only good configuration management can prevent this.

2) Configuring Alternate Names Using the NETDOM Command

The NETDOM command offers a “Swiss army knife” solution to all the issues above while keeping security as high as it can be. Example:

NETDOM COMPUTERNAME NEWSERVERNAME /ADD:OLDSERVERNAME.mydom.local

Then reboot the system. See reasons below.

This simple command will perform the following, all at once:

Configuring the local computer to register its new alternate name(s) in DNS and/or WINS

This configuration is stored locally in the registry under HKLM\SYSTEM\CurrentControlSet\Services\Dnscache\Parameters, with the entry “AlternateComputerNames”, not present by default. See http://technet.microsoft.com/en-us/library/dd197418(WS.10).aspx for details.

This entry seems to appear only after the system has rebooted.

Registering the necessary Service Principal Names in the computer’s AD account.

In this case, all services actually running on the server will be registered with the alternate name. There is therefore no room for granularity, as there is with manual registration!

Of course, performing this action implies that the user account executing the command is granted the AD validated write on Service Principal Names, which is the case for a domain admin, for example. See http://technet.microsoft.com/en-us/library/cc728117(WS.10).aspx#BKMK_ValidatedWrites for details.

Reconfiguring the LanManServer service to support its additional name(s)

NOT by disabling strict name checking, but by making use of a new feature named “Optional Names”. See its registry entry “OptionalNames”, right under the HKLM\SYSTEM\CurrentControlSet\Services\LanmanServer\Parameters key, not present by default.

This entry seems to appear only after the system has rebooted.
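As a quick way to verify both entries once the box is back up (a minimal sketch; both values are absent until NETDOM has done its work and the system has rebooted):

REM Inspect the values NETDOM created
reg query "HKLM\SYSTEM\CurrentControlSet\Services\Dnscache\Parameters" /v AlternateComputerNames
reg query "HKLM\SYSTEM\CurrentControlSet\Services\LanmanServer\Parameters" /v OptionalNames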

About Scalability

To my knowledge, the limits on the number of alternate names are those of the registry’s REG_MULTI_SZ data type and of AD’s servicePrincipalName attribute (1024 entries per computer object, AFAIK), which leaves plenty of room for multiple consolidations or migrations.

Retrieving the Current Configuration

To list all alternate names, simply use this command:

NETDOM COMPUTERNAME NEWSERVERNAME /ENUM

Note: the command will list the computer’s primary name as well.
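Should you later need to retire an alternate name, NETDOM offers the matching switch (a sketch using this post’s example names; check the NETDOM documentation for the exact syntax on your build):

NETDOM COMPUTERNAME NEWSERVERNAME /REMOVE:OLDSERVERNAME.mydom.local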

Compatibility

The NETDOM method also works if the computer runs Windows 7.

While it also works if the server is a domain controller, it will not if you want to perform this operation on a cluster resource group. In that case, you will have to use the cluster-specific method (cluster admin console, configuration of resources, SPN and DNS registration…).

Finally, since Printer Sharing uses the same protocol, it works on print servers too.

Conclusions

Although the NETDOM method has two disadvantages: 1) it seems to require a reboot, and 2) it registers SPNs for all hosted services, which may sometimes be too much and appear to be a waste at the AD attribute level, it clearly wins over manual name registration and its subsequent manual reconfigurations.

Marc



Windows Server 2008 R2: CIFS/SMB 2.x in Details

While Windows Server 2008 and Vista introduced version 2.0 of the Server Message Block protocol (aka File and Printer Sharing, in humanly readable words), Windows Server 2008 R2 and Windows 7 both bring a refreshed version, 2.1. Instead of drilling down right here into the protocol’s details, I found it more useful to post links to the most interesting resources on the Web.

Protocol Specification and Details

Tuning and Optimization

Support and Troubleshooting

Extra Goodies: Multi-threaded Robocopy and GUI

Although not directly related to SMB 2.1, Robocopy was also updated with the recent versions of Windows. The main improvement is the support for multi-threaded operations, particularly interesting when massive file copy operations must take place against small files over WAN connections. I will cover this in detail in a coming post. See http://technet.microsoft.com/en-us/magazine/dd542631.aspx for details. It is important to note that this option is NOT compatible with the inter-packet gap option (copy using throttling).
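As a minimal sketch (source and destination paths are placeholders), a multi-threaded copy could look like this:

REM Copy a share with 32 threads, retrying failed files once (/MT cannot be combined with /IPG)
robocopy \\OLDSERVER\share \\NEWSERVER\share /E /COPYALL /MT:32 /R:1 /W:1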

Not so new: for those reluctant to learn and exploit all of Robocopy’s command-line parameters, there are cool GUIs available out there, RichCopy being, as its name says, the richest one.

Marc



Windows Server 2008 R2 and Windows 7: System Services Configuration Details

I am usually reluctant to mirror other bloggers’ content or just post links, but in this case I decided to make an exceptional exception. “Black Viper” (sounds like a gamer tag, doesn’t it?) published a comprehensive guide to Windows Server 2008 R2 and Windows 7 system service configuration covering all OS flavors. Have a look at it prior to starting any optimization or hardening work!

Previous Windows versions are also available, BTW.

From Hervé Schauer Consulting, you might also find this set of information useful (a blast from the past; read “up to XP/Server 2003”):

Marc



Windows Server: Comprehensive Guidance for Implementing an End-User Data Centralization Solution with Windows Server 2008 R2

Microsoft finally released a truly great document late October: Implementing an End-User Data Centralization Solution. This document covers, in a very comprehensive yet practical manner, everything related to storing, protecting and making accessible user data using Windows file sharing.

Unlike many white papers out there, this one can be directly used for planning, building, implementing and operating with no room for improvisation. It comes with real-life metrics, group policy settings as well as scripts and tools.

It is important to mention that it is also 2008 R2/Windows 7-ready and covers the use of FSCT.

Download

Marc


Windows Server: Tools for Windows Server 2008 R2 You can’t afford to miss: FSCT (Overview)

Since I do not like one-liner posts, I recently started writing an (ambitious?) series of posts covering enterprise file services built on Windows Server 2008 R2, based on a presentation I gave to some customers of mine. The first “shot” is dedicated to a new Swiss-army-knife-like tool from Microsoft: FSCT, standing for File Server Capacity Tool.

Until now, validating a Windows file server setup has always been a difficult task, since very few tools were available on the market to adequately simulate a realistic user load. In the past, tools like NetBench were considered references. Nowadays, if you’re lucky, you can rely on your own scripting toolkit; if you’re not, you may have to use Intel’s NAS Performance Toolkit, which is not bad but far from being “enterprise-ready”. I’ve even seen some people trying to benchmark file services using SQLIO…

Architecturally speaking, FSCT is similar to other load-testing tools you would use for a web application, for example: it is made of a controller, a server (the one to be benchmarked) and one or multiple clients. Optionally, you can also include an AD domain controller in the picture in order to simulate AD-based authentication. Nevertheless, FSCT is also compatible with workgroup environments, albeit in a degraded manner.

Note: combining roles is a possibility but, as you would expect, it may negatively affect the tests. So if you’re short on machines, combine wisely and keep the other roles off the “server” role.

On the other hand, you can conduct test campaigns from more than one client simultaneously; that’s where the architectural choice pays off: to my knowledge, no other tool can do that today.

Plan and deploy your test environment carefully…

  • First, take the time to read the whole paper included in the package carefully. Everything you need to know about the tool is in there
  • Practice before conducting the “real” tests: since the tool is command-line based and due to the way it is distributed among systems, you may not get the result you expect from your first try (it’s not point-and-click)
  • Make sure all components involved are healthy: server & clients (network configuration, drivers…), but also network components (switches and, if applicable, routers or access points…). A single component working improperly may severely affect the result of the tests (argh, hard-coded duplexing/link speed)
  • Copy FSCT to all systems involved (unless the server is not Windows-based) and build your own batch files to speed up the configuration, the execution and finally the cleanup
  • Unless you want to reach the limits of an HP ProLiant DL58x, do not bump the client + user count to the maximum; plan them realistically. And in any case, it is not advisable to conduct your tests in production, especially for the network in general as well as for the sanity of the AD you would populate users in…

Don’t be afraid of command-line based execution

Okay, there is no UI, but who cares? Me? Yes, I’ve made my own little WinForms app to save me time (I’ll post it in the coming weeks) but frankly, once your config files and batches are ready (it takes 2 hours max), the command line rules supreme over the mouse ;)

And before you ask: no, there is no PowerShell support, which sounds a little old-fashioned, I admit.

Plan multiple test scenarios, keeping in mind important factors such as:

  • CIFS/SMB version: depending on the client and server OS version and configuration, the usage of SMB2 will greatly improve performance under virtually any circumstances. If you plan to use pre-Vista client OSes or have a mix of them, take this into account in your scenarios
  • SMB-related security settings, like signing and so on, also affect performance (see the sketch after this list)
  • Other security configuration, like TCP/IP stack hardening or IPsec
  • The presence of a file-based anti-virus: it is wise to test with and without. You might be surprised by the performance loss an A-V implies, particularly on heavily used servers of course. BTW, since most (if not all) A-V products are implemented as file system filter drivers, do not simply disable them during the tests; uninstall them, for certainty
  • Take into account the other side activities, particularly server-side ones like backups (using shadow copies or third-party solutions), monitoring or other background processing tasks that may affect the tests (reporting and so on…)
  • So-called “performance-boost” tweaks like cache manager and NTFS tweaks, disk alignment, cluster sizes… All in all, they may greatly affect the results. BTW, I will dedicate another post to those tweaks and debunk some myths at the same time as well
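As a minimal sketch for the signing factor (value names as used by the Server service; verify them against your build), you can record the server-side SMB signing configuration alongside each test run:

REM Check whether the server requires or merely enables SMB signing
reg query "HKLM\SYSTEM\CurrentControlSet\Services\LanmanServer\Parameters" /v RequireSecuritySignature
reg query "HKLM\SYSTEM\CurrentControlSet\Services\LanmanServer\Parameters" /v EnableSecuritySignature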

What do you get from the tests?

Besides generating the load itself, FSCT, assuming you’re working in a standard setup, will provide detailed test results containing the following useful information retrieved from the server and client(s):

Data collected from the following performance counters:

  • \Processor(_Total)\% Processor Time
  • \PhysicalDisk(_Total)\Disk Write Bytes/sec
  • \PhysicalDisk(_Total)\Disk Read Bytes/sec
  • \Memory\Available MBytes
  • \Processor(_Total)\% Privileged Time
  • \Processor(_Total)\% User Time
  • \System\Context Switches/sec
  • \System\System Calls/sec
  • \PhysicalDisk(_Total)\Avg. Disk Queue Length
  • \TCPv4\Segments Retransmitted/sec
  • \PhysicalDisk(_Total)\Avg. Disk Bytes/Read
  • \PhysicalDisk(_Total)\Avg. Disk Bytes/Write
  • \PhysicalDisk(_Total)\Disk Reads/sec
  • \PhysicalDisk(_Total)\Disk Writes/sec
  • \PhysicalDisk(_Total)\Avg. Disk sec/Read
  • \PhysicalDisk(_Total)\Avg. Disk sec/Write

As well as the following metrics (correlated to the number of users simulated):

  • % Overload
  • Throughput
  • # Errors
  • % Errors
  • Duration in ms

Once % Overload is higher than 0%, it indicates the threshold above which your file server infrastructure no longer scales for the given number of users.

Special Cases

Using FSCT against DFS-N

Can FSCT work against DFS-N? Yes, it does. But it will not allow you to stress-test the DFS part of your design, since it has no knowledge of it and does not embed any technology to capture DFS’ behavior during the load test. It may also require configuring the “server” part as if it were a non-Microsoft file server (see below for details). Moreover, capturing performance counters on the server using FSCT itself might be an issue; the workaround is the good old “manual” capture using perfmon or logman.
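As a minimal sketch of such a manual capture with logman (the counter set, sample interval and output path below are assumptions; pick the counters from the list earlier in this post):

REM Create, start and stop a counter collection around the test run
logman create counter FSCT_Capture -c "\Processor(_Total)\% Processor Time" "\PhysicalDisk(_Total)\Avg. Disk Queue Length" -si 5 -o C:\PerfLogs\FSCT_Capture
logman start FSCT_Capture
REM ... run the FSCT test ...
logman stop FSCT_Capture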

Using FSCT against a failover cluster

Using FSCT against a failover cluster works perfectly, but with one limitation identical to the above: the tool will not be able to collect performance counters directly. Instead, you will have to plan for manual capture on the node designated as owner of the file share resource, or on both nodes if you wish to perform failovers during the tests.

Using FSCT against non-Microsoft File Server or NAS

Assuming you can live with the same limitations as stated above, FSCT will work like a charm against non-MS file servers, including SOHO devices. Depending on the server’s or device’s capabilities, you might be able to collect a reduced set of performance indicators, using SNMP polling for example. Of course, FSCT itself does not include any SNMP support, but there are plenty of tools available, and during the tests I led, Cacti was very helpful.

Ready, Go?

Well, not 100% ready yet. With “Home Folders” being the only workload scenario available at RTM time, you might not be able to validate your setup realistically. But according to MS, an SDK is on the way in order to allow the creation of custom workloads. In the meantime, you can already start playing with the tool itself and with the customization of the “HomeFolder” profile config file, but you will not get far with that.

Additional Resources

In a coming post I will cover practical usage of FSCT.

Marc