November 16 Update: With more issues reported with the latest Windows 10 update, this post on maintaining best practices while patching is as important as it was last summer.
If you are reading this, there is a good chance you have suffered some fallout from a recent Microsoft (MS) patch release. Or maybe a not-so-recent one? In this post, I discuss how to reduce the risk and impact of a patch’s unintended consequences.
The irony of a patch, designed to protect systems and the data on them, causing system failures isn’t lost on me. Any risk-conscious decision maker is undoubtedly asking whether they should delay patching because it might cause instability in an app or the OS itself. On the other hand, the patch was issued for a reason (specifically, to close security vulnerabilities), and MS recommends applying the patch within 30 days of its release. This raises the question: how often should you patch for security? And, perhaps more importantly, how do you meet the security objective while still maintaining your operation?
That balance, closing the latest vulnerabilities without risking unpredictable BSODs, isn’t a unicorn. You can make some simple process changes that mitigate the chance of system failure while still patching in a reasonable timeframe. These five steps also serve as a strong foundation for a solid Patch Management Strategy:
1) Stagger Your Rollout
Independent of how often you patch, patching all your devices at once is a huge risk; spreading the rollout across different groups of endpoints mitigates it. Note that this doesn’t mean patching by business unit: the risks of “all at once” apply to a single team just as they do to the whole company. Imagine all the Sales Team’s computers going down at the same time.
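As a rough illustration of what staggering can look like in practice, here is a minimal Python sketch that shuffles a fleet of endpoints and deals them into rollout waves round-robin, so no single team lands entirely in one wave. The host names and wave count are hypothetical; your patch-management tooling will have its own grouping mechanism.

```python
import random

def assign_waves(endpoints, num_waves=3, seed=42):
    """Randomly spread endpoints across rollout waves so that no single
    team or business unit is patched all at once."""
    shuffled = list(endpoints)
    random.Random(seed).shuffle(shuffled)
    # Round-robin assignment: wave i gets every num_waves-th endpoint.
    return {i: shuffled[i::num_waves] for i in range(num_waves)}

# Hypothetical fleet: two teams' machines end up mixed across waves,
# so a bad patch in wave 0 never takes down a whole team.
hosts = [f"sales-{n}" for n in range(5)] + [f"eng-{n}" for n in range(5)]
waves = assign_waves(hosts)
```

Patching wave 0, waiting a day or two for issues to surface, and only then proceeding to the next wave is the whole trick: a bad patch costs you a slice of each team, not an entire department.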
2) Have a Rollback Plan
Just as you would with your disaster recovery plan, have a recovery option to revert to in case the patch introduces instabilities (including catastrophic ones like BSODs!). Your program should ensure you can uninstall a patch and return workstations and servers to previous (functional) versions if something happens. Having a plan will not only reduce organizational risk on paper, but also enable you to recover more quickly if there is an issue.
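Knowing exactly what to roll back is half the battle. One simple, hedged sketch: snapshot the installed updates before and after a patch cycle (for example, the KB numbers reported by `Get-HotFix` on Windows) and diff the two lists. The KB numbers below are purely illustrative.

```python
def patch_delta(before, after):
    """Compare installed-update snapshots taken before and after a patch
    cycle, so you know exactly which updates to uninstall on rollback."""
    before, after = set(before), set(after)
    return {
        "installed": sorted(after - before),  # rollback candidates
        "removed": sorted(before - after),    # superseded updates
    }

# Illustrative snapshots; in practice these come from your inventory tool.
pre = ["KB5011487", "KB5010342"]
post = ["KB5011487", "KB5010342", "KB5012599"]
delta = patch_delta(pre, post)
# delta["installed"] -> ["KB5012599"]
```

With that delta in hand, the actual revert is a per-update uninstall through your normal tooling, and you can verify success by re-running the same snapshot comparison.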
3) Check an Advisory First
Before you patch your systems, check whether others have reported issues. The whole point of security is to minimize the risk your organization faces; it would be (maybe it was?) totally ironic if, in your eagerness to prevent systems from going down, you caused a bunch of systems to go down. This doesn’t mean doing a deep dive: even a simple reference like this one can tell you whether others are experiencing issues, or whether specific elements within your environment are known to cause problems.
4) Enable & Encourage End-user Reporting
Having a way for users to report issues is table stakes these days; even if you don’t have an official process for this, your users will find a way to let you know pretty quickly. There are three other parts to this tip, though: communicating that systems are being patched, encouraging your users to report issues, and reminding them how to do that. Of course, you need to be prepared to respond to such reports. When you set the expectation that patching is happening, users face fewer barriers (like feeling embarrassed) to reporting an issue, so you’ll know whether you should hold off, or revert.
5) Capture Diagnostic Data
You wouldn’t want your doctor to give a diagnosis without knowing the underlying cause of your symptoms; likewise, ensure you have access to the information necessary to identify the problem. It is hard to diagnose a BSOD without post-crash data. So capture the error codes you are seeing: have your end-users take a photo of their screens. Make sure kernel-level debugging is turned on (here’s how).
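Once those screen photos and helpdesk tickets start arriving, even a crude tally helps you see whether one patch is producing a single recurring failure or scattered unrelated crashes. A minimal sketch, assuming the reports are free-form text mentioning either a hex bug-check code or a symbolic stop code name:

```python
import re
from collections import Counter

# Matches either a hex bug-check code (e.g. 0x0000007E) or a symbolic
# stop code such as CRITICAL_PROCESS_DIED (uppercase words joined by '_').
STOP_CODE = re.compile(r"0x[0-9A-Fa-f]{8}|[A-Z]+(?:_[A-Z]+)+")

def tally_stop_codes(reports):
    """Tally stop codes mentioned in free-form end-user reports, so a
    recurring post-patch failure stands out from one-off crashes."""
    counts = Counter()
    for text in reports:
        counts.update(code.upper() for code in STOP_CODE.findall(text))
    return counts

# Hypothetical reports, as they might arrive from a helpdesk queue.
reports = [
    "Blue screen after reboot: CRITICAL_PROCESS_DIED",
    "Got stop code 0x0000007E twice this morning",
    "Same here, CRITICAL_PROCESS_DIED right after the update",
]
```

Two users hitting the same stop code right after a patch is a strong signal to pause the rollout and pull the crash dumps from those machines.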
These simple steps should help you avoid Microsoft Patch Hell and keep your systems up and running. That should leave a lot more time for you to ponder your next step in improving your cybersecurity prevention posture, since breaches are another hellish source of system downtime (or worse). Hint: our managed detection and response service is detailed right here.