By now you have likely heard about ZombieLoad, the latest in a series of vulnerabilities affecting the Intel Core and Xeon processors that power your endpoints. Of course, it’s not the vulnerability itself (present in processors made since 2011) that impacts your computing power – it’s the ‘fix’ Intel has issued to stop potential ‘cache voyeurs’ from observing data recently pushed through speculative execution, via the class of side channels known as Microarchitectural Data Sampling (MDS). The reason you should be concerned, or even outraged, is that by some estimates the “fix” could sap as much as 40% of your processing power.
The reason ‘fix’ is in inverted commas is that Intel seems to have disabled predictive compute, rather than addressing the issue at its core (no pun intended). If the concern is that threat actors can exploit a rule that is supposed to keep data in the side channel from being exposed, I call for a better rule, rather than simply disabling this critical feature.
In the context of having already lost so much compute to Foreshadow (10%–30%) and Spectre/Meltdown (another 30%), a further 40% hit from ZombieLoad means I’m looking at performance that pales in comparison with what I was sold. This should be a huge blow to Intel, yet their shares closed 1% higher on the day of the news! Time will tell if that persists… Perhaps it’s because the solution will be to buy new servers, either with cores that aren’t vulnerable to speculative-execution side-channel exploits, or simply with 40% more power to offset the performance hit. This appears to be how some big cloud infrastructure providers are mitigating the impact to their business end-users – by dumping more iron behind them.
My real concern is for the small and medium-sized organizations that can’t readily afford to “just rip and replace most of their data center”. Fortunately, if you run on IaaS, you won’t see that cost (at least, not yet). But you can bet that cloud providers aren’t just going to take a 40% productivity hit without it impacting their prices. Where does this leave the decision makers in the organization? The leadership in business, IT, and operations need to stay sensitive to these risks without losing that huge amount of compute (read: PRODUCTIVITY) – and without the associated costs.
The patch is, for now, requisite – you need to protect your business. But what next? Buying more Intel and crossing your fingers? Hoping no similar vulnerabilities are unearthed afterwards? This is the fourth one in as many years! Moving to the cloud, which comes with its own costs of time, money, and potential integration issues? Ultimately, given the near-monopolistic prevalence of Intel processors, at least the playing field is somewhat even – this is impacting most businesses.
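If you’re running Linux, you can at least verify where you stand before deciding what to do next: the kernel reports the status of MDS and its sibling speculative-execution vulnerabilities through sysfs. A minimal sketch (the loop over sibling entries and the fallback message are my additions; the sysfs paths themselves are the kernel’s standard interface):

```shell
# Report the kernel's view of MDS (ZombieLoad) and related speculative-execution
# vulnerabilities. These sysfs entries are provided by Linux kernels with the
# relevant mitigations (5.1+ for mds, or distro backports).
for vuln in mds spectre_v1 spectre_v2 meltdown l1tf; do
  f="/sys/devices/system/cpu/vulnerabilities/$vuln"
  if [ -r "$f" ]; then
    # Typical values: "Mitigation: ...", "Vulnerable", or "Not affected"
    printf '%-12s %s\n' "$vuln:" "$(cat "$f")"
  else
    printf '%-12s %s\n' "$vuln:" "not reported by this kernel"
  fi
done
```

A reading of “Vulnerable” on the mds line means the microcode/kernel patch discussed above has not yet been applied on that host.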
Credit to the researchers who uncovered this – and ultimately, this brings us to the point of my post here. Research and testing like this must be a requisite part of the QA process for Intel, and for any other provider of critical hardware components. I’m sure there’s an admittedly rigorous process that components undergo before they are brought to market… but when vendors miss something, there needs to be recourse for their customers. Just like in the automotive sector, when QA fails to identify a critical issue, there needs to be a recall. If the brake pads don’t function correctly, I bring my car back and the issue gets fixed – and if they can’t fix the component, or can’t do it more cheaply than replacing the vehicle, the consumer gets a new car.
Intel needs to do more, and it shouldn’t be customers eating the consequences while the vendor profits. If speculative compute can’t work without being vulnerable, the rules governing how it exposes data need to change. Is it that it can’t be fixed? Or just not fixed easily? Or is it that it’s more profitable to remediate the vulnerability without addressing the performance impact, leaving the onus on the customer to find a new solution?