One of the promises of cloud computing, particularly IaaS, is that providers operate at a scale that affords certain benefits that are hard to justify and implement for all but the largest private datacenter operations. For example, most cloud datacenters have physical security and power redundancy that are likely cost-prohibitive for most companies whose primary business is not running a datacenter. Meltdown and Spectre highlighted some additional benefits of operating in the cloud, and also some potential downsides.
First, since managing servers is the very business of cloud providers, and they tend to have very large numbers of physical servers, it seems that most cloud companies were able to gain early insight into the issues and perform early testing of patches.
Second, because cloud providers have so many systems to manage and the name of the game is efficiency, they tend to be highly automated, so most of the major cloud providers were able to patch their estates either well before the disclosure or shortly after it. That’s a good thing, as many companies continue to struggle with obtaining firmware fixes from their hardware vendors nearly two weeks later. Of course, to fully address the vulnerability, cloud customers also have to apply operating system patches to their virtual server instances.
There are some downsides, however.
First, Meltdown provided an apparent possibility for a guest in one virtual machine to read the memory of a different virtual machine running on the same physical server. This is a threat that doesn’t exist on private servers, or is much less concerning for private cloud. This vulnerability existed for many years, and we may never know whether it was actually used for this purpose. Once it was disclosed, however, cloud providers (generally) applied fixes before any known exploitation was seen in the wild.
Second, another benefit of cloud from the consumer perspective is “buying only what you need”. In the case of dedicated servers, we traditionally size the server to accommodate the maximum load it would need to handle while providing the required performance. Cloud, though, gave us the ability to add capacity on demand, and because cloud is more expensive than a physical server in the long run on a clock-cycle-by-clock-cycle basis, we tend to buy only the capacity we need at the time. After cloud providers applied the Meltdown patches, at least some kinds of workloads saw a dramatic increase in the compute capacity required to maintain the same level of performance. One of the big downsides to cloud, therefore, seems to be the risk of a sudden change in the operating environment that results in higher cloud service costs. As problematic as that might be, firing an API call to increase the execution cap or add CPUs to a cloud server is logistically much simpler than the alternative: private physical servers that experience the same performance hit need to be replaced, which requires the arduous process of obtaining approval for a new server, placing the order, waiting, racking, cabling, setup, and so on.
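To make the cost risk concrete, here is a rough back-of-the-envelope sketch in Python. Everything in it is an assumption for illustration: the function names, the 20% throughput loss, the fleet of 10 instances, and the $0.10 hourly rate are hypothetical, not measurements from any provider or workload. It also assumes capacity scales linearly with instance count, which real workloads may not do.

```python
import math


def instances_needed(baseline_instances: int, throughput_loss: float) -> int:
    """Instances required to keep total throughput constant after each
    instance loses `throughput_loss` (a fraction in [0, 1)) of its capacity.

    Assumes capacity scales linearly with instance count (hypothetical model).
    """
    if not 0.0 <= throughput_loss < 1.0:
        raise ValueError("throughput_loss must be in the range [0, 1)")
    # Each instance now delivers (1 - loss) of its old capacity, so we need
    # baseline / (1 - loss) instances, rounded up to a whole instance.
    return math.ceil(baseline_instances / (1.0 - throughput_loss))


def monthly_cost(instances: int, hourly_rate: float, hours: int = 730) -> float:
    """Monthly bill for a fleet billed per instance-hour (730 h is roughly one month)."""
    return instances * hourly_rate * hours


if __name__ == "__main__":
    # Hypothetical inputs: 10 instances, 20% slowdown, $0.10 per instance-hour.
    before = 10
    after = instances_needed(before, throughput_loss=0.20)
    print(f"instances before patch: {before}, after patch: {after}")
    print(f"monthly cost before: ${monthly_cost(before, 0.10):.2f}")
    print(f"monthly cost after:  ${monthly_cost(after, 0.10):.2f}")
```

The point of the sketch is that the increase is not proportional to the slowdown: losing a fraction of per-instance capacity means dividing by what remains, and rounding up to whole instances adds a little more on top.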
Man, that second point is something I hadn’t considered. That’s something that’ll need to be discussed in a lot of businesses over the coming months.