The Xen Project has published a security advisory that could affect millions of virtualized servers running in Amazon’s cloud and other public hosting services. A flaw in the Xen hypervisor could allow a malicious fully virtualized server to read data belonging to other virtualized systems running on the same physical hardware, or to the hypervisor hosting the virtual machine. The malicious system could also potentially crash the server hosting the virtual machines. A patch, privately disclosed under embargo last week, has been released to correct the flaw.

Xen is used by a number of public and private cloud providers to support infrastructure-as-a-service (IaaS) offerings such as Amazon’s Elastic Compute Cloud, Rackspace, and some configurations of the OpenStack cloud provisioning environment. The flaw, discovered by Jan Beulich at SUSE, affects servers configured to support hardware-assisted virtualization (HVM) mode. HVM lets operating systems use hardware extensions that give them faster access to the physical server’s hardware, and it uses software emulation of other Intel platform hardware to allow those operating systems to run without modification. Windows virtual machines running on Xen require HVM support.

The bug, introduced in versions of Xen after version 4.1, is in HVM code that emulates Intel’s x2APIC interrupt controller. The emulator limits a virtual machine’s writes to the memory reserved for its own emulated controller, but a program running within a virtual machine could use the x2APIC interface to read information stored outside of that space. Beulich found that an inadvertently buggy or intentionally malicious virtual machine provisioned on a server using HVM could use the interface to read physical memory on the host machine that is reserved for other virtual machines or for the virtualization server software itself. In other words, an “evil” virtual machine could essentially read over the shoulder of other virtual machines running on the same server, bypassing security.
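To make the class of bug concrete, here is a minimal Python sketch of an emulated register window whose read path accepts a wider range than the device actually has. It is not Xen’s code: the function names, the register counts (256 valid, 1,024 accepted), and the 16-byte register size are assumptions chosen purely for illustration.

```python
# Simplified model of an over-wide register window in an emulated device.
# NOT Xen's code: every name and size below is a hypothetical stand-in.

REG_SIZE = 16                       # bytes backing each emulated register (assumed)
VALID_REGS = 256                    # registers the device really has (assumed)
CHECKED_REGS = 1024                 # range the buggy check accepts (assumed)
PAGE_SIZE = VALID_REGS * REG_SIZE   # one page backs the emulated controller


def make_host_memory():
    """Return fake host memory: the guest's controller page, then other data."""
    mem = bytearray(PAGE_SIZE * 4)
    secret = b"other-guest-secret"
    # Data that belongs to someone else sits right after the controller page.
    mem[PAGE_SIZE:PAGE_SIZE + len(secret)] = secret
    return mem


def buggy_read(mem, reg):
    """Read a register, but validate against the over-wide range."""
    if not 0 <= reg < CHECKED_REGS:
        raise ValueError("register out of range")
    offset = reg * REG_SIZE
    # For reg >= VALID_REGS this offset lies beyond the controller page.
    return bytes(mem[offset:offset + REG_SIZE])


def fixed_read(mem, reg):
    """Read a register, validating against the real register count."""
    if not 0 <= reg < VALID_REGS:
        raise ValueError("register out of range")
    offset = reg * REG_SIZE
    return bytes(mem[offset:offset + REG_SIZE])


if __name__ == "__main__":
    host = make_host_memory()
    # The first register past the valid range exposes the neighbor's bytes.
    print(buggy_read(host, VALID_REGS))
    try:
        fixed_read(host, VALID_REGS)
    except ValueError as err:
        print("fixed_read rejected the access:", err)
```

In this toy model, asking `buggy_read` for register 256 returns bytes from the page after the controller’s own, while `fixed_read` rejects the same request. The real fix is whatever the Xen patch does; the sketch only shows why a range check that is too generous on the read path can leak neighboring memory.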

HVM isn’t the only virtualization mode supported on the Xen hypervisor. Xen can use “paravirtualization” (PV), a virtualization scheme that requires less hardware emulation, to create virtual instances of Linux and FreeBSD (as well as Oracle’s now-closed OpenSolaris). PV-based systems aren’t affected.

Update, 10/1 11:45 AM: A Rackspace spokesperson sent Ars the following statement regarding the vulnerability:

Rackspace was forced to reboot some of our Cloud Servers (Standard, Performance 1 & Performance 2) and Cloud Big Data/Hadoop customers to patch the security vulnerability affecting certain versions of Xen. This past weekend, our Fanatical Support specialists worked with customers and partners to remediate the vulnerability and complete the server maintenance. Now that the vulnerability embargo has lifted, we are able to report that, as of now, we have learned of no data compromises among Rackspace customers.

In addition, Rackspace CEO and president Taylor Rhodes apologized to customers in a blog post this morning. “We decided the lesser evil was to proceed immediately, at which time we notified you, and our partners in the Xen community, of the need for an urgent server reboot,” Rhodes wrote. “Even then, to avoid alerting cyber criminals, we didn’t mention Xen as the reason for the reboot. Another major cloud provider did attribute its reboot to security problems with Xen, which put all users of the affected versions of that hypervisor at heightened risk. But we’re relieved to report that, as of now, we’ve learned of no data compromise among Rackspace customers. Now that the vulnerability has been fully remediated, the Xen community has lifted its embargo on talking about it.”

As Ars previously reported, Amazon conducted a large-scale reboot of systems last week to correct the bug while it was under embargo. This morning, AWS Chief Evangelist Jeff Barr wrote a blog post explaining the reboots:

This Xen Security Advisory was embargoed until a few minutes ago; we were obligated to keep all information about the issue confidential until it was published. The Xen community (in which we are active participants) has designed a two-stage disclosure process that operates as follows:

  • Early disclosure to select organizations (a list maintained and regularly evaluated by the Xen Security Team based on a set of public criteria established by the Xen Project community) with a limited time to make accommodations and apply updates before it becomes widely known.
  • Full disclosure to everyone on the public disclosure date.

Because our customers’ security is our top priority and because the issue was potentially harmful to our customers, we needed to take fast action to protect them. For the reasons mentioned above, we couldn’t be as expansive as we’d have liked on why we had to take such fast action. The zone by zone reboots were completed as planned and we worked very closely with our customers to ensure that the reboots went smoothly for them.

Barr said that the reboot of AWS systems affected fewer than 10 percent of EC2 servers.