Xen Virtualization and Cloud Computing #03: Key Features of Xen
The previous articles in this series introduced virtualization and showed how Xen is designed to provide it efficiently. Here we’ll delve into some interesting features and their importance. A longer list can be found on the Xen Project's features page. At the time of writing, the most recent Xen Project release is 4.13.
The Meltdown and Spectre processor vulnerabilities, which exploit complex performance-enhancing features of modern microprocessors, have presented formidable challenges to the developers of operating systems and applications. Meltdown and Spectre were publicly disclosed in January 2018. This section describes two enhancements to Xen to mitigate against these difficult vulnerabilities.
Meltdown, which affects Intel x86, IBM Power, and some ARM microprocessors, allows a malicious process to read data from any address that is mapped into the current process's memory space. Effectively, the process can read all memory without permission. The malicious process accomplishes this by exploiting a timing flaw in the interaction of several processor features (such as the cache and pipeline) that are individually secure. At the time of disclosure, this vulnerability affected many products, with impacts on an enormous number of servers and cloud providers. Companies began writing patches to block the Meltdown vulnerability, at a performance cost of between 5 and 30 percent.
On March 15, 2018, Intel reported that it would redesign its CPUs to help protect against Meltdown and Spectre. On October 8, 2018, Intel added firmware to its latest processors to mitigate against these attacks.
Hypervisor changes to mitigate against Meltdown and Spectre
The Xen hypervisor, like other products, was affected by these vulnerabilities, specifically:
- “Rogue Data Load” (aka SP3, “Variant 3”, Meltdown, CVE-2017-5754)
- “Branch Target Injection” (aka SP2, “Variant 2”, Spectre CVE-2017-5715)
- “Bounds-check bypass” (aka SP1, “Variant 1”, Spectre CVE-2017-5753)
There is no way to completely prevent risks from these vulnerabilities, but adding execution boundaries and other checks to code can partially plug the holes. Thus, we talk about “mitigating against” the vulnerabilities.
The initial focus of the Xen Project was on fixes for Meltdown, then Spectre Variant 2, and finally Spectre Variant 1. SP1 and SP2 affect Intel and AMD processors, while the exposure of ARM processors varies by model and manufacturer. Among x86 processors, SP3 affects only Intel's. To mitigate against Meltdown, the Xen Project published three solutions with the names Vixen, Comet, and PTI. Unfortunately, the fix to mitigate against SP1 requires microcode updates from Intel and AMD. Currently, therefore, there is no mitigation for SP1. Its attack surface can, however, be reduced through branch hardening, a technology contributed to the Xen Project by Citrix. The other two variants can be mitigated as follows:
- SP2 can be mitigated by a combination of microcode, compiler, and hypervisor changes.
- SP3 can be mitigated by page-table isolation (PTI).
For more up-to-date information about these vulnerabilities and the Xen Project’s responses, see Xen Security Advisory 254 (XSA-254).
Core scheduling
This technology, contributed by SUSE, helps to contain the negative effects of a Meltdown or Spectre breach. Normally, any virtual CPU can be scheduled on any physical CPU, and can move between physical CPUs for efficient scheduling. This increases the risk that information could leak from one VM to another, just as travel between cities allows an infection to spread faster. Previously, the only way to completely mitigate against this vulnerability was to disable hyper-threading, which causes a severe performance hit.
The core scheduling feature allows Xen to group virtual CPUs and schedule them on a limited set of physical cores. With this technology, users can keep hyper-threading enabled. Initial benchmarks show a performance loss for many workloads. SUSE and Citrix are working on the feature, and in upcoming releases we hope to see better trade-offs between security and performance.
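To make the grouping idea concrete, here is a minimal Python sketch of one scheduling time slot. It is a toy model, not Xen's scheduler: the function names, the "vm/vcpu" labels, and the assumption of two hardware threads per core are all invented for illustration. The one rule it enforces is that sibling threads of a core only run virtual CPUs belonging to the same VM.

```python
from collections import deque
from typing import Dict, List, Optional


def schedule_slot_core_granularity(
    runnable: Dict[str, List[str]],
    num_cores: int,
    threads_per_core: int = 2,
) -> List[List[Optional[str]]]:
    """Place runnable vCPUs on hardware threads for a single time slot.

    Sibling threads of one core only receive vCPUs of the same VM. A thread
    is left idle (None) rather than given a vCPU from a different VM.
    """
    # Split each VM's vCPUs into groups that fit on one core.
    groups = deque()
    for vm, vcpus in runnable.items():
        for i in range(0, len(vcpus), threads_per_core):
            groups.append([f"{vm}/{v}" for v in vcpus[i:i + threads_per_core]])

    placement: List[List[Optional[str]]] = []
    for _ in range(num_cores):
        group = groups.popleft() if groups else []
        # Pad with None so that every core lists all of its threads.
        placement.append(group + [None] * (threads_per_core - len(group)))
    return placement


def core_mixes_vms(core_threads: List[Optional[str]]) -> bool:
    """Return True if one core is running vCPUs from more than one VM."""
    owners = {t.split("/")[0] for t in core_threads if t is not None}
    return len(owners) > 1


if __name__ == "__main__":
    slot = schedule_slot_core_granularity(
        {"vmA": ["v0", "v1", "v2"], "vmB": ["v0"]}, num_cores=3
    )
    for core_id, threads in enumerate(slot):
        print(core_id, threads)
```

In this example two of the three cores end up with an idle sibling thread, even though other virtual CPUs could have filled them. That idle capacity is the price of the isolation, and it gives an intuition for why early benchmarks show a slowdown for many workloads.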
Hypervisor-based Memory Introspection (HVMI)
This technology was donated by Bitdefender to the Xen Project on July 30, 2020, to protect against malware in the operating systems that run on Xen. HVMI has a key advantage over malware detection systems that run on guest operating systems: while smart malware can take over a whole guest and disable detection or prevention mechanisms on the guest, the malware has no way to reach into the underlying hypervisor.
Malware has become extremely dangerous and hard to fight for several reasons:
- It can enter the system whenever a single unaware user on the system visits an infected web site or opens a file received from a trusted person.
- It can exploit operating system vulnerabilities to gain superuser privileges and take over the whole system. Very few operating systems divide privileges in order to limit malware to one area.
- It has gotten sophisticated enough to hide its files or other traces from administrators, and to disable measures designed to thwart it.
A remarkable story showing the power of malware concerns an attack known as Carbanak, which infected more than 100 banks in thirty nations and did $1 billion worth of damage globally. In late 2013, an investigation of a bank in Kiev revealed that stealth malware injected by Carbanak had monitored the bank's internal systems for several months while successfully covering its tracks. The malware recorded every employee’s activity and sent videos and images back to the intruder without drawing any attention.
The Bitdefender name is familiar to most IT staff. It is a leading global cybersecurity company, protecting over 500 million systems worldwide. Bitdefender and Citrix collaborated on Citrix Hypervisor. As we know, the hypervisor isolates VMs from each other and provides clean, low-level information about the memory used by each virtual machine. The result of this collaboration is a new security layer that can see everything happening in your infrastructure, but which malware cannot reach. Bitdefender’s Hypervisor Introspection (HVI) technology detects suspicious activities by working directly with raw memory. At this level, malware can’t hide.
Bitdefender HVI assumes that your systems are not clean, and you can command it to inject cleaning tools into live virtual machines. HVI already detects and blocks the most famous attacks, including Carbanak, Turla, APT28, NetTraveler, and Wild Neutron, without knowing the vulnerabilities used by the attackers.
When Bitdefender decided to release HVI to Xen as open source, they called it Hypervisor-based Memory Introspection (HVMI). The HVMI technology understands and applies security logic to memory events within running Linux and Windows VMs. It examines the memory in real time for signs of memory-based attack techniques that are used to exploit known and unknown vulnerabilities.
Along with this, Bitdefender open sourced its “thin” hypervisor technology, known as Napoca, and donated it to the Xen Project. The Napoca hypervisor was used in developing HVI technology. A distinctive feature of Napoca is that it virtualizes only the CPU and memory, not all hardware, and therefore allows hypervisor introspection on machines that don't run a full hypervisor.
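The idea of applying security logic to memory events from outside the guest can be sketched in a few lines of Python. This is a conceptual toy, not HVMI's API: the class names, the page size, and the always-deny policy are assumptions made for illustration. The point is that the check sits in the layer that owns the memory, so code running inside the guest cannot switch it off.

```python
from typing import Callable, List, Set

PAGE_SIZE = 4096  # assumed page size for this toy model


class GuestMemory:
    """Raw guest memory as seen from the hypervisor side (hypothetical)."""

    def __init__(self, num_pages: int) -> None:
        self.data = bytearray(num_pages * PAGE_SIZE)
        self.write_protected: Set[int] = set()  # page numbers under watch
        # Called on a write to a protected page; returns True to allow it.
        self.on_violation: Callable[[int, bytes], bool] = lambda addr, payload: False

    def write(self, addr: int, payload: bytes) -> bool:
        """Attempt a guest write; return True if it was carried out."""
        if addr < 0 or addr + len(payload) > len(self.data):
            raise ValueError("write outside guest memory")
        first = addr // PAGE_SIZE
        last = (addr + len(payload) - 1) // PAGE_SIZE
        touched = set(range(first, last + 1))
        if touched & self.write_protected and not self.on_violation(addr, payload):
            return False  # the memory event was blocked
        self.data[addr:addr + len(payload)] = payload
        return True


class IntrospectionEngine:
    """Security logic living outside the guest (hypothetical policy: deny)."""

    def __init__(self, memory: GuestMemory) -> None:
        self.alerts: List[str] = []
        memory.on_violation = self.handle_write

    def handle_write(self, addr: int, payload: bytes) -> bool:
        self.alerts.append(f"blocked {len(payload)}-byte write at {addr:#x}")
        return False


if __name__ == "__main__":
    mem = GuestMemory(num_pages=4)
    engine = IntrospectionEngine(mem)
    mem.write_protected.add(1)  # pretend page 1 holds critical kernel data
    print(mem.write(0x10, b"ok"))
    print(mem.write(PAGE_SIZE + 8, b"\xcc\xcc"))
    print(engine.alerts)
```

A real introspection engine makes far richer decisions than "always deny", but the structure is the same: the guest triggers a memory event, and logic that the guest cannot touch decides what happens next.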
These features reduce the burden of managing hypervisors.
Late uCode loading
Microcode, often shortened to “uCode” (where the “u” stands in for the Greek letter mu), is firmware supplied by the chip manufacturer. The uCode typically contains mitigations for hardware vulnerabilities and is usually updated during system initialization or kernel boot. The update formerly required a reboot and a long down-time. Xen Project 4.13 lets the Xen hypervisor deploy a uCode update without any reboot. This feature was contributed by Intel.
Live-patching
Live-patching is a mechanism for replacing small sections of code in a running hypervisor, so that you don’t have to shut down the hypervisor and terminate all the VMs running on it. The feature is generally used to deploy critical security fixes.
Live-patching has been available for a while in several Xen-based products, and has been included as a tech preview feature since Xen 4.7. It is now a supported feature on the x86 architecture. Applying a patch does require all activity to be paused, but the pause should be brief. Amazon is working to improve this feature further. We plan to extend it to other architectures besides x86.
Recent improvements to live-patching include the capability to patch inline assembly code, improvements to stacked modules, support for module parameters, additional hooks and replicable apply/revert actions, extended Python bindings for automation, and additional validation of live patches.
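The apply and revert cycle can be pictured with a small Python model. This is only a conceptual sketch: a real hypervisor patches machine code in place, whereas the hypothetical `PatchableImage` class below merely swaps entries in a function table. The rule that only the most recently applied patch can be reverted is a simplification chosen here to show how stacked patches behave.

```python
from typing import Any, Callable, Dict, List, Tuple

FunctionTable = Dict[str, Callable[..., Any]]


class PatchableImage:
    """A running 'image' whose functions are reached through a table."""

    def __init__(self, functions: FunctionTable) -> None:
        self._table: FunctionTable = dict(functions)
        # Stack of (patch name, original implementations it replaced).
        self._applied: List[Tuple[str, FunctionTable]] = []

    def call(self, name: str, *args: Any) -> Any:
        return self._table[name](*args)

    def applied_patches(self) -> List[str]:
        return [name for name, _ in self._applied]

    def apply(self, patch_name: str, replacements: FunctionTable) -> None:
        """Swap in replacement functions while callers keep running."""
        unknown = set(replacements) - set(self._table)
        if unknown:
            raise KeyError(f"patch targets unknown functions: {sorted(unknown)}")
        saved = {name: self._table[name] for name in replacements}
        self._table.update(replacements)
        self._applied.append((patch_name, saved))

    def revert(self, patch_name: str) -> None:
        """Undo the most recently applied patch (last in, first out)."""
        if not self._applied or self._applied[-1][0] != patch_name:
            raise ValueError("only the most recently applied patch can be reverted")
        _, saved = self._applied.pop()
        self._table.update(saved)


def length_ok_buggy(n: int) -> bool:
    return n <= 16  # off-by-one: a 16-byte buffer cannot hold index 16


def length_ok_fixed(n: int) -> bool:
    return n < 16


if __name__ == "__main__":
    image = PatchableImage({"length_ok": length_ok_buggy})
    print(image.call("length_ok", 16))
    image.apply("fix-bounds", {"length_ok": length_ok_fixed})
    print(image.call("length_ok", 16))
    image.revert("fix-bounds")
    print(image.call("length_ok", 16))
```

Note that the image object is never recreated: callers keep using the same instance before, during, and after the patch, which is the property that lets guests keep running.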
Live-patching is not the final goal for live updates, because it is limited to small, localized code changes. The Xen Project team is also working on a broader live update feature. When it’s finished, an administrator will be able to upgrade a Xen hypervisor and its tools to a new version without stopping and relaunching the guests.
Embedded and safety-critical application features
These features support particular settings that need to run the hypervisor and VMs in unusual ways.
TrustZone is a security feature of ARM processors that allows privileged users to run a process in memory shut off from access by other processes. Because there is only one trusted zone on each chip, sharing it among multiple VMs is difficult. Therefore, Xen did not originally offer TrustZone access to guest VMs. Thanks to a feature contributed by EPAM, starting with Xen 4.13, all guests can concurrently run applications on Arm TrustZone without conflicts. More work needs to be done on this feature, though.
Renesas R-CAR IPMMU-VMSA driver
Automobiles rely increasingly on software. Their multiple, concurrent software processes call for virtualization in order to protect the high-stakes security required in automobiles. Thus, many automotive systems use Xen hypervisors. Access to GPUs is valuable for the virtual processes, in order to achieve the real-time performance needed when the car is in motion, but this requires access to ARM's Virtual Memory System Architecture (VMSA). Renesas has added this VMSA support to its ARM-based chips in Xen 4.13, and a driver contributed to the Xen Project by EPAM makes that access available to automobiles’ computing systems.
Dom0-less passthrough and ImageBuilder
An earlier article in this series described the central role of the privileged domain, Dom0, in Xen. Because the presence of Dom0 adds significant time (measurable in seconds) to the loading of each VM, some embedded system developers have asked for a Dom0-less architecture. Many embedded systems need to have several VMs up and running less than a second after the user boots the system. The code to implement a Dom0-less architecture was contributed by Xilinx in 2018. The feature does not yet work with paravirtualization, but works with other forms of Xen virtualization.
Because there is no privileged process and no userspace tools in a Dom0-less Xen, systems using it must load guests using U-Boot, an open-source boot loader. The guest images must contain all the required binaries, such as operating system kernels and ramdisks. Thus, a new tool named ImageBuilder, whose code is on GitLab, is provided to automate the building of Dom0-less configurations for U-Boot.
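One chore that such automation takes off the developer's hands is deciding where in memory each binary should be loaded so that nothing overlaps. The Python sketch below illustrates that bookkeeping only. It is not ImageBuilder's code or algorithm: the function names, the 2 MiB alignment, the file names, and the sizes are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

ALIGN = 2 * 1024 * 1024  # assumed 2 MiB alignment for every binary


def align_up(value: int, align: int = ALIGN) -> int:
    """Round value up to the next multiple of align."""
    return (value + align - 1) // align * align


def plan_load_addresses(
    binaries: List[Tuple[str, int]],
    memory_start: int,
    memory_end: int,
) -> Dict[str, int]:
    """Assign each (name, size_in_bytes) a non-overlapping load address."""
    plan: Dict[str, int] = {}
    cursor = align_up(memory_start)
    for name, size in binaries:
        if cursor + size > memory_end:
            raise MemoryError(f"{name} does not fit below {memory_end:#x}")
        plan[name] = cursor
        cursor = align_up(cursor + size)  # next binary starts after this one
    return plan


if __name__ == "__main__":
    MIB = 1024 * 1024
    plan = plan_load_addresses(
        [("xen", 1 * MIB), ("domU1-kernel", 20 * MIB), ("domU1-ramdisk", 3 * MIB)],
        memory_start=0x40000000,
        memory_end=0x80000000,
    )
    for name, addr in plan.items():
        print(f"{name:15s} {addr:#x}")
```

A boot script for U-Boot then has to load each file at its planned address and describe the guests to Xen. Doing this by hand for several guests is error-prone, which is the motivation for a generator tool.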
Figure 4 shows a Dom0-less architecture.
The next article in this series examines the interesting relationship between Xen and some other forms of virtualization, notably containers.