Guest Expert
Simon Crosby
CTO
XenSource

Over the last two years, virtualization has skyrocketed into the collective consciousness of enterprise IT professionals. And it would be hard to identify a vendor that has generated more interest or excitement than XenSource - the Palo Alto, CA-based vendor that "plays the dual role of leading the open source Xen community, while simultaneously selling value-added enterprise solutions based on Xen."

In October, Globus Consortium Journal readers learned why Xen hypervisors are currently the preferred virtual machines for Grid computing environments. This month, the GCJ had the opportunity to speak with XenSource CTO, Simon Crosby, who entertained various questions about the synergies / differences between Grid computing and virtualization.

GCJ: What's your take on the synergies between Grid and virtualization technologies?

Crosby: The general philosophy behind virtualization is an application agnostic approach and is focused on providing a virtual machine as the basic construct that can be placed on real data-center compute resources. Virtual machines run full OS and application stacks, and are therefore somewhat more heavyweight than the traditional Grid view of computational resources, which really amounts to the harvesting of spare CPU cycles on OS-identical machines. Our customers use virtualization to gain efficiencies in their overall operations, by consolidating multiple (potentially different) OS/app combinations on to a single server, saving hardware and management cost, increasing utilization, and realizing the flexibility of being able to instantly provision a new instance of a particular OS/app. They also benefit through greater robustness of their enterprise infrastructure - if a server fails, the VM can be instantly re-provisioned on another server. Grid technologies today, particularly in the financial services world we've been serving, tend to be based around a particular application. A Grid is used to harvest spare compute resources on many identical servers, by farming out instances of application logic according to a schedule. That is quite different from the basic driver of consolidation used in server virtualization.

Virtualization and Grid technologies clearly need to be married at some point. We've been involved in one pilot project with a large financial services organization (using the Globus Toolkit) that wants to integrate with Xen to utilize virtualization. The aim is to use virtual machines as the unit of resource allocation, rather than the current grid approach of spawning applications across the computational resources.

But the business of doing a great job of server virtualization is actually pretty fierce. That is, continuing to develop and improve a hypervisor so it's the world's fastest, most flexible and most powerful is indeed more than a startup's worth of effort. Early on at XenSource we had ambitions to use the hypervisor as a way to solve problems related to the Grid, but I no longer believe that is easily accessible to us, simply because it would require us to be experts in Grid technologies as well as experts in building hypervisors. As a consequence, integration with toolkits such as Globus is something we are very keen to encourage in the open source community. A couple of grid solution vendors are already working to integrate Xen.

GCJ: So can you tell us your perception of what the Globus Toolkit might bring to the picture that complements what you're trying to accomplish at XenSource?

Crosby: I think what Globus can offer is the ability to marshal resources across disparate compute resources and integrate that with virtualization - essentially supplying the link that is currently missing. Grid technologies are good at scheduling units of computation across the infrastructure. If instead of scheduling applications they schedule the placement of VMs, that solves a key need in optimal use of data center infrastructure. The critical need is optimally placing work on available compute resources and integrating that with the ability to dispatch a virtual machine or set of virtual machines to the ideal set of servers or clients. That problem is actually quite hard to solve.

When you deal with a broader set of enterprise applications, you'll find that there are dependencies which are extremely difficult to resolve in terms of solving the allocation problem. For example, a particular app may need access to a particular dataset that lives on a particular SAN on a particular LUN, and you've got to deal with the storage virtualization world to solve that problem. The Grids I've seen in action don't deal with complex storage architectures - the data sets are local to the computation or are made available from a centralized server. But for a broader class of enterprise applications, the storage problem must be addressed. Many applications are now interlinked; for example, a Web service today is composed of a Web server, an app server, and a database, and creating an instance of the service and placing it on the infrastructure requires the placement of all three running components. These problems need to be solved by leveraging Grid technologies and making them applicable to a broader set of enterprise applications.
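The storage dependency described above can be sketched as a filter over candidate hosts. What follows is a minimal illustration in Python, not anything taken from Xen or the Globus Toolkit: the `Host` and `VM` types, the LUN identifiers, and the functions `eligible_hosts` and `place_service` are all hypothetical names introduced for this example, and a real scheduler would also weigh CPU, memory, and network.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    name: str
    reachable_luns: frozenset  # LUNs this host can attach (assumed known)


@dataclass(frozen=True)
class VM:
    name: str
    required_luns: frozenset = frozenset()  # storage the guest depends on


def eligible_hosts(vm, hosts):
    """Return the names of hosts that can reach every LUN the VM needs."""
    return [h.name for h in hosts if vm.required_luns <= h.reachable_luns]


def place_service(vms, hosts):
    """Place every VM of a multi-tier service, or fail as a whole.

    Each VM goes to the first host that satisfies its storage constraint.
    If any tier cannot be placed, nothing is returned, since a service
    with a missing component is not usable.
    """
    placement = {}
    for vm in vms:
        candidates = eligible_hosts(vm, hosts)
        if not candidates:
            raise ValueError(f"no host can reach the storage for {vm.name}")
        placement[vm.name] = candidates[0]
    return placement


hosts = [
    Host("server-a", frozenset({"lun-1"})),
    Host("server-b", frozenset({"lun-1", "lun-2"})),
]
service = [
    VM("web"),
    VM("app", frozenset({"lun-1"})),
    VM("db", frozenset({"lun-2"})),
]
placement = place_service(service, hosts)
```

In this toy setup the database tier can only land on the one host that reaches its LUN, while the other tiers are free to go anywhere, which is the kind of affinity a VM-aware scheduler would have to honor.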

The challenge for the Grid community as it integrates with virtualization is to move away from being application-focused to being compute resource-focused, and to start scheduling VMs on to available resources rather than farming out application logic on to available servers.

GCJ: This month we're looking at data management issues... how do Xen hypervisors approach enterprise storage?

Crosby: What's critical in the storage world is the ability to tie a LUN or some other form of virtualized storage to a guest, and as the guest moves around the data center, making sure that its storage requirements continue to be met. This requires integration of virtualized storage management and virtual machine management/control.

As far as databases are concerned, that's a higher level application layer issue. We really care that database applications run well in VMs on Xen and get very high performance. However, it is not within the scope of control of a hypervisor to provide transactional integrity semantics to databases. We simply do not have access to the application or database notion of transaction.

There are several specific storage challenges which have to be addressed that relate to the unit of provisioning in the virtual machine world. The ability to leverage the power of storage virtualization to get more performance and more features in server virtualization is critical to us. We can instantly clone a running VM and distribute it across multiple servers in the datacenter. The Grid world understands how to use such features because the Grid community understands the concept of quickly creating multiple copies of an application on multiple systems. The key difference between scheduling an application to harvest resources from the Grid, and scheduling VMs on to computational and storage resources is that applications in the Grid world today tend to be of finite life-span (for example, consuming spare CPU to run a simulation) and will terminate at some point. But VMs are instances of operating systems that essentially run forever until they are "turned off". VMs are more fundamental, perhaps more elemental than applications.

GCJ: When you have a virtual machine running, depending on what's going on with the Grid or whatever the larger quanta of collection of machines is, there are a lot of complexities specifically with relocation, networking and storage. Is that something where you see the Globus Toolkit being an effective moderator and mediator to that?

Crosby: Our challenge is to ensure that the customer's storage and networking world works with server virtualization. There are numerous use cases that have to be dealt with, from local storage, to networked storage, and from file system level through databases. The Grids I've seen in production data centers don't solve storage problems per-se, but must be aware of constraints in terms of application placement, to ensure that applications will be able to get hold of key data. So the Grid technologies in use have already captured a key requirement - the notion of application specific constraints (affinities) relating to scheduling. Now that capability must be applied more broadly, and to a wider range of storage technologies, to enable its use with virtual machines.

Your question raises yet another difference between the Grid world thus far and the VM world. A virtual machine is something that never stops unless you decide to shut it down. In general, the Grids are used to harvest additional "spare" CPU for applications that can usefully use more CPU, such as running a simulation overnight on underutilized machines. The notion of the time period for which you wanted to harvest the resources is almost implicit, because the application being scheduled is certainly not going to run forever on the machines in the Grid.

When managing virtual machines, the challenge is to optimally place sets of running VMs based on their time-varying consumption of resources, but with the implicit assumption that running applications don't suddenly "end" and virtual machines disappear. Therefore VM scheduling is about optimal placement of virtual servers on physical servers to ensure that every VM achieves its specific SLA. It's an optimal packing problem with severe constraints.
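To make the packing view concrete, here is a minimal sketch of one classic heuristic, first-fit decreasing, applied to VM placement. It is an illustration under strong simplifying assumptions rather than how Xen or any Grid scheduler actually works: each VM is reduced to a single fixed CPU demand (standing in for what its SLA guarantees), all hosts are identical, and the function name `first_fit_decreasing` and the sample numbers are invented for the example. Time-varying demand, multiple resource types, and affinity constraints are left out.

```python
def first_fit_decreasing(vm_demands, host_capacity):
    """Pack VMs onto identical hosts using the first-fit decreasing heuristic.

    vm_demands: dict mapping VM name -> CPU demand (same unit as host_capacity).
    Returns a list of hosts, each represented as a list of VM names.
    The result is a heuristic packing, not a guaranteed optimum.
    """
    hosts = []  # VM names assigned to each host
    free = []   # remaining capacity of each host, parallel to `hosts`

    # Consider the largest VMs first; they are the hardest to fit.
    ordered = sorted(vm_demands.items(), key=lambda kv: kv[1], reverse=True)
    for name, demand in ordered:
        if demand > host_capacity:
            raise ValueError(f"{name} does not fit on any single host")
        for i, remaining in enumerate(free):
            if demand <= remaining:
                # First host with enough room takes the VM.
                hosts[i].append(name)
                free[i] -= demand
                break
        else:
            # No existing host had room, so bring another host into use.
            hosts.append([name])
            free.append(host_capacity - demand)
    return hosts


demands = {"vm-a": 6, "vm-b": 5, "vm-c": 4, "vm-d": 3, "vm-e": 2}
packing = first_fit_decreasing(demands, host_capacity=10)
```

Because VMs do not terminate the way Grid jobs do, a real placement engine would have to re-run something like this as demands drift and migrate guests accordingly, which is where the "severe constraints" come in.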

Networking is actually a lot easier than storage. You just need to solve the problem of whether or not you hold your IP address constant as the VM moves: whether, in effect, the hypervisor is doing that for its guests, or whether an IP address remains with the guest once it moves from one physical server to another. VMs can migrate on layer 2 networks or on layer 3 networks. In the latter case, re-routing of packets must be done to ensure that traffic is not lost.

GCJ: The Globus Toolkit originated in e-Science and academia, and that's largely where the development community is at this point. Can you tell us about the characteristics of the Xen community and how XenSource works with them?

Crosby: The Xen community is composed of over 20 major enterprise system vendors and a few universities. The core community that has made changes to the code base comprises about 250 people. There are a number of very large vendors committed to Xen, such as IBM, Intel, HP, AMD, Dell, Sun, SuSE, and Red Hat. So, it's very commercially driven.

GCJ: Talk about your paravirtualization model. Are you platform-agnostic?

Crosby: We're basically cross-platform from a processor architecture and operating system perspective, because with hardware virtualization we get to virtualize the entire legacy OS and Windows world. Any OS vendor can utilize Xen and deliver a paravirtualized high performance version of their software that would take advantage of Xen. Xen today runs on x86, x86_64, IA64 and Power 5 processor architectures, and Sun is rumored to be porting Xen to SPARC.

GCJ: Any upcoming milestones for XenSource that we should be aware of?

Crosby: We'll be working on getting the enterprise-grade product versions of Xen 3.0 to market. The user community of Xen 3.0 is pretty interesting. We're getting about 12,000 downloads a month. The requirement is essentially to make Xen consumable, and to integrate it into the data-center infrastructure. The other key milestones will be inclusion of Xen in Solaris 10, SLES 10, and RHEL 5. Obviously managing Xen is a key thing, and plugging Globus in at the appropriate point to serve that need is of great interest to us.
