No one has had a greater impact on the evolution of Linux clustering than Donald Becker, CTO of Penguin Computing, and pioneer of the Beowulf Project -- the seminal project that put clustering on the map in the HPC community. This month the Globus Consortium Journal asked Becker to enlighten readers with a quick history lesson on Linux clustering.
GCJ: Tell us a little bit about the genesis of the Beowulf Project, and how Linux clustering has evolved since those early days.
Becker: Beowulf started out as a way for people to use collections of commodity, off-the-shelf Linux machines for high-performance computing ... as an alternative to using purpose-built, specialized machines.
The real key to doing that is providing a software layer that hides, as much as possible, the "ugliness" of machines not designed for high-performance computing. So it's about a software system and a methodology to put together machines that can be used effectively in high-performance computing.
From the beginning of the Beowulf project in 1994, we targeted Linux as the platform for the software we were deploying. At the time, Linux had a very small presence in high-performance computing and in the market in general. I like to think Beowulf had a strong influence on Linux becoming popular for clustering in high-performance computing. This eventually led to Linux being a popular platform for doing all sorts of things in the HPC realm.
GCJ: So what were some of the specific challenges for clustering commodity Linux boxes for HPC?
Becker: In the early days, the challenge was as simple as getting the machines to talk to each other -- so my background on the Linux side was contributing to the networking side of the Linux kernel. Getting the machines to communicate meant figuring out a lot of the high-throughput, low-latency communication requirements for clusters. From there, the focus turned to communication libraries and managing large sets of machines. One of the things about scientists is they'll put up with quite a bit in terms of complex systems. The rest of the world, however, wanted to put complexity in the background -- to minimize the complexity -- because they're much more focused on their own specific applications.
GCJ: What are some of the unique management issues in Linux cluster environments?
Becker: Our focus on the cluster management side is consistency. We're trying to make it look like a single system from the point of view of the end user and the administrator. We want to guarantee that a process being run remotely will return the same results as one left running locally, even in the face of library updates, application updates, user setup updates.
To accomplish this you must administratively control what's installed on the machine. Inside of a cluster, we focus on dynamically provisioning machines, making certain that we control every detail of how the machines are installed. So we go the whole way down to loading kernels and managing device drivers and up to the level of making sure the right versions of libraries are there.
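As an illustration of the consistency check described above, here is a minimal sketch in Python. It assumes each node can report its kernel and library versions as a simple dictionary. The field names, version strings, and function names are hypothetical and are not taken from Penguin's software. The idea is only that two nodes count as consistent when their complete environment descriptions hash to the same value.

```python
import hashlib
import json


def environment_fingerprint(env):
    """Return a stable SHA-256 hash of an environment description (a dict)."""
    # Sorting keys makes the hash independent of dictionary ordering.
    canonical = json.dumps(env, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_consistent(local_env, remote_env):
    """True when both nodes describe exactly the same software environment."""
    return environment_fingerprint(local_env) == environment_fingerprint(remote_env)


# Hypothetical node descriptions; the version strings are placeholders.
head_node = {"kernel": "2.6.9", "libc": "2.3.4", "mpi": "1.2.7"}
compute_node = {"libc": "2.3.4", "mpi": "1.2.7", "kernel": "2.6.9"}
stale_node = dict(head_node, libc="2.3.2")  # one library left un-updated

print(is_consistent(head_node, compute_node))  # True
print(is_consistent(head_node, stale_node))    # False
```

A real provisioning system would gather these details from the node itself rather than from a hand-written dictionary. The sketch only shows why controlling every installed component makes the comparison tractable.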
Within a cluster, we try to guarantee consistency -- that an application you run on a remote machine will run exactly the same as it does on a local machine. It's a lot easier to guarantee consistency within a cluster than it is over a Grid. When you have local-area high-bandwidth communication and have administrative control over the machines, you have a lot more opportunities for consistency. The challenges are the same for a Grid, but we were able to pick an easier set of problems to solve within a local cluster.
GCJ: What sorts of applications are better suited for a cluster than a Grid?
Becker: One of the reasons to select a cluster instead of a Grid is to run applications that require low-latency communication, and you're pretty much constrained to do that on local machines. You can get very low-latency interconnects for clusters. That's difficult to accomplish with a Grid. So there are some application characteristics that preclude them being run effectively over a Grid unless you're doing it at the coarsest grain level.
One class of applications, "spectral methods," requires all-to-all communication with each time step. And the length of each time step is often determined by the latency of that communication, so many of these applications really only run effectively on machines that are local to each other.
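The latency argument above can be made concrete with a rough model, sketched here in Python. It assumes each time step costs a fixed amount of computation plus one latency-bound round of all-to-all communication. The compute time and the two latency figures are illustrative assumptions, not measurements from the interview.

```python
def step_time(compute_s, latency_s, rounds=1):
    """Seconds per time step: local computation plus latency-bound exchanges."""
    return compute_s + rounds * latency_s


def steps_per_second(compute_s, latency_s, rounds=1):
    """How many time steps complete per second under this simple model."""
    return 1.0 / step_time(compute_s, latency_s, rounds)


# Assumed, illustrative figures.
COMPUTE_S = 0.001            # 1 ms of computation per step
LOCAL_LATENCY_S = 0.00001    # 10 microseconds on a cluster interconnect
WIDE_AREA_LATENCY_S = 0.05   # 50 ms across a wide-area Grid link

local = steps_per_second(COMPUTE_S, LOCAL_LATENCY_S)
wide = steps_per_second(COMPUTE_S, WIDE_AREA_LATENCY_S)
print(f"local: {local:.0f} steps/s, wide-area: {wide:.0f} steps/s")
```

Under these assumptions the wide-area run is dominated by communication latency rather than computation. That is the situation in which such applications are described as only running effectively on machines local to each other.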
Now, on the other side of that, a surprising amount of computation today is parametric execution. I think a decade ago, people didn't really expect this. Today you have machines powerful enough to run pretty big simulations. What you need is to run hundreds of thousands of similar simulations, but with different input parameters. For workloads like that, wide schedulers -- wide-area scheduling systems in Grids -- are very effective. Of course, these are also very easy tasks for clusters to do.
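Parametric execution of the kind described above can be sketched in Python as follows. The `simulate` function is a hypothetical toy standing in for a real simulation, and a local thread pool stands in for a cluster or Grid scheduler. The point is only that each run depends on nothing but its own input parameters, so the runs can be farmed out independently.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def simulate(params):
    """Toy stand-in for a simulation: compound growth over a number of steps."""
    rate, steps = params
    value = 1.0
    for _ in range(steps):
        value *= 1.0 + rate
    return value


def run_sweep(rates, step_counts, workers=4):
    """Run simulate() once for every combination of the input parameters."""
    grid = list(product(rates, step_counts))
    # Each task is independent, so no communication between runs is needed.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(simulate, grid))
    return dict(zip(grid, results))


sweep = run_sweep([0.0, 0.5, 1.0], [1, 2])
print(len(sweep))        # 6 parameter combinations
print(sweep[(1.0, 2)])   # 4.0
```

Because the tasks never talk to each other, communication latency between machines hardly matters. This is why such workloads suit wide-area schedulers as well as clusters.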
GCJ: Is Penguin Computing doing anything with Xen hypervisors or virtual machines from VMware ... and are you seeing any momentum for scripting languages, the "P" languages in the LAMP stack for instance?
Becker: I gave a talk just yesterday at the Beowulf users group here in the DC area on how we're using virtualization and what our future plans are for that. Virtualization is a really interesting opportunity. It allows you to completely control what you load on a remote machine. It's something that might enable machines on the Grid to guarantee consistency in remote execution. Today most machines running on Grids aren't running virtual machines where you can specify which kernel to run for which specific application.
With respect to the question about the interpretive languages, one of the reasons why you would run an interpreter is so you can handle heterogeneous hardware and heterogeneous operating systems. If instead you put the level of interpretation at the level of emulating an x86 architecture, you don't have to worry about all the library versions and the kernel versions. You just pass over the environment you would like to run in. If you're executing on a remote machine, you would be able to request a specific kernel version with specific features loaded.
The challenge here is executing something remotely and knowing that the results returned are what you expect. And there are several different ways to do it. Carefully specifying the protocol is one way. Another is going down to what kind of execution environment it is. If you have Perl or PHP, you've only partially specified the execution environment. Anyone who has updated their Perl interpreter has probably discovered that many applications no longer work -- just saying you have Perl available doesn't mean that a specific application will run. And the same is true of many of the other interpretive languages, Java included.
GCJ: Speaking of Java, are you finding anyone shying away from Java because it's too difficult to integrate into a particular system?
Becker: Java was very appealing back when there were fewer system implementations and thus fewer variations. Most people who are doing high-performance computing don't write their applications in Java. It's amusing that well over half of the serious applications are still written in FORTRAN. That was the case ten years ago and five years ago, and it might be the case five years from now. We did see an upswing in interest in Java, but most of the people writing Java are implementing Web services with low computational requirements.
GCJ: Tell us a little bit about the typical Penguin Computing customer engagement.
Becker: We have a few different kinds of customers. The first is the traditional HPC customer. They generally already have their applications, typically written in FORTRAN, frequently using MPI or a similar library. So the application and the language are already set and there is usually a site-wide scheduler in place. All they need are execution platforms. Something that I would have expected by now is that HPC users would be using machines interactively; however, most sites still use scheduler submissions. To these users, the Grid and the site scheduler are synonymous. They still want consistent execution and they know what kind of machines their applications are likely to be running on, but from the users' point of view, they're dealing primarily with submitting jobs into a site-wide scheduler that deals with where the job executes.
The second kind of customer we often deal with is one that's setting up new servers, generally to talk to the outside world. These customers are often focused on the security of their system -- they're looking for configurations that can be locked down to run a single application, yet be easily updated. So that's a category of customer that we characterize more as "hosting services." For this scenario, general purpose Grids are not ideal. I really see this as an area in which application-specific provisioning -- which can be done with Penguin's cluster architecture -- can really outshine Grids. You can expect to see Penguin Computing doing things in this area in the coming months.