Globus GRAM
One of the most fundamental requirements in Grid computing environments is the ability for applications to negotiate the usage of underlying compute resources -- for each application to have the best available resources (CPU, storage, networking, etc.) to ensure an appropriate level of performance. Even at the genesis of the Globus Project, it was clear to the Globus Toolkit (GT) community that Grid environments must include some general job startup mechanism, and a way to introspect the conditions of the Grid to facilitate application adaptation.
In Globus Toolkit Grid environments, the Globus Resource Allocation Manager (GRAM) was developed to process requests for remote application execution and to manage active jobs. GRAM became the protocol through which the administrators of each application interact with the administrators of the available resources.
GRAM is not itself a scheduler, but a standardized front-end interface to existing scheduler components such as PBS (Portable Batch System) and Platform's LSF (Load Sharing Facility). Left to their own devices, Grid applications would try to maximize their own resource usage, perhaps to the detriment of other applications' resource requirements. Grid schedulers enforce global policies that mediate between applications, and GRAM supports the communication that must take place between the different schedulers and the applications they support.
"You might think of GRAM as RPC (remote procedure call) on steroids," said Karl Czajkowski, Software Architect at Univa, and a key contributor to the Globus Toolkit since 1997. "The application requirements are sent to a specific scheduler for consideration and possible execution, as is appropriate for the target scheduler's status and operating policy."
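The front-end role described above can be illustrated with a minimal Python sketch. Everything in it is hypothetical and invented for illustration: the `JobRequest`, `ToyScheduler`, and `FrontEnd` names, the CPU-count policy, and the status strings are assumptions, not GRAM's actual API or protocol. The point is only that the front end forwards a request unchanged, while each target scheduler decides what happens to it based on its own status and operating policy.

```python
from dataclasses import dataclass
from typing import Dict, Protocol


@dataclass(frozen=True)
class JobRequest:
    """Hypothetical job description; not GRAM's actual request format."""
    executable: str
    cpus: int


class Scheduler(Protocol):
    """Anything that can consider a job and report what it did with it."""
    def consider(self, job: JobRequest) -> str: ...


class ToyScheduler:
    """In-memory stand-in for a real local scheduler such as PBS or LSF."""

    def __init__(self, name: str, free_cpus: int, max_cpus_per_job: int) -> None:
        self.name = name
        self.free_cpus = free_cpus
        self.max_cpus_per_job = max_cpus_per_job
        self._next_id = 1

    def consider(self, job: JobRequest) -> str:
        # Operating policy: refuse jobs larger than the per-job limit.
        if job.cpus > self.max_cpus_per_job:
            return "rejected: exceeds policy limit"
        # Current status: queue the job if too few CPUs are free right now.
        if job.cpus > self.free_cpus:
            return "queued"
        self.free_cpus -= job.cpus
        job_id = f"{self.name}-{self._next_id}"
        self._next_id += 1
        return f"running as {job_id}"


class FrontEnd:
    """One uniform entry point in front of several schedulers."""

    def __init__(self, schedulers: Dict[str, Scheduler]) -> None:
        self._schedulers = schedulers

    def submit(self, target: str, job: JobRequest) -> str:
        # The front end makes no scheduling decision; it only forwards.
        return self._schedulers[target].consider(job)


front_end = FrontEnd({
    "pbs": ToyScheduler("pbs", free_cpus=8, max_cpus_per_job=4),
    "lsf": ToyScheduler("lsf", free_cpus=2, max_cpus_per_job=16),
})
results = [
    front_end.submit("pbs", JobRequest("/bin/simulate", cpus=4)),
    front_end.submit("pbs", JobRequest("/bin/simulate", cpus=8)),
    front_end.submit("lsf", JobRequest("/bin/simulate", cpus=4)),
]
```

The same request shape goes to either back end, and the three submissions end up running, rejected by policy, and queued respectively, because each toy scheduler applies its own limits.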
According to Czajkowski, over the last ten years GRAM has undergone "evolutionary improvements" that mirror the new development styles and requirements of the Grid users it supports. In the early days, GRAM developers were primarily focused on writing C-based APIs that Grid developers could use directly in programs. But then users wanted to create, for example, a Java binding, or some other kind of client application that could talk to GRAM services. So the GRAM team started putting more effort into documenting protocols so that users could do that without having to use the default client-side implementation. Similarly, the requirements for file staging have changed from simplistic scenarios with a few files to very large numbers of potentially large files, and so the file-staging capabilities have evolved from simple, ad hoc mechanisms to the current use of GridFTP and Reliable File Transfer (RFT) services. While the architectural requirements have not fundamentally changed, GRAM development efforts in recent years have focused on applying the same set of patterns to different sets of technologies, as demanded by users and by the requirements of the community at large.
In the new era of SOAs and the "Web service-ification" of GT with the release of GT4.0, the engineering efforts to keep GRAM current have intensified.
"Ironically, this whole shift to Web services has made the problem harder, because it takes a lot more engineering work to try to make it fast enough than when you're just writing your own ad hoc C-protocol processing," said Czajkowski. "Tools used by the typical developer today are based on the Web services stack, making it a lot easier for someone to throw together a quick application or client. So it's much easier on their end, but we have to put a lot more effort into the service end to make it efficient, because the messages are much larger and more verbose now that they're in XML. Messages are more complex than when we just had our own compact (but proprietary) format, and the processing is a lot more robust, but also more 'CPU cycle intensive'. So we've had to do a lot of work that, in some sense, we could have avoided if we just stuck with our own custom protocols and custom protocol processing, but that wouldn't have supported the changing user requirements."
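The size trade-off Czajkowski describes can be made concrete with a small Python sketch that encodes the same job description twice: once in an ad hoc compact format and once as XML. Both encodings are assumptions invented for illustration, as are the field names and the `jobDescription` element; neither is GRAM's actual wire format, and a real Web services message would also carry a SOAP envelope and namespaces on top of this.

```python
import xml.etree.ElementTree as ET

# Hypothetical job description used for both encodings.
job = {"executable": "/bin/simulate", "count": "4", "queue": "batch"}


def encode_compact(fields):
    """Ad hoc format: key=value pairs separated by semicolons."""
    return ";".join(f"{k}={v}" for k, v in fields.items()).encode("utf-8")


def decode_compact(data):
    """Inverse of encode_compact; needs its own hand-written parser."""
    pairs = data.decode("utf-8").split(";")
    return dict(pair.split("=", 1) for pair in pairs)


def encode_xml(fields):
    """Same fields as child elements of a single root element."""
    root = ET.Element("jobDescription")
    for key, value in fields.items():
        ET.SubElement(root, key).text = value
    return ET.tostring(root, encoding="utf-8")


def decode_xml(data):
    """Inverse of encode_xml, using a standard XML parser."""
    root = ET.fromstring(data)
    return {child.tag: child.text for child in root}


compact_bytes = encode_compact(job)
xml_bytes = encode_xml(job)
# Ratio of XML size to compact size for the same information.
overhead = len(xml_bytes) / len(compact_bytes)
```

Both encodings round-trip the same fields, but the XML form repeats every field name in an opening and a closing tag, so it is larger and takes more work to parse. In exchange, any client with a standard XML toolkit can read it without the custom parser that the compact format requires.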
Does GRAM's functionality overlap with that of other enterprise open source systems management and monitoring tools? According to Czajkowski, there is a flood of open source tools that are emulating the commercially available vendor system consoles, but they have a very impoverished notion of job management and job submission. These tools are really more about turnkey deployment, helping systems administrators set up servers and providing a GUI front end to what was always a manual decision-making process involving the administrator. But GRAM is really something very different, because it's a front end to a service that's user-facing, not administrator-facing.
While Czajkowski does not believe that there is an overlap between GRAM and competing open source systems management projects, he does see a greater social challenge at the architectural level before technologies like GRAM break into mainstream enterprise production use.
"These systems have been known for a long time in the high-performance computing field, but what's been hard is getting people to start looking at their own applications as jobs," said Czajkowski. "They're so used to looking at an application as a statically deployed server component on statically allocated hardware, so it's hard for them to think about the fact that, no, really, it's just like all these other jobs -- and to accept that they can serve it dynamically in a batch system and let some smarter scheduler decide when to run it or how many resources to allocate to it, relative to the other applications that also have priority in the enterprise."
What will be the true measure of GRAM's success? According to Czajkowski, the Grid community will have succeeded when the GRAM technology is no longer needed, and schedulers adopt the Web services protocols and implement interoperable protocols themselves. "In some sense, the GRAM software stack is a transient artifact in production systems," said Czajkowski. "It's there to provide an interoperability bridge until such time as the schedulers support them natively. However, GRAM also serves a purpose as a proving ground for new job-management facilities. In this capacity, GRAM will always play a role in augmenting existing production schedulers with the additional features needed by a particular user community."