GCJ: What is GridWay?
Llorente: GridWay is a meta-scheduler for Globus grids. A meta-scheduler is a manager or supervisor of local resource managers, which control the use of individual resources such as clusters, computing farms, servers or supercomputers. The meta-scheduler is therefore a key component of a computational grid as it is responsible for optimizing the use of grid resources. In our case, the local resource managers are interfaced through the Globus toolkit services. In fact, GridWay does not require the installation or deployment of new services in the remote resources, apart from Globus services, which is a welcome advance for system and grid managers.
GridWay scheduling instances are used as building blocks of several types of grid scheduling infrastructures. A common Grid scheduling scenario is to install a single scheduling instance that provides a single entry point to the whole enterprise infrastructure. Such site-level meta-scheduling, also known as Enterprise Grid, allows managers to apply centralized usage policies and to access global reporting and accounting. Partner Grids are built as an extension of Enterprise Grids: there is one scheduling instance in each organization, and all scheduling instances compete with each other for the available resources.
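The single-entry-point idea described above can be illustrated with a minimal Python sketch. This is not GridWay code: the class names, the slot counts and the "most free slots" policy are all assumptions made for illustration, and a real meta-scheduler would reach the local resource managers through Globus services rather than through direct method calls.

```python
from dataclasses import dataclass, field


@dataclass
class LocalResourceManager:
    """Hypothetical stand-in for the local manager of one cluster."""
    name: str
    free_slots: int
    queue: list = field(default_factory=list)

    def submit(self, job: str) -> None:
        # A local manager only knows about its own resource.
        if self.free_slots <= 0:
            raise RuntimeError(f"{self.name} has no free slots")
        self.free_slots -= 1
        self.queue.append(job)


class MetaScheduler:
    """Toy meta-scheduler: one entry point over several local managers."""

    def __init__(self, managers):
        self.managers = list(managers)

    def submit(self, job: str) -> str:
        # Illustrative policy only: choose the manager with the most free slots.
        candidates = [m for m in self.managers if m.free_slots > 0]
        if not candidates:
            raise RuntimeError("no resource available")
        best = max(candidates, key=lambda m: m.free_slots)
        best.submit(job)
        return best.name


def demo():
    grid = MetaScheduler([
        LocalResourceManager("cluster-a", 2),
        LocalResourceManager("cluster-b", 1),
    ])
    # Users submit to the meta-scheduler, never to a cluster directly.
    return [grid.submit(f"job-{i}") for i in range(3)]
```

In this sketch an Enterprise Grid corresponds to one `MetaScheduler` instance for the whole organization; a Partner Grid would have one such instance per organization, each competing for the same pool of managers.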
GCJ: What is the history of the GridWay project?
Llorente: The GridWay project started in September 2002. The first releases of the meta-scheduler were developed for research purposes in adaptive and dynamic scheduling and were only distributed on request in binary format. The first open source version, GridWay 4.0, and the project website were released in January 2005. The code is currently distributed under the Apache License, version 2.0. The latest release, GridWay 5.0, is the result of the knowledge and experience gained through years of research and development and the feedback from our user community.
The GridWay project is being developed by the Distributed Systems Architecture Group at Universidad Complutense de Madrid. The project has been mainly funded by Spanish research grants. We have just started two European projects, EGEE-II and BEinGrid, where GridWay will be used for application porting and middleware integration.
GCJ: What are the goals of the project?
Llorente: GridWay is a research and development effort that seeks to advance the technology for meta-scheduling in grid environments. The research aspect of the project cannot be separated from the technological development. The latest releases of the GridWay products, mainly the GridWay Meta-scheduler, are used to perform research, and the results are incorporated as technological innovations into subsequent releases. The products are transferred to industry and scientific environments; they comply with established technology standards and complement the technologies under development by the rest of the grid community.
The goal of the GridWay Metascheduler is to enable reliable and efficient large-scale sharing of computing resources managed by different local resource managers, whether within a single organization or scattered across several administrative domains. In order to reduce the gap between Grid middleware and users, the meta-scheduling technology should give end users, application developers and managers of Globus infrastructures functionality similar to that found on local resource managers. That means that end users should be able to use the Grid in the same way as they use a local computing cluster.
GCJ: What are the benefits of the project?
Llorente: GridWay on top of Globus provides decoupling between applications and the underlying local management systems, thanks to the orchestrated integration of non-interoperable independent computational platforms, also known as vertical silos. That establishes a uniform and flexible infrastructure that achieves greater utilization of underlying resources and higher application throughput.
From the system manager's perspective, GridWay supports the existing platforms and resource managers, allocating grid resources according to management-specified policies, determining trends in usage and monitoring user behavior. From the end user's perspective, a friendly command line interface and a standard application programming interface allow users to submit, control and monitor high throughput computing applications and abstract workflows, which may require file transfers and/or database access.
GCJ: What are the major features of GridWay?
Llorente: On the scheduling side, GridWay incorporates advanced scheduling capabilities, fault detection and recovery, accounting facilities, and support for array jobs, job dependencies and new scheduling policies. On the user-interface side, GridWay provides full support for the C and Java bindings of DRMAA (Distributed Resource Management Application API), the GGF standard for the development of distributed applications, and a command line interface similar to that found in local resource managers. Finally, with regard to installation, deployment is straightforward because it does not require new services apart from those provided by the Globus Toolkit, and its modular design allows easy incorporation of new grid services and thus interoperability between different grid infrastructures (Globus WS, Globus pre-WS and EGEE).
GCJ: Is this project used in the enterprise? How?
Llorente: Yes. The companies include integration and consulting companies, and end users in industries such as technology, finance, aerospace, and life sciences. We obtain that information from download statistics, but unfortunately we do not know how, and at what level, GridWay is finally used. You know, companies do not usually provide details about their in-house deployments. From the support mailing lists, we assume that GridWay is mainly used for the deployment of Enterprise Grids, that is, to integrate computing clusters managed by different resource managers.
GCJ: How does GridWay integrate with other applications in the enterprise?
Llorente: Integration of existing applications with GridWay is straightforward as the meta-scheduler provides full support for DRMAA. DRMAA is a standard API developed by the Global Grid Forum for application builders, portal builders and ISVs. It provides a homogeneous interface to different resource managers for job submission, monitoring and control, and retrieval of finished job status. An application written against DRMAA will also be compatible with Sun N1 Grid Engine, Condor, PBSpro and other products that are adopting DRMAA. GridWay is the only grid meta-scheduler that complies with the DRMAA standard. Moreover, users of computing clusters find it very easy to migrate to grid environments managed by GridWay as the command line interface is quite similar.
On the other hand, the modular design of GridWay allows its integration, by developing new drivers, with the resource management, file management and information services available in a given infrastructure. Moreover, it could be extended or used as a building block for more complex architectures.
GCJ: Are there use cases/case studies that you could talk to?
Llorente: As an example of an Enterprise Grid, we could mention the campus grid deployed by Universidade do Porto and Sun Microsystems, whose aim is to offer high performance computing facilities to the academic and research community. They are using the Globus Toolkit and the GridWay meta-scheduler to interconnect several clusters on the university campus and are currently working on the integration of this computational environment into a pan-European grid infrastructure for e-Science. Note that GridWay is also completely functional on other grid middleware, such as EGEE, even allowing the simultaneous use of Globus and EGEE services. The infrastructure under development by the European Space Astronomy Center for scientific data processing is another good example of an Enterprise Grid based on Globus and GridWay. That is also a very interesting case because the infrastructure consists of both pre-WS and WS Globus services. As GridWay is able to use both kinds of service simultaneously, it allows a gradual migration from pre-WS to WS, and even the long-term coexistence of both.
GridWay is also used as a meta-scheduler in Partner Grids, for example in the Spanish Research Grid Infrastructure, the Croatian Grid Infrastructure, the Computational Chemistry Virtual Organization, the Sun Solution Center World Grid, and the EGEE grid infrastructure, where it serves as an application porting tool for fusion applications. We are also collaborating with one of the world's largest telecoms companies to develop a pilot Outsourced Grid, based on Globus and GridWay, to provide on-demand computing power.
GCJ: How many organizations use GridWay today? Is there a projection for future adoption/how will you get organizations to adopt?
Llorente: The information about our user community is mainly obtained from three sources: download statistics, user support and references in research papers, tutorials and presentations from research sites and consulting companies. Since January 2005, the release date of its first open-source version, the GridWay Metascheduler has been downloaded more than 500 times in 54 different countries; 25% of the downloads come from private companies and 75% from universities and research centres. Unfortunately, most users do not fill in the optional fields of the download form and, moreover, they do not provide the email address of their home institution. As a result, we cannot thoroughly assess the number of real users and their level of usage. We are currently implementing a process to improve the gathering of information about organizations that are using GridWay.
As a university research group, our aim is to perform applied research and transfer the results to industry and the scientific community. That is our way to create value: by developing leading-edge technology that incorporates our intellectual contribution to meet the demands of enterprises and research centres for grid meta-scheduling and utility computing components. We do not have the commercial and consulting staff to reach all the end customers that could be interested in GridWay, and of course that is not the aim of an academic institution. We see ourselves as a middleware provider that could reach business users with the help of consulting and integration companies.
We think that standards adoption and a healthy open source community are crucial for the adoption of grid technology. We have always tried to incorporate the latest standards, requirements and best practices into the technology. The adoption of standards generates an environment in which final users and technology providers can undertake investments with greater confidence, speeding up adoption of technology innovations and cutting down technology costs. In our particular case, we participate in several GGF working and research groups, mainly in the compute area. On the other hand, open-source communities provide long-term stability and support to the project development and a greater variety of technical visions, solutions, and features. In this sense, our participation in the Globus Incubation Process is a great opportunity to benefit from the Globus governance model and infrastructure. We are grateful to the Incubator Management Project, in particular to its current chair Jennifer Schopf, and to our Mentor, Lisa Childers, for their invaluable support.
GCJ: As GridWay has been deployed, are there any issues or problems that have been discovered? If so, how are they handled?
Llorente: Probably one of the most challenging problems that we found in large-scale partner grid infrastructures is the high fault rate and the rapidly changing resource availability and load. We have to consider that typically the grid meta-scheduler itself has no direct control over resources. The core of the metascheduler has been redesigned several times to take those conditions into account, and it now allows dynamic adaptation of job scheduling and execution, and fault detection and recovery.
Another important issue is the perception that enterprises have of grid computing. While research institutions are more interested in Partner Grids that provide access to higher computing performance to satisfy peak demands and support for collaborative projects, enterprises understand grid computing as a way to address the changing service needs in an organization. They are interested in in-house resource sharing, to achieve a better return from their information technology investment, supplemented by outsourced resources, to satisfy peak or unusual demands. The Outsourced Grid provides pay-per-use computational power when Enterprise Grid resources are overloaded. Such a hierarchical grid organization may be extended recursively to federate a higher number of Partner or Outsourced Grid infrastructures with consumer/provider relationships. We are working on the new integration components that, together with Globus and GridWay, will allow this utility model for computing services.
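The general idea of fault detection and recovery, where a failed job is rescheduled on another resource, can be sketched as follows. This is a hedged illustration in Python, not GridWay's actual mechanism: `run_with_recovery`, `flaky_execute` and the resource names are hypothetical, and failure is modelled simply as a raised exception.

```python
def run_with_recovery(job, resources, execute, max_attempts=3):
    """Try resources in order; on failure, reschedule the job on the next one.

    `execute(job, resource)` is a caller-supplied function (an assumption of
    this sketch) that returns a result or raises RuntimeError on a fault.
    """
    history = []
    for resource in resources[:max_attempts]:
        try:
            result = execute(job, resource)
            history.append((resource, "done"))
            return result, history
        except RuntimeError as err:
            # Fault detected: record it and move on to the next resource.
            history.append((resource, f"failed: {err}"))
    raise RuntimeError(f"{job} failed on all attempted resources: {history}")


def flaky_execute(job, resource):
    # Hypothetical executor: "cluster-a" is down, every other resource works.
    if resource == "cluster-a":
        raise RuntimeError("resource unreachable")
    return f"{job} finished on {resource}"


def demo():
    return run_with_recovery("job-0", ["cluster-a", "cluster-b"], flaky_execute)
```

Because the meta-scheduler has no direct control over the resources, the sketch treats every resource as something that may fail at any time and keeps the retry decision on the scheduler side.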
GCJ: What should we look for in the next few months/year? What is the roadmap?
Llorente: Well, although we think GridWay is mature enough for production use and meets most of the demands of our user community, there is a long road ahead. Besides improving the reliability, scalability and performance of the technology and extending its functionality, we will focus on incorporating the enhancements of future Globus Toolkit releases. We are also working on the integration of GridWay with other Globus projects, such as Virtual Workspaces, to allow the execution of jobs in predefined execution environments. Finally, as I said before, we are developing new integration components for the deployment of utility computing solutions. GridWay is now a community project, welcoming code and support contributions from individuals and corporations around the world.
Editor's Note:
Be sure to see the GridWay project presented as part of Topics in Grid Management at GlobusWORLD / GridWorld this month.