OGSA-DAI Opening Doors for Data and Storage on the Grid
Historically, Grids have tended to deal with "blob data" (unstructured data) very well, but have had trouble with relational data. In this transcript of an earlier Globus Consortium Journal podcast, the University of Edinburgh's Neil P. Chue Hong explains how the OGSA-DAI project is making a growing range of enterprise-class storage and data systems available to Grid environments.
GCJ: Start off by telling us a little bit about the OGSA-DAI project and how it addresses storage issues for Grid environments.
Chue Hong: In the UK, it was identified that there was this niche when it came to data, and particularly in terms of structured data resources. Six or so years ago, there was a lot of work going on around computational Grids, but there was relatively little work being done on data Grids, except for storage of files. So what we saw was that there was a huge diversity of different data resources, relational databases, people starting to use XML collections, etc. -- and there were still a lot of structured files just sitting around from various projects.
We also saw that there were increasingly large amounts of data storage available. So as hard disk prices came down, and as tape storage became easier, people were just storing more and more data. It became apparent to us that there needed to be some way of managing this data, and particularly to make it useful on the Grid.
So there was a database task force set up in the UK, and OGSA-DAI ("Data Access and Integration Services") was born out of the desire to produce Grid components that allowed you to access and integrate various data resources.
Today, OGSA-DAI provides a framework that handles all the basics: connecting to the databases, getting back information about the databases, metadata and so on. It then allows you to write simple services on top of that. Much in the same way as Photoshop allows people to produce third-party plug-ins which can then be made available to other groups, OGSA-DAI tries to promote the idea of different projects producing plug-ins for particular specialist data sources, or particular specialist pieces of functionality.
GCJ: One of the criticisms of the Grid is that it doesn't accommodate structured data very well. How does OGSA-DAI address relational databases on the Grid?
Chue Hong: OGSA-DAI treats everything as if it were a data resource, that is, something that you can query and which has a structure. In particular, it provides a uniform access point to relational databases. What we're trying to do is make it easy for you to access relational databases, whether they are local or remote, and whether they are DB2 from IBM, Oracle or Microsoft SQL Server, and essentially just provide them as relational data resources.
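[Editor's illustration] The "uniform access point" idea can be sketched in a few lines. This is not the OGSA-DAI API: the `DataResource` interface, the `SqliteResource` wrapper and the `station` table are hypothetical names invented for the example, and an in-memory SQLite database stands in for a remote DB2, Oracle or SQL Server instance. The point is only that client code talks to one interface that can be queried and can describe its own structure.

```python
import sqlite3
from typing import Protocol


class DataResource(Protocol):
    """Hypothetical uniform interface (an assumption, not OGSA-DAI's own)."""

    def describe(self) -> dict[str, list[str]]: ...
    def query(self, sql: str, params: tuple = ()) -> list[tuple]: ...


class SqliteResource:
    """Wraps one concrete backend behind the uniform interface."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn

    def describe(self) -> dict[str, list[str]]:
        # Return metadata: each table name mapped to its column names.
        tables = self._conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
        schema: dict[str, list[str]] = {}
        for (name,) in tables:
            cols = self._conn.execute(f'PRAGMA table_info("{name}")').fetchall()
            schema[name] = [col[1] for col in cols]  # col[1] is the column name
        return schema

    def query(self, sql: str, params: tuple = ()) -> list[tuple]:
        return self._conn.execute(sql, params).fetchall()


def make_demo_resource() -> SqliteResource:
    # Made-up demo data; a real deployment would wrap an existing database.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE station (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO station (name) VALUES (?)",
        [("Edinburgh",), ("Manchester",)],
    )
    return SqliteResource(conn)


def count_rows(resource: DataResource, table: str) -> int:
    # Client code depends only on the interface, never on the backend type.
    if table not in resource.describe():
        raise KeyError(table)
    return resource.query(f'SELECT COUNT(*) FROM "{table}"')[0][0]
```

A wrapper for another database product would implement the same two methods, and `count_rows` would work against it unchanged.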
GCJ: Presumably this would make Grid a lot more appealing to a broader range of commercial users?
Chue Hong: Right. Particularly, we've found the financial industry is interested in seeing how this access to relational databases can tie up with a lot of the simulations that they have running on, say, Windows platforms.
GCJ: Where else are you guys seeing traction?
Chue Hong: We've seen a number of groups from the science side pick up OGSA-DAI in things such as weather prediction. We've done some commercial projects with transportation companies. We're looking at the applicability to geographical data as well. And so it's really wherever you have existing data sets that you're looking to try and tie, quite often in interdisciplinary ways, to other data sets.
In the UK, there's also a lot of interest from the medical community and the biological community. For example, we are working with the caBIG consortium of cancer sites in the U.S. to define a data management architecture. Really, they're looking to produce a data architecture that will enable them to take the data from the individual cancer centers across the U.S. and effectively federate it in some way. So they need to be able to use the data across all the centers. What they're trying to do is produce a common schema, being able to map everything to a common structure. In essence, they're trying to do something similar to this virtual data warehousing concept that we've coined in the past.
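[Editor's illustration] Mapping everything to a common structure can be sketched as a per-source field mapping followed by a merge. This is a minimal sketch under stated assumptions, not caBIG's or OGSA-DAI's actual design: the field names, the two mappings and the `federate` helper are all hypothetical.

```python
Record = dict[str, object]

# Hypothetical mappings from each center's local field names
# to the field names of an assumed common schema.
CENTER_A_MAP = {"patient_id": "subject_id", "dx": "diagnosis"}
CENTER_B_MAP = {"pid": "subject_id", "diagnosis_text": "diagnosis"}


def to_common(record: Record, mapping: dict[str, str]) -> Record:
    """Rename mapped fields; fields outside the common schema are dropped."""
    return {
        common: record[local]
        for local, common in mapping.items()
        if local in record
    }


def federate(
    sources: list[tuple[str, list[Record], dict[str, str]]]
) -> list[Record]:
    """Present records from several centers as one list in the common schema."""
    merged: list[Record] = []
    for center, records, mapping in sources:
        for record in records:
            row = to_common(record, mapping)
            row["source"] = center  # keep provenance for each federated row
            merged.append(row)
    return merged
```

The data stays described in each center's own terms; only the mapping is shared, which is roughly what makes the result a "virtual" warehouse in this sketch.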
GCJ: How does OGSA-DAI work with the Globus Toolkit?
Chue Hong: We use GridFTP as the mechanism of choice for transferring large amounts of data. And we recently had a visit from some of the MDS developers to ensure that OGSA-DAI worked with the Monitoring and Discovery System services.
GCJ: OGSA-DAI received almost $2 million in funding a year or so ago. Can you tell us a little bit about where the project has been applying the funding, and new opportunities you plan to pursue?
Chue Hong: The funding was gained as part of a consortium, along with the Universities of Manchester and Southampton. And it's a part of the Open Middleware Infrastructure Institute. So this is similar, I guess, to the U.S.'s NMI initiative. The idea here is to create a repository, an integrated collection of useful Grid tools and applications. We're working with them to help integrate OGSA-DAI with some of the other components that are being produced by Manchester and Southampton, particularly focusing on things such as workflow and registries. We're also looking to improve things such as the performance of OGSA-DAI, working on creating more data integration tooling, and working with specific projects that we've identified, such as caBIG, to try and produce project-specific applications.
Recently, we've been planning the next major release of OGSA-DAI, due in 2007, which will be focused on addressing a number of enterprise issues such as scalability, resilience, and interoperation with existing security systems. We're also porting OGSA-DAI to a number of other middleware infrastructures in conjunction with the OMII-Europe project.
GCJ: What's your level of interaction with the vendor community? Are you working with any big systems vendors or storage vendors?
Chue Hong: The previous two projects that have developed the OGSA-DAI software were in conjunction with IBM and Oracle, two of the major database vendors, and so far we've also been in contact with other database vendors, such as Sybase, and other storage vendors. I guess ultimately what we're looking to do with OGSA-DAI is not only come up with this open source software, but also try and push the standardization process so that these kinds of Grid data interfaces become more and more prevalent on the database vendors' side of things. So you will see, for instance, hopefully DB2 with a data service interface on it at some point in the near future.