Guest Expert
Mark Green
Grid Computational Scientist
University of Buffalo

Scientific applications require specific compute resources when participating in a Grid environment. For enterprise Grid adoption in particular, the difficulty application vendors face in 'Grid-enabling' their applications has been cited as a major initial obstacle to widespread commercial adoption. But the good news is that research and science have been making great headway in techniques to more easily Grid-enable applications. This month, the Globus Consortium Journal spoke with Mark Green -- Grid Computational Scientist, University of Buffalo Center for Computational Research -- to learn more about some of the techniques that his organization has developed.

GCJ: Tell us a little bit about the GAT ("Grid-Enabling Application Template") project that SUNY Buffalo has been heading, and what it means to 'Grid-ify' applications.

Green: A GAT essentially takes research code or even a commercial code and builds a small, very simple GUI on top of it. That GUI is presented within a grid portal. So as soon as you port your application to the grid portal, you can immediately use all the backend grids that the portal has incorporated. That includes Grid 3, Open Science Grid, Open Science Grid ITB, TeraGrid, ACDC Grid, Western New York grid, and ultimately New York State Grid. So you port it once and you can use all the resources that the portal uses.
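
Editor's sketch: the "port it once, use every backend" idea can be pictured in a few lines of Python. This is only an illustration under stated assumptions, not the Buffalo portal's code. The `GridApplication` and `Portal` classes, their methods, and the example application name and command are all hypothetical; the backend names simply repeat some of the grids Green lists.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class GridApplication:
    """Hypothetical record for an application ported to the portal once."""
    name: str
    command: str


@dataclass
class Portal:
    """Hypothetical portal: one front end in front of many backend grids."""
    backends: list = field(default_factory=list)
    applications: dict = field(default_factory=dict)

    def port(self, app: GridApplication) -> None:
        # Porting happens once, at the portal, not once per backend grid.
        self.applications[app.name] = app

    def available_targets(self, app_name: str) -> list:
        # A ported application can target every backend the portal integrates.
        if app_name not in self.applications:
            raise KeyError(f"{app_name} has not been ported to the portal")
        return [(app_name, backend) for backend in self.backends]


# Backend names are taken from the interview; the application is a placeholder.
portal = Portal(backends=["Grid 3", "Open Science Grid", "TeraGrid", "ACDC Grid"])
portal.port(GridApplication(name="example-app", command="./run_example"))
targets = portal.available_targets("example-app")
```

Adding a backend to the hypothetical `Portal` would make it available to every application already ported, which is the property Green describes.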

GCJ: Who's using GAT?

Green: There are about 12 different applications that use this right now. We support the whole university here at Buffalo. There are many different science disciplines. And to give you an idea of some of the GATs that are currently incorporated, there are two earthquake engineering applications, there are some numerical methods applications, there are two quantum chemistry suites of applications, and there are several different bioinformatics and structural biology applications.

Shake-and-Bake is one application that's widely used in the community, for example. It's a structure determination program. And in order to actually determine protein structures that are much larger, there's a program called B&P, which is an interface for complete protein phasing.

Beyond Buffalo, SUNY-Albany has a couple of applications that they're working on with GAT now. And there's a group of 35 biomedical research organizations, with Columbia University as the lead institution ... they're also porting an application into the grid through our portal. We've worked closely with Cornell University on a lot of earthquake engineering simulations, and we are definitely interested in bringing Cornell onto the Grid here. So we have identified about 30 to 40 universities and institutes within the state that we are actively talking to.

GCJ: Run us through the set-up of GAT a little bit ... what's the process look like for an organization that wants to put their research application on your Grid?

Green: It starts with us interviewing an application manager. So you come to me and you say that you want to collaborate with Buffalo. And you might be from, say, Binghamton University in New York. So as a preliminary step, we interview you and ask what your application does, how it runs, what kinds of input files it takes, and what kind of output it gives. And then from that we can develop a base GAT -- a generic GAT that we copy into a specific area for you to develop on. We actually mount this on a different machine, and you get access to the PHP, and about 70,000 to 80,000 lines of code for easily porting your application.
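
Editor's sketch: the four interview questions (what the application does, how it runs, what input it takes, what output it gives) could be captured in a small record before a base GAT is copied. The Python below is a minimal illustration, assuming a hypothetical `ApplicationProfile` class with made-up field names and placeholder values; it does not reflect the actual GAT code, which Green describes as PHP.

```python
from dataclasses import dataclass, field


@dataclass
class ApplicationProfile:
    """Hypothetical record of the interview answers for one application."""
    purpose: str = ""
    run_command: str = ""
    input_files: list = field(default_factory=list)
    output_files: list = field(default_factory=list)

    def unanswered(self) -> list:
        # Names of the interview questions that still have no answer.
        return [name for name, value in vars(self).items() if not value]


# Placeholder answers; the output files have not been described yet.
profile = ApplicationProfile(
    purpose="placeholder description of the application",
    run_command="./example_app input.dat",
    input_files=["input.dat"],
)
open_questions = profile.unanswered()
```

In this sketch, `open_questions` lists whatever still has to be asked before a template could be filled in.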

GCJ: Are there any types of applications that cannot be transformed for the environment?

Green: Interactive applications that need input during the execution are very tough to port to the Grid. But otherwise no.

GCJ: One pain that's often cited by enterprise developers is how difficult it can be to move an application from development into production. Is it similarly difficult to move a Grid application from development into production?

Green: The way we architected our portal at Buffalo -- there are very distinct interfaces between what's exposed to the user community and what is not exposed to them as far as all the backend infrastructure. That's why it's very easy for us to put half a dozen portals online in front of the backend infrastructure. And any one of those will use the same backend infrastructure. So we could have many development portals, if we wanted, and they'll seamlessly migrate to production portals.

GCJ: So your development environment looks almost exactly the same as the production?

Green: It's exactly the same, right down to all of the database tables. So obviously, the way the portal works is that it uses a main database to hold its state.

It was clearly important for us to set it up this way, so that periodically, maybe once a quarter, we can push applications that are in the developmental portal out to the production portal. And that being said, there are new developments going on in the portal all the time as far as the data grid. So the data grid is tightly coupled with the computational grid. We have the ability here to push out dynamic storage elements to any of the grid resources that are integrated on the backend. So it doesn't matter whether they're TeraGrid, OSG, or whatever ... it doesn't matter. We can push out our own storage elements to these resources. And it gives the user the ability to manage that storage element, where it won't impact anyone else within the portal at all. So it's very specific space within a storage element that we dynamically push out.

GCJ: Where does Globus fit into your efforts?

Green: Globus is integral to what we do here. We're an entirely Globus-based shop here. And right now we're evaluating GT4 GRAM with Web services.

GCJ: Are you currently running pre-GT4 GRAM?

Green: Yes, but not for long. Lots of the other Open Science Grid folks have pre-GT4 in place too. TeraGrid's using GT2.2, for example. We were running GT2 and now we're running GT3... but we plan to move to GT4. Internally, we have machines that are large enough to handle the load created by the old GT2 or GT3 GRAMs. So there was no need to move forward on GT4 GRAM before the rest of our major collaborators did.

And quite frankly, you need a little breathing time to actually build infrastructure. It's nice to have a little bit of breathing room in between so that you can actually see where you should be going, because it's no good to put out infrastructure without really putting it into production and testing it under load.
