Guest Experts
Steve Moore, Arnie Miles and Chad La Joie
Advanced Research Computing Team
Georgetown University

Georgetown University Talks Grid Security in Medical Research

Five years ago, Georgetown University, home to numerous leading medical research facilities such as the Lombardi Comprehensive Cancer Center, did not have a core computational facility. In a short time, its Advanced Research Computing (ARC) team has not only built a shared computing infrastructure for Georgetown researchers but has also become one of the leading contributors to the National Cancer Institute's caBIG collaborative research Grid project.

Getting Georgetown on an accelerated track to Grid computing has been a challenge, according to Steve Moore, ARC's Program Director.

"When we started to get into the Grid space, we knew that we had to make some quantum leaps," said Moore. "Georgetown wasn't known for Grid technology, yet our researchers needed HPC and computational resources."

Georgetown's shared computing facility got its start when Arnie Miles, Senior Systems Administrator/Architect, and ARC associate Dr. Woonki Chung stood up some Beowulf clusters and began working with Globus to support the Physics and Chemistry departments. Today, ARC maintains eight clusters for Georgetown, with a combined 380 CPUs. Basic statistics and performance figures for these clusters are available on the ARC Web site.

Grid Security Not So Easy to Scale Quickly

But as researchers quickly piled on to use the Computational Core Facility's Beowulf clusters and workload management tools, Miles and the ARC team soon recognized that their security approach would not scale.

"Every time we'd stand up a cluster, it would have its own user base," said Miles. "We were using local accounts and creating tiny administrative domains. We quickly realized that while [Condor] has functionality to span these Beowulf clusters, its security was all IP- or host-database-based, and it was too labor-intensive. And after the administrative configuration was complete, it didn't meet our security requirements."

Georgetown tackled this scaling problem by hiring an identity management expert to work with the ARC team. Chad La Joie had previously spent years developing and managing an identity management infrastructure at Virginia Tech. Working with the ARC team, he saw an opportunity to apply Shibboleth in the healthcare arena.

"We saw Shibboleth's potential for importing and making available to the Grid the identities, attributes, and credentials of the researchers participating in the Grid at Georgetown," said La Joie. "Shibboleth is a way to make it easier for them to participate -- not having to know about all of the certificates or deal with all of those issues."

Shibboleth's trust fabric is built on SAML 2.0 metadata. The metadata describes each service provider and each identity provider in a federation, along with all of their PKI information, so it forms a simple basis for the necessary trust and makes trust negotiation between providers much simpler. When a service provider connects to an identity provider, certificates are passed and verified. The response that comes back is an XML document that the identity provider digitally signs with its private key, and the service provider verifies that signature using the public key published in the metadata.
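The sign-and-verify flow described above can be sketched in a few lines of Python. This is a toy model under loudly stated assumptions, not Shibboleth's implementation: the "metadata" is a plain dictionary mapping a hypothetical entity ID to a public key, the response is a made-up XML snippet, and the signature is textbook RSA over well-known Mersenne primes with no padding, which is completely insecure. Real deployments use XML Signature with X.509 certificates. What the sketch does show is the shape of the trust decision: a service provider accepts a response only if its issuer appears in the metadata and the signature verifies against the key published there.

```python
import hashlib
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# WARNING: textbook RSA with well-known primes and no padding.
# Illustration only; completely insecure. Real deployments use
# XML Signature (XML-DSig) with X.509 keys via a proper crypto library.


@dataclass(frozen=True)
class PublicKey:
    n: int
    e: int


@dataclass(frozen=True)
class PrivateKey:
    n: int
    d: int


def make_toy_keypair(p: int, q: int, e: int = 65537):
    """Derive an RSA key pair from two primes (toy sizes only)."""
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))  # modular inverse, Python 3.8+
    return PublicKey(n, e), PrivateKey(n, d)


def digest(xml_text: str, n: int) -> int:
    """Canonicalize the XML (C14N 2.0) so formatting differences inside
    tags don't matter, then hash it and reduce the hash modulo n."""
    canonical = ET.canonicalize(xml_data=xml_text)
    h = hashlib.sha256(canonical.encode("utf-8")).digest()
    return int.from_bytes(h, "big") % n


def sign(xml_text: str, key: PrivateKey) -> int:
    """Identity provider side: sign the response with its private key."""
    return pow(digest(xml_text, key.n), key.d, key.n)


def verify(xml_text: str, signature: int, issuer: str, metadata: dict) -> bool:
    """Service provider side: trust the response only if the issuer is in
    the federation metadata and the signature checks out against the
    public key published there."""
    key = metadata.get(issuer)
    if key is None:
        return False  # unknown entity: outside the trust fabric
    return pow(signature, key.e, key.n) == digest(xml_text, key.n)


# Hypothetical federation metadata: entity ID -> published public key.
IDP = "https://idp.georgetown.example/shibboleth"
idp_public, idp_private = make_toy_keypair(2**61 - 1, 2**89 - 1)
metadata = {IDP: idp_public}

response = ('<Response Issuer="https://idp.georgetown.example/shibboleth">'
            '<NameID>jdoe</NameID></Response>')
sig = sign(response, idp_private)
print(verify(response, sig, IDP, metadata))  # True
print(verify(response.replace("jdoe", "admin"), sig, IDP, metadata))  # False: altered
print(verify(response, sig, "https://idp.unknown.example/shibboleth", metadata))  # False
```

Canonicalizing before hashing mirrors, in miniature, why XML Signature specifies canonicalization: two serializations of the same document should verify identically.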

According to La Joie, it's the crossing of administrative boundaries that distinguishes Grid security from plain old security in a distributed computing world.

"In Grids, your services and your people live in different administrative domains," said La Joie. "So there's a whole issue of how do I know to trust this other service, how does this other service know that these credentials that I'm passing really come from Georgetown and are valid? So there's all this extra policy and technology that has to be in place to negotiate and verify, in a legal sense, that this person / service / entity is what it says it is ... and is acting on behalf of what it's saying it's acting on behalf of."

Georgetown expects Shibboleth to make it much easier to provision systems with proper security.

"Once you have this fabric set up, we envision a systems administrator who wants to stand up one, 100, or even 10,000 computational resources being able to go in and relatively quickly set up a machine to allow specific users, groups, or federations to access the resource," said Miles. "Even if the users are not on my campus, even if they're at Purdue and want to access Georgetown Grid resources. I could say that anybody in the federation is allowed to access my machine, and if there are ten people at Purdue who are in the federation and one of them quits, I don't have to know about it. It's handled on Purdue's end, and becomes automatic to the federation."

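Miles's scenario amounts to attribute-based authorization: the resource trusts the federation, and each home institution vouches for its own users. The Python sketch below is a hypothetical model of that division of labor, not Shibboleth's configuration syntax. The entity IDs and `.example` domains are made up, and `eduPersonScopedAffiliation` is used as a plausible example of a federation attribute; a real deployment would also check that the attribute's scope is one the issuing identity provider is authorized to assert. The key point is that the resource's rule never changes: when a user leaves Purdue, Purdue's identity provider stops vouching for them and access ends automatically.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class IdentityProvider:
    """A member institution's identity provider (hypothetical). User status
    is maintained here, at the user's home organization."""
    entity_id: str
    scope: str
    active_users: set = field(default_factory=set)

    def assert_attributes(self, user: str) -> Optional[dict]:
        """Release attributes only for current members; a departed user
        simply gets no assertion."""
        if user not in self.active_users:
            return None
        return {
            "issuer": self.entity_id,
            "eduPersonPrincipalName": f"{user}@{self.scope}",
            "eduPersonScopedAffiliation": f"member@{self.scope}",
        }


@dataclass
class ResourcePolicy:
    """Rule configured once by the resource's administrator:
    'anybody in the federation is allowed to access my machine'."""
    federation: set  # entity IDs of the identity providers in the federation
    allowed_affiliations: set = field(default_factory=lambda: {"member"})

    def permits(self, attrs: Optional[dict]) -> bool:
        if attrs is None or attrs.get("issuer") not in self.federation:
            return False
        affiliation = attrs.get("eduPersonScopedAffiliation", "").split("@", 1)[0]
        return affiliation in self.allowed_affiliations


purdue = IdentityProvider("https://idp.purdue.example/shibboleth",
                          "purdue.example", {"alice", "bob"})
policy = ResourcePolicy(federation={purdue.entity_id})

print(policy.permits(purdue.assert_attributes("alice")))  # True
purdue.active_users.discard("alice")  # alice leaves; handled on Purdue's end
print(policy.permits(purdue.assert_attributes("alice")))  # False
print(policy.permits(purdue.assert_attributes("bob")))    # True
```

Note that the Georgetown-side `policy` object is never modified after it is created.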
Security Standards Evolving Fast in Grid

Beyond the SAML 2.0 specification, Georgetown sees a lot of activity today around the WS-* specifications in Grid security environments.

"The SAML 2.0 spec is really the core spec for passing around identity information, but laid on top of that is the need to be able to secure and protect that when you're transporting it between various Web services," said La Joie. "And that's really where you see the WS-* specs come in: WS-Security, Liberty WS Messaging, that sort of thing."

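To illustrate the layering La Joie describes, here is a rough Python sketch, using only the standard library, that places a SAML 2.0 assertion inside the WS-Security header of a SOAP envelope, in the general manner of the OASIS WS-Security SAML Token Profile. The namespace URIs are the published ones for SOAP 1.1, WS-Security 1.0 and SAML 2.0 assertions; everything else (the issuer, the user name, the `QueryGridService` operation, the `ID` and `IssueInstant` values) is a hypothetical placeholder. The sketch shows only where each layer lives. It does not sign or encrypt anything, which is precisely the protection WS-Security exists to add.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1
WSSE_NS = ("http://docs.oasis-open.org/wss/2004/01/"
           "oasis-200401-wss-wssecurity-secext-1.0.xsd")  # WS-Security 1.0
SAML_NS = "urn:oasis:names:tc:SAML:2.0:assertion"  # SAML 2.0 assertions

for prefix, uri in (("soap", SOAP_NS), ("wsse", WSSE_NS), ("saml", SAML_NS)):
    ET.register_namespace(prefix, uri)


def q(ns: str, tag: str) -> str:
    """Qualified name in ElementTree's {namespace}tag notation."""
    return f"{{{ns}}}{tag}"


def wrap_in_soap(issuer: str, name_id: str, payload: ET.Element) -> ET.Element:
    """Carry a SAML 2.0 assertion in the WS-Security header of a SOAP
    envelope, with the actual service request in the body.
    The assertion is UNSIGNED and minimal; ID and IssueInstant are
    placeholder values."""
    envelope = ET.Element(q(SOAP_NS, "Envelope"))
    header = ET.SubElement(envelope, q(SOAP_NS, "Header"))
    security = ET.SubElement(header, q(WSSE_NS, "Security"))
    assertion = ET.SubElement(security, q(SAML_NS, "Assertion"), {
        "Version": "2.0",
        "ID": "_placeholder-id",
        "IssueInstant": "2000-01-01T00:00:00Z",
    })
    ET.SubElement(assertion, q(SAML_NS, "Issuer")).text = issuer
    subject = ET.SubElement(assertion, q(SAML_NS, "Subject"))
    ET.SubElement(subject, q(SAML_NS, "NameID")).text = name_id
    body = ET.SubElement(envelope, q(SOAP_NS, "Body"))
    body.append(payload)
    return envelope


request = ET.Element("QueryGridService")  # hypothetical service operation
envelope = wrap_in_soap("https://idp.georgetown.example/shibboleth", "jdoe", request)
print(ET.tostring(envelope, encoding="unicode"))
```

Keeping identity in the header and the request in the body is what lets a security layer sign or encrypt the assertion without touching the application payload.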
But while the WS-* specs plug some specific security holes, they aren't all up to snuff yet, says La Joie.

"If you've ever been masochistic enough to read the SOAP specification, it actually says security is wholly outside of the realm and the scope of this spec. So people have done very proprietary things that aren't very interoperable, and what we're seeing now is these specifications coming along and saying this stuff really does need to be interoperable now. And you're seeing the first attempts to do that. Some of them, I think, are good attempts. WS-Security is pretty decent, for example. But some of the other ones really do need to go through another iteration before they get it right."

Georgetown Contributions to the caGRID Project

In November, The Globus Consortium Journal ran a Q&A with Peter Covitz, Director of Infrastructure for the NCI Center for Bioinformatics and leader of the caBIG (Cancer Biomedical Informatics Grid) effort. caBIG aims to integrate and share heterogeneous data types and to make large-scale collaboration easier for medical researchers.

As caGRID (the Grid infrastructure that supports caBIG's 1,000+ Grid services) takes off, Georgetown has been playing an enthusiastic role in the effort.

"As caBIG set out to stand up solutions to help researchers share data, they realized very early on that the semantics issues were huge," said Miles. "While speaking with the various cancer research centers, they started to understand and support 'Grid' technology -- and Georgetown leveraged funding opportunities, because of our own early experiences."

As a result, the National Cancer Institute (NCI) sponsored Georgetown's Lombardi Cancer Center for integrated cancer research and clinical trials work (Principal Investigator: Dr. Robert Clarke). NCI also sponsored Georgetown's ARC group to collaborate on the cross-cutting architecture component.

Georgetown's GridsWatch Project

To help other organizations building Grids better understand the landscape, the ARC team launched GridsWatch late last year: a portal that tracks interesting news, projects and personalities in the Grid community. Some parts of the site are still in development (the discussion forums, for example, are not yet live). Other parts, however, are already extremely useful. The "People" section is an extensive list of Grid projects and the leaders running them. Compiling this kind of information from scratch would be daunting, so GridsWatch gives organizations embarking on Grid projects a helpful starting point that spares them from reinventing the wheel.

"We thought we'd just go for it and try to build a central resource that helps build relationships, helps to track and summate the emerging technologies, and to include breaking news, and encourage training," said Moore. "That was the genesis of GridsWatch, and so far it's been received very favorably. People are registering and they're giving us ideas. So we hope to continue to grow this as a central resource, a vendor-neutral place where people can track, discuss, and influence where Grid is going."
