Georgetown University Talks Grid Security in Medical Research
Five years ago, Georgetown University -- home to numerous leading medical research facilities, such as the Lombardi Comprehensive Cancer Center -- did not have a core computational facility. But in a short while, its Advanced Research Computing (ARC)
team has not only created a shared computing infrastructure for
Georgetown researchers -- it has also become one of the leading contributors
to the National Cancer Institute's caBIG collaborative research Grid project.
Getting Georgetown on an accelerated track to Grid computing has been a challenge, said ARC Program Director Steve Moore.
"When we started to get into the Grid space, we knew that we had to
make some quantum leaps," said Moore.
"Georgetown wasn't known for Grid technology - yet our researchers
needed HPC and computational resources."
Georgetown's shared computing facility kicked off when Arnie Miles,
Senior Systems Admin/Architect, and ARC associate Dr. Woonki Chung
stood up some Beowulf
clusters and started working with Globus to support the Depts. of
Physics and Chemistry. Today, ARC maintains eight clusters (for a total
computational resource of 380 CPUs) for Georgetown. To review the
basic stats and performance of these clusters, check out the ARC Web site.
Grid Security Not so Easy to Scale Quickly
But as researchers quickly piled on to use the Computational Core
Facility's Beowulf clusters and workload management tools, Miles and
the ARC team immediately started to recognize scale problems for
security.
"Every time we'd stand up a cluster, it would have its own user
base," said Miles. "We were using local accounts, and creating tiny
administrative domains. And we quickly realized that while [Condor]
has functionality to span across these Beowulf clusters -- it was all
IP or host database security, and it was too labor intensive. And after
administrative configurations were complete, it didn't meet our
security requirements."
Georgetown tackled the security scale issue by hiring an identity
management expert to work with the ARC team. Chad La Joie had
previously spent years developing and managing an identity management
infrastructure at Virginia Tech. With the ARC team, he saw an
opportunity to apply Shibboleth in the healthcare arena.
"We saw Shibboleth's potential for importing and making available to
the Grid the identities, attributes, and credentials of the researchers
participating in the Grid at Georgetown," said La Joie. "Shibboleth is
a way to make it easier for them to participate -- not having to know
about all of the certificates or deal with all of those issues."
Shibboleth has a trust fabric mechanism based on the SAML 2.0 metadata file. It's a public key infrastructure
that allows for simpler trust negotiation with a service provider. As
it goes to connect to the identity provider, certificates get passed
and verified. The response that gets sent back is an XML document which
is digitally signed, again using the public and private key pairs to
sign and verify the data. And this metadata describes each service
provider, each identity provider, and all of their PKI information, and
so it really forms a simple basis for the necessary trust.
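To make the role of the metadata file concrete, here is a minimal sketch in Python of how a participant might read SAML 2.0 metadata into a lookup table of known providers and their certificates. It is an illustration under stated assumptions, not Georgetown's or Shibboleth's actual code: the entity IDs, the placeholder certificate strings, and the helper names `load_trust_table` and `is_known_peer` are all hypothetical, and the metadata is trimmed to the elements the sketch reads. Verifying the XML digital signature itself is left out, since that needs a cryptography library.

```python
import xml.etree.ElementTree as ET

# XML namespaces defined by the SAML 2.0 metadata and XML Signature specs.
NS = {
    "md": "urn:oasis:names:tc:SAML:2.0:metadata",
    "ds": "http://www.w3.org/2000/09/xmldsig#",
}

# Hypothetical, trimmed federation metadata. Entity IDs and certificate
# bodies are placeholders, not real values.
METADATA = """\
<md:EntitiesDescriptor xmlns:md="urn:oasis:names:tc:SAML:2.0:metadata"
                       xmlns:ds="http://www.w3.org/2000/09/xmldsig#">
  <md:EntityDescriptor entityID="https://idp.example.edu/shibboleth">
    <md:IDPSSODescriptor protocolSupportEnumeration="urn:oasis:names:tc:SAML:2.0:protocol">
      <md:KeyDescriptor use="signing">
        <ds:KeyInfo><ds:X509Data>
          <ds:X509Certificate>PLACEHOLDER-IDP-CERT</ds:X509Certificate>
        </ds:X509Data></ds:KeyInfo>
      </md:KeyDescriptor>
    </md:IDPSSODescriptor>
  </md:EntityDescriptor>
  <md:EntityDescriptor entityID="https://grid.example.org/sp">
    <md:SPSSODescriptor protocolSupportEnumeration="urn:oasis:names:tc:SAML:2.0:protocol">
      <md:KeyDescriptor use="signing">
        <ds:KeyInfo><ds:X509Data>
          <ds:X509Certificate>PLACEHOLDER-SP-CERT</ds:X509Certificate>
        </ds:X509Data></ds:KeyInfo>
      </md:KeyDescriptor>
    </md:SPSSODescriptor>
  </md:EntityDescriptor>
</md:EntitiesDescriptor>
"""


def load_trust_table(metadata_xml):
    """Map each entityID to its role and the certificates listed for it."""
    root = ET.fromstring(metadata_xml)
    table = {}
    for entity in root.findall("md:EntityDescriptor", NS):
        if entity.find("md:IDPSSODescriptor", NS) is not None:
            role = "identity provider"
        elif entity.find("md:SPSSODescriptor", NS) is not None:
            role = "service provider"
        else:
            role = "unknown"
        certs = [
            cert.text.strip()
            for cert in entity.findall(".//ds:X509Certificate", NS)
            if cert.text
        ]
        table[entity.get("entityID")] = {"role": role, "certs": certs}
    return table


def is_known_peer(table, entity_id, presented_cert):
    """Simplified stand-in for trust negotiation: the peer must appear in the
    metadata and present a certificate listed there. A real deployment would
    also verify the XML signature with that certificate's public key."""
    entry = table.get(entity_id)
    return entry is not None and presented_cert in entry["certs"]


trust = load_trust_table(METADATA)
print(trust["https://idp.example.edu/shibboleth"]["role"])
print(is_known_peer(trust, "https://grid.example.org/sp", "PLACEHOLDER-SP-CERT"))
```

The point of the sketch is that both sides consult the same metadata, so neither has to exchange certificates by hand with every new partner.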
According to La Joie, it's the crossing of administrative boundaries
that makes Grid security different from plain old security in a
distributed computing world.
"In Grids, your services and your people live in different
administrative domains," said La Joie. "So there's a whole issue of how
do I know to trust this other service, how does this other service know
that these credentials that I'm passing really come from Georgetown and
are valid? So there's all this extra policy and technology that has to
be in place to negotiate and verify, in a legal sense, that this person
/ service / entity is what it says it is ... and is acting on behalf of
what it's saying it's acting on behalf of."
Georgetown sees Shibboleth making it much easier to provision systems with proper security.
"Once you have this fabric set up, we envisioned a systems
administrator who wants to stand up one or 100 or even 10,000
computational resources can go in and relatively quickly set up a
machine to allow specific users, groups, or federations to access the
resource," said Miles. "Even if the users are not on my campus, even if
they're at Purdue and want to access Georgetown Grid resources. I could
say that anybody in the federation is allowed to access my machine --
and if there are ten people at Purdue who are in the federation and one
of them quits, I don't have to know about it. It's handled on Purdue's
end, and becomes automatic to the federation."
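The scenario Miles describes can be sketched in a few lines of Python. This is a toy model under stated assumptions rather than how Shibboleth or any Grid middleware is implemented: the class names, the federation name "example-federation", and the user names are all hypothetical, and group-based rules are left out for brevity. It shows only the division of labor, in which the resource owner names a federation once and each home institution decides who is still a valid member.

```python
from dataclasses import dataclass, field


@dataclass
class IdentityProvider:
    """A home institution that vouches for its own current users."""
    name: str
    federation: str
    active_users: set = field(default_factory=set)

    def assert_identity(self, user):
        """Return an assertion for a current user, or None if the user is gone."""
        if user in self.active_users:
            return {"user": user, "issuer": self.name, "federation": self.federation}
        return None


@dataclass
class Resource:
    """A computational resource with a federation-level access rule."""
    name: str
    allowed_federations: set = field(default_factory=set)
    allowed_users: set = field(default_factory=set)

    def permits(self, assertion):
        """Allow access if the assertion names an allowed federation or user."""
        if assertion is None:
            return False
        return (
            assertion["federation"] in self.allowed_federations
            or assertion["user"] in self.allowed_users
        )


# Hypothetical setup: a remote campus in the same federation as the resource.
purdue = IdentityProvider("purdue.example", "example-federation", {"alice", "bob"})
cluster = Resource("georgetown-cluster", allowed_federations={"example-federation"})

before = cluster.permits(purdue.assert_identity("alice"))

# The user leaves; only the home institution's records change.
purdue.active_users.discard("alice")
after = cluster.permits(purdue.assert_identity("alice"))

print(before, after)
```

Nothing on the resource side changes between the two checks. The departure is recorded only at the home institution, which is the property Miles highlights.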
Security Standards Evolving Fast in Grid
In addition to the SAML 2.0 spec, Georgetown sees a lot of activity today with the WS-* specs in Grid security environments.
"The SAML 2.0 spec is really the core spec for passing around
identity information, but laid on top of that is the need to be able to
secure and protect that when you're transporting it between various Web
services," said La Joie. "And that's really where you see the WS-*
specs come in - WS-Security, Liberty WS Messaging, that sort of thing."
But while WS provides some much needed plugs for specific security
holes, the specs aren't necessarily up to snuff yet, says La Joie.
"If you've ever been masochistic enough to read the SOAP specification, it actually says security is wholly outside of the realm and the scope
of this spec. So people have done very proprietary things, they're not
very interoperable, and what we're seeing now is these specifications
come and say this stuff really does need to be interoperable now. And
you're seeing the first attempts to do that. Some of them, I think, are
good attempts. WS-Security is pretty decent, for example. But
some of the other ones really do need to go through another iteration
before they get it right."
Georgetown Contributions to caGRID project
In November, The Globus Consortium Journal ran a Q&A
with Peter Covitz, Director of Infrastructure for NCI Center of
Bioinformatics -- and leader of the caBIG (Cancer Biomedical
Informatics Grid) effort. caBIG is the effort to integrate and share
heterogeneous data types -- and make it easier for mass scale
collaboration for medical researchers.
As caGRID (the actual Grid infrastructure that supports caBIG's
1,000+ Grid services) takes off, Georgetown has played an enthusiastic
role in the effort.
"As caBIG set out to stand up solutions to help researchers share
data, they realized very early on that the semantics issues were huge,"
said Miles. "While speaking with the various cancer research centers,
they started to understand and support 'Grid' technology -- and
Georgetown leveraged funding opportunities, because of our own early
experiences."
As a result, the National Cancer Institute (NCI) sponsored
Georgetown's Lombardi Cancer Center for integrated cancer research and
clinical trials (Principal Investigator, Dr. Robert Clarke). NCI also
sponsored Georgetown's ARC Group to collaborate on the cross-cutting
architecture component.
Georgetown's GridsWatch Project
In an effort to help other organizations that are building Grids
better understand the landscape, late last year the ARC team kicked off
GridsWatch -- a portal of
information that tracks interesting news, projects and personalities in
the Grid community. Some parts of the site are still in development
(the "discussion forums," for example, are not yet on). Other parts,
however, are extremely useful. The "People"
section is a massive list of Grid projects and the associated leaders
running those efforts. To dig up this type of research from scratch
would be daunting -- so GridsWatch has provided a helpful starting
point that will help other organizations embarking on Grid to avoid
re-inventing the wheel.
"We thought we'd just go for it and try to build a central resource
that helps build relationships, helps to track and summate the emerging
technologies -- and to include breaking news, and encourage training,"
said Moore. "That was the genesis of GridsWatch, and so far it's been
received very favorably. People are registering and they're giving us
ideas. So we hope to continue to grow this as a central resource -- a
vendor-neutral place where people can track, discuss, and influence
where grid is going."