Guest Expert
William Johnston
Senior Scientist and Manager
US DOE, Energy Sciences Network

As one of the original founders of the Global Grid Forum (GGF) and current head of ESnet, the Department of Energy's national network, Mr. Bill Johnston of Lawrence Berkeley National Laboratory has witnessed firsthand the evolution of Grid computing in the science and research world. In a recent interview with the Globus Consortium Journal, Mr. Johnston shares his views on some of the security and networking projects that he's affiliated with today.

GCJ: Tell us what you're up to these days...

Mr. Johnston: I'm running the DOE's national network, the ESnet. Among other efforts, we run the DOEGrids Certificate Service - the PKI authentication infrastructure that many of today's largest U.S. Grid projects use. We provide identity certificates for not only all the DOE Grid projects, but several NSF projects as well. And, in fact, for some CERN Grid projects.

GCJ: How does the DOEGrids Certificate Service work?

Mr. Johnston: A potential Grid user must present a token - a cryptographically signed document that says who they are. When they present that token, the people who are accepting it check back with the certification authority to find out whether it really issued that certificate to that user. Who gets certificates is structured around how the various virtual organizations (VOs) are allowed to access a particular resource.

There's a common set of policies for the CA, in terms of what kind of identity is required. But every VO has its own appendix, in which they describe how they will vet people that they will authorize issuing certificates for. So there's a common set of policies that everyone has to accept, and then there are some VO-specific policies that the individual certificate authorities set as well.

Public key infrastructure (PKI) is the authentication mechanism for the Globus software - and almost all Grids use a PKI of some sort. PKI is also widely used in the federal government. For example, there's a new PIV (personal identity verification) standard that's coming out, covered under HSPD-12, that will require all federal employees to carry PIV tokens. And a PIV token is nothing but a token that contains a public key certificate, an encrypted private key, and some biometric information.

GCJ: What does the infrastructure that supports the DOEGrids Certificate Service look like?

Mr. Johnston: The certification authorities that issue public key certificates are a very formalized, stylized sort of thing. ESnet runs what's called a root CA. The root CA signs the certificates for the subordinate CAs, and the subordinate CAs are the ones that actually issue certificates to users. And these are machines that are on their own subnets. They have firewalls protecting them, intrusion detection systems, etc. They're kept in locked racks that have alarms on them, with physical intrusion alarms, secured machine rooms, and so forth.

The root CA is never used online. The root CA that signs the identity certificates for the issuing CAs is kept in a media vault that requires two people to show up simultaneously in order to get into the vault to check out the root CA. The certificate of the signing CA (that is, the subordinate CA) that needs to be signed is carried on a floppy to the root CA, which then signs the certificate. The root CA is then shut down and put back in the vault. And that's a typical procedure for root CAs. Root CAs are never online. So all of this is part of the infrastructure that's necessary to issue certificates.
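The root-to-subordinate-to-user signing relationship described above can be sketched in a few lines of Python. This is only an illustration of the trust chain, not how DOEGrids or any real CA is implemented: it uses unpadded "textbook" RSA with toy keys built from small Mersenne primes, which is not secure, and the names `Cert`, `make_key`, `sign`, `signed_by` and `chain_is_valid`, as well as the subject names, are hypothetical.

```python
import hashlib
from dataclasses import dataclass

E = 65537  # widely used RSA public exponent


def make_key(p, q):
    """Return (n, d) for textbook RSA from two primes. Toy keys, NOT secure."""
    n = p * q
    d = pow(E, -1, (p - 1) * (q - 1))  # modular inverse, Python 3.8+
    return n, d


def digest(data, n):
    """SHA-256 of the data, reduced into the signer's modulus."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n


@dataclass
class Cert:
    subject: str
    public_n: int   # the subject's public modulus
    issuer: str
    signature: int = 0

    def body(self):
        return f"{self.subject}|{self.public_n}|{self.issuer}".encode()


def sign(cert, issuer_n, issuer_d):
    """The issuer signs the certificate body with its private exponent."""
    cert.signature = pow(digest(cert.body(), issuer_n), issuer_d, issuer_n)
    return cert


def signed_by(cert, issuer_n):
    """Check the signature using only the issuer's public key."""
    return pow(cert.signature, E, issuer_n) == digest(cert.body(), issuer_n)


def chain_is_valid(chain, trusted_root_n):
    """chain runs from the user certificate up to the self-signed root."""
    for cert, issuer in zip(chain, chain[1:]):
        if cert.issuer != issuer.subject or not signed_by(cert, issuer.public_n):
            return False
    root = chain[-1]
    return root.public_n == trusted_root_n and signed_by(root, root.public_n)


# Offline step: the root key signs itself and the subordinate (issuing) CA.
root_n, root_d = make_key(2**61 - 1, 2**89 - 1)
sub_n, sub_d = make_key(2**107 - 1, 2**127 - 1)
root = sign(Cert("Root CA", root_n, "Root CA"), root_n, root_d)
sub = sign(Cert("Issuing CA", sub_n, "Root CA"), root_n, root_d)

# Online step: only the subordinate CA's key is needed to issue user certificates.
user_n = (2**13 - 1) * (2**17 - 1)
user = sign(Cert("Grid User", user_n, "Issuing CA"), sub_n, sub_d)

print(chain_is_valid([user, sub, root], root_n))
```

The point of the design is visible in the last few lines: once the subordinate CA's certificate has been signed, the root's private key (`root_d` here) is not needed again for day-to-day issuing, which is why it can stay in the vault.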

Most Grid communities have their own authorization infrastructure. In the physics community, for example, they use one called VOMS, the Virtual Organization Membership Service. So the scope of their Grid is, in effect, defined by the people they authorize to use their services.

Will we ever have a Grid that requires no authorization? No, of course not. Services aren't free. I got asked this question a lot when I was building the NASA Grid, and the answer is, it's a meaningless question. And the reason it's a meaningless question is you can say that we have a universal automobile infrastructure, but that doesn't mean that everybody gets to drive your car. So yes, someday we'll have a universal Grid infrastructure, but that doesn't mean that everybody gets to use your Grid. (However, Jack Dongarra's NetSolve/GridSolve is arguably a counterexample to this argument.)

GCJ: So tell us about some of the interesting networking efforts that you're working on that have significance to Grid computing...

Mr. Johnston: We're currently working on a new network service that provides guaranteed bandwidth virtual circuits. You can read more about ESnet's "On-Demand Secure Circuits and Advance Reservation System" (OSCARS) here: http://www.es.net/oscars/index.html

But in a nutshell, virtual circuits provide guaranteed bandwidth typically between two sites or two hosts. So it's a virtual circuit in the sense that you're not mixing your traffic with the commodity Internet traffic. Your traffic is isolated in a virtual way. This is standard Internet stuff. Virtual circuits are generally produced by a technology called MPLS, multiprotocol label switching. But you can also construct virtual circuits with VLANs or optical channels. The virtual circuits are set up through a reservation mechanism - you can very specifically define who gets to use what parts of the network, and in what way.
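To make the reservation idea concrete, here is a minimal sketch of an admission check for advance bandwidth reservations on a single link. It is not the OSCARS interface or algorithm; `Reservation`, `can_admit`, the 10,000 Mbps capacity and the time units (hours) are all assumptions chosen for illustration, and a real system would have to perform a check like this on every link along the path.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reservation:
    start: float  # start time in hours
    end: float    # end time in hours (exclusive)
    mbps: float   # guaranteed bandwidth


def can_admit(existing, new, capacity_mbps):
    """Accept `new` only if committed bandwidth never exceeds link capacity.

    Committed load is piecewise constant and can only rise at a reservation's
    start time, so it is enough to check the start of the new window and every
    existing start time that falls inside it.
    """
    points = {new.start} | {
        r.start for r in existing if new.start < r.start < new.end
    }
    for t in points:
        load = sum(r.mbps for r in existing if r.start <= t < r.end)
        if load + new.mbps > capacity_mbps:
            return False
    return True


# Hypothetical bookings on one 10 Gbps link.
booked = [Reservation(8, 17, 5000), Reservation(12, 14, 4000)]
print(can_admit(booked, Reservation(9, 12, 2000), 10_000))
print(can_admit(booked, Reservation(9, 13, 2000), 10_000))
```

The first request fits because only 5000 Mbps is committed before noon; the second is refused because it overlaps the 12:00 to 14:00 booking, where 9000 Mbps is already committed.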

This functionality is important for science because guaranteed bandwidth is important for remote experiment data analysis, on-line instrument control, distributed Grid workflow systems, etc. Many of these applications will not need lambda paths (10 Gbps optical circuits) but will need 100 - 5000 Mbps guaranteed. We will not have enough lambdas available any time soon, so we have to manage fractional lambda paths.

GCJ: What is the big bandwidth hog in Grids?

Mr. Johnston: One is certainly just moving enormous amounts of data. And right now that's the predominant one, because for the last couple of years, we've had large-scale instrumentation facilities turn on and start pumping enormous amounts of data to remote collaborators. So the Department of Energy, for instance, builds very, very large facilities -- typically the ones that universities can't build. But because these are large, unique facilities, they all have collaborators not only in the U.S., but all over the world. And so these facilities all crank out enormous amounts of data which is not typically stored and analyzed locally. It's sent to the experimenters in the collaboration, and then analyzed by the collaboration.

In the case of the Large Hadron Collider in Switzerland, at CERN, there are two major experiments. Each one of those experiments involves some 2200 scientists that are located in 60 or 70 countries around the world. And that data all has to be distributed so that it can be analyzed. And that's the only way to analyze the data. There's much too much data for any one group to analyze. And that's typical, whether it's NASA satellites or whether it's the new big telescopes or whether it's the large-scale electron microscopes. There's lots and lots of science scenarios that have that sort of requirement.

GCJ: What are some common ways that you envision Grid professionals using virtual circuits in the future?

Mr. Johnston: The science Grid community requires virtual circuits for a few specific scenarios. A common one is when you have to have guarantees in moving data from one place to another. You may have an experiment that produces a petabyte of data per year - and you know that you have to transfer a couple of terabytes per day from, say, CERN to Brookhaven. In order to do those transfers, you need virtual circuits with bandwidth guarantees, because otherwise you might not get enough of the bandwidth along the path to keep ahead of the data that's arriving.
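As a rough check on those figures, the sustained rate implied by a daily or yearly data volume is simple arithmetic. The sketch below assumes decimal units (1 TB = 10^12 bytes, 1 PB = 10^15 bytes), takes "a couple of terabytes" to mean 2 TB, and uses a hypothetical helper named `sustained_mbps`.

```python
def sustained_mbps(total_bytes, seconds):
    """Average rate in megabits per second needed to move total_bytes in seconds."""
    return total_bytes * 8 / seconds / 1e6


DAY = 86_400          # seconds in a day
YEAR = 365 * DAY      # seconds in a (non-leap) year

per_day = sustained_mbps(2e12, DAY)    # 2 TB every day: roughly 185 Mbps
per_year = sustained_mbps(1e15, YEAR)  # 1 PB every year: roughly 254 Mbps

print(f"2 TB/day  needs about {per_day:.0f} Mbps sustained")
print(f"1 PB/year needs about {per_year:.0f} Mbps sustained")
```

These are averages over the whole period. Any downtime or burstiness raises the rate actually needed, which is part of why a guarantee matters more than the nominal link speed.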

Another scenario is if you're going to put an instrument online and control it remotely. Then you may say, "I absolutely have to have a guaranteed 100 MB/sec between 8:00 and 5:00 tomorrow, because that's when the instrument's running, and I can't control it unless I get all of the data from that instrument at 100 MB/sec." So in that sense, it's sort of a quasi-real-time control issue. The virtual circuit management mechanism provides for managing the bandwidth. What you do is install what are called ingress filters, which impose bandwidth limits on the data that you can inject into that circuit. So you will configure your router so that, for that circuit, it will only accept 100 MB/sec of data and no more. And then it's up to the user to make sure that they only put out 100 MB/sec.
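One common way to enforce this kind of limit is a token bucket. The interview does not say which mechanism ESnet's routers use, so the following is only a sketch of the general idea: the class name `TokenBucketPolicer`, the 1 MB burst size, and the reading of "100 MB/sec" as 100 million bytes per second are all assumptions.

```python
class TokenBucketPolicer:
    """Admit traffic at up to rate_bytes_per_s, with bursts up to burst_bytes."""

    def __init__(self, rate_bytes_per_s, burst_bytes):
        self.rate = rate_bytes_per_s
        self.burst = burst_bytes
        self.tokens = burst_bytes  # the bucket starts full
        self.last = 0.0            # time of the previous packet, in seconds

    def admit(self, now, packet_bytes):
        """Return True if the packet fits the limit, False if it is dropped."""
        # Refill for the time elapsed since the last packet, capped at the burst size.
        elapsed = now - self.last
        self.last = now
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        if packet_bytes <= self.tokens:
            self.tokens -= packet_bytes
            return True
        return False


# A circuit limited to 100 MB/sec with a 1 MB burst allowance (assumed values).
policer = TokenBucketPolicer(rate_bytes_per_s=100_000_000, burst_bytes=1_000_000)
print(policer.admit(0.0, 1_000_000))  # uses the whole initial burst
print(policer.admit(0.0, 1_500))      # nothing left at the same instant
print(policer.admit(1.0, 1_500))      # the bucket has refilled a second later
```

A sender that stays at or below the configured rate never sees a drop, which matches the point that it is up to the user to put out no more than the reserved rate.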

The other area in which I think virtual circuits will be needed for bandwidth guarantees, which is a completely different area, is Grid-based workflow systems. The high energy physics community, for instance, is putting together large-scale Grids that involve hundreds of machines scattered across dozens of institutions to analyze all the data that's coming off the LHC. And they will have workflow systems that manage the movement of work and the various steps at which it's being processed. And, of course, many of the processing steps are on different machines, so the data flows into one machine, gets transformed, is sent out to another machine, gets transformed, is sent out to yet another machine, and so on. So you get these networks of workflow systems. And unless you have sufficient bandwidth between those systems, which may be scattered around a number of different institutions, you won't be able to keep the workflow moving steadily. And so I think that's another example. That is, if you're going to have high performance, high throughput workflow systems, you will probably have to have bandwidth guarantees between the work systems, that is, the systems that actually do the computation, and the data storage. Otherwise, you're going to get lockups in the workflow system and inefficiency.
