Guest Experts
Richard Wellner and Scott Koranda
Univa

Two of Univa's brightest and most experienced - Richard Wellner and Scott Koranda - join the GCJ this month to answer some specific questions about network considerations for the Grid.

GCJ: What do Grids demand of the network beyond plain vanilla enterprise networking demands?

Wellner: Basically, there's a lot of networking technology in place today that was developed decades ago, and it needs to be accommodated by the Grid layer. So, for example, if you look in particular at data transfer on a Globus Grid, there's a set of tooling associated with the Globus Toolkit that works around some of these issues with the network layer as it exists today.

One example of that is GridFTP, which is this tool for making effective use of the bandwidth that is available. The TCP/IP standards in particular go back for several decades, to the '70s. And they were designed for a different era, where you were lucky to get multi-tens of kilobits of transfer speed, and you really were paranoid about errors consuming the space that was available. So there were all kinds of provisions built in to allow for fair sharing of the network even when things were getting dropped.

Unfortunately, the way they went about doing that was such that as the network speeds themselves ramped up, some of those workarounds didn't translate well to the new higher bandwidth domain. But GridFTP is a tool that takes advantage of the bandwidth that's available even in high latency situations. So you have companies or enterprises that go across multiple continents, and they have pretty laggy connections as a result of that. GridFTP has -- intrinsic to the standards it was built upon -- the ability to put more data down that pipe, even given that high latency environment.
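
To put rough numbers on the latency point above, here is a minimal sketch of the bandwidth-delay arithmetic. It assumes the limiting factor is a fixed TCP window per stream and that opening several streams in parallel is one way to fill the pipe; the function names, the 64 KiB window, and the link figures are illustrative assumptions, not values from the interview or from any GridFTP configuration.

```python
import math


def bandwidth_delay_product_bytes(bandwidth_bits_per_s: float, rtt_s: float) -> float:
    """Bytes that must be in flight at once to keep the link full."""
    return bandwidth_bits_per_s * rtt_s / 8


def single_stream_ceiling_bits_per_s(window_bytes: int, rtt_s: float) -> float:
    """Upper bound on one TCP stream that can send only one window per round trip."""
    return window_bytes * 8 / rtt_s


def streams_needed(bandwidth_bits_per_s: float, rtt_s: float, window_bytes: int) -> int:
    """Parallel streams required so that the combined windows cover the pipe."""
    bdp = bandwidth_delay_product_bytes(bandwidth_bits_per_s, rtt_s)
    return max(1, math.ceil(bdp / window_bytes))


if __name__ == "__main__":
    # Hypothetical intercontinental link: 1 Gbit/s, 150 ms round trip, 64 KiB window.
    link, rtt, window = 1e9, 0.150, 64 * 1024
    print(f"bytes in flight needed: {bandwidth_delay_product_bytes(link, rtt):,.0f}")
    print(f"one-stream ceiling:     {single_stream_ceiling_bits_per_s(window, rtt) / 1e6:.1f} Mbit/s")
    print(f"streams needed:         {streams_needed(link, rtt, window)}")
```

With these assumed figures a single small-window stream tops out at a few megabits per second on a gigabit link, which is the kind of gap Wellner is describing. Larger windows reduce the number of streams needed in the same proportion.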

Another area is simply making use of all the networking that is available. For I/O optimization, we can use things like RLS to be able to find data that's been distributed throughout the organization. So instead of having to move all of the data, and instead of having to be constrained by the networking that's available at any given site -- you can use RLS to find data locally or at a closer site so that you can make more efficient use of the networking that you do have available.
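
The replica lookup idea can be sketched as a small in-memory catalog that maps a logical file name to the physical copies registered for it and then picks the cheapest copy to reach. This is not the RLS API; the class names, the `site_cost` table, and the example hosts are all hypothetical and exist only to illustrate the concept.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Replica:
    site: str  # hypothetical site label, e.g. "chicago"
    url: str   # physical location of this copy


class ReplicaCatalog:
    """Toy stand-in for a replica location service: logical name -> copies."""

    def __init__(self) -> None:
        self._replicas: Dict[str, List[Replica]] = {}

    def register(self, logical_name: str, replica: Replica) -> None:
        self._replicas.setdefault(logical_name, []).append(replica)

    def lookup(self, logical_name: str) -> List[Replica]:
        return list(self._replicas.get(logical_name, []))


def closest_replica(
    catalog: ReplicaCatalog,
    logical_name: str,
    local_site: str,
    site_cost: Dict[Tuple[str, str], float],
) -> Replica:
    """Prefer a local copy; otherwise take the remote copy with the lowest cost."""
    replicas = catalog.lookup(logical_name)
    if not replicas:
        raise KeyError(f"no replica registered for {logical_name!r}")

    def cost(replica: Replica) -> float:
        if replica.site == local_site:
            return 0.0
        # Unknown site pairs are treated as infinitely expensive.
        return site_cost.get((local_site, replica.site), float("inf"))

    return min(replicas, key=cost)


if __name__ == "__main__":
    catalog = ReplicaCatalog()
    catalog.register("run42/events.dat", Replica("chicago", "gsiftp://chicago.example.org/data/events.dat"))
    catalog.register("run42/events.dat", Replica("london", "gsiftp://london.example.org/store/events.dat"))
    costs = {("tokyo", "chicago"): 140.0, ("tokyo", "london"): 230.0}  # assumed round-trip times in ms
    print(closest_replica(catalog, "run42/events.dat", "tokyo", costs).url)
```

The application asks for the logical name only; which site actually serves the bytes is decided at lookup time, so data that already exists nearby never has to cross the slow link.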

And then finally, addressing reliability as opposed to latency or bandwidth constraints, we have a tool called RFT, which allows you to do "fire and forget" kinds of file transfers. So the user can define a large set of files that they're interested in moving from one location to another, tell the RFT service to take care of that, provision the resources necessary and make that transfer happen. Then the user can walk away, and the next day or the next week even, in some cases, they come back and they find out that their data has been moved between those locations. These are all different ways of addressing various shortcomings in the network world today associated with both latency and bandwidth constraints.
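
A "fire and forget" transfer service boils down to two things: the request is written to durable state before anything moves, and failed transfers are retried without the user present. The sketch below shows that pattern with a JSON state file and a local file copy standing in for the network transfer. It is not the RFT interface; `TransferQueue`, its methods, and the state-file layout are hypothetical.

```python
import json
import shutil
from pathlib import Path
from typing import Callable


class TransferQueue:
    """Toy persistent transfer queue: submit jobs, then retry until done."""

    def __init__(self, state_file: Path, max_attempts: int = 3) -> None:
        self.state_file = Path(state_file)
        self.max_attempts = max_attempts
        # Reload earlier requests so a restart does not lose any work.
        if self.state_file.exists():
            self.jobs = json.loads(self.state_file.read_text())
        else:
            self.jobs = []

    def _save(self) -> None:
        self.state_file.write_text(json.dumps(self.jobs, indent=2))

    def submit(self, src: Path, dst: Path) -> None:
        """Record the request durably; the caller can walk away after this."""
        self.jobs.append({"src": str(src), "dst": str(dst), "status": "pending", "attempts": 0})
        self._save()

    def run_once(self, transfer: Callable[[str, str], object] = shutil.copyfile) -> bool:
        """Try every unfinished job once; return True when all jobs are done."""
        for job in self.jobs:
            if job["status"] == "done" or job["attempts"] >= self.max_attempts:
                continue
            job["attempts"] += 1
            try:
                transfer(job["src"], job["dst"])
                job["status"] = "done"
                job.pop("error", None)
            except OSError as exc:
                exhausted = job["attempts"] >= self.max_attempts
                job["status"] = "failed" if exhausted else "pending"
                job["error"] = str(exc)
            self._save()  # persist progress after each attempt
        return all(job["status"] == "done" for job in self.jobs)
```

A scheduler or a simple loop would call `run_once()` periodically. Because the state file is rewritten after every attempt, the user can check back the next day and read each job's status from it.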

GCJ: Going back to GridFTP - it originated as a spin-off of the original FTP protocol, correct? Is it a protocol that's still evolving today? Any recent evolutions or ongoing development issues related to GridFTP that the community should be on the lookout for?

Wellner: Yes, there's a lot of work in the Grid community right now in terms of how GridFTP can be either extended itself or act in conjunction with other services to be a storage platform instead of what's considered to be a data transfer platform.

Koranda: And I think one of the places that GridFTP will start getting more attention is as a development platform. GridFTP, the Globus implementation at least, offers a really rich API. So that allows people to integrate data movement or file movement capabilities closely into their application. The old way, you had to basically wrap around some FTP client, and that could be really messy. With the GridFTP API, you can very cleanly add data transfer capabilities into your application. And I've seen that done a couple times, and it makes for a really nice way of sort of Grid-ifying your workflow, by adding that capability directly into your application.

GCJ: One of the trends that's going on in networking today is a tighter integration between the systems level and the network level. So what are your thoughts on this trend, and how does it affect what's going on in Grid today?

Koranda: Yes, I agree that we are seeing a tighter integration between the systems level and the network level. One interesting development that comes to mind is the recent release of Fedora Core 5, which actually includes different TCP/IP stacks that an administrator can use to dynamically control how the machine is responding and working on a TCP network.

Now, I don't think we'll see those capabilities being exploited quickly in a lot of communities, but I think the Grid community will be one of those where we'll see that functionality exploited sooner rather than later. And that's evidenced by GridFTP, for example, giving people knobs to turn with TCP to tweak performance. And now we see that functionality starting to be pushed out further and actually become part of the operating system. So I do expect we'll see this trend continue, and more and more applications will give the savvy users new knobs for fine-tuning performance. And we'll also start to see more automated tuning of the parameters as time goes on.
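
As an illustration of what such knobs look like from inside an application, here is a minimal sketch using Python's standard `socket` module. It assumes the knobs in question are the per-socket send and receive buffer sizes and, on Linux, the congestion-control algorithm selected per socket. The helper name `tuned_socket` is hypothetical, and the operating system may clamp or adjust the values it is asked for.

```python
import socket
from typing import Optional


def tuned_socket(buffer_bytes: int, congestion: Optional[str] = None) -> socket.socket:
    """Create a TCP socket with requested buffer sizes and, where supported,
    a specific congestion-control algorithm."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

    # Request larger buffers so more data can be in flight on a long, fast path.
    # The kernel treats these as requests and may round or cap them.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, buffer_bytes)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, buffer_bytes)

    # TCP_CONGESTION is exposed only on platforms that support it (Linux).
    if congestion is not None and hasattr(socket, "TCP_CONGESTION"):
        try:
            sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, congestion.encode())
        except OSError:
            pass  # algorithm not available on this machine; keep the default

    return sock


if __name__ == "__main__":
    s = tuned_socket(4 * 1024 * 1024, congestion="cubic")
    print("send buffer granted:", s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))
    print("recv buffer granted:", s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))
    s.close()
```

Reading the values back with `getsockopt` matters because the granted size often differs from the requested one. That gap is part of why automated tuning is attractive.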

Wellner: I think you could say there are opportunities ahead for the Grid community to provide all the necessary knobs and handles that you want to provide to people, and to begin to allow them to manage those interfaces in ways that really make sense for the entire picture of what work they're trying to get done. And when the network layer becomes manageable in the same way as other parts of the Grid, that's when I think we'll really start to see the capabilities being built up and some really neat things being possible at that layer.

GCJ: So when that happens, are we going to see a strain on the public Internet as we know it today? Is it adequate to carry the load of this traffic?

Wellner: Well, the concept of a "public Internet" is itself a bit of a red herring. Univa does business in a number of different countries on several different continents, and the idea of what the public Internet consists of varies from region to region. Because of that, there isn't really a single answer. It depends on the quality of connectivity that's available in the region, who the operators are, all of the Internet traffic for that region, the sensitivity of the data that you're trying to push across it, where your peers are located, etc. All of these come into play.

So the blanket answer is, as a general statement, the core technology of the Internet can be up to the technological challenge, but there are clearly reasons why people might go in other directions. The Internet is obviously less expensive than going with a dedicated pipe. So that's huge right there, and a lot of people, a lot of companies go in that direction just for cost savings issues. The Internet, however, is missing end-to-end network management capabilities. If you're talking about even a local connection, much less an international one, the likelihood is that you're going to pass through several domains of control. And right now, there are no pervasive standards for doing network management or quality of service guarantees across those multiple administrative domains.

Dedicated pipes have the advantage that they frequently do have end-to-end manageability and predictability. Even if you, as a customer of a long-haul network service provider, don't have access to their manageability layer, they're going to give you some quality of service guarantee that at least says: you're paying for a 10 MB connection, and we will make sure that you get a 10 MB connection. Now, for certain kinds of applications on the Grid, you might get some additional capabilities by being able to manage streams within that 10 MB, but at least you're always going to get that 10 MB.

The downside, of course, to that is that the dedicated pipes are very expensive. You have operators that have certain Carrier Routing Systems guarantees. You're always dealing with one entity, so there's very limited potential to pass the buck. It is truly a balancing act. The public Internet has the technology necessary to do everything that the Grid needs, and there's some regional variation in terms of whether that's the best choice or not.

GCJ: Are today's storage environments really well-suited to getting a certain quantum of data to an application somewhere on the network when and where it needs it? Do you see any storage access issues for Grid computing environments in the enterprise?

Wellner: The Grid itself, if you look at how it came about, was designed to allow global access or Grid-wide access to locally administered resources. As a result of that, you end up with storage solutions that are generally point solutions. Any particular site might have a storage facility that they expose via GridFTP, SRM or some other standard, but there aren't a lot of storage facilities or storage solutions across an enterprise that are themselves federated.

Today, at Univa we have Univa Globus Enterprise (UGE) that has the capabilities to bring stronger control to distributed data storage through tools like GridFTP, RLS and RFT. And by virtue of that, different classes of users get different benefits. GridFTP, for example, makes things faster by moving data between these storage facilities with greater efficiency over a given network pipe.

Because these tools were all designed to operate on the public Internet, the approach is cheaper: you have lower hardware costs, you don't have the need for Storage Area Networks and things like that, and you have reduced networking requirements. You don't have to go with dedicated networking in order to get a certain class of performance for the application suite that you need to host. All of this stuff is built off of open standards, which means that there's a built-in marketplace. And if you're unhappy with one implementation, you can get another one. There are other GridFTP implementations out there. It happens that the GridFTP in the Globus Toolkit is the de facto standard, probably because it's the best one, but that doesn't mean that there isn't an opportunity to switch to another application set if need be.

Koranda: I'll add the "V" word -- virtualization -- because that's a trend that I think will continue to grow. You have this issue with geographically distributed data centers, so an enterprise might have a handful of different sites with storage. And at some point they decide they want to start really leveraging all of the storage at these sites and use the Grid. Well, when you do that, there's inherently a problem with finding the data. Where is it across my Grid?

The first attempts to wrestle with this problem took the position that, at each site, we're going to impose a particular directory structure. And from the top down you have somebody saying this is how we're going to lay out the files, and it's got to be the same at every site, and we're going to have to be in lock-step, and that way all applications and all users will know how to find the file that they need to find. Of course, that approach almost never works. You have different people managing sites, these different geographic centers grow up in different ways, and they have different hardware. So imposing those types of tight constraints is really very difficult.

The Grid offers a better approach, where you're really adding this virtualization layer so that you can provide a mechanism for users and applications to find the files that they need and then access them, and do it in reliable ways, where you don't have to worry about any of the details, such as where the actual file lives. So this combination of Globus RLS with GridFTP and RFT is really a very powerful mechanism for the virtualization that will allow an enterprise to get away from this problem of how we actually help our users or our applications find what they need to find.
