GCJ: As the enterprise continues to evolve to distributed computing environments and Grid computing, are networking (latency/bandwidth/performance) issues for distributed applications going to get trickier?
Aiken: I think there's a fallacy that speed is a major concern for all applications. If you're doing intra-cluster communications for IPC (interprocess communication) kind of stuff, latency might be a bigger issue than for other types of applications, where latency may be able to go out to 100 milliseconds and not be affected.
One of my focuses all along has been to understand why enterprises don't adopt Grid at a faster rate - and I think it's because whenever people talk about Grid, it's always done in the context of one of the high performance computing applications. I started in supercomputing before I got into networks. Prior to Cisco, I worked for DOE labs - Argonne, Lawrence Livermore and Sandia Labs. I funded Ian Foster for his initial middleware work, pre-Globus, when I was at DOE headquarters, so I've heard this discussion with respect to always focusing on the highest end applications for a very long time. I've noticed that when the industry talks about these challenges, the discussion almost always gets pigeonholed into the context of big science applications. I am not diminishing high end science applications, but I do think we all need to look at a broader set of Grid applications if we expect Grids to go mainstream.
A lot of times when we're talking about Infiniband or Ethernet or HIPPI (high performance parallel interface) or anything else at a lower protocol level, we talk about physical limitations of a wire speed or a protocol as it's done in silicon. A lot of times, that's not the problem. When I talk to many supercomputing centers now, the bottleneck is not the network. It's getting the information out of the boxes, which involves end system NICs, operating systems, and end system communication stacks.
With other types of applications, like in gaming, some of the problems they worry about are different. The problem they're primarily concerned with is the latency of how they do cache coherency for the players' state information. They have information about the players and information about their locations and state in the game, but you can't have that all in one server - so they put it on multiple servers, and you have to have the cache coherency associated with that person playing the game at that time, and it needs to be distributed quickly. It's not necessarily a lot of data that they're pushing, but the latency is important - and the way they're caching it addresses how to move a player from one kind of a node to another node, and it's all done internal to the system. In this case, the operating system and the network have merged to become a gaming system.
That's a different model that has different stress points than if we're doing high energy physics and have to worry about yottabytes. When you look at all of that, you're thinking about moving huge amounts of data from one datacenter to another. At some point in time, the reality is going to sink in that even if I push that data, you're not going to be able to do a darn thing with it anyway because it's just too much raw data. What they do on some of these high end science experiments, after they collect the data, is go through an analysis process and throw out data that they don't think is relevant. As we head into the future, bandwidth is going to be an issue, but the applications are also going to have to change the way they think about moving data and analyzing data as well. Some of the intelligence is going to go into the middleware, some will go into the network - but applications are going to have to become more network aware, and the networks are going to have to become application aware.
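[Editor's illustration] The point about raw data volume can be made concrete with a back-of-the-envelope calculation. The Python sketch below is a rough estimate only: the one-petabyte data set, the 10 Gbit/s link, and the name `transfer_time_seconds` are assumptions chosen for illustration, not figures from the interview.

```python
def transfer_time_seconds(data_bytes, link_bits_per_second, efficiency=1.0):
    """Estimate how long it takes to push data_bytes over a link.

    efficiency is the assumed fraction of raw link speed that the end
    systems and protocol stack actually achieve (1.0 = full wire speed).
    """
    if data_bytes < 0 or link_bits_per_second <= 0 or not 0 < efficiency <= 1:
        raise ValueError("invalid data size, link speed, or efficiency")
    return data_bytes * 8 / (link_bits_per_second * efficiency)


# Assumed, illustrative figures: one petabyte over a 10 Gbit/s link.
PETABYTE = 10**15
TEN_GBPS = 10 * 10**9

seconds = transfer_time_seconds(PETABYTE, TEN_GBPS)
print(f"{seconds / 86400:.1f} days at full wire speed")

# If the end systems only sustain half of wire speed, the time doubles.
half = transfer_time_seconds(PETABYTE, TEN_GBPS, efficiency=0.5)
print(f"{half / 86400:.1f} days at 50% efficiency")
```

Even under these generous assumptions the transfer takes more than a week, which is one reason filtering the data before moving it matters as much as raw bandwidth.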
GCJ: So how is the network evolving to meet new requirements being ushered in by Grid?
Aiken: There are two trends that are going to happen. One is that you're going to get a lot more intelligence in the network, which has already been going on for a long time, but will accelerate. The other is that you're going to get a blurring of the boundaries between operating systems and networks and middleware - it's going to be hard to find out where you're putting information, how information is being tracked, and how decisions from a policy perspective are going to affect what traffic goes where.
If I were an enterprise, I'd look at Grid if and only if I had a secure and predictable bandwidth allocation. The applications that enterprises run, and the way they think... they need to have that predictability. There are a lot of ways of getting that predictability of the network. You can set up a VPN, or you can use MPLS (multiprotocol label switching) to virtualize access services - like frame relay, Fibre Channel or Ethernet at the edges that get hooked in over IP. MPLS can be, and is being, used to set up the appropriate VPNs over the infrastructure to connect the appropriate enterprises into a virtual collaboration.
GCJ: Is the Internet going to be good enough to carry the Grid traffic, or will organizations drop dedicated pipes for their Grids?
Aiken: If you look back at the Internet in the late 80s and early 90s, there were a lot of different networks, and we figured out how to put them together and interconnect them. And when we did so, we basically created one large virtual network - and we even referred to it as a 'network of networks.' This is something that the Grid community is going to have to wrap its arms around. You're not just going to have one big Grid that everybody connects to. You're going to have Grids that are based on specific applications, affinity groups, and other kinds of business models. Depending on what's driving the business models and the applications, bandwidth may or may not be an issue. Latency may be an issue, and it may be what kind of latency - and how small or large it can be. Some of the applications may have to change the way that they think about architectures.
Google is going out and getting their own fiber. There are some companies that will do that because they have to eke out the high efficiency that they need, but the majority of people will contract service providers. The reality is that service providers today are all basically turning stuff into a converged backbone over IP. At the access sites, you're coming in with different types of access technologies, but you're going to end up going over an IP and probably an MPLS service as you go through service providers. MPLS is a traffic engineering tool that the majority of service providers use. And when you talk about enterprises, there are a lot of different types of enterprises. When you take a multi-site, national scale enterprise, like a Boeing or Ford, for example, some of these enterprise networks are even bigger than the service provider networks.
As I mentioned before, the Internet is really a network of networks, at varying levels of interconnectivity. It's already made up of multiple networks that are interconnected in various ways - from service providers to ILECs to metro to campus, etc. Grids will be based on applications or affinity groups. In the case of the Internet, it did not take off [commercially speaking] until we did the NAP (network access point) design (the NSF initiative that commercialized the NSFNET) - which was a way for commercial and R&D to peer at layers 1 through 3 as they wished.
The challenge of Grid will be to do this at a different level. You've got your middleware that's got to peer at different places. There are going to be some entities that want to peer with other autonomous Grids, and there are going to be others that want to keep their own (i.e., a private Grid). The majority of them are going to only acquire services from service providers, because the complexity of running networks today is going up, even for dark fiber networks. There are going to be some types of large science applications - they tend to be the lead - that will build their own networks to support their Grid infrastructures, and this is a valid approach for them. But on the whole, there's not going to be one big Grid that everybody taps into, because everybody has different business models. And the laws of economics will largely drive which options they select to carry their Grid traffic.
GCJ: What will be the predominant standards for interconnect in Grid environments?
Aiken: One of the misconceptions that we frequently run into in networking is the idea that there will be one thing other than IP that everything will converge on. Throughout our time, we've had things like HIPPI and Fibre Channel - both developed at the same time for very niche types of applications and configurations. Infiniband, for example, falls into that type of area, and so do other types of protocols. But the thing is that the most common interface out there today is Ethernet.
You're constantly going to have debate about how much time/effort one spends, when building networks, on protocols that are not cheap, common, off-the-shelf types of solutions. Ethernet itself is going to have challenges as it scales. Any protocol you pick is going to have challenges as you go outside certain boundaries, and that's just a fact of life. They're going to have advantages and disadvantages. From an Infiniband perspective, there are limitations as well. All protocols have challenges. There is no panacea. But there are also limitations and challenges of doing 40 to 100 gig on certain optics - and that doesn't stop us from building networks with them. We figure out a way to build an architecture that satisfies them.
With Infiniband - when you talk about distributed storage or distributed clusters or any type of computational service - the big question is whether you're talking about loosely coupled or tightly coupled resources. If it's tightly coupled, then latency may be an issue. If it's loosely coupled, then latency might not be as much of an issue.
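[Editor's illustration] The tightly coupled versus loosely coupled distinction can be sketched numerically. The Python example below uses a deliberately simple model in which each unit of work computes for some time and then waits for a number of network round trips. The 100 millisecond round trip echoes the latency figure mentioned earlier in the interview. The compute times and the name `communication_fraction` are assumptions for illustration.

```python
def communication_fraction(compute_seconds, round_trip_seconds, exchanges=1):
    """Fraction of each work unit spent waiting on the network.

    Simplified model (an assumption): the work unit computes for
    compute_seconds, then blocks for `exchanges` sequential round trips.
    """
    if compute_seconds < 0 or round_trip_seconds < 0 or exchanges < 0:
        raise ValueError("inputs must be non-negative")
    wait = round_trip_seconds * exchanges
    total = compute_seconds + wait
    return wait / total if total > 0 else 0.0


RTT = 0.100  # 100 ms round trip

# Tightly coupled (assumed): 1 ms of compute, then a synchronizing exchange.
tight = communication_fraction(compute_seconds=0.001, round_trip_seconds=RTT)

# Loosely coupled (assumed): 10 minutes of compute, then one result exchange.
loose = communication_fraction(compute_seconds=600.0, round_trip_seconds=RTT)

print(f"tightly coupled: {tight:.1%} of time spent waiting")
print(f"loosely coupled: {loose:.3%} of time spent waiting")
```

Under this model the same 100 ms latency dominates the tightly coupled job and is negligible for the loosely coupled one, which is why the coupling question comes before the choice of interconnect.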
A bigger question is whether the end applications will remain with TCP/IP as a unifying and interoperable protocol or develop or use a different set of protocols.