GCJ: We heard from Charlie Catlett in September of 2005 just after the big NSF grant award. What are some noteworthy items that TeraGrid has been working on since?
Skow: 2006 has been a really great year for TeraGrid. It was our first full production year, having completed the transition from a construction project which ended in early 2005. We had pretty aggressive goals for growth in both the technology and resource side as well as growth on the user side. And glad to say that we met and exceeded both of those goals. We integrated the core computing resources from NCSA and San Diego in the spring, and added on NCAR in late summer, and are now in the middle of a number of upgrades on some of the other facilities. We have completed the first round of those and are anticipating a really big one next July in Texas, as the first of these track two resources comes online with the latest HPC funding initiative from the NSF.
And on the user side, we've now had about a fivefold growth. From the starting point of about 600 users in 2004, we're now over 3200 regular users of TeraGrid. So we've really seen remarkable growth.
We're now paying even more attention to the operational side of things, trying to focus on getting our instrumentation and our process streamlined and clarify some of the things that have been a little cumbersome, essentially dealing with the types of growing pains one might expect as the result of success.
GCJ: Let's talk a little bit more about the user side. We'd like to make a call for more information on grid applications and what people are actually using these grids for. We've talked a lot in the past years about construction and getting these grids built, but we want to know what people are doing with the grid. So to that effect, what are some of the application trends that you've been seeing on TeraGrid? And are there any commoditized applications or generalized applications that you see being shared or being reused?
Skow: I think we're seeing several trends. First, the pathfinders, as you say, that are using grid with custom codes and have had long investments in particular fields, they're still there and very present and using grid effectively. But we're now starting to see the uptake of several other kinds of things. The first is characterized by what we call our science gateway project, where it's basically a provider providing a portal to a community where they pre-provision certain codes, applications or environments for people to do their analysis or do their exploration in their field of science.
Probably one of the most successful ones in TeraGrid in the aspect of taking components and making them available then to a broad community, would be the RENCI bioportal, where they've taken over 140 standard applications from the bio-community, bundled them all together, pre-provisioned them across the TeraGrid, made a common interface so that the researchers can get access to any of those instances from a single gateway, and then gone out to the community and worked with them to iterate on what applications work well in that environment, and what applications would they like to add. And so they're almost acting like value-added resellers, where they are putting in place this infrastructure and then making it available to the community and getting the feedback on the usage and the uptake of that pre-provisioned technology. And so it really lowers the barrier for people coming in and trying to make use of these parallelized machines and optimized codes.
I think that's very exciting. We've got probably three or four other examples of that, but they're the largest one in terms of the number of applications they've packaged.
GCJ: Right, that's exactly what we need, these libraries of applications. Can you elaborate on some of what those applications do?
Skow: Some of the more "headline" applications that are being made available include EMBOSS (European Molecular Biology Open Software Suite), GLIMMER (Gene Locator and Interpolated Markov Modeler), HMMER (Hidden Markov Modeler), the NCBI (National Center for Biotechnology Information) toolkit and PHYLIP (PHYLogeny Inference Package). Standard databases available through the TG-Bioportal include NCBI Aggregate, PDB, Prints, RepBase, UniProt, PFam, ProSite, and TransFac. These cover everything from sequencing code for going in and doing genomics to assembling together pieces of genetic information, to even more specialized things.
We just are completing an analysis of 50 codes that are taken from the top computational users of the HPC machine. And these codes were used as part of the development of the benchmarks for the procurement of these next rounds of machines. So we've gone to take that data and try and understand how widely ported these codes are, how difficult it is to port to different machines. And I'm really looking forward to getting that analysis back and getting a little bit more insight into how broadly we can take this kind of approach of pre-provisioning code that's useful to the community across there.
GCJ: Yes, we talked about this interview being a tee-up to get some more information on that later. That's going to be a really interesting study. I think you're going to see a great deal of pertinent information out of this study, very interesting market data.
Skow: In fact, from the technologist's point of view, I think this has been one of the hidden benefits of the grid, one we certainly have already noticed - even with the rudimentary instrumentation that we have in place today, that we get a much better view of the overall usage and usage types of what people are doing than you can get with practically any other method of collaboration across institutions. You get a pretty good view of what people are doing if you own the batch queue and you're looking at all the information inside a single environment, but getting that across the multiple machines without something like a services approach was just extraordinarily difficult. I'm thinking that the next year is going to yield a number of interesting insights as to what really is working, what people are doing, and their intrinsic value.
GCJ: We'd like to declare this year the year of the grid application. What are your thoughts on what that means in a general sense? You talked a little bit about what that means in a TeraGrid sense, but how do you think we can make that happen in this loosely defined grid computing market?
Skow: I think there are a couple things that are moving forward that I'm excited about. The first is that we're recognizing a pattern. When you talk to the users, the people that are running these applications or using these grid applications, what they value most is the ability to handle much larger and more complex workflows than they had been able to do before. And this is either because they're using some workflow tools directly or because they're using something buried inside their application that has become more efficient in a grid environment.
I think that this transition of helping people and learning how to describe workflows, particularly the complex and the repetitive computational workflows in workflow languages or tools, and then laying that on top of the grid is probably a good pedagogical approach that we're going to use to start pushing forward. I think a number of people are starting to introduce grid that way by teaching those tools and learning those tools, discovering the work patterns first, and then bringing in the technology and applications to support workflows, rather than the other way around.
The other thing that I'm seeing, in fact we spent a good deal of time in this past year working with our international peers in this, is something we called Grid Interoperation Now. It was a project for this past year to see if we could establish a basic level of technical interoperation between the major scientific grids, and then present that ensemble to the user community and see whether or not there was a real need or a desire to take advantage of resources from multiple grids and really create this kind of Worldwide Web equivalent of the worldwide grid.
There is a group of people that are very keenly interested in that and spending some time researching this and we are really beginning to see some successes. This sort of a demand for drawing on resources from multiple grids, drawing on these more complex workflows is going to really push forward the standardization efforts, the commonalities of interfaces, and the direct resources of efforts that are necessary in order to really grow the global grid concept, both commercially and in research.
In the commercial space, I think you're starting to see real adoption of Web services and people making businesses off of them and really taking the kind of approach that we've seen from companies such as 37 Signals and their collaboration tools.
I believe these are the things that will really help with the creation and adoption of grid applications.
GCJ: We've talked about the GIN project before in the Globus Consortium Journal, and flowing from that topic is a question regarding the types of platforms you're seeing for grid applications, in the sense of commonalities between grids and commonalities in the use of grid middleware. What did you see come out of that project? Are we there? Do we have the needed commonalities? Are we becoming more interoperable? Or do you think there's a lot more work to be done?
Skow: I think what we've found is that there are relatively few implementations of key services like job submission, file transfer, authentication, that people are consolidating on. And at this point, the interoperation is based primarily on finding people that have chosen the same implementations to solve their problems. Currently applications are more built around the code and the code base than the standards but we were able to get more into what we call the building islands, finding islands where people were doing the same things that were compatible.
For example, in the storage space, there's two strong players out in production grids today in terms of data management above the file transfer layer at the GridFTP level, which is the lowest common denominator and pretty much universal. There is SRB for the storage resource broker and then SRM for the storage resource manager. The grids tend to go one way or the other. There's probably 20 or so that have federated in both of those islands, and now we're talking with people about how do you bridge to share data from one island to the other and what can be done. And both the SRM and the SRB developers are now talking directly. They've seen the scale of their respective markets, and so it kind of helps them understand the value of that work.
GCJ: What about grid-enabling the applications? You don't just take an application, throw it on the grid and everything's up and running. There's usually a lot of work going on in the background to get applications grid-enabled. What is your perspective on what it takes to do this? Is a lot of this done in-house by the people that are putting their applications on the grid, or are they going outside and saying, hey, I've got this application and I'd really like to get it grid-enabled, can you help me?
Skow: I think what we're seeing with our community is there are people that are saying they need that help, but they aren't really ready and saying "come give it to me in droves". There are a few, but the community is still relatively small. Most people that are engaged in creating the applications are still tending to pull together a team and do quite a lot of grid enabling themselves and build up the technology themselves. It's kind of like the transition from FORTRAN to C++ in the high energy physics community that I was familiar with. Some of the researchers are looking at this as a paradigm change or a large change in their strategy for computing, and so they're wanting to get in some of their key people and learn it in more detail before they really make any big investments.
That said, I think that there are a couple other things that are coming around that may change this view such as Globus Web Services deployment with the standard container, with efforts to create an environment where you can wrap standard code to create grid applications. I know Mark Green at Buffalo was doing some things like this. Peter Coveney in the UK at Imperial College London is doing something similar. There's some folks in the European Community with the GridKa project where they provide a standard environment, you bring your code in and just plug it in, and it becomes a grid service.
Commercially, you see things like the EC2, the Elastic Compute Cloud at Amazon, where they're making available a virtual machine image, that you can run whatever you like in an environment you completely control and then present it as a service on the network. These are very interesting technologies and capabilities that may help folks that don't really want to invest a lot of time in understanding Web services or any of this stuff. They just want to take standard code and make it available into this framework.
I think from a clients' perspective, one of the things that is still a problem that's being grappled with in the commercial space as well, is how do you make a buck out of it? How do you get the credit for the work that you're doing by making this application available on the network to other people? In the academic environment if you've done this work and now you have the grand honor of providing maintenance to 10,000 people, this is not very rewarding for most people trying to get tenure. So we need to figure out some kind of an equivalent of a citation for a publication for these grid services.
GCJ: Interesting reward concept there. What about within TeraGrid itself? Do you offer professional services? How do you offer assistance with folks that say, hey, this is great, I want to take my application and run it on TeraGrid?
Skow: So we've done a couple things. One is we have a couple of high-end consulting programs that we offer, both for these science gateways and for individual applications. We have a method whereby people can make requests, and we review the applications for what they need and whether or not we've got the right expertise in the house to address the problem, whether this is something that will be leveraged by many other people, and how big the demand for it is.
We're also starting to try and pull together the tools for the community to go in and create their own exchanges so that they can do things like publish how-tos in a common wiki, where people can upgrade them and contribute and say "this worked for me, and if anybody else is doing something similar, I'd love to share notes with you". So I think that that's one thing that we can provide to the community, to go off and help folks share the wealth. There's these time-honored tools of mailing lists and other things that are being used, but we're already at 1000 different groups using TeraGrid, and it gets a little noisy when you start having things at that kind of scale. You need something a little bit larger.
GCJ: Going back to the grid market in general and the buzzwords that travel around, I've seen in the last year less use of the word "grid," and more of the breaking down into its component structures, like virtualization and SOA and similar buzzwords and technologies. How have you seen those technologies and those ways of describing grid fit into the whole application mix?
Skow: Well, the buzzword in our community is cyber infrastructure, and people have started to move over to talking about that. And I think that, in many ways, folks are still trying to understand how that involves them and what that means. I think the one thing that has come through very clear and seems to be well understood is that cyber infrastructure or grids or distributed computing, or whatever name you want to use, is much more than just the hardware. It requires a persistent software base and a persistent set of services that are offered to communities, and you have to have a plan that allows for maintenance and transition. That has been a very important step forward in understanding and supporting the concepts of a persistent infrastructure.
GCJ: We talked about some of the bioscience applications and those being some of your marquee case studies. Is there anything else, from different aspects of science or different environments, that you think are marquee case studies for what's been done in TeraGrid?
Skow: Well, I think that there are probably four that I would show as really wonderful examples of what can be done.
There's the work in molecular dynamics and a lot of effort in something called NAMD-G at NCSA. NAMD is a standard molecular dynamics code that's used by a very wide-reaching community, and now here's a grid version that's enabled people to deal with much more complex workflows than they've been able to do before. Klaus Schulten is one of the biggest users of that.
The nanoHUB folk at Purdue, where they have 15,000 users from all across the country, from K-12 schools all the way up to world-class researchers, engaging in a community about nanotechnology and trying to build up tools and exchange efforts and exchange teaching materials. This has been a wonderful example of a community forming around cyber infrastructure and sharing the wealth. The TeraGrid engagement is quite small. There are well under 100 users that have specifically used the TeraGrid resource, but we provide some spot capabilities that they wouldn't have otherwise, and this can lead to a potentially very large user pool. So we find that to be very exciting.
Then there are the SCEC folks at the Southern California Earthquake Center, who are very aggressive in using the kinds of technology where they're creating a virtual cluster on the fly. They basically are going out and finding cycles that are available from a whole portfolio of resources across the country, splitting up their work and getting it done across whatever ensemble they can use. And this has members coming and going all the time and hundreds of thousands of jobs being done. So, very dynamic, very robust kind of computation: individual elements of their virtual cluster may come and go at any time, yet the overall ensemble is extremely stable and productive. Those are energizing examples.
GCJ: Very interesting examples indeed. To close out, let's look forward. What's on the docket for TeraGrid next year? Any big announcements or developments? What are we going to see out of TeraGrid in 2007?
Skow: Well, we have, I think, three major pushes. One is - as I mentioned, there is a very large machine coming online next summer in Texas. This is the first of four, coming one a year, large machine investments that the National Science Foundation is making. So this one machine, when it comes up to full strength, is going to be 400 teraflops itself, which is more than twice the total capacity of the TeraGrid today. So bringing in that very large single resource into the mix is going to open up a whole bunch of possibilities as well as augment the top-end capabilities for the users that are constrained on the number of compute cycles they can get a hold of today. So that'll be very interesting to see how that fits into the mix and what kinds of advances the science community can make out of just throwing more muscle into the mesh.
We're also really expecting that this is going to be a year where the entry point into using TeraGrid through science gateway kind of approaches really takes off. We're at the point now where there are probably about 150 regular users coming in through the gateway approach. And we're hoping to double or triple that over the next year and really start to see the turn-on of adoption of that kind of usage pattern.
The third is we are putting a push on to try and deal with our success, and make the usage easier and more robust and more transparent, so that people can understand what is going on with the system and so we can all use it better.
GCJ: That's a good problem to have, working to maximize - being the victim of your own success.
Skow: Yes, that is a good problem to have.