Globus Toolkit Developer's Forum
Ravi Madduri
Software Developer
Argonne National Laboratory

GCJ: You're one of the major contributors to the Reliable File Transfer (RFT) service. Give us a little context for why this service exists and the technical issues it solves in Grid computing.

Madduri: RFT is the default data transfer service in the Globus Toolkit.

To provide a brief history, RFT started as a student project in 2001, when I first joined Argonne as a summer intern working for Bill Allcock. At that time, clients in a Grid could transfer files from one machine to another, but someone always had to watch to make sure the transfer actually completed. And if the transfer failed for some reason, the user had to restart the whole transfer from scratch. That might not be a big deal if you're transferring a small number of files, but if you're transferring terabytes of data, you can't realistically be expected to babysit the transfer for a few days and restart it from scratch whenever a transient failure (a network meltdown, a file system failure, etc.) happens. What you want is something that does that for you, like a service, that you can "fire and forget."

GridFTP, of course, is a commonly used, robust protocol for transferring files from one location to another in a Grid environment. But it does not include a "fire-and-forget" mechanism. If you want to make sure the files have reached their destination, you really have to be on the machine, monitor the transfer continuously, and restart it if there are any failures. Essentially, GridFTP requires the client to maintain an open socket connection to the server throughout the transfer.

RFT is a Web Services Resource Framework (WSRF)-compliant web service that provides job-scheduler-like functionality for data movement. You simply provide a list of source and destination URLs (including directories or file globs); the service writes your job description into a database and then moves the files on your behalf. Once the service has accepted your job request, interacting with it is similar to interacting with any job scheduler. Service methods are provided for querying the transfer status, or you can use standard WSRF tools (also provided in the Globus Toolkit) to subscribe to notifications of state-change events. We provide the service implementation, which is installed in a web services container like any other web service, and a very simple client.
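To make the "job scheduler for data movement" idea concrete, here is a toy, in-memory sketch of the fire-and-forget pattern: the client submits a list of (source, destination) pairs, gets back a job id, and can later poll the status or receive state-change callbacks. This is a hypothetical illustration, not RFT's WSRF interface: the class, method, and state names are invented here, a Python dict stands in for the service's database, and plain callbacks stand in for WSRF notifications.

```python
import itertools
from dataclasses import dataclass
from enum import Enum


class TransferState(Enum):
    # Hypothetical state names, not RFT's actual ones.
    PENDING = "Pending"
    ACTIVE = "Active"
    DONE = "Done"
    FAILED = "Failed"


@dataclass
class TransferJob:
    job_id: int
    pairs: list  # (source_url, destination_url) tuples
    state: TransferState = TransferState.PENDING


class ToyTransferService:
    """Hypothetical in-memory stand-in for a fire-and-forget transfer service."""

    def __init__(self):
        self._jobs = {}        # stands in for the service's database
        self._ids = itertools.count(1)
        self._listeners = []   # stand-ins for state-change notification subscribers

    def submit(self, pairs):
        """Record the job and return its id; the client need not stay connected."""
        job = TransferJob(next(self._ids), list(pairs))
        self._jobs[job.job_id] = job
        return job.job_id

    def get_status(self, job_id):
        return self._jobs[job_id].state

    def subscribe(self, callback):
        self._listeners.append(callback)

    def run_pending(self, transfer_fn):
        """Move every pending job, calling transfer_fn(src, dst) for each pair."""
        for job in self._jobs.values():
            if job.state is not TransferState.PENDING:
                continue
            self._set_state(job, TransferState.ACTIVE)
            try:
                for src, dst in job.pairs:
                    transfer_fn(src, dst)
            except OSError:
                self._set_state(job, TransferState.FAILED)
            else:
                self._set_state(job, TransferState.DONE)

    def _set_state(self, job, new_state):
        job.state = new_state
        for notify in self._listeners:
            notify(job.job_id, new_state)
```

The point of the design is that `submit` returns immediately once the job is recorded; the actual data movement happens later, driven by the service rather than by a client holding a connection open.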

GCJ: When it retries the transfer, does it have to start from the beginning again?

Madduri: No, that's one of the great things about RFT: it leverages GridFTP restart markers. These markers tell you how much data has already been transferred, and RFT stores them in a database. So when you resume, the transfer picks up from where it left off. Whatever the failure, whether network, file system, or anything else, RFT resumes the transfer from the last known point.
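The resume-from-marker idea can be sketched with local files. This is a minimal sketch under simplifying assumptions: each file's progress is a single contiguous byte offset (real GridFTP restart markers are not modeled), a dict stands in for the database table of markers, and `resume_copy` and its `fail_at` fault-injection parameter are hypothetical names used only for illustration.

```python
import os


def resume_copy(src_path, dst_path, markers, chunk_size=64 * 1024, fail_at=None):
    """Copy src_path to dst_path, resuming from the offset recorded in markers.

    markers  -- dict standing in for the database table of restart markers
    fail_at  -- hypothetical fault injection: raise once this many bytes would be exceeded
    """
    offset = markers.get(src_path, 0)
    # Reopen an existing partial file without truncating it; otherwise create it.
    mode = "r+b" if os.path.exists(dst_path) else "wb"
    with open(src_path, "rb") as src, open(dst_path, mode) as dst:
        src.seek(offset)
        dst.seek(offset)
        while chunk := src.read(chunk_size):
            if fail_at is not None and offset + len(chunk) > fail_at:
                raise ConnectionError("simulated transient failure")
            dst.write(chunk)
            dst.flush()
            offset += len(chunk)
            markers[src_path] = offset  # record progress only after the write
    return offset
```

Because the marker is updated only after each chunk is written, a failure never causes data to be skipped; at worst, the last partial chunk is resent on the next attempt.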

Say you're transferring a million files from Argonne to ISI at the University of Southern California, and somewhere during the transfer a router on the path between Argonne, the University of Chicago, and Southern California goes down. The user does not have to do anything. RFT tries to transfer a file and gets a transient failure, essentially "host unreachable, the network is down somewhere." RFT then waits and retries after some time. The number of retries is configurable: you can tell RFT to retry forever, or at predetermined intervals.
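A configurable retry policy of the kind described here might look like the following sketch, in which `max_retries=None` means "retry forever" and a fixed interval separates attempts. The function and parameter names are hypothetical and are not RFT's configuration options; the sketch also assumes that only `ConnectionError` counts as transient, whereas a real service would classify failures more carefully.

```python
import time


def transfer_with_retries(attempt_fn, max_retries=3, interval_s=30.0, sleep=time.sleep):
    """Call attempt_fn until it succeeds, retrying transient failures.

    max_retries -- retries allowed after the first attempt; None retries forever
    interval_s  -- fixed wait between attempts, in seconds
    sleep       -- injectable so tests need not actually wait
    """
    failures = 0
    while True:
        try:
            return attempt_fn()
        except ConnectionError:  # treated as transient (e.g. network down)
            failures += 1
            if max_retries is not None and failures > max_retries:
                raise  # give up and surface the last error
            sleep(interval_s)
```

Combined with stored restart markers, each retry only has to move the data that had not yet arrived.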

GCJ: How does RFT communicate with databases?

Madduri: There are two types of databases that RFT can use. The first is an embedded database that lives inside the GT4 container. It's not part of the current 4.0 release, but it will be part of the next Globus Toolkit release. RFT talks to it using JDBC, the standard Java database API, which almost all major database vendors support, including Oracle, MySQL, and others. All of RFT's database access goes through JDBC.

Because of that, you can swap in any database that supports JDBC, and RFT will just work. In the embedded case, communication with the database does not go through a network layer; it happens entirely within the container.

The second type is a conventional standalone database such as PostgreSQL, MySQL, or Oracle. RFT works well with all of them through JDBC.
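The pluggability argument can be illustrated outside Java. Python's DB-API 2.0 (PEP 249) plays roughly the role JDBC plays in Java: code written against the standard connection and cursor interface can be handed different drivers. The sketch below persists restart markers through any DB-API connection and uses an in-memory `sqlite3` database as the "embedded" case. The class and table names are hypothetical, and this is an analogy, not RFT's schema; note also that DB-API drivers differ in placeholder style (`sqlite3` accepts `?`), so swapping drivers is not quite as seamless as with JDBC.

```python
import sqlite3


class RestartMarkerStore:
    """Persists per-file restart markers through a DB-API 2.0 connection.

    Only standard connection/cursor calls are used. The '?' placeholders are
    the qmark paramstyle, which sqlite3 supports; other drivers may differ.
    """

    def __init__(self, connection):
        self._conn = connection
        cur = self._conn.cursor()
        cur.execute(
            "CREATE TABLE IF NOT EXISTS restart_markers ("
            " source_url TEXT PRIMARY KEY,"
            " bytes_done INTEGER NOT NULL)"
        )
        self._conn.commit()

    def save(self, source_url, bytes_done):
        cur = self._conn.cursor()
        # Delete-then-insert avoids vendor-specific upsert syntax.
        cur.execute("DELETE FROM restart_markers WHERE source_url = ?", (source_url,))
        cur.execute(
            "INSERT INTO restart_markers (source_url, bytes_done) VALUES (?, ?)",
            (source_url, bytes_done),
        )
        self._conn.commit()

    def load(self, source_url):
        cur = self._conn.cursor()
        cur.execute("SELECT bytes_done FROM restart_markers WHERE source_url = ?", (source_url,))
        row = cur.fetchone()
        return row[0] if row else 0  # no marker yet: start from the beginning


# Embedded case: the database lives in-process, with no network layer.
store = RestartMarkerStore(sqlite3.connect(":memory:"))
```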

GCJ: Is RFT only available in Globus environments, or does it work with other types of environments as well?

Madduri: Right now, it works only with Globus, using GridFTP. But we do have plans to extend it to other types of data transfer, such as plain FTP and HTTP; basically anything you can use to move bits from one location to another.

GCJ: Who's using it currently?

Madduri: Right now, GT4 GRAM users are among the biggest users of RFT, because GRAM uses it to stage files in and out and to clean up after a job is done. The Data Replication Service in GT4 also uses RFT under the covers to perform its data transfers.

TeraGrid is also using RFT. TeraGrid has a tool called TGCP, which is like a copy command for the Grid; it uses RFT underneath to do the transfer. Other potential users include LIGO, which is a big science project; they're planning to use RFT on a large scale soon. There's also the Sloan Digital Sky Survey's use of RFT. They had a 1-million-file, 4-terabyte data archive that they transferred with RFT using a single command-line invocation. The transfer took four weeks to finish, during which time RFT automatically recovered from a series of failures (network, file system, etc.) and completed the transfer.

GCJ: Now, when GT4 came out and some of the Web services-enabled protocols were more formally instantiated, how did that change RFT? Did you have to make any adjustments?

Madduri: RFT has been a Web service right from its inception, and I evolved it along with the Globus Toolkit. In GT3, it used the OGSI framework that GT3 core provided. Now, in GT4, RFT uses the WSRF framework that GT4 core provides. It's always been a Web service.

GCJ: Any other features you're trying to build in or new directions?

Madduri: Yes. We are actively talking to communities to gather more requirements and to understand more of the problems they face in data transfer, and we're trying to include hooks in RFT that make it possible to design and build higher-level services. We are also starting a working group at the Global Grid Forum (GGF) to standardize data transfer interfaces in Grid environments, which I think is a major step forward. There are a couple of other nice features I am planning to include in RFT. One is the ability to pause and resume transfers: an operation that would let a higher-level service or a client pause some transfers because of a policy decision. You can say, "I want these transfers to stop for now and resume later." Another feature we are trying to add is transfer priorities, so you could say that a transfer from an important experiment gets higher priority than everyone else's. So it's more policy-based, like how a job scheduler handles computational jobs. We are trying to bring that functionality to data transfers.

All services should mirror policy at every level. So what I'm trying to do with RFT is to provide hooks for people to plug in these rich policies.
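Pause/resume and priorities were planned features at the time, so the following is only a hypothetical sketch of one way a policy-driven scheduler could order queued transfers: higher priority first, first-come-first-served within a priority, and paused transfers held in the queue until resumed. The class and method names are invented for illustration and do not describe RFT's design.

```python
import heapq
import itertools


class PriorityTransferQueue:
    """Hypothetical policy queue: higher priority first, FIFO within a priority."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker preserving submission order
        self._paused = set()

    def add(self, transfer_id, priority=0):
        # heapq is a min-heap, so negate priority to pop the highest first.
        heapq.heappush(self._heap, (-priority, next(self._seq), transfer_id))

    def pause(self, transfer_id):
        self._paused.add(transfer_id)

    def resume(self, transfer_id):
        self._paused.discard(transfer_id)

    def next_runnable(self):
        """Pop the highest-priority transfer that is not paused, or return None."""
        skipped = []
        result = None
        while self._heap:
            entry = heapq.heappop(self._heap)
            if entry[2] in self._paused:
                skipped.append(entry)  # paused: keep it queued for later
                continue
            result = entry[2]
            break
        for entry in skipped:
            heapq.heappush(self._heap, entry)
        return result
```

Keeping paused entries in the heap, rather than removing them, means a resumed transfer returns to its original place in the priority order.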
