Guest Expert
Dave Pearson
Oracle

SOA and Storage Virtualization Trends Meet Grid

Dave Pearson, Oracle's Grid Program Director in the UK, explains how organizations' desire for easier access to their information assets is driving today's SOA and storage virtualization trends, and discusses how storage and data demands differ between the research and academic world and the enterprise world.

GCJ: What's your take on Grid's progress in enterprise? And how do you see the Grid evolution relating to information management trends in enterprise?

Pearson: I think a lot of the progress in Grid is still happening in the scientific and academic world, but I also think that enterprise Grid adoption is beginning to take off in the commercial and public sectors. From a database perspective, Oracle sees this as very significant, because a number of industry drivers, both business and technical, make Grid infrastructure important.

We're all familiar with the key Grid concepts: virtualization, and the ability to pool and share compute resources, provision them, and manage them in an automated fashion. For most companies, a major challenge is the complexity of managing vast amounts of data and making it available, so a lot of the discussion in the enterprise is about the role of Grid as a solution to that problem.

The evolution and commoditization of storage technologies over the last five years has resulted in a big trend away from direct-attached storage (DAS) toward storage area networks (SAN) and network-attached storage (NAS). At the same time, a whole range of networking technologies such as iSCSI, 10 Gigabit Ethernet and InfiniBand are making it far easier to integrate storage systems and manage storage effectively. More enterprises are virtualizing their storage for these reasons. You can put all of your storage into a single pool and allocate and manage it dynamically. You can also do things like automated striping, loading, repositioning and copying in the background without taking systems down, so there's no impact on service levels.

So the overall enterprise trend is really being driven by organizations' desire to have easier access to all of their information assets all of the time. Obviously, with SOA, you can federate data effectively and access it through Web services, and that's great in the context of online transaction processing or customer-facing systems, where you're only looking at small amounts of data.

But if you draw a comparison with the scientific world, there you're dealing with large amounts of data that you want to move around quickly from one computation point to another. That's not necessarily happening much in the enterprise at the moment with Grid environments. In the enterprise, data tends to be closely aligned with applications, and it's going to be some time before people virtualize their data across multiple applications wholesale across the data center. There are good gateway technologies and interfaces for moving large volumes of data around today, but that tends to be done within the context of particular work processes: data warehouse feeds, end-of-day and month-end processing, consolidating results for quarterly reporting, and those sorts of specific scenarios.

In the scientific and academic world, it's more a case of doing a project, wanting to run a certain simulation, and needing to get at a big set of data. So you bring it in, run the jobs, and then replace that data with the next set of data associated with the next problem you want to look at. That doesn't happen much in the enterprise yet, where information architecture strategies are still built very much around operational data and warehouses.

One thing Oracle and other vendors are working on is breaking down the distinction between operational data and historical warehouse data by merging the two, so you can monitor business processes in real time and report on indicators, trends and predictions based on both historical and current data.

GCJ: So what are some of the specific storage and data management challenges that you're seeing in enterprise today?

Pearson: One example is the massive growth in multimedia information. If you look at mobile computing and portable devices, there are players in the market who want to deliver information-rich services to a large number of consumers. To do this you need to have large amounts of data online as well as lots of processing power available to make sure that people get the right information at the right time and in the right place. This is one area where Grid's obviously very important.

I think another major challenge IT departments in the commercial sector are facing at the moment is the large volume of data they're now required to hold and maintain online. Part of that is the result of the business desire to exploit all information assets to the full, but it's also due to financial regulatory requirements, things like Sarbanes-Oxley and, in Europe, Basel II. On top of that, a number of industry-specific regulatory requirements are coming in which mean that companies need to hold data for longer periods of time and keep it available online for access.

Just as an example, there's another financial services initiative in Europe called MiFID. It requires firms to retain an audit log of trade information from the point at which prices are quoted before a trade through to settlement. This information must be available online for a five-year period, and people in Europe are talking about this challenge in terms of the data snowstorms this regulation will create.

GCJ: It seems like there will be a number of new security considerations as data gets more federated in Grid environments.

Pearson: Absolutely. I was at a life sciences event recently, hosted by Oracle, and one of the major demands in this area is to link medical information with pharmaceutical information. The objective is to improve phenotype identification so that you can produce drugs that are more effective on a smaller target population, rather than taking a scattergun approach of giving a drug to everybody even though you know it's probably only going to work on 30% of them. So security in the scientific and healthcare world is very important.

There are a number of Grid-related projects, for example caBIG in the U.S. and a number of Biobank projects in Europe, where a large volume of medical information is being collected. If this information can be securely anonymized, it can then be made available to pharmaceutical companies for this type of clinical research and discovery. The development of good anonymization techniques is also important because it would dramatically increase the amount of information on events and outcomes that can be made available for medical research.

GCJ: Last month on GRIDtoday there were some interesting discussions about the perception that Grid users in Europe seem to be outpacing their North American counterparts. Care to comment on that?

Pearson: It's an interesting discussion, and I agree with some of what people have written. I saw at least one opinion that Grid adoption started a bit later in Europe. I suspect CERN would disagree, but that's not the point I want to make here.

The big difference between Europe and the U.S. at the moment, particularly in the scientific and academic Grid world, is that in Europe the funding tap is very much still on. The EU has been running its framework programs for 20 years, through which it allocates significant amounts of money for research and development across all scientific and technology domains. Grid has been a high-priority topic for the past four or five years, so there's currently a good number of high-profile, well-funded research projects in Europe, and more are on the way. In contrast, I think the U.S. was running the very big Grid projects slightly earlier than Europe, with the exception of CERN, but these have now either been completed or are nearing the end. At the same time, funding has been cut back, so there are fewer new research projects. The nature of the framework programs means the Europeans are very experienced in large collaborative projects, which typically last three to five years and can involve more than twenty partners. They've applied that knowledge and experience to Grid-based projects that focus on collaborative working and resource sharing across virtual organizations.

You also need to bear in mind that, in addition to EU funding, each country has its own national funding organizations. In the UK, for example, there are the science research councils, each supporting a scientific domain. There's also the Department of Trade and Industry, which has provided funding support to the highly successful UK e-Science Program that Tony Hey set up and ran. I think Tony did a brilliant job of promoting the program, involving industry and exploiting Grid for the development of e-Science. It produced a lot of successful projects, one of which, OGSA-DAI, I was lucky enough to be involved in.

Interestingly, if you look at enterprise Grid activity, I believe there's currently wider adoption by commercial organizations in the U.S. than in Europe or, for that matter, any other part of the world. This was illustrated again recently by Oracle's latest Grid Index report and is also consistent with the views of the main industry analysts.
