Fri 12 May 2006
Grid Computing is a term that has arisen in the last few years to describe a number of computer architecture approaches based on some simple but powerful principles.
The initial definition, as posed in Ian Foster’s “The Anatomy of the Grid” [1], encompassed “coordinated resource sharing and problem solving in dynamic, multi-institutional virtual organizations”.  As indicated in a subsequent paper, “the key concept is the ability to negotiate resource-sharing arrangements among a set of participating parties (providers and consumers) and then to use the resulting resource pool for some purpose” [2].
Since then, the term has broadened to refer generally to the use of shared (commodity) computer components, both processing and storage, in a distributed networked architecture. In essence, it is an architectural alternative to monolithic, centralized computation and storage architectures.
That having been said, there are at least three common uses of the term “Grid” in the present IT lexicon:
- Technical Computing Grids employ rack-mount computer systems in scale-out configurations to bring the aggregate processing power of many CPUs to bear on problems of interest.
- (Enterprise) Utility Computing Grids provide an agile, on-demand model for application provisioning and migration based on sharing of common infrastructure resources implemented through commodity computer components (CPUs, networking, storage).
- Data Grids provide for the distributed capture, management, and sharing of information (and sometimes instrumentation), typically across multiple authority domains.
Technical Computing Grids
This architectural approach is common in High Performance Computing (HPC) applications, that is, ones employing large amounts of computing power to solve computationally intense problems.
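The scale-out idea can be illustrated with a small scatter-gather sketch: a problem is split into chunks, each chunk is processed independently, and the partial results are combined. This is only an illustration. The function names are hypothetical, and local threads stand in for what would be separate nodes driven by a job scheduler or message-passing layer on a real cluster.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum_of_squares(chunk):
    """Work performed by one worker on its share of the problem."""
    return sum(x * x for x in chunk)


def scatter_gather(data, n_workers=4):
    """Split data into n_workers chunks, process them concurrently, combine the results."""
    # Scatter: worker i receives elements i, i + n_workers, i + 2 * n_workers, ...
    chunks = [data[i::n_workers] for i in range(n_workers)]
    # Local threads are a stand-in (assumption) for the nodes of a compute grid.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(partial_sum_of_squares, chunks))
    # Gather: combine the partial results into the final answer.
    return sum(partials)


total = scatter_gather(list(range(1000)))
```

The pattern works well when the chunks are independent of one another; tightly coupled simulations need communication between workers, which this sketch does not attempt to show.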
Traditional HPC apps in the scientific computing space include:
- Energy research and simulation, and high energy physics research
- Earth, ocean, and atmospheric sciences, global change, and weather prediction
- Complex multi-physics simulations for aerospace design
- Seismic data analysis
- Large scale signal and image processing applications
These are the types of applications that used to run on large supercomputers, such as those developed in the early 80s by Cray and others. As commodity-based cluster computing (Beowulf clusters) emerged, many of these applications were re-hosted on large (1000+ node) compute clusters (often called “grids”).
Additionally, many new applications have arisen, in part due to the increased availability of these new commodity supercomputers. These applications are increasingly at the core of business critical operations, and include:
- Drug discovery (computational chemistry, genomics research)
- Circuit design simulations
- Automobile design simulations (aerodynamics and crash analysis)
- Risk analysis of financial portfolios
- Digital media applications (animation and rendering)
Utility Computing Grids
Utility computing is all about leveraging modular compute, network, and storage components to improve resource utilization, increase enterprise agility through rapid application provisioning (and re-provisioning), and simplify IT operations. In short, the goal is a more nimble, more cost-effective IT organization.
The terms Grid, Utility Computing, and On-demand computing are often used almost interchangeably to describe a wide variety of approaches that are generally aimed at these objectives. These approaches are typically based on two key principles - the foundational pillars - of utility computing: Consolidation and Virtualization.
These two principles go hand-in-hand to support hosting of multiple applications - either concurrently, or in a time-share model - on the same physical resources. For example, server virtualization, as exemplified by VMware, Xen, and Microsoft Virtual Server, supports the hosting of multiple (virtual) servers on a single physical host. The physical resources of the host - CPU, memory, I/O, network connectivity - are shared amongst a number of virtual servers. Each looks like a standalone server - with its own IP address(es), its own network and security settings, and its own OS and applications - but shares the underlying physical resources. Server virtualization improves utilization by consolidating multiple applications onto a common physical hardware platform - eliminating capex and opex costs associated with deploying multiple physical servers. This approach is particularly effective in containing the “server sprawl” that has occurred in many IT organizations where every application instance required its own server (and local storage).
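To make the consolidation argument concrete, here is a minimal sketch that packs application workloads onto as few physical hosts as a simple first-fit-decreasing heuristic finds. The workload names, the load figures (percent of one host's capacity), and the `consolidate` function are all hypothetical; real placement tools also weigh memory, I/O, and availability constraints.

```python
def consolidate(vm_loads, host_capacity=100):
    """Place virtual servers on hosts using first-fit decreasing.

    vm_loads maps a (hypothetical) virtual server name to its load,
    expressed as an integer percentage of one host's capacity.
    Returns a list of hosts, each a list of the names placed on it.
    """
    hosts = []  # each entry: {"free": remaining capacity, "vms": [names]}
    # Place the largest workloads first; this tends to leave fewer gaps.
    for name, load in sorted(vm_loads.items(), key=lambda kv: kv[1], reverse=True):
        if load > host_capacity:
            raise ValueError(f"{name} does not fit on a single host")
        for host in hosts:
            if host["free"] >= load:
                host["vms"].append(name)
                host["free"] -= load
                break
        else:
            # No existing host has room, so bring another one into service.
            hosts.append({"free": host_capacity - load, "vms": [name]})
    return [host["vms"] for host in hosts]


# Five applications that would each have had a dedicated server.
vm_loads = {"crm": 40, "mail": 30, "web": 25, "build": 15, "dns": 5}
placement = consolidate(vm_loads)
```

With these assumed loads the five workloads fit on two hosts rather than five, which is the utilization gain the paragraph above describes.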
Similarly, network virtualization strategies (including VLANs) and storage virtualization strategies allow applications to share those infrastructure resources, often with quality-of-service (QoS) provisions.
Storage virtualization strategies, in particular, aim at delivering logical storage containers (filesystems and LUNs or volumes) that transcend the physical nature of storage systems – disks and controllers. Together with tiered storage strategies (using different classes of storage for different types of data) and transparent data migration, they deliver a “storage as services” model where those services are provided in the storage network and not by individual physical devices or servers. This is a realization of what many have been calling information lifecycle management (ILM).
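A tiered storage policy of the kind described above can be sketched as a simple rule that maps how recently data was used to a class of storage. The tier names and the idle-time thresholds below are assumptions chosen for illustration, not values from any particular product.

```python
from dataclasses import dataclass


@dataclass
class Tier:
    name: str
    max_idle_days: float  # data idle longer than this belongs on a later tier


# Hypothetical tiers, ordered from most to least expensive.
TIERS = [
    Tier("fast-disk", 30),
    Tier("capacity-disk", 365),
    Tier("archive", float("inf")),
]


def choose_tier(days_since_last_access):
    """Return the name of the first tier whose idle limit covers the data."""
    for tier in TIERS:
        if days_since_last_access <= tier.max_idle_days:
            return tier.name
    # Unreachable while the last tier has an infinite limit.
    raise ValueError("no tier accepts this data")


current = choose_tier(3)
aging = choose_tier(90)
dormant = choose_tier(1000)
```

In a virtualized storage network the migration between tiers would happen behind the logical container, so applications keep using the same filesystem or volume while the data moves.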
Data Grids
The notion of a data grid may be the closest concept to the original Grid concept developed by Foster et al. It represents a physically distributed set of information resources (services) contributed by multiple authorities under a common set of protocols. In some ways, the World Wide Web represents the first generation data grid, in which information is “published” by individual sites, indexed by crawlers, and accessed via search engines and explicitly represented links.
The San Diego Supercomputer Center’s Storage Resource Broker is another data grid model that presents cataloged data “collections” presentable to a community of interest [3]. SRB “presents the user with a single file hierarchy for data distributed across multiple storage systems. It has features to support the management, collaboration, controlled sharing, publication, replication, transfer, and preservation of distributed data.”
In many ways, these efforts represent first generation data grids – with static or quasi-static data “published” into web-based documents or files, and consumed by browsers and file-savvy applications. The next generation of data – or information – grid is one based on Web 2.0 technologies that affords a richer model for compositing individual information sources into a rich set of distributed web services.
References
[1] Ian Foster, Carl Kesselman, Steven Tuecke, “The Anatomy of the Grid”, http://www.globus.org/alliance/publications/papers/anatomy.pdf
[2] Ian Foster, “What is the Grid? A Three Point Checklist.”, http://www-fp.mcs.anl.gov/~foster/Articles/WhatIsTheGrid.pdf
[3] SDSC Storage Resource Broker, http://www.sdsc.edu/srb/index.php/Main_Page
December 31st, 2006 at 2:52 pm
Hello, the definitions serve to create a foundation for discussion but I feel that many customers need to understand much more of a value proposition perspective before they can evolve beyond a “that’s interesting” reaction. The two definitions that represent the best customer-acceptable options from those you listed are the Utility Computing and Data Grids.
The utility concept is the most understandable because of its reference to our typical utility consumption. Key to adoption, however, is the type of services, both in terms of applications and in quality, that a customer would be interested in. So what are the most viable applications for consideration? CRM, OLTP or perhaps an environment that can adapt to custom applications. Some of these are already available but may only be adopted by customers in the SMB space. To evolve, these types of grid environments will require a variety of security techniques as well as data integrity and processing guarantees. So ultimately the value propositions include providing an application environment that only needs to be used on an as-needed basis, is easy to use in terms of moving business data in and results out, and that has a very understandable pricing structure.
When it comes to data grids, there are perhaps some real and artificial examples based upon data context. However, context is not only defined by data consumption/usage; it has to have additional considerations for risk, cost/value, and destruction. I feel that a true data grid must have mechanisms that enable this contextualization and provide data handling tools that make all of these machinations transparent to the end user (depending on role, of course). So accessing data requires all of the authentication mechanisms that are in place but must extend across all copies of the data as well as to the physical location of the information. There must also be methods for moving the data to environments that may be more cost effective or to a combination of media types that retain data integrity but at the lowest cost. Finally, the data grid should have a Policy Engine overlay that can be customized based upon business rules. By providing this final mechanism, we get to an important value proposition consideration: business relevance. This engine requires a cross-industry collaboration such as those started within the SNIA based upon Content-Addressable Storage (CAS), in order to provide assurances to customers that they are not moving down the path towards vendor lock-in.
There are many requirements and many approaches but ultimately the customer wants the best solution at the lowest cost with the lowest risk - can we add these to the Grid definitions?