Sun 1 Oct 2006
Distributed Resource Management (DRM) solutions had their origin as simple batch schedulers capable of dispatching jobs to “free” CPU resources for technical computing applications. Over the years, these systems have evolved to provide a wide range of job management services that include sophisticated rule-based dispatching mechanisms, environment replication, preemption, checkpoint/restart, and reporting. Now, as some of these technical computing approaches find their way into the core of the enterprise, DRM is establishing itself as a key technology to enable Enterprise Grid computing models.
Technical Computing Roots [1]
The rapid adoption of commodity-based servers and the Beowulf distributed computing architecture has led to large compute farms in many of today’s production technical computing applications, including electronic design automation, digital animation, seismic processing, bioinformatics, and complex simulation. Facilities with 1,000 or more hosts are not uncommon.
As these compute farms have continued to grow, the CPU bottleneck that once limited system throughput has been replaced by other resource constraints that reduce utilization. Software licenses, for example, became the limiting factor in many EDA implementations in the early part of this decade. As a result, license-based scheduling criteria became a key part of many DRM systems.
As compute servers become more capable (64-bit processing, large memory bandwidth, GbE- or IB-based connectivity), and as networks expand to deliver increasing bandwidth, many organizations are encountering throughput-limiting bottlenecks in other areas of the Grid Computing “stack” (see Figure).

One of the most frequently cited bottlenecks is networked storage, which now has to service concurrent requests from hundreds or thousands of data-hungry compute servers. Scalable parallel file systems are being deployed to satiate some of the increasing I/O demand. However, even with these systems in place, I/O bottlenecks that reduce system throughput often occur.
To handle the increasing pressure that scale-out computing architectures place on networked storage systems, some DRM systems are evolving to include storage monitoring and storage resource management as part of their scheduling policies.
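As a rough illustration of how license-based and storage-aware criteria can sit alongside the traditional “free” CPU check, here is a minimal Python sketch. It does not reflect any particular DRM product: the `Job` and `Cluster` types, the per-job I/O estimate, and the single storage headroom figure are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Job:
    name: str
    cpus: int
    license: Optional[str] = None  # hypothetical license feature name
    io_mbps: float = 0.0           # assumed: estimated I/O demand of the job


@dataclass
class Cluster:
    free_cpus: int
    free_licenses: Dict[str, int]
    storage_headroom_mbps: float   # assumed: spare bandwidth reported by storage monitoring


def can_dispatch(job: Job, cluster: Cluster) -> bool:
    """A job is eligible only if CPU, license, and storage constraints all hold."""
    if job.cpus > cluster.free_cpus:
        return False
    if job.license is not None and cluster.free_licenses.get(job.license, 0) < 1:
        return False
    if job.io_mbps > cluster.storage_headroom_mbps:
        return False
    return True


def dispatch(queue: List[Job], cluster: Cluster) -> List[str]:
    """Walk the queue in order, start every eligible job, and reserve its resources."""
    started = []
    for job in queue:
        if can_dispatch(job, cluster):
            cluster.free_cpus -= job.cpus
            if job.license is not None:
                cluster.free_licenses[job.license] -= 1
            cluster.storage_headroom_mbps -= job.io_mbps
            started.append(job.name)
    return started
```

In this sketch a job that fits on the available CPUs can still be held back because no license token is free or because the storage system has no bandwidth to spare, which is the behavior the scheduling policies described above aim for.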
Heading for the Enterprise
This expansion in scope is part of a generalization of the DRM model that is well aligned with some of the needs of enterprise (utility) computing grid environments. Utility computing relies heavily on the principle of consolidation, where physical resources are shared by a number of applications. In such an environment, it is essential that system resources be effectively managed to ensure adequate application service levels. “Binding” an application to a set of resources based on the application’s requirements, current resource utilization levels, availability of application data, and application service level requirements is essentially the same problem that DRM systems have been solving for years.
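The “binding” decision can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the `Host` and `App` types, the single utilization number per host, and the `max_utilization` ceiling standing in for a service level requirement are all hypothetical, and real systems weigh many more factors.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Host:
    name: str
    utilization: float  # current load as a fraction of capacity, 0.0 to 1.0
    has_data: bool      # whether the application's data is already available here


@dataclass
class App:
    name: str
    load: float             # assumed: fraction of a host the application needs
    max_utilization: float  # assumed stand-in for the service level requirement


def bind(app: App, hosts: List[Host]) -> Optional[str]:
    """Pick a host that keeps the service level, preferring data locality, then low load."""
    candidates = [h for h in hosts if h.utilization + app.load <= app.max_utilization]
    if not candidates:
        return None  # no host can take the application without violating its ceiling
    # False sorts before True, so hosts that already hold the data come first.
    best = min(candidates, key=lambda h: (not h.has_data, h.utilization))
    best.utilization += app.load
    return best.name
```

Repeated calls to `bind` fill the hosts that hold the data first and fall back to other hosts only when the ceiling would otherwise be exceeded; once no host qualifies, the function returns `None`.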
As a result, we are seeing systems like Platform Computing’s Enterprise Grid Orchestrator (EGO) and Qlusters’ openQRM move squarely into the enterprise data center management space.
As Enterprise Grids continue to be deployed, and service-oriented architectures (SOAs) mature, look to DRM-derived solutions to form the foundation of enterprise application management suites.
References
[1] Drawn from “Storage-aware Job Scheduling”, a presentation to the Platform Grid Conference 2006, San Francisco, CA.