I knew it couldn’t possibly end there …

I recently sent Jon Toigo (DrunkenData) a formal response to questions he and some of his readers had about Data ONTAP GX, NetApp’s scale-out storage architecture.

In a follow-on comment, one of Jon Toigo’s readers, who did not indicate his affiliation but whose name links to LeftHand Networks :-), wrote:

“It’s nice to see NetApp being open about its GX architecture. This being said, the architectural deficiencies were clearly spun in a positive light with marketing hype.”

So, in the interest of educating that reader (and perhaps some other readers) about scale-out storage and the scale-out computing revolution, I thought I’d take the opportunity here to share some of my experience and personal observations.

And … sorry to disappoint, but no “spin” here.

In that comment, Jon’s reader missed two key points, IMHO.

First, the *primary* purpose of almost all scale-out storage architectures is to provide scalable AGGREGATE performance for a large number of clients - not to speed up single-stream performance to a single host.  This is true both in technical computing apps and in large-scale enterprise deployments.

In fact, most hosts in cluster computing configurations (or enterprise server environments) have only a single GbE (or 2Gb FC) link into the client fabric - so you’ll never get more than a wire’s worth of bandwidth to a single host anyway (roughly 90 to 100 MB/s of usable throughput for GbE).
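To make the “wire’s worth” arithmetic concrete, here is a back-of-the-envelope sketch. A 1 Gb/s link carries at most 125 MB/s of raw bits (decimal units); the `efficiency` factor that brings this down into the 90 to 100 MB/s range is an assumed fudge factor lumping together framing and protocol overheads, not a measured figure, and the function names are purely illustrative.

```python
# Back-of-the-envelope estimate of usable per-host bandwidth on one wire.
# Assumption: "efficiency" lumps together framing, TCP/IP, and NFS overheads;
# the 0.72-0.80 range is illustrative, chosen to match the 90-100 MB/s GbE figure.


def raw_mb_per_s(line_rate_gbps: float) -> float:
    """Convert a nominal line rate in Gb/s to MB/s (decimal units, 8 bits per byte)."""
    return line_rate_gbps * 1000 / 8


def usable_mb_per_s(line_rate_gbps: float, efficiency: float) -> float:
    """Estimate achievable payload throughput for a single host link."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return raw_mb_per_s(line_rate_gbps) * efficiency


if __name__ == "__main__":
    for eff in (0.72, 0.80):
        print(f"GbE at {eff:.0%} efficiency: {usable_mb_per_s(1, eff):.0f} MB/s")
```

No amount of back-end scaling changes this per-host ceiling; it only changes how many such hosts can be served at once.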

For higher-performance interconnects, including 10GbE and InfiniBand (IB), it *is* possible for single hosts to obtain higher throughputs, but you still won’t exceed the performance of a single wire (or a single “logical wire”, if you’re using and can leverage link aggregation) from host to storage subsystem.  So the fact that a host goes through an N-blade in the GX architecture to get to the data is irrelevant, as long as the N-blade can deliver the single-session bandwidth the host expects - which, of course, it can.

That having been said, technical computing apps requiring high aggregate I/O for “single problem” scenarios do exist.  But those applications are structured in one of two ways to leverage parallel computing: either as a set of “embarrassingly parallel” processes, or as part of a large parallel job (using, e.g., MPI).

In the former case, this is simply an aggregate I/O problem, with lots of hosts accessing lots of files on the shared storage system (perhaps in the same directory, perhaps in multiple directories).  In the latter case, the individual “workers” in an MPI communication group (node/process set) will EACH perform their own I/O - either accessing individual files or accessing (typically) disjoint portions of large files (perhaps through a parallel I/O layer such as MPI-IO).  In either case, GX does just what it is supposed to do - it spreads the accesses across *BOTH* the N-blades and the D-blades to provide scalable I/O.
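The “disjoint portions of large files” pattern is easy to picture in code. The sketch below is a minimal, hypothetical illustration in plain Python: `rank` and `nprocs` stand in for what an MPI program would get from `MPI_Comm_rank` and `MPI_Comm_size`, and ordinary seek/read calls stand in for MPI-IO operations such as `MPI_File_read_at`. Each worker computes its own contiguous, non-overlapping byte range and reads only that range from the shared file.

```python
import os


def block_extent(rank: int, nprocs: int, file_size: int) -> tuple[int, int]:
    """Return (offset, length) of the byte range owned by `rank` when a file
    is split into nprocs contiguous, non-overlapping pieces, as evenly as possible."""
    base, extra = divmod(file_size, nprocs)
    # The first `extra` ranks each take one additional byte.
    length = base + (1 if rank < extra else 0)
    offset = rank * base + min(rank, extra)
    return offset, length


def read_my_extent(path: str, rank: int, nprocs: int) -> bytes:
    """Each worker opens the shared file independently and reads only its range."""
    offset, length = block_extent(rank, nprocs, os.path.getsize(path))
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)
```

Because the ranges never overlap, the workers need no coordination with each other; the storage system just sees many independent readers, which is exactly the aggregate-I/O case a scale-out design spreads across its nodes.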

This is basically how *all* parallel/clustered filesystems are employed for scale-out computing applications.  My aforementioned post on The Scale-out Evolution has links to many of the companies developing such systems.

As I indicated in my response to Jon, pNFS is a proposed extension to NFSv4 that will effectively allow you to take the VLDB (volume location database) functionality “out-of-band”.  A pNFS-savvy client can get a “map” of the locations of all of the file segments from the metadata server (at open time, or in response to file-extend callbacks), then go directly to those segments for its data access commands (read, write) - in essence, eliminating the “hop” that Jon initially asked about.
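Conceptually, the out-of-band model looks something like the following sketch. It is not the pNFS protocol: `Segment`, `route`, and the server names are hypothetical, and the layout is assumed to be sorted, contiguous, and already fetched from the metadata server. The point is that once the client holds the map, splitting an I/O into per-server requests is purely local work, with no extra hop.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    offset: int  # first byte of this segment within the file
    length: int  # segment size in bytes
    server: str  # hypothetical data-server identifier


def route(layout: list[Segment], offset: int, length: int) -> list[tuple[str, int, int]]:
    """Split an I/O on [offset, offset + length) into (server, offset, length)
    requests, using a layout assumed sorted, contiguous, and covering the range."""
    end = offset + length
    layout_end = layout[-1].offset + layout[-1].length
    if offset < layout[0].offset or end > layout_end:
        raise ValueError("range not covered by layout")
    starts = [s.offset for s in layout]
    i = bisect_right(starts, offset) - 1  # segment containing `offset`
    pos, requests = offset, []
    while pos < end:
        seg = layout[i]
        chunk_end = min(end, seg.offset + seg.length)
        requests.append((seg.server, pos, chunk_end - pos))
        pos = chunk_end
        i += 1
    return requests
```

A read that straddles three segments, for example, turns into three requests that the client can send straight to the three servers holding them.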

There are other performance/efficiency options that can be effected in such architectures as well, both in an out-of-band metadata scenario AND in the current N-blade implementation.  These have to do with how those components may choose to do read-ahead across distributed segments of a large shared file - effectively issuing requests for multiple portions of the file at the same time to “fill” a larger pipe back to the requestor.  With a pNFS client, multiple-segment read-ahead could be effected by the host; with the current GX implementation, by the N-blade.
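Multi-segment read-ahead amounts to keeping several segment reads in flight at once and reassembling the results in file order. Here is a minimal sketch using Python’s `concurrent.futures`, where `fetch(server, offset, length)` is a hypothetical per-segment read call (for example, a read a pNFS client sends to a data server, or one an N-blade sends to a D-blade) and the `max_in_flight` default of 4 is arbitrary.

```python
from collections.abc import Callable
from concurrent.futures import ThreadPoolExecutor


def parallel_readahead(fetch: Callable[[str, int, int], bytes],
                       requests: list[tuple[str, int, int]],
                       max_in_flight: int = 4) -> bytes:
    """Issue reads for several segments concurrently and return them joined
    in file order. `fetch` is a hypothetical per-segment read function."""
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        # map() yields results in the order of `requests`,
        # even though the underlying reads complete concurrently.
        chunks = pool.map(lambda r: fetch(*r), requests)
        return b"".join(chunks)
```

The design choice worth noting is the bounded window: `max_in_flight` caps how many requests are outstanding, which is how read-ahead “fills the pipe” without flooding the back end.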

Bottom line … yes, it’s true that a single host’s I/O will be limited by the *smaller* of: (a) its network ingress capability (typically 1 GbE); and (b) the network egress capability of the N-blade to which the host session (mount) is connected.  That is not a problem for most apps I’ve ever seen, given the hardware on which GX currently runs.
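The bottom-line rule is a one-line `min()`, and extending it to many hosts shows why aggregate throughput is what scale-out buys you. This is an idealized, hypothetical model: it caps each N-blade at an assumed egress figure shared by the hosts mounted through it, and it ignores D-blade, disk, and cluster-interconnect limits.

```python
from collections import defaultdict


def single_host_bound(host_ingress: float, nblade_egress: float) -> float:
    """Upper bound on one host's throughput: the smaller of its own link
    and the egress of the N-blade its mount is attached to."""
    return min(host_ingress, nblade_egress)


def aggregate_bound(mounts: dict[str, str],
                    host_ingress: dict[str, float],
                    nblade_egress: dict[str, float]) -> float:
    """Idealized aggregate bound. `mounts` maps host -> N-blade; each N-blade
    delivers at most its egress, shared among the hosts mounted through it.
    Ignores D-blade, disk, and cluster-interconnect limits (an assumption)."""
    demand: dict[str, float] = defaultdict(float)
    for host, nblade in mounts.items():
        demand[nblade] += host_ingress[host]
    return sum(min(d, nblade_egress[n]) for n, d in demand.items())
```

For example, with 16 hosts at roughly 100 MB/s each split evenly across two N-blades assumed to deliver 1000 MB/s apiece, no N-blade is saturated: each host still tops out at its own 100 MB/s, while the idealized aggregate is 1600 MB/s.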

The reader’s second point has to do with the functionality in the current ONTAP GX release.  As I had indicated, the initial release of ONTAP GX is targeted at technical computing applications, for which the set of features currently provided is critical.

Technical computing applications work almost exclusively against files and filesystems.  What I think Jon’s reader is missing is the difference between a block storage platform (in which you can add bricks and controllers under a single management domain and provide striped LUNs - stuff many vendors’ volume managers have been doing for years) and a scale-out FILESYSTEM, which is what ONTAP GX presents.

As many users of parallel and clustered filesystems know, this is a very important distinction.  Large storage systems can effectively be crippled by a filesystem that hasn’t been adequately designed for scalability - the old “metadata bottleneck” of SAN filesystems.

ONTAP GX provides both a scale-out storage platform (distributed storage “bricks”, distributed controllers, single management domain) as well as a parallel filesystem with DISTRIBUTED metadata - alleviating file system bottlenecks that would otherwise prevent full utilization of the underlying architecture.

What you’re seeing at NetApp, and I believe elsewhere in the storage industry, is the gradual maturation of scale-out storage technology (and particularly parallel/clustered filesystems), as companies look to leverage these innovations first in production technical computing applications (digital animation, EDA, bioinformatics, automotive/aerospace design, …), and subsequently as highly scalable enterprise storage systems (e.g., for enterprise grid deployments).

The truth of the matter, despite what some marketing departments will tell you, is that (a) building parallel/clustered filesystems is hard (especially when it comes to delivering scalable, full-featured, robust, enterprise-class services); and (b) we (as an industry) are very much in the early days of the scale-out storage revolution.  Just ask someone who’s tried to deploy any of these systems, or do an in-depth survey of the scale-out storage vendors and ask them (and their customers) about feature sets and robustness.

From my (admittedly biased) perspective, the HUGE advantage NetApp has is a very successful enterprise business based on a proven scale-up approach to unified storage (SAN, iSAN, NAS) with Data ONTAP, plus an emerging scale-out storage platform based on the Spinnaker legacy - all built on the same physical hardware components, the same core storage “container” architecture (WAFL), and the same data management services (snapshots, mirrors, etc.).

Getting all of that right requires significant innovation, and takes time.  Hence the evolutionary, multi-release rollout of ONTAP GX.