Appistry CloudIQ Storage Hadoop Edition

Appistry CloudIQ Storage Hadoop Edition ships with Hadoop Distributed File System (HDFS) drivers, enabling it to be easily deployed in place of HDFS for applications where reliability and throughput are key considerations.

Why an Alternative Storage System for Hadoop?

Apache Hadoop has become a popular open-source framework for processing and analyzing large amounts of data. The Hadoop framework includes MapReduce, a software framework for distributed processing of large datasets, and HDFS, a distributed file system designed to support MapReduce. HDFS architecture is built around a single metadata repository, called the NameNode.

Because the NameNode is not clusterable, it represents a single point of failure for the entire system. NameNode failures result in loss of service and possible loss of data, and Hadoop users go to great lengths and expense to protect against them. In addition, the NameNode stores the location of every block of file system data in a single machine's memory, limiting the system's ability to scale to support large numbers of files. Finally, the single NameNode must be consulted on every HDFS read or write, resulting in a performance bottleneck for high-throughput systems.

Appistry created CloudIQ Storage Hadoop Edition to address these challenges and make mission-critical Hadoop deployments possible.

Greater Reliability from a Fully Distributed Architecture

Unlike HDFS, CloudIQ Storage Hadoop Edition has no single point of failure and no centralized bottleneck, making it more suitable for deployments where scalability, reliability and throughput are key considerations.

With Appistry CloudIQ Storage, file system metadata is distributed across the storage system, with no centralized metadata repository or other subsystem serving as a single point of failure or performance bottleneck.

Criterion   | HDFS                                                        | CloudIQ Storage
Reliability | NameNode (NN) is a single point of failure.                 | System is fully decentralized with no single point of failure.
Scalability | NN holds all block locations in memory on a single machine. | Metadata is distributed throughout the system.
Performance | NN is consulted on all reads/writes.                        | Lookups are distributed across the system.
Complexity  | NN issues and hierarchical architecture add complexity.     | System is fully symmetric and easy to deploy and manage.
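
To make the contrast between a central NameNode and distributed lookups more concrete, here is a minimal Python sketch of one generic way metadata ownership can be spread across symmetric nodes: consistent hashing. This is an illustration only. The source does not describe how CloudIQ Storage actually places metadata, and the class name `HashRing`, the node names and the file paths are all hypothetical.

```python
import hashlib
from bisect import bisect_right


class HashRing:
    """Toy consistent-hash ring mapping file paths to metadata owners.

    Illustrative only: not CloudIQ Storage's actual placement scheme.
    """

    def __init__(self, nodes, points_per_node=64):
        # Each node contributes several points on the ring so that
        # ownership is spread reasonably evenly across nodes.
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(points_per_node)
        )
        self._keys = [h for h, _ in self._ring]

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def owner(self, path):
        # The owner is the first ring point clockwise from the path's hash.
        idx = bisect_right(self._keys, self._hash(path)) % len(self._ring)
        return self._ring[idx][1]


# Hypothetical cluster: any client can compute the owner locally,
# so no single machine has to be consulted on every read or write.
nodes = ["node-a", "node-b", "node-c", "node-d"]
ring = HashRing(nodes)
print(ring.owner("/data/events/2011-01-01.log"))
```

In a scheme like this, losing one node only changes ownership for the paths that node held; every other path keeps its owner, which is one reason fully symmetric designs avoid a single point of failure.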

Ease of Integration

Appistry CloudIQ Storage Hadoop Edition ships with HDFS drivers providing plug-and-play compatibility with Hadoop MapReduce, your existing Hadoop applications and a wide variety of 3rd party tools. Using CloudIQ Storage in place of HDFS requires only a few lines of XML configuration changes.
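
As a rough idea of what "a few lines of XML" can look like, the Python sketch below renders a Hadoop-style `<configuration>` fragment. Hadoop selects its default file system through the `fs.default.name` property and maps a URI scheme to a driver class through `fs.<scheme>.impl`. The `cloudiq` scheme, the `storage-host` address and the `com.example.cloudiq.CloudIQFileSystem` class name are placeholders invented for this example, not Appistry's documented values; consult the product documentation for the real ones.

```python
import xml.etree.ElementTree as ET


def hadoop_site_xml(properties):
    """Render a dict of property names/values as a Hadoop <configuration> document."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Placeholder values: the scheme, host and class name are assumptions.
overrides = {
    # Point Hadoop's default file system at the alternative store.
    "fs.default.name": "cloudiq://storage-host/",
    # Tell Hadoop which driver class implements that URI scheme.
    "fs.cloudiq.impl": "com.example.cloudiq.CloudIQFileSystem",
}

print(hadoop_site_xml(overrides))
```

The generated fragment would typically be merged into `core-site.xml`. Because MapReduce jobs reach storage through Hadoop's file system abstraction, application code does not need to change.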

Supported by Hadoop Ecosystem

CloudIQ Storage Hadoop Edition was designed to meet the unique needs of the Hadoop user community. Appistry has developed partnerships with leading Hadoop vendors, including Concurrent, Datameer and Kitenga, ensuring compatibility, simplified deployment and full support of these 3rd party tools. Without any changes to their applications, users of the Concurrent, Datameer and Kitenga products can choose Appistry CloudIQ Storage as a more robust file system for their enterprise-grade Hadoop deployments.

Each of these vendors’ products has been validated against CloudIQ Storage Hadoop Edition, and each vendor will work with Appistry to support joint customers using CloudIQ Storage in place of HDFS.

Summary of Benefits

  • Unmatched performance, reliability and availability
  • Decreased capital costs
  • Simplified deployment and configuration
  • No change to Hadoop applications
  • Reduced complexity
  • Increased application uptime
  • Accelerated time-to-market
  • Works with existing Hadoop tools