Cloudera vs Hortonworks vs MapR: Comparing Hadoop Distributions
For all those looking to harness the potential of big data, Hadoop is the
platform of choice. This open source software framework enables processing of
huge data sets by distributing them across commodity servers. It thus
eliminates the dependency on high-end hardware and makes the entire process
economical for businesses to implement. Most big data enterprises today use
Apache Hadoop in one way or another. To simplify working with Hadoop,
enterprise distributions from Cloudera, MapR and Hortonworks have sprung up.
In its original version, Hadoop was designed as a simple write-once storage
infrastructure. But it has evolved over the years to expand well beyond web
indexing. Based on Google’s MapReduce model, Hadoop is designed to store and
process large amounts and varieties of data that may reside on multiple
servers.
While Hadoop’s distributed file system (HDFS) breaks incoming data into blocks
and stores them across multiple nodes, the MapReduce component processes that
data in parallel across those nodes.
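The split between storage and processing described above can be sketched in a few lines of plain Python. This is only a toy, single-process illustration and does not use any Hadoop API: the function names (`split_into_blocks`, `map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the round-robin placement of lines onto "nodes" are assumptions made for the example, standing in for HDFS block placement and a MapReduce word-count job.

```python
from collections import defaultdict
from itertools import chain


def split_into_blocks(lines, num_nodes):
    """Spread input lines round-robin over nodes (a stand-in for HDFS block placement)."""
    nodes = [[] for _ in range(num_nodes)]
    for i, line in enumerate(lines):
        nodes[i % num_nodes].append(line)
    return nodes


def map_phase(block):
    """Emit a (word, 1) pair for every word in one node's block."""
    return [(word.lower(), 1) for line in block for word in line.split()]


def shuffle(mapped_outputs):
    """Group the emitted values by key across the map outputs of all nodes."""
    grouped = defaultdict(list)
    for key, value in chain.from_iterable(mapped_outputs):
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(lines, num_nodes=3):
    blocks = split_into_blocks(lines, num_nodes)
    # On a real cluster each block would be mapped in parallel on its own node;
    # here the map calls simply run one after another.
    mapped = [map_phase(block) for block in blocks]
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    sample = ["big data on Hadoop", "Hadoop stores big data", "data data data"]
    print(word_count(sample))
```

The result does not depend on how many nodes the lines are spread over, which is the property that lets the map step run on each node independently.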
Hadoop is by no means an out-of-the-box solution. To build a truly
information-driven enterprise, where decisions are based on data and not
guesswork, companies require a data management solution that not only offers
robust data governance, but is also easily manageable and integrates
seamlessly with existing enterprise infrastructure.
The flexible, modular architecture of Hadoop allows new functionality to be
added for diverse Big Data tasks. A number of vendors have taken advantage of
Hadoop’s open framework and modified its code to change or enhance its
functionality. In the process they have been able to fix some of the inherent
drawbacks of Apache Hadoop. As far as Hadoop distributions are concerned, the
three companies that really stand out in the competition are Cloudera, MapR
and Hortonworks.
Comparing top three Hadoop distributions: Cloudera vs Hortonworks vs MapR
Cloudera has been around the longest of the three; Hortonworks came later.
While Cloudera and Hortonworks build on open source Apache Hadoop, most
versions of MapR come with proprietary modules. Each distribution has its
unique strengths and weaknesses, and they share certain overlapping features
as well. If you are looking to make the most of Hadoop’s immense data
processing power, it makes sense to compare the top three Hadoop
distributions.
Cloudera
Cloudera Inc. was founded in 2008 by big data geniuses from Facebook, Google,
Oracle and Yahoo. It was the first company to develop and distribute Apache
Hadoop-based software and still has the largest user base, with the most
clients. Although the core of the distribution is based on Apache Hadoop, it
also provides a proprietary Cloudera Management Suite that automates the
installation process and offers other conveniences to users, such as reducing
deployment time and displaying a real-time node count.
Cloudera Overview
Hortonworks
Hortonworks, founded in 2011, has quickly emerged as one of the leading
vendors of Hadoop. Its distribution provides an open source platform based on
Apache Hadoop for analysing, storing and managing big data. Hortonworks is the
only commercial vendor to distribute complete open source Apache Hadoop
without additional proprietary software. Hortonworks’ distribution, HDP 2.0,
can be downloaded directly from its website free of cost and is easy to
install. The engineers of Hortonworks are behind most of Hadoop’s recent
innovations, including YARN, which separates cluster resource management from
MapReduce so that other data processing frameworks can run on the same
cluster.
Hortonworks Overview
MapR
In its standard, open source edition, Apache Hadoop comes with a number of
limitations. Vendor distributions aim to overcome the issues that users
typically encounter in the standard edition. Under the free Apache license,
all three distributions provide users with updates to the core Hadoop
software. But when it comes to picking one of them, one should look at the
additional value it provides to customers in terms of improving the
reliability of the system (detecting and fixing bugs, etc.), providing
technical assistance and expanding functionality.
All three top Hadoop distributions, Cloudera, MapR and Hortonworks, offer
consulting, training, and technical assistance. But unlike its two rivals,
Hortonworks’ distribution is claimed to be 100 percent open source. Cloudera
incorporates an array of proprietary elements in its Enterprise 4.0 version,
adding layers of administrative and management capabilities to the core
Hadoop software.
Going a step further, MapR replaces the HDFS component with its own
proprietary file system, called MapRFS. MapRFS helps incorporate
enterprise-grade features into Hadoop, enabling more efficient data
management, reliability and, most importantly, ease of use. In other words,
it is more production ready than its two competitors.
Through a recent partnership with Canonical, the creator of the Ubuntu
operating system, MapR is offering Hadoop as a default component of Ubuntu.
Under the terms of the partnership, MapR’s M3 Edition for Apache Hadoop will
be integrated into the Ubuntu operating system.
MapR is free up to its M3 edition, but the free version lacks some of its
proprietary features, namely JobTracker HA, NameNode HA, NFS-HA, Mirroring,
Snapshot and a few more.
MapR Overview
Cloudera and Hortonworks: The Similarities
Cloudera and Hortonworks are both built upon the same core of Apache Hadoop.
As such, they have more similarities than differences.
· Both offer enterprise-ready Hadoop distributions. The distributions have
stood the test of time and of customers, with a focus on security and
stability. Both vendors also provide paid training and services to
familiarize newcomers with Big Data and Analytics.
· Both have established communities that actively help with problems and
provide demonstrations where needed.
· Both distributions have a master-slave architecture.
· Both have a shared-nothing computing framework.
· Both support MapReduce as well as YARN.
Cloudera vs. Hortonworks: The Differences
That being said, it is the differences that play the deciding role in
choosing one vendor over the other. Broadly, Cloudera and Hortonworks differ
in the following aspects:
· Cloudera has announced that its long-term goal is to become an “enterprise
data hub,” thus diminishing the need for a data warehouse. Hortonworks, on the
other hand, remains firmly a provider of a Hadoop distro, and has partnered
with data warehousing company Teradata.
· While Cloudera CDH can be run on Windows Server, HDP is available as a
native component on Windows Server. A Windows-based Hadoop cluster can be
deployed on Windows Azure through the HDInsight service.
· Cloudera has proprietary management software, Cloudera Manager, as well as
its own SQL query engine, Impala, and Cloudera Search for easy, real-time
search. Hortonworks has no proprietary software; it uses Ambari for
management, Stinger for handling queries, and Apache Solr for searching data.
· Cloudera has a commercial license, while Hortonworks has an open source
license. Cloudera also allows the use of its open source projects free of
cost, but that package doesn’t include the management suite Cloudera Manager
or any other proprietary software.
· Cloudera has a free 60-day trial, while Hortonworks is completely free.
Cloudera is the oldest player in the market, with more than 350 customers.
But Hortonworks is fast catching up and has contributed more innovations to
the Hadoop ecosystem in the recent past. Cloudera overlays several pieces of
enterprise software on its open source distribution to aid customers, whereas
Hortonworks strives to provide a framework composed only of open source
projects.