Hadoop is Apache’s open-source data processing framework based on the MapReduce programming model. Written in Java, it uses a distributed file system on commodity hardware to store and process data. Over the last two years, it has gained popularity and acceptance among enterprises and is becoming the de facto framework for data processing and big data analytics.
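Hadoop runs across a cluster of networked machines and requires a paradigm shift in programming: a computation is expressed as map and reduce steps rather than as a single sequential program. Installing and managing Hadoop on a cluster is also challenging.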
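To make that paradigm shift concrete, here is a minimal sketch of the MapReduce idea as a word count in plain Python. It does not use Hadoop or any Hadoop API. The function names (map_phase, shuffle, reduce_phase, word_count) are illustrative assumptions, and everything runs in one process, whereas Hadoop would spread the map and reduce steps across the machines of a cluster.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, roughly what a framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values collected for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    # Map: each line is processed independently, so this step could run in parallel.
    pairs = [pair for line in lines for pair in map_phase(line)]
    # Shuffle + reduce: each key is reduced independently of the others.
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

Because each line is mapped on its own and each key is reduced on its own, the work can be split across many machines, which is the property a framework like Hadoop relies on.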
To reduce the complexity of installation, version management, and administration, many companies have been working on their own Hadoop distributions. Notable early entrants include Cloudera, Hortonworks, and MapR.
This week saw three major players announce their own Hadoop distributions.
1. Greenplum introduces Pivotal HD Hadoop distribution
It integrates Greenplum’s massively parallel processing (MPP) database technology with Hadoop.
2. HP partners with Cloudera, Hortonworks and MapR to provide integrated Hadoop distributions
HP has partnered with Cloudera, Hortonworks, and MapR to provide Hadoop distributions that ease the deployment and management of Hadoop on clusters.
3. Intel announces its own Hadoop distribution
This distribution is optimized for the Intel Xeon platform. Intel is trying to retain (and probably increase) its share of the data center market through this distribution.
With three big players entering the race, contributions to the open-source Hadoop project should increase, and more enterprises are likely to adopt Hadoop for their data analysis. The Hadoop distribution segment looks crowded with so many players. It reminds me of Linux, which has a number of distributions. We will have to wait and watch to see who emerges as the leader.
What are your thoughts?