摘要
收集cdh可以安装的软件列表。
HBase
Apache HBase provides random, real-time, read/write access to large data sets (requires HDFS and ZooKeeper).
HDFS
Apache Hadoop Distributed File System (HDFS) is the primary storage system used by Hadoop applications. HDFS creates multiple replicas of data blocks and distributes them on compute hosts throughout a cluster to enable reliable, extremely rapid computations.
Hive
Hive is a data warehouse system that offers a SQL-like language called HiveQL.
Hue
Hue is a graphical user interface to work with Cloudera’s Distribution Including Apache Hadoop (requires HDFS, MapReduce, and Hive).
Impala
Impala provides a real-time SQL query interface for data stored in HDFS and HBase. Impala requres Hive service and shares Hive Metastore with Hue.
Isilon
EMC Isilon is a distributed filesystem.
Key-Value Store Indexer
Key-Value Store Indexer listens for changes in data inside tables contained in HBase and indexes them using Solr.
MapReduce
Apache Hadoop MapReduce supports distributed computing on large data sets across your cluster (requires HDFS). YARN (MapReduce 2 Included) is recommended instead. MapReduce is included for backward compatibility.
Oozie
Oozie is a workflow coordination service to manage data processing jobs on your cluster.
Solr
Solr is a distributed service for indexing and searching data stored in HDFS.
Spark
Apache Spark is an open source cluster computing system. This service runs Spark as an application on YARN.
Sqoop 2
Sqoop is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases. The version supported by Cloudera Manager is Sqoop 2.
YARN (MR2 Included)
Apache Hadoop MapReduce 2.0 (MRv2), or YARN, is a data computation framework that supports MapReduce applications (requires HDFS).
ZooKeeper
Apache ZooKeeper is a centralized service for maintaining and synchronizing configuration data.