CDH 可以安装的软件列表

2018/04/03

摘要

收集cdh可以安装的软件列表。

HBase

Apache HBase provides random, real-time, read/write access to large data sets (requires HDFS and ZooKeeper).

HDFS

Apache Hadoop Distributed File System (HDFS) is the primary storage system used by Hadoop applications. HDFS creates multiple replicas of data blocks and distributes them on compute hosts throughout a cluster to enable reliable, extremely rapid computations.

Hive

Hive is a data warehouse system that offers a SQL-like language called HiveQL.

Hue

Hue is a graphical user interface to work with Cloudera’s Distribution Including Apache Hadoop (requires HDFS, MapReduce, and Hive).

Impala

Impala provides a real-time SQL query interface for data stored in HDFS and HBase. Impala requres Hive service and shares Hive Metastore with Hue.

Isilon

EMC Isilon is a distributed filesystem.

Key-Value Store Indexer

Key-Value Store Indexer listens for changes in data inside tables contained in HBase and indexes them using Solr.

MapReduce

Apache Hadoop MapReduce supports distributed computing on large data sets across your cluster (requires HDFS). YARN (MapReduce 2 Included) is recommended instead. MapReduce is included for backward compatibility.

Oozie

Oozie is a workflow coordination service to manage data processing jobs on your cluster.

Solr

Solr is a distributed service for indexing and searching data stored in HDFS.

Spark

Apache Spark is an open source cluster computing system. This service runs Spark as an application on YARN.

Sqoop 2

Sqoop is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases. The version supported by Cloudera Manager is Sqoop 2.

YARN (MR2 Included)

Apache Hadoop MapReduce 2.0 (MRv2), or YARN, is a data computation framework that supports MapReduce applications (requires HDFS).

ZooKeeper

Apache ZooKeeper is a centralized service for maintaining and synchronizing configuration data.

目录