Explain the different core modules of Hadoop and give examples of Hadoop-related software.
The core modules of Hadoop include:
- Hadoop Common: The common utilities that support the other Hadoop modules.
- Hadoop Distributed File System (HDFS): A distributed file system that provides high-throughput access to application data.
- Hadoop Yet Another Resource Negotiator (YARN): A framework for job scheduling and cluster resource management.
- Hadoop MapReduce: A YARN-based system for parallel processing of large data sets.
- Hadoop Ozone: An object store for Hadoop.
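To make the MapReduce idea more concrete, here is a minimal sketch of its processing model as a word count. It runs in plain Python on a single machine and does not use any Hadoop API; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are made up for illustration. On a real cluster, the map and reduce steps would run in parallel on many nodes and the framework would handle the grouping step.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by key (hypothetical helper)."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: combine all values of one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce one after another on in-memory lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["hadoop stores data in hdfs", "yarn schedules hadoop jobs"]
    print(word_count(sample))
```

The point of splitting the work this way is that each map call and each reduce call only looks at its own piece of the data, which is what allows the work to be spread across a cluster.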
Examples of Popular Hadoop-related Software
Popular Hadoop packages that are not strictly a part of the core Hadoop modules, but that are frequently used in conjunction with them, include:
- Apache Hive is data warehouse software that runs on Hadoop and enables users to work with data in HDFS using a SQL-like query language called HiveQL.
- Apache Impala is the open-source, native analytic database for Apache Hadoop.
- Apache Pig is a tool that is generally used with Hadoop as an abstraction over MapReduce to analyze large sets of data represented as data flows. Pig enables operations like join, filter, sort, load, etc.
- Apache ZooKeeper is a centralized service for enabling highly reliable distributed coordination.
- Apache Sqoop is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases.
- Apache Oozie is a workflow scheduler system to manage Apache Hadoop jobs. Oozie Workflow jobs are Directed Acyclic Graphs (DAGs) of actions.
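The idea of a workflow as a directed acyclic graph of actions can be sketched in a few lines of Python. This is only an illustration of the concept, not Oozie's API: real Oozie workflows are defined in XML, and the action names and the `execution_order` helper below are hypothetical. The sketch uses the standard-library `graphlib` module (Python 3.9 or later) to find an order in which every action runs after the actions it depends on.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set

# Each hypothetical action maps to the set of actions that must finish first.
workflow: Dict[str, Set[str]] = {
    "import_orders": set(),
    "clean_orders": {"import_orders"},
    "sales_by_region": {"clean_orders"},
    "sales_by_product": {"clean_orders"},
    "export_report": {"sales_by_region", "sales_by_product"},
}


def execution_order(dag: Dict[str, Set[str]]) -> List[str]:
    """Return one valid run order; raises CycleError if the graph has a cycle."""
    return list(TopologicalSorter(dag).static_order())


if __name__ == "__main__":
    print(execution_order(workflow))

    # A graph with a cycle is not a DAG, so no valid order exists.
    try:
        execution_order({"a": {"b"}, "b": {"a"}})
    except CycleError:
        print("cycle detected: not a valid workflow")
```

Because `sales_by_region` and `sales_by_product` do not depend on each other, more than one order is valid, and a scheduler is free to run such independent actions at the same time.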