Explain HDFS (Hadoop Distributed File System) architecture and its goals.
HDFS Architecture
The architecture of the Hadoop Distributed File System is shown in Figure 7.2. The components of the architecture are described below:
Namenode
The namenode is a piece of commodity hardware that runs the GNU/Linux operating system and the namenode software; the software itself is designed to run on inexpensive, off-the-shelf machines. The system hosting the namenode acts as the master server and performs the following tasks:
• Manages the file system namespace.
• Regulates clients' access to files.
• Executes file system operations such as opening, closing, and renaming files and directories (see the sketch after this list).
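To make these namespace operations concrete, here is a minimal sketch using Hadoop's Java FileSystem API; every call below is a metadata operation resolved by the namenode before any file data moves. The address hdfs://localhost:9000 and the paths are placeholder assumptions, not part of the original text.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class NamenodeOps {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://localhost:9000"); // placeholder namenode address
        FileSystem fs = FileSystem.get(conf);

        // Namespace operations: all metadata changes go through the namenode.
        fs.mkdirs(new Path("/user/demo"));                            // create a directory
        fs.rename(new Path("/user/demo"), new Path("/user/renamed")); // rename it
        fs.delete(new Path("/user/renamed"), true);                   // delete it recursively

        fs.close();
    }
}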
Datanode
The datanode is a piece of commodity hardware that runs the GNU/Linux operating system and the datanode software. Every node (commodity machine/system) in a cluster runs a datanode, and these nodes manage the data stored on their machine.
Datanodes perform read and write operations on the file system, as per the client's request.
They also perform operations such as block creation, deletion, and replication according to the instructions of the namenode.
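As an illustration of this read/write path, the sketch below (with the same placeholder address as above) writes a small file and reads it back: the namenode only tells the client which datanodes hold each block, while the bytes themselves stream to and from the datanodes directly.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import org.apache.hadoop.io.IOUtils;
import java.nio.charset.StandardCharsets;

public class DatanodeIo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://localhost:9000"); // placeholder address
        FileSystem fs = FileSystem.get(conf);
        Path file = new Path("/user/demo/sample.txt");

        // Write: blocks are streamed to datanodes chosen by the namenode.
        try (FSDataOutputStream out = fs.create(file)) {
            out.write("hello hdfs".getBytes(StandardCharsets.UTF_8));
        }

        // Read: the client fetches each block from a datanode holding a replica.
        try (FSDataInputStream in = fs.open(file)) {
            IOUtils.copyBytes(in, System.out, 4096, false);
        }
        fs.close();
    }
}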
Block
In general, user data is stored in HDFS files. A file is partitioned into one or more segments, which are stored on separate datanodes. These file segments are called blocks. In other words, a block is the smallest amount of data that HDFS can read or write. The default block size is 64 MB in early Hadoop releases (128 MB from Hadoop 2.x onward), although it may be adjusted in the HDFS configuration as needed.
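As a worked example, with a 64 MB block size a 200 MB file is split into four blocks (64 + 64 + 64 + 8 MB). The block size can also be overridden per file when it is created; the sketch below (same placeholder address as before) does so via the long form of FileSystem.create. The file name and sizes are illustrative assumptions.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;

public class BlockSizeDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://localhost:9000"); // placeholder address
        FileSystem fs = FileSystem.get(conf);

        long blockSize = 128L * 1024 * 1024; // 128 MB per block for this file
        short replication = 3;               // copies of each block
        // This overload of create() lets the client override the block size per file.
        try (FSDataOutputStream out = fs.create(
                new Path("/user/demo/big.dat"), true, 4096, replication, blockSize)) {
            out.write(new byte[1024]); // 1 KB of data still occupies one (partially filled) block
        }
        fs.close();
    }
}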
Goals of HDFS
Fault detection and recovery: Because HDFS runs on a large number of commodity hardware components, component failure is the norm rather than the exception. HDFS should therefore have mechanisms for quick, automatic fault detection and recovery.
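The main recovery mechanism is block replication: each block is stored on several datanodes, and when a datanode stops sending heartbeats the namenode re-replicates its blocks elsewhere. A minimal sketch, assuming the same placeholder cluster and a hypothetical existing file, of inspecting and raising a file's replication factor:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;

public class ReplicationDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://localhost:9000"); // placeholder address
        FileSystem fs = FileSystem.get(conf);
        Path file = new Path("/user/demo/sample.txt");     // hypothetical existing file

        short current = fs.getFileStatus(file).getReplication();
        System.out.println("current replication: " + current);

        // Ask the namenode to keep four copies of every block of this file;
        // the extra copies are scheduled on other datanodes asynchronously.
        fs.setReplication(file, (short) 4);
        fs.close();
    }
}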
Huge datasets: To serve applications with huge datasets, HDFS should scale to hundreds of nodes per cluster.
Hardware at data: A requested job completes faster when the computation takes place near the data it operates on. Moving computation to the data decreases network traffic and increases throughput, especially when large datasets are involved.
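Frameworks such as MapReduce exploit this by asking the namenode where each block lives and scheduling tasks on those machines. A small sketch, under the same placeholder assumptions as above, that lists the hosts storing each block of a file:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;

public class BlockLocationsDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://localhost:9000"); // placeholder address
        FileSystem fs = FileSystem.get(conf);
        FileStatus status = fs.getFileStatus(new Path("/user/demo/big.dat")); // hypothetical file

        // One BlockLocation per block: its offset in the file and the datanodes holding replicas.
        for (BlockLocation loc : fs.getFileBlockLocations(status, 0, status.getLen())) {
            System.out.printf("offset=%d hosts=%s%n",
                    loc.getOffset(), String.join(",", loc.getHosts()));
        }
        fs.close();
    }
}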