Explain HDFS (Hadoop Distributed File System) architecture and its goals.

 HDFS Architecture

The architecture of the Hadoop Distributed File System is shown in Figure 7.2. The components of this architecture are described below:



Namenode

The namenode is commodity hardware that runs the GNU/Linux operating system and the namenode software. The system hosting the namenode acts as the master server and performs the following tasks:

• Manages the file system namespace.

• Regulates clients' access to files.

• Executes file system operations such as renaming, closing, and opening files and directories (a client-side sketch of these operations follows this list).
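For illustration, here is a minimal Java sketch of these namespace operations using Hadoop's FileSystem API. The namenode address (hdfs://namenode:9000) and the paths are hypothetical placeholders; every call below is a metadata operation that the namenode serves without moving any file data.

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class NamenodeOpsSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical namenode address; replace with your cluster's fs.defaultFS value.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:9000"), conf);

        // Namespace operations: the namenode records these in its metadata;
        // no file data is transferred for any of them.
        fs.mkdirs(new Path("/user/demo/reports"));          // create a directory
        fs.rename(new Path("/user/demo/reports"),           // rename (move) it
                  new Path("/user/demo/archive"));

        for (FileStatus status : fs.listStatus(new Path("/user/demo"))) {
            System.out.println(status.getPath());           // list entries in the namespace
        }

        fs.delete(new Path("/user/demo/archive"), true);    // remove it again (recursively)
        fs.close();
    }
}
```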


Datanode

The datanode is commodity hardware that runs the GNU/Linux operating system and the datanode software. A datanode exists for every node (commodity hardware/system) in the cluster, and these nodes are responsible for storing the data on their system.

Datanodes perform read and write operations on the file system, as requested by clients.

They also perform operations such as block creation, deletion, and replication according to the instructions of the namenode.
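The following Java sketch (hypothetical namenode address and path) shows a client writing and then reading a small file. The namenode only decides which datanodes should hold the blocks; the bytes themselves stream directly between the client and the datanodes.

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class DatanodeIoSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:9000"), conf); // hypothetical address
        Path file = new Path("/user/demo/sample.txt");                            // made-up path

        // Write: the namenode chooses the target datanodes; the bytes stream to them directly.
        try (FSDataOutputStream out = fs.create(file)) {
            out.writeBytes("hello hdfs\n");
        }

        // Read: the namenode returns block locations; the data comes back from the datanodes.
        try (FSDataInputStream in = fs.open(file)) {
            IOUtils.copyBytes(in, System.out, 4096, false);
        }
        fs.close();
    }
}
```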


Block

In general, user data is stored in HDFS files. Within the file system, a file is partitioned into one or more segments, which may be stored on separate datanodes. These file segments are called blocks. In other words, a block is the smallest amount of data that HDFS can read or write. The default block size is 64 MB (128 MB in Hadoop 2.x and later), although it can be adjusted in the HDFS configuration as needed.
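As a rough illustration, the Java sketch below (hypothetical namenode address, path, and sizes) creates a file with an explicit block size and then asks the namenode for the block layout, i.e. which datanodes hold each block of the file.

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:9000"), conf); // hypothetical address
        Path file = new Path("/user/demo/big.dat");                               // made-up path

        // Create the file with an explicit 128 MB block size
        // (arguments: overwrite, buffer size, replication factor, block size).
        try (FSDataOutputStream out = fs.create(file, true, 4096, (short) 3, 128L * 1024 * 1024)) {
            out.write(new byte[8 * 1024 * 1024]);  // 8 MB of zeros, just to have some data
        }

        // Ask for the block layout: each entry is one block and the datanodes holding its replicas.
        FileStatus status = fs.getFileStatus(file);
        for (BlockLocation block : fs.getFileBlockLocations(status, 0, status.getLen())) {
            System.out.printf("offset=%d length=%d hosts=%s%n",
                    block.getOffset(), block.getLength(), String.join(",", block.getHosts()));
        }
        fs.close();
    }
}
```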


Goals of HDFS

Fault detection and recovery: Because HDFS relies on a large number of commodity hardware components, component failure is common. HDFS should therefore have mechanisms for quick, automatic fault detection and recovery; block replication is the main one (a replication sketch follows these goals).

Huge datasets: HDFS should scale to hundreds of nodes per cluster in order to manage applications with huge datasets.

Hardware at data: A requested job completes more quickly when the computation runs close to the data it needs. This reduces network traffic and increases throughput, especially when huge datasets are involved.
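To illustrate the fault-tolerance goal, here is a small Java sketch (hypothetical namenode address and path) that inspects a file's replication factor and raises it. The namenode then instructs datanodes to copy blocks until each block has the requested number of replicas, so the file remains readable even if individual datanodes fail.

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:9000"), conf); // hypothetical address
        Path file = new Path("/user/demo/sample.txt");                            // made-up path

        FileStatus status = fs.getFileStatus(file);
        System.out.println("current replication: " + status.getReplication());

        // Ask the namenode to keep more replicas of each block; it instructs datanodes
        // to copy blocks until the new factor is met, so the file survives node failures.
        fs.setReplication(file, (short) 3);
        fs.close();
    }
}
```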
