Lady who codes: HDFS Architecture

Tuesday, February 2, 2016

HDFS Architecture

Individual machines are known as nodes
A cluster can have as few as one node or as many as several thousand
Two types of nodes: NameNode and DataNode
Adding nodes generally increases both storage capacity and aggregate throughput
HDFS is a filesystem written in Java
The NameNode daemon must be running at all times; if the NameNode stops, the cluster becomes inaccessible
The NameNode holds all of its metadata in RAM for fast access
A separate daemon known as the Secondary NameNode takes care of some housekeeping tasks for the NameNode
Files are split into blocks, typically 64MB or 128MB each
Blocks are stored as standard files on the DataNodes, in a set of directories specified in Hadoop’s configuration files
Without the metadata on the NameNode, there is no way to access the files in the HDFS cluster
When a client application wants to read a file:
It communicates with the NameNode to determine which blocks make up the file, and which DataNodes those blocks reside on
It then communicates directly with the DataNodes to read the data
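To make the block splitting concrete, here is a small arithmetic sketch. It needs no Hadoop installation and only models the counting, assuming a 128MB block size and that the final block holds just the data left over. The function names block_count and last_block_mb are made up for this illustration.

```shell
#!/usr/bin/env bash
# Toy arithmetic only: nothing here talks to a real cluster.
# block_count and last_block_mb are hypothetical helper names.

# Number of blocks needed for a file of SIZE_MB with BLOCK_MB blocks
# (ceiling division using integer arithmetic).
block_count() {
  local size_mb="$1" block_mb="$2"
  echo $(( (size_mb + block_mb - 1) / block_mb ))
}

# Size in MB of the final block of a non-empty file,
# which may be smaller than BLOCK_MB.
last_block_mb() {
  local size_mb="$1" block_mb="$2"
  local rem=$(( size_mb % block_mb ))
  if [ "$rem" -eq 0 ]; then
    echo "$block_mb"
  else
    echo "$rem"
  fi
}

block_count 300 128    # prints 3
last_block_mb 300 128  # prints 44
```

On a real cluster, the block list and DataNode locations that the NameNode keeps for a file can be inspected with hdfs fsck <path> -files -blocks -locations.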

Access to HDFS from the command line is achieved with the Hadoop shell
Typical commands:
hadoop fs -put foo.txt /tmp/foo.txt
hadoop fs -ls /users/training/f*
hadoop fs -cat foo.txt
hadoop fs -copyToLocal ~/foo.txt .
hadoop fs -mkdir /ttt
hadoop fs -rm -R /ttt
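The commands above can be chained into a simple round trip: upload a file, list and print it, copy it back, then clean up. This is only a sketch. The function name hdfs_round_trip and the HADOOP_CMD variable are my own inventions for illustration; setting HADOOP_CMD to echo gives a dry run that prints each command instead of contacting a cluster.

```shell
#!/usr/bin/env bash
# HADOOP_CMD is a hypothetical override, not a Hadoop setting.
# Leave it unset to call the real "hadoop" binary, or set it to "echo" for a dry run.
HADOOP_CMD="${HADOOP_CMD:-hadoop}"

# Upload SRC into the HDFS directory DIR, read it back, then remove DIR.
# Each step runs only if the previous one succeeded.
hdfs_round_trip() {
  local src="$1" dir="$2"
  local name
  name="$(basename "$src")"
  "$HADOOP_CMD" fs -mkdir -p "$dir" &&                          # create the target directory
  "$HADOOP_CMD" fs -put "$src" "$dir/$name" &&                  # local file -> HDFS
  "$HADOOP_CMD" fs -ls "$dir" &&                                # list the directory
  "$HADOOP_CMD" fs -cat "$dir/$name" &&                         # print the file contents
  "$HADOOP_CMD" fs -copyToLocal "$dir/$name" "./$name.copy" &&  # HDFS -> local copy
  "$HADOOP_CMD" fs -rm -r "$dir"                                # delete the directory and its contents
}
```

For example, after running HADOOP_CMD=echo, the call hdfs_round_trip foo.txt /tmp/demo prints the six commands it would run. Note that the last step deletes the whole directory, so point it at a scratch path.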

Follow the general CLI manual
Follow the file manipulation CLI manual

The DFSAdmin command set is used for administering an HDFS cluster. These are commands that are used only by an HDFS administrator:
hadoop dfsadmin -help
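Two DFSAdmin subcommands that are handy for a quick health check are -report (capacity and DataNode status) and -safemode get (whether the NameNode is in safe mode). A minimal sketch follows, assuming Hadoop 2 or later, where the tool is usually invoked as hdfs dfsadmin. The dfs_status function and the HDFS_CMD variable are hypothetical names used only here; HDFS_CMD=echo gives a dry run.

```shell
#!/usr/bin/env bash
# HDFS_CMD is a hypothetical override, not a Hadoop setting.
# Leave it unset to call the real "hdfs" binary, or set it to "echo" for a dry run.
HDFS_CMD="${HDFS_CMD:-hdfs}"

# Print a cluster summary, then the current safe mode state.
dfs_status() {
  "$HDFS_CMD" dfsadmin -report &&
  "$HDFS_CMD" dfsadmin -safemode get
}
```

Running dfs_status against a live cluster normally requires HDFS administrator privileges.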
