Tuesday, February 2, 2016

HDFS Architecture


  • Individual machines are known as nodes 
  • A cluster can have as few as one node or as many as several thousand
  • Two types of nodes: NameNode and DataNode 
  • Adding nodes generally improves performance, since data and work are spread across more machines
  • HDFS is a filesystem written in Java 
  • The NameNode daemon must be running at all times. If the NameNode stops, the cluster becomes inaccessible
  • The NameNode holds all of its metadata in RAM for fast access 
  • A separate daemon known as the Secondary NameNode takes care of some housekeeping tasks for the NameNode
  • Files are split into blocks of 64 MB or 128 MB, depending on the configured block size
  • Blocks are stored as standard files on the DataNodes, in a set of directories specified in Hadoop’s configuration files 
  • Without the metadata on the NameNode, there is no way to access the files in the HDFS cluster
  • When a client application wants to read a file, it first asks the NameNode which blocks make up the file and which DataNodes hold those blocks. It then reads the data directly from those DataNodes
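
To make the block splitting concrete, here is a small sketch of the arithmetic: how many blocks a file of a given size is divided into. The function name hdfs_block_count is made up for this post and is not part of Hadoop, and the 128 MB default is an assumption; the real value comes from the cluster's configuration.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hadoop): number of blocks for a file.
# Usage: hdfs_block_count FILE_SIZE_MB [BLOCK_SIZE_MB]
hdfs_block_count() {
    size_mb=$1
    block_mb=${2:-128}   # assumed default block size; older clusters use 64
    # Ceiling division: the last block may be only partly filled.
    echo $(( (size_mb + block_mb - 1) / block_mb ))
}

hdfs_block_count 300       # prints 3 (128 + 128 + 44 MB)
hdfs_block_count 300 64    # prints 5 (four full blocks + 44 MB)
hdfs_block_count 1         # prints 1 (a small file still needs one block)
```

Each of those blocks is one more entry the NameNode has to track in RAM. On a live cluster, hdfs fsck <path> -files -blocks -locations should report the actual blocks of a file and the DataNodes that hold them.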
Access to HDFS from the command line is achieved with the Hadoop shell
Typical commands:
hadoop fs -put foo.txt /tmp/foo.txt
hadoop fs -ls /users/training/f*
hadoop fs -cat foo.txt
hadoop fs -copyToLocal foo.txt .
hadoop fs -mkdir /ttt
hadoop fs -rm -R /ttt
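
The commands above can be chained into a simple round trip: create a directory, upload a file, list and print it, copy it back, then clean up. This is only a sketch. The run helper, the DRY_RUN variable and the hdfs_round_trip function are my own illustrative names, not part of Hadoop, and by default the script only prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch: upload a file, inspect it, fetch it back, clean up.
# DRY_RUN, run and hdfs_round_trip are illustrative, not part of Hadoop.
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = really run them

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

hdfs_round_trip() {
    local_file=$1
    remote_dir=$2
    name=$(basename "$local_file")
    # Stop at the first command that fails.
    run hadoop fs -mkdir "$remote_dir" &&
    run hadoop fs -put "$local_file" "$remote_dir/$name" &&
    run hadoop fs -ls "$remote_dir" &&
    run hadoop fs -cat "$remote_dir/$name" &&
    run hadoop fs -copyToLocal "$remote_dir/$name" "$name.copy" &&
    run hadoop fs -rm -R "$remote_dir"
}

hdfs_round_trip foo.txt /ttt
```

Run it with DRY_RUN=0 on a machine that has a configured Hadoop client to execute the commands for real.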

See the general CLI manual
See the file manipulation CLI manual

The DFSAdmin command set is used for administering an HDFS cluster. These are commands that are used only by an HDFS administrator:
hadoop dfsadmin -help
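
Two DFSAdmin subcommands that are handy for a quick health check are -report (capacity and DataNode status) and -safemode get (whether the NameNode is in safe mode). Below is a minimal sketch that wraps them; cluster_status is a name I made up, and the script assumes the hadoop client is on the PATH. Newer Hadoop releases prefer the hdfs dfsadmin form of the same commands.

```shell
#!/bin/sh
# Sketch: quick cluster health check for an HDFS administrator.
# cluster_status is an illustrative name, not part of Hadoop.
cluster_status() {
    if ! command -v hadoop >/dev/null 2>&1; then
        echo "hadoop not found on PATH" >&2
        return 127
    fi
    # Capacity, remaining space and per-DataNode status.
    hadoop dfsadmin -report &&
    # Reports whether the NameNode is currently in safe mode.
    hadoop dfsadmin -safemode get
}

# Usage (needs a configured Hadoop client and admin rights):
#   cluster_status
```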
