Namenode High Availability Client Request

Can someone tell me: if I use a Java application to perform file upload/download operations in HDFS on a cluster with NameNode HA configured, where does the request go first? I mean, how does the client know which NameNode is active?

It would be great if you could provide a diagram, such as a workflow, that explains the request steps in detail (from start to finish).

hadoop hdfs hadoop2 webhdfs
2 answers

If the Hadoop cluster is configured with HA, then it will have the NameNode identifiers in hdfs-site.xml, as follows:

    <property>
        <name>dfs.ha.namenodes.mycluster</name>
        <value>namenode1,namenode2</value>
    </property>

Whichever NameNode comes up and transitions to active first serves as the active NameNode. You can start the cluster in a specific order so that your preferred node starts first.

If you want to determine the current state of a NameNode, you can use the getServiceState command:

    hdfs haadmin -getServiceState <namenode-id>
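For example, with the identifiers configured above, each NameNode reports its current role; the output shown here is illustrative:

    $ hdfs haadmin -getServiceState namenode1
    active
    $ hdfs haadmin -getServiceState namenode2
    standby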

When writing a driver class, you need to set the following properties on the configuration object:

    import java.net.URI;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsHaCopy {
        public static void main(String[] args) throws Exception {
            if (args.length != 2) {
                System.out.println("Usage: pgm <hdfs:///path/to/copy> </local/path/to/copy/from>");
                System.exit(1);
            }
            // Start from an empty configuration and set the HA properties explicitly
            Configuration conf = new Configuration(false);
            conf.set("fs.defaultFS", "hdfs://nameservice1");
            conf.set("fs.default.name", conf.get("fs.defaultFS"));
            conf.set("dfs.nameservices", "nameservice1");
            conf.set("dfs.ha.namenodes.nameservice1", "namenode1,namenode2");
            conf.set("dfs.namenode.rpc-address.nameservice1.namenode1", "hadoopnamenode01:8020");
            conf.set("dfs.namenode.rpc-address.nameservice1.namenode2", "hadoopnamenode02:8020");
            conf.set("dfs.client.failover.proxy.provider.nameservice1",
                    "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");

            FileSystem fs = FileSystem.get(URI.create(args[0]), conf);
            Path srcPath = new Path(args[1]);
            Path dstPath = new Path(args[0]);
            // In case the same file exists on the remote location, it will be overwritten
            fs.copyFromLocalFile(false, true, srcPath, dstPath);
        }
    }

The request goes to nameservice1 and is then handled by the Hadoop cluster according to each NameNode's state (active/standby).
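A hedged alternative sketch: if the cluster's core-site.xml and hdfs-site.xml are already on the application's classpath, the same HA properties are picked up automatically, so none of them need to be hardcoded. The class name and argument layout below are illustrative:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsHaCopyFromClasspath {
        public static void main(String[] args) throws Exception {
            // new Configuration() loads core-site.xml (and, once the HDFS client
            // classes initialize, hdfs-site.xml) from the classpath, including the
            // nameservice and failover proxy provider settings
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf); // resolves fs.defaultFS, e.g. hdfs://nameservice1
            // args[0] = HDFS destination path, args[1] = local source path (as in the driver above)
            fs.copyFromLocalFile(false, true, new Path(args[1]), new Path(args[0]));
        }
    }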

See HDFS High Availability for more details.


Please check out the NameNode HA architecture below, showing the key components involved in processing HDFS client requests.

[Diagram: HDFS NameNode HA architecture]

Where does this request go first? I mean, how does the client know which NameNode is active?

It does not matter to the client/driver which NameNode is active, because we address HDFS with the nameservice ID, not the hostname of a particular NameNode. The HDFS client automatically directs requests to the active NameNode.

Example: hdfs://nameservice_id/rest/of/the/hdfs/path

Explanation:

How does hdfs://nameservice_id/ work, and what configuration is behind it?

In the hdfs-site.xml file:

Create a nameservice by adding an ID to it (here the nameservice_id is mycluster):

    <property>
        <name>dfs.nameservices</name>
        <value>mycluster</value>
        <description>Logical name for this new nameservice</description>
    </property>

Now provide the NameNode identifiers that identify the NameNodes in the cluster:

dfs.ha.namenodes.[$nameservice ID]:

    <property>
        <name>dfs.ha.namenodes.mycluster</name>
        <value>nn1,nn2</value>
        <description>Unique identifiers for each NameNode in the nameservice</description>
    </property>

Then map each NameNode identifier to its host machine:

dfs.namenode.rpc-address.[$nameservice ID].[$namenode ID]

    <property>
        <name>dfs.namenode.rpc-address.mycluster.nn1</name>
        <value>machine1.example.com:8020</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.mycluster.nn2</name>
        <value>machine2.example.com:8020</value>
    </property>

After that, specify the Java class that HDFS clients use to contact the active NameNode, so that the DFS client can determine which NameNode is currently serving client requests. ConfiguredFailoverProxyProvider tries the configured NameNodes in turn until it reaches the active one:

    <property>
        <name>dfs.client.failover.proxy.provider.mycluster</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>

Finally, after these configuration changes, the HDFS URL looks like this:

hdfs://mycluster/<file_location_in_hdfs>
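To illustrate, here is a minimal sketch of reading a file through the logical URI. The file path is hypothetical, and it is assumed that the mycluster settings from the hdfs-site.xml above are visible to the client:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URI;
    import java.nio.charset.StandardCharsets;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ReadThroughNameservice {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration(); // picks up hdfs-site.xml from the classpath
            // The URI names the nameservice, not a specific NameNode host;
            // the failover proxy provider locates the active NameNode
            FileSystem fs = FileSystem.get(URI.create("hdfs://mycluster"), conf);
            try (FSDataInputStream in = fs.open(new Path("/user/example/data.txt"))) {
                BufferedReader reader = new BufferedReader(
                        new InputStreamReader(in, StandardCharsets.UTF_8));
                System.out.println(reader.readLine());
            }
        }
    }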

To answer your question, I covered only a few of the configurations. Please check the detailed documentation on how NameNodes, JournalNodes, and ZooKeeper machines form NameNode HA in HDFS.

