GitHub Project: example-talend-read-files-with-hdfs
Preamble
Configuration: Context Group
To create the different jobs described in this article, you first have to create a context group in the Repository containing your Saagie platform information (values).
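The exact variable names depend on your own context group, but the jobs below assume variables along the lines of IP_HDFS (NameNode host or IP), Port_HDFS (NameNode RPC port) and an HDFS user name; these are the context variables referenced in the component settings that follow.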
Copy a file from HDFS to the local machine
- Create a new job
- Add the component tHDFSConnection: creates the HDFS connection.
- Add the component tHDFSGet: copies the HDFS file to the local directory.
- Create links:
- tHDFSConnection is connected to tHDFSGet (through an "OnSubjobOk" trigger)
- Double-click on tHDFSConnection and set its properties:
- Add a "Cloudera" distribution and select the latest version of Cloudera
- Enter the NameNode URL.
The URL must follow this format: hdfs://ip_hdfs:port_hdfs
Use context variables whenever possible: "hdfs://"+context.IP_HDFS+":"+context.Port_HDFS+"/"
- Add the user
- Add the following Hadoop properties (see the sketch after this list for the equivalent client-side configuration):
- "dfs.nameservices" = "cluster"
- "dfs.ha.namenodes.cluster" = "nn1,nn2"
- "dfs.client.failover.proxy.provider.cluster" = "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
- "dfs.namenode.rpc-address.cluster.nn1" = "nn1:8020"
- "dfs.namenode.rpc-address.cluster.nn2" = "nn2:8020"
- Untick Use Datanode Hostname
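For reference, here is a minimal sketch of what the tHDFSConnection settings above amount to when written directly against the Hadoop client API in Java (the language Talend generates). It assumes the hadoop-client library is on the classpath; the nn1/nn2 host names, the 8020 port and the "hdfs" user are placeholders taken from the property values listed above, to be replaced by your own context values.

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class HdfsConnectionSketch {
    public static FileSystem connect() throws Exception {
        Configuration conf = new Configuration();
        // Same HA properties as in the Hadoop properties list above
        conf.set("dfs.nameservices", "cluster");
        conf.set("dfs.ha.namenodes.cluster", "nn1,nn2");
        conf.set("dfs.client.failover.proxy.provider.cluster",
                 "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");
        conf.set("dfs.namenode.rpc-address.cluster.nn1", "nn1:8020");
        conf.set("dfs.namenode.rpc-address.cluster.nn2", "nn2:8020");
        // Equivalent of leaving "Use Datanode Hostname" unticked
        conf.set("dfs.client.use.datanode.hostname", "false");

        // NameNode URI built like the context expression shown earlier:
        // "hdfs://" + context.IP_HDFS + ":" + context.Port_HDFS + "/"
        String ipHdfs = "nn1";      // placeholder for context.IP_HDFS
        String portHdfs = "8020";   // placeholder for context.Port_HDFS
        String nameNodeUri = "hdfs://" + ipHdfs + ":" + portHdfs + "/";

        // "hdfs" stands in for the user set in the component
        return FileSystem.get(new URI(nameNodeUri), conf, "hdfs");
    }
}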
- Double-click on the component tHDFSGet and set its properties:
- Tick Use an existing connection
- Add an HDFS folder
- Add a local folder
- Add a mask and set a new file name if needed
- Run the job
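Once configured, the whole job boils down to a single HDFS client call. The sketch below shows a plain-Java equivalent of tHDFSGet, with made-up paths standing in for the HDFS folder, mask and local folder configured above.

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsGetSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();   // add the HA properties from the previous sketch if needed
        FileSystem fs = FileSystem.get(new URI("hdfs://nn1:8020/"), conf, "hdfs");

        // tHDFSGet: copy the HDFS file into the local folder, with an optional new name
        fs.copyToLocalFile(false,                                    // false = keep the source file on HDFS
                           new Path("/user/hdfs/data/input.csv"),    // HDFS folder + file matching the mask
                           new Path("/tmp/local-folder/input.csv")); // local folder + new file name

        fs.close();
    }
}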