Preamble
Configuration: Context Group
This tutorial can be found on GitHub: example-talend-write-files-with-HDFS
To reproduce the jobs shown below, create a context group with the following variables (default values are optional). The jobs in this tutorial reference at least IP_HDFS (the NameNode host or IP address) and Port_HDFS (the NameNode port, typically 8020).
Write a file to HDFS
- Create a new job
- Add the tHDFSConnection component: opens a connection to HDFS.
- Add the tFileInputDelimited component: reads a delimited file from your local machine.
- Add the tHDFSOutput component: writes data to HDFS.
- Create the links (their semantics are sketched in the code below):
  - Connect tHDFSConnection to tFileInputDelimited through an "OnSubjobOk" trigger, so the read starts only once the connection subjob has succeeded.
  - Connect tFileInputDelimited to tHDFSOutput through a "Main" row connection, which carries the data rows.
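If the two link types are new to you, here is a minimal, purely illustrative Java sketch of their semantics. The three methods are hypothetical stand-ins for the components, not Talend-generated code:

```java
import java.util.List;

public class TriggerSemanticsSketch {
    // Hypothetical stand-ins for the three components, used only to show the flow.
    static void openHdfsConnection() { /* tHDFSConnection subjob */ }
    static List<String> readLocalRows() { return List.of("1;Alice", "2;Bob"); }
    static void writeRowToHdfs(String row) { System.out.println("-> " + row); }

    public static void main(String[] args) {
        openHdfsConnection();
        // "OnSubjobOk": the next subjob starts only if the previous one
        // succeeded, i.e. openHdfsConnection() completed without error.
        for (String row : readLocalRows()) {
            // "Main": each row streams from tFileInputDelimited to tHDFSOutput.
            writeRowToHdfs(row);
        }
    }
}
```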
- Double-click on the tHDFSConnection component and set its properties:
  - Add a "Cloudera" distribution and select the latest Cloudera version
  - Enter the NameNode URL. The URL must follow the format hdfs://ip_hdfs:port_hdfs/; use context variables where possible: "hdfs://" + context.IP_HDFS + ":" + context.Port_HDFS + "/"
  - Add the user name
  - Add the following Hadoop properties, which enable NameNode high availability for a nameservice named "cluster":
    - "dfs.nameservices" = "cluster"
    - "dfs.ha.namenodes.cluster" = "nn1,nn2"
    - "dfs.client.failover.proxy.provider.cluster" = "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
    - "dfs.namenode.rpc-address.cluster.nn1" = "nn1:8020"
    - "dfs.namenode.rpc-address.cluster.nn2" = "nn2:8020"
  - Untick "Use Datanode Hostname"
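To make the configuration concrete, here is a rough sketch of the equivalent setup with the plain Hadoop Java client. This is not the code Talend generates; the IP, port, and user values are placeholder assumptions, while the five Hadoop properties and the URL format come from the steps above:

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class HdfsConnectionSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Same Hadoop properties as set on tHDFSConnection, describing a
        // NameNode high-availability pair for a nameservice called "cluster".
        conf.set("dfs.nameservices", "cluster");
        conf.set("dfs.ha.namenodes.cluster", "nn1,nn2");
        conf.set("dfs.client.failover.proxy.provider.cluster",
                "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");
        conf.set("dfs.namenode.rpc-address.cluster.nn1", "nn1:8020");
        conf.set("dfs.namenode.rpc-address.cluster.nn2", "nn2:8020");
        // Equivalent of unticking "Use Datanode Hostname".
        conf.set("dfs.client.use.datanode.hostname", "false");

        // NameNode URL in the required hdfs://ip_hdfs:port_hdfs/ format,
        // built as from the context variables; values here are placeholders.
        String ipHdfs = "10.0.0.1";   // context.IP_HDFS
        String portHdfs = "8020";     // context.Port_HDFS
        String user = "talend";       // the user set on the component

        try (FileSystem fs = FileSystem.get(
                URI.create("hdfs://" + ipHdfs + ":" + portHdfs + "/"), conf, user)) {
            System.out.println("Connected to " + fs.getUri());
        }
    }
}
```

With the HA properties in place, a client can also address the cluster as hdfs://cluster/ and let the failover proxy provider pick the active NameNode.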
- Double-click on the tFileInputDelimited component:
  - Enter the name of the local file (with its path)
  - Optionally, tick "CSV options" and set them to match your file
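Conceptually, this read step streams the delimited lines and splits them into fields. A minimal Java sketch, assuming a hypothetical file path and a ";" separator (in the real job both come from the component settings):

```java
import java.io.BufferedReader;
import java.nio.file.Files;
import java.nio.file.Paths;

public class DelimitedReadSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder path and separator; in Talend both are configured
        // on the tFileInputDelimited component.
        try (BufferedReader reader =
                Files.newBufferedReader(Paths.get("/tmp/customers.csv"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] fields = line.split(";");
                // In the job, each parsed row flows on to tHDFSOutput here.
                System.out.println(String.join(" | ", fields));
            }
        }
    }
}
```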
- Double-click on the tHDFSOutput component:
  - Tick "Use an existing connection", so the component reuses the tHDFSConnection defined above
  - Enter the name of your target file in HDFS (with its path); adjust the other options if needed
  - Click on "Edit schema" and make sure the Input and Output flows define the same columns
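The write step amounts to copying the local file into HDFS over the open connection. A minimal sketch with the plain Hadoop client, where both paths, the NameNode address, and the user are placeholder assumptions:

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        try (FileSystem fs = FileSystem.get(
                URI.create("hdfs://10.0.0.1:8020/"), conf, "talend")) {
            Path local = new Path("/tmp/customers.csv");          // placeholder
            Path target = new Path("/user/talend/customers.csv"); // placeholder
            // Equivalent of what tHDFSOutput does with the incoming rows.
            fs.copyFromLocalFile(local, target);
            System.out.println("Written: " + fs.exists(target));
        }
    }
}
```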
- Run the job