GitHub Project: example-talend-read-files-with-hdfs
Preamble
Configuration: Context Group
To build the different jobs shown in this article, you first need to create a context group in the Repository containing your Saagie platform's information (values).
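The exact variables depend on your platform, but based on the expressions used later in this article, the context group would contain at least the following (the values shown are placeholders):
IP_HDFS   = <NameNode host or IP of your Saagie platform>
Port_HDFS = <NameNode RPC port, typically 8020>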
Read a file from HDFS (in the console)
- Create a new job
- Add the component tHDFSConnection: creates a connection to HDFS.
- Add the component tHDFSInput: reads a file from HDFS.
- Add the component tLogRow: displays the result in the console.
- Create links:
- tHDFSConnection is connected with tHDFSInput (through "OnSubjobOk")
- tHDFSInput is connected with tLogRow (through "Main")
- Double-click on "tHDFSConnection" and set its properties:
- Select the "Cloudera" distribution and its latest version
- Enter the NameNode URI.
The URI must follow this format: hdfs://ip_hdfs:port_hdfs/
Use context variables where possible: "hdfs://" + context.IP_HDFS + ":" + context.Port_HDFS + "/"
- Add the user
- Add the following Hadoop properties (a plain-Java equivalent of this connection setup is sketched after this list):
- "dfs.nameservices" = "cluster"
- "dfs.ha.namenodes.cluster" = "nn1,nn2"
- "dfs.client.failover.proxy.provider.cluster" = "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
- "dfs.namenode.rpc-address.cluster.nn1" = "nn1:8020"
- "dfs.namenode.rpc-address.cluster.nn2" = "nn2:8020"
- Untick Use Datanode Hostname
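For reference, here is a rough plain-Java equivalent of this connection setup using the Hadoop client API. This is an illustrative sketch, not the code Talend generates: the nameservice "cluster" and hosts nn1/nn2 are the same placeholders as in the property list above, and "hdfs_user" is a hypothetical user name.

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class HdfsConnectionSketch {
    public static FileSystem connect() throws Exception {
        // Mirror of the tHDFSConnection settings above (illustrative only)
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://cluster/");
        conf.set("dfs.nameservices", "cluster");
        conf.set("dfs.ha.namenodes.cluster", "nn1,nn2");
        conf.set("dfs.namenode.rpc-address.cluster.nn1", "nn1:8020");
        conf.set("dfs.namenode.rpc-address.cluster.nn2", "nn2:8020");
        conf.set("dfs.client.failover.proxy.provider.cluster",
                "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");
        // "Use Datanode Hostname" unticked:
        conf.setBoolean("dfs.client.use.datanode.hostname", false);
        // Connect as the user entered in the component ("hdfs_user" is hypothetical)
        return FileSystem.get(new URI("hdfs://cluster/"), conf, "hdfs_user");
    }
}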
- Double-click on the component tHDFSInput and set its properties:
- Tick Use an existing connection
- Enter the name of the file to read
- Click on Edit schema and define the column(s) of the output flow
- Run the job
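For comparison, the whole subjob (tHDFSConnection, then tHDFSInput into tLogRow) boils down to opening the file and printing each row. Below is a minimal sketch with the Hadoop client API, reusing the connection sketch above; the path /user/example/input.txt is a hypothetical example, not a value from the article.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReadHdfsFileSketch {
    public static void main(String[] args) throws Exception {
        // Connection built as in the earlier sketch
        FileSystem fs = HdfsConnectionSketch.connect();
        // tHDFSInput: open the file ("/user/example/input.txt" is a placeholder)
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(fs.open(new Path("/user/example/input.txt")),
                        StandardCharsets.UTF_8))) {
            String line;
            // tLogRow: print each row to the console
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}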