Apache Flume Twitter agent does not stream data

I am trying to stream Twitter feeds into HDFS and then query them with Hive. But the first part, streaming the data and writing it to HDFS, does not work and gives a NullPointerException.

This is what I tried.

1. Downloaded apache-flume-1.4.0-bin.tar and extracted it. I copied all the contents to /usr/lib/flume/. In /usr/lib I changed the ownership of the flume directory to my user. When I run ls in /usr/lib/flume/, it shows

bin CHANGELOG conf DEVNOTES docs lib LICENSE logs NOTICE README RELEASE-NOTES tools 

2. Moved to the conf/ directory. I copied flume-env.sh.template to flume-env.sh and set JAVA_HOME to my Java path, /usr/lib/jvm/java-7-oracle.

3. Next, I created a file called flume.conf in the same conf directory and added the following contents

    TwitterAgent.sources = Twitter
    TwitterAgent.channels = MemChannel
    TwitterAgent.sinks = HDFS

    TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource
    TwitterAgent.sources.Twitter.channels = MemChannel
    TwitterAgent.sources.Twitter.consumerKey = <Twitter Application API key>
    TwitterAgent.sources.Twitter.consumerSecret = <Twitter Application API secret>
    TwitterAgent.sources.Twitter.accessToken = <Twitter Application Access token>
    TwitterAgent.sources.Twitter.accessTokenSecret = <Twitter Application Access token secret>
    TwitterAgent.sources.Twitter.keywords = hadoop, big data, analytics, bigdata, couldera, data science, data scientist, business intelligence, mapreduce, datawarehouse, data ware housing, mahout, hbase, nosql, newsql, businessintelligence, cloudcomputing

    TwitterAgent.sinks.HDFS.channel = MemChannel
    TwitterAgent.sinks.HDFS.type = hdfs
    TwitterAgent.sinks.HDFS.hdfs.path = hdfs://localhost:8020/user/flume/tweets/%Y/%m/%d/%H/
    TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
    TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
    TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
    TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
    TwitterAgent.sinks.HDFS.hdfs.rollCount = 600

    TwitterAgent.channels.MemChannel.type = memory
    TwitterAgent.channels.MemChannel.capacity = 10000
    TwitterAgent.channels.MemChannel.transactionCapacity = 100

I created an application on Twitter and added all the generated keys and tokens to the file above. I used the API key as the consumer key.

I downloaded the Flume sources JAR from the Cloudera files, as mentioned here.

4. I added flume-sources-1.0-SNAPSHOT.jar to /usr/lib/flume/lib.

5. Launched Hadoop and performed the following

    hadoop fs -mkdir /user/flume/tweets
    hadoop fs -chown -R flume:flume /user/flume
    hadoop fs -chmod -R 770 /user/flume

6. I ran the following in /usr/lib/flume

 /usr/lib/flume/conf$ bin/flume-ng agent -n TwitterAgent -c conf -f conf/flume-conf 

It prints the JARs it is loading and then exits.

When I checked HDFS, there were no files in it. Running hadoop fs -ls /user/flume/tweets shows nothing.

In hadoop, the core-site.xml file has the following configuration

    <configuration>
      <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:8020</value>
        <final>true</final>
      </property>
    </configuration>

thanks

+7
java twitter hadoop cloudera flume
2 answers

I ran the following command and it worked

 bin/flume-ng agent --conf ./conf/ -f conf/flume.conf -Dflume.root.logger=DEBUG,console -n TwitterAgent 
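
One possible reason the command in the question exited without doing anything is that it points at conf/flume-conf, while the file created in step 3 is named flume.conf (and the prompt suggests it may have been run from the conf directory). A minimal sketch that checks the expected files before starting the agent, assuming the /usr/lib/flume install path from the question; the check_flume_files helper is made up for illustration and is not part of Flume:

```shell
#!/bin/sh
# Install path taken from the question; override FLUME_HOME if yours differs.
FLUME_HOME="${FLUME_HOME:-/usr/lib/flume}"

# Hypothetical helper: returns non-zero if any expected file is missing.
check_flume_files() {
    flume_home="$1"
    rc=0
    for f in "$flume_home/bin/flume-ng" \
             "$flume_home/conf/flume.conf" \
             "$flume_home/conf/flume-env.sh"; do
        if [ ! -e "$f" ]; then
            echo "missing: $f" >&2
            rc=1
        fi
    done
    return "$rc"
}

if check_flume_files "$FLUME_HOME"; then
    echo "All files present. Start the agent from $FLUME_HOME with:"
    echo "bin/flume-ng agent --conf ./conf/ -f conf/flume.conf -Dflume.root.logger=DEBUG,console -n TwitterAgent"
fi
```

The -Dflume.root.logger=DEBUG,console option sends the log to the terminal, so an exception such as the NullPointerException is visible immediately instead of only in the log file.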
+4

I used this command and it works

 flume-ng agent --conf /etc/flume-ng/conf/ -f /etc/flume-ng/conf/flume.conf -Dflume.root.logger=DEBUG,console -n TwitterAgent 
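
The name passed with -n has to match the prefix used in the configuration file (TwitterAgent in the question's flume.conf); otherwise the agent starts with nothing to run. A small sketch that checks this, assuming a properties file laid out like the one in the question; the agent_declared function is a hypothetical helper, not a Flume command:

```shell
#!/bin/sh
# Hypothetical helper (not part of Flume): succeeds if the config file
# declares sources for the given agent name.
agent_declared() {
    agent="$1"
    conf_file="$2"
    # Look for a line such as "TwitterAgent.sources = Twitter"
    grep -Eq "^[[:space:]]*${agent}\.sources[[:space:]]*=" "$conf_file"
}

# Config path from the command above; override CONF_FILE if yours differs.
CONF_FILE="${CONF_FILE:-/etc/flume-ng/conf/flume.conf}"

if [ -f "$CONF_FILE" ] && agent_declared TwitterAgent "$CONF_FILE"; then
    echo "TwitterAgent is declared in $CONF_FILE"
else
    echo "TwitterAgent not found in $CONF_FILE (or the file is missing)" >&2
fi
```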
0