Using the local file system as a Flume source

I have just started learning Big Data and am currently working with Flume. A common example I have come across is processing tweets (an example from Cloudera) using some Java code.

Just for testing and modeling, can I use the local file system as a Flume source, in particular some Excel or CSV files? Would I also need to write Java code in addition to the Flume configuration file, as in the Twitter example?

Will this source be event-driven or pollable?

Thanks for your input.

+5
1 answer

I assume that you are using the Cloudera sandbox and are talking about placing the file locally in the sandbox, on the same machine as the Flume agent you plan to start. A Flume agent consists of:

Source
Channel
Sink

The files you want to ingest should be local to the Flume agent. The list of available Flume sources is in the user guide: https://flume.apache.org/FlumeUserGuide.html . You can use the Exec source if you just want to stream data from a file using the tail or cat command. You can also use the Spooling Directory source, which watches a specified directory for new files and parses events out of those files as they appear. Read the user guide carefully; it contains everything you need.
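As a rough illustration, a minimal agent configuration using the Spooling Directory source might look like the sketch below. The agent name (agent1), the component names (src1, ch1, sink1) and the directory /tmp/flume-spool are placeholders I made up, so adjust them to your setup. With the built-in sources no custom Java code should be needed, only this properties file.

```properties
# Sketch only: agent1, src1, ch1, sink1 and /tmp/flume-spool are assumed names.
agent1.sources = src1
agent1.channels = ch1
agent1.sinks = sink1

# Spooling Directory source: watches a local directory for new files.
agent1.sources.src1.type = spooldir
agent1.sources.src1.spoolDir = /tmp/flume-spool
agent1.sources.src1.channels = ch1

# Alternative (Exec source): replace the two spooldir lines above with
# agent1.sources.src1.type = exec
# agent1.sources.src1.command = tail -F /path/to/your/file.csv

# In-memory channel buffering events between source and sink.
agent1.channels.ch1.type = memory
agent1.channels.ch1.capacity = 1000

# Logger sink: writes events to the agent log, handy for testing.
agent1.sinks.sink1.type = logger
agent1.sinks.sink1.channel = ch1
```

Assuming the file is saved as spool.conf, the agent can be started with something like `flume-ng agent --conf conf --conf-file spool.conf --name agent1 -Dflume.root.logger=INFO,console`. By default each line of a text file dropped into the directory becomes one event, which suits CSV; a binary Excel workbook would most likely need to be exported to CSV first. Files should not be modified after they are placed in the spooling directory.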

+4