Get JSON Elements from the Web with Apache Flink

After reading several pages of the Apache Flink documentation (the official documentation and dataArtisans), as well as the examples in the official repository, I keep finding examples in which the streaming data source is either a file that has already been downloaded or a socket connection to localhost.

I am trying to use Apache Flink to load JSON files containing dynamic data. I would like to give Apache Flink a URL where the JSON file can be accessed and use it directly as the input source, instead of downloading the file with another system and then processing the downloaded copy with Apache Flink.

Is it possible to establish this network connection with Apache Flink?

java json apache-flink flink-streaming
1 answer

You can supply the URLs you want to load as the elements of a DataStream, and then download each document inside a MapFunction. The following code demonstrates this:

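    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;

    import org.apache.flink.api.common.functions.MapFunction;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class UrlDownloadJob {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // Each stream element is a URL to download
            DataStream<String> inputURLs = env.fromElements("http://www.json.org/index.html");

            inputURLs.map(new MapFunction<String, String>() {
                @Override
                public String map(String s) throws Exception {
                    URL url = new URL(s);
                    StringBuilder builder = new StringBuilder();
                    // try-with-resources closes the reader even if reading fails;
                    // errors propagate to Flink instead of silently emitting partial data
                    try (BufferedReader bufferedReader = new BufferedReader(
                            new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
                        String line;
                        while ((line = bufferedReader.readLine()) != null) {
                            builder.append(line).append('\n');
                        }
                    }
                    // The whole document becomes a single String record
                    return builder.toString();
                }
            }).print();

            env.execute("URL download job");
        }
    }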
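One caveat with the map function above: `url.openStream()` uses no timeouts, so a slow or unresponsive server can block the Flink task thread indefinitely. The sketch below pulls the download step into a plain-JDK helper that sets connect and read timeouts via `URLConnection`. `UrlFetcher` and `fetch` are hypothetical names (not part of Flink); the sketch assumes Java 9+ (for `InputStream.readAllBytes`) and that the resource is UTF-8 encoded, which is the standard encoding for JSON.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical helper, not part of Flink: downloads the resource behind a URL
 * as a String, with timeouts so a hung server cannot block a task forever.
 */
public final class UrlFetcher {

    private UrlFetcher() {}

    /**
     * Reads the whole resource at {@code spec} and decodes it as UTF-8.
     *
     * @param spec          URL to fetch
     * @param timeoutMillis connect and read timeout in milliseconds (0 = wait forever)
     */
    public static String fetch(String spec, int timeoutMillis) throws IOException {
        URLConnection connection = new URL(spec).openConnection();
        connection.setConnectTimeout(timeoutMillis);
        connection.setReadTimeout(timeoutMillis);
        // try-with-resources closes the stream even if reading fails part-way
        try (InputStream in = connection.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        // Same example URL as the answer; pass another URL as the first argument
        String url = args.length > 0 ? args[0] : "http://www.json.org/index.html";
        System.out.println(fetch(url, 5_000));
    }
}
```

Inside the `MapFunction`, the body of `map` would then reduce to `return UrlFetcher.fetch(s, 5_000);` (the 5-second timeout is an arbitrary choice). Because an `IOException` is propagated rather than caught, a failed download fails the task and Flink's restart strategy decides what happens next.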
