I am trying to load some data from an Amazon S3 bucket:
SparkConf sparkConf = new SparkConf().setAppName("Importer");
JavaSparkContext ctx = new JavaSparkContext(sparkConf);
HiveContext sqlContext = new HiveContext(ctx.sc());
DataFrame magento = sqlContext.read().json("https://s3.eu-central-1.amazonaws.com/*/*.json");
However, this last line throws an error:
Exception in thread "main" java.io.IOException: No FileSystem for scheme: https
The same line works in another project. What am I missing? I am running Spark on a Hortonworks CentOS virtual machine.
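As far as I understand, Hadoop chooses a FileSystem implementation from the URI scheme of the path, so the exception is reporting the `https` prefix of the string passed to `json(...)`. Below is a minimal sketch that only inspects the scheme with `java.net.URI`; it does not touch Spark or S3. The class name `SchemeCheck`, the bucket name `example-bucket`, and the `s3a://` path are hypothetical and only for illustration. Whether an `s3a://` path would actually load depends on the S3 connector and credentials being available on the cluster, which I have not verified here.

```java
import java.net.URI;

public class SchemeCheck {
    // Returns the URI scheme, i.e. the part before "://".
    // Hadoop uses this value to look up a matching FileSystem implementation.
    static String schemeOf(String path) {
        return URI.create(path).getScheme();
    }

    public static void main(String[] args) {
        // Path from the question: the scheme is "https".
        System.out.println(schemeOf("https://s3.eu-central-1.amazonaws.com/*/*.json"));

        // Hypothetical S3A-style path (bucket name is made up): the scheme is "s3a".
        System.out.println(schemeOf("s3a://example-bucket/data/*.json"));
    }
}
```

If the other project passes a path with a different scheme, or has additional Hadoop file system jars on its classpath, that might explain why the same call behaves differently there.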