You can do the export with the Hadoop-MongoDB connector. Just run the Hive query in the job's main method. Its output is then read by the Mapper, which inserts the data into MongoDB.
Example:
Here I insert a semicolon-delimited text file (id;firstname;lastname) into a MongoDB collection using a simple Hive query:
    import java.io.IOException;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.SQLException;
    import java.sql.Statement;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    import com.mongodb.hadoop.MongoOutputFormat;
    import com.mongodb.hadoop.io.BSONWritable;
    import com.mongodb.hadoop.util.MongoConfigUtil;

    public class HiveToMongo extends Configured implements Tool {

        private static class HiveToMongoMapper
                extends Mapper<LongWritable, Text, IntWritable, BSONWritable> {
            ...
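The mapper body is cut off in the snippet above. As a rough sketch of the per-line step such a mapper would have to perform, the plain-Java helper below splits one line into the three fields and builds a document-like map. The class name `UserLineParser`, the method `parse` and the use of a `LinkedHashMap` as a stand-in for the BSON object are my assumptions for illustration, not part of the connector. The delimiter is a parameter because Hive's own output files use Ctrl-A (`\u0001`) as the default field separator, so the lines reaching the mapper may not use the same delimiter as the original text file.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the Mongo-Hadoop connector.
final class UserLineParser {

    /** Parses one "id<delim>firstname<delim>lastname" line into an ordered field map. */
    static Map<String, Object> parse(String line, String delimiter) {
        // Pattern.quote treats the delimiter literally; limit -1 keeps trailing empty fields.
        String[] parts = line.split(Pattern.quote(delimiter), -1);
        if (parts.length != 3) {
            throw new IllegalArgumentException(
                    "Expected 3 fields, got " + parts.length + ": " + line);
        }
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", Integer.parseInt(parts[0].trim()));
        doc.put("firstname", parts[1].trim());
        doc.put("lastname", parts[2].trim());
        return doc;
    }

    public static void main(String[] args) {
        // Semicolon-delimited input, as in the example file.
        System.out.println(parse("1;John;Doe", ";"));
    }
}
```

In a real mapper the resulting fields would presumably be copied into a BSON object and wrapped in a `BSONWritable` before being written to the context, with the `id` as the `IntWritable` key.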
One drawback is that storing the intermediate Hive output requires a staging area (/user/hive/tmp). Also, as far as I know, the Mongo-Hadoop connector does not support upserts.
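To make the staging step concrete: the driver can ask Hive to write the query result into a directory, which the MapReduce job then reads as its input. The sketch below only builds such a statement. It assumes Hive's `INSERT OVERWRITE DIRECTORY` form is what gets used (the full driver code is not shown here, so this is a guess at the approach), and `StagingQuery` is a hypothetical name.

```java
// Hypothetical helper: builds the HiveQL that stages query output in a directory.
final class StagingQuery {

    /** Returns an INSERT OVERWRITE DIRECTORY statement wrapping the given SELECT. */
    static String build(String stagingDir, String select) {
        // Reject quotes so the path cannot break out of the HiveQL string literal.
        if (stagingDir.contains("'")) {
            throw new IllegalArgumentException(
                    "Staging path must not contain quotes: " + stagingDir);
        }
        return "INSERT OVERWRITE DIRECTORY '" + stagingDir + "' " + select;
    }

    public static void main(String[] args) {
        System.out.println(build("/user/hive/tmp", "SELECT * FROM users"));
    }
}
```

The resulting string would be passed to a JDBC `Statement`, and the same path would be given to `FileInputFormat` as the job's input path.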
I'm not quite sure, but you could also try to fetch the data from Hive without starting hiveserver (which provides the Thrift service), which would save some overhead. Look at the source of Hive's org.apache.hadoop.hive.cli.CliDriver#processLine(String line, boolean allowInterupting) method, which actually executes the query. Then you could hack together something like this:
    ...
    LogUtils.initHiveLog4j();
    CliSessionState ss = new CliSessionState(new HiveConf(SessionState.class));
    ss.in = System.in;
    ss.out = new PrintStream(System.out, true, "UTF-8");
    ss.err = new PrintStream(System.err, true, "UTF-8");
    SessionState.start(ss);

    Driver qp = new Driver();
    processLocalCmd("SELECT * from users", qp, ss);
Side notes:
There is also a hive-mongo implementation that you may want to check out. It is also worth looking at the Hive-HBase implementation to get an idea of what is involved if you want to implement something similar for MongoDB.