Pig + Cassandra: ERROR 1070

I am using Hadoop 1.0.4, Cassandra 1.2.2 and Pig 0.11.0.

I want to run this statement in the Grunt shell:

**grunt> rows = LOAD 'cassandra://Keyspace1/Users' USING CassandraStorage() AS (key, columns: bag {T: tuple(name, value)});** 

but I get this error:

 **2013-03-19 11:15:54,957 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1070: Could not resolve CassandraStorage using imports: [, org.apache.pig.builtin., org.apache.pig.impl.builtin.]** 

The log file contains:

Pig Stack Trace

    ERROR 1070: Could not resolve CassandraStorage using imports: [, org.apache.pig.builtin., org.apache.pig.impl.builtin.]

    Failed to parse: Pig script failed to parse: pig script failed to validate: org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not resolve CassandraStorage using imports: [, org.apache.pig.builtin., org.apache.pig.impl.builtin.]
        at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:191)
        at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1571)
        at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1544)
        at org.apache.pig.PigServer.registerQuery(PigServer.java:516)
        at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:991)
        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:412)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:194)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:170)
        at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
        at org.apache.pig.Main.run(Main.java:538)
        at org.apache.pig.Main.main(Main.java:157)
    Caused by: pig script failed to validate: org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not resolve CassandraStorage using imports: [, org.apache.pig.builtin., org.apache.pig.impl.builtin.]
        at org.apache.pig.parser.LogicalPlanBuilder.validateFuncSpec(LogicalPlanBuilder.java:1209)
        at org.apache.pig. ... 3183)
        at org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1315)
        at org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:799)
        at org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java ...
        at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:184)
        ... 10 more
    Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not resolve CassandraStorage using imports: [, org.apache.pig.builtin., org.apache.pig.impl.builtin.]
        at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:523)
        at org.apache.pig.parser.LogicalPlanBuilder.validateFuncSpec(LogicalPlanBuilder.java:1206)
        ... 18 more

thanks.

3 answers

Based on the documentation of the Pygmalion project and the source of the pig_cassandra script, you can set up the connection between Cassandra and Pig as follows:

    for jar in $CASSANDRA_HOME/lib/*.jar; do CLASSPATH=$CLASSPATH:$jar; done
    export PIG_CLASSPATH=$PIG_CLASSPATH:$CLASSPATH
    export PIG_OPTS="$PIG_OPTS -Dudf.import.list=org.apache.cassandra.hadoop.pig"
    export PIG_INITIAL_ADDRESS=localhost
    export PIG_RPC_PORT=9160
    export PIG_PARTITIONER=org.apache.cassandra.dht.RandomPartitioner
    pig

Also be sure to include the Cassandra jars in HADOOP_CLASSPATH (for example, by setting it in hadoop-env.sh).
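As a rough illustration of the HADOOP_CLASSPATH step, the lines below show one way it might look in hadoop-env.sh. This is only a sketch: `append_jars` is a hypothetical helper name, and it assumes `CASSANDRA_HOME` points at the Cassandra installation, with its jars under `lib/` as in the loop above.

```shell
# Sketch for hadoop-env.sh: append every Cassandra jar to HADOOP_CLASSPATH.
# append_jars is a hypothetical helper, not part of Hadoop or Cassandra.
append_jars() {
    # $1: directory containing jars, $2: existing classpath (may be empty)
    dir=$1
    cp=$2
    for jar in "$dir"/*.jar; do
        # Skip the unexpanded pattern when the directory holds no jars.
        [ -e "$jar" ] || continue
        # Add a ':' separator only when the classpath is non-empty.
        cp="${cp:+$cp:}$jar"
    done
    printf '%s\n' "$cp"
}

# Assumption: CASSANDRA_HOME is set to the Cassandra installation directory.
if [ -n "${CASSANDRA_HOME:-}" ]; then
    HADOOP_CLASSPATH=$(append_jars "$CASSANDRA_HOME/lib" "${HADOOP_CLASSPATH:-}")
    export HADOOP_CLASSPATH
fi
```

Unlike the plain `CLASSPATH=$CLASSPATH:$jar` loop, this variant avoids a leading `:` when the classpath starts out empty.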


This is definitely a PIG_CLASSPATH problem. You must run pig_cassandra from the examples/pig/bin directory that comes with the Cassandra source distribution. This script builds the classpath for you before running Pig.

You also need to set the following environment variables:

    export JAVA_HOME=<Oracle Java 6 directory>
    export PIG_HOME=<Pig directory>
    export PIG_CONF_DIR=<Hadoop conf directory>    # needed if running distributed MapReduce
    export PIG_INITIAL_ADDRESS=<IP of a Cassandra node>
    export PIG_RPC_PORT=<Cassandra RPC port>       # e.g. 9160
    export PIG_PARTITIONER=org.apache.cassandra.dht.RandomPartitioner

Note: you must build the Cassandra source with ant once before running pig_cassandra. This generates some libraries in the cassandra_source/build/lib/jars folder, which the pig_cassandra script requires. Otherwise, you will get errors when starting Pig. I don't remember the exact error; it was something about a method not being found during the serialization/deserialization phase inside Pig.
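To catch a missing variable before launching pig_cassandra, a small pre-flight check along the following lines may help. It is a sketch only: `check_pig_env` is a hypothetical name, and the list of required variables is taken from this answer (PIG_CONF_DIR is left out because it is only needed for distributed MapReduce).

```shell
# Hypothetical pre-flight check to run before examples/pig/bin/pig_cassandra.
# It only reports which of the variables listed in this answer are unset.
check_pig_env() {
    missing=""
    for name in JAVA_HOME PIG_HOME PIG_INITIAL_ADDRESS PIG_RPC_PORT PIG_PARTITIONER; do
        # Indirect lookup: read the value of the variable named by $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            missing="${missing:+$missing }$name"
        fi
    done
    if [ -n "$missing" ]; then
        echo "unset: $missing" >&2
        return 1
    fi
    return 0
}
```

It could be used as `check_pig_env && examples/pig/bin/pig_cassandra`, so that Pig only starts once every variable has a value.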


Mine was resolved by doing this:

    register hdfs:/udf/cassandra-all.jar;
    define CqlStorage org.apache.cassandra.hadoop.pig.CqlNativeStorage();