Failed to export table from HBase

I cannot export a table from HBase to HDFS. Below is the error. The table is pretty big. Are there any other ways to export it?

I used the command below to export. I have been increasing the RPC timeout, but the job still does not finish.

sudo -u hdfs hbase -Dhbase.rpc.timeout=1000000 org.apache.hadoop.hbase.mapreduce.Export My_Table /hdfs_path

15/05/05 08:50:27 INFO mapreduce.Job: map 0% reduce 0%
15/05/05 08:50:55 INFO mapreduce.Job: Task Id : attempt_1424936551928_0234_m_000001_0, Status : FAILED
Error: org.apache.hadoop.hbase.DoNotRetryIOException: Failed after retry of OutOfOrderScannerNextException: was there a rpc timeout?
    at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:410)
    at org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.nextKeyValue(TableRecordReaderImpl.java:230)
    at org.apache.hadoop.hbase.mapreduce.TableRecordReader.nextKeyValue(TableRecordReader.java:138)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:553)
    at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
    at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: Expected nextCallSeq: 1 But the nextCallSeq got from client: 0; request=scanner_id: 229 number_of_rows: 100 close_scanner: false next_call_seq: 0
    at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3198)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29925)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2031)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108)
    at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:116)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:96)
    at java.lang.Thread.run(Thread.java:745)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
    at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
    at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:304)
    at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:204)
    at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:59)
    at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:114)
    at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:90)
    at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:355)
    ... 13 more
Caused by: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException): org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: Expected nextCallSeq: 1 But the nextCallSeq got from client: 0; request=scanner_id: 229 number_of_rows: 100 close_scanner: false next_call_seq: 0
    at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3198)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29925)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2031)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108)
    at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:116)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:96)
    at java.lang.Thread.run(Thread.java:745)
    at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1457)
    at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1661)
    at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1719)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:30328)
    at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:174)
    ... 17 more
1 answer

I would suggest looking at the Export code and doing a phase-wise export.

If the table is really big, here are some things you can try. Looking at the Export command's code, you can adjust the scanner cache size and apply a scan filter.

See the Export code from HBase below.

In particular, see the usage() output; it gives you more options.
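For instance, here is a rough sketch of an invocation using some of those properties (the -D property names are the ones printed by the usage() method in the fragment further below; My_Table, /hdfs_path and the caching value of 50 are only illustrative placeholders, not tested values):

    # Sketch only: lower the scanner caching so each scanner next() RPC returns
    # well within the timeout, and compress the exported output files.
    sudo -u hdfs hbase org.apache.hadoop.hbase.mapreduce.Export \
      -D hbase.client.scanner.caching=50 \
      -D mapreduce.output.fileoutputformat.compress=true \
      -D mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec \
      My_Table /hdfs_path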

In my experience, the scanner cache size (not the batch size, which is the number of columns returned at a time) and/or a custom filter condition should work for you. For example, if your row keys start with 0_, where 0 is the region name, first export those rows by specifying a filter, then the next region's data, and so on. Below is a fragment of the Export filter code that should help you understand how this works.

    private static Filter getExportFilter(String[] args) {
      Filter exportFilter = null;
      String filterCriteria = (args.length > 5) ? args[5] : null;
      if (filterCriteria == null) return null;
      if (filterCriteria.startsWith("^")) {
        String regexPattern = filterCriteria.substring(1, filterCriteria.length());
        exportFilter = new RowFilter(CompareOp.EQUAL, new RegexStringComparator(regexPattern));
      } else {
        exportFilter = new PrefixFilter(Bytes.toBytesBinary(filterCriteria));
      }
      return exportFilter;
    }

    /*
     * @param errorMsg Error message. Can be null.
     */
    private static void usage(final String errorMsg) {
      if (errorMsg != null && errorMsg.length() > 0) {
        System.err.println("ERROR: " + errorMsg);
      }
      System.err.println("Usage: Export [-D <property=value>]* <tablename> <outputdir> [<versions> " +
          "[<starttime> [<endtime>]] [^[regex pattern] or [Prefix] to filter]]\n");
      System.err.println("  Note: -D properties will be applied to the conf used. ");
      System.err.println("  For example: ");
      System.err.println("   -D mapreduce.output.fileoutputformat.compress=true");
      System.err.println("   -D mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec");
      System.err.println("   -D mapreduce.output.fileoutputformat.compress.type=BLOCK");
      System.err.println("  Additionally, the following SCAN properties can be specified");
      System.err.println("  to control/limit what is exported..");
      System.err.println("   -D " + TableInputFormat.SCAN_COLUMN_FAMILY + "=<familyName>");
      System.err.println("   -D " + RAW_SCAN + "=true");
      System.err.println("   -D " + TableInputFormat.SCAN_ROW_START + "=<ROWSTART>");
      System.err.println("   -D " + TableInputFormat.SCAN_ROW_STOP + "=<ROWSTOP>");
      System.err.println("   -D " + JOB_NAME_CONF_KEY
          + "=jobName - use the specified mapreduce job name for the export");
      System.err.println("For performance consider the following properties:\n"
          + "   -Dhbase.client.scanner.caching=100\n"
          + "   -Dmapreduce.map.speculative=false\n"
          + "   -Dmapreduce.reduce.speculative=false");
      System.err.println("For tables with very wide rows consider setting the batch size as below:\n"
          + "   -D" + EXPORT_BATCHING + "=10");
    }
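As a sketch of the phase-wise idea (assuming your row keys really are prefixed with a region name such as 0_, 1_, and so on; the positional values 1, 0 and 9223372036854775807 stand for versions, starttime and endtime, which the usage above requires before the final prefix-filter argument, and all names and values here are illustrative):

    # Sketch only: export one key-prefix slice per run, each into its own HDFS
    # directory, so a timeout only forces a retry of that slice of the table.
    sudo -u hdfs hbase org.apache.hadoop.hbase.mapreduce.Export \
      -D hbase.client.scanner.caching=50 \
      My_Table /hdfs_path/part_0 1 0 9223372036854775807 0_

    sudo -u hdfs hbase org.apache.hadoop.hbase.mapreduce.Export \
      -D hbase.client.scanner.caching=50 \
      My_Table /hdfs_path/part_1 1 0 9223372036854775807 1_
    # ...repeat for the remaining prefixes.

An alternative to the prefix filter is to slice the table by row range using the SCAN_ROW_START / SCAN_ROW_STOP -D properties listed in the usage output above.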
