OutOfOrderScannerNextException when filtering results in HBase

I am trying to filter the results in HBase as follows:

    List<Filter> andFilterList = new ArrayList<>();
    SingleColumnValueFilter sourceLowerFilter = new SingleColumnValueFilter(Bytes.toBytes("cf"),
            Bytes.toBytes("source"), CompareFilter.CompareOp.GREATER, Bytes.toBytes(lowerLimit));
    sourceLowerFilter.setFilterIfMissing(true);
    SingleColumnValueFilter sourceUpperFilter = new SingleColumnValueFilter(Bytes.toBytes("cf"),
            Bytes.toBytes("source"), CompareFilter.CompareOp.LESS_OR_EQUAL, Bytes.toBytes(upperLimit));
    sourceUpperFilter.setFilterIfMissing(true);
    SingleColumnValueFilter targetLowerFilter = new SingleColumnValueFilter(Bytes.toBytes("cf"),
            Bytes.toBytes("target"), CompareFilter.CompareOp.GREATER, Bytes.toBytes(lowerLimit));
    targetLowerFilter.setFilterIfMissing(true);
    SingleColumnValueFilter targetUpperFilter = new SingleColumnValueFilter(Bytes.toBytes("cf"),
            Bytes.toBytes("target"), CompareFilter.CompareOp.LESS_OR_EQUAL, Bytes.toBytes(upperLimit));
    targetUpperFilter.setFilterIfMissing(true);

    andFilterList.add(sourceUpperFilter);
    andFilterList.add(targetUpperFilter);
    FilterList andFilter = new FilterList(FilterList.Operator.MUST_PASS_ALL, andFilterList);

    List<Filter> orFilterList = new ArrayList<>();
    orFilterList.add(sourceLowerFilter);
    orFilterList.add(targetLowerFilter);
    FilterList orFilter = new FilterList(FilterList.Operator.MUST_PASS_ONE, orFilterList);

    FilterList fl = new FilterList(FilterList.Operator.MUST_PASS_ALL);
    fl.addFilter(andFilter);
    fl.addFilter(orFilter);

    Scan edgeScan = new Scan();
    edgeScan.setFilter(fl);

    ResultScanner edgeScanner = table.getScanner(edgeScan);
    Result edgeResult;
    logger.info("Writing edges...");
    while ((edgeResult = edgeScanner.next()) != null) {
        // Some code
    }

This code triggers this error:

    org.apache.hadoop.hbase.DoNotRetryIOException: Failed after retry of OutOfOrderScannerNextException: was there a rpc timeout?
        at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:402)
        at org.deustotech.internet.phd.framework.rdf2subdue.RDF2Subdue.writeFile(RDF2Subdue.java:150)
        at org.deustotech.internet.phd.framework.rdf2subdue.RDF2Subdue.run(RDF2Subdue.java:39)
        at org.deustotech.internet.phd.Main.main(Main.java:32)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:297)
        at java.lang.Thread.run(Thread.java:745)
    Caused by: org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: Expected nextCallSeq: 1 But the nextCallSeq got from client: 0; request=scanner_id: 178 number_of_rows: 100 close_scanner: false next_call_seq: 0
        at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3098)
        at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29497)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2012)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
        at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:168)
        at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:39)
        at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:111)
        at java.lang.Thread.run(Thread.java:745)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
        at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
        at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:285)
        at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:204)
        at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:59)
        at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:114)
        at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:90)
        at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:354)
        ... 9 more
    Caused by: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException): org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: Expected nextCallSeq: 1 But the nextCallSeq got from client: 0; request=scanner_id: 178 number_of_rows: 100 close_scanner: false next_call_seq: 0
        at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3098)
        at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29497)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2012)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
        at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:168)
        at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:39)
        at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:111)
        at java.lang.Thread.run(Thread.java:745)
        at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1453)
        at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1657)
        at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1715)
        at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:29900)
        at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:174)
        ... 13 more

The RPC timeout is set to 600000. I tried removing some of the filters, with the following results:

  • sourceUpperFilter && (sourceLowerFilter || targetLowerFilter) → Success
  • targetUpperFilter && (sourceLowerFilter || targetLowerFilter) → Success
  • (sourceUpperFilter && targetUpperFilter) && sourceLowerFilter → Failed
  • (sourceUpperFilter && targetUpperFilter) && targetLowerFilter → Failed

Any help would be greatly appreciated. Thanks.

2 answers

Reason: the scan is fetching rows from a large region, and it takes time to gather the number of rows the client asked for. By then the client hits its RPC timeout, so the client side retries the call on the same scanner. Keep in mind that a next() call means "give me the next N rows from wherever you are". The old, timed-out call did actually run on the server and had already advanced past some rows, so a plain retry would silently skip those rows. To avoid that, and to distinguish this case, the scan carries a sequence number (nextCallSeq), and this exception is thrown when the numbers do not match. On seeing it, the client closes the scanner and opens a new one starting at the appropriate row. But this recovery is attempted only once, and the new call can time out again.

So you need to adjust the timeout and/or the scan caching. (In later HBase versions, scan heartbeat messages prevent this kind of timeout on long-running scans.)

In our case, with a huge amount of data in HBase, we used an RPC timeout of 1800000 and a lease period of 1800000, together with fuzzy row filters and scan.setCaching(xxxx) (the value needs to be tuned).
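For illustration, a minimal sketch of that kind of setup (the property values, the table name and the connection style are assumptions for this sketch, not our exact code):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.*;

    Configuration conf = HBaseConfiguration.create();
    // Raise the RPC timeout and the scanner lease so that a slow, heavily
    // filtered batch does not expire between two next() calls.
    conf.setLong("hbase.rpc.timeout", 1800000L);
    // Scanner lease/timeout (older releases used hbase.regionserver.lease.period).
    conf.setLong("hbase.client.scanner.timeout.period", 1800000L);

    HConnection connection = HConnectionManager.createConnection(conf);
    HTableInterface table = connection.getTable("edges"); // table name is hypothetical

    Scan scan = new Scan();
    scan.setCaching(100);  // rows per scanner RPC; tune so one batch finishes well inside the timeout
    scan.setFilter(fl);    // the FilterList built in the question

    ResultScanner scanner = table.getScanner(scan);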

Note: value filters are slower than row filters, since they effectively require a full table scan.
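For example, if the range you filter on is (or can be made) part of the row key, a key-range scan lets the server seek directly instead of evaluating a value filter on every row. A hedged sketch, assuming a hypothetical key layout where the row key starts with the filtered value:

    // Hypothetical: rows are keyed by the value being filtered, so the range
    // can be expressed on the row key instead of with SingleColumnValueFilter.
    // Note the boundary semantics: startRow is inclusive, stopRow is exclusive,
    // so the limits may need adjusting to match GREATER / LESS_OR_EQUAL exactly.
    Scan rangeScan = new Scan();
    rangeScan.setStartRow(Bytes.toBytes(lowerLimit));
    rangeScan.setStopRow(Bytes.toBytes(upperLimit));
    rangeScan.setCaching(100);
    ResultScanner rangeScanner = table.getScanner(rangeScan);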

With all of the above precautions we were able to query this large HBase dataset from MapReduce.

Hope this explanation helps.


I solved this problem by setting hbase.client.scanner.caching.

See also:

The client and the region server each maintain a nextCallSeq number for the scan. Every next() call from the client increments this number on both sides. The client sends its number along with the request, and the region server compares the incoming nextCallSeq with its own. If the client times out, its increment does not happen; but if the server side has already finished fetching the next batch of data, the nextCallSeq numbers no longer match. The server then throws an OutOfOrderScannerNextException, and the client reopens the scanner, starting over from the last successfully retrieved row.

Since the problem is caused by the timeout on the client side, the corresponding fix is to reduce the scanner caching (hbase.client.scanner.caching) or to increase the RPC timeout (hbase.rpc.timeout).
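A minimal sketch of that adjustment (the value 10 is only an example; the property can equally be set in hbase-site.xml):

    Configuration conf = HBaseConfiguration.create();
    // Fewer rows are fetched per next() RPC, so each call returns to the
    // client before hbase.rpc.timeout expires.
    conf.setInt("hbase.client.scanner.caching", 10);

    Scan edgeScan = new Scan();
    edgeScan.setCaching(10); // per-scan override of the same setting
    edgeScan.setFilter(fl);  // the FilterList from the question
    ResultScanner scanner = table.getScanner(edgeScan);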

Hope this answer helps.

