I would suggest looking at the Export code and doing the export in phases.
If the table is really big, here are some tips you can try: once you have looked at the Export command's code, you can adjust the scanner cache size and apply a scan filter.
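To be clear about the terms first: the cache size is the number of rows fetched per scanner RPC, while the batch size limits how many columns come back per Result. Below is a rough, untested sketch of what those knobs (plus a prefix filter) look like on a plain client-side Scan using the standard HBase client API; the table name and key prefix are made up:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.filter.PrefixFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class ScanTuningSketch {
  public static void main(String[] args) throws Exception {
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Table table = conn.getTable(TableName.valueOf("my_table"))) {       // hypothetical table
      Scan scan = new Scan();
      scan.setCaching(100);                                    // rows fetched per scanner RPC
      scan.setBatch(10);                                       // columns per Result, for very wide rows
      scan.setFilter(new PrefixFilter(Bytes.toBytes("0_")));   // restrict the scan to one key prefix
      try (ResultScanner scanner = table.getScanner(scan)) {
        for (Result r : scanner) {
          // process or export the row here
        }
      }
    }
  }
}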
See the Export API from HBase below; its usage() output shows the options available to you.
In my experience, the cache size (not the batch size, which is the number of columns returned at a time) and/or a custom filter condition should work for you. For example, if your keys start with 0_, where 0 identifies the region, first export those rows by specifying a prefix filter, then the next region's data, and so on. Below is the fragment of the Export filter code that shows how this works.
private static Filter getExportFilter(String[] args) {
  Filter exportFilter = null;
  String filterCriteria = (args.length > 5) ? args[5] : null;
  if (filterCriteria == null) return null;
  if (filterCriteria.startsWith("^")) {
    String regexPattern = filterCriteria.substring(1, filterCriteria.length());
    exportFilter = new RowFilter(CompareOp.EQUAL, new RegexStringComparator(regexPattern));
  } else {
    exportFilter = new PrefixFilter(Bytes.toBytesBinary(filterCriteria));
  }
  return exportFilter;
}

private static void usage(final String errorMsg) {
  if (errorMsg != null && errorMsg.length() > 0) {
    System.err.println("ERROR: " + errorMsg);
  }
  System.err.println("Usage: Export [-D <property=value>]* <tablename> <outputdir> [<versions> " +
      "[<starttime> [<endtime>]] [^[regex pattern] or [Prefix] to filter]]\n");
  System.err.println(" Note: -D properties will be applied to the conf used. ");
  System.err.println(" For example: ");
  System.err.println("  -D mapreduce.output.fileoutputformat.compress=true");
  System.err.println("  -D mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec");
  System.err.println("  -D mapreduce.output.fileoutputformat.compress.type=BLOCK");
  System.err.println(" Additionally, the following SCAN properties can be specified");
  System.err.println(" to control/limit what is exported..");
  System.err.println("  -D " + TableInputFormat.SCAN_COLUMN_FAMILY + "=<familyName>");
  System.err.println("  -D " + RAW_SCAN + "=true");
  System.err.println("  -D " + TableInputFormat.SCAN_ROW_START + "=<ROWSTART>");
  System.err.println("  -D " + TableInputFormat.SCAN_ROW_STOP + "=<ROWSTOP>");
  System.err.println("  -D " + JOB_NAME_CONF_KEY
      + "=jobName - use the specified mapreduce job name for the export");
  System.err.println("For performance consider the following properties:\n"
      + "  -Dhbase.client.scanner.caching=100\n"
      + "  -Dmapreduce.map.speculative=false\n"
      + "  -Dmapreduce.reduce.speculative=false");
  System.err.println("For tables with very wide rows consider setting the batch size as below:\n"
      + "  -D" + EXPORT_BATCHING + "=10");
}
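And here is one way the phase-wise idea could be wired up end to end. This is a minimal, untested sketch that assumes the 1.x-era Export class shown above, whose public createSubmittableJob(conf, args) method takes the same arguments as the usage string printed above; the table name, prefixes and output paths are placeholders:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.mapreduce.Export;
import org.apache.hadoop.mapreduce.Job;

public class PhasedExport {
  public static void main(String[] argv) throws Exception {
    String table = "my_table";                  // hypothetical table name
    String[] prefixes = {"0_", "1_", "2_"};     // one export phase per row-key prefix
    for (String prefix : prefixes) {
      Configuration conf = HBaseConfiguration.create();
      conf.setInt("hbase.client.scanner.caching", 100);   // rows per scanner RPC, as the usage suggests
      // Argument order mirrors the usage string:
      // <tablename> <outputdir> <versions> <starttime> <endtime> <prefix filter>
      String[] args = {table, "/export/" + table + "/" + prefix,
                       "1", "0", String.valueOf(Long.MAX_VALUE), prefix};
      Job job = Export.createSubmittableJob(conf, args);
      if (!job.waitForCompletion(true)) {
        throw new RuntimeException("Export phase failed for prefix " + prefix);
      }
    }
  }
}

Equivalently, you can run the packaged Export tool from the shell once per prefix, passing the prefix as the filter argument shown in the usage string together with -Dhbase.client.scanner.caching=100 for performance.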