Efficient way to delete multiple rows in HBase

Is there an efficient way to delete multiple rows in HBase, or is my use case not a good fit for HBase?

I have a table, "chart", that contains the items that appear in charts. The row keys are in the following format: chart|date_reversed|ranked_attribute_value_reversed|content_id

Sometimes I want to regenerate a chart for a given date, so I want to delete all the rows from "chart|date_reversed_1" to "chart|date_reversed_2". Is there a better way than issuing a Delete for each row found during a scan? All the rows to be deleted will be close to each other.

I need to delete the rows because I do not want one item (one content_id) to have multiple entries, which it would have if its ranked_attribute_value had changed (that change is the reason the chart needs to be regenerated).

I am new to HBase, so I may be misusing rows for something that would be better modeled another way. If you have design suggestions, great! Or maybe the charts would be better generated into a file (that is, without using HBase for the output)? I am using MapReduce.

+8
hbase mapreduce hadoop
3 answers

Firstly, on the question of range deletes: AFAIK there is no range delete in HBase yet. But there is a way to delete more than one row at a time in the HTableInterface API. To do this, simply create a Delete object for each row key returned by the scan, put them in a List, and pass the list to the API. To speed up the scan, do not include any column family in the scan result, since all you need is the row key to delete entire rows.
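The batching part of this approach can be sketched without any HBase dependency. In the sketch below, `BatchedRowDeleter`, `deleteInBatches` and the `deleteBatch` callback are hypothetical names; the callback stands in for wrapping each key in a Delete and calling the client's multi-delete method, which is an assumption about how you would wire it up rather than part of the answer above.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

/** Collects row keys into fixed-size batches and hands each batch to a sink. */
final class BatchedRowDeleter {
    private BatchedRowDeleter() {}

    /**
     * Drains rowKeys, calling deleteBatch once per full batch and once more
     * for any remainder. Returns the total number of keys handed over.
     * deleteBatch is a hypothetical stand-in for the real multi-row delete call.
     */
    static long deleteInBatches(Iterator<byte[]> rowKeys, int batchSize,
                                Consumer<List<byte[]>> deleteBatch) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<byte[]> batch = new ArrayList<>();
        long total = 0;
        while (rowKeys.hasNext()) {
            batch.add(rowKeys.next());
            if (batch.size() == batchSize) {
                // Pass a copy so the sink may keep or modify the list safely.
                deleteBatch.accept(new ArrayList<>(batch));
                total += batch.size();
                batch.clear();
            }
        }
        if (!batch.isEmpty()) {
            deleteBatch.accept(new ArrayList<>(batch));
            total += batch.size();
        }
        return total;
    }
}
```

With a real client, the callback would map each `byte[]` to a Delete and submit the whole list in one call, so the number of round trips is roughly the number of rows divided by the batch size.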

Secondly, about the design. My understanding of the requirement is that there are contents, each with a content ID, and each content has charts generated for it, and this data is stored; there can be several charts per content, by date, and they depend on the rank. In addition, we want the most recently generated content's chart to appear at the top of the table.

Under that assumption, I would suggest using three tables: auto_id, content_charts and generate_order. The row key for content_charts will be the content ID, and the row key for generate_order will be a long that is auto-decremented using the HTableInterface API. To decrement, use "-1" as the amount to offset, and initialize the value to Long.MAX_VALUE in the auto_id table when the application first starts, or manually. Now, if you want to delete the chart data, simply clear the column family using a delete, then put the new data back, and then make a put into the generate_order table. This way the latest insert will also be at the top of the generate_order table, which holds the content ID as the cell value. If you want generate_order to have only one entry per content, first save the generate_order ID into content_charts when doing the put, and, before deleting the column family, first delete the corresponding row from generate_order. This way you can look up the charts for a content using at most 2 gets, and no scan is required for the charts.
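To illustrate why counting down from Long.MAX_VALUE puts the newest entry first, here is a small local sketch. It assumes the ID is stored as an 8-byte big-endian row key and relies on HBase ordering row keys as unsigned bytes. `DescendingIdSketch` and its methods are hypothetical names, and the `AtomicLong` is only a stand-in for the counter cell in auto_id (which in HBase could be updated with something like `incrementColumnValue` and an amount of -1).

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicLong;

/** Local illustration of auto-decremented IDs used as row keys. */
final class DescendingIdSketch {
    // Stand-in for the counter cell in the auto_id table, initialized once.
    private final AtomicLong counter = new AtomicLong(Long.MAX_VALUE);

    /** Returns the next ID; each call yields a value one smaller than the last. */
    long nextId() {
        return counter.addAndGet(-1L);
    }

    /** Encodes an ID as 8 big-endian bytes, the assumed row key layout. */
    static byte[] toRowKey(long id) {
        return ByteBuffer.allocate(Long.BYTES).putLong(id).array();
    }

    /** Compares two keys byte by byte as unsigned values, left to right. */
    static int compareKeys(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }
}
```

Because every ID stays non-negative, a smaller ID also has a smaller big-endian key, so the row written last sorts before all earlier rows and a scan from the start of generate_order sees the newest entries first.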

I hope this will be helpful.

+7

You can use BulkDeleteProtocol, which takes a Scan that defines the relevant range (start row, stop row, filters).

Look here

+2

I ran into the same situation, and this is the code I used to do what you want.

    // Assumes imports from java.io, java.util, org.apache.hadoop.hbase.client
    // and org.apache.hadoop.hbase.util.Bytes.
    Scan scan = new Scan();
    scan.addFamily(Bytes.toBytes("Family"));
    scan.setStartRow(structuredKeyMaker.key(startDate));
    scan.setStopRow(structuredKeyMaker.key(endDate + 1)); // the stop row is exclusive
    try (ResultScanner scanner = table.getScanner(scan)) {
        // A simple iterator that maps rows to my own entity type; not so important here.
        Iterator<Entity> entityIterator =
            new EntityIteratorWrapper(scanner.iterator(), EntityMapper.create());
        List<Delete> deletes = new ArrayList<Delete>();
        // Simple in-memory buffer limit, needed so I don't run out of memory,
        // as I have a huge amount of data.
        int bufferSize = 10000000;
        while (entityIterator.hasNext()) {
            // The key maker extracts the row key as byte[] from my entity.
            deletes.add(new Delete(KeyMaker.key(entityIterator.next())));
            if (deletes.size() >= bufferSize) {
                table.delete(deletes);
                deletes.clear();
            }
        }
        if (!deletes.isEmpty()) {
            table.delete(deletes);
            deletes.clear();
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
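The `endDate + 1` stop key above works because a scan's stop row is exclusive. When the keys are plain strings, as in the question's chart|date_reversed format, the same effect can be had by computing the smallest key that sorts after every key sharing a prefix. This is only a sketch: `PrefixRange` and `stopRowAfterPrefix` are hypothetical helpers, and the sample keys in the note below are made up for illustration.

```java
import java.util.Arrays;

/** Computes an exclusive stop key covering every row key with a given prefix. */
final class PrefixRange {
    private PrefixRange() {}

    /**
     * Returns the smallest key that sorts after all keys starting with prefix:
     * trailing 0xFF bytes are dropped, then the last remaining byte is incremented.
     * Returns an empty array when no such key exists (empty or all-0xFF prefix),
     * which HBase scans interpret as "no upper bound".
     */
    static byte[] stopRowAfterPrefix(byte[] prefix) {
        int end = prefix.length;
        while (end > 0 && prefix[end - 1] == (byte) 0xFF) {
            end--;
        }
        if (end == 0) {
            return new byte[0];
        }
        byte[] stop = Arrays.copyOf(prefix, end);
        stop[end - 1]++;
        return stop;
    }
}
```

For example, with a hypothetical prefix `chart|20120101` the computed stop key is `chart|20120102`, so a scan from the first prefix to the stop key of the last prefix covers every row in the range, including all rows of the final date.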
+2

Source: https://habr.com/ru/post/650772/

