An algorithm to cover memory ranges?

I have a set of ranges, such as {(2-8), (13-22), (380-7931), (40032-63278)}. For simplicity, we can assume that they do not overlap (overlapping ranges are already combined).

My goal is to “cover” these ranges using combinations of a given set of “lengths,” such as {4, 64, 1024, 16384}. I have to use no more than N lengths, for example N = 32. I don’t care how many “lengths” I use as long as I’m below my maximum, but I want to minimize the total “extra” area - the numbers are “covered” by a length that is not in the original range set.

Example {(2-8)}, covered (2-66) (one length out of 64 used), has 58 "extra" numbers. {(2-8)}, covered by {(2-6), (6-10)} (two lengths 4), has only 2 "extra" numbers and is preferred.

The My real world application includes pre-programming a fixed MMB TLB to ensure that only certain ranges of memory addresses are accessible (thus TLB gaps are denied access and can be considered accordingly). I would like to cover the ranges as much as possible so that violations can be detected sooner rather than later, but I only have 32 slots for work and 4 fixed page sizes. I can tune my code manually to a sufficient level of performance, but I'm curious if there is a more elegant solution to solve this problem. This seems to be related to a satchel problem, but difficult enough to search for.

+6
source share
3 answers

It can be formulated as a change in the shortest path .

We need to cover a set of ranges of total length M with at most N pages. Pages can have L different lengths, they are not aligned (can be placed at any address). The difference between the total "extra" area and the total page length is equal to the constant M , which minimizes the total page length.

Let us plot the graph associated with this problem. Each memory address in any range has a corresponding vertex on the graph. Each vertex has L outgoing edges corresponding to pages L , starting from a given address. The length of each edge is the length of the page. Each edge comes to a certain vertex of the graph, depending on where the corresponding page ends:

  • The page ends in some unoccupied position. The edges will return to the vertex corresponding to the starting address of the first range after this position.
  • The page ends within a certain range. The edges will return to the top corresponding to the end address of the page.
  • The page ends in an unoccupied position with an address greater than the address of any range. Edge comes to the top of the destination. (The starting address of the very first range corresponds to the starting vertex, and we must find the shortest path between these two vertices).

Since the resulting graph is a DAG , the shortest path can be found in linear time, processing the vertices in topological order (or, even simpler, in the order of the corresponding memory addresses).

For each vertex, store an array of N pairs {path length, reverse pointer}. Although the shortest path algorithm visits any vertex, index this array with the number of transitions in the path, and if the path is shorter than the stored path length, replace {path length, reverse pointer}. When each vertex is processed, find the shortest path in the array of pairs belonging to the target vertex, and then restore the path using the back pointers. This path describes the optimal coverage.

The worst time complexity O (L * M * N) is determined by the maximum number of edges (L * M) and the number of elements in the array belonging to each vertex (N). If the ranges are sparse, most of the edges fall into the starting address of a certain range, most of the vertices corresponding to the internal addresses are not used, and the time complexity is much less.

This algorithm requires O (M * N) space, but for sparse ranges this can be significantly reduced if we put all the vertices of the graph (or, possibly, all length / pointer pairs) in the hash map.

+1
source

Note that the ranges you want to cover have very low values ​​(let you call them "ranges"), and spaces (call them "extra") are very expensive intervals. Now we want to minimize the total cost with no more than N intervals (we will call “covers”), which cover all “ranges” and may contain some “additional” ones. The algorithm below is greedy in nature.

First, take all the intervals you want to cover (“ranges”) as well as the “extra” between them.

1) Cover them with a large interval from min to max "ranges".

2) Iterate and delete the most expensive "extra" (that is, the highest value) between and also count the divison as creating an additional "cover" until the number of covers becomes N, or you run out of expensive "additional" ones.

0
source

A sincere greedy approach that works when each length is a multiple of the last:

Start with a set of minimum intervals that cover all ranges minimally. Combine the longest iterative loop until you omit the maximum range value.

In your case, in the TLB, you probably have alignment restrictions that will make the problem a little more complicated.

0
source

Source: https://habr.com/ru/post/926122/


All Articles