Your questions:
- Think about any_old_process, which should traverse the graph and do some work on the objects it finds, including adding more work.
- ... what data structure can be parallelized to achieve the goals set in the question?
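The pattern described in the question is a worklist loop: traverse a graph, do work on each object found, and possibly generate more work. Below is a minimal sequential sketch. It assumes the graph is an adjacency-list object, and `traverse` and the `processNode` callback are hypothetical names. The worklist array is the structure that a parallel version would have to share between threads (with synchronization) or split into per-thread lists.

```javascript
// Minimal sequential worklist traversal (hypothetical helper, for illustration).
// graph: object mapping node id -> array of neighbor ids.
// processNode: hypothetical callback doing "some work on the found object";
// it may return an array of extra node ids to enqueue ("adding more work").
function traverse(graph, start, processNode) {
  const seen = new Set([start]); // nodes already enqueued, so each is processed once
  const worklist = [start];      // pending work, used here as a stack (LIFO)
  while (worklist.length > 0) {
    const node = worklist.pop();
    const extra = processNode(node) || [];
    for (const next of (graph[node] || []).concat(extra)) {
      if (!seen.has(next)) {
        seen.add(next);
        worklist.push(next);
      }
    }
  }
  return seen;
}

// Usage: visit every node reachable from "a" and record the visit order.
const graph = { a: ["b", "c"], b: ["d"], c: ["d"], d: [] };
const order = [];
const reached = traverse(graph, "a", (n) => { order.push(n); });
console.log(order.join(" "), reached.size); // prints "a c d b 4"
```

Swapping the stack for a queue (FIFO) would turn this depth-first order into breadth-first order; either way, the worklist is the piece that needs a concurrent implementation once several threads pop and push at the same time.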
Quotes:
- Some things about garbage collection.
Since you are particularly interested in parallelizing graph algorithms, I will give an example of one kind of graph traversal that parallelizes well.
Summary
Finding local minima ("basins") or maxima ("peaks") is a useful operation in digital image processing. A concrete example is geological watershed analysis. One approach to the problem treats each pixel, or small group of pixels, in the image as a node and finds non-overlapping minimum spanning trees (MSTs) with the local minima as the roots of the trees.
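If every node stores a pointer to its parent, and each root points to itself, then finding the root of every node in such a forest is a classic parallel-friendly step. The following is a minimal sketch using pointer jumping (path doubling). This is a standard technique swapped in here for illustration, not what the program below does. `findRoots` is a hypothetical name, and the "parallel" rounds run sequentially in this sketch.

```javascript
// Pointer jumping on a forest stored as a parent array: parent[i] === i marks a root.
// In every round each node replaces its parent with its grandparent. A round reads
// only the previous array and writes a fresh one, so the nodes within a round are
// independent and could be updated in parallel.
function findRoots(parent) {
  let current = parent.slice(); // do not mutate the caller's array
  let changed = true;
  while (changed) {
    changed = false;
    const next = current.map((p) => current[p]); // one round: jump to grandparent
    for (let i = 0; i < next.length; i++) {
      if (next[i] !== current[i]) changed = true;
    }
    current = next;
  }
  return current;
}

// Usage: two trees, a chain 3 -> 2 -> 1 -> 0 rooted at 0, and 5 -> 4 rooted at 4.
const roots = findRoots([0, 0, 1, 2, 4, 4]);
console.log(roots.join(" ")); // prints "0 0 0 0 4 4"
```

Because each round doubles how far every pointer jumps, a tree of depth d converges in O(log d) rounds, and each round is embarrassingly parallel.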
Gory Details
The following is a simplified example. This question, from a Palantir Technologies web interview, was posted to Programming Puzzles & Code Golf by AnkitSablok. It is simplified by two assumptions:
- A pixel/cell has only 4 neighbors instead of the usual 8.
- A cell either has all its neighbors uphill (it is a local minimum) or has a unique lowest neighbor. I.e., plateaus are not allowed.
Below is some JavaScript that solves this problem. It violates every reasonable coding standard regarding side effects, but it illustrates where the opportunities for parallelization are:
- In the loop "Create list of sinks (ie roots)", note that each cell can be evaluated against its neighbors completely independently, as long as the height data is static. In a sequential program, one thread of execution checks every cell. In a parallel program, the cells are partitioned so that one and only one thread reads and writes the local-minimum status of any given cell (sink[] in the program below). If the list of minima/roots is built in parallel, the push operations on that shared list (used as a stack) must be synchronized. For a discussion of how to do this for stacks and other queues, see "Simple, Fast, and Practical Non-Blocking and Blocking Concurrent Queue Algorithms," Michael and Scott, 1996. For more recent work, follow the citation tree in Google Scholar (no mutexes required :).
- In the loop "Each root explores its basin", note that each basin can be explored/labeled/flooded in parallel.
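The partitioning idea from the first bullet can be sketched as follows, assuming the height map is a square array of numbers like the 5 x 5 example map in the problem statement. `isSinkCell`, `findSinksInRows` and `findSinksPartitioned` are hypothetical names. Instead of synchronizing pushes onto one shared stack, each (hypothetical) worker fills a private list that is merged afterwards. The workers run one after another here, so the sketch shows only the data ownership, not real threading.

```javascript
// A cell is a sink when no 4-neighbor is strictly lower (same rule as isSink
// in the program below). Reads the static height map only.
function isSinkCell(h, x, y) {
  const n = h.length;
  for (const [dx, dy] of [[-1, 0], [1, 0], [0, -1], [0, 1]]) {
    const nx = x + dx, ny = y + dy;
    if (nx >= 0 && nx < n && ny >= 0 && ny < n && h[nx][ny] < h[x][y]) {
      return false; // a strictly lower neighbor exists
    }
  }
  return true;
}

// One worker's share: rows [rowStart, rowEnd). Writes only to its private list.
function findSinksInRows(h, rowStart, rowEnd) {
  const local = [];
  for (let x = rowStart; x < rowEnd; x++) {
    for (let y = 0; y < h.length; y++) {
      if (isSinkCell(h, x, y)) local.push([x, y]);
    }
  }
  return local;
}

// Split the rows into contiguous blocks, scan each block independently, then
// concatenate the private lists in block order (same order as a sequential scan).
function findSinksPartitioned(h, numWorkers) {
  const rowsPerWorker = Math.ceil(h.length / numWorkers);
  const partial = [];
  for (let w = 0; w < numWorkers; w++) {
    const start = w * rowsPerWorker;
    const end = Math.min(h.length, start + rowsPerWorker);
    partial.push(findSinksInRows(h, start, end));
  }
  return [].concat(...partial);
}

const example = [
  [1, 0, 2, 5, 8],
  [2, 3, 4, 7, 9],
  [3, 5, 7, 8, 9],
  [1, 2, 5, 4, 2],
  [3, 3, 5, 2, 1],
];
console.log(JSON.stringify(findSinksPartitioned(example, 2))); // prints "[[0,1],[3,0],[4,4]]"
```

The only coordination left is the final concatenation, which happens after all workers are done, so no lock or lock-free stack is needed for this step.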
If you want to delve deeper into parallelizing MSTs, see "Scalable Parallel Minimum Spanning Forest Computation," Nobari, Cao, Karras, Bressan, 2012. The first two pages give a clear and concise overview of the field.
Simplified example
A group of farmers has some elevation data, and we're going to help them understand how rainfall flows over their farmland. We'll represent the land as a two-dimensional array of altitudes and use the following model, based on the idea that water flows downhill:
If a cell's four neighboring cells all have higher altitudes, we call this cell a sink; water collects in sinks. Otherwise, water flows to the neighboring cell with the lowest altitude. If a cell is not a sink, you may assume it has a unique lowest neighbor and that this neighbor is lower than the cell.
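The flow rule of this model fits in one small function. Below is a minimal sketch, assuming the map is a square array of numbers. `flowTarget` is a hypothetical name. Like isSink in the program below, it treats a cell as a sink when no neighbor is strictly lower. If two neighbors tie for lowest (the problem statement assumes this does not happen), the first one checked wins.

```javascript
// Returns [nx, ny] of the 4-neighbor that water flows to, or null if (x, y) is a sink.
function flowTarget(h, x, y) {
  const n = h.length;
  let best = null;
  let bestHeight = h[x][y]; // a neighbor must be strictly lower to receive the water
  for (const [dx, dy] of [[-1, 0], [1, 0], [0, -1], [0, 1]]) {
    const nx = x + dx, ny = y + dy;
    if (nx < 0 || nx >= n || ny < 0 || ny >= n) continue; // off the map
    if (h[nx][ny] < bestHeight) {
      bestHeight = h[nx][ny];
      best = [nx, ny];
    }
  }
  return best;
}

const example = [
  [1, 0, 2, 5, 8],
  [2, 3, 4, 7, 9],
  [3, 5, 7, 8, 9],
  [1, 2, 5, 4, 2],
  [3, 3, 5, 2, 1],
];
console.log(flowTarget(example, 0, 1));                 // prints "null" (a sink)
console.log(JSON.stringify(flowTarget(example, 0, 0))); // prints "[0,1]"
```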
Cells that drain into the same sink, directly or indirectly, are said to be part of the same basin.
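One way to compute basins from this definition is to flood uphill from each sink along the reversed flow pointers. The program below does the same in exploreBasin. The following is a minimal sketch, assuming a square array of numbers. It uses an explicit stack instead of recursion, since a deeply recursive flood can exceed the call stack on large maps such as S = 5000. `flowTarget` and `basinSizes` are hypothetical names, and flowTarget is repeated so that the block runs on its own.

```javascript
// Returns [nx, ny] of the 4-neighbor that water flows to, or null for a sink.
function flowTarget(h, x, y) {
  const n = h.length;
  let best = null;
  let bestHeight = h[x][y];
  for (const [dx, dy] of [[-1, 0], [1, 0], [0, -1], [0, 1]]) {
    const nx = x + dx, ny = y + dy;
    if (nx < 0 || nx >= n || ny < 0 || ny >= n) continue;
    if (h[nx][ny] < bestHeight) {
      bestHeight = h[nx][ny];
      best = [nx, ny];
    }
  }
  return best;
}

// Sizes of all basins, one entry per sink, in row-major order of the sinks.
// Each sink's flood touches only cells draining into that sink, so different
// basins never write to the same cell and could be flooded in parallel.
function basinSizes(h) {
  const n = h.length;
  const target = h.map((row, x) => row.map((_, y) => flowTarget(h, x, y))); // read-only afterwards
  const sizes = [];
  for (let x = 0; x < n; x++) {
    for (let y = 0; y < n; y++) {
      if (target[x][y] !== null) continue; // only sinks start a flood
      let size = 0;
      const stack = [[x, y]];
      while (stack.length > 0) {
        const [cx, cy] = stack.pop();
        size++;
        // Push every neighbor whose water flows into (cx, cy).
        for (const [dx, dy] of [[-1, 0], [1, 0], [0, -1], [0, 1]]) {
          const nx = cx + dx, ny = cy + dy;
          if (nx < 0 || nx >= n || ny < 0 || ny >= n) continue;
          const t = target[nx][ny];
          if (t !== null && t[0] === cx && t[1] === cy) stack.push([nx, ny]);
        }
      }
      sizes.push(size);
    }
  }
  return sizes;
}

const example = [
  [1, 0, 2, 5, 8],
  [2, 3, 4, 7, 9],
  [3, 5, 7, 8, 9],
  [1, 2, 5, 4, 2],
  [3, 3, 5, 2, 1],
];
console.log(JSON.stringify(basinSizes(example))); // prints "[11,7,7]"
```

No visited set is needed. Every non-sink cell flows to exactly one neighbor, so each cell is reached from exactly one parent and is pushed exactly once.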
Your task is to partition the map into basins. In particular, given an elevation map, your code should partition the map into basins and output the sizes of the basins, in descending order.
Assume the elevation maps are square. Input begins with a line containing a single integer, S, the height (and width) of the map. Each of the next S lines contains one row of the map: S integers, the elevations of the S cells in that row. Some farmers have small plots, such as the example below, while others have larger plots. However, in no case will a farmer have a plot larger than S = 5000.
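Below is a minimal parsing sketch for this input format. It assumes the whole input is already available as one string. `parseMap` is a hypothetical name. Converting tokens with Number matters, because comparing raw strings is lexicographic.

```javascript
// Parse "S, then S rows of S integers" into a 2-D array of numbers.
// Tokenizing on any whitespace ignores the line layout and "\r\n" endings;
// the rows are rebuilt from S.
function parseMap(text) {
  const tokens = text.trim().split(/\s+/).map(Number);
  if (tokens.some((t) => !Number.isInteger(t))) {
    throw new Error("all entries must be integers");
  }
  const s = tokens[0];
  if (s < 1 || s > 5000) throw new Error("S must be between 1 and 5000, got " + s);
  if (tokens.length !== 1 + s * s) {
    throw new Error("expected " + s * s + " heights, got " + (tokens.length - 1));
  }
  const rows = [];
  for (let r = 0; r < s; r++) {
    rows.push(tokens.slice(1 + r * s, 1 + (r + 1) * s));
  }
  return rows;
}

const input = "5\n1 0 2 5 8\n2 3 4 7 9\n3 5 7 8 9\n1 2 5 4 2\n3 3 5 2 1\n";
const map = parseMap(input);
console.log(map.length, map[4].join(" ")); // prints "5 3 3 5 2 1"
```

For S = 5000 this builds an array of 25 million numbers in one go. A line-by-line reader would use less memory, but the sketch keeps things simple.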
Your code should output a space-separated list of the basin sizes, in descending order. (Trailing spaces are ignored.)
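Producing the output line is mostly a matter of sorting numerically. JavaScript's Array.prototype.sort compares elements as strings unless it is given a comparator. The following is a minimal sketch with a hypothetical helper `formatSizes`.

```javascript
// Sort basin sizes numerically, largest first, and join them with spaces.
// Without the comparator, [9, 10, 2].sort() yields [10, 2, 9] (string order).
function formatSizes(sizes) {
  return sizes.slice().sort((a, b) => b - a).join(" ");
}

console.log(formatSizes([7, 11, 7])); // prints "11 7 7"
```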
Here is an example:
Input:
5
1 0 2 5 8
2 3 4 7 9
3 5 7 8 9
1 2 5 4 2
3 3 5 2 1

Output:
11 7 7

The basins, labeled with A's, B's, and C's, are:
AAAAA
AAAAA
BBACC
BBBCC
BBCCC
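The labeled map above can also be produced by traversing in the opposite direction: every cell follows its flow pointers downhill until it reaches a labeled cell. The following is a sketch under stated assumptions. Sinks are lettered 'A', 'B', ... in row-major order, a rule chosen here because it happens to match the example's lettering. `flowTarget` and `labelBasins` are hypothetical names. Each cell's walk depends only on the static height map, so in principle cells could be processed independently. Here, the shared label array doubles as a memo so later walks stop early.

```javascript
// Returns [nx, ny] of the 4-neighbor that water flows to, or null for a sink.
function flowTarget(h, x, y) {
  const n = h.length;
  let best = null;
  let bestHeight = h[x][y];
  for (const [dx, dy] of [[-1, 0], [1, 0], [0, -1], [0, 1]]) {
    const nx = x + dx, ny = y + dy;
    if (nx < 0 || nx >= n || ny < 0 || ny >= n) continue;
    if (h[nx][ny] < bestHeight) {
      bestHeight = h[nx][ny];
      best = [nx, ny];
    }
  }
  return best;
}

// Returns one string per row, each cell replaced by its basin letter.
function labelBasins(h) {
  const n = h.length;
  const label = h.map((row) => row.map(() => null));
  // Letter the sinks in row-major order (assumed rule; fine for up to 26 sinks).
  let next = "A".charCodeAt(0);
  for (let x = 0; x < n; x++) {
    for (let y = 0; y < n; y++) {
      if (flowTarget(h, x, y) === null) label[x][y] = String.fromCharCode(next++);
    }
  }
  // Walk downhill from every cell until a labeled cell is reached,
  // then copy that label back along the path that was walked.
  for (let x = 0; x < n; x++) {
    for (let y = 0; y < n; y++) {
      const path = [];
      let cx = x, cy = y;
      while (label[cx][cy] === null) {
        path.push([cx, cy]);
        [cx, cy] = flowTarget(h, cx, cy); // non-null: unlabeled cells are never sinks
      }
      for (const [px, py] of path) label[px][py] = label[cx][cy];
    }
  }
  return label.map((row) => row.join(""));
}

const example = [
  [1, 0, 2, 5, 8],
  [2, 3, 4, 7, 9],
  [3, 5, 7, 8, 9],
  [1, 2, 5, 4, 2],
  [3, 3, 5, 2, 1],
];
console.log(labelBasins(example).join("\n"));
// prints:
// AAAAA
// AAAAA
// BBACC
// BBBCC
// BBCCC
```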
// lm.js - find the local minima
// Runs under Nashorn in scripting mode (jjs -scripting), which provides $EXEC, $ARG and print.

// Global variables.
/*
  The map is a 2-dimensional array. Indices for the elements map as:

    [0,0] ... [0,n]
     ...       ...
    [n,0] ... [n,n]

  Each element of the array is a structure. The structure for each element is:

  Item  Purpose             Range                    Comment
  ----  -------             -----                    -------
  h     Height of cell      integers
  s     Is it a sink?       boolean
  x     X of downhill cell  (0..maxIndex)            if s is true, x&y point to self
  y     Y of downhill cell  (0..maxIndex)
  b     Basin name          ('A'..'A'+# of basins)

  Use a separate array-of-arrays for each structure item.
  The index range is 0..maxIndex.
*/
var height = [];
var sink = [];
var downhillX = [];
var downhillY = [];
var basin = [];
var maxIndex;

// A list of sinks in the map. Each element is an array of [ x, y ], where
// both x & y are in the range 0..maxIndex.
var basinList = [];

// An unordered list of basin sizes.
var basinSize = [];

// Functions.

function isSink(x,y) {
  var myHeight = height[x][y];
  var imaSink = true;
  var bestDownhillHeight = myHeight;
  var bestDownhillX = x;
  var bestDownhillY = y;

  /*
    Visit the neighbors. If this cell is the lowest, then it is the sink.
    If not, find the steepest downhill direction.
  */
  function visit(deltaX,deltaY) {
    var neighborX = x+deltaX;
    var neighborY = y+deltaY;
    if (myHeight > height[neighborX][neighborY]) {
      imaSink = false;
      if (bestDownhillHeight > height[neighborX][neighborY]) {
        bestDownhillHeight = height[neighborX][neighborY];
        bestDownhillX = neighborX;
        bestDownhillY = neighborY;
      }
    }
  }
  if (x !== 0) {        // upwards neighbor exists
    visit(-1,0);
  }
  if (x !== maxIndex) { // downwards neighbor exists
    visit(1,0);
  }
  if (y !== 0) {        // left-hand neighbor exists
    visit(0,-1);
  }
  if (y !== maxIndex) { // right-hand neighbor exists
    visit(0,1);
  }

  downhillX[x][y] = bestDownhillX;
  downhillY[x][y] = bestDownhillY;
  return imaSink;
}

// Note: recursion depth grows with the length of the flow paths in a basin.
function exploreBasin(x,y,currentSize,basinName) {
  // This cell is in the basin.
  basin[x][y] = basinName;
  currentSize++;

  /*
    Visit all neighbors that have this cell as the best downhill
    path and add them to the basin.
  */
  function visit(x,deltaX,y,deltaY) {
    if ((downhillX[x+deltaX][y+deltaY] === x) && (downhillY[x+deltaX][y+deltaY] === y)) {
      currentSize = exploreBasin(x+deltaX,y+deltaY,currentSize,basinName);
    }
    return 0;
  }
  if (x !== 0) {        // upwards neighbor exists
    visit(x,-1,y,0);
  }
  if (x !== maxIndex) { // downwards neighbor exists
    visit(x,1,y,0);
  }
  if (y !== 0) {        // left-hand neighbor exists
    visit(x,0,y,-1);
  }
  if (y !== maxIndex) { // right-hand neighbor exists
    visit(x,0,y,1);
  }
  return currentSize;
}

// Read map from file (1st argument).
var lines = $EXEC('cat "' + $ARG[0] + '"').split('\n');
maxIndex = lines.shift() - 1;
for (var i = 0; i<=maxIndex; i++) {
  // Convert to numbers: comparing the raw strings would be lexicographic ("10" < "9").
  height[i] = lines.shift().trim().split(/\s+/).map(Number);
  // Create all other 2D arrays.
  sink[i] = [];
  downhillX[i] = [];
  downhillY[i] = [];
  basin[i] = [];
}
for (var i = 0; i<=maxIndex; i++) { print(height[i]); }

// Everyone decides if they are a sink. Create list of sinks (ie roots).
for (var x=0; x<=maxIndex; x++) {
  for (var y=0; y<=maxIndex; y++) {
    if (sink[x][y] = isSink(x,y)) {
      // This node is a root (AKA sink).
      basinList.push([x,y]);
    }
  }
}
//for (var i = 0; i<=maxIndex; i++) { print(sink[i]); }

// Each root explores its basin.
var basinName = 'A';
for (var i=basinList.length-1; i>=0; --i) { // i-- makes Closure Compiler sad
  var x = basinList[i][0];
  var y = basinList[i][1];
  basinSize.push(exploreBasin(x,y,0,basinName));
  basinName = String.fromCharCode(basinName.charCodeAt() + 1);
}
for (var i = 0; i<=maxIndex; i++) { print(basin[i]); }

// Done. Sort numerically, largest first.
print(basinSize.sort(function(a, b){return b - a;}).join(' '));