Creating Sets of Similar Items in a 2D Array

I am trying to solve a problem based on a 2D array. This array contains various elements (out of a total of 3 possible types). Suppose the form is X, Y, Z.

The array seems to be something like this. Please note that it will always be completely filled. The diagram is intended to illustrate.

7 | | | | | | | 6 | | | | | | | 5 | | | | | | | 4 | |X|Z|Y|X| | 3 | |Y|X|Y|Y|X| 2 |Y|Y|X|Z|Z|X| 1 |X|X|Y| |X|X| 0 | | | |Z| | | 0 1 2 3 4 5 

I am trying to create many elements that are located next to each other. For example, set1 may contain elements of type X located in: (0,1), (1,1), (2,2), (2,3), (1,4). Similarly, set2 may contain elements of type Y located at: (3,4), (3,3), 4,3).

Problem: For any point in the array, it must be able to add all elements to the corresponding set and ensure that there are no two sets containing the same element. Please note that a set is only created if more than two adjacent elements of the same type are found.

In addition, if a specific subset of items is deleted, more items are added to replace deleted ones. Then the array must be re-repeated to create new sets or modify existing ones.

Solution: I implemented a recursive solution so that it iterates over all adjacent elements, for example, the element X (0,1). Then, repeating 8 possible adjacent elements, it will call itself recursively whenever type X occurs.

This solution is too much brute force and inefficient, especially in cases where some elements are replaced with new, possibly different types. In this case, almost the entire array must be repeated to create / modify sets and ensure that the same element does not exist in more than one set.

Is there any algorithm to effectively solve this problem? I need help with some ideas / suggestions or pseudo codes.

+8
arrays set algorithm multidimensional-array
source share
4 answers

[EDIT 5/8/2013: Complex time complexity. (O (a (n)) is essentially a constant time!)]

Hereinafter referred to as a “connected component”, I mean the set of all positions that are reachable from each other along a path that allows only horizontal, vertical or diagonal movements between adjacent positions that have the same element. For example. your example {(0,1), (1,1), (2,2), (2,3), (1,4)} is the connected component in your input example. Each item belongs to exactly one connected component.

We will build a union / find structure that will be used to give each position (x, y) a numeric label that has the property, which if and only if any two positions (x, y) and (x ', y') belong to single component, then they have the same label. In particular, this data structure supports three operations:

  • set(x, y, i) set the label for position (x, y) to i.
  • find(x, y) will return the label assigned to position (x, y).
  • union(Z) for some set of labels Z combines all the labels in Z into one label k in the sense that future calls to find(x, y) at any position (x, y) that previously had a label in Z will now return k. (In the general case, k will be one of the shortcuts already in Z, although this is actually not important.) union(Z) also return a new label "master", k.

If in total there is n = width * height, this can be done in O (n * a (n)) time, where a () is an extremely slowly growing inverse Ackerman function. For all practical input sizes, this is the same as O (n).

Note that when two vertices are adjacent to each other, four cases are possible:

  • One above the other (connected by a vertical edge)
  • One is to the left of the other (connected by a horizontal edge)
  • One above and to the left of the other (connected with a diagonal edge \ )
  • One above and to the right of the other (connected by a diagonal edge / )

We can use the following pass to define labels for each position (x, y):

  • Set nextLabel to 0.
  • For each row y in ascending order:
    • For each column x in ascending order:
      • Examine the neighbors W, NW, N, and NE (x, y). Let Z be a subset of these 4 neighbors that have the same form as (x, y).
      • If Z is an empty set, then we will assume that (x, y) launches a completely new component, so call set (x, y, nextLabel) and increment nextLabel.
      • Otherwise, call find (Z [i]) on each Z element to find their labels, and call union () on this set of labels to combine them. Assign a new label (the result of this union () call) to k, and then also call set (x, y, k) to add (x, y) to this component.

After that, calling find(x, y) at any position (x, y) effectively tells you which component it belongs to. If you want to be able to quickly respond to requests of the form "Which positions belong to the related component containing the position (x, y)?" then create a hash table from the posInComp lists and make a second pass through the input array, adding each (x, y) to the posInComp[find(x, y)] list. This can be done in linear time and space. Now, to answer the query for a specific position (x, y), simply call lab = find(x, y) to find the position label, and then list the positions in posInComp[lab] .

To deal with "too small" components, just look at the size of posInComp[lab] . If it is 1 or 2, then (x, y) does not belong to any “sufficiently large” component.

Finally, all this work effectively takes linear time, so it will be lightning fast if your input array is not huge. Therefore, it is wise to re-compromise it from scratch after changing the input array.

+7
source share

In your situation, I would rely on at least two different arrays:

 Array1 (sets) -> all the sets and the associated list of points. Main indices: set names. Array2 (setsDef) -> type of each set ("X", "Y" or "Z"). Main indices: type names. 

Perhaps it is possible to create more helper arrays, for example, the minimum / maximum X / Y values ​​for each set, to speed up the analysis (although in any case it would be pretty fast, as shown below).

You do not mention any programming language, but I am including sample code (C #) because this is the best way to explain this point. Please do not understand this as a suggestion for a better way to continue (I personally don't like the Dictionaries / Lists too much, although I think this is a good graphical way to show the algorithm, even for unused C # users). This code only intends to show a data storage / retrieval approach; the best way to achieve optimal performance will depend on the target language and further problems (such as the size of the dataset), and this is what you should take care of.

 Dictionary<string, List<Point>> sets = new Dictionary<string, List<Point>>(); //All sets and the associated list of points Dictionary<string, List<string>> setsDef = new Dictionary<string, List<string>>(); //Array indicating the type of information stored in each set (X or Y) List<Point> temp0 = new List<Point>(); temp0.Add(new Point(0, 0)); temp0.Add(new Point(0, 1)); sets.Add("Set1", temp0); List<String> tempX = new List<string>(); tempX.Add("Set1"); temp0 = new List<Point>(); temp0.Add(new Point(0, 2)); temp0.Add(new Point(1, 2)); sets.Add("Set2", temp0); List<String> tempY = new List<string>(); tempY.Add("Set2"); setsDef.Add("X", tempX); setsDef.Add("Y", tempY); //-------- TEST //I have a new Y value which is 2,2 Point targetPoint = new Point(2, 2); string targetSet = "Y"; //I go through all the Y sets List<string> targetSets = setsDef[targetSet]; bool alreadyThere = false; Point candidatePoint; string foundSet = ""; foreach (string set in targetSets) //Going through all the set names stored in setsDef for targetSet { List<Point> curPoints = sets[set]; foreach (Point point in curPoints) //Going through all the points in the given set { if (point == targetPoint) { //Already-stored point and thus the analysis will be stopped alreadyThere = true; break; } else if (isSurroundingPoint(point, targetPoint)) { //A close point was found and thus the set where the targetPoint has to be stored candidatePoint = point; foundSet = set; break; } } if (alreadyThere || foundSet != "") { break; } } if (!alreadyThere) { if (foundSet != "") { //Point added to an existing set List<Point> curPoints = sets[foundSet]; curPoints.Add(targetPoint); sets[foundSet] = curPoints; } else { //A new set has to be created string newName = "New Set"; temp0 = new List<Point>(); temp0.Add(targetPoint); sets.Add(newName, temp0); targetSets.Add(newName); setsDef[targetSet] = targetSets; } } 

Where isSurroundingPoint is a function that checks how close both points are to each other:

 private bool isSurroundingPoint(Point point1, Point point2) { bool isSurrounding = false; if (point1.X == point2.X || point1.X == point2.X + 1 || point1.X == point2.X - 1) { if (point1.Y == point2.Y || point1.Y == point2.Y + 1 || point1.Y == point2.Y - 1) { isSurrounding = true; } } return isSurrounding; } 
+1
source share

You might want to check out the growing algorithms in the region that are used for image segmentation. These algorithms start with a seed pixel and grow in an adjacent region, where all the pixels in the region have some property.

In your case, neighboring "pixels" are in the same image segment if they have the same label (i.e., the appearance of an X, Y or Z element)

+1
source share

I wrote something to find objects of one type for another SO question. The following example adds two more types. Any repeated iteration will review the entire list again. The idea is to process a list of points for each type separately. The solve function groups any connected points and removes them from the list before listing the next group. areConnected checks the relationship between the coordinates of points, since we only check points of the same type. In this generalized version, types ( abc ) can be any (strings, numbers, tuples, etc.) if they match.

btw - here is a link to javascript example j_random_hacker awesome algorithm: http://jsfiddle.net/groovy/fP5kP/

Haskell Code:

 import Data.List (elemIndices, delete) example = ["xxyyyz" ,"xyyzzz" ,"yxxzzy" ,"yyxzxy" ,"xyzxyy" ,"xzxxzz" ,"xyzyyz" ,"xyzxyy"] objects abc ws = [("X",solve xs []),("Y",solve ys []),("Z",solve zs [])] where mapIndexes s = concatMap (\(y,xs)-> map (\x->(y,x)) xs) $ zip [0..] (map (elemIndices s) ws) [xs,ys,zs] = map mapIndexes [a,b,c] areConnected (y,x) (y',x') = abs (x-x') < 2 && abs (y-y') < 2 solve [] r = r solve (x:xs) r = let r' = solve' xs [x] in solve (foldr delete xs r') (if null (drop 2 r') then r else r':r) solve' vs r = let ys = filter (\y -> any (areConnected y) r) vs in if null ys then r else solve' (foldr delete vs ys) (ys ++ r) 

Output Example:

 *Main> objects 'x' 'y' 'z' example [("X",[[(7,0),(6,0),(5,0),(4,0)] ,[(3,4),(5,2),(5,3),(4,3),(2,2),(3,2),(2,1),(0,1),(1,0),(0,0)]]) ,("Y",[[(7,5),(6,4),(7,4),(6,3)],[(4,4),(4,5),(3,5),(2,5)] ,[(4,1),(3,0),(3,1),(0,4),(2,0),(0,3),(1,1),(1,2),(0,2)]]) ,("Z",[[(5,5),(6,5),(5,4)] ,[(7,2),(6,2),(5,1),(4,2),(3,3),(1,3),(2,3),(2,4),(1,4),(1,5),(0,5)]])] (0.02 secs, 1560072 bytes) 
0
source share

All Articles