Comparing two mapping vectors

Question


I have two ways to get a bunch of data. Either way, the data is stored in a sorted vector<map<string, int> >.

I want to determine if there are inconsistencies between these two vectors.

What I'm doing now (pseudo-code):

    for i in 0 ... min(length(vector1), length(vector2)):
        for (k, v) in vector1[i]:
            if v != vector2[i][k]:
                // report that k is bad for index i,
                // with vector1 having v, vector2 having vector2[i][k]

    for i in 0 ... min(length(vector1), length(vector2)):
        for (k, v) in vector2[i]:
            if v != vector1[i][k]:
                // report that k is bad for index i,
                // with vector2 having v, vector1 having vector1[i][k]

This works in general, but breaks horribly if vector1 has a, b, c, d and vector2 has a, b, b1, c, d (it reports a fault for b1, c and d). I am after an algorithm that tells me that there is an extra entry in vector2 compared to vector1.

I think what I want is this: when I come across a mismatching record, I look ahead through the following records in the second vector. If a match is found before the end of the second vector, I remember the index i at which it was found, and carry on matching the next entry of the first vector starting from vector2[i+1].

Is there an easier way to do this? Some kind of standard algorithm that I have not come across?

I work in C++, so C++ solutions are welcome, but solutions in any language or pseudo-code would also be great.

Example

For arbitrary map objects a, b, c, d, e, f and g:

With vector1: a, b, d, e, f

and vector2: a, c, e, f

I need an algorithm that tells me:

Extra b at index 1 of vector1, and vector2's c != vector1's d.

or (I would see this as an effectively equivalent result)

vector1's b != vector2's c, and extra d at index 2 of vector1.

Edit

I ended up using std::set_difference and then did some matching on the differences between both sets to determine which entries were similar but different, and which had entries completely missing from another vector.
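For readers wondering what that might look like, here is a minimal sketch rather than the code I actually used. The `Record` alias and the `only_in` helper are names made up for this illustration. It relies on both vectors being sorted with std::map's built-in lexicographical `operator<`, which is what `std::set_difference` uses by default.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <map>
#include <string>
#include <vector>

using Record = std::map<std::string, int>;  // hypothetical alias for this sketch

// Returns the records that appear in `a` but not in `b`.
// Assumption: both inputs are sorted with std::map's operator<.
std::vector<Record> only_in(const std::vector<Record>& a,
                            const std::vector<Record>& b) {
    assert(std::is_sorted(a.begin(), a.end()));
    assert(std::is_sorted(b.begin(), b.end()));
    std::vector<Record> out;
    std::set_difference(a.begin(), a.end(), b.begin(), b.end(),
                        std::back_inserter(out));
    return out;
}
```

Calling `only_in(vector1, vector2)` and `only_in(vector2, vector1)` gives the two lists of leftovers; pairing up the "similar but different" entries between those lists is a separate step that this sketch does not attempt.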

+6

c++ diff

Dominic Rodger Aug 12 '09 at 11:53


5 answers

It looks like you are looking for a diff algorithm. The idea is to identify the longest common subsequence of two vectors (using map equality), and then skip non-common parts recursively. In the end, you will have an alternating list of vector subsequences that are identical, and subsequences that do not have common elements. Then you can easily create any output you like.

Apply it to two vectors and there you go.

Please note that comparing maps is expensive. If you can hash the maps (use a strong hash, since collisions will lead to incorrect output) and compare the hashes instead, you will save a lot of time.
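As a rough sketch of the hashing idea, assuming records of type std::map<std::string, int>: the `Record` alias, `fnv1a` and `hash_record` are names invented for this example. FNV-1a is used only to keep the code short. It is not a strong hash, so in practice you would either swap in a stronger one or confirm every hash match with a real `==` comparison.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

using Record = std::map<std::string, int>;  // hypothetical alias for this sketch

// Mixes `len` bytes into a running 64-bit FNV-1a hash.
std::uint64_t fnv1a(std::uint64_t h, const void* data, std::size_t len) {
    const unsigned char* p = static_cast<const unsigned char*>(data);
    for (std::size_t i = 0; i < len; ++i) {
        h ^= p[i];
        h *= 1099511628211ULL;  // FNV prime
    }
    return h;
}

// Hashes every key and value in the map's (sorted) iteration order.
std::uint64_t hash_record(const Record& r) {
    std::uint64_t h = 14695981039346656037ULL;  // FNV offset basis
    for (const auto& kv : r) {
        // Include the terminating NUL so key boundaries stay unambiguous.
        h = fnv1a(h, kv.first.c_str(), kv.first.size() + 1);
        h = fnv1a(h, &kv.second, sizeof kv.second);
    }
    return h;
}
```

Because std::map iterates in key order, two equal maps always produce the same hash, so the hashes can be computed once per record and then compared in place of the maps.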

Once you get down to the inconsistent subsequences at the end, you will have something like:

    Input vectors: a b c d e f, a b c' d e f
    Output:
        COMMON a b
        LEFT c
        RIGHT c'
        COMMON d e f

Then you can individually compare maps c and c' to find out how they differ.

If you have a mutation and an insert next to each other, it becomes more complex:

    Input vectors: a b V W d e f, a b X Y d e f
    Output:
        COMMON a b
        LEFT V W
        RIGHT X Y
        COMMON d e f

Deciding whether to match V and W with X or Y (or with nothing at all) is something you will need to come up with heuristics for.

Of course, if you don't care how the contents of the maps differ, you can stop here and you have the result you need.
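To make the idea concrete, here is a minimal sketch of an LCS-based diff using the textbook dynamic-programming table, not any particular diff library. `Record`, `Side`, `DiffEntry` and `diff` are names invented for this example, and the table needs O(n * m) memory, so it only suits vectors of moderate size.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

using Record = std::map<std::string, int>;  // hypothetical alias for this sketch

enum class Side { Common, Left, Right };

struct DiffEntry {
    Side side;
    std::size_t index;  // index into a for Common/Left, into b for Right
};

std::vector<DiffEntry> diff(const std::vector<Record>& a,
                            const std::vector<Record>& b) {
    const std::size_t n = a.size(), m = b.size();
    // lcs[i][j] = length of the longest common subsequence of a[i..] and b[j..]
    std::vector<std::vector<std::size_t>> lcs(
        n + 1, std::vector<std::size_t>(m + 1, 0));
    for (std::size_t i = n; i-- > 0;)
        for (std::size_t j = m; j-- > 0;)
            lcs[i][j] = (a[i] == b[j])
                            ? lcs[i + 1][j + 1] + 1
                            : std::max(lcs[i + 1][j], lcs[i][j + 1]);

    // Walk the table from the front, emitting one entry per step.
    std::vector<DiffEntry> out;
    std::size_t i = 0, j = 0;
    while (i < n && j < m) {
        if (a[i] == b[j]) {
            out.push_back({Side::Common, i});
            ++i; ++j;
        } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
            out.push_back({Side::Left, i});   // a[i] has no partner in b
            ++i;
        } else {
            out.push_back({Side::Right, j});  // b[j] has no partner in a
            ++j;
        }
    }
    for (; i < n; ++i) out.push_back({Side::Left, i});
    for (; j < m; ++j) out.push_back({Side::Right, j});

    // Sanity check: common elements are reported once, all others once each.
    assert(out.size() == n + m - lcs[0][0]);
    return out;
}
```

Consecutive Left and Right entries between two Common runs form the inconsistent subsequences described above; deciding which Left record pairs with which Right record is still left to your own heuristics.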

+1

bdonlan Aug 12 '09 at 15:02


What exactly are you trying to achieve? Could you state precisely what output you expect for a given input? Your pseudo-code compares the maps index by index. If that is not the correct semantics, then what is?

0

Ari Aug 12 '09 at 12:16


Could you attach some kind of checksum (or Bloom filter) to each map, so that a single check tells you whether a full comparison is worthwhile?

0

Dewfy Aug 12 '09 at 12:23


In your example, note that it is impossible to distinguish between

Extra b at index 1 of vector1, and vector2's c != vector1's d.

and

Extra b at index 1 of vector1, extra d at index 2 of vector1, and extra c at index 1 of vector2.

because it is not clear that c should be compared with d; it could just as well be compared with b. I am assuming that the vectors are not sorted in a way that lines up corresponding records (std::map does provide a lexicographical operator<, but a record that differs in one value may sort to a different position). The fact that the elements are maps is, as far as I can tell, irrelevant to the problem ;-) So your example is slightly misleading. It could even be

Compare bfead

with acfe

You can check every element of the first vector for every element of the second vector. This has a quadratic run time.

    for i in 0 ... length(vector1):
        foundmatch = false;
        for j in 0 ... length(vector2):
            mismatch = false;
            for (k, v) in vector1[i]:
                if v != vector2[j][k]:
                    mismatch = true;
                    break; // no need to compare against the remaining keys
            if (!mismatch)
                // found matching element j in vector2 for element i in vector1
                foundmatch = true;
                break; // no need to compare against the remaining elements in vector2
        if (foundmatch)
            continue;
        else
            // report that vector1[i] has no matching element in vector2[]
            // "extra b at i"

If you want to find the missing elements, just swap vector1 and vector2.

If you want to detect an element of vector2 that differs from an element of vector1 in only one key, you need to add extra code around the "no need to compare against the remaining keys" break.
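A possible C++ rendering of the pseudo-code above, as a sketch only: `Record`, `matches` and `unmatched` are names made up here. One deliberate difference is that it uses `find()` instead of `operator[]`, so a key missing from the second map counts as a mismatch instead of being silently inserted with value 0.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

using Record = std::map<std::string, int>;  // hypothetical alias for this sketch

// True if every (key, value) pair of `lhs` is also present in `rhs`.
bool matches(const Record& lhs, const Record& rhs) {
    return std::all_of(lhs.begin(), lhs.end(),
                       [&](const Record::value_type& kv) {
                           auto it = rhs.find(kv.first);
                           return it != rhs.end() && it->second == kv.second;
                       });
}

// Indices of the records in v1 that have no matching record anywhere in v2.
// Quadratic: up to v1.size() * v2.size() record comparisons.
std::vector<std::size_t> unmatched(const std::vector<Record>& v1,
                                   const std::vector<Record>& v2) {
    std::vector<std::size_t> extra;
    for (std::size_t i = 0; i < v1.size(); ++i) {
        const bool found =
            std::any_of(v2.begin(), v2.end(),
                        [&](const Record& r) { return matches(v1[i], r); });
        if (!found) extra.push_back(i);  // "extra element at i"
    }
    return extra;
}
```

`unmatched(vector1, vector2)` lists the extras in vector1, and `unmatched(vector2, vector1)` lists the ones missing from it.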

0

hirschhornsalz Aug 12 '09 at 14:55


Glen · Accepted Answer · 2009-08-12T11:58:29+0000

Something like the std::mismatch algorithm.

You can also use std::set_difference.
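For illustration, a minimal sketch of the std::mismatch suggestion (the `Record` alias and the `first_difference` helper are invented for this example). It only reports the first position at which the two vectors disagree, so on its own it does not tell an insertion apart from a changed record.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

using Record = std::map<std::string, int>;  // hypothetical alias for this sketch

// Index of the first position where a and b differ, or the length of the
// shorter vector if one is a prefix of the other.
std::size_t first_difference(const std::vector<Record>& a,
                             const std::vector<Record>& b) {
    const std::ptrdiff_t n =
        static_cast<std::ptrdiff_t>(std::min(a.size(), b.size()));
    // The three-iterator overload assumes the second range is at least as
    // long as the first, so only the common prefix length is scanned.
    auto p = std::mismatch(a.begin(), a.begin() + n, b.begin());
    return static_cast<std::size_t>(p.first - a.begin());
}
```

Since C++14 there is also a four-iterator overload of std::mismatch that takes both end iterators, which makes the manual length clamp unnecessary.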
