Finding the difference between strings in JavaScript

I would like to compare two strings (before and after) and pinpoint where and what has changed between them.

For any change I want to know:

  • The start position of the change (inclusive, counting from 0)
  • The end position of the change (inclusive, counting from 0), relative to the old text
  • The "change" itself (the new text)

Assume that the strings will only change in one place at a time (for example, never "Bill" -> "Kiln").

In addition, I need start and end positions to reflect the type of change:

  • For a deletion, the start and end positions should be the start and end positions of the deleted text, respectively
  • For a replacement, the start and end positions should be the start and end positions of the "deleted" text, respectively (the change will be the "added" text)
  • For an insertion, the start and end positions should be the same: the point where the text was inserted
  • If nothing changed, the start and end positions should both be zero, with an empty change

For example:

"0123456789" -> "03456789" Start: 1, End: 2, Change: "" (deletion) "03456789" -> "0123456789" Start: 1, End: 1, Change: "12" (insertion) "Hello World!" -> "Hello Aliens!" Start: 6, End: 10, Change: "Aliens" (replacement) "Hi" -> "Hi" Start: 0, End: 0, Change: "" (no change) 

I was able to determine the position of the changed text to some extent, but it does not work in all cases, because to do this accurately I need to know what kind of change was made.

    var OldText = "My edited string!";
    var NewText = "My first string!";
    var ChangeStart = 0;
    var NewChangeEnd = 0;
    var OldChangeEnd = 0;

    console.log("Comparing start:");
    for (var i = 0; i < NewText.length; i++) {
        console.log(i + ": " + NewText[i] + " -> " + OldText[i]);
        if (NewText[i] != OldText[i]) {
            ChangeStart = i;
            break;
        }
    }

    console.log("Comparing end:");
    // "Addition"?
    if (NewText.length > OldText.length) {
        for (var i = 1; i < NewText.length; i++) {
            console.log(i + "(N: " + (NewText.length - i) + " O: " + (OldText.length - i) + ": " + NewText.substring(NewText.length - i, NewText.length - i + 1) + " -> " + OldText.substring(OldText.length - i, OldText.length - i + 1));
            if (NewText.substring(NewText.length - i, NewText.length - i + 1) != OldText.substring(OldText.length - i, OldText.length - i + 1)) {
                NewChangeEnd = NewText.length - i;
                OldChangeEnd = OldText.length - i;
                break;
            }
        }
    // "Deletion"?
    } else if (NewText.length < OldText.length) {
        for (var i = 1; i < OldText.length; i++) {
            console.log(i + "(N: " + (NewText.length - i) + " O: " + (OldText.length - i) + ": " + NewText.substring(NewText.length - i, NewText.length - i + 1) + " -> " + OldText.substring(OldText.length - i, OldText.length - i + 1));
            if (NewText.substring(NewText.length - i, NewText.length - i + 1) != OldText.substring(OldText.length - i, OldText.length - i + 1)) {
                NewChangeEnd = NewText.length - i;
                OldChangeEnd = OldText.length - i;
                break;
            }
        }
    // Same length...
    } else {
        // Do something
    }

    console.log("Change start: " + ChangeStart);
    console.log("NChange end : " + NewChangeEnd);
    console.log("OChange end : " + OldChangeEnd);
    console.log("Change: " + OldText.substring(ChangeStart, OldChangeEnd + 1));

How do I know if insertions, deletions or replacements have occurred?


I searched and came up with several other similar questions, but they don't seem to help.

3 answers

I went through your code, and your logic for matching the strings makes sense to me. It records ChangeStart, NewChangeEnd and OldChangeEnd, and the algorithm flows fine. You just want to know whether an insertion, deletion or replacement has occurred. Here is how I would go about it.

First of all, you need to make sure that once you have found the first point of mismatch, i.e. ChangeStart, the index does not cross ChangeStart when you then traverse the strings from the end.

Let me give you an example. Consider the following strings:

    var NewText = "Hello Worllolds!";
    var OldText = "Hello Worlds!";

    ChangeStart -> 10 //Makes sense
    OldChangeEnd -> 8
    NewChangeEnd -> 11

    console.log("Change: " + NewText.substring(ChangeStart, NewChangeEnd + 1));
    //Outputs "lo"

In this case, the problem is when it starts to match from the back, the flow looks something like this:

    Comparing end:
    1(N: 15 O: 12: ! -> !)
    2(N: 14 O: 11: s -> s)
    3(N: 13 O: 10: d -> d) -> You need to stop here!

    //Although there is no mismatch here, we have reached ChangeStart and
    //we have already established that characters 0 -> ChangeStart-1 match.
    //That is why it outputs "lo" instead of "lol"

Assuming what I just said makes sense, you just need to change your for loops as follows:

    if (NewText.length > OldText.length) {
        for (var i = 1; i < NewText.length && ((OldText.length - i) >= ChangeStart); i++) {
            ...
            NewChangeEnd = NewText.length - i - 1;
            OldChangeEnd = OldText.length - i - 1;
            if (/* mismatch condition reached */) {
                // break; that code is fine as it is
            }
        }
    }

The condition (OldText.length-i)>=ChangeStart takes care of the anomaly I mentioned, so the for loop terminates automatically once it is reached. However, as I just demonstrated, this condition can be reached before any mismatch is encountered. That is why NewChangeEnd and OldChangeEnd are updated on every iteration to one less than the position that just matched. When a mismatch does occur, you store the values as before.

Instead of an else-if, the remaining two cases can simply be handled in one else branch, where we know that NewText.length > OldText.length is definitely not true, i.e. the change is either a replacement or a deletion. Likewise, NewText.length > OldText.length means the change can be either a replacement or an insertion, which is consistent with your examples. So the else branch might look something like this:

    else {
        for (var i = 1; i < OldText.length && ((OldText.length - i) >= ChangeStart); i++) {
            ...
            NewChangeEnd = NewText.length - i - 1;
            OldChangeEnd = OldText.length - i - 1;
            if (/* mismatch condition reached */) {
                // break; that code is fine as it is
            }
        }
    }
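As a rough sketch, the two bounded loops above could also be folded into a single backward scan. This is only one possible arrangement, not the code from the fiddle: the function name findChangeBounds and the lowercase property names are made up for illustration, and the scan stops as soon as either end index would move in front of the start index.

```javascript
// Hypothetical helper (not part of the original code): computes the three
// indices discussed above with a single bounded backward scan.
function findChangeBounds(oldText, newText) {
  // Forward scan: first index at which the two strings differ.
  var changeStart = 0;
  var minLength = Math.min(oldText.length, newText.length);
  while (changeStart < minLength && oldText[changeStart] === newText[changeStart]) {
    changeStart++;
  }

  // Backward scan: stop at the first mismatch, or as soon as either
  // index would move in front of changeStart.
  var oldChangeEnd = oldText.length - 1;
  var newChangeEnd = newText.length - 1;
  while (oldChangeEnd >= changeStart && newChangeEnd >= changeStart &&
         oldText[oldChangeEnd] === newText[newChangeEnd]) {
    oldChangeEnd--;
    newChangeEnd--;
  }

  return {
    changeStart: changeStart,
    oldChangeEnd: oldChangeEnd,
    newChangeEnd: newChangeEnd
  };
}

var bounds = findChangeBounds("Hello Worlds!", "Hello Worllolds!");
console.log(bounds);
// { changeStart: 10, oldChangeEnd: 9, newChangeEnd: 12 }
console.log("Hello Worllolds!".substring(bounds.changeStart, bounds.newChangeEnd + 1));
// "lol"
```

With the bound in place, the example strings yield "lol" rather than "lo", and oldChangeEnd ends up one position before changeStart because nothing was removed from the old string.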

If you have understood the minor changes so far, identifying specific cases is very simple:

  • Deletion - Condition → ChangeStart > NewChangeEnd. The deleted text spans ChangeStart -> OldChangeEnd in OldText.

Deleted text → OldText.substring(ChangeStart, OldChangeEnd + 1);

  • Insertion - Condition → ChangeStart > OldChangeEnd. The text was inserted at ChangeStart.

Inserted text → NewText.substring(ChangeStart, NewChangeEnd + 1);

  • Replacement - If NewText != OldText and neither of the two conditions above holds, it is a replacement.

The text in the old string that was replaced → OldText.substring(ChangeStart, OldChangeEnd + 1);

The replacement text → NewText.substring(ChangeStart, NewChangeEnd + 1);

Start and end positions in OldText that got replaced → ChangeStart -> OldChangeEnd
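The three cases above could be expressed roughly as follows. This is a minimal sketch rather than the code from the fiddle: classifyChange and the shape of its return value are assumptions chosen to match the Start / End / Change format in the question, and the indices are assumed to come from a bounded scan, so that an end index is never more than one position before ChangeStart.

```javascript
// Hypothetical helper (names and return shape are assumptions).
// Expects indices produced by a scan that never crosses changeStart.
function classifyChange(oldText, newText, changeStart, oldChangeEnd, newChangeEnd) {
  if (oldText === newText) {
    // No change: the question asks for zeros and an empty change.
    return { type: "none", start: 0, end: 0, change: "" };
  }
  var added = newText.substring(changeStart, newChangeEnd + 1);
  if (changeStart > newChangeEnd) {
    // Nothing new sits between the common prefix and suffix: deletion.
    return { type: "deletion", start: changeStart, end: oldChangeEnd, change: "" };
  }
  if (changeStart > oldChangeEnd) {
    // Nothing was removed from the old string: insertion at changeStart.
    return { type: "insertion", start: changeStart, end: changeStart, change: added };
  }
  // Text was both removed and added: replacement.
  return { type: "replacement", start: changeStart, end: oldChangeEnd, change: added };
}

// Indices worked out by hand for the examples in the question.
console.log(classifyChange("0123456789", "03456789", 1, 2, 0));
// { type: "deletion", start: 1, end: 2, change: "" }
console.log(classifyChange("03456789", "0123456789", 1, 0, 2));
// { type: "insertion", start: 1, end: 1, change: "12" }
console.log(classifyChange("Hello World!", "Hello Aliens!", 6, 10, 11));
// { type: "replacement", start: 6, end: 10, change: "Aliens" }
```

The equality check comes first because identical strings would otherwise satisfy both the deletion and the insertion condition.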

I created a jsfiddle that applies the changes I described to your code. You can check it out. I hope this gets you started in the right direction.


I had a similar problem and solved it as follows:

    function diff(oldText, newText) {

        // Find the index at which the change began
        var s = 0;
        while (s < oldText.length && s < newText.length && oldText[s] == newText[s]) {
            s++;
        }

        // Find the index at which the change ended (relative to the end of the string)
        var e = 0;
        while (e < oldText.length &&
               e < newText.length &&
               oldText.length - e > s &&
               newText.length - e > s &&
               oldText[oldText.length - 1 - e] == newText[newText.length - 1 - e]) {
            e++;
        }

        // The change end of the new string (ne) and old string (oe)
        var ne = newText.length - e;
        var oe = oldText.length - e;

        // The number of chars removed and added
        var removed = oe - s;
        var added = ne - s;

        var type;
        switch (true) {
            case removed == 0 && added > 0: // It's an 'add' if none were removed and at least one was added
                type = 'add';
                break;
            case removed > 0 && added == 0: // It's a 'remove' if none were added and at least one was removed
                type = 'remove';
                break;
            case removed > 0 && added > 0: // It's a 'replace' if characters were both added and removed
                type = 'replace';
                break;
            default:
                type = 'none'; // Otherwise there was no change
                s = 0;
        }

        return { type: type, start: s, removed: removed, added: added };
    }
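To get from this return value to the Start / End / Change format asked for in the question, a small adapter along these lines might do. It is only a sketch: toStartEndChange is a made-up name, and it assumes an object with the same start, removed and added fields that diff returns.

```javascript
// Hypothetical adapter: converts an object shaped like diff's return value
// ({ type, start, removed, added }) into the format from the question.
function toStartEndChange(result, newText) {
  // With removed characters, the end is the last removed index in the old text;
  // otherwise (insertion or no change) the end equals the start.
  var end = result.removed > 0 ? result.start + result.removed - 1 : result.start;
  // The change is whatever was added to the new text at the start position.
  var change = newText.slice(result.start, result.start + result.added);
  return { start: result.start, end: end, change: change };
}

// The literal below was worked out by hand for "Hello World!" -> "Hello Aliens!".
var replaceResult = { type: 'replace', start: 6, removed: 5, added: 6 };
console.log(toStartEndChange(replaceResult, "Hello Aliens!"));
// { start: 6, end: 10, change: "Aliens" }
```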

Please note that this did not solve my actual problem. I had an editor with paragraphs, each modeled as text plus a collection of markup ranges defined by a start and end index, e.g. bold from char 1 to char 5. I used this function to detect changes in the string so that I could shift the markup indices accordingly. But consider the string:

xx xxx

The diff function approach cannot distinguish between a character added outside or inside bold.
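One way this ambiguity can arise is sketched below. The example string above appears to have lost its bold markup, so the sketch uses an assumed string of identical characters instead. The helpers commonPrefixLength and insertAt are made up for illustration; the first one mirrors the forward scan that diff performs.

```javascript
// Hypothetical helper: length of the common prefix, like the first loop in diff.
function commonPrefixLength(a, b) {
  var n = 0;
  while (n < a.length && n < b.length && a[n] === b[n]) {
    n++;
  }
  return n;
}

// Hypothetical helper: inserts `text` into `s` at index `at`.
function insertAt(s, at, text) {
  return s.slice(0, at) + text + s.slice(at);
}

var before = "xxxxx";
// Three different edits: typing "x" at index 0, 2 and 5.
var results = [0, 2, 5].map(function (at) {
  var after = insertAt(before, at, "x");
  return { typedAt: at, after: after, reportedStart: commonPrefixLength(before, after) };
});
console.log(results);
// Every entry has after "xxxxxx" and reportedStart 5
```

All three edits produce the same new string, so a comparison of the two strings alone cannot tell where the character was really typed.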

In the end, I took a completely different approach - I just analyzed the HTML created by the editor and used it to determine the start and end markup indices.


I made a slightly improved version based on the same tactic as above (looking for differences from front to back and from back to front):

    function compareText(oldText, newText) {
        var difStart, difEndOld, difEndNew;

        //from left to right - look up the first index where characters are different
        for (let i = 0; i < oldText.length; i++) {
            if (oldText.charAt(i) !== newText.charAt(i)) {
                difStart = i;
                break;
            }
        }

        //from right to left - look up the first index where characters are different
        //first calc the last indices for both strings
        var oldMax = oldText.length - 1;
        var newMax = newText.length - 1;
        for (let i = 0; i < oldText.length; i++) {
            if (oldText.charAt(oldMax - i) !== newText.charAt(newMax - i)) {
                //with different string lengths, the index will differ for the old and the new text
                difEndOld = oldMax - i;
                difEndNew = newMax - i;
                break;
            }
        }

        var removed = oldText.substr(difStart, difEndOld - difStart + 1);
        var added = newText.substr(difStart, difEndNew - difStart + 1);

        return [difStart, added, removed];
    }