Regular expression diff filtering

It would be very convenient to filter a diff so that trivial changes are not displayed. I would like to write a regular expression that is run on each line, together with a replacement string that uses the captured groups to generate a canonical form. If the before and after lines produce the same canonical output, they are removed from the diff.

For example, I am working on a PHP code base where a significant number of array accesses are written as my_array[my_key] when they should be my_array["my_key"], to prevent problems if a constant named my_key is ever defined. It would be useful to generate a diff that leaves out lines where the only change is the added quotation marks.

I can’t change them all at once, since we don’t have the resources to test the entire code base, so I fix them whenever I change a function. How can I achieve this? Is there anything else along these lines that I could use to get a similar result? For example, a simpler method might be to skip the canonical form and just check whether the input line is transformed into the output line. By the way, I use Git.

+7
7 answers

There do not seem to be any git diff options that support what you want to do. However, you can use the GIT_EXTERNAL_DIFF environment variable and a custom script (or any executable written in your preferred scripting or programming language) to manipulate the patch.

I assume that you are on Linux; if not, you can adapt this concept to your environment. Let's say you have a Git repo where HEAD has a file named file05 that contains:

 line 26662: $my_array[my_key] 

And a file named file06 that contains:

 line 19768: $my_array[my_key]
 line 19769: $my_array[my_key]
 line 19770: $my_array[my_key]
 line 19771: $my_array[my_key]
 line 19772: $my_array[my_key]
 line 19773: $my_array[my_key]
 line 19775: $my_array[my_key]
 line 19776: $my_array[my_key]

You change file05 to:

 line 26662: $my_array["my_key"] 

And you change file06 to:

 line 19768: $my_array[my_key]
 line 19769: $my_array["my_key"]
 line 19770: $my_array[my_key]
 line 19771: $my_array[my_key]
 line 19772: $my_array[my_key]
 line 19773: $my_array[my_key]
 line 19775: $my_array[my_key2]
 line 19776: $my_array[my_key]

Using the following shell script, call it mydiff.sh and place it somewhere in your PATH:

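 #!/bin/bash
 # Print the seven arguments Git passes to the external diff command:
 # path old-file old-hex old-mode new-file new-hex new-mode
 echo "$@"
 # $5 is new-file; in the example run below it is the working tree path.
 git diff-files --patch --word-diff=porcelain "${5}" | awk '
 # Remember a removed word and the record number it appeared on.
 /^-./ {rec = FNR; prev = substr($0, 2);}
 # An added word directly after a removed one: strip the quotes and compare.
 FNR == rec + 1 && /^\+./ {
   ln = substr($0, 2);
   gsub("\\[\"", "[", ln);
   gsub("\"\\]", "]", ln);
   if (prev == ln) {
     print " " ln;
   } else {
     print "-" prev;
     print "+" ln;
   }
 }
 # Print records that are neither a removed word nor the record after one.
 FNR != rec && FNR != rec + 1 {print;}
 '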

Running the command:

 GIT_EXTERNAL_DIFF=mydiff.sh git --no-pager diff 

It will display:

 file05 /tmp/r2aBca_file05 d86525edcf5ec0157366ea6c41bc6e4965b3be1e 100644 file05 0000000000000000000000000000000000000000 100644
 index d86525e..c2180dc 100644
 --- a/file05
 +++ b/file05
 @@ -1 +1 @@
  line 26662: 
  $my_array[my_key]
 ~
 file06 /tmp/2lgz7J_file06 d84a44f9a9aac6fb82e6ffb94db0eec5c575787d 100644 file06 0000000000000000000000000000000000000000 100644
 index d84a44f..bc27446 100644
 --- a/file06
 +++ b/file06
 @@ -1,8 +1,8 @@
  line 19768: $my_array[my_key]
 ~
  line 19769: 
  $my_array[my_key]
 ~
  line 19770: $my_array[my_key]
 ~
  line 19771: $my_array[my_key]
 ~
  line 19772: $my_array[my_key]
 ~
  line 19773: $my_array[my_key]
 ~
  line 19775: 
 -$my_array[my_key]
 +$my_array[my_key2]
 ~
  line 19776: $my_array[my_key]
 ~

This output does not show the changes for the added quotes in file05 and file06. The external diff script basically uses git diff-files to create the patch and filters the output through a GNU awk script to manipulate it. This sample script does not handle all the different combinations of old and new files described for GIT_EXTERNAL_DIFF, and it does not output a valid patch, but it should be enough to get you started.

You can use Perl, Python's difflib, regular expressions, or whatever is most convenient for you to implement an external diff tool that suits your needs.
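As one illustration of the Python route, here is a minimal sketch rather than a finished tool. It compares canonicalized copies of the old and new lines with difflib.SequenceMatcher and reports the original lines only where the canonical forms differ. The names BARE_KEY, canonical and filtered_diff are made up for this example, and the regular expression is only an assumption about what counts as a bare key.

```python
import difflib
import re

# Assumed rule for this sketch: a bare identifier between brackets is a key
# that should have been quoted, so [my_key] becomes ["my_key"].
BARE_KEY = re.compile(r'\[([A-Za-z_][A-Za-z0-9_]*)\]')


def canonical(line):
    """Return the line with bare array keys rewritten in quoted form."""
    return BARE_KEY.sub(r'["\1"]', line)


def filtered_diff(old_lines, new_lines):
    """Return -/+ lines for the changes that survive canonicalization."""
    matcher = difflib.SequenceMatcher(
        None,
        [canonical(line) for line in old_lines],
        [canonical(line) for line in new_lines],
        autojunk=False,
    )
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == 'equal':
            continue  # identical after canonicalization, so hide it
        out.extend('-' + line for line in old_lines[i1:i2])
        out.extend('+' + line for line in new_lines[j1:j2])
    return out


if __name__ == '__main__':
    old = ['line 19769: $my_array[my_key]', 'line 19775: $my_array[my_key]']
    new = ['line 19769: $my_array["my_key"]', 'line 19775: $my_array[my_key2]']
    print('\n'.join(filtered_diff(old, new)))
```

To use something like this as GIT_EXTERNAL_DIFF, the script would read the old and new versions from the files named by its second and fifth arguments (old-file and new-file) and pass their lines to filtered_diff.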

+6
 $ git diff --help
 -G<regex>
     Look for differences whose added or removed line matches the given <regex>.

EDIT

After some tests, I have something like

 git diff -b -w --word-diff-regex='.*\[[^"]*\]' 

Then I get the output as:

 diff --git a/test.php b/test.php
 index 62a2de0..b76891f 100644
 --- a/test.php
 +++ b/test.php
 @@ -1,3 +1,5 @@
 <?php
 {+$my_array[my_key]+} = "test";
 ?>
 diff --git a/test1.php b/test1.php
 index 62a2de0..6102fed 100644
 --- a/test1.php
 +++ b/test1.php
 @@ -1,3 +1,5 @@
 <?php
 some_other_stuff();
 ?>

Perhaps this will help you. I found it here: http://www.rhinocerus.net/forum/lang-lisp/659593-git-word-diff-regex-lisp-source.html and there is more detailed information in that thread.

EDIT2

 git diff -G'\[[A-Za-z_]*\]' --pickaxe-regex 
+4

grepdiff can be used to filter hunks in a diff file.

 $ git diff -U1 | grepdiff 'console' --output-matching=hunk 

Here, only hunks that match the given string "console" are displayed.

+2

Normalize the input files in a first step, and then compare the normalized files. This gives you the most control over the process. For example, you may want to apply the regular expression only to the non-HTML parts of the code, not inside strings, and not inside comments (or ignore comments altogether). Computing the diff on normalized code is the proper way to do this kind of thing; working with regular expressions on individual lines is much more error prone and, at best, a hack.
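A rough sketch of that normalize-then-compare idea in Python follows. It is not a real PHP lexer: the token rules in TOKEN are simplifying assumptions, and the names TOKEN, normalize and normalized_diff are invented for the illustration. String literals and comments are matched first so that they are copied through untouched, and only bare keys outside of them are quoted before the two texts are diffed.

```python
import difflib
import re

# Assumed token rules for this sketch (far from complete PHP syntax).
TOKEN = re.compile(
    r'''
    (?P<skip>
        "(?:\\.|[^"\\])*"      # double-quoted string
      | '(?:\\.|[^'\\])*'      # single-quoted string
      | //[^\n]*               # line comment
      | \#[^\n]*               # hash comment
      | /\*.*?\*/              # block comment
    )
  | \[(?P<key>[A-Za-z_][A-Za-z0-9_]*)\]   # bare array key
    ''',
    re.VERBOSE | re.DOTALL,
)


def normalize(source):
    """Quote bare array keys that are outside strings and comments."""
    def fix(match):
        if match.group('key') is not None:
            return '["%s"]' % match.group('key')
        return match.group('skip')  # strings and comments stay as they are
    return TOKEN.sub(fix, source)


def normalized_diff(old_source, new_source):
    """Unified diff of the normalized texts, as a list of lines."""
    return list(difflib.unified_diff(
        normalize(old_source).splitlines(),
        normalize(new_source).splitlines(),
        fromfile='old', tofile='new', lineterm='',
    ))


if __name__ == '__main__':
    old_source = '$x[foo] = 1;\n$y = 2;\n'
    new_source = '$x["foo"] = 1;\n$y = 3;\n'
    print('\n'.join(normalized_diff(old_source, new_source)))
```

Note that the diff is reported against the normalized text, so the line with $x shows up only as unchanged context in its quoted form.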

Some diff tools, such as meld, let you hide "insignificant" differences and come with a set of default patterns, for example to hide whitespace-only changes. I guess this is pretty much what you want.

+1

From my own git diff --help:

--word-diff-regex=<regex>

Use <regex> to decide what a word is, instead of considering runs of non-whitespace to be a word. Also implies --word-diff unless it was already enabled. Every non-overlapping match of the <regex> is considered a word. Anything between these matches is considered whitespace and ignored(!) for the purposes of finding differences. You may want to append |[^[:space:]] to your regular expression to make sure that it matches all non-whitespace characters. A match that contains a newline is silently truncated(!) at the newline. The regex can also be set via a diff driver or configuration option, see gitattributes(1) or git-config(1). Giving it explicitly overrides any diff driver or configuration setting. Diff drivers override configuration settings.
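To make the rule about words and ignored gaps concrete, the sketch below mimics it with Python's re module. This is only an approximation: git uses POSIX regular expressions, Python's syntax and alternation rules differ, and the function words and the pattern IDENT are invented for the illustration. Under those assumptions, a word regex that covers only identifier characters yields the same words for the bare and the quoted spelling, while appending a catch-all alternative (\S here, standing in for [^[:space:]]) makes the quotes visible again.

```python
import re

# Assumed word pattern for this sketch: runs of identifier characters and "$".
IDENT = r'[A-Za-z0-9_$]+'


def words(line, word_regex):
    """Return the non-overlapping matches; text between them is dropped."""
    return [m.group(0) for m in re.finditer(word_regex, line)]


bare = '$my_array[my_key]'
quoted = '$my_array["my_key"]'

# Brackets and quotes fall between matches, so both spellings look alike.
print(words(bare, IDENT))
print(words(quoted, IDENT))

# With the catch-all alternative, every non-space character becomes a word.
print(words(quoted, IDENT + r'|\S'))
```

By the documented rule, this suggests trying something like --word-diff-regex='[A-Za-z0-9_$]+' on the PHP example, though the result should be checked against real git output.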

+1

I use an approach that runs git diff and then applies a regular expression match to its output. In some test code (Perl), I know that a test has succeeded when the OutputFingerprint stored in the resulting test files has not changed.

First I run

 my $matches = `git diff -- mytestfile`;

and then evaluate the result:

 if ($matches =~ /OutputFingerprint/) {
     fail();
     return 1;
 } else {
     ok();
     return 0;
 }
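The Perl check above matches anywhere in the diff output, including unchanged context lines. As a possible refinement, here is a small Python sketch (Python only to stay consistent with a general-purpose scripting approach; the function name changed_lines_match is hypothetical) that looks at added and removed lines only.

```python
import re


def changed_lines_match(diff_text, pattern):
    """True if any added or removed line of a unified diff matches pattern."""
    regex = re.compile(pattern)
    for line in diff_text.splitlines():
        # Simplification: treat every line starting with +++ or --- as a
        # file header, although a removed line beginning with "--" would
        # look the same.
        if line.startswith(('+++', '---')):
            continue
        if line.startswith(('+', '-')) and regex.search(line[1:]):
            return True
    return False


if __name__ == '__main__':
    sample = (
        '--- a/mytestfile\n'
        '+++ b/mytestfile\n'
        '@@ -1,2 +1,2 @@\n'
        ' OutputFingerprint: unchanged\n'
        '-value: 1\n'
        '+value: 2\n'
    )
    print(changed_lines_match(sample, 'OutputFingerprint'))
```

The diff text could come from subprocess.run(['git', 'diff', '--', 'mytestfile'], capture_output=True, text=True).stdout.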
0

If the goal is to minimize trivial differences, you might consider our SmartDifferencer tools.

These tools compare according to the syntax of the language, not the layout, so many trivial changes (layout, modified comments, even a changed radix on numbers) are ignored and not reported. Each tool has a full language parser; there are versions for many languages, including PHP.

It will not treat the example $FOO[abc] as "semantically identical" to $FOO["abc"], because they are not. If abc is actually defined as a constant, then $FOO["abc"] is not semantically equivalent.

-2