Git difftool is ridiculously slow in Cygwin / MinGW

I noticed that git difftool is very slow. There is a delay of about 1.2 seconds between each diff call.

To benchmark it, I wrote a custom difftool command:

    #!/bin/sh
    echo $0 $1 $2

And I configured Git to use this tool in my ~/.gitconfig:

    [diff]
        tool = mydiff
    [difftool "mydiff"]
        prompt = false
        cmd = "~/mydiff \"$LOCAL\" \"$REMOTE\""

I tested it on the Git sources:

    $ git clone https://github.com/git/git.git
    $ cd git
    $ git rev-parse HEAD
    1bc8feaa7cc752fe3b902ccf83ae9332e40921db
    $ git diff head~10 --stat --name-only | wc -l
    23

When I time git difftool against 259b5e6d33, the result is ridiculously slow:

    $ time git difftool 259b5
    mydiff /dev/null Documentation/RelNotes/2.6.3.txt
    ...
    mydiff /tmp/mY2T6l_upload-pack.c upload-pack.c

    real    0m10.381s
    user    0m1.997s
    sys     0m6.667s

With a simpler hand-written script, it is much faster:

    $ time git diff --name-only --stat 259b5 | xargs -n1 -I{} sh -c 'git show 259b5:{} > {}.tmp && ~/mydiff {} {}.tmp'
    mydiff Documentation/RelNotes/2.6.3.txt Documentation/RelNotes/2.6.3.txt.tmp
    mydiff upload-pack.c upload-pack.c.tmp

    real    0m1.149s
    user    0m0.472s
    sys     0m0.821s

What am I missing?

Here are the results I got (real time, in seconds):

    | Cygwin | Debian | Ubuntu | Method   |
    | ------ | ------ | ------ | -------- |
    | 10.381 | 2.620  | 0.580  | difftool |
    | 1.149  | 0.567  | 0.210  | custom   |

For the Cygwin results, I measured 2.8 s spent in git-difftool and 7.5 s spent in git-difftool--helper. The latter is only 98 lines long. I don't understand why it is so slow.

3 answers

Using some of the techniques found on the msysgit GitHub, I narrowed it down a bit.

For each file in the diff, git-difftool--helper re-runs the following internal commands:

    12:44:46.941239 git.c:351 trace: built-in: git 'config' 'diff.tool'
    12:44:47.359239 git.c:351 trace: built-in: git 'config' 'difftool.bc.cmd'
    12:44:47.933239 git.c:351 trace: built-in: git 'config' '--bool' 'mergetool.prompt'
    12:44:48.797239 git.c:351 trace: built-in: git 'config' '--bool' 'difftool.prompt'
    12:44:49.696239 git.c:351 trace: built-in: git 'config' 'difftool.bc.cmd'
    12:44:50.135239 git.c:351 trace: built-in: git 'config' 'difftool.bc.path'
    12:44:50.422239 git.c:351 trace: built-in: git 'config' 'mergetool.bc.path'
    12:44:51.060239 git.c:351 trace: built-in: git 'config' 'difftool.bc.cmd'
    12:44:51.452239 git.c:351 trace: built-in: git 'config' 'difftool.bc.cmd'

Note that in this particular case, these commands took approximately 4.5 seconds to complete. This pattern is pretty consistent throughout my log.

Note also that some of them are duplicated: git config difftool.bc.cmd is called 4 times!
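To check for the same pattern on another machine, a trace like this can be summarized with a small filter. This is only a sketch: it assumes the trace was captured with GIT_TRACE=1 (which writes to stderr) and that each line has the "trace: built-in: git 'config' ..." form shown above. The function name count_config_calls is made up for illustration.

```shell
#!/bin/sh
# count_config_calls (hypothetical name): read a GIT_TRACE log on stdin
# and print "COUNT KEY" for every key queried through `git config`,
# most frequently queried key first.
count_config_calls() {
    tr -d "'" |
    sed -n 's/.*trace: built-in: git config //p' |
    awk '{ n[$NF]++ } END { for (k in n) print n[k], k }' |
    sort -k1,1nr -k2
}

# Possible usage; stderr carries the trace, stdout is discarded:
#   GIT_TRACE=1 git difftool 259b5 2>&1 >/dev/null | count_config_calls
```

The filter strips the quotes around the arguments, keeps only the `git config` invocations, and takes the last word of each as the key, so options such as --bool are skipped. Fed the nine trace lines above, it puts difftool.bc.cmd at the top with a count of 4.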

Now, possible remedies:

  • I cut the execution time of these commands in half by moving all the diff-related sections to the top of my .gitconfig file. No joke. It is still noticeable, but now it is about 2 seconds instead of 4.5.
  • Make sure that your Git folder under "Program Files" and your user profile (where .gitconfig lives) are excluded from real-time scanning.
  • Fundamentally, Git should be more efficient at parsing and retrieving configuration values. Ideally, it would cache this data instead of re-querying (and re-parsing...) the configuration on every iteration of the loop, perhaps even caching it for the entire execution of the command.
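The caching idea in the last point can be illustrated in a few lines of shell. This is a sketch of the general technique, not how Git or git-difftool--helper actually works: it runs `git config --list` once and answers later lookups from that text. CONFIG_CACHE, load_config_cache and cached_config are hypothetical names, and the sketch assumes that no value contains a newline (`git config --list -z` would be needed to handle that).

```shell
#!/bin/sh
# Sketch: query the configuration once, then answer lookups from memory.
# All names below are hypothetical.

load_config_cache() {
    # One process spawn in total instead of one per key.
    CONFIG_CACHE=$(git config --list)
}

cached_config() {
    # Print the last value recorded for key $1 (the last definition wins,
    # as with `git config KEY`); return non-zero if the key is not set.
    printf '%s\n' "$CONFIG_CACHE" | awk -v key="$1" '
        index($0, key "=") == 1 { value = substr($0, length(key) + 2); found = 1 }
        END { if (found) print value; exit !found }
    '
}

# Possible usage:
#   load_config_cache
#   tool=$(cached_config diff.tool)
#   cmd=$(cached_config "difftool.$tool.cmd")
```

Each lookup still starts an awk process, so on a platform where spawning processes is the expensive part this only shows the shape of the idea; a real fix would keep the parsed values inside one process.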

git difftool should be a little faster with Git 2.13 (Q2 2017).
See commit d12a8cf (April 14, 2017) by Jeff Hostetler (jeffhostetler).
(Merged by Junio C Hamano (gitster) in commit 8868ba1, April 24, 2017)

unpack-trees: avoid duplicate ODB lookups during checkout

(ODB: Object DataBase)

Teach traverse_trees_recursive() to not do redundant ODB lookups when both directories refer to the same OID.

In operations such as read-tree and checkout, there will likely be many peer directories that have the same OID when the differences between the commits are relatively small.
In these cases, we can avoid hitting the ODB multiple times for the same OID.

This patch handles the n = 2 and n = 3 cases and simply copies the data rather than repeating the fill_tree_descriptor() call.

 ================ 

On the Windows repo (500K trees, 3.1M files, 450MB index), this reduced the overall time by 0.75 seconds when cycling between two commits with a single-file difference.

    (avg) before: 22.699
    (avg) after:  21.955
    ===============

After some investigation, I have evidence that the poor performance is related to files owned by a user from another domain. In particular, I came to the following conclusions:

  • I work in a corporate environment with several domains and thousands of users.
  • Due to organizational changes, each user exists in two domains (probably only during a transition period): in their primary domain and in a second domain. When you change ownership of objects through the Windows GUI, each user appears twice, and you need to go to the advanced user selection to identify the one that belongs to a specific domain.
  • Cygwin with acl enabled displays the owner of an "other domain" file as "<domain>+<username>"; for the primary domain it is simply "<username>". Cygwin without acl displays only "<username>" in both cases. This can be quite confusing, because the permissions and ownership reported by Cygwin then suggest write permission where the user does not actually have it.
  • Files owned by the "other domain" account can be writable by the "this domain" account of the same user, so the domain assignment is largely transparent.
  • A large source tree from our version control system (which was also mirrored in a git repo) had thousands of files owned by the "other domain" account. This seems to cause slow file operations. Changing the ownership to the "primary domain" account fixed the speed problem for both git and other file access.

I must assume that obtaining file permissions for users in other domains is slow and for some reason is not cached (it was always the same user).
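To find out whether a tree contains such files, the owners can be compared against the current user. The following is a minimal sketch using plain POSIX find; the function name list_foreign_files is made up, and it assumes Cygwin is running with acl enabled, since (as noted above) without acl the owner is reported the same in both cases.

```shell
#!/bin/sh
# list_foreign_files [DIR] (hypothetical name): print every file under
# DIR (default: current directory) whose owner is not the current user.
# The numeric user ID is used so that the comparison does not depend on
# how the account name is displayed.
list_foreign_files() {
    find "${1:-.}" ! -user "$(id -u)" -print
}

# Possible usage:
#   list_foreign_files ~/src/git | wc -l
```

An empty result means every file belongs to the account the shell is running under. Fixing the ownership itself is a separate step and is not shown here.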

The rest of the post below is what I originally published. I am leaving it as it was.


For me (working in a large company with several geographically distributed Windows domains), the culprit is that cygwin uses Windows ACLs by default. Consider this query for all known users in the domain:

    $ time (mkpasswd -D | wc -l)
    45183

    real    27m55,340s
    user    0m1,637s
    sys     0m0,123s

The fix (1) (2) was simply a matter of mounting the NTFS file systems with noacl, i.e. my /etc/fstab contains the line

 none / cygdrive binary,posix=0,user,noacl 0 0 

(while eliminating the annoying cygdrive prefix).
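After editing /etc/fstab and restarting the Cygwin processes, it is worth confirming that the option took effect. The filter below is a sketch that assumes the usual "DEVICE on DIR type FSTYPE (OPTIONS)" format of the `mount` output; the function name mounts_without_noacl is hypothetical.

```shell
#!/bin/sh
# mounts_without_noacl (hypothetical name): read `mount` output on stdin
# and print the mount point of every entry whose option list does not
# contain noacl.
mounts_without_noacl() {
    sed -n '/[(,]noacl[,)]/!s/^.* on \(.*\) type [^ ]* (.*)$/\1/p'
}

# Possible usage:
#   mount | mounts_without_noacl
```

Any mount point printed here is still mounted without noacl.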

I can't help but suspect that cygwin/msys (the behavior is the same there, except that the Git for Windows installation mounts with noacl by default, probably for this reason) performs a domain server query for every file it touches and does not cache the results.

This change was introduced around 2015 with cygwin 2.4 or 2.5. From the release notes for version 2.4:

To accommodate standard Windows ACLs, the POSIX permissions of the owner and all other users in the ACL are computed using the Windows AuthZ API. This may slow down the computation of POSIX permissions noticeably in some circumstances [...] (emphasis mine).

The noacl option reduced the startup time of BeyondCompare (or of a script that merely echoes a line, for that matter) from 25 seconds to 1. I do not understand at all why a plain git diff on the same file is very fast even with acl, since I would naively assume that the required information, and therefore the required file system operations, are the same.

Next I will try running cygserver, which may improve the situation by caching.

Update: cygserver does not improve the situation, unfortunately.


(1) A fix for git, that is. mkpasswd is not affected.

(2) I have not understood or tested the effect on access rights and ownership for git (and for the ClearCase views, which we also access through cygwin). My gut feeling is to stay as close as possible to Windows semantics (which means that noacl might run into problems).

(3) The cygwin documentation discusses scenarios in which query results are not cached. One of them is a sequence of cygwin processes that are not spawned from a common cygwin ancestor (e.g. bash) but from a Windows program such as cmd. I must assume that Windows provides a caching mechanism for its own programs, because otherwise Windows would be unusable in this corporate environment. For some reason, cygwin is not using it.
