[HN Gopher] Improve Git monorepo performance with a file system ...
___________________________________________________________________
Improve Git monorepo performance with a file system monitor
Author : chmaynard
Score : 61 points
Date : 2022-06-29 19:20 UTC (3 hours ago)
(HTM) web link (github.blog)
(TXT) w3m dump (github.blog)
| blopker wrote:
| Having a cross-platform file watcher built into a ubiquitous tool
| like git is pretty awesome. I could see build tools integrating
| with this and making more aspects of development faster without
| having to run a bunch of file watcher services. They all seem to
| have issues.
|
| I have tried Watchman, but setting it up is a pain. There are so
| many ways to use it. I also welcome running less Facebook code on
| my systems.
| rektide wrote:
| > _[FSMonitor] is currently available on macOS and Windows._
|
| Are there any other git features with this limitation? Wild to me
| that we're here.
|
| Thankfully the article covers the semi-longstanding "hooks" that
| existing (& very high performance) tools like Watchman (which are
| cross platform) can use.
|
| Great in depth read. Good stuff! From the 2.37 release[1].
|
| [1] https://github.blog/2022-06-27-highlights-from-git-2-37/
| https://news.ycombinator.com/item?id=31898261 (34 points, 2 days
| ago, 7 comments)
| milliams wrote:
| My assumption was that on Linux it's just been using inotify or
| something for a while and so hasn't needed a bespoke monitor. I
| have no idea if that's true or makes sense though.
| gpderetta wrote:
| More likely the linux fs is fast enough not to need the
| optimization. Unsurprisingly git was designed to run well on
| linux.
| [deleted]
| bobkazamakis wrote:
| yeah it uses famous linux-exclusive data structures, like
| hashes and strings.
| nasretdinov wrote:
| Not sure why the response got downvoted. I personally found
| Git performance to be, well, okay on macOS (but depends)
| and absolutely horrible on Windows due to very slow stat()
| calls on NTFS.
|
| Of course, in a large enough monorepo Linux performance
| would also suffer, but to a much lesser degree.
|
| Also, conveniently, both Windows and macOS have an API for
| recursive directory watch, whereas Linux doesn't (in the
| vanilla kernel). Inotify can only watch the immediate
| directory you're observing, and there's a pretty low default
| limit on the number of inotify watches that you're
| allowed to have on top of that.
| staticassertion wrote:
| My guess is that inotify is so slow with large directories that
| it wasn't worth it. Plus inotify has cumbersome user limits.
|
| inotify has a number of other relevant limitations, like not
| being able to create recursive notifications or handle "move"
| operations. Implementation effort is going to be way higher for
| an inotify-based system, and of course that's made far worse by
| the numerous file systems in linux - I imagine any
| implementation would probably start first with ext4.
|
| I suspect an ideal solution would be via ebpf, but I'm not
| sure.
| est31 wrote:
| I've been wondering about why there was no linux support, and
| found an e-mail from the author of the subcommand (as well as
| the github.blog post) explaining the situation.
|
| Apparently an older implementation using inotify was dropped
| because inotify does not work recursively, so you would have to
| do an inotify call for all directories of the hierarchy which
| is obviously very inefficient. There are system wide limits in
| the number of directories you can listen to, and even if you
| increase the limit you would probably cause a lot of overhead.
|
| Newer linuxes support the fanotify system call, which does
| allow recursive listening. They haven't implemented something
| using fanotify yet however.
|
| https://lore.kernel.org/git/e1442a04-7c68-0a7a-6e95-304854ad...
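|
| A rough way to see the "one watch per directory" cost described
| above is to count the directories in a working tree and compare
| that with the per-user inotify watch limit. This is only a sketch:
| it assumes a Linux system that exposes the limit at
| /proc/sys/fs/inotify/max_user_watches, and the helper names
| (count_directories, read_watch_limit) are made up for
| illustration. It is not how Git or Watchman do it.

```python
import os
from pathlib import Path
from typing import Optional

# Assumption: standard Linux procfs location of the per-user watch limit.
INOTIFY_LIMIT_FILE = Path("/proc/sys/fs/inotify/max_user_watches")


def count_directories(root: str) -> int:
    """Count root plus every directory below it.

    A non-recursive watcher such as inotify needs one watch per
    directory, so this approximates the number of watches required.
    Symlinked directories are not followed (os.walk default).
    """
    total = 0
    for _dirpath, _dirnames, _filenames in os.walk(root):
        total += 1
    return total


def read_watch_limit(limit_file: Path = INOTIFY_LIMIT_FILE) -> Optional[int]:
    """Return the watch limit, or None if it cannot be read (e.g. not Linux)."""
    try:
        return int(limit_file.read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    needed = count_directories(".")
    limit = read_watch_limit()
    print(f"directories to watch: {needed}")
    if limit is None:
        print("inotify watch limit: unavailable on this system")
    else:
        print(f"inotify watch limit: {limit}")
        print("fits within limit" if needed <= limit else "exceeds limit")
```

| Run from the top of a large monorepo, the first number gives a
| feel for how many watches a per-directory scheme would have to
| register before it could report a single change.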
| tex0 wrote:
| This serves as an example to me that git is - maybe - not the
| right tool for the job.
| elpakal wrote:
| so what is?
| staticassertion wrote:
| Moving from a point-in-time, synchronous strategy to a
| continuous, asynchronous one seems like a perfectly reasonable
| way to improve performance.
| BudaDude wrote:
| As with a lot of developer tools, the most adopted solutions
| are rarely the best tool for the job. But because everyone
| knows them, that's what continues to be used.
| eurasiantiger wrote:
| I wonder if this will cause issues in repos where changes can
| come from containerized apps syncing their runtime config to
| disk. Depending on the platform and the container framework, a
| lot of different things could potentially break here, from NFS-
| related to number of open files.
| saagarjha wrote:
| I have a healthy suspicion of the performance of file-watchers. I
| hope this feature doesn't make Git faster at the expense of "all
| filesystem operations crawl".
| mpawelski wrote:
| This is awesome. Especially the fact that it's built-in and easy
| to turn on.
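|
| For reference, turning it on comes down to a couple of config
| settings. The sketch below wraps them in Python and assumes Git
| 2.37 or later on macOS or Windows (the platforms quoted elsewhere
| in this thread); pairing core.fsmonitor with core.untrackedcache
| is my reading of the article. The function names and the dry-run
| default are hypothetical, not part of any Git tooling.

```python
import subprocess
from typing import List


def fsmonitor_commands(repo: str) -> List[List[str]]:
    """Build the git invocations that enable the built-in monitor for repo."""
    return [
        # Enable the built-in file system monitor for this repository.
        ["git", "-C", repo, "config", "core.fsmonitor", "true"],
        # Cache untracked-file results so the monitor can help there too.
        ["git", "-C", repo, "config", "core.untrackedcache", "true"],
    ]


def enable_fsmonitor(repo: str, dry_run: bool = True) -> None:
    """Print the commands, or run them when dry_run is False."""
    for cmd in fsmonitor_commands(repo):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run by default so nothing is changed until you opt in.
    enable_fsmonitor(".")
```

| With dry_run=True it only prints the commands, which makes it easy
| to check what would be changed before touching a repository.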
|
| Seems like quite a complex solution though. I guess some big
| company (Microsoft?) implemented it internally for their own use
| and later tried to move it to upstream git. I wonder if there was
| some pushback from git maintainers from having this functionality
| built-in.
|
| Also, why do they use named pipes on Windows when, in theory,
| Windows also supports Unix domain sockets?
| (https://devblogs.microsoft.com/commandline/af_unix-comes-to-...)
|
| BTW, to the author of this article. It is very good. It was an
| interesting read. There are some small issues:
|
| - "markdown" link didn't get converted to html:
| "[core.untrackedcache](https://git-scm.com/docs/git-
| config#Documentation/git-config...)"
|
| - the link to "philosophy" of Scalar doesn't work:
| https://github.com/microsoft/git/blob/HEAD/contrib/scalar/do...
| infogulch wrote:
| What's the current state of git tooling for large files and
| partial clones?
| infogulch wrote:
| My holy grail implementation would be a "partial clone" that
| downloads desired files like normal, but creates stubs for
| selected files that are not stored on the device but downloaded
| on-demand upon opening them, like the OneDrive Files On-Demand
| [1] or Google Drive File Stream.
|
| [1]: https://support.microsoft.com/en-us/office/save-disk-
| space-w...
| minimalist wrote:
| Have you looked into git-annex?
___________________________________________________________________
(page generated 2022-06-29 23:00 UTC)