[HN Gopher] Fast file synchronization and network forwarding for...
___________________________________________________________________
Fast file synchronization and network forwarding for remote
development
Author : saikatsg
Score : 73 points
Date : 2022-10-16 18:01 UTC (4 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| solarkraft wrote:
| If only macOS supported mounting via SSHFS ...
| parasti wrote:
| Mutagen also has a Docker extension. Really easy to set up. I
| installed it recently after searching for ways to speed up Docker
| on an Apple M1. It did work in my case.
| emrah wrote:
| What is the benefit over rsync which is the perfect tool for this
| at the moment? Maybe add an faq section to the readme for
| questions like this?
| xenoscopic wrote:
| The primary benefits:
|
| - Mutagen performs bidirectional synchronization (though it can
| also operate unidirectionally); rsync is unidirectional
|
| - Mutagen uses recursive filesystem watching to avoid full
| filesystem rescans (whereas rsync always does a full filesystem
| rescan). This allows Mutagen to provide a more "real time"
| sync.
|
| - Mutagen has an active synchronization loop that doesn't
| require manual invocation.
|
| - Mutagen has more idiomatic Windows support.
|
| - Mutagen doesn't require that it be pre-installed on both
| endpoints.
|
| Both use differential transfers (i.e. the "rsync algorithm")
| for transferring individual files.
|
| There are other differences, of course, as well as
| similarities. Mutagen's design is tuned for development work,
| rsync's design is tuned for replication. I still use rsync for
| archival operations on a daily basis - it's great!
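|
| The differential transfer idea mentioned above can be sketched
| as follows. This is a simplified illustration, not code from
| Mutagen or rsync: the block size, the use of Adler-32 and SHA-256,
| and all function names are assumptions, and the weak checksum is
| recomputed at every offset instead of being rolled.

```python
import hashlib
import zlib

BLOCK = 4  # tiny block size so the example is easy to follow (assumption)


def signature(base: bytes) -> dict:
    """Map weak checksum -> [(strong hash, block index)] for each block of base."""
    sig = {}
    for start in range(0, len(base), BLOCK):
        block = base[start:start + BLOCK]
        entry = (hashlib.sha256(block).digest(), start // BLOCK)
        sig.setdefault(zlib.adler32(block), []).append(entry)
    return sig


def delta(sig: dict, target: bytes) -> list:
    """Describe target as ("copy", block_index) and ("data", bytes) operations."""
    ops, literal, pos = [], bytearray(), 0
    while pos < len(target):
        window = target[pos:pos + BLOCK]
        match = None
        # The cheap checksum filters candidates; the strong hash confirms them.
        for strong, index in sig.get(zlib.adler32(window), []):
            if strong == hashlib.sha256(window).digest():
                match = index
                break
        if match is None:
            literal.append(target[pos])  # no known block starts here
            pos += 1
            continue
        if literal:
            ops.append(("data", bytes(literal)))
            literal = bytearray()
        ops.append(("copy", match))
        pos += len(window)
    if literal:
        ops.append(("data", bytes(literal)))
    return ops


def patch(base: bytes, ops: list) -> bytes:
    """Rebuild the target from the receiver's base copy plus the delta."""
    out = bytearray()
    for kind, value in ops:
        if kind == "copy":
            out += base[value * BLOCK:(value + 1) * BLOCK]
        else:
            out += value
    return bytes(out)
```

| Only the "data" operations carry file content over the wire; the
| "copy" operations refer to blocks the receiver already has.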
| fpoling wrote:
| In the past I have used lsyncd to develop locally and synchronize the
| changes to a remote host over ssh where the code base was
| compiled. This worked nicely even over a GPRS network connection
| with a speed like 30 Kbit/s. As the link had high latency it was
| important to use emacs shell for the remote connection. This way
| I could type the command locally and send it to the remote host
| when pressing enter.
| cube2222 wrote:
| We've been using Mutagen extensively for remote development with
| an EC2 instance hosting a docker-compose with a couple of
| services and live rebuild+reload, and it's been working
| fantastically.
|
| It's also nice for automatically managing port forwards.
| ta988 wrote:
| I've been using mutagen for over 6 months now to sync over an M1
| Linux VM. The only thing I miss is an option that would say
| "force everything from A" or "force everything from B". I've had
| rare cases where there were conflicts that I could only resolve
| by pausing mutagen and running rsync. But I appreciate that
| mutagen warns you and doesn't just overwrite silently like
| syncthing can do sometimes.
| ithkuil wrote:
| Mutagen allows you to choose a replication mode:
| https://mutagen.io/documentation/synchronization
|
| Do you want something different from the "one-way-replica"
| mode?
| ajvs wrote:
| How does it compare to syncthing?
| jedisct1 wrote:
| Super useful tool!
|
| Plus, it's multi platform. I'm using it to synchronize
| directories between hosts running macOS, OpenBSD and Linux.
| Everything works fine.
|
| I haven't tried the Docker Desktop extension since I switched to
| Colima (Docker Desktop is constantly broken on Apple Silicon).
| Naac wrote:
| I haven't found anything better than using Unison. Maybe the
| linked README could compare prior art?
| xenoscopic wrote:
| Conceptually speaking, Mutagen and Unison are very similar (and
| actually I mentioned Benjamin Pierce's work in another comment
| here asking about the sync algorithm - fantastic stuff!). I
| tend to avoid direct comparisons because they always come
| across one-sided, but some cursory differences:
|
| - Mutagen tries to integrate recursive filesystem watching very
| tightly into its synchronization loop to drive synchronization
| and allow for near-instant filesystem rescans
|
| - Mutagen automatically copies an "agent" binary to remote
| systems to support synchronization, so no remote install is
| required
|
| - Mutagen uses Protocol Buffers for its data storage, so
| synchronization sessions created with older versions continue
| to work with newer versions
|
| - Mutagen is written in Go, Unison in OCaml (which allows Mutagen
| broader platform support "for free")
|
| - Mutagen tries to treat Windows as a first-class citizen
|
| - Mutagen uses race-free traversal (e.g. openat, fstatat,
| unlinkat, etc.) to perform operations
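|
| A minimal sketch of what directory-relative (`*at`-style)
| traversal looks like, using Python's `dir_fd` support on POSIX
| systems. This is an illustration only, not Mutagen's Go
| implementation; the function name is hypothetical.

```python
import os
import stat


def remove_regular_files(root: str) -> list:
    """Delete regular files directly under root via directory-relative calls.

    Every operation is resolved against the already-open directory descriptor,
    so swapping a parent path component for a symlink after the directory was
    opened cannot redirect the stat or unlink to a different location.
    POSIX-only: relies on os.O_DIRECTORY and dir_fd support.
    """
    removed = []
    dir_fd = os.open(root, os.O_RDONLY | os.O_DIRECTORY)
    try:
        for name in os.listdir(dir_fd):
            # fstatat-style: inspect the entry itself, never a symlink target.
            info = os.stat(name, dir_fd=dir_fd, follow_symlinks=False)
            if stat.S_ISREG(info.st_mode):
                os.unlink(name, dir_fd=dir_fd)  # unlinkat-style removal
                removed.append(name)
    finally:
        os.close(dir_fd)
    return sorted(removed)
```

| Subdirectories and symbolic links are left in place; only entries
| that are regular files at check time are removed.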
|
| Obviously the internal implementations are different, but both
| use differential (rsync-style) file transfers, both use the
| same reconciliation concepts, etc.
|
| Mutagen has the advantage of Go, recursive filesystem watching,
| and modern POSIX/Windows APIs that didn't exist when Unison was
| originally written, though some of that functionality has been
| brought into Unison.
|
| For a comparison with Syncthing (and to some extent Unison),
| check out this comment[0].
|
| [0]: https://news.ycombinator.com/item?id=30966448
| karamanolev wrote:
| This sounds like my dream tool - I've always loved how quickly
| and well local tools work and remote environments cut into that
| good experience significantly. For me to be productive, I really
| need an instant feedback loop where tools work fast and I can
| immediately experience the result of some small piece of work.
|
| Has anyone tried this for a real-world project and can share
| feedback?
| grogenaut wrote:
| I generally find that systems that aren't set up to let you dev
| locally and require you to dev in prod or remotely also don't let
| you work in tiny, tight feedback loops. I generally focus on
| making it work the same everywhere instead of on fast sync, but
| that's just me. Well, and the systems I have control over.
| ta988 wrote:
| Yes it is excellent, syncing macOS (JetBrains tools and a few
| other things) with a Linux VM.
| cassianoleal wrote:
| I find that VS Code's Remote-* extensions work well. I'm
| currently writing a Terraform provider on a remote Linux box
| using Remote-SSH and everything feels local. Compilation, etc
| happens on the remote and if I were serving requests it's dead
| easy to forward a port.
| fpoling wrote:
| Mutagen tries to be secure so in principle one can develop on
| an untrusted remote machine. VSCode remote always assumes that
| the remote part is trusted.
| cassianoleal wrote:
| That sounds interesting but I can't find any mention of it
| in the docs. In fact, it sounds like it's just copying
| files over to the remote and running commands there.
|
| Are you able to provide a reference to how Mutagen secures
| my code on an untrusted remote?
| xenoscopic wrote:
| The general philosophy with Mutagen is to (a) delegate
| encryption to other tools and (b) use secure defaults
| (especially for permissions).
|
| So, for example, Mutagen doesn't implement any
| encryption, instead relying on transports like OpenSSH to
| provide the underlying transport encryption. In the
| Docker case, Mutagen does rely on the user securing the
| Docker transport if using TCP, but works to make this
| clear in the docs, and Mutagen is generally using the
| Docker Unix Domain Socket transport anyway. When
| communicating with itself, Mutagen also only uses secure
| Unix Domain Sockets and Windows Named Pipes.
|
| When it comes to permissions, Mutagen doesn't do a
| blanket transfer of file ownership and permissions.
| Ownership defaults to the user under which the
| mutagen-agent binary is operating and permissions default
| to 0700/0600. The only permission bits that Mutagen
| transfers are executability bits, and only to entities
| with a corresponding read bit set. The idea is that
| synchronizing files to a remote, multi-user system
| shouldn't automatically expose your files to everyone on
| that system. These settings can be tweaked, of course,
| and in certain cases (specifically the Docker Desktop
| extension), broader permissions are used by default to
| emulate the behavior of the existing virtual filesystems
| that Mutagen is replacing.
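|
| The permission rule described above (restrictive defaults, with
| executability granted only where a read bit is already set) can
| be sketched like this. The function name and parameters are
| hypothetical; this follows the description in the comment, not
| Mutagen's source.

```python
def synced_mode(executable: bool, is_directory: bool = False,
                default_file: int = 0o600, default_dir: int = 0o700) -> int:
    """Compute the mode for a synchronized entry on the receiving side.

    Source ownership and permission bits are not copied. The entry starts
    from the configured default, and an executable file gains an execute
    bit for each class (user/group/other) that already has a read bit.
    """
    if is_directory:
        return default_dir
    mode = default_file
    if executable:
        # Shift each read bit (r = 4) down to the execute position (x = 1).
        mode |= (mode & 0o444) >> 2
    return mode
```

| With the defaults, a non-executable file gets 0600 and an
| executable one gets 0700; with a broader default such as 0644,
| an executable file gets 0755.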
| AnthonBerg wrote:
| I'd like to know more about the theory behind the synchronisation
| -- how the syncing is known to be safe and non-destructive.
| xenoscopic wrote:
| The synchronization uses a repeated three-way merge algorithm,
| very similar to Git's merge when merging branches. It is
| triggered by recursive filesystem watching, which is also used
| to accelerate filesystem rescans. It maintains a virtual
| most-recent-ancestor and uses the two synchronization endpoints
| as the "branches" being merged. Much like Git has "-X ours" and
| "-X theirs" options, Mutagen also has automated conflict
| resolution[0] modes that can be specified. You can find the
| reconciliation algorithm here[1] (and there is an exhaustive
| set of test cases in the corresponding _test.go file).
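|
| A minimal sketch of three-way reconciliation against a common
| ancestor, under simplifying assumptions: each snapshot is a flat
| mapping of path to content digest, and the `prefer` parameter is
| a hypothetical stand-in for an automated conflict resolution
| mode. The real algorithm handles directory trees and more cases.

```python
def reconcile(ancestor: dict, alpha: dict, beta: dict, prefer=None):
    """Three-way merge of two endpoint snapshots against their ancestor.

    Each snapshot maps path -> content digest; a missing path means absent.
    Returns (to_alpha, to_beta, conflicts). The first two map each path to
    the digest that endpoint should end up with (None means delete).
    """
    to_alpha, to_beta, conflicts = {}, {}, []
    for path in sorted(set(ancestor) | set(alpha) | set(beta)):
        base, a, b = ancestor.get(path), alpha.get(path), beta.get(path)
        if a == b:
            continue  # endpoints already agree, nothing to propagate
        if a == base:
            to_alpha[path] = b  # only beta changed, so alpha catches up
        elif b == base:
            to_beta[path] = a  # only alpha changed, so beta catches up
        elif prefer == "alpha":
            to_beta[path] = a  # both changed; alpha wins by configuration
        elif prefer == "beta":
            to_alpha[path] = b  # both changed; beta wins by configuration
        else:
            conflicts.append(path)  # both changed; report, do not overwrite
    return to_alpha, to_beta, conflicts
```

| Without a preference, a path changed differently on both sides
| is reported as a conflict and neither copy is touched.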
|
| To avoid a large class of race conditions (at least to the
| extent allowed by POSIX and Windows), Mutagen will use
| `*at` style system calls for all filesystem traversal on POSIX
| systems, with a similar strategy on Windows.
|
| Also, to avoid race conditions due to filesystem changes
| between scan time and change-application time, Mutagen will
| perform just-in-time checks that filesystem contents haven't
| changed from what was fed into the reconciliation algorithm.
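|
| The just-in-time check can be illustrated with a small sketch.
| The choice of metadata (mode, size, modification time) and the
| ".staging" suffix are assumptions made for this example, not
| details taken from Mutagen.

```python
import os


def snapshot(path: str) -> tuple:
    """Record the metadata seen at scan time."""
    info = os.stat(path, follow_symlinks=False)
    return (info.st_mode, info.st_size, info.st_mtime_ns)


def apply_if_unchanged(path: str, expected: tuple, new_content: bytes) -> bool:
    """Overwrite path only if it still matches what the scan observed.

    Returns False, leaving the file untouched, when it changed after the
    scan, so the change can be picked up by a later synchronization cycle.
    A small window remains between the check and the rename; this narrows
    the race, it does not eliminate it.
    """
    if snapshot(path) != expected:
        return False
    staging = path + ".staging"  # hypothetical staging name
    with open(staging, "wb") as handle:
        handle.write(new_content)
    os.replace(staging, path)  # atomic rename over the original
    return True
```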
|
| [0]: https://mutagen.io/documentation/synchronization#modes
| [1]: https://github.com/mutagen-
| io/mutagen/blob/master/pkg/synchr...
| xenoscopic wrote:
| Also, while Mutagen's exact implementation is novel in a
| number of ways, I would be remiss to not point out that a huge
| amount of academic work in this field was done by Benjamin
| Pierce[0] and later implemented in Unison[1].
|
| [0]: https://www.cis.upenn.edu/~bcpierce/papers/index.shtml#S
| ynch... [1]: https://www.cis.upenn.edu/~bcpierce/unison/
| liketochill wrote:
| I've been using unison for what feels like 14 years. Once
| working it was great but it always took me a while to
| figure out the exact command line options I wanted.
| Beautiful tool.
| AnthonBerg wrote:
| Thank you so much for the great replies!
| xani_ wrote:
| How does that compare to sshfs (with cache/kernel_cache enabled)?
| I've used it a few times when I needed to dev like that and it
| was generally just fine for editing a file; where performance
| tanked was when doing a lot of file I/O at once (say, updating
| a git repo).
| xenoscopic wrote:
| The benchmarks will likely be highly dependent on your use
| case, but SSHFS-style virtual filesystems (specifically those
| backed by FUSE) typically have significantly lower performance
| than something like an APFS/ext4/NTFS filesystem that Mutagen
| could target with synchronization.
|
| All of your readdir()/stat()/open()/read()-style calls will
| suffer significantly on virtual filesystems, and unfortunately
| these get hit a lot by things like IDEs (e.g. when indexing
| code), compilers, and dynamic language runtimes (especially
| PHP).
|
| No tool is at fault in this chain, of course, it's a hard
| problem. Mutagen is able to offer better performance by being a
| little less dynamic and creating "real" copies of all the files
| on a more persistent filesystem.
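|
| A back-of-the-envelope model of why per-call latency dominates
| on network-backed virtual filesystems. The numbers used with it
| are illustrative assumptions, not measurements of SSHFS or
| Mutagen.

```python
def estimated_seconds(files: int, calls_per_file: int, latency_us: int) -> float:
    """Total time spent waiting if every metadata call costs latency_us."""
    return files * calls_per_file * latency_us / 1_000_000


# Assumed workload: an indexer touching 50,000 files with 3 calls each
# (for example stat, open, read).
remote = estimated_seconds(50_000, 3, 1_000)  # assumed 1 ms per call
local = estimated_seconds(50_000, 3, 5)       # assumed 5 microseconds per call
```

| Under these assumptions the same scan waits 150 seconds on the
| remote-backed filesystem and under a second on a local copy.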
| ta988 wrote:
| An advantage of mutagen is that it works on OSes that can't do
| sshfs. It felt faster too, especially with a lot of I/O like
| node modules or other things that touch a lot of files. I never
| ran a benchmark, but it is so much faster (by at least a factor
| of 10) than whatever is in Docker Desktop when populating node
| modules that I don't even need one.
| xenoscopic wrote:
| Mutagen author here -- happy to answer any questions about
| Mutagen[0], its Docker Desktop extension[1], its Compose
| integration[2], or anything else!
|
| [0]: https://mutagen.io/ [1]:
| https://mutagen.io/documentation/docker-desktop-extension [2]:
| https://mutagen.io/documentation/orchestration/compose
| notemaker wrote:
| Any user stories with *vim + mutagen for _large_ remote code
| bases? VS Code Remote is the only thing that has been fast enough
| in my experience, but I would love to be able to use my local
| neovim instance for remote development instead and this tool
| looks promising.
| xenoscopic wrote:
| It should work fine. Many users use Mutagen on multi-GB
| codebases. If we're talking something larger (say 10s of GBs or
| TB-sized monorepos), then there are some tweaks you can do to
| make life with Mutagen a little easier. Feel free to reach out
| to jacob[-at-]mutagen.io if you have a specific use case, or
| pop over to the Mutagen Community Slack Workspace[0] to chat.
|
| [0]: https://mutagen.io/slack
| eddyg wrote:
| This sounds useful. But one question that comes to mind right
| away:
|
| Does Mutagen handle the case where "local tools" (running on a
| completely different architecture than the remote) still need to
| "know" about include/header/library/etc. files from the _remote_
| machine in order to provide working "intelligence" capabilities?
|
| It's one thing to efficiently sync "code", but it's another to
| make local tools fully-aware of the remote system's header files,
| libraries, etc.
| xenoscopic wrote:
| On the synchronization front, Mutagen's only goal is to
| facilitate the synchronization of files (albeit with a focus on
| development-related settings and low latency for a "real time"
| feel). It doesn't attempt to integrate with any higher-level
| tooling (except in the cases of Docker Desktop and Compose,
| which is facilitated via external projects). That sort of
| tooling, language, and framework-specific integration is a bit
| outside the project's target scope (and something that becomes
| very domain-specific).
|
| Mutagen will, however, happily operate between different
| operating systems and architectures, so things like working
| with a remote amd64-based Docker engine from your local
| arm64-based laptop are totally possible.
|
| Also, several external projects (such as DDEV[0] and Garden[1])
| do use Mutagen as a low-level component in their stack to
| provide synchronization that does "know" a bit more about the
| framework that you're using.
|
| [0]: https://ddev.com/ [1]: https://garden.io/
___________________________________________________________________
(page generated 2022-10-16 23:00 UTC)