[HN Gopher] Hyperspace
___________________________________________________________________
Hyperspace
Author : tobr
Score : 498 points
Date : 2025-02-25 15:51 UTC (7 hours ago)
(HTM) web link (hypercritical.co)
(TXT) w3m dump (hypercritical.co)
| gnomesteel wrote:
| I don't need this, storage is cheap, but I'm glad it exists.
| ttoinou wrote:
| Storage isn't cheap on Macs though. One has to pay 2k USD to
| get an 8 TB SSD.
| bob1029 wrote:
| Storage comes in many forms. It doesn't need to be soldered
| to the mainboard to satisfy most use cases.
| ttoinou wrote:
| But cleaning up / making space on your main soldered drive,
| where the OS lives, is quite important.
| NoToP wrote:
| The fact that copying doesn't copy seems dangerous. Like what if
| I wanted to copy for the purpose of modifying the file while
| retaining the original. A trivial example of this might be I have
| a meme template and I want to write text in it while still
| keeping a blank copy of the template.
|
| There's a place for alias file pointers, but lying to the user
| and pretending like an alias is a copy is bound to lead to
| unintended and confusing results
| timabdulla wrote:
| It's copy on write.
| herrkanin wrote:
| It's not a symbolic link - it copies on modification. No need
| to worry!
| hutattedonmyarm wrote:
| It's Copy On Write. When you modify either one it does get
| turned into an actual copy
| IsTom wrote:
| Copy-on-write means that it performs the copy only when you
| make the first change (and only copies the part that changes;
| the rest is used from the original file); until then, copying
| is free.
| mlhpdx wrote:
| Is it file level or block level copy? The latter, I hope.
|
| Update: whoops, missed it in your comment. Block (changed
| bytes) level.
| pca006132 wrote:
| CoW is not aliasing. It will perform the actual copying when
| you modify the file content.
| petercooper wrote:
| I love the model of it being free to scan and see if you'd get
| any benefit, then paying for the actual results. I, too, am a
| packrat, ran it, and got 7GB to reclaim. Not quite worth the
| squeeze for me, but I appreciate it existing!
| sejje wrote:
| I also really like this pricing model.
|
| I wish it were more obvious how to do it with other software.
| Often there's a learning curve in the way before you can see
| the value.
| MBCook wrote:
| He's talked about it on the podcast he was on. So many users
| would buy this, run it once, then save a few gigs and be done.
| So a subscription didn't make a ton of sense.
|
| After all, how many perfect duplicate files do you
| accidentally create in a month?
|
| There's a subscription or buy-forever option for people who
| think that would actually be quite useful to them. But for a
| ton of people, a one-time IAP that gives them a limited amount
| of time to use the program really does make a lot of sense.
|
| And you can always rerun it for free to see if you have enough
| stuff worth paying for again.
| jedbrooke wrote:
| it's very refreshing compared to those "free trials" you have
| to remember to cancel (pro tip: use virtual credit cards which
| you can lock for those so if you forget to cancel the charges
| are blocked)
|
| however has anyone been able to find out from the website how
| much the license actually costs?
| Analemma_ wrote:
| In earlier episodes of ATP when they were musing on possible
| names, one listener suggested the frankly amazing "Dupe Nukem". I
| get that this is a potential IP problem, which is why John didn't
| use it, but surely Duke Nukem is not a zealously-defended brand
| in 2025. I think interest in that particular name has been stone
| dead for a while now.
| InsideOutSanta wrote:
| It's a genius name, but Gearbox owns Duke Nukem. They're not
| exactly dormant. Duke Nukem as a franchise made over a billion
| in revenue. In 2023, Zen released a licensed Duke Nukem pinball
| table, so there is at least some ongoing interest in the
| franchise.
|
| I probably wouldn't have risked it, either.
| mzajc wrote:
| Reminds me of Avira's Luke Filewalker - I wonder if they needed
| any special agreement with Lucasfilm/Disney. I couldn't find
| any info on it, and their website doesn't mention Star Wars at
| all.
| siranachronist wrote:
| https://github.com/pkolaczk/fclones can do the same thing, and
| it's completely free and open source. Terminal-based, though.
| PenguinRevolver wrote:
| brew install fclones
|
| Thanks for the recommendation! Just installed it via homebrew.
| CharlesW wrote:
| _[I was wrong, see below. --cw]_ It doesn't do the same thing.
| An APFS clone/copy-on-write clone is not the same as a hard or
| soft link. https://eclecticlight.co/2019/01/05/aliases-hard-
| links-symli...
| PenguinRevolver wrote:
| Your source points out that:
|
| < _You can also create [APFS (copy on write) clones] in
| Terminal using the command `cp -c oldfilename newfilename`
| where the c option requires cloning rather than a regular
| copy._
|
| `fclones dedupe` uses the same command[1]:
|
|     if cfg!(target_os = "macos") {
|         result.push(format!("cp -c {target} {link}"));
|     }
|
| [1] https://github.com/pkolaczk/fclones/blob/555cde08fde4e700
| b25...
| CharlesW wrote:
| I stand corrected, thank you!
| rahimnathwani wrote:
| Hyperspace said I can save 10GB.
|
| But then I ran this command and saved over 20GB:
|     brew install fclones
|     cd ~
|     fclones group . | fclones dedupe
|
| I've used fclones before in the default mode (create hard
| links) but this is the first time I've run it at the top level
| of my home folder, in dedupe mode (i.e. using APFS clones).
| Fingers crossed it didn't wreck anything.
| diimdeep wrote:
| Nice. Compression at the filesystem level can also save a lot
| of space, and with current CPU speeds it is completely
| transparent. It is a feature from HFS+ that still works in
| APFS but is no longer officially supported. What is wrong with
| you, Apple?
|
| This tool to enable compression is free and open source
|
| https://github.com/RJVB/afsctool
|
| Also, a note about APFS vs HFS+: if you use an HDD, e.g. as
| backup media for Time Machine, HFS+ is a must over APFS, since
| APFS is optimised only for SSDs (random access).
|
| https://bombich.com/blog/2019/09/12/analysis-apfs-enumeratio...
|
| https://larryjordan.com/blog/apfs-is-not-yet-ready-for-tradi...
|
| The not-so-smart Time Machine setup utility forcibly
| re-creates APFS on HDD media, so you have to manually create
| an HFS+ volume (e.g. with Disk Utility) and then use a
| terminal command to add that volume as the TM destination:
|
| `sudo tmutil setdestination /Volumes/TM07T`
| herrkanin wrote:
| As a web dev, it's been fun listening to Accidental Tech Podcast
| where Siracusa has been talking (or ranting) about the ins and
| outs of developing modern mac apps in Swift and SwiftUI.
| Analemma_ wrote:
| The part where he said making a large table in HTML and
| rendering it with a web view was orders of magnitude faster
| than using the SwiftUI native platform controls made me bash my
| head against my desk a couple times. What are we doing here,
| Apple.
| mohsen1 wrote:
| Hacker News loves to hate Electron apps. In my experience
| ChatGPT on Mac (which I assume is fully native) is nearly
| impossible to use because I have a lot of large chats in my
| history but the website works much better and faster. ChatGPT
| website packed in Electron would've been much better. In
| fact, I am using a Chrome "PWA App" for ChatGPT now instead
| of the native app.
| RandomDistort wrote:
| Someone more experienced than me could probably comment on
| this more, but theoretically is it possible for Electron
| production builds to become more efficient by having a much
| longer build process and stripping out all the unnecessary
| parts of Chromium?
| wat10000 wrote:
| It's possible to make bad apps with anything. The
| difference is that, as far as I can tell, it's not possible
| to make good apps with Electron.
| avtar wrote:
| > In my experience ChatGPT on Mac (which I assume is fully
| native)
|
| If we are to believe ChatGPT itself: "The ChatGPT macOS
| desktop app is built using Electron, which means it is
| primarily written in JavaScript, HTML, and CSS"
| spiderfarmer wrote:
| As a web dev I must say that this segment made me happy and
| thankful for the browser team that really knows how to
| optimize.
| megaman821 wrote:
| I wish there were modern benchmarks against browser engines.
| A long time ago native apps were much faster at rendering UI
| than the browser, but that was many performance rewrites ago, so I
| wonder how browsers perform now.
| airstrike wrote:
| Shoutout to iced, my favorite GUI toolkit, which isn't even
| in 1.0 yet but can do that with ease and faster than anything
| I've ever seen: https://github.com/iced-rs/iced
|
| https://github.com/tarkah/iced_table is a third-party widget
| for tables, but you can roll your own or use other
| alternatives too
|
| It's in Rust, not Swift, but I think switching from the
| latter to the former is easier than when moving away from
| many other popular languages.
| BobAliceInATree wrote:
| SwiftUI is a joke when it comes to performance. Even Marco's
| Overcast stutters when displaying a table of a dozen rows (of
| equal height).
|
| That being said, it's not quite an apples to apples
| comparison, because SwiftUI or UIKit can work with basically
| an infinite number of rows, whereas HTML will eventually get
| to a point where it won't load.
| wpm wrote:
| I love the new Overcast's habit of mistaking my scroll
| gestures for taps when browsing the sections of a podcast.
| divan wrote:
| What are the potential risks or problems of such conversion of
| duplicates into APFS clones?
| captn3m0 wrote:
| The linked docs cover this in detail.
| pca006132 wrote:
| Is this the dedup function provided by other FS?
| coder543 wrote:
| I think the term to search for is reflink. Btrfs is one
| example: https://btrfs.readthedocs.io/en/latest/Reflink.html
|
| Like with Hyperspace, you would need to use a tool that can
| identify which files are duplicates, and then convert them into
| reflinks.
| pca006132 wrote:
| I thought reflink is provided by the underlying FS, and
| Hyperspace is a dedup tool that finds the duplicates.
| coder543 wrote:
| Yes. Hyperspace is finding the identical files and then
| replacing all but one copy with a reflink copy using the
| filesystem's reflink functionality.
|
| When you asked about the filesystem, I assumed you were
| asking about which filesystem feature was being used, since
| hyperspace itself is not provided by the filesystem.
|
| Someone else mentioned[0] fclones, which can do this task
| of finding and replacing duplicates with reflinks on more
| than just macOS, if you were looking for a userspace tool.
|
| [0]: https://news.ycombinator.com/item?id=43173713
| MBCook wrote:
| Hyperspace uses built in APFS features, it just applies
| them to existing files.
|
| You only get CoW on APFS if you copy a file with certain
| APIs or tools.
|
| If a program copies the data manually, if you copied a
| duplicate to somewhere on your disk from some other source, or
| if your files already existed on the file system when you
| converted to APFS because you've been carrying them around for
| a long time, then you'd have duplicates.
|
| APFS doesn't _look_ for duplicates at any point. It just
| keeps track of those that it knows are duplicates because
| of copy operations.
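|
| A concrete sketch of that distinction (paths are made up, and
| this assumes an APFS volume; as far as I know, FileManager
| copies, Finder copies, and `cp -c` all go through the
| clone-aware path):
|
|     import Foundation
|
|     let fm = FileManager.default
|     let src = URL(fileURLWithPath: "/tmp/big.bin")
|
|     do {
|         // Clone-aware copy: on APFS the new file shares data blocks
|         // with the original, so it takes roughly no extra space.
|         try fm.copyItem(at: src,
|                         to: URL(fileURLWithPath: "/tmp/clone.bin"))
|
|         // A "manual" copy: the program reads the bytes and writes a
|         // brand-new file. APFS has no idea the contents are
|         // identical, so they're stored twice (until something like
|         // Hyperspace reclaims them).
|         let bytes = try Data(contentsOf: src)
|         try bytes.write(to: URL(fileURLWithPath: "/tmp/dupe.bin"))
|     } catch {
|         print("copy failed: \(error)")
|     }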
| zerd wrote:
| You can do the same with `cp -c` on macOS, or `cp
| --reflink=always` on Linux, if your filesystem supports it.
| kevincox wrote:
| Yes, Linux has a system call to do this for any filesystem with
| reflink support (and it is safe and atomic). You need a
| "driver" program to identify duplicates but there are a handful
| out there. I've used https://github.com/markfasheh/duperemove
| and was very pleased with how it worked.
| exitb wrote:
| What are examples of files that make up the "dozens of gigabytes"
| of duplicated data?
| xnx wrote:
| There are some CUDA files that every local AI app installs,
| and they take multiple GB.
| wruza wrote:
| Also models that various AI libraries and plugins love to
| autodownload into custom locations. Python folks definitely
| need to learn caching, symlinks, asking a user where to store
| data, or at least logging where they actually do it.
| butlike wrote:
| audio files; renders, etc.
| password4321 wrote:
| iMovie used to copy video files etc. into its "library".
| zerd wrote:
| .terraform, rust target directory, node_modules.
| jarbus wrote:
| In my experience, Macs use up a ridiculous amount of "System"
| storage for no reason that users can't delete. I've grown tired
| of family members asking me to help them free up storage that I
| can't even find. That's the major issue from what I've seen;
| unless this app prevents Apple from deliberately eating up 50%+ of the
| storage space of a machine, this doesn't do much for the people I
| know.
| ezfe wrote:
| There's no magic around it, macOS just doesn't do a good job
| explaining it using the built in tools. Just use Daisy Disk or
| something. It's all there and can be examined.
| p_ing wrote:
| These are often Time Machine snapshots. Nuking those can free
| up quite a bit of space.
|
|     sudo tmutil listlocalsnapshots /
|     sudo tmutil deletelocalsnapshots <date_value_of_snapshot>
| Jaxan wrote:
| Even without Time Machine, loads of storage is spent on
| "System", especially now with Apple Intelligence (even when
| turned off).
| p_ing wrote:
| Apple "Intelligence" gets its own category in 15.3.1.
| sir_eliah wrote:
| There's a cross-platform open-source version of this program:
| https://github.com/qarmin/czkawka
| spiderfarmer wrote:
| That's not remotely comparable.
| nulld3v wrote:
| I don't think czkawka supports deduplication via reflink so it's
| not exactly the same thing. fclones as linked by another user
| is more similar: https://news.ycombinator.com/item?id=43173713
| bhouston wrote:
| I gave it a try on my massive folder of NodeJS projects but it
| only found 1GB of savings on an 8.1GB folder.
|
| I then tried again including my user home folder (731K files,
| 127K folders, 2755 eligible files) to hopefully catch more
| savings and I only ended up at 1.3GB of savings (300MB more than
| just what was in the NodeJS folders.)
|
| I tried to scan System and Library but it refused to do so
| because of permission issues.
|
| I think the fact that I use pnpm for my package manager has made
| my disk space usage already pretty near optimal.
|
| Oh well. Neat idea. But the current price is too high to justify
| this. Also I would want it as a background process that runs once
| a month or something.
| lou1306 wrote:
| > it only found 1GB of savings on an 8.1GB folder.
|
| You "only" found that 12% of the space you are using is wasted?
| Am I reading this right?
| warkdarrior wrote:
| The relevant number (missing from above) is the total amount
| of space on that storage device. If it saves 1GB on a 8TB
| drive, it's not a big win.
| jy14898 wrote:
| If it saved 8.1GB, by your measure it'd also not be a big
| win?
| horsawlarway wrote:
| This is basically only a win on macOS, and only because
| Apple charges through the nose for disk space.
|
| Ex - On my non-apple machines, 8GB is trivial. I load
| them up with the astoundingly cheap NVMe drives in the
| multiple terabyte range (2TB for ~$100, 4TB for ~$250)
| and I have a cheap NAS.
|
| So that "big win" is roughly 40 cents of hardware costs
| on the direct laptop hardware. Hardly worth the time and
| effort involved, even if the risk is zero (and I don't
| trust it to be zero).
|
| If it's just "storage" and I don't need it fast (the
| perfect case for this type of optimization) I throw it on
| my NAS where it's cheaper still... Ex - it's not 40 cents
| saved, it's ~10.
|
| ---
|
| At least for me, 8GB is no longer much of a win. It's a
| rounding error on the last LLM model I downloaded.
|
| And I'd suggest that basically anyone who has the ability
| to not buy extortionately priced drives soldered onto a
| mainboard is not really winning much here either.
|
| I picked up a quarter off the ground on my walk last
| night. That's a bigger win.
| borland wrote:
| > This is basically only a win on macOS, and only because
| Apple charges through the nose for disk space
|
| You do realize that this software is only available on
| macOS, and only works because of Apple's APFS filesystem?
| You're essentially complaining that medicine is only a
| win for people who are sick.
| horsawlarway wrote:
| > and only works because of Apple's APFS filesystem
|
| There are lots of other file systems that support this
| kind of deduplication...
|
| Like ZFS that the author of the software explicitly
| mentions in his write up
| https://www.truenas.com/docs/references/zfsdeduplication/
|
| Or Btrfs ex: https://kb.synology.com/en-
| id/DSM/help/DSM/StorageManager/vo...
|
| Or hell, even NTFS: https://learn.microsoft.com/en-
| us/windows-server/storage/dat...
|
| This is NOT a novel or new feature in filesystems...
| Basically any CoW file system will do it, and lots of
| other filesystems have hacks built on top to support this
| kind of feature.
|
| ---
|
| My point is that "people are only sick" because the
| company is pricing storage outrageously. Not that Apple
| is the only offender in this space - but man are they the
| most egregious.
| oneeyedpigeon wrote:
| It should be proportional to the total _used_ space, not
| the space available. The previous commenter said it was a 1
| GB savings from ~8 GB of used space; that 's equally
| significant whether it happens on a 10 GB drive or a 10 TB
| one.
| horsawlarway wrote:
| He picked node_modules because it's highly likely to
| encounter redundant files there.
|
| If you read the rest of the comment he only saved another
| 30% running his entire user home directory through it.
|
| So this is not a linear trend based on space used.
| borland wrote:
| He "only" saved 30%? That's amazing. I really doubt most
| people are going to get anywhere near that.
|
| When I run it on my home folder (Roughly 500GB of data) I
| find 124 MB of duplicated files.
|
| At this stage I'd like it to tell me what those files are
| - The dupes are probably dumb ones that I can simply go
| delete by hand, but I can understand why he'd want people
| to pay up first, as by simply telling me what the dupes
| are he's proved the app's value :-)
| wlesieutre wrote:
| Another 30% more than the 1GB saved in node modules, for
| 1.3GB total. Not 30% of total disk space.
|
| For reference, from the comment they're talking about:
|
| _> I then tried again including my user home folder
| (731K files, 127K folders, 2755 eligible files) to
| hopefully catch more savings and I only ended up at 1.3GB
| of savings (300MB more than just what was in the NodeJS
| folders.)_
| bhouston wrote:
| > He "only" saved 30%? That's amazing. I really doubt
| most people are going to get anywhere near that.
|
| You misunderstood my comment. I ran it on my home folder
| which contains 165GB of data, and it found 1.3GB in
| savings. That isn't significant enough for me to care about
| because I currently have 225GB free of my 512GB drive.
|
| BTW I highly recommend the free "disk-inventory-x"
| utility for MacOS space management.
| rconti wrote:
| Absolutely, 100% backwards. The tool cannot save space from
| disk space that is not scanned. Your "not a big win"
| comment assumes that there is _no space left to be
| reclaimed on the rest of the disk_. Or that the disk is not
| empty, or that the rest of the disk can't be reclaimed at
| an even higher rate.
| bhouston wrote:
| I have a 512GB drive in my MacBook Air M3 with 225GB free.
| Saving 1GB is 0.5% of my total free space, and it is
| definitely "below my line." It is a neat tool still in
| concept.
|
| When I ran it on my home folder with 165GB of data it only
| found 1.3GB of savings. This isn't that significant to me and
| it isn't really worth paying for.
|
| BTW I highly recommend the free "disk-inventory-x" utility
| for MacOS space management.
| timerol wrote:
| Your original comment did not mention that your home folder
| was 165 GB, which is extremely relevant here
| zamalek wrote:
| pnpm tries to be a drop-in replacement for npm, and dedupes
| automatically.
| diggan wrote:
| > pnpm tries to be a drop-in replacement for npm
|
| True
|
| > and dedupes automatically
|
| Also true.
|
| But the way you put them after each other makes it sound
| like npm does de-duplication, and since pnpm tries to be a
| drop-in replacement for npm, so does pnpm.
|
| So for clarification: npm doesn't do de-duplication across
| all your projects, and that in particular was one of the more
| useful features that pnpm brought to the ecosystem when it
| first arrived.
| MrJohz wrote:
| More importantly, pnpm installs packages as symlinks, so the
| deduping is rather more effective. I believe it also tries to
| mirror the NPM folder structure and style of deduping as
| well, but if you have two of the same package installed
| anywhere on your system, pnpm will only need to download and
| save one copy of that package.
| spankalee wrote:
| npm's --install-strategy=linked flag is supposed to do this
| too, but it has been broken in several ways for years.
| modzu wrote:
| What's the price? It doesn't seem to be published anywhere.
| scblock wrote:
| It's on the Mac App Store so you'll find the pricing there.
| Looks like $10 for one month (one time use maybe?), $20 for a
| year, $50 lifetime.
| diggan wrote:
| I have both a Mac and an iPhone, but I happen to be using my
| Linux computer right now, and it seems like the store page
| (https://apps.apple.com/us/app/hyperspace-reclaim-disk-
| space/...) is not showing the price, probably because I'm
| not actively on an Apple device? Seems like poor UX even
| for us Mac users.
| oneeyedpigeon wrote:
| I see it on my android phone. It's a free app but the
| subs are an in-app purchase so you need to hunt that
| section down.
| pimlottc wrote:
| It's buried under a drop-down in the "Information"
| section, under "In-App Purchases". I agree, it's not the
| greatest.
| diggan wrote:
| Ah, you're absolutely right, missed that completely.
| Buried at the bottom of the page :) Thanks for pointing
| it out.
| MBCook wrote:
| It's a side effect of the terrible store design.
|
| It's a free app because you don't have to buy it to run
| it. It will tell you how much space it can save you for
| free. So you don't have to waste $20 to find out it only
| would've been 2kb.
|
| But that means the parts you actually have to buy are in
| app purchases, which are always hidden on the store
| pages.
| piqufoh wrote:
| £9.99 a month, £19.99 for one year, £49.99 for life (App
| Store purchase prices visible once you've scanned a
| directory).
| p_ing wrote:
| > I tried to scan System and Library but it refused to do so
| because of permission issues.
|
| macOS has a sealed volume which is why you're seeing permission
| errors.
|
| https://support.apple.com/guide/security/signed-system-volum...
| bhouston wrote:
| For some reason "disk-inventory-x" will scan those folders. I
| used that amazing tool to prune leftover Unreal Engine files
| and Docker caches when they weren't in my home folder.
| The tool asks for a ton of permissions when you run it in
| order to do the scan though, which is a bit annoying.
| alwillis wrote:
| It's not obvious but the system folder is on a separate,
| secure volume; the Finder does some trickery to make the
| system volume and the data volume appear as one.
|
| In general, you don't want to mess with that.
| kdmtctl wrote:
| Didn't have time to try it myself, but there is an option for
| the minimum file size to consider, clearly visible in the App
| Store screenshot. I suppose it was introduced to minimize
| comparison buffers. It is possible that node modules slide
| under this size and weren't considered.
| jbverschoor wrote:
| Does it preserve all metadata, extended attributes, and alternate
| streams/named forks?
| atommclain wrote:
| He spoke to this on No Longer Very Good, episode 626 of The
| Accidental Tech Podcast. Time stamp ~1:32:30
|
| It tries, but there are some things it can't perfectly preserve
| like the last access time. In instances where it can't
| duplicate certain types of extended attributes or ownership
| permissions, it will not perform the operation.
|
| https://podcasts.apple.com/podcast/id617416468?i=10006919599...
| jbverschoor wrote:
| Well, the FAQ also states that people should notify them if
| attributes are missing, so it really sounds like it's a
| predefined list instead of just enumerating through
| everything.
|
| No word about alternate data streams. I'll pass for now...
| although it's nice to see how many duplicates you have.
| criddell wrote:
| The FAQ talks about this a little:
|
| Q: Does Hyperspace preserve file metadata during reclamation?
|
| A: When Hyperspace replaces a file with a space-saving clone,
| it attempts to preserve all metadata associated with that file.
| This includes the creation date, modification date,
| permissions, ownership, Finder labels, Finder comments, whether
| or not the file name extension is visible, and even resource
| forks. If the attempt to preserve any of these piece of
| metadata fails, then the file is not replaced.
|
| If you find some piece of file metadata that is not preserved,
| please let us know.
|
| Q: How does Hyperspace handle resource forks?
|
| A: Hyperspace considers the contents of a file's resource fork
| to be part of the file's data. Two files are considered
| identical only if their data and resource forks are identical
| to each other.
|
| When a file is replaced by a space-saving clone during
| reclamation, its resource fork is preserved.
| bob1029 wrote:
| > There is no way for Hyperspace to cooperate with all other
| applications and macOS itself to coordinate a "safe" time for
| those files to be replaced, nor is there a way for Hyperspace
| to forcibly take exclusive control of those files.
|
| This got me wondering why the filesystem itself doesn't run a
| similar kind of deduplication process in the background.
| Presumably, it is at a level of abstraction where it could safely
| manage these concerns. What could be the downsides of having this
| happen automatically within APFS?
| pizzafeelsright wrote:
| data loss is the largest concern
|
| I still do not trust de-duplication software.
| dylan604 wrote:
| Even using sha-256 or greater type of hashing, I'd still have
| concerns about letting a system make deletion decisions
| without my involvement. I've even been part of de-dupe
| efforts, so maybe my hesitation is just because I wrote some
| of the code and I know I'm not perfect in my coding or even
| my algo decision trees. I know that any mistake I made would
| not be of malice but just ignorance or other stupid mistake.
|
| I've done the whole compare-every-file-via-hashing thing and
| then logged each of the matches for humans to compare, but
| never has any of that been allowed to mv/rm/ln -s anything. I
| feel my imposter syndrome in this regard is not a bad thing.
| borland wrote:
| Now you understand why this app costs more than 2x the
| price of alternatives such as diskDedupe.
|
| Any halfway-competent developer can write some code that
| does a SHA256 hash of all your files and uses the Apple
| filesystem API's to replace duplicates with shared-clones.
| I know swift, I could probably do it in an hour or two.
| Should you trust my bodgy quick script? Heck no.
|
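| (Roughly, that bodgy quick script - a sketch only, with
| made-up names, whole-file reads, and none of the metadata and
| safety checks Hyperspace does - might look like this:)
|
|     import Foundation
|     import CryptoKit
|
|     // Group regular files under a directory by the SHA256 of their
|     // full contents. Reads each file into memory; fine for a sketch.
|     func groupBySHA256(under root: URL) -> [String: [URL]] {
|         var groups: [String: [URL]] = [:]
|         let fm = FileManager.default
|         guard let walker = fm.enumerator(
|             at: root, includingPropertiesForKeys: [.isRegularFileKey]
|         ) else { return groups }
|         for case let url as URL in walker {
|             guard (try? url.resourceValues(forKeys: [.isRegularFileKey]))?
|                       .isRegularFile == true,
|                   let data = try? Data(contentsOf: url) else { continue }
|             let digest = SHA256.hash(data: data)
|                 .map { String(format: "%02x", $0) }.joined()
|             groups[digest, default: []].append(url)
|         }
|         return groups
|     }
|
|     // Keep the first file of each identical group and replace the
|     // rest with clones: clone the keeper to a temporary name, then
|     // swap the temporary in over the duplicate.
|     func reclaim(_ groups: [String: [URL]]) throws {
|         let fm = FileManager.default
|         for dupes in groups.values where dupes.count > 1 {
|             let keeper = dupes[0]
|             for victim in dupes.dropFirst() {
|                 let temp = victim.appendingPathExtension("clonetmp")
|                 try fm.copyItem(at: keeper, to: temp) // clone-aware on APFS
|                 _ = try fm.replaceItemAt(victim, withItemAt: temp)
|             }
|         }
|     }
|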
| The author - John Siracusa - has been a professional
| programmer for decades and is an exceedingly meticulous
| kind of person. I've been listening to the ATP podcast
| where they've talked about it, and the app has undergone an
| absolute ton of testing. Look at the guardrails on the FAQ
| page https://hypercritical.co/hyperspace/ for an example of
| some of the extra steps the app takes to keep things safe.
| Plus you can review all the proposed file changes before
| you touch anything.
|
| You're not paying for the functionality, but rather the
| care and safety that goes around it. Personally, I would
| trust this app over just about any other on the mac.
| btilly wrote:
| More than TeX or SQLite?
| criddell wrote:
| > I'd still have concerns about letting a system make
| deletion decisions without my involvement
|
| You are involved. You see the list of duplicates and can
| review them as carefully as you'd like before hitting the
| button to write the changes.
| dylan604 wrote:
| Yeah, the lack of involvement was more in response to ZFS
| doing this, not this app. I could have crossed the streams
| with other threads about ZFS if it's not directly in this
| thread
| axus wrote:
| Question for the developer: what's your liability if user
| files are corrupted?
| codazoda wrote:
| Most EULAs would disclaim liability for data loss and
| suggest users keep good backups. I haven't read a EULA for
| a long time, but I think most of them do so.
| borland wrote:
| I can't find a specific EULA or disclaimer for the
| Hyperspace app, but given that the EULAs for major
| things like Microsoft Office basically say "we offer you
| no warranty or recourse no matter what this software
| does" I would hardly expect an indie app to offer
| anything like that
| albertzeyer wrote:
| > This got me wondering why the filesystem itself doesn't run a
| similar kind of deduplication process in the background.
|
| I think that ZFS actually does this.
| https://www.truenas.com/docs/references/zfsdeduplication/
| pmarreck wrote:
| It's considered an "expensive" configuration that is only
| good for certain use-cases, though, due to its memory
| requirements.
| abrookewood wrote:
| Yes true, but that page also covers some recent
| improvements to de-duplication that might assist.
| p_ing wrote:
| Windows Server does this for NTFS and ReFS volumes. I used it
| quite a bit on ReFS w/ Hyper-V VMs and it worked _wonders_. Cut
| my storage usage down by ~45% with a majority of Windows Server
| VMs running a mix of 2016/2019 at the time.
| borland wrote:
| Yep. At a previous job we had a file server that we published
| Windows build output to.
|
| There were about 1000 copies of the same pre-requisite .NET
| and VC++ runtimes (each build had one) and we only paid for
| the cost of storing it once. It was great.
|
| It is worth pointing out though, that on Windows Server this
| deduplication is a background process; When new duplicate
| files are created, they genuinely are duplicates and take up
| extra space, but once in a while the background process comes
| along and "reclaims" them, much like the Hyperspace app here
| does.
|
| Because of this (the background sweep process is expensive),
| it doesn't run all the time and you have to tell it which
| directories to scan.
|
| If you want "real" de-duplication, where a duplicate file
| will never get written in the first place, then you need
| something like ZFS
| sterlind wrote:
| hey, it's defrag all over again!
|
| _(not really, since it's not fragmentation, but
| conceptually similar)_
| p_ing wrote:
| Both ZFS and WinSvr offer "real" dedupe. One is on-write,
| which requires a significant amount of available memory,
| the other is on a defined schedule, which uses considerably
| less memory (300MB + 10MB/TB).
|
| ZFS is great if you believe you'll exceed some threshold of
| space while writing. I don't personally plan my volumes
| with that in mind but rather make sure I have some amount
| of excess free space.
|
| WinSvr allows you to disable dedupe if you want (don't know
| why you would) whereas ZFS is a one-way street without
| exporting the data.
|
| Both have pros and cons. I can live with the WinSvr cons
| while ZFS cons (memory) would be outside of my budget, or
| would have been at the particular time with the particular
| system.
| taneliv wrote:
| On ZFS it consumes a lot of RAM. In part I think this is
| because ZFS does it on the block level, and has to keep track
| of a lot of blocks to compare against when a new one is written
| out. It might be easier on resources if implemented on the file
| level. Not sure if the implementation would be simpler or more
| complex.
|
| It might also be a little unintuitive that modifying one byte
| of a large file would result in a lot of disk activity, as the
| file system would need to duplicate the file again.
| gmueckl wrote:
| Files are always represented as lists of blocks or block
| spans within a file system. Individual blocks could in theory
| be partially shared between files at the complexity cost of a
| reference counter for each block. So changing a single byte
| in a copy on write file could take the same time regardless
| of file size because only the affected block would have to be
| duplicated. I don't know at all how macOS implements this
| copy-on-write scheme, though.
| MBCook wrote:
| APFS is a copy on write filesystem if you use the right
| APIs, so it does what you describe but only for entire
| files.
|
| I believe as soon as you change a single byte you get a
| complete copy that's your own.
|
| And that's how this program works. It finds perfect
| duplicates and then effectively deletes and replaces them
| with a copy of the existing file so in the background
| there's only one copy of the bits on the disk.
| mintplant wrote:
| I suppose this means that you could find yourself
| unexpectedly out of disk space in unintuitive ways, if
| you're only trying to change one byte in a cloned file
| but there isn't enough space to copy its entire contents?
| pansa777 wrote:
| It doesn't work like you think. If you change one byte of a
| duplicated file, only that "byte" is changed on disk (a
| "byte" in quotes because technically it's not a byte but a
| block).
|
| As far as I understand, it works like the reflink feature in
| modern Linux filesystems. If so, that's really cool, and also
| a bit better than ZFS's snapshots. I'm a newbie on macOS, but
| it looks amazing.
| MBCook wrote:
| I'm not sure if it works on a file or block level for
| CoW, but yes.
|
| However APFS gives you a number of space related foot-
| guns if you want. You can overcommit partitions, for
| example.
|
| It also means if you have 30 GB of files on disk that
| could take up anywhere from a few hundred K to 30 GB of
| actual data depending on how many dupes you have.
|
| It's a crazy world, but it provides some nice features.
| tonyedgecombe wrote:
| > I believe as soon as you change a single byte you get a
| complete copy that's your own.
|
| I think it stores a delta:
|
| https://en.m.wikipedia.org/wiki/Apple_File_System#Clones
| alwillis wrote:
| That's not how this works. Nothing is deleted. It creates
| zero-space clones of existing files.
|
| https://en.wikipedia.org/wiki/Apple_File_System?wprov=sft
| i1#...
| amzin wrote:
| Is there a FS that keeps only diffs in clone files? It would
| be neat
| rappatic wrote:
| I wondered that too.
|
| If we only have two files, A and its duplicate B with some
| changes as a diff, this works pretty well. Even if the user
| deletes A, the OS could just apply the diff to the file on
| disk, unlink A, and assign B to that file.
|
| But if we have A and two different diffs B1 and B2, then
| try to delete A, it gets a little murkier. Either you do
| the above process and recalculate the diff for B2 to make
| it a diff of B1; or you keep the original A floating around
| on disk, not linked to any file.
|
| Similarly, if you try to modify A, you'd need to
| recalculate the diffs for all the duplicates.
| Alternatively, you could do version tracking and have the
| duplicate's diffs be on a specific version of A. Then every
| file would have a chain of diffs stretching back to the
| original content of the file. Complex but could be useful.
|
| It's certainly an interesting concept but might be more
| trouble than it's worth.
| abrookewood wrote:
| ZFS does this by de-duplicating at the block level, not
| the file level. It means you can do what you want without
| needing to keep track of a chain of differences between
| files. Note that de-duplication on ZFS has had issues in
| the past, so there is definitely a trade-off. A newer
| version of de-duplication sounds interesting, but I don't
| have any experience with it:
| https://www.truenas.com/docs/references/zfsdeduplication/
| UltraSane wrote:
| VAST storage does something like this. Unlike most storage
| arrays, which identify identical blocks by hash and store them
| only once, VAST uses a content-aware hash, so hashes of
| similar blocks are also similar. They store a reference
| block for each unique hash and then when new data comes in
| and is hashed the most similar block is used to create byte
| level deltas against. In practice this works extremely
| well.
|
| https://www.vastdata.com/blog/breaking-data-reduction-
| trade-...
| abrookewood wrote:
| ZFS: "The main benefit of deduplication is that, where
| appropriate, it can greatly reduce the size of a pool and
| the disk count and cost. For example, if a server stores
| files with identical blocks, it could store thousands or
| even millions of copies for almost _no extra disk space_. "
| (emphasis added)
|
| https://www.truenas.com/docs/references/zfsdeduplication/
| alwillis wrote:
| That's how APFS works; it uses delta extents for tracking
| differences in clones: https://en.wikipedia.org/wiki/Delta_
| encoding?wprov=sfti1#Var...
| abrookewood wrote:
| In regards to the second point, this isn't correct for ZFS:
| "If several files contain the same pieces (blocks) of data or
| any other pool data occurs more than once in the pool, ZFS
| stores just one copy of it. Instead of storing many copies of
| a book it stores one copy and an arbitrary number of pointers
| to that one copy." [0]. So changing one byte of a large file
| will not suddenly result in writing the whole file to disk
| again.
|
| [0] https://www.truenas.com/docs/references/zfsdeduplication/
| karparov wrote:
| Not the whole file but it would duplicate the block. GP
| didn't claim that the whole file is copied.
| btilly wrote:
| This applies to modifying a byte. But inserting a byte will
| change every block from then on, and will force a rewrite.
|
| Of course, that is true of most filesystems.
| asdfman123 wrote:
| If Apple is anything like where I work, there's probably a
| three-year-old bug ticket in their system about it and no real
| mandate from upper management to allocate resources for it.
| ted_dunning wrote:
| This is commonly done with compression on block storage
| devices. That fails, of course, if the file system is
| encrypting the blocks it sends down to the device.
|
| Doing deduplication at this level is nice because you can
| dedupe across file systems. If you have, say, a thousand
| systems that all have the same OS files you can save vats of
| storage. Many times, the only differences will be system
| specific configurations like host keys and hostnames. No single
| filesystem could recognize this commonality.
|
| This fails when the deduplication causes you to have fewer
| replicas of files with intense usage. To take the previous
| example, if you boot all thousand machines at the same time,
| you will have a prodigious I/O load on the kernel images.
| UltraSane wrote:
| NTFS supports deduplication but it is only available on Server
| versions which is very annoying.
| nielsbot wrote:
| Disk Utility.app manages to keep the OS running while making
| the disk exclusive-access... I wonder how it does that.
| 999900000999 wrote:
| A $20 1-year license for something that probably has a FOSS
| equivalent on Linux...
|
| However, considering Apple will never ever ever allow user
| replaceable storage on a laptop, this might be worth it.
| ezfe wrote:
| The cost reflects the fact that people won't use it regularly.
| The developer is offering a lifetime unlock, lower-cost levels
| for shorter timeframes, etc.
| p_ing wrote:
| The developer does need to make up for the $100 yearly
| privilege of publishing the app to the App Store.
| jeroenhd wrote:
| I have yet to see a GUI variant of deduplication software for
| Linux. There are plenty of command line tools, which probably
| can be ported to macOS, but there's no user friendly tool to
| just click through as far as I know.
|
| There's value in convenience. I wouldn't pay for a yearly
| license (that price seems more than fair for a "version
| lifetime" price to me?) but seeing as this tool will probably
| need constant maintenance as Apple tweaks and changes APFS over
| time, combined with the mandatory Apple taxes for publishing
| software like this, it's not too awful.
| 999900000999 wrote:
| $50 for a lifetime license.
|
| Which really means up until the dev gets bored, which can be
| as short as 18 months.
|
| I wouldn't mind something like this versioned to the OS: $20
| for the current OS, and $10 for every significant update.
| artimaeis wrote:
| The Mac App Store (and all of Apple's App Stores) doesn't
| enable this sort of licensing. It's exactly the sort of
| thing that drives a lot of developers to independent
| distribution.
|
| That's why we see so many more subscription-based apps
| these days, application development is an ongoing process
| with ongoing costs, so it needs to have ongoing income. But
| the traditional buy-it-once app pricing doesn't enable that
| long-term development and support. The app store supports
| subscriptions though, so now we get way more subscription-
| based apps.
|
| I really think Siracusa came up with a clever pricing
| scheme here, given his desire to use the App Store for
| distribution.
| 999900000999 wrote:
| Okay I stand corrected.
| ZedZark wrote:
| I did this with two scripts - one that produces and caches sha1
| sums of files, and another that consumes the output of the first
| (or any of the *sum progs) and produces stats about duplicate
| files, with options to delete or hard-link them.
| strunz wrote:
| I wonder how many comments about hard links there will be in
| this thread from people misunderstanding what this app does.
| theamk wrote:
| if a file is not going to be modified (in the low-level sense
| of open("w") on the filename, as opposed to rename-and-create-
| new), then reflinks (what this app does) and hardlinks act
| somewhat identically.
|
| For example if you have multiple node_modules, or app
| installs, or source photos/videos (ones you don't edit), or
| music archives, then hardlinks work just fine.
| albertzeyer wrote:
| I wrote a similar (but simpler) script which replaces a file
| with a hardlink if it has the same content.
|
| My main motivation was for the packages of Python virtual envs,
| where I often have similar packages installed, and even if
| versions are different, many files would still match. Some of the
| packages are quite huge, e.g. Numpy, PyTorch, TensorFlow, etc. I
| got quite some disk space savings from this.
|
| https://github.com/albertz/system-tools/blob/master/bin/merg...
| andrewla wrote:
| This does not use hard links or symlinks; this uses a feature
| of the filesystem that allows the creation of copy-on-write
| clones. [1]
|
| [1] https://en.wikipedia.org/wiki/Apple_File_System#Clones
| gurjeet wrote:
| So albertzeyer's script can be adapted to use the `cp -c`
| command to achieve the same effect as Hyperspace.
| diggan wrote:
| > Like all my apps, Hyperspace is a bit difficult to explain.
| I've attempted to do so, at length, in the Hyperspace
| documentation. I hope it makes enough sense to enough people that
| it will be a useful addition to the Mac ecosystem.
|
| Am I missing something, or isn't it a "file de-duplicator" with a
| nice UI/UX? Sounds pretty simple to describe, and tells you why
| it's useful with just two words.
| protonbob wrote:
| No because it isn't getting rid of the duplicate, it's using a
| feature of APFS that allows for duplicates to exist separately
| but share the same internal data.
| yayoohooyahoo wrote:
| Is it not the same as a hard link (which I believe is
| supported on Mac too)?
| andrewla wrote:
| My understanding is that it is a copy-on-write clone, not a
| hard link. [1]
|
| > Q: Are clone files the same thing as symbolic links or
| hard links?
|
| > A: No. Symbolic links ("symlinks") and hard links are
| ways to make two entries in the file system that share the
| same data. This might sound like the same thing as the
| space-saving clones used by Hyperspace, but there's one
| important difference. With symlinks and hard links, a
| change to one of the files affects all the files.
|
| > The space-saving clones made by Hyperspace are different.
| Changes to one clone file do not affect other files. Cloned
| files should look and behave exactly the same as they did
| before they were converted into clones.
|
| [1] https://hypercritical.co/hyperspace/
| dylan604 wrote:
| What kind of changes could you make to one clone that
| would still qualify it as a clone? If there are changes,
| it's no longer the same file. Even after reading the How
| It Works[0] link, I'm not grokking how it works. Is it
| making some sort of delta/diff that is applied to the
| original file? That's not possible for every file format
| like large media files. I could see that being
| interesting for text based files, but that gets
| complicated for complex files.
|
| [0] https://hypercritical.co/hyperspace/#how-it-works
| aeontech wrote:
| If I understand correctly, a COW clone references the
| same contents (just like a hardlink) as long as all the
| filesystem references are pointing to identical file
| contents.
|
| Once you open one of the reference handles and modify the
| contents, the copy-on-write process is invoked by the
| filesystem, and the underlying data is copied into a new,
| separate file with your new changes, breaking the link.
|
| Comparing with a hardlink, there is no copy-on-write, so
| any changes made to the contents when editing the file
| opened from one reference would also show up if you open
| the other hardlinks to the same file contents.
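|
| A quick way to see the hard link vs. clone difference on an
| APFS volume (just a sketch; the paths are made up, and the
| writes are deliberately non-atomic so they edit the files in
| place):
|
|     import Foundation
|
|     let fm = FileManager.default
|
|     func demo() throws {
|         // Two identical "originals", one per experiment.
|         let a = URL(fileURLWithPath: "/tmp/a.txt")
|         let b = URL(fileURLWithPath: "/tmp/b.txt")
|         try Data("blank template".utf8).write(to: a)
|         try Data("blank template".utf8).write(to: b)
|
|         // Hard link to a, APFS clone of b.
|         let link  = URL(fileURLWithPath: "/tmp/a-link.txt")
|         let clone = URL(fileURLWithPath: "/tmp/b-clone.txt")
|         try fm.linkItem(at: a, to: link)   // same inode as a
|         try fm.copyItem(at: b, to: clone)  // clone on APFS
|
|         // Editing through the hard link changes a as well...
|         try Data("edited".utf8).write(to: link)
|         // ...but editing the clone leaves b alone (copy-on-write).
|         try Data("edited".utf8).write(to: clone)
|
|         print(String(decoding: try Data(contentsOf: a), as: UTF8.self))
|         // "edited"
|         print(String(decoding: try Data(contentsOf: b), as: UTF8.self))
|         // "blank template"
|     }
|
|     do { try demo() } catch { print("demo failed: \(error)") }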
| dylan604 wrote:
| ah, that's where the copy-on-write takes place.
| sometimes, just reading it written by someone else is the
| knock upside the head I need.
| MBCook wrote:
| That's correct.
| zippergz wrote:
| A copy-on-write clone is not the same thing as a hard link.
| rahimnathwani wrote:
| With a hard link, the content of each of the two 'files' is
| identical in perpetuity.
|
| With APFS Clones, the contents start off identical, but can
| be changed independently. If you change a small part of a
| file, those block(s) will need to be created, but the
| existing blocks will continue to be shared with the clone.
| actionfromafar wrote:
| Almost, but the difference is that if you change one of the
| hardlinked files, you change "all of them". (It's really
| the same file but with different paths.)
|
| https://hypercritical.co/hyperspace/#how-it-works
|
| APFS apparently allows for creating "link files" which when
| changed, start to diverge.
| alwillis wrote:
| It's not the same because clones can have separate metadata;
| in addition, if a cloned file changes, it stores a
| diff of the changes from the original.
| diggan wrote:
| Right, but the concept is the same, "remove duplicates" in
| order to save storage space. If it's using reflinks,
| softlinks, APFS clones or whatever is more or less an
| implementation detail.
|
| I know that internally it isn't actually "removing" anything,
| and that it uses fancy new technology from Apple. But in
| order to explain the project to strangers, I think my tagline
| gets the point across pretty well.
| CharlesW wrote:
| > _Right, but the concept is the same, "remove duplicates"
| in order to save storage space._
|
| The duplicates aren't removed, though. Nothing changes from
| the POV of users or software that use those files, and you
| can continue to make changes to them independently.
| vultour wrote:
| De-duplication does not mean the duplicates completely
| disappear. If I download a deduplication utility I expect
| it to create some sort of soft/hard link. I definitely
| don't want it to completely remove random files on the
| filesystem, that's just going to wreak havoc.
| sgerenser wrote:
| But it can still wreak havoc if you use hardlinks or
| softlinks, because maybe there was a good reason for
| having a duplicate file! Imagine you have a photo
| "foo.jpg." You make a copy of it "foo2.jpg" You're
| planning on editing that file, but right now, it's a
| duplicate. At this point you run your "deduper" that
| turns the second file into a hardlink. Then a few days
| later you go and edit the file, but wait, the original
| "backup" file is now modified too! You lost your
| original.
|
| That's why Copy-on-write clones are completely different
| than hardlinks.
| dingnuts wrote:
| It does get rid of the duplicate. The duplicate data is
| deleted and a hard link is created in its place.
| zippergz wrote:
| It does not make hard links. It makes copy-on-write clones.
| kemayo wrote:
| No, because it's not actually a hard link -- if you modify
| one of the files they'll diverge.
| 8n4vidtmkvmk wrote:
| Sounds like jdupes with -B
| kemayo wrote:
| Cursory googling suggests that it's using the same
| filesystem feature, yeah.
| dewey wrote:
| The author of the software is a file system enthusiast (so much
| that in the podcast he's a part of they have a dedicated sound
| effect every time "filesystem" comes up), a long time blogger
| and macOS reviewer. So you'll have to see it in that context:
| documenting every bit and the technical details behind it is
| important to him... even if it's longer than a tag line on a
| landing page.
|
| In times where documentation is often an afterthought, and
| technical details get hidden away from users all the time
| ("Ooops some error occurred") this should be celebrated.
| zerd wrote:
| I've been using `fclones` [1] to do this, with `dedupe`, which
| uses reflink/clonefile.
|
| https://github.com/pkolaczk/fclones
| svilen_dobrev wrote:
| $ rmlint -c sh:link -L -y s -p -T duplicates
|
| will produce a script which, if run, will hardlink duplicates
| Analemma_ wrote:
| That's not what this app is doing though. APFS clones are copy-
| on-write pointers to the same data, not hardlinks.
| phiresky wrote:
| If you replace `sh:link` with `sh:clone` instead, it will.
|
| > clone: reflink-capable filesystems only. Try to clone both
| files with the FIDEDUPERANGE ioctl(3p) (or
| BTRFS_IOC_FILE_EXTENT_SAME on older kernels). This will free
| up duplicate extents while preserving the metadata of both.
| Needs at least kernel 4.2.
| wpm wrote:
| On Linux
| jamesfmilne wrote:
| Would be nice if git could make use of this on macOS.
|
| Each worktree I usually work on is several gigs of (mostly)
| identical files.
|
| Unfortunately the source files are often deep in a compressed git
| pack file, so you can't de-duplicate that.
|
| (Of course, the bigger problem is the build artefacts on each
| branch, which are like 12G per debug/release per product, but
| they often diverge for boring reasons.)
| diggan wrote:
| Git is a really poor fit for a project like that since it's
| snapshot based instead of diff based... Luckily, `git lfs`
| exists for working around that, I'm assuming you've already
| investigated that for the large artifacts?
| theamk wrote:
| "git worktree" shares a .git folder between multiple checkouts.
| You'll still have multiple files in working copy, but at least
| the .pack files would be shared. It is great feature, very
| robust, I use it all the time.
|
| There is also ".git/objects/info/alternates", accessed via "--
| shared"/"--reference" option of "git clone", that allows only
| sharing of object storage and not branches etc... but it has
| caveats, and I've only used it in some special circumstances.
| globular-toast wrote:
| Git de-duplicates everything in its store (in the .git
| directory) already. That's how it can store thousands of
| commits which are snapshots of the entire repository without
| eating up tons of disk space. Why do you have duplicated files
| in the working directory, though?
| andrewla wrote:
| Many comments here offering similar solutions based on hardlinks
| or symlinks.
|
| This uses a specific feature of APFS that allows the creation of
| copy-on-write clones. [1] If a clone is written to, then it is
| copied on demand and the original file is unmodified. This is
| distinct from the behavior of hardlinks or symlinks.
|
| [1] https://en.wikipedia.org/wiki/Apple_File_System#Clones
| bombela wrote:
| Also called reflinks on Linux, which are supported by bcachefs,
| Btrfs, CIFS, NFS 4.2, OCFS2, overlayfs, XFS, and OpenZFS.
|
| Sources: https://unix.stackexchange.com/questions/631237/in-
| linux-whi... https://forums.veeam.com/veeam-backup-
| replication-f2/openzfs...
| radicality wrote:
| Hopefully it doesn't have a similar bug to the one jdupes did
|
| https://web.archive.org/web/20210506130542/https://github.co...
| david_allison wrote:
| > Hyperspace can't be installed on "Macintosh HD" because macOS
| version 15 or later is required.
|
| macOS 15 was released in September 2024, this feels far too soon
| to deprecate older versions.
| kstrauser wrote:
| He wanted to write it in Swift 6. Does it support older OS
| versions?
| jjcob wrote:
| Swift 6 is not the problem. It's backward compatible.
|
| The problem is SwiftUI. It's very new, still barely usable on
| the Mac, but they are adding lots of new features every macOS
| release.
|
| If you want to support older versions of macOS you can't use
| the nice stuff they just released. Eg. pointerStyle() is a
| brand new macOS 15 API that is very useful.
| MBCook wrote:
| I can't remember for sure but there may also have been a
| recent file system API he said he needed. Or a bug that he
| had to wait for a fix on.
| therockhead wrote:
| It's been a while since I last looked at SwiftUI on the Mac.
| Is it really still that bad?
| jjcob wrote:
| It's not bad, just limited. I think it's getting usable,
| but just barely so.
|
| They are working on it, and making it better every year.
| I've started using it for small projects and it's pretty
| neat how fast you can work with it -- but not everything
| can be done yet.
|
| Since they are still adding pretty basic stuff every
| year, it really hurts if you target older versions.
| AppKit is so mature that for most people it doesn't
| matter if you can't use new features introduced in the
| last 3 years. For SwiftUI it still makes a big
| difference.
| therockhead wrote:
| I wonder why they haven't tried to backport SwiftUI
| improvements/versions to older OS releases. Seems like this
| should have been possible.
| tobr wrote:
| Can it really be seen as deprecating an old version when it's a
| brand new app?
| borland wrote:
| +1. He's not taking anything away because you never had it.
| johnmaguire wrote:
| I'm a bit confused as the Mac App Store says it's over 4
| years old.
| furyofantares wrote:
| The 4+ Age rating is like, who can use the app. Not for 3
| year olds, apparently.
| heywoods wrote:
| Despite knowing this is the correct interpretation, I
| still consistently make the same incorrect interpretation
| as the parent comment. It would be nice if they made this
| more intuitive. Glad I'm not the only one that's made
| that mistake.
| throwanem wrote:
| I feel like that's true for most of the relatively low-
| level disk and partition management tooling. As unpopular
| an opinion as it may lately be around here, I'm enough of
| a pedagogical traditionalist to remain convinced that
| introductory logical volume management is best left at
| least till kindergarten.
| pmarreck wrote:
| The way they specify this has always confused me, because
| I actually care more about how old the app is than what
| age range it's aimed for
| ryandrake wrote:
| Came here to post the same thing. Would love to try the
| application, but I guess not if the developer is deliberately
| excluding my device (which cannot run the bleeding edge OS).
| wpm wrote:
| The developer deliberately chose to write it in Swift 6.
| Apple is the one who deliberately excluded Swift 6 from your
| device.
| ryandrake wrote:
| Yea, too bad :( Everyone involved with macOS and iOS
| development seems to be (intentionally or unintentionally)
| keeping us on the hardware treadmill.
| ForOldHack wrote:
| Expensive. Keeping us on the expensive hardware
| treadmill. My guess is that it cannot be listed in the
| Apple store unless it's only for Macs released in the last
| 11 months.
| kstrauser wrote:
| In fairness, I don't think you can describe it as bleeding
| edge when we're 5 months into the annual 12 month upgrade
| cycle. It's recent, but not exactly an early adopter version
| at this point.
| BWStearns wrote:
| I have file A that's in two places and I run this.
|
| I modify A_0. Does this modify A_1 as well or just kind of reify
| the new state of A_0 while leaving A_1 untouched?
| madeofpalk wrote:
| It's called copy-on-write because the copy only happens when you
| write: if you modify A_0, the changed data gets written to new
| blocks for A_0, and A_1 is left untouched.
|
| https://en.wikipedia.org/wiki/Copy-on-write#In_computer_stor...
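|
| A minimal sketch of how such a clone is made programmatically on
| APFS, assuming the clonefile(2) syscall is visible after importing
| Darwin (from the shell, cp -c does the same thing; the paths below
| are hypothetical):
|
|     import Darwin
|
|     // clonefile(2) creates a copy-on-write clone: the new file shares
|     // all of its data blocks with the original until one of the two
|     // is modified. Both paths must be on the same APFS volume and the
|     // destination must not already exist.
|     let source = "/Users/me/original.bin"   // hypothetical
|     let clone  = "/Users/me/clone.bin"      // hypothetical
|
|     if clonefile(source, clone, 0) == 0 {
|         print("Cloned without duplicating any data blocks")
|     } else {
|         perror("clonefile")  // e.g. EEXIST if the destination exists
|     }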
| bsimpson wrote:
| Which means if you actually edited those files, you might
| fill up your HD much more quickly than you expected.
|
| But if you have the same 500MB of node_modules in each of
| your dozen projects, this might actually durably save some
| space.
| _rend wrote:
| > Which means if you actually edited those files, you might
| fill up your HD much more quickly than you expected.
|
| I'm not sure if this is what you intended, but just to be
| sure: writing changes to a cloned file doesn't immediately
| duplicate the entire file again in order to write those
| changes -- they're actually written out-of-line, and the
| identical blocks are only stored once. From [the docs](^1)
| posted in a sibling comment:
|
| > Modifications to the data are written elsewhere, and both
| files continue to share the unmodified blocks. You can use
| this behavior, for example, to reduce storage space
| required for document revisions and copies. The figure
| below shows a file named "My file" and its copy "My file
| copy" that have two blocks in common and one block that
| varies between them. On file systems like HFS Plus, they'd
| each need three on-disk blocks, but on an Apple File System
| volume, the two common blocks are shared.
|
| [^1]: https://developer.apple.com/documentation/foundation/
| file_sy...
| kdmtctl wrote:
| What will happen when the original file will be deleted?
| Often this handled by block reference counters, which just
| would be decreased. How APFS handles this? Is there any
| master/copy concepts or just block references?
| BWStearns wrote:
| Thanks for the clarification. I expected it worked like that
| but couldn't find it spelled out after a brief perusal of the
| docs.
| lgdskhglsa wrote:
| He's using the "copy on write" feature of the file system. So
| it should leave A_1 untouched, creating a new copy for A_0's
| modifications. More info:
| https://developer.apple.com/documentation/foundation/file_sy...
| astennumero wrote:
| What algorithm does the application use to figure out if two
| files are identical? There are a lot of interesting algorithms out
| there: hashes, bit-by-bit comparison, etc. But these techniques
| have their own disadvantages. What is the best way to do this for
| a large amount of files?
| diegs wrote:
| This reminds me of
| https://en.wikipedia.org/wiki/Venti_(software) which was a
| content-addressable storage system that used hashes for de-
| duplication. Since the hashes were computed at write time, the
| performance penalty was amortized.
| w4yai wrote:
| I'd hash the first 1024 bytes of all files, and start from
| there if there are any collisions. That way you don't need to hash
| whole (large) files, only those with the same prefix hash.
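|
| A minimal sketch of that prefix-hash idea, assuming CryptoKit's
| SHA256 and a hypothetical helper name (the grouping step and the
| follow-up full comparison are left out):
|
|     import Foundation
|     import CryptoKit
|
|     // Hash only the first 1024 bytes of a file. Files whose prefix
|     // hashes differ cannot be identical; only files with matching
|     // prefix hashes need a full check.
|     func prefixKey(of url: URL, length: Int = 1024) throws -> String {
|         let handle = try FileHandle(forReadingFrom: url)
|         defer { try? handle.close() }
|         let head = try handle.read(upToCount: length) ?? Data()
|         return SHA256.hash(data: head)
|             .map { String(format: "%02x", $0) }.joined()
|     }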
| kstrauser wrote:
| At that point, why hash them instead of just using the first
| 1024 bytes as-is?
| sedatk wrote:
| Probably because you need to keep a lot of those in memory.
| kstrauser wrote:
| I suspect that a computer with so many files that this
| would be useful probably has a lot of RAM in it, at least
| in the common case.
| sedatk wrote:
| But you need to constantly process them too, not just
| store them.
| borland wrote:
| In order to check if a file is a duplicate of another, you
| need to check it against _every other possible file_. You
| need some kind of "lookup key".
|
| If we took the first 1024 bytes of each file as the lookup
| key, then our key size would be 1024 bytes. If you have 1
| million files on your disk, then that's about 1GB of RAM just
| to store all the keys. That's not a big deal these days,
| but it's also annoying if you have a bunch of files that
| all start with the same 1024 bytes -- e.g. perhaps all the
| photoshop documents start with the same header. You'd need
| a 2-stage comparison, where you first match the key (1024
| bytes) and then do a full comparison to see if it really
| matches.
|
| Far more efficient - and less work - if you just use a
| SHA256 of the file's contents. That gets you a much smaller
| 32 byte key, and you don't need to bother with 2-stage
| comparisons.
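|
| A sketch of that whole-file key, streamed in chunks so large files
| aren't loaded into memory at once (a generic illustration, not
| necessarily what Hyperspace does):
|
|     import Foundation
|     import CryptoKit
|
|     // 32-byte content key for a file, computed incrementally in
|     // 1 MiB chunks. SHA256Digest is Hashable, so it can be used
|     // directly as a dictionary key for grouping duplicates.
|     func contentKey(of url: URL) throws -> SHA256Digest {
|         let handle = try FileHandle(forReadingFrom: url)
|         defer { try? handle.close() }
|         var hasher = SHA256()
|         while let chunk = try handle.read(upToCount: 1 << 20),
|               !chunk.isEmpty {
|             hasher.update(data: chunk)
|         }
|         return hasher.finalize()
|     }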
| kstrauser wrote:
| I understand the concept. My main point is that it's
| probably not a huge advantage to store hashes of the
| first 1KB, which requires CPU to calculate, over just the
| raw bytes, which requires storage. There's a tradeoff
| either way.
|
| I don't think it would be far more efficient to hash
| the entire contents though. If you have a million files
| storing a terabyte of data, the 2 stage comparison would
| read at most 1GB (1 million * 1KB) of data, and less for
| smaller files. If you do a comparison of the whole hashed
| contents, you have to read the entire 1TB. There are a
| hundred confounding variables, for sure. I don't think
| you could confidently estimate which would be more
| efficient without a lot of experimenting.
| philsnow wrote:
| If you're going to keep partial hashes in memory, may as
| well align it on whatever boundary is the minimal
| block/sector size that your drives give back to you.
| Hashing (say) 8kB takes less time than it takes to fetch
| it from SSD (much less disk), so if you only used the
| first 1kB, you'd (eventually) need to re-fetch the same
| block to calculate the hash for the rest of the bytes in
| that block.
|
| ... okay, so as long as you always feed chunks of data
| into your hash in the same deterministic order, it
| doesn't matter for the sake of correctness what that
| order is or even if you process some bytes multiple
| times. You could hash the first 1kB, then the second-
| through-last disk blocks, then the entire first disk
| block again (double-hashing the first 1kB) and it would
| still tell you whether two files are identical.
|
| If you're reading from an SSD and seek times don't
| matter, it's in fact probable that on average a lot of
| files are going to differ near the start and end (file
| formats with a header and/or footer) more than in the
| middle, so maybe a good strategy is to use the first 32k
| and the last 32k, and then if they're still identical,
| continue with the middle blocks.
|
| In memory, per-file, you can keep something like:
|     - the length
|     - h(block[0:4])
|     - h(block[0:4] | block[-5:])
|     - h(block[0:4] | block[-5:] | block[4:32])
|     - h(block[0:4] | block[-5:] | block[4:128])
|     - ...
|     - h(block[0:4] | block[-5:] | block[4:])
|
| etc, and only calculate the latter partial hashes when
| there is a collision between earlier ones. If you have
| 10M files and none of them have the same length, you
| don't need to hash anything. If you have 10M files and 9M
| of them are copies of each other except for a metadata
| tweak that resides in the last handful of bytes, you
| don't need to read the entirety of all 10M files, just a
| few blocks from each.
|
| A further refinement would be to have per-file-format
| hashing strategies... but then hashes wouldn't be
| comparable between different formats, so if you had 1M
| pngs, 1M zips, and 1M png-but-also-zip quine files, it
| gets weird. Probably not worth it to go down this road.
| smusamashah wrote:
| And why the first 1024? You could pick from predefined points.
| f1shy wrote:
| Depending on the medium, the penalty of reading single
| bytes in sparse locations could be comparable with
| reading the whole file. Maybe not a big win.
| amelius wrote:
| I suspect that bytes near the end are more likely to be
| different (even if there may be some padding). For example,
| imagine you have several versions of the same document.
|
| Also, use the length of the file for a fast check.
| borland wrote:
| I don't know exactly what Siracusa is doing here, but I can
| take an educated guess:
|
| For each candidate file, you need some "key" that you can use
| to check if another candidate file is the same. There can be
| millions of files so the key needs to be small and quick to
| generate, but at the same time we don't want any false
| positives.
|
| The obvious answer today is a SHA256 hash of the file's
| contents; it's very fast, not too large (32 bytes), and the odds
| of a false positive/collision are low enough that the world
| will end before you ever encounter one. SHA256 is the de-facto
| standard for this kind of thing and I'd be very surprised if
| he'd done anything else.
| MBCook wrote:
| You can start with the size, which is probably really unique.
| That would likely cut down the search space fast.
|
| At that point maybe it's better to just compare byte by byte?
| You'll have to read the whole file to generate the hash and
| if you just compare the bytes there is no chance of hash
| collision no matter how small.
|
| Plus if you find a difference at byte 1290 you can just stop
| there instead of reading the whole thing to finish the hash.
|
| I don't think John has said exactly how on ATP (his podcast
| with Marco and Casey), but knowing him as a longtime
| listener/reader he's being _very_ careful. And I think he's
| said that on the podcast too.
| unclebucknasty wrote:
| > _which is probably really unique_
|
| Wonder what the distribution is here, on average? I know
| certain file types tend to cluster in specific ranges.
|
| > _maybe it's better to just compare byte by byte? You'll
| have to read the whole file to generate the hash_
|
| Definitely, for comparing any two files. But, if you're
| searching for duplicates across the entire disk, then
| you're theoretically checking each file multiple times, and
| each file is compared against multiple times. So, hashing
| them on first pass could _conceivably_ be more efficient.
|
| > _if you just compare the bytes there is no chance of hash
| collision_
|
| You could then compare hashes and, only in the exceedingly
| rare case of a collision, do a byte-by-byte comparison to
| rule out false positives.
|
| But, if your first optimization (the file size comparison)
| really does dramatically reduce the search space, then
| you'd also dramatically cut down on the number of re-
| comparisons, meaning you may be better off not hashing
| after all.
|
| You could probably run the file size check, then based on
| how many comparisons you'll have to do for each matched
| set, decide whether hashing or byte-by-byte is optimal.
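|
| A rough sketch of the size-first approach discussed above: group
| by byte count, then compare candidate pairs directly, stopping at
| the first differing chunk (function names are made up):
|
|     import Foundation
|
|     // Group candidate paths by file size; only files of equal size
|     // can possibly be duplicates.
|     func groupBySize(_ urls: [URL]) -> [Int: [URL]] {
|         var groups: [Int: [URL]] = [:]
|         for url in urls {
|             if let size = try? url.resourceValues(
|                 forKeys: [.fileSizeKey]).fileSize {
|                 groups[size, default: []].append(url)
|             }
|         }
|         return groups
|     }
|
|     // Byte-by-byte comparison in 1 MiB chunks, bailing out at the
|     // first difference instead of reading both files to the end.
|     func sameContents(_ a: URL, _ b: URL) throws -> Bool {
|         let fa = try FileHandle(forReadingFrom: a)
|         let fb = try FileHandle(forReadingFrom: b)
|         defer { try? fa.close(); try? fb.close() }
|         while true {
|             let ca = try fa.read(upToCount: 1 << 20) ?? Data()
|             let cb = try fb.read(upToCount: 1 << 20) ?? Data()
|             if ca != cb { return false }   // first mismatch: stop reading
|             if ca.isEmpty { return true }  // both hit EOF together
|         }
|     }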
| f1shy wrote:
| I think the probability is not so low. I remember reading here
| about a person getting a photo from another chat in a chat
| application that was using SHA hashes in the background. I don't
| recall all the details; it's improbable, but possible.
| kittoes wrote:
| The probability is truly, obscenely, low. If you read about
| a collision then you surely weren't reading about SHA256.
|
| https://crypto.stackexchange.com/questions/47809/why-
| havent-...
| sgerenser wrote:
| LOL nope, I seriously doubt that was the result of a SHA256
| collision.
| amelius wrote:
| Or just use whatever algorithm rsync uses.
| rzzzt wrote:
| I experimented with a similar, "hardlink farm"-style approach
| for deduplicated, browseable snapshots. It resulted in a
| small bash script which did the following:
|
| - compute SHA256 hashes for each file on the source side
|
| - copy files which are not already known to a "canonical
| copies" folder on the destination (this step uses the hash
| itself as the file name, which makes it easy to check if I
| had a copy from the same file earlier)
|
| - mirror the source directory structure to the destination
|
| - create hardlinks in the destination directory structure for
| each source file; these should use the original file name but
| point to the canonical copy.
|
| Then I got too scared to actually use it :)
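|
| A rough Swift translation of that flow, for anyone who wants to
| play with the idea (paths are hypothetical, and this is the
| commenter's scheme, not how Hyperspace works):
|
|     import Foundation
|     import CryptoKit
|
|     // "Hardlink farm": canonical copies named by content hash, plus
|     // a mirrored tree of hard links pointing at them.
|     let fm = FileManager.default
|     let sourceRoot = URL(fileURLWithPath: "/Users/me/Documents")      // hypothetical
|     let destRoot   = URL(fileURLWithPath: "/Volumes/Backup/snapshot") // hypothetical
|     let store      = destRoot.appendingPathComponent(".canonical")
|
|     do {
|         try fm.createDirectory(at: store, withIntermediateDirectories: true)
|         let files = fm.enumerator(at: sourceRoot,
|                                   includingPropertiesForKeys: [.isRegularFileKey])
|         while let file = files?.nextObject() as? URL {
|             guard (try? file.resourceValues(
|                 forKeys: [.isRegularFileKey]))?.isRegularFile == true
|             else { continue }
|
|             // 1. The content hash doubles as the canonical file name.
|             let name = SHA256.hash(data: try Data(contentsOf: file))
|                 .map { String(format: "%02x", $0) }.joined()
|             let canonical = store.appendingPathComponent(name)
|
|             // 2. Copy into the canonical store only if this content is new.
|             if !fm.fileExists(atPath: canonical.path) {
|                 try fm.copyItem(at: file, to: canonical)
|             }
|
|             // 3. Mirror the directory structure and hard-link the
|             //    original name to the canonical copy.
|             let relative = String(file.path.dropFirst(sourceRoot.path.count + 1))
|             let target = destRoot.appendingPathComponent(relative)
|             try fm.createDirectory(at: target.deletingLastPathComponent(),
|                                    withIntermediateDirectories: true)
|             try fm.linkItem(at: canonical, to: target)
|         }
|     } catch {
|         print("snapshot failed:", error)
|     }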
| pmarreck wrote:
| xxHash (or xxh3 which I believe is even faster) is massively
| faster than SHA256 at the cost of security, which is
| unnecessary here.
|
| Of course, engineering being what it is, it's possible that
| only one of these has hardware support and thus might end up
| actually being faster in realtime.
| PhilipRoman wrote:
| Blake3 is my favorite for this kind of thing. It's a
| cryptographic hash (maybe not the world's strongest, but
| considered secure), and also fast enough that in real world
| scenarios it performs just as well as non-crypto hashes
| like xx.
| karparov wrote:
| This can be done much faster and more safely.
|
| You can group all files into buckets, and as soon as a bucket
| holds only a single file, discard it. If at the end there are
| still multiple files in the same bucket, they are duplicates.
|
| Initially all files are in the same bucket.
|
| You now iterate over differentiators which given two files
| tell you whether they are _maybe_ equal or _definitely_ not
| equal. They become more and more costly but also more and
| more exact. You run the differentiator on all files in a
| bucket to split the bucket into finer equivalence classes.
|
| For example:
|
| * Differentiator 1 is the file size. It's really cheap, you
| only look at metadata, not the file contents.
|
| * Differentiator 2 can be a hash over the first file block.
| Slower since you need to open every file, but still blazingly
| fast and O(1) in file size.
|
| * Differentiator 3 can be a hash over the whole file. O(N) in
| file size but so precise that if you use a cryptographic hash
| then you're very unlikely to have false positives still.
|
| * Differentiator 4 can compare files bit for bit. Whether
| that is really needed depends on how much you trust collision
| resistance of your chosen hash function. Don't discard this
| though. Git got bitten by this.
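|
| A sketch of that refinement loop, with the differentiators as
| cheap-to-expensive key functions (the final bit-for-bit pass and
| error reporting are left out; all names are made up):
|
|     import Foundation
|     import CryptoKit
|
|     // Split each bucket by successively more expensive keys; buckets
|     // that shrink to a single file can never contain duplicates and
|     // are dropped immediately.
|     func refine(_ buckets: [[URL]],
|                 by key: (URL) throws -> String) -> [[URL]] {
|         var result: [[URL]] = []
|         for bucket in buckets {
|             var split: [String: [URL]] = [:]
|             for url in bucket {
|                 if let k = try? key(url) {
|                     split[k, default: []].append(url)
|                 }
|             }
|             result += split.values.filter { $0.count > 1 }
|         }
|         return result
|     }
|
|     // Differentiator 1: file size (metadata only).
|     let bySize: (URL) throws -> String = {
|         String(try $0.resourceValues(forKeys: [.fileSizeKey]).fileSize ?? -1)
|     }
|
|     // Differentiator 2: hash of the first 4 KiB (O(1) in file size).
|     let byFirstBlock: (URL) throws -> String = { url in
|         let handle = try FileHandle(forReadingFrom: url)
|         defer { try? handle.close() }
|         let head = try handle.read(upToCount: 4096) ?? Data()
|         return SHA256.hash(data: head).description
|     }
|
|     // Differentiator 3: hash of the whole file (O(N), survivors only).
|     let byFullHash: (URL) throws -> String = { url in
|         SHA256.hash(data: try Data(contentsOf: url)).description
|     }
|
|     // Anything still sharing a bucket after all passes is almost
|     // certainly a duplicate; a byte-for-byte check removes the "almost".
|     func duplicateGroups(in urls: [URL]) -> [[URL]] {
|         var buckets = [urls]
|         for differentiator in [bySize, byFirstBlock, byFullHash] {
|             buckets = refine(buckets, by: differentiator)
|         }
|         return buckets
|     }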
| williamsmj wrote:
| Deleted comment based on a misunderstanding.
| Sohcahtoa82 wrote:
| > This tool simply identifies files that point at literally
| the same data on disk because they were duplicated in a copy-
| on-write setting.
|
| You misunderstood the article, as it's basically doing the
| opposite of what you said.
|
| This tool finds duplicate data that is specifically _not_
| duplicated via copy-on-write, and then _turns it into_ a
| copy-on-write copy.
| williamsmj wrote:
| Fair. Deleted.
| ziofill wrote:
| Lovely idea, but way too expensive for me.
| bsimpson wrote:
| Interesting idea, and I like the idea of people getting paid for
| making useful things.
|
| Also, I get a data security itch having a random piece of
| software from the internet scan every file on an HD, particularly
| on a work machine where some lawyers might care about what's
| reading your hard drive. It would be nice if it was open source,
| so you could see what it's doing.
| Nevermark wrote:
| > I like the idea of people getting paid for making useful
| things
|
| > It would be nice if it was open source
|
| > I get a data security itch having a random piece of software
| from the internet scan every file on an HD
|
| With the source it would be easy for others to create freebie
| versions, with or without respecting license restrictions or
| security.
|
| I am not arguing anything, except pondering how software
| economics and security issues are full of unresolved holes, and
| the world isn't getting fairer or safer by default.
|
| --
|
| The app was a great idea, indeed. I am now surprised Apple
| doesn't automatically reclaim storage like this. Kudos to the
| author.
| benced wrote:
| You could download the app, disconnect Wifi and Ethernet, run
| the app and the reclamation process, remove the app (remember,
| you have the guarantees of the macOS App Store so no kernel
| extensions etc), and then reconnect.
|
| Edit: this might not work with the payment option actually. I
| don't think you can IAP without the internet.
| diimdeep wrote:
| Requires macOS 15.0 or later. - Oh god, this is so stupid, and the
| most irritating thing about macOS "Application development".
|
| It is really unfair to call it "software"; it is more like
| "glued-to-a-recent-OS-version ware". Meanwhile, I can still run an
| .exe compiled in 2006 - and with Wine, even on Mac or Linux.
| kstrauser wrote:
| However, you can't run an app targeted for Windows 11 on
| Windows XP. How unfair is that? Curse you, Microsoft.
| DontBreakAlex wrote:
| Nice, but I'm not getting a subscription for a filesystem
| utility. Had it been a one-time $5 license, I would have bought
| it. At the current price, it's literally cheaper to put files in
| an S3 bucket or outright buy an SSD.
| benced wrote:
| "I don't value software but that's not a respectable opinion so
| I'll launder that opinion via subscriptions"
| DontBreakAlex wrote:
| Well, I do value software - I'm paid $86/h to write some! I
| just find that for $20/year or $50 one time, you can get way
| more than 12G of hard drive space. I also don't think that
| this piece of software requires so much maintenance that it
| wouldn't be worth making at a lower price. I'm not saying
| that it's bad software, it's really great, just too
| expensive... Personally, my gut feeling is that the dev would
| have had more sales with a one-time $5 price, and made more money
| overall.
| amelius wrote:
| There are several such tools for Linux, and they are free, so
| maybe just change operating systems.
| augusto-moura wrote:
| I'm pretty sure some of them also work on MacOS. rmlint[1],
| for example, can output a script that reflinks duplicates (or
| runs any script for both files):
|
|     rmlint -c sh:handler=reflink .
|
| I'm not sure if reflink works out of the box, but you can
| write your own alternative script that just links both files
|
| [1]: https://github.com/sahib/rmlint
| dewey wrote:
| It does not support APFS:
| https://github.com/sahib/rmlint/issues/421
| dewey wrote:
| I don't think either of them supports APFS deduplication
| though?
| botanical76 wrote:
| I can't even find the price anywhere. Do you have to install
| the software to see it?
| sbarre wrote:
| The Mac App Store page has the pricing at the bottom in the
| In-App Purchases section.
|
| TL;DR - $49 for a lifetime subscription, or $19/year or
| $9/month.
|
| It could definitely be easier to find.
| dewey wrote:
| They had long discussions about the pricing on the podcast the
| author is a part of (atp.fm). It went through a few iterations
| of one time purchase, fee for each time you free up space and a
| subscription. There will always be people unhappy about either
| choice.
|
| Edit: Apparently both are possible in the end:
| https://hypercritical.co/hyperspace/#purchase
| mrguyorama wrote:
| Who would be unhappy with $5 owned forever? Other than the
| author of course for making less money.
| criddell wrote:
| People who want the app to stick around and continue to be
| developed.
|
| I worry about that with Procreate. It feels like it's
| priced too low to be sustainable.
| dewey wrote:
| > Two kinds of purchases are possible: one-time purchases and
| subscriptions.
|
| https://hypercritical.co/hyperspace/#purchase
| pmarreck wrote:
| Claude 3.7 just rewrote the whole thing (just based on reading
| the webpage description) as a commandline app for me, so
| there's that.
|
| And because it has no Internet access yet (and because I
| prompted it to use a workaround like this in that
| circumstance), the first thing it asked me to do (after
| hallucinating the functionality first, and then catching
| itself) was run `curl https://hypercritical.co/hyperspace/ |
| sed 's/<[^>]*>//g' | grep -v "^$" | clip`
|
| ("clip" is a bash function I wrote to pipe things onto the
| clipboard or spit them back out in a cross-platform linux/mac
| way) clip() { if command -v pbcopy
| > /dev/null; then [ -t 0 ] && pbpaste || pbcopy;
| else if command -v xclip > /dev/null; then
| [ -t 0 ] && xclip -o -selection clipboard || xclip -selection
| clipboard; else echo "clip function
| error: Neither pbcopy/pbpaste nor xclip are available." >&2;
| return 1; fi; fi }
| jacobp100 wrote:
| The price does seem very high. It's probably a niche product
| and I'd imagine developers are the ones who would see the
| biggest savings. Hopefully it works out for them
| criddell wrote:
| I think it's priced reasonably. A one-time $5 license wouldn't
| be sustainable.
|
| Since it's the kind of thing you will likely only need every
| couple of years, $10 each time feels fair.
|
| If putting all your data online or into an SSD makes more
| sense, then this app isn't for you and that's okay too.
| the_clarence wrote:
| It's interesting how Linux tools are all free when even trivial
| Mac tools are being sold. Nothing against someone trying to
| monetize, but the Linux culture sure is nice!
| dewey wrote:
| It's not that nice to call someone's work they spent months on
| "trivial" without knowing anything about the internals and what
| they ran into.
| MadnessASAP wrote:
| I don't think they meant it in a disparaging way, except
| maybe against Apple. More so that filesystems that can support
| deduplication typically include a deduplication tool in their
| standard suite of FS tools. I too find it odd that Apple does
| not do this.
| eikenberry wrote:
| I don't understand why a simple, closed source de-dup app is at
| the top of the front page with 160+ comments? What is so
| interesting about it? I read the blog and the comments here and I
| still don't get it.
| benced wrote:
| The developer is popular and APFS cloning is genuinely
| technically interesting.
|
| (no, it's not a symlink)
| augusto-moura wrote:
| CoW filesystems are older than macOS; no surprises for me.
| Maybe people aren't that aware of them?
| ForOldHack wrote:
| CoW - Copy on Write. Most probably on older mainframes. (
| Actually newer mainframes ).
|
| "CoW is used as the underlying mechanism in file systems
| like ZFS, Btrfs, ReFS, and Bcachefs"
|
| Obligatory: https://en.wikipedia.org/wiki/Copy-on-write
| therockhead wrote:
| I assume it's because it's from John Siracusa, a long-time Mac
| enthusiast, blogger, and podcaster. If you listen to him on
| ATP, it's hard not to like him, and anything he does is bound
| to get more than the usual upvotes on HN.
| dewey wrote:
| For those mentioning that there's no price listed, it's not that
| easy, since in the App Store the price varies by country. You can
| open the App Store link and then look at "In App Purchases"
| though.
|
| For me on the German store it looks like this:
|
|     Unlock for One Year    22,99 EUR
|     Unlock for One Month    9,99 EUR
|     Lifetime Unlock        59,99 EUR
|
| So it supports both one time purchases and subscriptions.
| Depending on what you prefer. More about that here:
| https://hypercritical.co/hyperspace/#purchase
| twp wrote:
| CLI tool to find duplicate files unbelievably quickly:
|
| https://github.com/twpayne/find-duplicates
| archagon wrote:
| I have to confess: it miffs me that a utility that would normally
| fly completely under the radar is likely to make the creator
| thousands of dollars just because he runs a popular podcast. (Am
| I jealous? Oh yes. But only because I tried to sell similar apps
| in the past and could barely get any downloads no matter how much
| I marketed them. Selling software without an existing network
| seems nigh-on impossible these days.)
|
| Anyway, congrats to Siracusa on the release, great idea, etc.
| etc.
| dewey wrote:
| I can understand your criticism as it's easy to arrive at that
| conclusion (Also a common occurrence when levelsio launches a
| new product, as his Twitter following is large) but it's also
| not fair to discount it as "just because he runs a popular
| podcast".
|
| The author is a "household" name in the macOS / Apple scene for
| a long time even before the podcast. If someone is spending all
| their life blogging about all things Apple on outlets like
| ArsTechnica and is consistently putting out new content on
| podcasts for decades they will naturally have a better
| distribution.
|
| How many years did you spend on building up your marketing and
| distribution reach?
| archagon wrote:
| I know! I actually like him and wish him the best. I just get
| a bit annoyed when one of the ATP folks releases some small
| utility with an unclear niche and then later talks about how
| they've "merely" earned thousands of dollars from it. When I
| was an app developer, I would have counted myself lucky to
| have made just a hundred bucks from a similar release. The
| gang's popularity gives them a distorted view of the market
| sometimes, IMHO.
| karparov wrote:
| TL;DR: He wrote an OS X dedup app which finds files with the same
| contents and tells the filesystem that their contents are
| identical, so it can save space (using copy-on-write features).
|
| He points out it's dangerous but could be worth it because of the
| space savings.
|
| I wonder if the implementation is using a hash only or does an
| additional step to actually compare the contents to avoid hash
| collision issues.
|
| It's not open source, so we'll never know. He chose a pay model
| instead.
|
| Also, some files might not be identical but have identical
| blocks. Something that could be explored too. Other filesystems
| have that either in their tooling or do it online or both.
| sgt wrote:
| Any way it can be built for 14? It requires macOS 15.
| rusinov wrote:
| John is a legend.
| re wrote:
| On a related note: are there any utilities that can measure disk
| usage of a folder taking (APFS) cloned files into account?
| Take8435 wrote:
| Downloaded. Ran it. Tells me "900" files can be cleaned. No
| summary, no list. But I was at least asked to buy the app. Why
| would I buy the app if I have no idea if it'll help?
| crb wrote:
| From the FAQ:
|
| > If some eligible files were found, the amount of disk space
| that can be reclaimed is shown next to the "Potential Savings"
| label. To proceed any further, you will have to make a
| purchase. Once the app's full functionality is unlocked, a
| "Review Files" button will become available after a successful
| scan. This will open the Review Window.
|
| I half remember this being discussed on ATP; the logic being
| that if you have the list of files, you will just go and de-
| dupe them yourself.
| AyyEye wrote:
| > the logic being that if you have the list of files, you
| will just go and de-dupe them yourself.
|
| If you can do that, you can check for duplicates yourself
| anyway. It's not like there aren't already dozens of great
| apps that dedupe.
| eps wrote:
| This reminds me -
|
| Back in the MS-DOS days, when RAM was scarce, there was a
| class of so-called "memory optimization" programs. They all
| inevitably found at least a few KB to be reclaimed through their
| magic, even if the same optimizer was run back to back with
| itself and allowed to "optimize" things. That is, on each run
| they would always find extra memory to be freed. They ultimately
| did nothing but claim they did the work. Must've sold pretty well
| nonetheless.
| galaxyLogic wrote:
| On Windows there is "Dev Drive", which I believe does a similar
| "copy-on-write" thing.
|
| If it works it's a no-brainer so why isn't it the default?
|
| https://learn.microsoft.com/en-us/windows/dev-drive/#dev-dri...
| siranachronist wrote:
| Requires ReFS, which still isn't supported on the system drive
| on Windows, IIRC.
| o10449366 wrote:
| What would an equivalent tool be on linux? I guess it depends on
| the filesystem?
| JackYoustra wrote:
| What's the difference with jdupes?
| mattgreenrocks wrote:
| What jumped out to me:
|
| > Finally, at WWDC 2017, Apple announced Apple File System (APFS)
| for macOS (after secretly test-converting everyone's iPhones to
| APFS and then reverting them back to HFS+ as part of an earlier
| iOS 10.x update in one of the most audacious technological
| gambits in history).
|
| How can you revert a FS change like that if it goes south? You'd
| certainly exercise the code well but also it seems like you
| wouldn't be able to back out of it if something was wrong.
| quux wrote:
| IIRC migrating from HFS+ to APFS can be done without touching
| any of the data blocks; a parallel set of APFS metadata
| blocks and superblocks is written to disk. In the test
| migrations Apple did the entire migration including generating
| APFS superblocks, but stopped short of committing the change that
| would permanently replace the HFS+ superblocks with APFS ones.
| To roll back they "just" needed clean up all the generated APFS
| superblocks and metadata blocks.
| MBCook wrote:
| I think that's what they did too. And it was a genius way of
| testing. They did it more than once too I think.
|
| Run the real thing, throw away the results, report all
| problems back to the mothership so you have a high chance of
| catching them all even on their multi-hundred million device
| fleet.
| k1t wrote:
| Yes, that's how it's described in this talk transcript:
|
| https://asciiwwdc.com/2017/sessions/715
|
| _Let's say for simplification we have three metadata regions
| that report all the entirety of what the file system might be
| tracking, things like file names, time stamps, where the
| blocks actually live on disk, and that we also have two
| regions labeled file data, and if you recall during the
| conversion process the goal is to only replace the metadata
| and not touch the file data._
|
| _We want that to stay exactly where it is as if nothing had
| happened to it._
|
| _So the first thing that we're going to do is identify
| exactly where the metadata is, and as we're walking through
| it we'll start writing it into the free space of the HFS+
| volume._
|
| _And what this gives us is crash protection and the ability
| to recover in the event that conversion doesn't actually
| succeed._
|
| _Now the metadata is identified._
|
| _We'll then start to write it out to disk, and at this
| point, if we were doing a dry-run conversion, we'd end here._
|
| _If we're completing the process, we will write the new
| superblock on top of the old one, and now we have an APFS
| volume._
___________________________________________________________________
(page generated 2025-02-25 23:00 UTC)