[HN Gopher] Hyperspace
       ___________________________________________________________________
        
       Hyperspace
        
       Author : tobr
       Score  : 498 points
       Date   : 2025-02-25 15:51 UTC (7 hours ago)
        
 (HTM) web link (hypercritical.co)
 (TXT) w3m dump (hypercritical.co)
        
       | gnomesteel wrote:
        | I don't need this, storage is cheap, but I'm glad it exists.
        
         | ttoinou wrote:
          | Storage isn't cheap on Macs though. One has to pay 2k USD to
          | get an 8 TB SSD.
        
           | bob1029 wrote:
           | Storage comes in many forms. It doesn't need to be soldered
           | to the mainboard to satisfy most use cases.
        
             | ttoinou wrote:
              | But cleaning up / making space on your main soldered drive,
              | where the OS is, is quite important.
        
       | NoToP wrote:
       | The fact that copying doesn't copy seems dangerous. Like what if
       | I wanted to copy for the purpose of modifying the file while
       | retaining the original. A trivial example of this might be I have
       | a meme template and I want to write text in it while still
       | keeping a blank copy of the template.
       | 
       | There's a place for alias file pointers, but lying to the user
       | and pretending like an alias is a copy is bound to lead to
       | unintended and confusing results
        
         | timabdulla wrote:
         | It's copy on write.
        
         | herrkanin wrote:
         | It's not a symbolic link - it copies on modification. No need
         | to worry!
        
         | hutattedonmyarm wrote:
         | It's Copy On Write. When you modify either one it does get
         | turned into an actual copy
        
         | IsTom wrote:
          | Copy-on-write means that it performs the copy only when you
          | make the first change (and only copies the part that changes;
          | the rest is used from the original file). Until then, copying
          | is free.
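          | 
          | For anyone curious, a minimal sketch of that behaviour
          | (assuming an APFS volume and that clonefile(2) is importable
          | via Darwin in Swift; the /tmp paths are hypothetical):
          | 
          |     import Darwin
          |     import Foundation
          | 
          |     // Both paths must live on the same APFS volume.
          |     let src = "/tmp/original.bin"
          |     let dst = "/tmp/copy.bin"
          |     try Data(count: 100_000_000)
          |         .write(to: URL(fileURLWithPath: src))
          | 
          |     // "Copying" is free: the clone shares the original's blocks.
          |     guard clonefile(src, dst, 0) == 0 else {
          |         fatalError("clonefile failed")
          |     }
          | 
          |     // Only on the first write does the changed block get its own
          |     // storage; untouched blocks stay shared and src is unchanged.
          |     let handle = FileHandle(forWritingAtPath: dst)!
          |     handle.write(Data([0x42]))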
        
           | mlhpdx wrote:
           | Is it file level or block level copy? The latter, I hope.
           | 
           | Update: whoops, missed it in your comment. Block (changed
           | bytes) level.
        
         | pca006132 wrote:
         | CoW is not aliasing. It will perform the actual copying when
         | you modify the file content.
        
       | petercooper wrote:
       | I love the model of it being free to scan and see if you'd get
       | any benefit, then paying for the actual results. I, too, am a
       | packrat, ran it, and got 7GB to reclaim. Not quite worth the
       | squeeze for me, but I appreciate it existing!
        
         | sejje wrote:
         | I also really like this pricing model.
         | 
         | I wish it were more obvious how to do it with other software.
         | Often there's a learning curve in the way before you can see
         | the value.
        
         | MBCook wrote:
         | He's talked about it on the podcast he was on. So many users
         | would buy this, run it once, then save a few gigs and be done.
         | So a subscription didn't make a ton of sense.
         | 
          | After all, how many perfect duplicate files do you create in a
          | month by accident?
         | 
         | There's a subscription or buy forever option for people who
         | think that would actually be quite useful to them. But for a
         | ton of people a one time IAP that gives them a limited amount
         | of time to use the program really does make a lot of sense.
         | 
         | And you can always rerun it for free to see if you have enough
         | stuff worth paying for again.
        
         | jedbrooke wrote:
         | it's very refreshing compared to those "free trials" you have
         | to remember to cancel (pro tip: use virtual credit cards which
         | you can lock for those so if you forget to cancel the charges
         | are blocked)
         | 
         | however has anyone been able to find out from the website how
         | much the license actually costs?
        
       | Analemma_ wrote:
       | In earlier episodes of ATP when they were musing on possible
       | names, one listener suggested the frankly amazing "Dupe Nukem". I
       | get that this is a potential IP problem, which is why John didn't
       | use it, but surely Duke Nukem is not a zealously-defended brand
       | in 2025. I think interest in that particular name has been stone
       | dead for a while now.
        
         | InsideOutSanta wrote:
         | It's a genius name, but Gearbox owns Duke Nukem. They're not
         | exactly dormant. Duke Nukem as a franchise made over a billion
         | in revenue. In 2023, Zen released a licensed Duke Nukem pinball
         | table, so there is at least some ongoing interest in the
         | franchise.
         | 
         | I probably wouldn't have risked it, either.
        
         | mzajc wrote:
         | Reminds me of Avira's Luke Filewalker - I wonder if they needed
         | any special agreement with Lucasfilm/Disney. I couldn't find
         | any info on it, and their website doesn't mention Star Wars at
         | all.
        
       | siranachronist wrote:
       | https://github.com/pkolaczk/fclones can do the same thing, and
       | it's perfectly free and open source. terminal based though
        
         | PenguinRevolver wrote:
         | brew install fclones
         | 
         | Thanks for the recommendation! Just installed it via homebrew.
        
         | CharlesW wrote:
          | _[I was wrong, see below. --cw]_ It doesn't do the same thing.
         | An APFS clone/copy-on-write clone is not the same as a hard or
         | soft link. https://eclecticlight.co/2019/01/05/aliases-hard-
         | links-symli...
        
           | PenguinRevolver wrote:
           | Your source points out that:
           | 
           | < _You can also create [APFS (copy on write) clones] in
           | Terminal using the command `cp -c oldfilename newfilename`
           | where the c option requires cloning rather than a regular
           | copy._
           | 
            | `fclones dedupe` uses the same command[1]:
            | 
            |     if cfg!(target_os = "macos") {
            |         result.push(format!("cp -c {target} {link}"));
           | 
           | [1] https://github.com/pkolaczk/fclones/blob/555cde08fde4e700
           | b25...
        
             | CharlesW wrote:
             | I stand corrected, thank you!
        
         | rahimnathwani wrote:
         | Hyperspace said I can save 10GB.
         | 
         | But then I ran this command and saved over 20GB:
          |     brew install fclones
          |     cd ~
          |     fclones group . | fclones dedupe
         | 
         | I've used fclones before in the default mode (create hard
         | links) but this is the first time I've run it at the top level
         | of my home folder, in dedupe mode (i.e. using APFS clones).
         | Fingers crossed it didn't wreck anything.
        
         | diimdeep wrote:
          | Nice. Also, compression at the file system level can save a
          | lot of space, and with current CPU speeds it is completely
          | transparent. It is a feature from HFS+ that still works in
          | APFS, but it is not officially supported anymore. What is
          | wrong with you, Apple?
         | 
         | This tool to enable compression is free and open source
         | 
         | https://github.com/RJVB/afsctool
         | 
          | Also, a note about APFS vs HFS+: if you use an HDD, e.g. as
          | backup media for Time Machine, HFS+ is a must-have over APFS,
          | as APFS is optimised only for SSDs (random access).
         | 
         | https://bombich.com/blog/2019/09/12/analysis-apfs-enumeratio...
         | 
         | https://larryjordan.com/blog/apfs-is-not-yet-ready-for-tradi...
         | 
          | The not-so-smart Time Machine setup utility forcefully
          | re-creates APFS on HDD media, so you have to manually create
          | an HFS+ volume (e.g. with Disk Utility) and then use a
          | terminal command to add this volume as the TM destination:
         | 
         | `sudo tmutil setdestination /Volumes/TM07T`
        
       | herrkanin wrote:
       | As a web dev, it's been fun listening to Accidental Tech Podcast
       | where Siracusa has been talking (or ranting) about the ins and
       | outs of developing modern mac apps in Swift and SwiftUI.
        
         | Analemma_ wrote:
         | The part where he said making a large table in HTML and
         | rendering it with a web view was orders of magnitude faster
         | than using the SwiftUI native platform controls made me bash my
         | head against my desk a couple times. What are we doing here,
         | Apple.
        
           | mohsen1 wrote:
           | Hacker News loves to hate Electron apps. In my experience
           | ChatGPT on Mac (which I assume is fully native) is nearly
           | impossible to use because I have a lot of large chats in my
           | history but the website works much better and faster. ChatGPT
           | website packed in Electron would've been much better. In
           | fact, I am using a Chrome "PWA App" for ChatGPT now instead
           | of the native app.
        
             | RandomDistort wrote:
              | Someone more experienced than me could probably comment on
             | this more, but theoretically is it possible for Electron
             | production builds to become more efficient by having a much
             | longer build process and stripping out all the unnecessary
             | parts of Chromium?
        
             | wat10000 wrote:
             | It's possible to make bad apps with anything. The
             | difference is that, as far as I can tell, it's not possible
             | to make good apps with Electron.
        
             | avtar wrote:
             | > In my experience ChatGPT on Mac (which I assume is fully
             | native)
             | 
             | If we are to believe ChatGPT itself: "The ChatGPT macOS
             | desktop app is built using Electron, which means it is
             | primarily written in JavaScript, HTML, and CSS"
        
           | spiderfarmer wrote:
           | As a web dev I must say that this segment made me happy and
           | thankful for the browser team that really knows how to
           | optimize.
        
           | megaman821 wrote:
           | I wish there were modern benchmarks against browser engines.
           | A long time ago native apps were much faster at rendering UI
            | than the browser, but that was many performance rewrites
            | ago, so I wonder how browsers perform now.
        
           | airstrike wrote:
           | Shoutout to iced, my favorite GUI toolkit, which isn't even
           | in 1.0 yet but can do that with ease and faster than anything
           | I've ever seen: https://github.com/iced-rs/iced
           | 
           | https://github.com/tarkah/iced_table is a third-party widget
            | for tables, but you can roll your own or use other
           | alternatives too
           | 
           | It's in Rust, not Swift, but I think switching from the
            | latter to the former is easier than moving away from many
            | other popular languages.
        
           | BobAliceInATree wrote:
           | SwiftUI is a joke when it comes to performance. Even Marco's
           | Overcast stutters when displaying a table of a dozen rows (of
           | equal height).
           | 
           | That being said, it's not quite an apples to apples
           | comparison, because SwiftUI or UIKit can work with basically
           | an infinite number of rows, whereas HTML will eventually get
           | to a point where it won't load.
        
             | wpm wrote:
             | I love the new Overcast's habit of mistaking my scroll
             | gestures for taps when browsing the sections of a podcast.
        
       | divan wrote:
       | What are the potential risks or problems of such conversion of
       | duplicates into APFS clones?
        
         | captn3m0 wrote:
         | The linked docs cover this in detail.
        
       | pca006132 wrote:
       | Is this the dedup function provided by other FS?
        
         | coder543 wrote:
         | I think the term to search for is reflink. Btrfs is one
         | example: https://btrfs.readthedocs.io/en/latest/Reflink.html
         | 
         | Like with Hyperspace, you would need to use a tool that can
         | identify which files are duplicates, and then convert them into
         | reflinks.
        
           | pca006132 wrote:
           | I thought reflink is provided by the underlying FS, and
           | Hyperspace is a dedup tool that finds the duplicates.
        
             | coder543 wrote:
             | Yes. Hyperspace is finding the identical files and then
             | replacing all but one copy with a reflink copy using the
             | filesystem's reflink functionality.
             | 
             | When you asked about the filesystem, I assumed you were
             | asking about which filesystem feature was being used, since
             | hyperspace itself is not provided by the filesystem.
             | 
             | Someone else mentioned[0] fclones, which can do this task
             | of finding and replacing duplicates with reflinks on more
             | than just macOS, if you were looking for a userspace tool.
             | 
             | [0]: https://news.ycombinator.com/item?id=43173713
        
             | MBCook wrote:
             | Hyperspace uses built in APFS features, it just applies
             | them to existing files.
             | 
             | You only get CoW on APFS if you copy a file with certain
             | APIs or tools.
             | 
              | If you have a program that copies files manually, if you
              | copied a duplicate to somewhere on your disk from some
              | other source, or if your files already existed on the file
              | system when you converted to APFS because you've been
              | carrying them around for a long time, then you'd have
              | duplicates.
             | 
             | APFS doesn't _look_ for duplicates at any point. It just
             | keeps track of those that it knows are duplicates because
             | of copy operations.
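              | 
              | A rough sketch of that "certain APIs" point (assuming
              | FileManager's copyItem takes the clone path on APFS, which
              | I believe it does, while a manual byte-for-byte rewrite
              | does not; the paths are made up):
              | 
              |     import Foundation
              | 
              |     let fm = FileManager.default
              |     let original = URL(fileURLWithPath: "/tmp/a.bin")
              | 
              |     // Goes through an API APFS can turn into a clone, so
              |     // no extra space is used until one side is modified.
              |     try fm.copyItem(at: original,
              |                     to: URL(fileURLWithPath: "/tmp/b.bin"))
              | 
              |     // Rewrites the bytes by hand, so APFS has no idea the
              |     // two files are related and stores a second full copy.
              |     try Data(contentsOf: original)
              |         .write(to: URL(fileURLWithPath: "/tmp/c.bin"))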
        
             | zerd wrote:
             | You can do the same with `cp -c` on macOS, or `cp
             | --reflink=always` on Linux, if your filesystem supports it.
        
         | kevincox wrote:
          | Yes, Linux has a system call to do this for any filesystem with
         | reflink support (and it is safe and atomic). You need a
         | "driver" program to identify duplicates but there are a handful
         | out there. I've used https://github.com/markfasheh/duperemove
         | and was very pleased with how it worked.
        
       | exitb wrote:
       | What are examples of files that make up the "dozens of gigabytes"
       | of duplicated data?
        
         | xnx wrote:
          | There are some CUDA files, taking multiple GB, that every
          | local AI app installs.
        
           | wruza wrote:
           | Also models that various AI libraries and plugins love to
           | autodownload into custom locations. Python folks definitely
           | need to learn caching, symlinks, asking a user where to store
           | data, or at least logging where they actually do it.
        
         | butlike wrote:
         | audio files; renders, etc.
        
         | password4321 wrote:
         | iMovie used to copy video files etc. into its "library".
        
         | zerd wrote:
         | .terraform, rust target directory, node_modules.
        
       | jarbus wrote:
        | In my experience, Macs use up a ridiculous amount of "System"
        | storage for no reason, and users can't delete it. I've grown
        | tired of family members asking me to help them free up storage
        | that I can't even find. That's the major issue from what I've
        | seen; unless this app prevents Apple from deliberately eating up
        | 50%+ of the storage space of a machine, it doesn't do much for
        | the people I know.
        
         | ezfe wrote:
         | There's no magic around it, macOS just doesn't do a good job
         | explaining it using the built in tools. Just use Daisy Disk or
         | something. It's all there and can be examined.
        
         | p_ing wrote:
          | These are often Time Machine snapshots. Nuking those can free
          | up quite a bit of space.
          | 
          |     sudo tmutil listlocalsnapshots /
          |     sudo tmutil deletelocalsnapshots <date_value_of_snapshot>
        
           | Jaxan wrote:
            | Even without Time Machine there is a load of storage spent
            | on "System", especially now with Apple Intelligence (even
            | when turned off).
        
             | p_ing wrote:
             | Apple "Intelligence" gets its own category in 15.3.1.
        
       | sir_eliah wrote:
       | There's a cross-platform open-source version of this program:
       | https://github.com/qarmin/czkawka
        
         | spiderfarmer wrote:
         | That's not remotely comparable.
        
         | nulld3v wrote:
          | I don't think czkawka supports deduplication via reflink, so
          | it's not exactly the same thing. fclones, as linked by another
          | user, is more similar:
          | https://news.ycombinator.com/item?id=43173713
        
       | bhouston wrote:
       | I gave it a try on my massive folder of NodeJS projects but it
        | only found 1GB of savings on an 8.1GB folder.
       | 
       | I then tried again including my user home folder (731K files,
       | 127K folders, 2755 eligible files) to hopefully catch more
       | savings and I only ended up at 1.3GB of savings (300MB more than
       | just what was in the NodeJS folders.)
       | 
       | I tried to scan System and Library but it refused to do so
       | because of permission issues.
       | 
       | I think the fact that I use pnpm for my package manager has made
       | my disk space usage already pretty near optimal.
       | 
       | Oh well. Neat idea. But the current price is too high to justify
       | this. Also I would want it as a background process that runs once
       | a month or something.
        
         | lou1306 wrote:
         | > it only found 1GB of savings on a 8.1GB folder.
         | 
         | You "only" found that 12% of the space you are using is wasted?
         | Am I reading this right?
        
           | warkdarrior wrote:
           | The relevant number (missing from above) is the total amount
            | of space on that storage device. If it saves 1GB on an 8TB
           | drive, it's not a big win.
        
             | jy14898 wrote:
             | If it saved 8.1GB, by your measure it'd also not be a big
             | win?
        
               | horsawlarway wrote:
               | This is basically only a win on macOS, and only because
               | Apple charges through the nose for disk space.
               | 
               | Ex - On my non-apple machines, 8GB is trivial. I load
               | them up with the astoundingly cheap NVMe drives in the
               | multiple terabyte range (2TB for ~$100, 4TB for ~$250)
               | and I have a cheap NAS.
               | 
               | So that "big win" is roughly 40 cents of hardware costs
               | on the direct laptop hardware. Hardly worth the time and
               | effort involved, even if the risk is zero (and I don't
               | trust it to be zero).
               | 
               | If it's just "storage" and I don't need it fast (the
               | perfect case for this type of optimization) I throw it on
               | my NAS where it's cheaper still... Ex - it's not 40 cents
               | saved, it's ~10.
               | 
               | ---
               | 
               | At least for me, 8GB is no longer much of a win. It's a
               | rounding error on the last LLM model I downloaded.
               | 
               | And I'd suggest that basically anyone who has the ability
               | to not buy extortionately priced drives soldered onto a
               | mainboard is not really winning much here either.
               | 
               | I picked up a quarter off the ground on my walk last
               | night. That's a bigger win.
        
               | borland wrote:
               | > This is basically only a win on macOS, and only because
               | Apple charges through the nose for disk space
               | 
               | You do realize that this software is only available on
               | macOS, and only works because of Apple's APFS filesystem?
               | You're essentially complaining that medicine is only a
               | win for people who are sick.
        
               | horsawlarway wrote:
               | > and only works because of Apple's APFS filesystem
               | 
               | There are lots of other file systems that support this
               | kind of deduplication...
               | 
               | Like ZFS that the author of the software explicitly
               | mentions in his write up
               | https://www.truenas.com/docs/references/zfsdeduplication/
               | 
               | Or Btrfs ex: https://kb.synology.com/en-
               | id/DSM/help/DSM/StorageManager/vo...
               | 
               | Or hell, even NTFS: https://learn.microsoft.com/en-
               | us/windows-server/storage/dat...
               | 
               | This is NOT a novel or new feature in filesystems...
               | Basically any CoW file system will do it, and lots of
               | other filesystems have hacks built on top to support this
                | kind of feature.
               | 
               | ---
               | 
               | My point is that "people are only sick" because the
               | company is pricing storage outrageously. Not that Apple
               | is the only offender in this space - but man are they the
               | most egregious.
        
             | oneeyedpigeon wrote:
             | It should be proportional to the total _used_ space, not
             | the space available. The previous commenter said it was a 1
              | GB savings from ~8 GB of used space; that's equally
             | significant whether it happens on a 10 GB drive or a 10 TB
             | one.
        
               | horsawlarway wrote:
               | He picked node_modules because it's highly likely to
               | encounter redundant files there.
               | 
               | If you read the rest of the comment he only saved another
               | 30% running his entire user home directory through it.
               | 
               | So this is not a linear trend based on space used.
        
               | borland wrote:
               | He "only" saved 30%? That's amazing. I really doubt most
               | people are going to get anywhere near that.
               | 
               | When I run it on my home folder (Roughly 500GB of data) I
               | find 124 MB of duplicated files.
               | 
               | At this stage I'd like it to tell me what those files are
               | - The dupes are probably dumb ones that I can simply go
               | delete by hand, but I can understand why he'd want people
               | to pay up first, as by simply telling me what the dupes
               | are he's proved the app's value :-)
        
               | wlesieutre wrote:
               | Another 30% more than the 1GB saved in node modules, for
               | 1.3GB total. Not 30% of total disk space.
               | 
               | For reference, from the comment they're talking about:
               | 
               |  _> I then tried again including my user home folder
               | (731K files, 127K folders, 2755 eligible files) to
               | hopefully catch more savings and I only ended up at 1.3GB
               | of savings (300MB more than just what was in the NodeJS
               | folders.)_
        
               | bhouston wrote:
               | > He "only" saved 30%? That's amazing. I really doubt
               | most people are going to get anywhere near that.
               | 
               | You misunderstood my comment. I ran it on my home folder
                | which contains 165GB of data and it found 1.3GB in
                | savings. That isn't significant enough for me to care about
               | because I currently have 225GB free of my 512GB drive.
               | 
               | BTW I highly recommend the free "disk-inventory-x"
               | utility for MacOS space management.
        
             | rconti wrote:
             | Absolutely, 100% backwards. The tool cannot save space from
             | disk space that is not scanned. Your "not a big win"
             | comment assumes that there is _no space left to be
             | reclaimed on the rest of the disk_. Or that the disk is not
             | empty, or that the rest of the disk can 't be reclaimed at
             | an even higher rate.
        
           | bhouston wrote:
           | I have a 512GB drive in my MacBook Air M3 with 225GB free.
           | Saving 1GB is 0.5% of my total free space, and it is
           | definitely "below my line." It is a neat tool still in
           | concept.
           | 
           | When I ran it on my home folder with 165GB of data it only
           | found 1.3GB of savings. This isn't that significant to me and
           | it isn't really worth paying for.
           | 
           | BTW I highly recommend the free "disk-inventory-x" utility
           | for MacOS space management.
        
             | timerol wrote:
             | Your original comment did not mention that your home folder
             | was 165 GB, which is extremely relevant here
        
         | zamalek wrote:
         | pnpm tries to be a drop-in replacement for npm, and dedupes
         | automatically.
        
           | diggan wrote:
           | > pnpm tries to be a drop-in replacement for npm
           | 
           | True
           | 
           | > and dedupes automatically
           | 
           | Also true.
           | 
            | But the way you put them one after the other makes it sound
            | like npm does de-duplication, and that since pnpm tries to
            | be a drop-in replacement for npm, so does pnpm.
            | 
            | So for clarification: npm doesn't do de-duplication across
            | all your projects, and that in particular was one of the
            | more useful features that pnpm brought to the ecosystem when
            | it first arrived.
        
           | MrJohz wrote:
           | More importantly, pnpm installs packages as symlinks, so the
           | deduping is rather more effective. I believe it also tries to
           | mirror the NPM folder structure and style of deduping as
           | well, but if you have two of the same package installed
           | anywhere on your system, pnpm will only need to download and
           | save one copy of that package.
        
             | spankalee wrote:
             | npm's --install-strategy=linked flag is supposed to do this
             | too, but it has been broken in several ways for years.
        
         | modzu wrote:
          | what's the price? doesn't seem to be published anywhere
        
           | scblock wrote:
           | It's on the Mac App Store so you'll find the pricing there.
           | Looks like $10 for one month (one time use maybe?), $20 for a
           | year, $50 lifetime.
        
             | diggan wrote:
              | Even though I have both a Mac and an iPhone, I happen to
              | be using my Linux computer right now, and it seems like
              | the store page (https://apps.apple.com/us/app/hyperspace-
              | reclaim-disk-space/...) is not showing the price, probably
              | because I'm not actively on an Apple device? Seems like
              | poor UX even for us Mac users.
        
               | oneeyedpigeon wrote:
               | I see it on my android phone. It's a free app but the
               | subs are an in-app purchase so you need to hunt that
               | section down.
        
               | pimlottc wrote:
               | It's buried under a drop-down in the "Information"
               | section, under "In-App Purchases". I agree, it's not the
               | greatest.
        
               | diggan wrote:
               | Ah, you're absolutely right, missed that completely.
               | Buried at the bottom of the page :) Thanks for pointing
               | it out.
        
               | MBCook wrote:
               | It's a side effect of the terrible store design.
               | 
               | It's a free app because you don't have to buy it to run
               | it. It will tell you how much space it can save you for
               | free. So you don't have to waste $20 to find out it only
               | would've been 2kb.
               | 
               | But that means the parts you actually have to buy are in
               | app purchases, which are always hidden on the store
               | pages.
        
           | piqufoh wrote:
            | £9.99 a month, £19.99 for one year, £49.99 for life (App
            | Store purchase prices visible once you've scanned a
            | directory).
        
         | p_ing wrote:
         | > I tried to scan System and Library but it refused to do so
         | because of permission issues.
         | 
         | macOS has a sealed volume which is why you're seeing permission
         | errors.
         | 
         | https://support.apple.com/guide/security/signed-system-volum...
        
           | bhouston wrote:
           | For some reason "disk-inventory-x" will scan those folders. I
           | used that amazing tool to prune left over Unreal Engine files
           | and docker caches when they put them not in my home folder.
           | The tool asks for a ton of permissions when you run it in
           | order to do the scan though, which is a bit annoying.
        
             | alwillis wrote:
             | It's not obvious but the system folder is on a separate,
             | secure volume; the Finder does some trickery to make the
             | system volume and the data volume appear as one.
             | 
             | In general, you don't want to mess with that.
        
         | kdmtctl wrote:
          | Didn't have time to try it myself, but there is an option for
          | the minimum file size to consider, clearly seen on the App
          | Store screenshot. I suppose it was introduced to minimize
          | comparison buffers. It is possible that node modules are
          | sliding under this size and weren't considered.
        
       | jbverschoor wrote:
       | Does it preserve all metadata, extended attributes, and alternate
       | streams/named forks?
        
         | atommclain wrote:
          | He spoke to this on "No Longer Very Good", episode 626 of The
          | Accidental Tech Podcast. Time stamp ~1:32:30.
         | 
          | It tries, but there are some things it can't perfectly
          | preserve, like the last access time. In instances where it
          | can't duplicate certain types of extended attributes or
          | ownership permissions, it will not perform the operation.
         | 
         | https://podcasts.apple.com/podcast/id617416468?i=10006919599...
        
           | jbverschoor wrote:
            | Well, the FAQ also states that people should notify them if
            | they find missing attributes, so it really sounds like it's
            | a predefined list instead of just enumerating through
            | everything.
            | 
            | No word about alternate data streams. I'll pass for now..
            | Although it's nice to see how many duplicates you have
        
         | criddell wrote:
         | The FAQ talks about this a little:
         | 
         | Q: Does Hyperspace preserve file metadata during reclamation?
         | 
         | A: When Hyperspace replaces a file with a space-saving clone,
         | it attempts to preserve all metadata associated with that file.
         | This includes the creation date, modification date,
         | permissions, ownership, Finder labels, Finder comments, whether
         | or not the file name extension is visible, and even resource
          | forks. If the attempt to preserve any of these pieces of
         | metadata fails, then the file is not replaced.
         | 
         | If you find some piece of file metadata that is not preserved,
         | please let us know.
         | 
         | Q: How does Hyperspace handle resource forks?
         | 
         | A: Hyperspace considers the contents of a file's resource fork
         | to be part of the file's data. Two files are considered
         | identical only if their data and resource forks are identical
         | to each other.
         | 
         | When a file is replaced by a space-saving clone during
         | reclamation, its resource fork is preserved.
        
       | bob1029 wrote:
       | > There is no way for Hyperspace to cooperate with all other
       | applications and macOS itself to coordinate a "safe" time for
        | those files to be replaced, nor is there a way for Hyperspace to
        | forcibly take exclusive control of those files.
       | 
       | This got me wondering why the filesystem itself doesn't run a
       | similar kind of deduplication process in the background.
       | Presumably, it is at a level of abstraction where it could safely
       | manage these concerns. What could be the downsides of having this
       | happen automatically within APFS?
        
         | pizzafeelsright wrote:
         | data loss is the largest concern
         | 
         | I still do not trust de-duplication software.
        
           | dylan604 wrote:
           | Even using sha-256 or greater type of hashing, I'd still have
           | concerns about letting a system make deletion decisions
           | without my involvement. I've even been part of de-dupe
           | efforts, so maybe my hesitation is just because I wrote some
           | of the code and I know I'm not perfect in my coding or even
           | my algo decision trees. I know that any mistake I made would
           | not be of malice but just ignorance or other stupid mistake.
           | 
            | I've done the whole compare-every-file-via-hashing thing and
            | then logged each of the matches for humans to compare, but
            | never has any of that ever been allowed to mv/rm/ln -s
            | anything. I feel my imposter syndrome in this regard is not
            | a bad thing.
        
             | borland wrote:
             | Now you understand why this app costs more than 2x the
             | price of alternatives such as diskDedupe.
             | 
              | Any halfway-competent developer can write some code that
              | does a SHA256 hash of all your files and uses the Apple
              | filesystem APIs to replace duplicates with shared clones.
              | I know Swift; I could probably do it in an hour or two.
              | Should you trust my bodgy quick script? Heck no.
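              | 
              | For a sense of scale, the naive "hour or two" version
              | might look something like this (just an illustration of
              | the idea, not Hyperspace's code; it assumes clonefile(2)
              | is callable via Darwin, uses CryptoKit for SHA256, reads
              | whole files into memory, and skips every metadata and
              | safety check a real tool would need):
              | 
              |     import CryptoKit
              |     import Darwin
              |     import Foundation
              | 
              |     // Hash every regular file under root, then replace
              |     // later byte-identical files with APFS clones of the
              |     // first copy seen.
              |     func naiveDedupe(root: URL) throws {
              |         var firstSeen: [String: URL] = [:]
              |         let fm = FileManager.default
              |         guard let files = fm.enumerator(at: root,
              |             includingPropertiesForKeys: [.isRegularFileKey],
              |             options: [], errorHandler: nil) else { return }
              | 
              |         for case let file as URL in files {
              |             let vals = try file.resourceValues(
              |                 forKeys: [.isRegularFileKey])
              |             guard vals.isRegularFile == true else { continue }
              | 
              |             let data = try Data(contentsOf: file)
              |             let hash = SHA256.hash(data: data)
              |                 .map { String(format: "%02x", $0) }.joined()
              | 
              |             guard let original = firstSeen[hash] else {
              |                 firstSeen[hash] = file
              |                 continue
              |             }
              |             // Clone the original next to the duplicate,
              |             // then swap it in over the duplicate.
              |             let tmpName = ".clone-" + UUID().uuidString
              |             let tmp = file.deletingLastPathComponent()
              |                 .appendingPathComponent(tmpName)
              |             guard clonefile(original.path, tmp.path, 0) == 0
              |                 else { continue }
              |             _ = try fm.replaceItemAt(file, withItemAt: tmp)
              |         }
              |     }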
             | 
             | The author - John Siracusa - has been a professional
             | programmer for decades and is an exceedingly meticulous
             | kind of person. I've been listening to the ATP podcast
             | where they've talked about it, and the app has undergone an
             | absolute ton of testing. Look at the guardrails on the FAQ
             | page https://hypercritical.co/hyperspace/ for an example of
             | some of the extra steps the app takes to keep things safe.
             | Plus you can review all the proposed file changes before
             | you touch anything.
             | 
             | You're not paying for the functionality, but rather the
             | care and safety that goes around it. Personally, I would
             | trust this app over just about any other on the mac.
        
               | btilly wrote:
               | More than TeX or SQLite?
        
             | criddell wrote:
             | > I'd still have concerns about letting a system make
             | deletion decisions without my involvement
             | 
             | You are involved. You see the list of duplicates and can
             | review them as carefully as you'd like before hitting the
             | button to write the changes.
        
               | dylan604 wrote:
                | Yeah, the lack of involvement was more in response to
                | ZFS doing this, not this app. I could have crossed the
                | streams with other threads about ZFS if it's not
                | directly in this thread.
        
           | axus wrote:
           | Question for the developer: what's your liability if user
           | files are corrupted?
        
             | codazoda wrote:
              | Most EULAs would disclaim liability for data loss and
              | suggest users keep good backups. I haven't read a EULA for
              | a long time, but I think most of them do so.
        
               | borland wrote:
                | I can't find a specific EULA or disclaimer for the
                | Hyperspace app, but given that the EULAs for major
                | things like Microsoft Office basically say "we offer you
                | no warranty or recourse no matter what this software
                | does", I would hardly expect an indie app to offer
                | anything like that.
        
         | albertzeyer wrote:
         | > This got me wondering why the filesystem itself doesn't run a
         | similar kind of deduplication process in the background.
         | 
         | I think that ZFS actually does this.
         | https://www.truenas.com/docs/references/zfsdeduplication/
        
           | pmarreck wrote:
           | It's considered an "expensive" configuration that is only
           | good for certain use-cases, though, due to its memory
           | requirements.
        
             | abrookewood wrote:
             | Yes true, but that page also covers some recent
             | improvements to de-duplication that might assist.
        
         | p_ing wrote:
         | Windows Server does this for NTFS and ReFS volumes. I used it
         | quite a bit on ReFS w/ Hyper-V VMs and it worked _wonders_. Cut
         | my storage usage down by ~45% with a majority of Windows Server
          | VMs running a mix of 2016/2019 at the time.
        
           | borland wrote:
           | Yep. At a previous job we had a file server that we published
           | Windows build output to.
           | 
           | There were about 1000 copies of the same pre-requisite .NET
           | and VC++ runtimes (each build had one) and we only paid for
           | the cost of storing it once. It was great.
           | 
           | It is worth pointing out though, that on Windows Server this
           | deduplication is a background process; When new duplicate
           | files are created, they genuinely are duplicates and take up
           | extra space, but once in a while the background process comes
           | along and "reclaims" them, much like the Hyperspace app here
           | does.
           | 
           | Because of this (the background sweep process is expensive),
           | it doesn't run all the time and you have to tell it which
           | directories to scan.
           | 
           | If you want "real" de-duplication, where a duplicate file
           | will never get written in the first place, then you need
           | something like ZFS
        
             | sterlind wrote:
             | hey, it's defrag all over again!
             | 
              |  _(not really, since it's not fragmentation, but
              | conceptually similar)_
        
             | p_ing wrote:
             | Both ZFS and WinSvr offer "real" dedupe. One is on-write,
             | which requires a significant amount of available memory,
             | the other is on a defined schedule, which uses considerably
             | less memory (300MB + 10MB/TB).
             | 
             | ZFS is great if you believe you'll exceed some threshold of
             | space while writing. I don't personally plan my volumes
             | with that in mind but rather make sure I have some amount
             | of excess free space.
             | 
              | WinSvr allows you to disable dedupe if you want (don't
              | know why you would) whereas ZFS is a one-way street
              | without exporting the data.
             | 
             | Both have pros and cons. I can live with the WinSvr cons
             | while ZFS cons (memory) would be outside of my budget, or
             | would have been at the particular time with the particular
             | system.
        
         | taneliv wrote:
         | On ZFS it consumes a lot of RAM. In part I think this is
         | because ZFS does it on the block level, and has to keep track
         | of a lot of blocks to compare against when a new one is written
         | out. It might be easier on resources if implemented on the file
         | level. Not sure if the implementation would be simpler or more
         | complex.
         | 
          | It might also be a little unintuitive that modifying one byte
          | of a large file would result in a lot of disk activity, as the
          | file system would need to duplicate the file again.
        
           | gmueckl wrote:
           | Files are always represented as lists of blocks or block
           | spans within a file system. Individual blocks could in theory
           | be partially shared between files at the complexity cost of a
           | reference counter for each block. So changing a single byte
            | in a copy-on-write file could take the same time regardless
            | of file size because only the affected block would have to
            | be duplicated. I don't know at all how macOS implements this
            | copy-on-write scheme, though.
        
             | MBCook wrote:
             | APFS is a copy on write filesystem if you use the right
             | APIs, so it does what you describe but only for entire
             | files.
             | 
              | I believe as soon as you change a single byte you get a
              | complete copy that's your own.
             | 
             | And that's how this program works. It finds perfect
             | duplicates and then effectively deletes and replaces them
             | with a copy of the existing file so in the background
             | there's only one copy of the bits on the disk.
        
               | mintplant wrote:
               | I suppose this means that you could find yourself
               | unexpectedly out of disk space in unintuitive ways, if
               | you're only trying to change one byte in a cloned file
               | but there isn't enough space to copy its entire contents?
        
               | pansa777 wrote:
                | It doesn't work the way you think. If you change one
                | byte of a duplicated file, only that "byte" will be
                | changed on disk (a "byte" in quotes because, technically,
                | it is not a byte but a block).
                | 
                | As far as I understand, it works like the reflink feature
                | in modern Linux FSs. If so, that's really cool, and
                | that's also a bit better than ZFS's snapshots. I'm a
                | newbie on macOS, but it looks amazing.
        
               | MBCook wrote:
               | I'm not sure if it works on a file or block level for
               | CoW, but yes.
               | 
                | However APFS gives you a number of space-related
                | footguns if you want. You can overcommit partitions, for
                | example.
               | 
               | It also means if you have 30 GB of files on disk that
               | could take up anywhere from a few hundred K to 30 GB of
               | actual data depending on how many dupes you have.
               | 
               | It's a crazy world, but it provides some nice features.
        
               | tonyedgecombe wrote:
                | > I believe as soon as you change a single byte you get a
                | complete copy that's your own.
               | 
               | I think it stores a delta:
               | 
               | https://en.m.wikipedia.org/wiki/Apple_File_System#Clones
        
               | alwillis wrote:
               | That's not how this works. Nothing is deleted. It creates
               | zero-space clones of existing files.
               | 
               | https://en.wikipedia.org/wiki/Apple_File_System?wprov=sft
               | i1#...
        
           | amzin wrote:
           | Is there a FS that keeps only diffs in clone files? It would
           | be neat
        
             | rappatic wrote:
             | I wondered that too.
             | 
             | If we only have two files, A and its duplicate B with some
             | changes as a diff, this works pretty well. Even if the user
             | deletes A, the OS could just apply the diff to the file on
             | disk, unlink A, and assign B to that file.
             | 
             | But if we have A and two different diffs B1 and B2, then
             | try to delete A, it gets a little murkier. Either you do
             | the above process and recalculate the diff for B2 to make
             | it a diff of B1; or you keep the original A floating around
             | on disk, not linked to any file.
             | 
             | Similarly, if you try to modify A, you'd need to
             | recalculate the diffs for all the duplicates.
             | Alternatively, you could do version tracking and have the
             | duplicate's diffs be on a specific version of A. Then every
             | file would have a chain of diffs stretching back to the
             | original content of the file. Complex but could be useful.
             | 
             | It's certainly an interesting concept but might be more
             | trouble than it's worth.
        
               | abrookewood wrote:
               | ZFS does this by de-duplicating at the block level, not
               | the file level. It means you can do what you want without
               | needing to keep track of a chain of differences between
               | files. Note that de-duplication on ZFS has had issues in
               | the past, so there is definitely a trade-off. A newer
               | version of de-duplication sounds interesting, but I don't
               | have any experience with it:
               | https://www.truenas.com/docs/references/zfsdeduplication/
        
             | UltraSane wrote:
             | VAST storage does something like this. Unlike how most
             | storage arrays identify the same block by hash and only
              | store it once, VAST uses a content-aware hash so hashes of
             | similar blocks are also similar. They store a reference
             | block for each unique hash and then when new data comes in
             | and is hashed the most similar block is used to create byte
             | level deltas against. In practice this works extremely
             | well.
             | 
             | https://www.vastdata.com/blog/breaking-data-reduction-
             | trade-...
        
             | abrookewood wrote:
             | ZFS: "The main benefit of deduplication is that, where
             | appropriate, it can greatly reduce the size of a pool and
             | the disk count and cost. For example, if a server stores
             | files with identical blocks, it could store thousands or
             | even millions of copies for almost _no extra disk space_. "
             | (emphasis added)
             | 
             | https://www.truenas.com/docs/references/zfsdeduplication/
        
             | alwillis wrote:
             | That's how APFS works; it uses delta extents for tracking
             | differences in clones: https://en.wikipedia.org/wiki/Delta_
             | encoding?wprov=sfti1#Var...
        
           | abrookewood wrote:
           | In regards to the second point, this isn't correct for ZFS:
           | "If several files contain the same pieces (blocks) of data or
           | any other pool data occurs more than once in the pool, ZFS
           | stores just one copy of it. Instead of storing many copies of
           | a book it stores one copy and an arbitrary number of pointers
           | to that one copy." [0]. So changing one byte of a large file
           | will not suddenly result in writing the whole file to disk
           | again.
           | 
           | [0] https://www.truenas.com/docs/references/zfsdeduplication/
        
             | karparov wrote:
             | Not the whole file but it would duplicate the block. GP
             | didn't claim that the whole file is copied.
        
             | btilly wrote:
             | This applies to modifying a byte. But inserting a byte will
             | change every block from then on, and will force a rewrite.
             | 
             | Of course, that is true of most filesystems.
        
         | asdfman123 wrote:
         | If Apple is anything like where I work, there's probably a
         | three-year-old bug ticket in their system about it and no real
         | mandate from upper management to allocate resources for it.
        
         | ted_dunning wrote:
         | This is commonly done with compression on block storage
         | devices. That fails, of course, if the file system is
         | encrypting the blocks it sends down to the device.
         | 
         | Doing deduplication at this level is nice because you can
         | dedupe across file systems. If you have, say, a thousand
         | systems that all have the same OS files you can save vats of
         | storage. Many times, the only differences will be system
         | specific configurations like host keys and hostnames. No single
         | filesystem could recognize this commonality.
         | 
         | This fails when the deduplication causes you to have fewer
         | replicas of files with intense usage. To take the previous
         | example, if you boot all thousand machines at the same time,
         | you will have a prodigious I/O load on the kernel images.
        
         | UltraSane wrote:
         | NTFS supports deduplication but it is only available on Server
         | versions which is very annoying.
        
         | nielsbot wrote:
          | Disk Utility.app manages to keep the OS running while making
          | the disk exclusive-access... I wonder how it does that.
        
       | 999900000999 wrote:
        | A $20 one-year licence for something that probably has a FOSS
        | equivalent on Linux...
       | 
       | However, considering Apple will never ever ever allow user
       | replaceable storage on a laptop, this might be worth it.
        
         | ezfe wrote:
          | The cost is because of the fact that people won't use it
          | regularly. The developer is offering lifetime unlocks, lower
          | cost levels for shorter timeframes, etc.
        
         | p_ing wrote:
         | The developer does need to make up for the $100 yearly
         | privilege of publishing the app to the App Store.
        
         | jeroenhd wrote:
         | I have yet to see a GUI variant of deduplication software for
         | Linux. There are plenty of command line tools, which probably
         | can be ported to macOS, but there's no user friendly tool to
         | just click through as far as I know.
         | 
         | There's value in convenience. I wouldn't pay for a yearly
         | license (that price seems more than fair for a "version
         | lifetime" price to me?) but seeing as this tool will probably
         | need constant maintenance as Apple tweaks and changes APFS over
         | time, combined with the mandatory Apple taxes for publishing
         | software like this, it's not too awful.
        
           | 999900000999 wrote:
            | $50 for a lifetime license.
           | 
           | Which really means up until the dev gets bored, which can be
           | as short as 18 months.
           | 
            | I wouldn't mind something like this versioned to the OS: $20
            | for the current OS, and ten dollars for every significant
            | update.
        
             | artimaeis wrote:
             | The Mac App Store (and all of Apple's App Stores) doesn't
             | enable this sort of licensing. It's exactly the sort of
             | thing that drives a lot of developers to independent
             | distribution.
             | 
             | That's why we see so many more subscription-based apps
             | these days, application development is an ongoing process
             | with ongoing costs, so it needs to have ongoing income. But
             | the traditional buy-it-once app pricing doesn't enable that
             | long-term development and support. The app store supports
             | subscriptions though, so now we get way more subscription-
             | based apps.
             | 
                | I really think Siracusa came up with a clever pricing
                | scheme here, given his desire to use the App Store for
                | distribution.
        
               | 999900000999 wrote:
               | Okay I stand corrected.
        
       | ZedZark wrote:
        | I did this with two scripts - one that produces and caches sha1
        | sums of files, and another that consumes the output of the first
        | (or any of the *sum progs) and produces stats about duplicate
        | files, with options to delete or hard-link them.
        
         | strunz wrote:
          | I wonder how many comments about hard links will show up here
          | from people misunderstanding what this app does.
        
           | theamk wrote:
            | if a file is not going to be modified (in the low-level
            | sense of open("w") on the filename, as opposed to
            | rename-and-create-new), then reflinks (what this app does)
            | and hardlinks act somewhat identically.
           | 
           | For example if you have multiple node_modules, or app
           | installs, or source photos/videos (ones you don't edit), or
           | music archives, then hardlinks work just fine.
        
       | albertzeyer wrote:
       | I wrote a similar (but simpler) script which would replace a file
       | by a hardlink if it has the same content.
       | 
       | My main motivation was for the packages of Python virtual envs,
       | where I often have similar packages installed, and even if
       | versions are different, many files would still match. Some of the
       | packages are quite huge, e.g. Numpy, PyTorch, TensorFlow, etc. I
       | got quite some disk space savings from this.
       | 
       | https://github.com/albertz/system-tools/blob/master/bin/merg...
        
         | andrewla wrote:
         | This does not use hard links or symlinks; this uses a feature
         | of the filesystem that allows the creation of copy-on-write
         | clones. [1]
         | 
         | [1] https://en.wikipedia.org/wiki/Apple_File_System#Clones
        
           | gurjeet wrote:
            | So albertzeyer's script can be adapted to use the `cp -c`
            | command, to achieve the same effect as Hyperspace.
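            | 
            | Something like this, perhaps (a hypothetical Swift helper
            | that shells out to `cp -c` instead of hard-linking; an
            | untested sketch):
            | 
            |     import Foundation
            | 
            |     // Replace the hard-link step with an APFS clone
            |     // by shelling out to `cp -c`.
            |     func clone(_ src: String, _ dst: String) throws {
            |         let cp = Process()
            |         cp.executableURL = URL(fileURLWithPath: "/bin/cp")
            |         cp.arguments = ["-c", src, dst]
            |         try cp.run()
            |         cp.waitUntilExit()
            |     }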
        
       | diggan wrote:
       | > Like all my apps, Hyperspace is a bit difficult to explain.
       | I've attempted to do so, at length, in the Hyperspace
       | documentation. I hope it makes enough sense to enough people that
       | it will be a useful addition to the Mac ecosystem.
       | 
       | Am I missing something, or isn't it a "file de-duplicator" with a
       | nice UI/UX? Sounds pretty simple to describe, and tells you why
       | it's useful with just two words.
        
         | protonbob wrote:
         | No, because it isn't getting rid of the duplicate; it's using
         | a feature of APFS that lets duplicates exist as separate files
         | while sharing the same underlying data.
        
           | yayoohooyahoo wrote:
           | Is it not the same as a hard link (which I believe are
           | supported on Mac too)?
        
             | andrewla wrote:
             | My understanding is that it is a copy-on-write clone, not a
             | hard link. [1]
             | 
             | > Q: Are clone files the same thing as symbolic links or
             | hard links?
             | 
             | > A: No. Symbolic links ("symlinks") and hard links are
             | ways to make two entries in the file system that share the
             | same data. This might sound like the same thing as the
             | space-saving clones used by Hyperspace, but there's one
             | important difference. With symlinks and hard links, a
             | change to one of the files affects all the files.
             | 
             | > The space-saving clones made by Hyperspace are different.
             | Changes to one clone file do not affect other files. Cloned
             | files should look and behave exactly the same as they did
             | before they were converted into clones.
             | 
             | [1] https://hypercritical.co/hyperspace/
        
               | dylan604 wrote:
               | What kind of changes could you make to one clone that
               | would still qualify it as a clone? If there are changes,
               | it's no longer the same file. Even after reading the How
               | It Works[0] link, I'm not grokking how it works. Is it
               | making some sort of delta/diff that is applied to the
               | original file? That doesn't seem possible for every file
               | format, e.g. large media files. I could see it being
               | interesting for text-based files, but that gets
               | complicated for complex formats.
               | 
               | [0] https://hypercritical.co/hyperspace/#how-it-works
        
               | aeontech wrote:
               | If I understand correctly, a COW clone references the
               | same contents (just like a hardlink) as long as all the
               | filesystem references point to identical file contents.
               | 
               | Once you open one of the references and modify the
               | contents, the copy-on-write process is invoked by the
               | filesystem, and only the modified blocks are written to
               | new storage; from that point the two files diverge.
               | 
               | Comparing with a hardlink, there is no copy-on-write, so
               | any changes made to the contents when editing the file
               | opened from one reference would also show up if you open
               | the other hardlinks to the same file contents.
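               | 
               | A quick way to see the clone behavior on an APFS volume
               | (a rough sketch; cp -c clones via clonefile(2)):
               | 
               |     echo original > a.txt
               |     cp -c a.txt b.txt       # clone; blocks are shared
               |     echo changed >> b.txt   # copy-on-write kicks in
               |     cat a.txt               # still prints "original"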
        
               | dylan604 wrote:
               | ah, that's where the copy-on-write takes place.
               | sometimes, just reading it written by someone else is the
               | knock upside the head I need.
        
               | MBCook wrote:
               | That's correct.
        
             | zippergz wrote:
             | A copy-on-write clone is not the same thing as a hard link.
        
             | rahimnathwani wrote:
             | With a hard link, the content of each of the two 'files'
             | is identical in perpetuity.
             | 
             | With APFS clones, the contents start off identical but can
             | be changed independently. If you change a small part of a
             | file, new copies of those block(s) are written, but the
             | remaining blocks continue to be shared with the clone.
        
             | actionfromafar wrote:
             | Almost, but the difference is that if you change one of
             | the hardlinked files, you change "all of them". (It's
             | really the same file, just with different paths.)
             | 
             | https://hypercritical.co/hyperspace/#how-it-works
             | 
             | APFS, by contrast, allows creating clone files which, once
             | changed, start to diverge.
        
             | alwillis wrote:
             | It's not the same, because clones can have separate
             | metadata; in addition, if a cloned file changes, only the
             | changed blocks are stored separately from the original.
        
           | diggan wrote:
           | Right, but the concept is the same: "remove duplicates" in
           | order to save storage space. Whether it's using reflinks,
           | soft links, APFS clones or whatever is more or less an
           | implementation detail.
           | 
           | I know that internally it isn't actually "removing" anything,
           | and that it uses fancy new technology from Apple. But in
           | order to explain the project to strangers, I think my tagline
           | gets the point across pretty well.
        
             | CharlesW wrote:
             | > _Right, but the concept is the same, "remove duplicates"
             | in order to save storage space._
             | 
             | The duplicates aren't removed, though. Nothing changes from
             | the POV of users or software that use those files, and you
             | can continue to make changes to them independently.
        
               | vultour wrote:
               | De-duplication does not mean the duplicates completely
               | disappear. If I download a deduplication utility I expect
               | it to create some sort of soft/hard link. I definitely
               | don't want it to completely remove random files on the
               | filesystem, that's just going to wreak havoc.
        
               | sgerenser wrote:
               | But it can still wreak havoc if you use hardlinks or
               | softlinks, because maybe there was a good reason for
               | having a duplicate file! Imagine you have a photo,
               | "foo.jpg". You make a copy of it, "foo2.jpg". You're
               | planning on editing that copy, but right now, it's a
               | duplicate. At this point you run your "deduper" that
               | turns the second file into a hardlink. Then a few days
               | later you go and edit the file, but wait, the original
               | "backup" file is now modified too! You lost your
               | original.
               | 
               | That's why copy-on-write clones are completely different
               | from hardlinks.
        
           | dingnuts wrote:
           | It does get rid of the duplicate. The duplicate data is
           | deleted and a hard link is created in its place.
        
             | zippergz wrote:
             | It does not make hard links. It makes copy-on-write clones.
        
             | kemayo wrote:
             | No, because it's not actually a hard link -- if you modify
             | one of the files they'll diverge.
        
               | 8n4vidtmkvmk wrote:
               | Sounds like jdupes with -B
        
               | kemayo wrote:
               | Cursory googling suggests that it's using the same
               | filesystem feature, yeah.
        
         | dewey wrote:
         | The author of the software is a file system enthusiast (so
         | much so that on the podcast he's a part of, they have a
         | dedicated sound effect every time "filesystem" comes up), a
         | long-time blogger and macOS reviewer. So you have to see it
         | in that context: documenting every bit and the technical
         | details behind it is important to him... even if it's longer
         | than a tagline on a landing page.
         | 
         | In times when documentation is often an afterthought, and
         | technical details get hidden away from users all the time
         | ("Oops, some error occurred"), this should be celebrated.
        
         | zerd wrote:
         | I've been using `fclones` [1] to do this, with `dedupe`, which
         | uses reflink/clonefile.
         | 
         | https://github.com/pkolaczk/fclones
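         | 
         | If I remember the fclones CLI correctly, the flow is to group
         | first and then feed the report to dedupe, which replaces the
         | duplicates with reflinks/clones (the path is just an example):
         | 
         |     fclones group ~/Projects > dupes.txt
         |     fclones dedupe < dupes.txt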
        
       | svilen_dobrev wrote:
       | $ rmlint -c sh:link -L -y s -p -T duplicates
       | 
       | will produce a script which, if run, will hardlink duplicates
        
         | Analemma_ wrote:
         | That's not what this app is doing though. APFS clones are copy-
         | on-write pointers to the same data, not hardlinks.
        
           | phiresky wrote:
           | If you replace `sh:link` with `sh:clone` instead, it will.
           | 
           | > clone: reflink-capable filesystems only. Try to clone both
           | files with the FIDEDUPERANGE ioctl(3p) (or
           | BTRFS_IOC_FILE_EXTENT_SAME on older kernels). This will free
           | up duplicate extents while preserving the metadata of both.
           | Needs at least kernel 4.2.
        
         | wpm wrote:
         | On Linux
        
       | jamesfmilne wrote:
       | Would be nice if git could make use of this on macOS.
       | 
       | Each worktree I usually work on is several gigs of (mostly)
       | identical files.
       | 
       | Unfortunately the source files are often deep in a compressed git
       | pack file, so you can't de-duplicate that.
       | 
       | (Of course, the bigger problem is the build artefacts on each
       | branch, which are like 12G per debug/release per product, but
       | they often diverge for boring reasons.)
        
         | diggan wrote:
         | Git is a really poor fit for a project like that since it's
         | snapshot-based instead of diff-based... Luckily, `git lfs`
         | exists for working around that. I'm assuming you've already
         | investigated that for the large artifacts?
        
         | theamk wrote:
         | "git worktree" shares a .git folder between multiple checkouts.
         | You'll still have multiple files in the working copy, but at
         | least the .pack files would be shared. It is a great feature,
         | very robust; I use it all the time.
         | 
         | There is also ".git/objects/info/alternates", accessed via the
         | "--shared"/"--reference" options of "git clone", which allows
         | sharing only the object storage and not branches etc... but it
         | has caveats, and I've only used it in some special
         | circumstances.
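         | 
         | Roughly, with placeholder branch names and paths:
         | 
         |     # second checkout sharing the same .git object store
         |     git worktree add ../myrepo-feature feature-branch
         | 
         |     # or: a new clone that borrows objects from a local repo
         |     git clone --reference /path/to/existing/repo \
         |         https://example.com/org/repo.git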
        
         | globular-toast wrote:
         | Git de-duplicates everything in its store (in the .git
         | directory) already. That's how it can store thousands of
         | commits which are snapshots of the entire repository without
         | eating up tons of disk space. Why do you have duplicated files
         | in the working directory, though?
        
       | andrewla wrote:
       | Many comments here offering similar solutions based on hardlinks
       | or symlinks.
       | 
       | This uses a specific feature of APFS that allows the creation of
       | copy-on-write clones. [1] If a clone is written to, then it is
       | copied on demand and the original file is unmodified. This is
       | distinct from the behavior of hardlinks or symlinks.
       | 
       | [1] https://en.wikipedia.org/wiki/Apple_File_System#Clones
        
         | bombela wrote:
         | Also called reflinks on Linux, which are supported by
         | bcachefs, Btrfs, CIFS, NFS 4.2, OCFS2, overlayfs, XFS, and
         | OpenZFS.
         | 
         | Sources: https://unix.stackexchange.com/questions/631237/in-
         | linux-whi... https://forums.veeam.com/veeam-backup-
         | replication-f2/openzfs...
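         | 
         | With GNU coreutils on one of those filesystems, the rough
         | equivalent of macOS's `cp -c` is (file names are
         | placeholders):
         | 
         |     cp --reflink=always source.bin clone.bin
         | 
         | (--reflink=auto falls back to a plain copy when the
         | filesystem can't do it.)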
        
       | radicality wrote:
       | Hopefully it doesn't have a bug similar to the one jdupes had
       | 
       | https://web.archive.org/web/20210506130542/https://github.co...
        
       | david_allison wrote:
       | > Hyperspace can't be installed on "Macintosh HD" because macOS
       | version 15 or later is required.
       | 
       | macOS 15 was released in September 2024, this feels far too soon
       | to deprecate older versions.
        
         | kstrauser wrote:
         | He wanted to write it in Swift 6. Does it support older OS
         | versions?
        
           | jjcob wrote:
           | Swift 6 is not the problem. It's backward compatible.
           | 
           | The problem is SwiftUI. It's very new, still barely usable on
           | the Mac, but they are adding lots of new features every macOS
           | release.
           | 
           | If you want to support older versions of macOS you can't use
           | the nice stuff they just released. Eg. pointerStyle() is a
           | brand new macOS 15 API that is very useful.
        
             | MBCook wrote:
             | I can't remember for sure but there may also have been a
             | recent file system API he said he needed. Or a bug that he
             | had to wait for a fix on.
        
             | therockhead wrote:
             | It's been a while since I last looked at SwiftUI on the
             | Mac. Is it really still that bad?
        
               | jjcob wrote:
               | It's not bad, just limited. I think it's getting usable,
               | but just barely so.
               | 
               | They are working on it, and making it better every year.
               | I've started using it for small projects and it's pretty
               | neat how fast you can work with it -- but not everything
               | can be done yet.
               | 
               | Since they are still adding pretty basic stuff every
               | year, it really hurts if you target older versions.
               | AppKit is so mature that for most people it doesn't
               | matter if you can't use new features introduced in the
               | last 3 years. For SwiftUI it still makes a big
               | difference.
        
               | therockhead wrote:
               | I wonder why they haven't tried to back port SwiftUI
               | improvements/versions to the older OSs. Seems like this
               | should have been possible.
        
         | tobr wrote:
         | Can it really be seen as deprecating an old version when it's a
         | brand new app?
        
           | borland wrote:
           | +1. He's not taking anything away because you never had it.
        
           | johnmaguire wrote:
           | I'm a bit confused as the Mac App Store says it's over 4
           | years old.
        
             | furyofantares wrote:
             | The 4+ Age rating is like, who can use the app. Not for 3
             | year olds, apparently.
        
               | heywoods wrote:
               | Despite knowing this is the correct interpretation, I
               | still consistently make the same incorrect interpretation
               | as the parent comment. It would be nice if they made this
               | more intuitive. Glad I'm not the only one that's made
               | that mistake.
        
               | throwanem wrote:
               | I feel like that's true for most of the relatively low-
               | level disk and partition management tooling. As unpopular
               | an opinion as it may lately be around here, I'm enough of
               | a pedagogical traditionalist to remain convinced that
               | introductory logical volume management is best left at
               | least till kindergarten.
        
               | pmarreck wrote:
               | The way they specify this has always confused me, because
               | I actually care more about how old the app is than what
               | age range it's aimed for
        
         | ryandrake wrote:
         | Came here to post the same thing. Would love to try the
         | application, but I guess not if the developer is deliberately
         | excluding my device (which cannot run the bleeding edge OS).
        
           | wpm wrote:
           | The developer deliberately chose to write it in Swift 6.
           | Apple is the one who deliberately excluded Swift 6 from your
           | device.
        
             | ryandrake wrote:
             | Yea, too bad :( Everyone involved with macOS and iOS
             | development seems to be (intentionally or unintentionally)
             | keeping us on the hardware treadmill.
        
               | ForOldHack wrote:
               | Expensive. Keeping us on the expensive hardware
               | treadmill. My guess is that it can't be listed in the
               | App Store unless it's only for Macs released in the
               | last 11 months.
        
           | kstrauser wrote:
           | In fairness, I don't think you can describe it as bleeding
           | edge when we're 5 months into the annual 12 month upgrade
           | cycle. It's recent, but not exactly an early adopter version
           | at this point.
        
       | BWStearns wrote:
       | I have file A that's in two places and I run this.
       | 
       | I modify A_0. Does this modify A_1 as well or just kind of reify
       | the new state of A_0 while leaving A_1 untouched?
        
         | madeofpalk wrote:
         | It's called copy-on-write because when you modify A_0, the
         | filesystem makes a copy for the written data, leaving A_1
         | untouched.
         | 
         | https://en.wikipedia.org/wiki/Copy-on-write#In_computer_stor...
        
           | bsimpson wrote:
           | Which means if you actually edited those files, you might
           | fill up your HD much more quickly than you expected.
           | 
           | But if you have the same 500MB of node_modules in each of
           | your dozen projects, this might actually durably save some
           | space.
        
             | _rend wrote:
             | > Which means if you actually edited those files, you might
             | fill up your HD much more quickly than you expected.
             | 
             | I'm not sure if this is what you intended, but just to be
             | sure: writing changes to a cloned file doesn't immediately
             | duplicate the entire file again in order to write those
             | changes -- they're actually written out-of-line, and the
             | identical blocks are only stored once. From [the docs](^1)
             | posted in a sibling comment:
             | 
             | > Modifications to the data are written elsewhere, and both
             | files continue to share the unmodified blocks. You can use
             | this behavior, for example, to reduce storage space
             | required for document revisions and copies. The figure
             | below shows a file named "My file" and its copy "My file
             | copy" that have two blocks in common and one block that
             | varies between them. On file systems like HFS Plus, they'd
             | each need three on-disk blocks, but on an Apple File System
             | volume, the two common blocks are shared.
             | 
             | [^1]: https://developer.apple.com/documentation/foundation/
             | file_sy...
        
           | kdmtctl wrote:
           | What happens when the original file is deleted? Often this
           | is handled by block reference counters, which would just be
           | decremented. How does APFS handle this? Is there a
           | master/copy concept, or just block references?
        
           | BWStearns wrote:
           | Thanks for the clarification. I expected it worked like that
           | but couldn't find it spelled out after a brief perusal of the
           | docs.
        
         | lgdskhglsa wrote:
         | He's using the "copy on write" feature of the file system. So
         | it should leave A_1 untouched, creating a new copy for A_0's
         | modifications. More info:
         | https://developer.apple.com/documentation/foundation/file_sy...
        
       | astennumero wrote:
       | What algorithm does the application use to figure out if two
       | files are identical? There are a lot of interesting algorithms
       | out there - hashes, bit-by-bit comparison, etc. - but these
       | techniques have their own disadvantages. What is the best way
       | to do this for a large number of files?
        
         | diegs wrote:
         | This reminds me of
         | https://en.wikipedia.org/wiki/Venti_(software) which was a
         | content-addressable filesystem that used hashes for de-
         | duplication. Since the hashes were computed at write time, the
         | performance penalty was amortized.
        
         | w4yai wrote:
         | I'd hash the first 1024 bytes of all files, and start from
         | there if there's any collision. That way you don't need to
         | hash whole (large) files, only those whose prefix hashes
         | match.
        
           | kstrauser wrote:
           | At that point, why hash them instead of just using the first
           | 1024 bytes as-is?
        
             | sedatk wrote:
             | Probably because you need to keep a lot of those in memory.
        
               | kstrauser wrote:
               | I suspect that a computer with so many files that this
               | would be useful probably has a lot of RAM in it, at least
               | in the common case.
        
               | sedatk wrote:
               | But you need to constantly process them too, not just
               | store them.
        
             | borland wrote:
             | In order to check if a file is a duplicate of another, you
             | need to check it against _every other possible file_. You
             | need some kind of "lookup key".
             | 
             | If we took the first 1024 bytes of each file as the lookup
             | key, then our key size would be 1024 bytes. If you have 1
             | million files on your disk, that's about 1GB of RAM just
             | to store all the keys. That's not a big deal these days,
             | but it's also annoying if you have a bunch of files that
             | all start with the same 1024 bytes -- e.g. perhaps all the
             | photoshop documents start with the same header. You'd need
             | a 2-stage comparison, where you first match the key (1024
             | bytes) and then do a full comparison to see if it really
             | matches.
             | 
             | Far more efficient - and less work - if you just use a
             | SHA256 of the file's contents. That gets you a much smaller
             | 32 byte key, and you don't need to bother with 2-stage
             | comparisons.
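             | 
             | A rough sketch of such an index with stock tools (shasum
             | ships with macOS; this prints the files whose full-content
             | hash has already been seen, filenames with newlines
             | aside):
             | 
             |     find . -type f -print0 \
             |       | xargs -0 shasum -a 256 \
             |       | sort \
             |       | awk 'seen[$1]++ { print }'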
        
               | kstrauser wrote:
               | I understand the concept. My main point is that it's
               | probably not a huge advantage to store hashes of the
               | first 1KB, which requires CPU to calculate, over just the
               | raw bytes, which requires storage. There's a tradeoff
               | either way.
               | 
               | I don't think it would be far more efficient to hash
               | the entire contents though. If you have a million files
               | storing a terabyte of data, the 2 stage comparison would
               | read at most 1GB (1 million * 1KB) of data, and less for
               | smaller files. If you do a comparison of the whole hashed
               | contents, you have to read the entire 1TB. There are a
               | hundred confounding variables, for sure. I don't think
               | you could confidently estimate which would be more
               | efficient without a lot of experimenting.
        
               | philsnow wrote:
               | If you're going to keep partial hashes in memory, may as
               | well align it on whatever boundary is the minimal
               | block/sector size that your drives give back to you.
               | Hashing (say) 8kB takes less time than it takes to fetch
               | it from SSD (much less disk), so if you only used the
               | first 1kB, you'd (eventually) need to re-fetch the same
               | block to calculate the hash for the rest of the bytes in
               | that block.
               | 
               | ... okay, so as long as you always feed chunks of data
               | into your hash in the same deterministic order, it
               | doesn't matter for the sake of correctness what that
               | order is or even if you process some bytes multiple
               | times. You could hash the first 1kB, then the second-
               | through-last disk blocks, then the entire first disk
               | block again (double-hashing the first 1kB) and it would
               | still tell you whether two files are identical.
               | 
               | If you're reading from an SSD and seek times don't
               | matter, it's in fact probable that on average a lot of
               | files are going to differ near the start and end (file
               | formats with a header and/or footer) more than in the
               | middle, so maybe a good strategy is to use the first 32k
               | and the last 32k, and then if they're still identical,
               | continue with the middle blocks.
               | 
               | In memory, per-file, you can keep something like
               | 
               |     - the length
               |     - h(block[0:4])
               |     - h(block[0:4] | block[-5:])
               |     - h(block[0:4] | block[-5:] | block[4:32])
               |     - h(block[0:4] | block[-5:] | block[4:128])
               |     - ...
               |     - h(block[0:4] | block[-5:] | block[4:])
               | 
               | etc, and only calculate the latter partial hashes when
               | there is a collision between earlier ones. If you have
               | 10M files and none of them have the same length, you
               | don't need to hash anything. If you have 10M files and 9M
               | of them are copies of each other except for a metadata
               | tweak that resides in the last handful of bytes, you
               | don't need to read the entirety of all 10M files, just a
               | few blocks from each.
               | 
               | A further refinement would be to have per-file-format
               | hashing strategies... but then hashes wouldn't be
               | comparable between different formats, so if you had 1M
               | pngs, 1M zips, and 1M png-but-also-zip quine files, it
               | gets weird. Probably not worth it to go down this road.
        
             | smusamashah wrote:
             | And why the first 1024? You could pick from predefined
             | points.
        
               | f1shy wrote:
               | Depending on the medium, the penalty of reading single
               | bytes in sparse locations could be comparable with
               | reading the whole file. Maybe not a big win.
        
           | amelius wrote:
           | I suspect that bytes near the end are more likely to be
           | different (even if there may be some padding). For example,
           | imagine you have several versions of the same document.
           | 
           | Also, use the length of the file for a fast check.
        
         | borland wrote:
         | I don't know exactly what Siracusa is doing here, but I can
         | take an educated guess:
         | 
         | For each candidate file, you need some "key" that you can use
         | to check if another candidate file is the same. There can be
         | millions of files so the key needs to be small and quick to
         | generate, but at the same time we don't want any false
         | positives.
         | 
         | The obvious answer today is a SHA256 hash of the file's
         | contents; it's very fast, not too large (32 bytes) and the odds
         | of a false positive/collision are low enough that the world
         | will end before you ever encounter one. SHA256 is the de-facto
         | standard for this kind of thing and I'd be very surprised if
         | he'd done anything else.
        
           | MBCook wrote:
           | You can start with the size, which will often be unique.
           | That would likely cut down the search space fast.
           | 
           | At that point maybe it's better to just compare byte by
           | byte? You'd have to read the whole file to generate the
           | hash anyway, and if you just compare the bytes there is no
           | chance of a hash collision, no matter how small.
           | 
           | Plus if you find a difference at byte 1290 you can just
           | stop there instead of reading the whole thing to finish the
           | hash.
           | 
           | I don't think John has said exactly how on ATP (his podcast
           | with Marco and Casey), but knowing him as a longtime
           | listener/reader he's being _very_ careful. And I think he's
           | said that on the podcast too.
        
             | unclebucknasty wrote:
             | > _which is probably really unique_
             | 
             | Wonder what the distribution is here, on average? I know
             | certain file types tend to cluster in specific ranges.
             | 
             | > _maybe it's better to just compare byte by byte? You'll
             | have to read the whole file to generate the hash_
             | 
             | Definitely, for comparing any two files. But, if you're
             | searching for duplicates across the entire disk, then
             | you're potentially comparing each file against many
             | others, and each file is compared many times. So, hashing
             | them on first pass could _conceivably_ be more efficient.
             | 
             | > _if you just compare the bytes there is no chance of hash
             | collision_
             | 
             | You could then compare hashes and, only in the exceedingly
             | rare case of a collision, do a byte-by-byte comparison to
             | rule out false positives.
             | 
             | But, if your first optimization (the file size comparison)
             | really does dramatically reduce the search space, then
             | you'd also dramatically cut down on the number of re-
             | comparisons, meaning you may be better off not hashing
             | after all.
             | 
             | You could probably run the file size check, then based on
             | how many comparisons you'll have to do for each matched
             | set, decide whether hashing or byte-by-byte is optimal.
        
           | f1shy wrote:
           | I think the probability is not so low. I remember reading
           | here about a person getting a photo from another chat in a
           | chat application that was using SHA in the background. I
           | don't recall all the details; it's improbable, but possible.
        
             | kittoes wrote:
             | The probability is truly, obscenely, low. If you read about
             | a collision then you surely weren't reading about SHA256.
             | 
             | https://crypto.stackexchange.com/questions/47809/why-
             | havent-...
        
             | sgerenser wrote:
             | LOL nope, I seriously doubt that was the result of a SHA256
             | collision.
        
           | amelius wrote:
           | Or just use whatever algorithm rsync uses.
        
           | rzzzt wrote:
           | I experimented with a similar, "hardlink farm"-style approach
           | for deduplicated, browseable snapshots. It resulted in a
           | small bash script which did the following:
           | 
           | - compute SHA256 hashes for each file on the source side
           | 
           | - copy files which are not already known to a "canonical
           | copies" folder on the destination (this step uses the hash
           | itself as the file name, which makes it easy to check if I
           | had a copy from the same file earlier)
           | 
           | - mirror the source directory structure to the destination
           | 
           | - create hardlinks in the destination directory structure for
           | each source file; these should use the original file name but
           | point to the canonical copy.
           | 
           | Then I got too scared to actually use it :)
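           | 
           | For the curious, those steps sketched out (not the actual
           | script; the paths are placeholders and filenames containing
           | newlines aren't handled):
           | 
           |     src=/data/pics dst=/backup/snap store=/backup/.store
           |     mkdir -p "$store"
           |     find "$src" -type f | while IFS= read -r f; do
           |       h=$(shasum -a 256 "$f" | cut -d' ' -f1)
           |       # keep one canonical copy, named after its hash
           |       [ -e "$store/$h" ] || cp "$f" "$store/$h"
           |       # mirror the tree and hardlink to the canonical copy
           |       rel=${f#"$src"/}
           |       mkdir -p "$dst/$(dirname "$rel")"
           |       ln "$store/$h" "$dst/$rel"
           |     done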
        
           | pmarreck wrote:
           | xxHash (or xxh3 which I believe is even faster) is massively
           | faster than SHA256 at the cost of security, which is
           | unnecessary here.
           | 
           | Of course, engineering being what it is, it's possible that
           | only one of these has hardware support and thus might end up
           | actually being faster in realtime.
        
             | PhilipRoman wrote:
             | Blake3 is my favorite for this kind of thing. It's a
             | cryptographic hash (maybe not the world's strongest, but
             | considered secure), and also fast enough that in real world
             | scenarios it performs just as well as non-crypto hashes
             | like xx.
        
           | karparov wrote:
           | This can be done much faster and safer.
           | 
           | You can group all files into buckets, and as soon as a
           | bucket contains only a single file, discard it. If at the
           | end multiple files remain in the same bucket, they are
           | duplicates.
           | 
           | Initially all files are in the same bucket.
           | 
           | You now iterate over differentiators which given two files
           | tell you whether they are _maybe_ equal or _definitely_ not
           | equal. They become more and more costly but also more and
           | more exact. You run the differentiator on all files in a
           | bucket to split the bucket into finer equivalence classes.
           | 
           | For example:
           | 
           | * Differentiator 1 is the file size. It's really cheap, you
           | only look at metadata, not the file contents.
           | 
           | * Differentiator 2 can be a hash over the first file block.
           | Slower since you need to open every file, but still blazingly
           | fast and O(1) in file size.
           | 
           | * Differentiator 3 can be a hash over the whole file. O(N) in
           | file size but so precise that if you use a cryptographic hash
           | then you're very unlikely to have false positives still.
           | 
           | * Differentiator 4 can compare files bit for bit. Whether
           | that is really needed depends on how much you trust collision
           | resistance of your chosen hash function. Don't discard this
           | though. Git got bitten by this.
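           | 
           | A rough shell sketch of differentiators 1 and 3 (skipping 2
           | and 4; BSD/macOS stat syntax assumed; filenames containing
           | newlines aren't handled):
           | 
           |     # stage 1: list (size, path) pairs
           |     find . -type f -exec stat -f '%z %N' {} + > sizes.txt
           | 
           |     # stage 2: hash only files whose size repeats
           |     awk 'NR==FNR { n[$1]++; next }
           |          n[$1] > 1 { print substr($0, index($0, " ") + 1) }' \
           |         sizes.txt sizes.txt \
           |       | tr '\n' '\0' | xargs -0 shasum -a 256 \
           |       | sort | awk 'seen[$1]++ { print }'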
        
         | williamsmj wrote:
         | Deleted comment based on a misunderstanding.
        
           | Sohcahtoa82 wrote:
           | > This tool simply identifies files that point at literally
           | the same data on disk because they were duplicated in a copy-
           | on-write setting.
           | 
           | You misunderstood the article, as it's basically doing the
           | opposite of what you said.
           | 
           | This tool finds duplicate data that is specifically _not_
           | duplicated via copy-on-write, and then _turns it into_ a
           | copy-on-write copy.
        
             | williamsmj wrote:
             | Fair. Deleted.
        
       | ziofill wrote:
       | Lovely idea, but way too expensive for me.
        
       | bsimpson wrote:
       | Interesting idea, and I like the idea of people getting paid for
       | making useful things.
       | 
       | That said, I get a data security itch about having a random
       | piece of software from the internet scan every file on my HD,
       | particularly on a work machine where some lawyers might care
       | about what's reading your hard drive. It would be nice if it
       | were open source, so you could see what it's doing.
        
         | Nevermark wrote:
         | > I like the idea of people getting paid for making useful
         | things
         | 
         | > It would be nice if it was open source
         | 
         | > I get a data security itch having a random piece of software
         | from the internet scan every file on an HD
         | 
         | With the source it would be easy for others to create freebie
         | versions, with or without respecting license restrictions or
         | security.
         | 
         | I am not arguing anything, just pondering how software
         | economics and security issues are full of unresolved holes,
         | and how the world isn't getting fairer or safer by default.
         | 
         | --
         | 
         | The app was a great idea, indeed. I am now surprised Apple
         | doesn't automatically reclaim storage like this. Kudos to the
         | author.
        
         | benced wrote:
         | You could download the app, disconnect Wifi and Ethernet, run
         | the app and the reclamation process, remove the app (remember,
         | you have the guarantees of the macOS App Store so no kernel
         | extensions etc), and then reconnect.
         | 
         | Edit: this might not work with the payment option actually. I
         | don't think you can IAP without the internet.
        
       | diimdeep wrote:
       | Requires macOS 15.0 or later. - Oh god, this is the most
       | stupid and irritating thing about macOS "application
       | development".
       | 
       | It's really unfair to call it "software"; it's more like
       | "glued-to-the-latest-OS-ware". Meanwhile, I can still run an
       | .exe compiled in 2006, and with Wine even on Mac or Linux.
        
         | kstrauser wrote:
         | However, you can't run an app targeted for Windows 11 on
         | Windows XP. How unfair is that? Curse you, Microsoft.
        
       | DontBreakAlex wrote:
       | Nice, but I'm not getting a subscription for a filesystem
       | utility. Had it been a one-time $5 license, I would have bought
       | it. At the current price, it's literally cheaper to put files in
       | an S3 bucket or outright buy an SSD.
        
         | benced wrote:
         | "I don't value software but that's not a respectable opinion so
         | I'll launder that opinion via subscriptions"
        
           | DontBreakAlex wrote:
           | Well I do value software, I'm paid $86/h to write some! I
           | just find that for $20/year or $50 one time, you can get way
           | more than 12G of hard drive space. I also don't think that
           | this piece of software requires so much maintenance that it
           | wouldn't be worth making at a lower price. I'm not saying
           | that it's bad software, it's really great, just too
           | expensive... Personally, my gut feeling is that the dev
           | would have had more sales at a one-time $5 price, and made
           | more money overall.
        
         | amelius wrote:
         | There are several such tools for Linux, and they are free, so
         | maybe just change operating systems.
        
           | augusto-moura wrote:
           | I'm pretty sure some of them also work on macOS. rmlint[1],
           | for example, can output a script that reflinks duplicates
           | (or runs any script for both files):
           | 
           |     rmlint -c sh:handler=reflink .
           | 
           | I'm not sure if reflink works out of the box, but you can
           | write your own alternative script that just links both files
           | 
           | [1]: https://github.com/sahib/rmlint
        
             | dewey wrote:
             | It does not support APFS:
             | https://github.com/sahib/rmlint/issues/421
        
           | dewey wrote:
           | I don't think either of them supports APFS deduplication
           | though?
        
         | botanical76 wrote:
         | I can't even find the price anywhere. Do you have to install
         | the software to see it?
        
           | sbarre wrote:
           | The Mac App Store page has the pricing at the bottom in the
           | In-App Purchases section..
           | 
           | TL;DR - $49 for a lifetime subscription, or $19/year or
           | $9/month.
           | 
           | It could definitely be easier to find.
        
         | dewey wrote:
         | They had long discussions about the pricing on the podcast the
         | author is a part of (atp.fm). It went through a few
         | iterations: a one-time purchase, a fee for each time you free
         | up space, and a subscription. There will always be people
         | unhappy with whatever choice is made.
         | 
         | Edit: Apparently both are possible in the end:
         | https://hypercritical.co/hyperspace/#purchase
        
           | mrguyorama wrote:
           | Who would be unhappy with $5 owned forever? Other than the
           | author of course for making less money.
        
             | criddell wrote:
             | People who want the app to stick around and continue to be
             | developed.
             | 
             | I worry about that with Procreate. It feels like it's
             | priced too low to be sustainable.
        
         | dewey wrote:
         | > Two kinds of purchases are possible: one-time purchases and
         | subscriptions.
         | 
         | https://hypercritical.co/hyperspace/#purchase
        
         | pmarreck wrote:
         | Claude 3.7 just rewrote the whole thing (just based on reading
         | the webpage description) as a commandline app for me, so
         | there's that.
         | 
         | And because it has no Internet access yet (and because I
         | prompted it to use a workaround like this in that
         | circumstance), the first thing it asked me to do (after
         | hallucinating the functionality first, and then catching
         | itself) was run `curl https://hypercritical.co/hyperspace/ |
         | sed 's/<[^>]*>//g' | grep -v "^$" | clip`
         | 
         | ("clip" is a bash function I wrote to pipe things onto the
         | clipboard or spit them back out in a cross-platform linux/mac
         | way)
         | 
         |     clip() {
         |       if command -v pbcopy > /dev/null; then
         |         [ -t 0 ] && pbpaste || pbcopy;
         |       else
         |         if command -v xclip > /dev/null; then
         |           [ -t 0 ] && xclip -o -selection clipboard \
         |             || xclip -selection clipboard;
         |         else
         |           echo "clip function error: Neither pbcopy/pbpaste" \
         |             "nor xclip are available." >&2;
         |           return 1;
         |         fi;
         |       fi
         |     }
        
         | jacobp100 wrote:
         | The price does seem very high. It's probably a niche product
         | and I'd imagine developers are the ones who would see the
         | biggest savings. Hopefully it works out for them
        
         | criddell wrote:
         | I think it's priced reasonably. A one-time $5 license wouldn't
         | be sustainable.
         | 
         | Since it's the kind of thing you will likely only need every
         | couple of years, $10 each time feels fair.
         | 
         | If putting all your data online or into an SSD makes more
         | sense, then this app isn't for you and that's okay too.
        
       | the_clarence wrote:
       | It's interesting how Linux tools are all free while even
       | trivial Mac tools are being sold. Nothing against someone
       | trying to monetize, but the Linux culture sure is nice!
        
         | dewey wrote:
         | It's not that nice to call someone's work they spent months on
         | "trivial" without knowing anything about the internals and what
         | they ran into.
        
           | MadnessASAP wrote:
           | I don't think they meant it in a disparaging way, except
           | maybe against Apple. More that filesystems that support
           | deduplication typically include a deduplication tool in
           | their standard suite of FS tools. I too find it odd that
           | Apple does not do this.
        
       | eikenberry wrote:
       | I don't understand why a simple, closed source de-dup app is at
       | the top of the front page with 160+ comments? What is so
       | interesting about it? I read the blog and the comments here and I
       | still don't get it.
        
         | benced wrote:
         | The developer is popular and APFS cloning is genuinely
         | technically interesting.
         | 
         | (no, it's not a symlink)
        
           | augusto-moura wrote:
           | COW filesystems are older than macOS; no surprises for me.
           | Maybe people just aren't that aware of them?
        
             | ForOldHack wrote:
             | CoW - Copy on Write. Most probably on older mainframes
             | (actually, newer mainframes).
             | 
             | "CoW is used as the underlying mechanism in file systems
             | like ZFS, Btrfs, ReFS, and Bcachefs"
             | 
             | Obligatory: https://en.wikipedia.org/wiki/Copy-on-write
        
         | therockhead wrote:
         | I assume it's because it's from John Siracusa, a long-time Mac
         | enthusiast, blogger, and podcaster. If you listen to him on
         | ATP, it's hard not to like him, and anything he does is bound
         | to get more than the usual upvotes on HN.
        
       | dewey wrote:
       | For those mentioning that there's no price listed, it's not that
       | easy as in the App Store the price varies by country. You can
       | open the App Store link and then look at "In App Purchases"
       | though.
       | 
       | For me on the German store it looks like this:
       | 
       |     Unlock for One Year    22,99 EUR
       |     Unlock for One Month    9,99 EUR
       |     Lifetime Unlock        59,99 EUR
       | 
       | So it supports both one time purchases and subscriptions.
       | Depending on what you prefer. More about that here:
       | https://hypercritical.co/hyperspace/#purchase
        
       | twp wrote:
       | CLI tool to find duplicate files unbelievably quickly:
       | 
       | https://github.com/twpayne/find-duplicates
        
       | archagon wrote:
       | I have to confess: it miffs me that a utility that would normally
       | fly completely under the radar is likely to make the creator
       | thousands of dollars just because he runs a popular podcast. (Am
       | I jealous? Oh yes. But only because I tried to sell similar apps
       | in the past and could barely get any downloads no matter how much
       | I marketed them. Selling software without an existing network
       | seems nigh-on impossible these days.)
       | 
       | Anyway, congrats to Siracusa on the release, great idea, etc.
       | etc.
        
         | dewey wrote:
         | I can understand your criticism as it's easy to arrive at that
         | conclusion (Also a common occurrence when levelsio launches a
         | new product, as his Twitter following is large) but it's also
         | not fair to discount it as "just because he runs a popular
         | podcast".
         | 
         | The author has been a "household" name in the macOS / Apple
         | scene for a long time, since well before the podcast. If
         | someone spends their life blogging about all things Apple on
         | outlets like Ars Technica and consistently putting out new
         | content on podcasts for decades, they will naturally have
         | better distribution.
         | 
         | How many years did you spend on building up your marketing and
         | distribution reach?
        
           | archagon wrote:
           | I know! I actually like him and wish him the best. I just get
           | a bit annoyed when one of the ATP folks releases some small
           | utility with an unclear niche and then later talks about how
           | they've "merely" earned thousands of dollars from it. When I
           | was an app developer, I would have counted myself lucky to
           | have made just a hundred bucks from a similar release. The
           | gang's popularity gives them a distorted view of the market
           | sometimes, IMHO.
        
       | karparov wrote:
       | TL;DR: He wrote a macOS dedup app which finds files with the
       | same contents and tells the filesystem that their contents are
       | identical, so it can save space (using copy-on-write features).
       | 
       | He points out it's dangerous but could be worth it because of
       | the space savings.
       | 
       | I wonder if the implementation is using a hash only or does an
       | additional step to actually compare the contents to avoid hash
       | collision issues.
       | 
       | It's not open source, so we'll never know. He chose a pay model
       | instead.
       | 
       | Also, some files might not be identical but have identical
       | blocks. Something that could be explored too. Other filesystems
       | have that either in their tooling or do it online or both.
        
       | sgt wrote:
       | Any way it can be built for 14? It requires macOS 15.
        
       | rusinov wrote:
       | John is a legend.
        
       | re wrote:
       | On a related note: are there any utilities that can measure disk
       | usage of a folder taking (APFS) cloned files into account?
        
       | Take8435 wrote:
       | Downloaded. Ran it. Tells me "900" files can be cleaned. No
       | summary, no list. But I was at least asked to buy the app. Why
       | would I buy the app if I have no idea if it'll help?
        
         | crb wrote:
         | From the FAQ:
         | 
         | > If some eligible files were found, the amount of disk space
         | that can be reclaimed is shown next to the "Potential Savings"
         | label. To proceed any further, you will have to make a
         | purchase. Once the app's full functionality is unlocked, a
         | "Review Files" button will become available after a successful
         | scan. This will open the Review Window.
         | 
         | I half remember this being discussed on ATP; the logic being
         | that if you have the list of files, you will just go and de-
         | dupe them yourself.
        
           | AyyEye wrote:
           | > the logic being that if you have the list of files, you
           | will just go and de-dupe them yourself.
           | 
           | If you can do that, you can check for duplicates yourself
           | anyway. It's not like there aren't already dozens of great
           | apps that dedupe.
        
         | eps wrote:
         | This reminds me -
         | 
         | Back in the MS-DOS days, when RAM was scarce, there was a
         | class of so-called "memory optimization" programs. They all
         | inevitably found at least a few KB to be reclaimed through
         | their magic, even if the same optimizer was run back to back
         | with itself and allowed to "optimize" things. That is, on each
         | run they would always find extra memory to free. They
         | ultimately did nothing but claim they did the work. Must've
         | sold pretty well nonetheless.
        
       | galaxyLogic wrote:
       | On Windows there is "Dev Drive", which I believe does a similar
       | copy-on-write thing.
       | 
       | If it works it's a no-brainer so why isn't it the default?
       | 
       | https://learn.microsoft.com/en-us/windows/dev-drive/#dev-dri...
        
         | siranachronist wrote:
         | Requires ReFS, which still isn't supported on the system
         | drive on Windows, IIRC.
        
       | o10449366 wrote:
       | What would an equivalent tool be on linux? I guess it depends on
       | the filesystem?
        
       | JackYoustra wrote:
       | What's the difference with jdupes?
        
       | mattgreenrocks wrote:
       | What jumped out to me:
       | 
       | > Finally, at WWDC 2017, Apple announced Apple File System (APFS)
       | for macOS (after secretly test-converting everyone's iPhones to
       | APFS and then reverting them back to HFS+ as part of an earlier
       | iOS 10.x update in one of the most audacious technological
       | gambits in history).
       | 
       | How can you revert a FS change like that if it goes south? You'd
       | certainly exercise the code well but also it seems like you
       | wouldn't be able to back out of it if something was wrong.
        
         | quux wrote:
         | IIRC migrating from HFS+ to APFS can be done without touching
         | any of the data blocks and a parallel set of APFS metadata
         | blocks and superblocks are written to disk. In the test
         | migrations Apple did the entire migration including generating
         | APFS superblocks but held short of committing the change that
         | would permanently replace the HFS+ superblocks with APFS ones.
          | To roll back they "just" needed to clean up all the generated
          | APFS superblocks and metadata blocks.
        
           | MBCook wrote:
           | I think that's what they did too. And it was a genius way of
           | testing. They did it more than once too I think.
           | 
           | Run the real thing, throw away the results, report all
           | problems back to the mothership so you have a high chance of
           | catching them all even on their multi-hundred million device
           | fleet.
        
           | k1t wrote:
           | Yes, that's how it's described in this talk transcript:
           | 
           | https://asciiwwdc.com/2017/sessions/715
           | 
           |  _Let's say for simplification we have three metadata regions
           | that report all the entirety of what the file system might be
           | tracking, things like file names, time stamps, where the
           | blocks actually live on disk, and that we also have two
           | regions labeled file data, and if you recall during the
           | conversion process the goal is to only replace the metadata
           | and not touch the file data._
           | 
           |  _We want that to stay exactly where it is as if nothing had
           | happened to it._
           | 
           |  _So the first thing that we're going to do is identify
           | exactly where the metadata is, and as we're walking through
           | it we'll start writing it into the free space of the HFS+
           | volume._
           | 
           |  _And what this gives us is crash protection and the ability
           | to recover in the event that conversion doesn't actually
           | succeed._
           | 
           |  _Now the metadata is identified._
           | 
           |  _We'll then start to write it out to disk, and at this
           | point, if we were doing a dry-run conversion, we'd end here._
           | 
           |  _If we're completing the process, we will write the new
           | superblock on top of the old one, and now we have an APFS
           | volume._
        
       ___________________________________________________________________
       (page generated 2025-02-25 23:00 UTC)