[HN Gopher] The hunt for the M1's neural engine
       ___________________________________________________________________
        
       The hunt for the M1's neural engine
        
       Author : ingve
       Score  : 185 points
       Date   : 2022-03-30 08:52 UTC (14 hours ago)
        
 (HTM) web link (eclecticlight.co)
 (TXT) w3m dump (eclecticlight.co)
        
       | marcan_42 wrote:
       | The Load Balancer is there because the M1 Max has _two_
       | independent Neural Engines (unlike the GPU cores, which are load
       | balanced in hardware and the OS sees as a single one even on the
       | Ultra)... but one ANE is inexplicably disabled on all production
       | systems. The M1 Ultra, logically, has four... and only the first
       | in each die is enabled.
       | 
       | I was waiting for the Mac Studio to drop to come to a conclusion,
       | since it's plausible one ANE could've been disabled for power
       | reasons on the laptops... but with both secondary ANEs in each
       | die in the M1 Ultra off, and with no reports of anyone seeing the
        | _first_ ANE being disabled instead (which could mean it's a
       | yield thing), I'm going to go ahead and say there was a silicon
       | bug or other issue that made the second ANE inoperable or
       | problematic, and they just decided to fuse it off and leave it as
       | dead silicon on all M1 Max/Ultra systems.
        
         | dannyw wrote:
          | It's just a yield thing, I bet; particularly given the
          | relative lack of usage of the Neural Engine. If Apple
          | engineers realize that disabling half the Neural Engines
          | makes no user-visible difference and improves yields by ~1%
          | (hence decreasing silicon costs by ~1%), that's an easy
          | hundreds of millions in saved wafers.
        
           | ricardobeat wrote:
            | Doesn't the fact that it's always the same ANE that gets
            | disabled disprove that theory? If it were to improve yield
            | you'd see the other one being disabled at least some of the
            | time.
        
             | fredoralive wrote:
              | It probably depends on the chip yield, and the sample
              | size of whatever survey was used for the assertion that
              | it was always the second engine disabled.
             | 
             | It would be reasonable to assume that if both engines work,
             | then the second is always the one to be disabled. Therefore
             | to have the second enabled you'd need to find one where the
             | first engine has failed, and has no other chip killing
             | faults. Depending on the yield TSMC gets, these could be
             | quite rare, so you'd have to have quite a large survey to
             | find them.
             | 
              | Or, as other people have noted, it could be an erratum
              | meaning the second core is broken; yield isn't the only
              | possible reason.
        
               | WithinReason wrote:
               | This would be a reasonable theory, except the neural
               | engine is a small part of the total chip area and thus
               | unlikely to contribute significantly to total chip yield
        
               | marcan_42 wrote:
                | I know it's not the largest sample size, but I did ask
                | on Twitter and nobody found one with ane1 enabled; it's
                | always ane0.
        
         | grishka wrote:
         | Is the second ANE disabled in hardware or is it possible to
         | reenable it through software somehow?
        
           | exikyut wrote:
           | Just make sure there are no SkyNet singularities hiding
           | inside first.
           | 
           | ...maybe Apple disabled it for a reason, y'know?
        
             | thisNeeds2BeSad wrote:
              | Maybe to protect against lawsuits over a chip-patch
              | slowdown, if architectural mitigations become necessary
              | because of safety flaws inherent in the design's speed
              | optimizations?
        
           | marcan_42 wrote:
           | It's disabled and locked in boot firmware, and the firmware
           | is signed.
        
             | unicornfinder wrote:
              | Which implies that, at least theoretically, Apple _could_
              | enable the other neural engine at a later date (not that
              | they would).
        
         | runnerup wrote:
         | > it's plausible one ANE could've been disabled for power
         | reasons on the laptops
         | 
          | The linked posting notes that they were able to get the ANE
          | to draw 49 mW. Is this such a significant amount of power
          | that it's worth permanently disabling for laptop power draw?
          | Or is there likely much more power being used elsewhere to
          | support the ANE in addition to the 49 mW that can be measured
          | directly?
        
         | IshKebab wrote:
         | My guess would be either a power consumption issue - with both
         | ANEs enabled you could get voltage droops below the acceptable
         | limit. Or it requires software support that they haven't
         | implemented yet. Software always takes waaaay longer than
         | hardware people expect.
        
           | marcan_42 wrote:
            | They sold the chips as having only one ANE, and software
            | support is there since it's used in the M1 Ultra...
        
         | erwincoumans wrote:
          | That is interesting, do you have any references / articles
          | that describe that some ANEs are disabled?
          | 
          | Could it be overheating if the ANEs are not used the right
          | way?
        
           | Jcowell wrote:
            | Probably not, but he's one of the deepest divers into the
            | M1 chip due to his Linux project, so he's probably the most
            | reputable source. Any article written would probably be
            | referencing him.
        
             | erwincoumans wrote:
             | Ah, delightful to have a/the master behind Asahi
             | development/reverse engineering efforts here!
        
               | bschwindHN wrote:
               | Delightful to see the creator of Bullet Physics here in
               | the comments too!
        
         | greggsy wrote:
         | Perhaps they'll allow you to use it through subscription, like
         | z/OS mainframes locking down cores?
        
         | bayindirh wrote:
          | Maybe they want to push one as far as possible before
          | enabling the other, hence seeing the limits and increasing
          | the life of the systems by forcing developers to optimize
          | their code?
         | 
         | Sony pulled a similar trick on their A7 series cameras, and
         | enabled more advanced AF features just with a firmware upgrade.
         | It made the bodies "new" and pushed them at least half a
         | generation forward. It's not the same thing, I know, but it
         | feels _similar enough_ for me.
        
           | klysm wrote:
            | That's a very interesting tactic that I haven't heard of
            | before - almost the opposite of built-in obsolescence?
        
             | bayindirh wrote:
              | Yes. Actually, professional photographic equipment
              | doesn't get obsolete. Lenses are expensive, and making
              | them forward and backward compatible makes sure the user
              | stays inside the ecosystem. Also, you want higher end
              | bodies to be dependable, so you don't obsolete them, but
              | supersede them with better capabilities.
              | 
              | I can take my old D70s today and take beautiful photos,
              | even with its 6MP sensor; however, a newer body would be
              | much more flexible.
        
               | giobox wrote:
               | > Actually professional photographic equipment doesn't
               | get obsolete
               | 
               | > I can take my old D70s today and take beautiful photos,
                | > even with its 6MP sensor
               | 
               | I suspect if you do a wedding shoot with a 6mp
               | interchangeable lens camera, some customers are rightly
               | going to ask questions when you hand over the work... Of
               | course professional photographic equipment gets obsolete
               | - even lens systems get deprecated every 20-30 years too.
               | Newer sensors have vastly more dynamic range than the
               | d70s among other image quality benefits.
               | 
                | I think your argument holds water much more strongly in
                | the context of amateur users, where for sure you can
                | keep getting nice images from old gear for a long time.
        
               | bayindirh wrote:
               | > I suspect if you do a wedding shoot with a 6mp
               | interchangeable lens camera, some customers are rightly
               | going to ask questions when you hand over the work...
               | 
               | Unless you're printing A3 pages, getting gigantic
               | pictures, or cropping aggressively, D70s can still hold
               | up pretty well [0].
               | 
               | > even lens systems get deprecated every 20-30 years too.
               | 
               | Nikon F mount is being deprecated in favor of Z because
               | of mirrorless geometries, not because the lenses or the
               | designs are inferior (given the geometry constraints).
                | Many people still use their old lenses, and nifty fifties
               | are still produced with stellar sharpness levels. I'm not
               | entering into "N" or "L" category of lenses of their
               | respective mounts. Not all of them are post 2000 designs,
               | or redesigns, and they produce extremely good images.
               | 
               | > Newer sensors have vastly more dynamic range than the
               | d70s among other image quality benefits.
               | 
                | As a user of both the D70s and the A7III I can say
                | that, if there's good enough light (e.g. daylight), one
                | can take pretty nice pictures with a D70s, even today.
                | Yes, it dies pretty fast when light goes low, it can't
                | focus as fast, and it can't take single-shot (almost)
                | HDR images (the A7III can do that honestly, and that's
                | insane [4]), but unless you're chasing something
                | moving, older cameras are not _that_ bad. [1][2][3]
               | 
               | > I think you argument holds water much more strongly in
               | the context of amateur users, where for sure you can keep
               | getting nice images from old gear for a long time.
               | 
               | Higher end, action oriented professional cameras are not
               | actually built with resolution in mind, especially at the
               | top end. All of the action DSLRs and mirrorless cameras
               | up to a certain point are designed with speed and focus
                | in mind. You won't see A7R or Fuji GFX series in weddings
               | or in stadiums. You'll see A9s, Canon 1D or Nikon D1
               | series cameras. They're built to be fast. Not high res.
               | 
               | A wedding is more forgiving, but again a high MP camera
               | is not preferred since it's more prone to vibration
               | blurring.
               | 
               | [0]: https://www.youtube.com/watch?v=ku3lT8MjyFM
               | 
               | [1]: https://www.flickr.com/photos/zerocoder/41901384135/
               | in/album...
               | 
               | [2]: https://www.flickr.com/photos/zerocoder/28459579257/
               | in/album...
               | 
               | [3]: https://www.flickr.com/photos/zerocoder/39910477633/
               | in/album...
               | 
               | [4]: https://www.flickr.com/photos/zerocoder/33984196648/
               | in/album...
        
               | inferiorhuman wrote:
               | No.
               | 
               | > Unless you're printing A3 pages, getting gigantic
               | pictures, or cropping aggressively, D70s can still hold
               | up pretty well
               | 
               | You even qualified that later with "if there's good
               | enough light" and "unless you're chasing something
               | moving". No, a D70 won't work well for wedding
               | photography. Yes, people shot weddings with much slower
               | film. They don't anymore because, like the D70, slow film
               | is obsolete. People shot weddings with manual focus
               | lenses too and the D70 is awful for MF lenses from the
               | tiny viewfinder to the lack of support for non-CPU
                | lenses. When the D70 was a current product, some people
                | did shoot weddings with it (make no mistake, the D70 was
                | never marketed as a pro body), simply because the D70
                | was on par with its contemporaries.
               | 
               | > Nikon F mount is being deprecated in favor of Z because
               | of mirrorless geometries
               | 
               | Even within the scope of the F mount the D70 is obsolete
               | -- it's incompatible with new E and AF-P lenses.
               | 
               | > A wedding is more forgiving
               | 
               | Wedding photography is about the most technically
               | challenging, least forgiving (low light, constant motion,
               | spontaneous behavior) type of photography out there. The
               | point you were responding to still stands - older digital
               | photographic equipment is obsolete in a professional
               | context while having some utility for hobbyists. Nobody's
               | taking a D1 out to shoot sports these days. In fact most
               | people didn't when it was new because Nikon's autofocus
               | was so far behind Canon's.
        
           | therein wrote:
           | A7 SII has continuous autofocus now via a firmware update?
           | That would be very exciting to me.
        
             | bayindirh wrote:
             | I was talking about A7III. Since I don't have the A7SII, I
             | don't follow its firmware updates. A7III got Animal Eye-AF
             | and much better tracking via a firmware upgrade.
        
           | AgentOrange1234 wrote:
           | Another possible reason would be that there's a hardware bug
           | which only occurs when both are enabled. (Not saying this is
           | what's happening here, but it's very common to ship partly
           | disabled chips to work around bugs.)
        
             | vatys wrote:
             | For chips, that's commonly referred to as "binning" as in
             | the sorting machine drops chips into different bins based
             | on a test result.
             | 
             | A big design with many cores, such as CPU or GPU cores, may
              | have manufacturing defects that make one or more cores
             | bad. Or it may be on one side or another of a tolerance
             | range and not be able to work at higher power or higher
             | frequency. These parts may get "binned" into a lower
             | performance category, with some cores disabled (because a
             | flaw prevents the core from working) or with reduced
             | maximum performance states.
             | 
             | These are still "good" parts, and can be sold at a lower
             | cost with lower performance, while the "better" and "best"
             | parts will pass more tests and be able to have more or all
             | portions of the chip enabled.
             | 
              | So it's not so much to work around a "bug", which would be
              | a flaw common to all parts of a design; rather, it's to
              | work around manufacturing tolerances and allow more built
              | parts to be useful rather than garbage.
        
               | brigade wrote:
               | Manufacturing defects are not hardware bugs.
               | 
               | Binning is irrelevant to hardware bugs.
        
               | vatys wrote:
               | That's what I'm saying, in response to the previous
               | comment saying this is to work around a bug. Bugs are
                | common to all parts, whereas defects are unique per part.
               | Binning works around manufacturing defects and turns a
               | yield problem into different grades or SKUs of parts.
        
               | ghettoimp wrote:
               | Binning is definitely a possibility. Separately from
               | binning, there are often just features that don't work
               | right and get disabled with "chicken switches" or
               | "feature enable bits."
               | 
               | Any two-ANE design would have a lot of control logic that
               | has to be right, e.g., to manage which work gets sent to
               | which ANE, which cache lines get loaded, etc. It's easy
               | to imagine bugs in this logic which would only show up
               | when both ANEs are enabled. So it's likely that there is
               | a chicken bit that you could use to disable one of the
               | ANEs and run in single-ANE mode.
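                | 
                | To make the idea concrete, here's a purely illustrative
                | Python sketch of a fuse / chicken-bit check; none of
                | the register names or bit positions are real:
                | 
                |   # Hypothetical fuse layout, not real silicon details.
                |   ANE1_DISABLE_BIT = 1 << 3  # made-up chicken bit
                | 
                |   def ane1_enabled(fuse_register: int) -> bool:
                |       # Available only if the disable fuse isn't blown.
                |       return (fuse_register & ANE1_DISABLE_BIT) == 0
                | 
                |   print(ane1_enabled(0b0000))  # True: ANE1 usable
                |   print(ane1_enabled(0b1000))  # False: dead silicon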
        
       | OskarS wrote:
       | This is slightly beside the point, but those stack traces are C++
       | functions. I was pretty surprised by that (though, granted, I
       | don't know anything about macOS internals). I would have expected
       | either pure C for the kernel, or maybe Objective-C if they wanted
       | a higher level language. They don't really have any C++ in their
       | operating system APIs, right? Like, if you interface with Core
       | <Whatever>, isn't that all C, Objective-C and Swift? Is there a
       | lot of C++ in the macOS kernel/userspace?
        
         | saagarjha wrote:
         | IOKit in the kernel is a C++ API; the userspace version is C.
        
         | irae wrote:
         | They are still a fork of BSD, so it stands to reason that the
         | team working for years on kernel and lower level parts of the
         | OS still uses C++.
        
           | TickleSteve wrote:
            | Mach is not a fork of BSD. The kernel is Mach (a hybrid
            | microkernel) supporting a BSD API.
        
             | astrange wrote:
             | There is a lot of FreeBSD in the kernel/libc/userland. BSD
             | doesn't use C++ though, they use C.
        
         | galad87 wrote:
          | I don't know how many, but a good number of frameworks are
          | C++ internally, for example WebKit and the Objective-C
          | runtime itself; IOKit uses a subset of C++, and the Metal
          | Shading Language is a subset of C++.
        
           | gurkendoktor wrote:
           | There is (was?) always a lot of C++ in Core Animation
           | stacktraces, and Core Animation was really foundational to
           | the iPhone UI.
        
         | jjoonathan wrote:
         | Yeah, the "driver" parts of Mac OS X -- the bits that aren't
         | Mach or BSD -- use a restricted subset of C++. It doesn't carry
         | over into userland. Not much, anyway.
         | 
          | I know that C++ is out of vogue, but last time I wrote
          | drivers on Linux and OS X (10 years ago) I left with the
          | distinct impression that the case against it was oversold, at
          | least compared to C. C
         | clunks hard and C++ addresses the worst of it. I've never had
         | to corral a bunch of overzealous junior C++ programmers, which
         | I suspect is where the C++ reputation comes from, but Apple
         | went down that path and wound up with something pretty decent.
         | 
         | IMO today it's a shrug and 25 years ago it was forward looking.
        
           | zozbot234 wrote:
           | Linux devs are working to support Rust, which is a lot
           | cleaner than C++ (essentially, no implementation inheritance)
           | and has more straightforward interfacing with C components.
        
             | jjoonathan wrote:
             | I'm not convinced that "no inheritance" is better --
             | drivers seem to be one place where it's actually put to
             | good use -- but I also don't think it really matters that
             | much. In contrast, Rust definitely has fewer foot-guns, and
             | that matters a lot.
             | 
             | Of course, the OSX driver code is going on 25 years old, so
             | it's not evidence of anyone's opinion that C++ beats Rust
             | for OS dev.
        
             | StillBored wrote:
              | Selective implementation inheritance is actually quite
              | useful for kernel/driver development if it's used where
              | function pointers in C tend to be used: pluggable
              | interfaces. It standardizes the syntax, provides default
              | implementations, and forces implementation of mandatory
              | methods.
             | 
             | C++ when used as a better C, really is better.
        
               | zozbot234 wrote:
               | Pluggable interfaces are ok, and Rust supports them just
               | fine via traits. Actual implementation inheritance is
               | inherently anti-modular.
        
       | servytor wrote:
       | If you are interested in the M1 neural engine, I highly recommend
       | you check out this[0].
       | 
       | [0]: https://github.com/geohot/tinygrad/tree/master/accel/ane
        
         | erwincoumans wrote:
         | Yes, George Hotz (geohot) reverse engineered the neural engine
         | and could make it work for tinygrad, the videos posted in the
         | other reply describe the reverse engineering process.
         | 
          | I wonder why Apple didn't provide low-level APIs to access
          | the hardware? It may have various restrictions. I recall
          | Apple also didn't provide proper APIs to access OpenCL
          | frameworks on iOS, but some people found workarounds to
          | access that as well. Maybe they only integrate with a few
          | limited but important use cases (TensorFlow, Adobe) that they
          | can control.
         | 
         | Could it be that using the ANE in the wrong way overheats the
         | M1?
        
           | exikyut wrote:
            | The likeliest reason is avoiding long-term ABI ossification.
        
           | fredoralive wrote:
           | Possibly just to avoid having programs that rely too much on
           | specific implementation details of the current engine causing
           | issues in the future if they decide to change the hardware
           | design? An obvious comparison is graphics cards where you
           | don't get low level access to the GPU[1], so they can change
           | architecture details across generations.
           | 
           | Using a high level API probably makes it easier to implement
           | a software version for hardware that doesn't have the neural
            | engine, like Intel Macs or older A-series chips.
           | 
           | [1] Although this probably starts a long conversation about
           | various GPU and ML core APIs and quite how low level they
           | get.
        
           | xenadu02 wrote:
           | CoreML is the API to use the ANE.
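            | 
            | A minimal sketch of the usual path (Python, using the
            | coremltools package; the toy model and file name here are
            | made up). You convert a model to Core ML, and the OS then
            | decides at run time whether it can be scheduled onto the
            | ANE; there's no public API to target the ANE directly:
            | 
            |   import torch
            |   import coremltools as ct
            | 
            |   # Toy network standing in for a real model.
            |   net = torch.nn.Sequential(
            |       torch.nn.Linear(4, 4), torch.nn.ReLU()).eval()
            |   traced = torch.jit.trace(net, torch.rand(1, 4))
            | 
            |   # Convert to Core ML; whether layers land on the
            |   # CPU, GPU or ANE is decided by the framework, not
            |   # by this code.
            |   mlmodel = ct.convert(
            |       traced, inputs=[ct.TensorType(shape=(1, 4))])
            |   mlmodel.save("toy.mlmodel")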
        
             | erwincoumans wrote:
              | Thanks, that's right, there is a high-level API. I meant
              | low-level APIs, and I've updated my post to clarify.
        
           | mhh__ wrote:
            | Apple don't want to let people get used to the internals
            | _and_ spiritually like to enforce a very clear us-versus-
            | them philosophy when it comes to their new toys. They open
            | source things they want other people to standardize around,
            | but if it's their new toy then it's usually closed.
        
             | aseipp wrote:
             | In general I kind of agree with this, but this move isn't
             | anything specific to Apple. Every company designing ML
             | accelerators is doing it. None of them expose anything but
             | the most high level framework they can get away with to
             | users.
             | 
              | I honestly don't know of a single company offering custom
              | machine learning accelerators that lets you do anything
              | _except_ use TensorFlow/PyTorch to interface with them;
              | not a chance in hell that they'll actually give you the
              | underlying ISA specifics. _Maybe_ the closest is, like,
              | the Xilinx Versal devices or GPUs, but I don't quite put
              | them in the same category as something like Habana, Groq,
              | or GraphCore, where the architecture is bespoke for
              | exactly this use case, and the high level tools are there
              | to insulate you from architectural changes.
             | 
             | If there are any actual productionized, in-use accelerators
             | with low level details available that weren't RE'd from the
             | source components, I'd be very interested in seeing it. But
             | the trend here is very clear unless I'm missing something.
        
               | my123 wrote:
               | Habana has their own SynapseAI layer that their
               | TF/PyTorch port runs on. Custom ops are supported too,
               | via a compiler targeting the TPCs, using a C language
               | variant.
               | 
                | Oh, and they have an open-source usermode software
                | stack for those, but it's really not usable. It doesn't
                | allow access to the systolic arrays (MME); being
                | limited to the TPCs is just the _start_ of what it
                | doesn't have. (But it made the Linux kernel maintainers
                | happy, so...):
               | 
               | https://github.com/HabanaAI/SynapseAI_Core#limitations
               | (not to be confused with the closed-source SynapseAI)
        
               | aseipp wrote:
               | Well, that's good to hear at least! I knew there was some
               | back and forth between the kernel maintainers recently
               | due to all these accelerator drivers going in without any
               | usermode support; Habana's case was kind of interesting
               | because they got accepted into accel/ early by Greg, but
               | they wouldn't have passed the merge criteria used later
               | on for most others like Qualcomm.
               | 
               | Frankly I kind of expected the whole result of that
               | kerfuffle to just be that Habana would let the driver get
               | deleted from upstream and go on their merry way shipping
               | drivers to customers, but I'm happy to be proven wrong!
        
           | aseipp wrote:
           | Because machine learning accelerators are, in the broadest
           | sense, not "done" and rapidly evolving every year. Exposing
           | too many details of the underlying architecture is a prime
           | way to ossify your design, making it impossible to change,
           | and as a result you will fall behind. It is possible the
           | Neural Engine of 2022 will look very different to the one of
           | 2025, as far as the specifics of the design, opcode set, etc
           | all go.
           | 
           | One of the earliest lessons along this line was Itanium.
           | Itanium exposing so much of the underlying architecture as a
           | binary format and binary ABI made evolution of the design
           | extremely difficult later on, even if you could have
           | magically solved all the compiler problems back in 2000. Most
           | machine learning accelerators are some combination of a VLIW
           | and/or systolic array design. Most VLIW designers have
           | learned that exposing the raw instruction pipeline to your
           | users is a bad idea not because it's impossibly difficult to
           | use (compilers do in fact keep getting better), but because
           | it makes change impossible later on. This is also why we got
           | rid of delay slots in scalar ISAs, by the way; yes they are
           | annoying but they also expose too much of the implementation
           | pipeline, which is the much bigger issue.
           | 
           | Many machine learning companies take similar approaches where
           | you can only use high-level frameworks like Tensorflow to
           | interact with the accelerator. This isn't something from
           | Apple's playbook, it's common sense once you begin to design
           | these things. In the case of Other Corporations, there's also
           | the benefit that it helps keep competitors away from their
           | design secrets, but mostly it's for the same reason: exposing
           | too much of the implementation details makes evolution and
           | support extremely difficult.
           | 
           | It sounds crass but my bet is that if Apple exposed the
           | internal details of the ANE and later changed it (which they
           | will, 100% it is not "done") the only "outcome" would be a
           | bunch of rageposting on internet forums like this one.
           | Something like: "DAE Apple mothershitting STUPID for breaking
           | backwards compatibility? This choice has caused US TO SUFFER,
           | all because of their BAD ENGINEERING! If I was responsible I
           | would have already open sourced macOS and designed 10
           | completely open source ML accelerators and named them all
           | 'Linus "Freakin Epic" Torvalds #1-10' where you could program
           | them directly with 1s and 0s and have backwards compatibility
           | for 500 years, but people are SHEEP and so apple doesn't LET
           | US!" This will be posted by a bunch of people who compiled
           | "Hello world" for it one time six months ago and then are mad
           | it doesn't "work" anymore on a computer they do not yet own.
           | 
           | > Could it be that using the ANE in the wrong way overheats
           | the M1?
           | 
           | No.
        
             | smoldesu wrote:
             | Was it really necessary to expand the fourth paragraph
             | post-script to get your point across? Before it was a
             | fairly holistic look at the difference between people who
             | want flexibility and people who want stability, where
             | neither party was necessarily right. Now it just reads like
             | you're mocking people for desiring transparency in their
             | hardware, which... seems hard to demonize?
        
               | aseipp wrote:
               | There are other replies talking about Apple or whatever
               | but I'll be honest: because 2 decades of online forum
               | experience and FOSS development tells me that the final
               | paragraph is exactly what happens anytime you change
               | things like this and they are exposed to turbo-nerds,
               | despite the fact they are often poorly educated and
               | incredibly ill-informed about the topics at hand. You see
               | it here in spades on HN. It doesn't have anything to do
               | with Apple, either; plenty of FOSS maintainers could tell
               | you similar horror stories. I mean it's literally just a
               | paraphrase of an old XKCD.
               | 
               | To be fair though, I mean. I'm mostly a bitchy nerd, too.
               | And broadly speaking, taking the piss is just good fun
               | sometimes. That's the truth, at least for me.
               | 
               | If it helps, simply close your eyes and imagine a very
               | amped up YouTuber saying what I wrote above. But they're
               | doing it while doing weird camera transitions, slow-mo
               | shots of panning up the side of some Mac Mini or
               | whatever. They are standing at a desk with 4 computers
               | that are open-mobo with no case, and 14 GPUs on a shelf
               | behind them. Also the video is like 18 minutes long for
               | some reason. It's pretty funny then, if you ask me.
        
               | smoldesu wrote:
               | For sure, I don't think I disagree with anything you've
               | written here. Where I take umbrage is when there is no
               | _choice_ involved though. Apple could very well provide
                | a high-level, stable library while _also_ exposing
               | lower-level bindings that are expected to break
               | constantly. If the low-level library is as bad and broken
               | as people say it is, then they should have no problem
               | marketing their high-level bindings as a solution. This
               | is a mentality that frustrates me on many levels of their
               | stack; their choice of graphics API and build systems
               | being just a few other examples.
               | 
               | Maybe this works for some people. I can't knock someone
               | for an opinionated implementation of a complicated
               | system. At the same time though, we can't be surprised
               | when other people have differing opinions, and in a
               | perfect society we wouldn't try to crucify people for
               | making those opinions clear. Apple notoriously lacks a
               | dialogue with their community about this stuff, which is
               | what starts all of this pointless infighting in the first
               | place. Apple does what Apple does, and nerds will fight
               | over it until the heat death of the universe. There
               | really is nothing new under the sun. Mocking the ongoing
                | discussion is almost as Pyrrhic as claiming victory for
               | either side.
        
               | ben174 wrote:
               | Meh, it's okay to be grumpy sometimes. He got his point
               | across and clearly knows what he's talking about. Let him
               | be passionate :)
        
               | gjsman-1000 wrote:
               | He's not wrong - that's absolutely what YouTube and
               | online Linux commentators would do. They have their own
               | echo chamber, just as much as any tech community. Heck,
               | considering your past posts, it's probably something
               | _you_ would do.
               | 
               | As for transparency in hardware, it probably will become
               | more transparent once Apple feels that it is done and a
               | finished science. They don't want to repeat Itanium.
        
               | nebula8804 wrote:
                | Absolutely. It provided a vivid reminder of the many
                | people who come out of their holes to argue whenever
                | there is some criticism of open source. It's one thing
                | to desire freedom, but the reality of the situation is
                | that the community is toxic for some reason and just
                | not fun to even converse with.
        
               | EricE wrote:
               | I think it was absolutely appropriate because I have seen
               | that cycle happen many, many times over the years.
               | 
               | Especially when Apple is involved. Hell there are still
               | people who see them as beleaguered and about to go out of
               | business at any moment :p
        
               | smoldesu wrote:
               | I get where you're coming from. It's par for the course
               | on Apple's behalf to push this stuff away in lieu of
               | their own, high-level implementation, but I also think
               | that behavior puts them at an impasse. People who want to
               | use this hardware for arbitrary purposes are unable to do
               | so. Apple is unwilling to do it because they want their
               | hand on the "API valve" so to speak. In a case where
               | absolutist rhetoric is being used on either side, I think
               | this is pretty expected. If we're ultimately boiling this
               | down to "having choices" vs "not having choices" though,
               | I think it's perfectly reasonable to expect the most
               | valuable company in the world to go the extra mile and
               | offer both choices to their users and developers.
               | 
               | Or not. It's their hardware, they just won't be selling
               | any Macs to me with that mindset. The only thing that
               | irks me is when people take the bullet for Apple like a
               | multi-trillion dollar corporation needs more people
               | justifying their lack of interoperability.
        
           | irae wrote:
            | All the sibling comments are better guesses, but I would
            | also guess there could be security implications in exposing
            | lower-level access. Having it all proprietary and
            | undocumented is itself a way of making it harder to
            | exploit. Albeit, as mentioned, not having to settle on an
            | ABI is more likely the primary reason.
        
             | kmeisthax wrote:
             | Apple Silicon has IOMMUs on everything - you generally
             | can't exploit a bug in a coprocessor to gain more access on
             | the main application processor (or another coprocessor).
             | The only hardware bugs with security implications we've
             | found was stuff like M1RACLES, which is merely a covert
             | channel (and it's discoverer doesn't even think it's a
             | problem). Apple does a pretty good job of making sure even
             | their private/internal stuff is secure.
        
           | WithinReason wrote:
           | A high level API needs _much_ less support effort.
        
         | rickdeveloper wrote:
         | He live streamed himself writing a lot of that:
         | 
         | https://www.youtube.com/watch?v=mwmke957ki4
         | 
         | https://www.youtube.com/watch?v=H6ZpMMDvB1M
         | 
         | https://www.youtube.com/watch?v=JAyw7OAcXDE
         | 
         | https://www.youtube.com/watch?v=Cb2KwcnDKrk
        
       | pedro_hab wrote:
        | As a developer I am a bit ashamed of this question, but I
        | gotta ask:
        | 
        | What consumer apps actually use Neural Engines?
        | 
        | I think something like Photoshop, maybe. But wouldn't it just
        | train a model and use it as regular code?
        | 
        | I'm interested in AI, but more often than not it's a joke to
        | me about startups and jargon.
        | 
        | It feels weird to add this to all chips when I can't see that
        | much usage.
        
         | sharkjacobs wrote:
         | Anything "predictive" probably uses Neural Engine. These are
         | iOS features, but a lot of them apply to MacOS too
         | 
         | - Visual Lookup - Animoji - Face ID - recognizing "accidental"
         | palm input while using Apple Pencil - monitoring users' usage
         | habits to optimize device battery life and charging - app
         | recommendations - Siri - curating photos into galleries,
         | selecting "good" photos to show in the photos widget -
         | identifying people's faces in photos - creating "good" photos
         | with input from tiny camera lenses and sensors - portrait mode
         | - language translation - on-device dictation - AR plane
         | detection
         | 
          | The Core ML API allows third-party developers to use the
          | Neural Engine to run models.
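          | 
          | For example, from Python via coremltools (a sketch; the file
          | name is made up, and the input feature name depends on the
          | converted model):
          | 
          |   import numpy as np
          |   import coremltools as ct
          | 
          |   # ComputeUnit.ALL lets the framework pick among CPU,
          |   # GPU and Neural Engine at load time.
          |   model = ct.models.MLModel(
          |       "model.mlmodel", compute_units=ct.ComputeUnit.ALL)
          | 
          |   # Feature name "x" is a placeholder for the real one.
          |   print(model.predict({"x": np.zeros((1, 4), np.float32)}))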
        
           | FinalBriefing wrote:
           | But which consumer apps use it? I know of a handful of photo
           | apps that use it to enhance photos, but I'm not aware of any
           | other types of apps.
        
         | kitsunesoba wrote:
         | I think there may be more indirect usage than direct usage.
         | Little bits of neural engine usage are peppered through the
         | native APIs.
         | 
         | Of course if you're using third party libraries that don't rely
         | on macOS APIs this won't be happening in your app.
        
         | miohtama wrote:
         | Maybe the neural engine is ideal to scan your local image
         | library to find kiddy porn:
         | 
         | https://www.theverge.com/2021/12/15/22837631/apple-csam-dete...
        
         | gfody wrote:
         | could the M1 itself be using it for branch prediction?
        
         | OberstKrueger wrote:
         | Pixelmator Pro uses it for some of its ML functionality. Image
         | scaling can use it, and it provides a cleaner image when
         | upscaling, removing some compression artifacts and just
         | smoothing it out more naturally. I've found it can work well
         | downsizing too, although less of an effect. They also have an
         | ML auto-crop tool and ML denoiser. All of these will hit the
         | Neural Engine pretty good.
        
       | noduerme wrote:
        | This is a bizarre result but so... what's the conclusion? That
       | only a few things like Apple's proprietary image lookup are able
       | to tap into the ANE so far? Or that it's actually just a
       | marketing gimmick?
       | 
       | Reading this makes me wonder if it's not just a placeholder for
       | some kind of intrusive system that will neural-hash everything
       | you own, but I'm sure I'm just being paranoid.
        
         | bayindirh wrote:
          | TensorFlow has a CoreML-enabled version which runs on the
          | ANE.
         | 
         | https://github.com/apple/tensorflow_macos
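          | 
          | Device selection in that fork looked something like this, if
          | I remember its README correctly (and per the reply below, it
          | targets the CPU/GPU via ML Compute rather than the ANE):
          | 
          |   import tensorflow as tf
          |   from tensorflow.python.compiler.mlcompute import mlcompute
          | 
          |   # 'cpu', 'gpu' or 'any'; ML Compute picks the device.
          |   mlcompute.set_mlc_device(device_name='any')
          | 
          |   model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
          |   model.compile(optimizer='sgd', loss='mse')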
        
           | my123 wrote:
           | AFAIK that doesn't run on ANE but on the GPU. ANE is used for
           | inference only w/ CoreML.
        
         | masklinn wrote:
         | > Reading this makes me wonder if it's not just a placeholder
         | for some kind of intrusive system that will neural-hash
         | everything you own, but I'm sure I'm just being paranoid.
         | 
         | It's also actively counter-productive: if they wanted to do
         | this sort of tracking, they could just have done what all of
         | their competitors do and send the data straight to their
         | servers. This is hardware (and thus expenses) which is _only_
         | necessary because of their stance on privacy, and avoiding off-
         | device work.
        
           | mikotodomo wrote:
           | Apple has made huge compromises to try and give some privacy
           | back to the consumer, who lost it all from paying for cheap
           | products where the product is you. And people just ignore
           | this progress because they want to be anti-Apple. It's sad.
        
         | avianlyric wrote:
          | I would guess that the ANE has some very specialised hardware
          | (e.g. INT8 or FP16 only), and there isn't a huge amount of
          | it. So many nets either don't fit completely, or don't use
          | the right types of operation to match the ANE: they either
          | can't run on the ANE at all, or only a subset of their layers
          | can.
         | 
         | So when running a neural net iOS / macOS needs to make a
         | decision about where to run each net. Even if a net has layers
         | in it that are a perfect match for the ANE, there's still a
          | trade-off from having to move the workload back and forth between
         | a CPU core and the ANE (although the unified memory should
         | eliminate a big chunk of this cost).
         | 
         | It might be that in the general case it's not worth the latency
          | hit from using mixed processors when running a net that
          | isn't 100% ANE compatible, or it could just be that Apple
          | haven't got round to implementing the logic needed to
          | gracefully split
         | workloads across the ANE and a CPU core. Which would make
         | sense, because they've got the time and expertise to ensure all
         | their nets fit within the ANE. Something that's difficult for
         | 3rd party devs to do, because they don't have access to
         | detailed ANE docs.
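          | 
          | To illustrate the kind of decision I mean, here's some
          | purely hypothetical scheduling logic with made-up costs (not
          | anything Apple actually does):
          | 
          |   # Is splitting a net between ANE and CPU worth the
          |   # device-switch overhead, given per-layer support?
          |   def plan(layers, switch_cost=1.0, ane_speedup=4.0):
          |       devs = ["ane" if l["ane_ok"] else "cpu"
          |               for l in layers]
          |       switches = sum(a != b for a, b in zip(devs, devs[1:]))
          |       mixed = sum(l["cost"] / (ane_speedup if l["ane_ok"]
          |                                else 1.0) for l in layers)
          |       mixed += switches * switch_cost
          |       cpu_only = sum(l["cost"] for l in layers)
          |       return "mixed" if mixed < cpu_only else "cpu_only"
          | 
          |   net = [{"cost": 10, "ane_ok": True},
          |          {"cost": 2, "ane_ok": False},
          |          {"cost": 10, "ane_ok": True}]
          |   print(plan(net))  # "mixed"; raise switch_cost to flip it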
        
         | [deleted]
        
         | londons_explore wrote:
         | > That only a few things like Apple's proprietary image lookup
         | are able to tap into the ANE so far?
         | 
         | That would seem like the logical conclusion. Perhaps there are
            | hardware bugs/shortcomings that make it very hard to use for
         | the neural network API. Perhaps the software team is just
         | behind and still building that.
        
           | my123 wrote:
            | Using CoreML has some catches, notably having to use FP16
            | instead of integer formats.
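            | 
            | With coremltools, for instance, you opt a converted model
            | into FP16 explicitly (a sketch; the toy model stands in
            | for a real network):
            | 
            |   import torch
            |   import coremltools as ct
            | 
            |   net = torch.nn.Sequential(torch.nn.Linear(4, 4)).eval()
            |   traced = torch.jit.trace(net, torch.rand(1, 4))
            | 
            |   # FP16 weights/activations suit the ANE; integer
            |   # formats generally mean falling back to CPU/GPU.
            |   mlmodel = ct.convert(
            |       traced,
            |       inputs=[ct.TensorType(shape=(1, 4))],
            |       convert_to="mlprogram",
            |       compute_precision=ct.precision.FLOAT16,
            |   )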
        
           | noduerme wrote:
           | My initial understanding was that Adobe was leveraging it in
           | their experimental "AI" Photoshop plugins. Although a
           | majority of those seem to require a live internet connection
           | to work. Some of the newer core Photoshop functionality for
           | intelligent selection is ridiculously fast on an M1 Max,
           | though, which makes me think it's probably using the neural
           | chips.
        
             | sharikous wrote:
             | It might be, but there is also another matrix
             | multiplication accelerator that could be responsible
        
         | my123 wrote:
          | The ANE is accessible via the CoreML framework, a high-level
          | interface for ML inference.
         | 
          | It however turns out that a _lot_ of consumer apps today don't
         | use those accelerators at all.
         | 
          | (and about the attempt at using BNNS functions: that's not
          | offloaded, it runs on the host CPU cores w/ the tightly
          | coupled AMX accelerator)
        
           | galangalalgol wrote:
            | Would it be possible for someone to write an ONNX runtime
            | utilizing CoreML? That would open it up to a lot more
            | applications instantly.
        
             | ianai wrote:
             | ONNX lists CoreML support.
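              | 
              | It ships a CoreML execution provider; roughly (a sketch,
              | the model path is hypothetical), ops the provider can't
              | map fall back to the CPU provider:
              | 
              |   import onnxruntime as ort
              | 
              |   sess = ort.InferenceSession(
              |       "model.onnx",
              |       providers=["CoreMLExecutionProvider",
              |                  "CPUExecutionProvider"])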
        
         | sharikous wrote:
        | I have the same anxiety, but I don't think anything like that
        | is there.
        | 
        | The Asahi project is deliberately putting a low priority on
        | the ANE, but I have seen some other small reverse engineering
        | attempts.
         | 
         | I think some use of the ANE outside of Apple APIs will be
         | possible soon.
        
       | modeless wrote:
       | Isn't BNNS documented to run on the CPU? Why would you expect it
       | to use the neural engine? Apple also has Metal Performance
       | Shaders which of course run on the GPU only. The user accessible
       | API for the neural engine is Core ML. Very high level
       | unfortunately.
       | 
       | Hmm, it seems like there's also a new API that can use the neural
       | engine sometimes, "ML Compute". But only for inference?
       | https://developer.apple.com/documentation/mlcompute/mlcdevic...
        
       | endorphine wrote:
       | For the totally uninitiated, what's a neural engine in general?
       | What are they used for, and why Apple added this to their
       | products?
        
         | justusthane wrote:
         | His previous blog post has more general information about the
         | neural engine: https://eclecticlight.co/2022/03/29/live-text-
         | visual-look-up...
         | 
         | Basically it's used by anything involving machine learning:
         | 
         | - Speech recognition
         | 
         | - Face recognition
         | 
         | - Visual lookup (image recognition)
         | 
         | - Live Text (OCR)
         | 
         | It allows all these functions to be performed efficiently on-
         | device rather than shipping data off to the cloud.
        
         | jlouis wrote:
          | It's an area of the chip suited for operations on low
          | precision/range floating point numbers. Neural networks don't
          | generally require high precision in floating point
          | computations, but they require a lot of them. This means your
          | data paths can be smaller (16 bits wide rather than 32); the
          | consequence is that you can do far more computation per mm2
          | of die space.
          | 
          | The second part of the chain is that you also tailor the
          | operations the die area supports to those necessary in a
          | typical neural network, further optimizing the chip.
          | 
          | The end result is very power-efficient execution of neural
          | networks, which allows you to beat a GPU's or CPU's power
          | curve, improving thermals of the core and, in the case of
          | mobile devices, optimizing battery usage.
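          | 
          | A quick way to see the precision trade-off (NumPy; the
          | numbers just illustrate FP16's roughly 3 decimal digits):
          | 
          |   import numpy as np
          | 
          |   # FP16 keeps ~3 decimal digits but halves memory traffic.
          |   print(np.float32(1.0) + np.float32(1e-4))  # 1.0001
          |   print(np.float16(1.0) + np.float16(1e-4))  # 1.0 (rounded)
          |   print(np.finfo(np.float16).eps)            # ~0.000977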
        
           | acdha wrote:
           | It's probably also worth noting that the last part is fairly
           | important to Apple since they have based a lot of their
           | privacy stance around on-device processing of things like
           | Siri commands, photo/video analysis, image recognition
           | (VoiceOver can attempt to automatically describe images for
           | blind people), speech to text dictation, the various
           | audio/video enhancements like Center Stage or the voice
           | emphasis features, etc. and all of that means they're running
           | a lot of networks on battery.
           | 
           | Those efficiency wins are almost certainly worth it even if
           | third-party developers don't use it much.
        
       ___________________________________________________________________
       (page generated 2022-03-30 23:01 UTC)