[HN Gopher] The hunt for the M1's neural engine
___________________________________________________________________
The hunt for the M1's neural engine
Author : ingve
Score : 185 points
Date : 2022-03-30 08:52 UTC (14 hours ago)
(HTM) web link (eclecticlight.co)
(TXT) w3m dump (eclecticlight.co)
| marcan_42 wrote:
| The Load Balancer is there because the M1 Max has _two_
| independent Neural Engines (unlike the GPU cores, which are load
| balanced in hardware and the OS sees as a single one even on the
| Ultra)... but one ANE is inexplicably disabled on all production
| systems. The M1 Ultra, logically, has four... and only the first
| in each die is enabled.
|
| I was waiting for the Mac Studio to drop to come to a conclusion,
| since it's plausible one ANE could've been disabled for power
| reasons on the laptops... but with both secondary ANEs in each
| die in the M1 Ultra off, and with no reports of anyone seeing the
| _first_ ANE being disabled instead (which could mean it's a
| yield thing), I'm going to go ahead and say there was a silicon
| bug or other issue that made the second ANE inoperable or
| problematic, and they just decided to fuse it off and leave it as
| dead silicon on all M1 Max/Ultra systems.
| dannyw wrote:
| It's just a yield thing, I bet, particularly given the relative
| lack of usage of the Neural Engine. If Apple engineers realize
| that disabling half the Neural Engines makes no user-visible
| difference and improves yields by ~1% (hence decreasing silicon
| costs by ~1%), that's an easy hundreds of millions in saved
| wafers.
| ricardobeat wrote:
| Doesn't the fact that it's always the same ANE that gets
| disabled disprove that theory? If it were to improve yield
| you'd see the other one being disabled at least some of the
| time.
| fredoralive wrote:
| It probably depends on the chip yield, and the sample size
| of whatever survey was used for the assertion it was always
| the second engine disabled.
|
| It would be reasonable to assume that if both engines work,
| then the second is always the one to be disabled. Therefore
| to have the second enabled you'd need to find one where the
| first engine has failed, and has no other chip killing
| faults. Depending on the yield TSMC gets, these could be
| quite rare, so you'd have to have quite a large survey to
| find them.
|
| Or, as other people have noted, it could be an erratum
| meaning the second core is broken; yield isn't the only
| possible reason.
| WithinReason wrote:
| This would be a reasonable theory, except the neural
| engine is a small part of the total chip area and thus
| unlikely to contribute significantly to total chip yield
| marcan_42 wrote:
| I know it's not the largest sample size, but I did ask
| twitter and nobody found one with ane1 enabled, it's
| always ane0.
| grishka wrote:
| Is the second ANE disabled in hardware or is it possible to
| reenable it through software somehow?
| exikyut wrote:
| Just make sure there are no SkyNet singularities hiding
| inside first.
|
| ...maybe Apple disabled it for a reason, y'know?
| thisNeeds2BeSad wrote:
| Maybe to protect against lawsuits over a chip-patch slowdown,
| if architectural mitigations become necessary because of
| inherent safety flaws in the design's speed optimizations?
| marcan_42 wrote:
| It's disabled and locked in boot firmware, and the firmware
| is signed.
| unicornfinder wrote:
| Which implies that, at least theoretically, Apple _could_
| enable the other neural engine at a later date (not that they
| would).
| runnerup wrote:
| > it's plausible one ANE could've been disabled for power
| reasons on the laptops
|
| The linked posting notes that they were able to get the ANE to
| draw 49 mW. Is this such a significant amount of power that it's
| worth permanently disabling for laptop power draw? Or is there
| likely much more power being used elsewhere to support ANE in
| addition to the 49 mW that can be measured directly?
| IshKebab wrote:
| My guess would be either a power consumption issue - with both
| ANEs enabled you could get voltage droops below the acceptable
| limit. Or it requires software support that they haven't
| implemented yet. Software always takes waaaay longer than
| hardware people expect.
| marcan_42 wrote:
| They sold the chips as having only one ANE, and software
| support is there since it's used in the M1 Ultra...
| erwincoumans wrote:
| That is interesting. Do you have any references / articles that
| describe that some ANEs are disabled?
|
| Could it be overheating if the ANEs are not used the right
| way?
| Jcowell wrote:
| Probably not, but he's the one diving deepest into the M1
| chip due to his Linux project, so he's probably the most
| reputable source. Any article written would probably be
| referencing him.
| erwincoumans wrote:
| Ah, delightful to have a/the master behind Asahi
| development/reverse engineering efforts here!
| bschwindHN wrote:
| Delightful to see the creator of Bullet Physics here in
| the comments too!
| greggsy wrote:
| Perhaps they'll allow you to use it through subscription, like
| z/OS mainframes locking down cores?
| bayindirh wrote:
| Maybe they want to push one as far as possible before
| enabling the other, hence seeing the limits and increasing the
| life of the systems by forcing developers to optimize their
| code?
|
| Sony pulled a similar trick on their A7 series cameras, and
| enabled more advanced AF features just with a firmware upgrade.
| It made the bodies "new" and pushed them at least half a
| generation forward. It's not the same thing, I know, but it
| feels _similar enough_ for me.
| klysm wrote:
| That's a very interesting tactic that I haven't heard of
| before - almost the opposite of built-in obsolescence?
| bayindirh wrote:
| Yes. Actually professional photographic equipment doesn't
| get obsolete. Lenses are expensive and making them forward
| and backward compatible makes sure the user stays inside
| the ecosystem. Also, you want higher end bodies to be
| dependable, so you don't obsolete them, but supersede them
| with better capabilities.
|
| I can take my old D70s today and take beautiful photos,
| even with its 6MP sensor; however, a newer body would be
| much more flexible.
| giobox wrote:
| > Actually professional photographic equipment doesn't
| get obsolete
|
| > I can take my old D70s today and take beautiful photos,
| even with its 6MP sensor
|
| I suspect if you do a wedding shoot with a 6mp
| interchangeable lens camera, some customers are rightly
| going to ask questions when you hand over the work... Of
| course professional photographic equipment gets obsolete
| - even lens systems get deprecated every 20-30 years too.
| Newer sensors have vastly more dynamic range than the
| d70s among other image quality benefits.
|
| I think your argument holds water much more strongly in
| the context of amateur users, where for sure you can keep
| getting nice images from old gear for a long time.
| bayindirh wrote:
| > I suspect if you do a wedding shoot with a 6mp
| interchangeable lens camera, some customers are rightly
| going to ask questions when you hand over the work...
|
| Unless you're printing A3 pages, getting gigantic
| pictures, or cropping aggressively, D70s can still hold
| up pretty well [0].
|
| > even lens systems get deprecated every 20-30 years too.
|
| Nikon F mount is being deprecated in favor of Z because
| of mirrorless geometries, not because the lenses or the
| designs are inferior (given the geometry constraints).
| Many people still use their old lenses, and nifty fifties
| are still produced with stellar sharpness levels. I'm not
| even getting into the "N" or "L" category lenses of their
| respective mounts; not all of them are post-2000 designs or
| redesigns, and they produce extremely good images.
|
| > Newer sensors have vastly more dynamic range than the
| d70s among other image quality benefits.
|
| As a user of both the D70s and the A7III I can say that, if
| there's good enough light (e.g. daylight), one can take
| pretty nice pictures with a D70s, even today. Yes, it dies
| pretty fast when light goes low, or it can't focus as
| fast, or can't take single shot (almost) HDR images
| (A7III can do that honestly, and that's insane [4]), but
| unless you're chasing something moving, older cameras are
| not _that_ bad. [1][2][3]
|
| > I think your argument holds water much more strongly in
| the context of amateur users, where for sure you can keep
| getting nice images from old gear for a long time.
|
| Higher end, action oriented professional cameras are not
| actually built with resolution in mind, especially at the
| top end. All of the action DSLRs and mirrorless cameras
| up to a certain point are designed with speed and focus
| in mind. You won't see an A7R or Fuji GFX series camera at
| weddings or in stadiums. You'll see A9s, Canon 1D or Nikon D1
| series cameras. They're built to be fast, not high res.
|
| A wedding is more forgiving, but again a high MP camera
| is not preferred since it's more prone to vibration
| blurring.
|
| [0]: https://www.youtube.com/watch?v=ku3lT8MjyFM
|
| [1]: https://www.flickr.com/photos/zerocoder/41901384135/
| in/album...
|
| [2]: https://www.flickr.com/photos/zerocoder/28459579257/
| in/album...
|
| [3]: https://www.flickr.com/photos/zerocoder/39910477633/
| in/album...
|
| [4]: https://www.flickr.com/photos/zerocoder/33984196648/
| in/album...
| inferiorhuman wrote:
| No.
|
| > Unless you're printing A3 pages, getting gigantic
| pictures, or cropping aggressively, D70s can still hold
| up pretty well
|
| You even qualified that later with "if there's good
| enough light" and "unless you're chasing something
| moving". No, a D70 won't work well for wedding
| photography. Yes, people shot weddings with much slower
| film. They don't anymore because, like the D70, slow film
| is obsolete. People shot weddings with manual focus
| lenses too and the D70 is awful for MF lenses from the
| tiny viewfinder to the lack of support for non-CPU
| lenses. When the D70 was a current product some people did
| shoot weddings with it (make no mistake, the D70 was never
| marketed as a pro body), simply because the D70 was on par
| with its contemporaries.
|
| > Nikon F mount is being deprecated in favor of Z because
| of mirrorless geometries
|
| Even within the scope of the F mount the D70 is obsolete
| -- it's incompatible with new E and AF-P lenses.
|
| > A wedding is more forgiving
|
| Wedding photography is about the most technically
| challenging, least forgiving (low light, constant motion,
| spontaneous behavior) type of photography out there. The
| point you were responding to still stands - older digital
| photographic equipment is obsolete in a professional
| context while having some utility for hobbyists. Nobody's
| taking a D1 out to shoot sports these days. In fact most
| people didn't when it was new because Nikon's autofocus
| was so far behind Canon's.
| therein wrote:
| A7 SII has continuous autofocus now via a firmware update?
| That would be very exciting to me.
| bayindirh wrote:
| I was talking about A7III. Since I don't have the A7SII, I
| don't follow its firmware updates. A7III got Animal Eye-AF
| and much better tracking via a firmware upgrade.
| AgentOrange1234 wrote:
| Another possible reason would be that there's a hardware bug
| which only occurs when both are enabled. (Not saying this is
| what's happening here, but it's very common to ship partly
| disabled chips to work around bugs.)
| vatys wrote:
| For chips, that's commonly referred to as "binning" as in
| the sorting machine drops chips into different bins based
| on a test result.
|
| A big design with many cores, such as CPU or GPU cores, may
| have manufacturing defects that make one or more cores
| bad. Or it may be on one side or another of a tolerance
| range and not be able to work at higher power or higher
| frequency. These parts may get "binned" into a lower
| performance category, with some cores disabled (because a
| flaw prevents the core from working) or with reduced
| maximum performance states.
|
| These are still "good" parts, and can be sold at a lower
| cost with lower performance, while the "better" and "best"
| parts will pass more tests and be able to have more or all
| portions of the chip enabled.
|
| So it's not so much to work around a "bug", which would be a
| flaw common to all parts of a design, as to work around
| manufacturing variation and allow more of the built parts to
| be useful rather than garbage.
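|
| As a toy sketch of the idea (entirely hypothetical test data,
| thresholds and bin names, nothing Apple-specific), binning is
| basically a sort on per-die test results:
|
|   # hypothetical per-die test record -> SKU decision
|   def bin_die(cores_ok, max_stable_ghz):
|       good = sum(cores_ok)
|       if good == len(cores_ok) and max_stable_ghz >= 3.2:
|           return "full part, all cores enabled"
|       if good >= len(cores_ok) - 2:
|           return "cut-down part, bad cores fused off"
|       return "scrap"
|
|   print(bin_die([True, True, False, True], 3.0))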
| brigade wrote:
| Manufacturing defects are not hardware bugs.
|
| Binning is irrelevant to hardware bugs.
| vatys wrote:
| That's what I'm saying, in response to the previous
| comment saying this is to work around a bug. Bugs are
| common to all parts, whereas defects are unique per part.
| Binning works around manufacturing defects and turns a
| yield problem into different grades or SKUs of parts.
| ghettoimp wrote:
| Binning is definitely a possibility. Separately from
| binning, there are often just features that don't work
| right and get disabled with "chicken switches" or
| "feature enable bits."
|
| Any two-ANE design would have a lot of control logic that
| has to be right, e.g., to manage which work gets sent to
| which ANE, which cache lines get loaded, etc. It's easy
| to imagine bugs in this logic which would only show up
| when both ANEs are enabled. So it's likely that there is
| a chicken bit that you could use to disable one of the
| ANEs and run in single-ANE mode.
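|
| Conceptually something like this in boot firmware (a made-up
| illustration; the bit name, position and function are
| invented, not Apple's actual register layout):
|
|   DISABLE_ANE1 = 1 << 3        # invented chicken bit
|
|   def enabled_anes(chicken_bits):
|       anes = ["ane0"]
|       if not (chicken_bits & DISABLE_ANE1):
|           anes.append("ane1")
|       return anes
|
|   print(enabled_anes(DISABLE_ANE1))   # -> ['ane0']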
| OskarS wrote:
| This is slightly beside the point, but those stack traces are C++
| functions. I was pretty surprised by that (though, granted, I
| don't know anything about macOS internals). I would have expected
| either pure C for the kernel, or maybe Objective-C if they wanted
| a higher level language. They don't really have any C++ in their
| operating system APIs, right? Like, if you interface with Core
| <Whatever>, isn't that all C, Objective-C and Swift? Is there a
| lot of C++ in the macOS kernel/userspace?
| saagarjha wrote:
| IOKit in the kernel is a C++ API; the userspace version is C.
| irae wrote:
| They are still a fork of BSD, so it stands to reason that the
| team working for years on kernel and lower level parts of the
| OS still uses C++.
| TickleSteve wrote:
| Mach is not a fork of BSD. The kernel is Mach (Hybrid
| microkernel) supporting a BSD API.
| astrange wrote:
| There is a lot of FreeBSD in the kernel/libc/userland. BSD
| doesn't use C++ though, they use C.
| galad87 wrote:
| I don't know how many, but a good number of frameworks are C++
| internally - for example WebKit and the Objective-C runtime
| itself; IOKit uses a subset of C++, and the Metal Shading
| Language is a subset of C++.
| gurkendoktor wrote:
| There is (was?) always a lot of C++ in Core Animation
| stacktraces, and Core Animation was really foundational to
| the iPhone UI.
| jjoonathan wrote:
| Yeah, the "driver" parts of Mac OS X -- the bits that aren't
| Mach or BSD -- use a restricted subset of C++. It doesn't carry
| over into userland. Not much, anyway.
|
| I know that C++ is out of vogue, but the last time I wrote
| drivers for Linux and OSX (10 years ago) I left with the
| distinct impression that the case against it was oversold, at
| least compared to C. C clunks hard and C++ addresses the
| worst of it. I've never had
| to corral a bunch of overzealous junior C++ programmers, which
| I suspect is where the C++ reputation comes from, but Apple
| went down that path and wound up with something pretty decent.
|
| IMO today it's a shrug and 25 years ago it was forward looking.
| zozbot234 wrote:
| Linux devs are working to support Rust, which is a lot
| cleaner than C++ (essentially, no implementation inheritance)
| and has more straightforward interfacing with C components.
| jjoonathan wrote:
| I'm not convinced that "no inheritance" is better --
| drivers seem to be one place where it's actually put to
| good use -- but I also don't think it really matters that
| much. In contrast, Rust definitely has fewer foot-guns, and
| that matters a lot.
|
| Of course, the OSX driver code is going on 25 years old, so
| it's not evidence of anyone's opinion that C++ beats Rust
| for OS dev.
| StillBored wrote:
| Selective implementation inheritance is actually quite
| useful for kernel/driver development if it's used where
| function pointers in C tend to be used. Pluggable
| interfaces. It standardizes the syntax, provides default
| implementations, and forces implementation of mandatory
| methods.
|
| C++ when used as a better C, really is better.
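|
| Rough analogue of the pattern (sketched in Python with abc
| rather than kernel C++, purely to show the shape; the driver
| names here are invented): abstract methods are the mandatory
| ones, concrete methods are the provided defaults.
|
|   from abc import ABC, abstractmethod
|
|   class BlockDriver(ABC):              # the pluggable interface
|       @abstractmethod
|       def read(self, lba, count):      # mandatory method
|           ...
|
|       def flush(self):                 # default implementation
|           pass
|
|   class RamDisk(BlockDriver):          # one plugged-in driver
|       def __init__(self):
|           self.blocks = {}
|
|       def read(self, lba, count):
|           return [self.blocks.get(lba + i, b"\0" * 512)
|                   for i in range(count)]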
| zozbot234 wrote:
| Pluggable interfaces are ok, and Rust supports them just
| fine via traits. Actual implementation inheritance is
| inherently anti-modular.
| servytor wrote:
| If you are interested in the M1 neural engine, I highly recommend
| you check out this[0].
|
| [0]: https://github.com/geohot/tinygrad/tree/master/accel/ane
| erwincoumans wrote:
| Yes, George Hotz (geohot) reverse engineered the neural engine
| and was able to make it work for tinygrad; the videos posted in
| the other reply describe the reverse engineering process.
|
| I wonder why Apple didn't provide low-level APIs to access the
| hardware? It may have various restrictions. I recall Apple also
| didn't provide proper APIs to access OpenCL frameworks on iOS,
| but some people found workarounds to access that as well. Maybe
| they only integrate with a few limited but important use cases
| (TensorFlow, Adobe) that they can control.
|
| Could it be that using the ANE in the wrong way overheats the
| M1?
| exikyut wrote:
| The likeliest reason is avoiding long-term ABI ossification.
| fredoralive wrote:
| Possibly just to avoid having programs that rely too much on
| specific implementation details of the current engine causing
| issues in the future if they decide to change the hardware
| design? An obvious comparison is graphics cards where you
| don't get low level access to the GPU[1], so they can change
| architecture details across generations.
|
| Using a high level API probably makes it easier to implement
| a software version for hardware that doesn't have the neural
| engine, like Intel Macs or older A-cores.
|
| [1] Although this probably starts a long conversation about
| various GPU and ML core APIs and quite how low level they
| get.
| xenadu02 wrote:
| CoreML is the API to use the ANE.
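|
| Roughly, the supported path is to convert a trained model and
| let the OS decide where it runs (a sketch with coremltools;
| MyNet and the input shape are placeholders):
|
|   import coremltools as ct
|   import torch
|
|   net = MyNet().eval()          # placeholder torch.nn.Module
|   example = torch.rand(1, 3, 224, 224)
|   traced = torch.jit.trace(net, example)
|   mlmodel = ct.convert(
|       traced,
|       inputs=[ct.TensorType(shape=example.shape)],
|       convert_to="mlprogram",
|   )
|   mlmodel.save("MyNet.mlpackage")
|   # Core ML then schedules it across CPU / GPU / ANE.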
| erwincoumans wrote:
| Thanks, that's right, there is a high-level API. I meant
| low-level APIs, and changed my post to clarify.
| mhh__ wrote:
| Apple don't want to let people get used to the internals
| _and_ spiritually like to enforce a very clear us versus them
| philosophy when it comes to their new toys. They open source
| things they want other people to standardize around, but if
| it's their new toy then it's usually closed.
| aseipp wrote:
| In general I kind of agree with this, but this move isn't
| anything specific to Apple. Every company designing ML
| accelerators is doing it. None of them expose anything but
| the most high level framework they can get away with to
| users.
|
| I honestly don't know of a single company offering custom
| machine learning accelerators that lets you do anything
| _except_ use TensorFlow/PyTorch to interface with them, and
| there's not a chance in hell they'll actually give you the
| underlying ISA specifics. _Maybe_ the closest is, like, the
| Xilinx Versal devices or GPUs, but I don't quite put them in
| the same category as something like Habana, Groq, or
| GraphCore, where the architecture is bespoke for exactly this
| use case, and the high level tools are there to insulate you
| from architectural changes.
|
| If there are any actual productionized, in-use accelerators
| with low level details available that weren't RE'd from the
| source components, I'd be very interested in seeing it. But
| the trend here is very clear unless I'm missing something.
| my123 wrote:
| Habana has their own SynapseAI layer that their
| TF/PyTorch port runs on. Custom ops are supported too,
| via a compiler targeting the TPCs, using a C language
| variant.
|
| Oh, and they have an open-source usermode software stack for
| those, but it's really not usable. It doesn't allow access to
| the systolic arrays (MME), and being limited to the TPCs is
| just the _start_ of what it's missing. (But it made the Linux
| kernel maintainers happy, so...):
|
| https://github.com/HabanaAI/SynapseAI_Core#limitations
| (not to be confused with the closed-source SynapseAI)
| aseipp wrote:
| Well, that's good to hear at least! I knew there was some
| back and forth between the kernel maintainers recently
| due to all these accelerator drivers going in without any
| usermode support; Habana's case was kind of interesting
| because they got accepted into accel/ early by Greg, but
| they wouldn't have passed the merge criteria used later
| on for most others like Qualcomm.
|
| Frankly I kind of expected the whole result of that
| kerfuffle to just be that Habana would let the driver get
| deleted from upstream and go on their merry way shipping
| drivers to customers, but I'm happy to be proven wrong!
| aseipp wrote:
| Because machine learning accelerators are, in the broadest
| sense, not "done" and rapidly evolving every year. Exposing
| too many details of the underlying architecture is a prime
| way to ossify your design, making it impossible to change,
| and as a result you will fall behind. It is possible the
| Neural Engine of 2022 will look very different to the one of
| 2025, as far as the specifics of the design, opcode set, etc
| all go.
|
| One of the earliest lessons along this line was Itanium.
| Itanium exposing so much of the underlying architecture as a
| binary format and binary ABI made evolution of the design
| extremely difficult later on, even if you could have
| magically solved all the compiler problems back in 2000. Most
| machine learning accelerators are some combination of a VLIW
| and/or systolic array design. Most VLIW designers have
| learned that exposing the raw instruction pipeline to your
| users is a bad idea not because it's impossibly difficult to
| use (compilers do in fact keep getting better), but because
| it makes change impossible later on. This is also why we got
| rid of delay slots in scalar ISAs, by the way; yes they are
| annoying but they also expose too much of the implementation
| pipeline, which is the much bigger issue.
|
| Many machine learning companies take similar approaches where
| you can only use high-level frameworks like Tensorflow to
| interact with the accelerator. This isn't something from
| Apple's playbook, it's common sense once you begin to design
| these things. In the case of Other Corporations, there's also
| the benefit that it helps keep competitors away from their
| design secrets, but mostly it's for the same reason: exposing
| too much of the implementation details makes evolution and
| support extremely difficult.
|
| It sounds crass but my bet is that if Apple exposed the
| internal details of the ANE and later changed it (which they
| will, 100% it is not "done") the only "outcome" would be a
| bunch of rageposting on internet forums like this one.
| Something like: "DAE Apple mothershitting STUPID for breaking
| backwards compatibility? This choice has caused US TO SUFFER,
| all because of their BAD ENGINEERING! If I was responsible I
| would have already open sourced macOS and designed 10
| completely open source ML accelerators and named them all
| 'Linus "Freakin Epic" Torvalds #1-10' where you could program
| them directly with 1s and 0s and have backwards compatibility
| for 500 years, but people are SHEEP and so apple doesn't LET
| US!" This will be posted by a bunch of people who compiled
| "Hello world" for it one time six months ago and then are mad
| it doesn't "work" anymore on a computer they do not yet own.
|
| > Could it be that using the ANE in the wrong way overheats
| the M1?
|
| No.
| smoldesu wrote:
| Was it really necessary to expand the fourth paragraph
| post-script to get your point across? Before it was a
| fairly holistic look at the difference between people who
| want flexibility and people who want stability, where
| neither party was necessarily right. Now it just reads like
| you're mocking people for desiring transparency in their
| hardware, which... seems hard to demonize?
| aseipp wrote:
| There are other replies talking about Apple or whatever
| but I'll be honest: because 2 decades of online forum
| experience and FOSS development tells me that the final
| paragraph is exactly what happens anytime you change
| things like this and they are exposed to turbo-nerds,
| despite the fact they are often poorly educated and
| incredibly ill-informed about the topics at hand. You see
| it here in spades on HN. It doesn't have anything to do
| with Apple, either; plenty of FOSS maintainers could tell
| you similar horror stories. I mean it's literally just a
| paraphrase of an old XKCD.
|
| To be fair though, I mean. I'm mostly a bitchy nerd, too.
| And broadly speaking, taking the piss is just good fun
| sometimes. That's the truth, at least for me.
|
| If it helps, simply close your eyes and imagine a very
| amped up YouTuber saying what I wrote above. But they're
| doing it while doing weird camera transitions, slow-mo
| shots of panning up the side of some Mac Mini or
| whatever. They are standing at a desk with 4 computers
| that are open-mobo with no case, and 14 GPUs on a shelf
| behind them. Also the video is like 18 minutes long for
| some reason. It's pretty funny then, if you ask me.
| smoldesu wrote:
| For sure, I don't think I disagree with anything you've
| written here. Where I take umbrage is when there is no
| _choice_ involved though. Apple could very well provide
| both a high-level, stable library while _also_ exposing
| lower-level bindings that are expected to break
| constantly. If the low-level library is as bad and broken
| as people say it is, then they should have no problem
| marketing their high-level bindings as a solution. This
| is a mentality that frustrates me on many levels of their
| stack; their choice of graphics API and build systems
| being just a few other examples.
|
| Maybe this works for some people. I can't knock someone
| for an opinionated implementation of a complicated
| system. At the same time though, we can't be surprised
| when other people have differing opinions, and in a
| perfect society we wouldn't try to crucify people for
| making those opinions clear. Apple notoriously lacks a
| dialogue with their community about this stuff, which is
| what starts all of this pointless infighting in the first
| place. Apple does what Apple does, and nerds will fight
| over it until the heat death of the universe. There
| really is nothing new under the sun. Mocking the ongoing
| discussion is almost as Pyrrhic as claiming victory for
| either side.
| ben174 wrote:
| Meh, it's okay to be grumpy sometimes. He got his point
| across and clearly knows what he's talking about. Let him
| be passionate :)
| gjsman-1000 wrote:
| He's not wrong - that's absolutely what YouTube and
| online Linux commentators would do. They have their own
| echo chamber, just as much as any tech community. Heck,
| considering your past posts, it's probably something
| _you_ would do.
|
| As for transparency in hardware, it probably will become
| more transparent once Apple feels that it is done and a
| finished science. They don't want to repeat Itanium.
| nebula8804 wrote:
| Absolutely. It provided a vivid visualization of the many
| people that come out of their holes to argue whenever there
| is some criticism of open source. It's one thing to desire
| freedom, but the reality of the situation is that the
| community is toxic for some reason and just not fun to even
| converse with.
| EricE wrote:
| I think it was absolutely appropriate because I have seen
| that cycle happen many, many times over the years.
|
| Especially when Apple is involved. Hell there are still
| people who see them as beleaguered and about to go out of
| business at any moment :p
| smoldesu wrote:
| I get where you're coming from. It's par for the course
| on Apple's behalf to push this stuff away in lieu of
| their own, high-level implementation, but I also think
| that behavior puts them at an impasse. People who want to
| use this hardware for arbitrary purposes are unable to do
| so. Apple is unwilling to do it because they want their
| hand on the "API valve" so to speak. In a case where
| absolutist rhetoric is being used on either side, I think
| this is pretty expected. If we're ultimately boiling this
| down to "having choices" vs "not having choices" though,
| I think it's perfectly reasonable to expect the most
| valuable company in the world to go the extra mile and
| offer both choices to their users and developers.
|
| Or not. It's their hardware, they just won't be selling
| any Macs to me with that mindset. The only thing that
| irks me is when people take the bullet for Apple like a
| multi-trillion dollar corporation needs more people
| justifying their lack of interoperability.
| irae wrote:
| All the sibling comments are better guesses, but I would also
| guess there could be security implications to exposing lower
| level access. Having it all proprietary and undocumented is
| itself a way of making it harder to exploit. Although, as
| mentioned, not having to settle on an ABI is way more likely
| the primary reason.
| kmeisthax wrote:
| Apple Silicon has IOMMUs on everything - you generally
| can't exploit a bug in a coprocessor to gain more access on
| the main application processor (or another coprocessor).
| The only hardware bugs with security implications we've
| found were stuff like M1RACLES, which is merely a covert
| channel (and its discoverer doesn't even think it's a
| problem). Apple does a pretty good job of making sure even
| their private/internal stuff is secure.
| WithinReason wrote:
| A high level API needs _much_ less support effort.
| rickdeveloper wrote:
| He live streamed himself writing a lot of that:
|
| https://www.youtube.com/watch?v=mwmke957ki4
|
| https://www.youtube.com/watch?v=H6ZpMMDvB1M
|
| https://www.youtube.com/watch?v=JAyw7OAcXDE
|
| https://www.youtube.com/watch?v=Cb2KwcnDKrk
| pedro_hab wrote:
| As a developer I am a bit ashamed of this question, but I gotta
| ask:
|
| What consumer apps actually use Neural Engines?
|
| I think something like Photoshop, maybe. But wouldn't it just
| train a model and use it as regular code?
|
| I'm interested in AI, but more often than not it reads to me
| as a joke about startups and jargon.
|
| It feels weird to add this to all chips when I can't see that
| much usage.
| sharkjacobs wrote:
| Anything "predictive" probably uses Neural Engine. These are
| iOS features, but a lot of them apply to MacOS too
|
| - Visual Lookup
| - Animoji
| - Face ID
| - recognizing "accidental" palm input while using Apple Pencil
| - monitoring users' usage habits to optimize device battery
|   life and charging
| - app recommendations
| - Siri
| - curating photos into galleries, selecting "good" photos to
|   show in the photos widget
| - identifying people's faces in photos
| - creating "good" photos with input from tiny camera lenses
|   and sensors
| - portrait mode
| - language translation
| - on-device dictation
| - AR plane detection
|
| Core ML API allows third party developers to use Neural Engine
| to run models
| FinalBriefing wrote:
| But which consumer apps use it? I know of a handful of photo
| apps that use it to enhance photos, but I'm not aware of any
| other types of apps.
| kitsunesoba wrote:
| I think there may be more indirect usage than direct usage.
| Little bits of neural engine usage are peppered through the
| native APIs.
|
| Of course if you're using third party libraries that don't rely
| on macOS APIs this won't be happening in your app.
| miohtama wrote:
| Maybe the neural engine is ideal to scan your local image
| library to find kiddy porn:
|
| https://www.theverge.com/2021/12/15/22837631/apple-csam-dete...
| gfody wrote:
| could the M1 itself be using it for branch prediction?
| OberstKrueger wrote:
| Pixelmator Pro uses it for some of its ML functionality. Image
| scaling can use it, and it provides a cleaner image when
| upscaling, removing some compression artifacts and just
| smoothing it out more naturally. I've found it can work well
| downsizing too, although less of an effect. They also have an
| ML auto-crop tool and ML denoiser. All of these will hit the
| Neural Engine pretty good.
| noduerme wrote:
| This is a bizarre result but so.. what's the conclusion? That
| only a few things like Apple's proprietary image lookup are able
| to tap into the ANE so far? Or that it's actually just a
| marketing gimmick?
|
| Reading this makes me wonder if it's not just a placeholder for
| some kind of intrusive system that will neural-hash everything
| you own, but I'm sure I'm just being paranoid.
| bayindirh wrote:
| TensorFlow has a CoreML-enabled version which runs on the ANE.
|
| https://github.com/apple/tensorflow_macos
| my123 wrote:
| AFAIK that doesn't run on ANE but on the GPU. ANE is used for
| inference only w/ CoreML.
| masklinn wrote:
| > Reading this makes me wonder if it's not just a placeholder
| for some kind of intrusive system that will neural-hash
| everything you own, but I'm sure I'm just being paranoid.
|
| It's also actively counter-productive: if they wanted to do
| this sort of tracking, they could just have done what all of
| their competitors do and send the data straight to their
| servers. This is hardware (and thus expenses) which is _only_
| necessary because of their stance on privacy, and avoiding off-
| device work.
| mikotodomo wrote:
| Apple has made huge compromises to try and give some privacy
| back to the consumer, who lost it all from paying for cheap
| products where the product is you. And people just ignore
| this progress because they want to be anti-Apple. It's sad.
| avianlyric wrote:
| I would guess that the ANE has some very specialised hardware
| (e.g. INT8 or FP16 only), and there isn't a huge amount of it.
| So many nets either don't fit completely, or aren't using the
| right types of operation to match the ANE. They either can't
| run on the ANE at all, or only have a subset of layers that
| can run on the ANE.
|
| So when running a neural net, iOS / macOS needs to make a
| decision about where to run each net. Even if a net has layers
| in it that are a perfect match for the ANE, there's still a
| trade-off from having to move the workload back and forth
| between a CPU core and the ANE (although the unified memory
| should eliminate a big chunk of this cost).
|
| It might be that in the general case it's not worth the latency
| hit from using mixed processors when running a net that isn't
| 100% ANE compatible, or it could just be that Apple haven't got
| round to implementing the logic needed to gracefully split
| workloads across the ANE and a CPU core. Which would make
| sense, because they've got the time and expertise to ensure all
| their nets fit within the ANE. Something that's difficult for
| 3rd party devs to do, because they don't have access to
| detailed ANE docs.
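|
| (For what it's worth, the main precision knob a third-party
| dev has is at conversion time, e.g. with coremltools -- a
| sketch, with traced_model a placeholder, and no guarantee the
| result actually lands on the ANE:)
|
|   import coremltools as ct
|
|   mlmodel = ct.convert(
|       traced_model,                 # placeholder traced model
|       convert_to="mlprogram",
|       compute_precision=ct.precision.FLOAT16,
|   )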
| [deleted]
| londons_explore wrote:
| > That only a few things like Apple's proprietary image lookup
| are able to tap into the ANE so far?
|
| That would seem like the logical conclusion. Perhaps there are
| hardware bugs/shortcomings that make it very hard to use for
| the neural network API. Perhaps the software team is just
| behind and still building that.
| my123 wrote:
| Using CoreML has some catches, notably having to use FP16
| instead of integer formats.
| noduerme wrote:
| My initial understanding was that Adobe was leveraging it in
| their experimental "AI" Photoshop plugins. Although a
| majority of those seem to require a live internet connection
| to work. Some of the newer core Photoshop functionality for
| intelligent selection is ridiculously fast on an M1 Max,
| though, which makes me think it's probably using the neural
| chips.
| sharikous wrote:
| It might be, but there is also another matrix
| multiplication accelerator that could be responsible
| my123 wrote:
| The ANE is accessible via the CoreML framework; it's a high-
| level interface for ML inference.
|
| It however turns out that a _lot_ of customer apps today don't
| use those accelerators at all.
|
| (And about the attempt at using BNNS functions: that's not
| offloaded, it runs on the host CPU cores with the tightly
| bound AMX accelerator.)
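|
| One way to sanity-check whether a given Core ML model ever
| touches the ANE is to pin it to the CPU and compare (a sketch
| with coremltools; "Model.mlmodel" is a placeholder, and the
| real confirmation is watching powermetrics for ANE power, as
| the article does):
|
|   import coremltools as ct
|
|   cpu_only = ct.models.MLModel(
|       "Model.mlmodel", compute_units=ct.ComputeUnit.CPU_ONLY)
|   anywhere = ct.models.MLModel(
|       "Model.mlmodel", compute_units=ct.ComputeUnit.ALL)
|   # time .predict() on each and watch the ANE power draw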
| galangalalgol wrote:
| Would it be possible for someone to write an ONNX runtime
| utilizing CoreML? That would open it up to a lot more
| applications instantly.
| ianai wrote:
| ONNX lists CoreML support.
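|
| Right -- onnxruntime has a CoreML execution provider (needs a
| build with CoreML support enabled; the model path and input
| name below are placeholders):
|
|   import onnxruntime as ort
|
|   sess = ort.InferenceSession(
|       "model.onnx",
|       providers=["CoreMLExecutionProvider",
|                  "CPUExecutionProvider"],
|   )
|   # input_array: a numpy array shaped like the model's input
|   outputs = sess.run(None, {"input": input_array})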
| sharikous wrote:
| I have the same anxiety but I don't think it is still there.
|
| The Asahi project is putting a deliberately low priority on
| the ANE, but I have seen some other small reverse engineering
| attempts.
|
| I think some use of the ANE outside of Apple APIs will be
| possible soon.
| modeless wrote:
| Isn't BNNS documented to run on the CPU? Why would you expect it
| to use the neural engine? Apple also has Metal Performance
| Shaders which of course run on the GPU only. The user accessible
| API for the neural engine is Core ML. Very high level
| unfortunately.
|
| Hmm, it seems like there's also a new API that can use the neural
| engine sometimes, "ML Compute". But only for inference?
| https://developer.apple.com/documentation/mlcompute/mlcdevic...
| endorphine wrote:
| For the totally uninitiated, what's a neural engine in general?
| What are they used for, and why did Apple add this to their
| products?
| justusthane wrote:
| His previous blog post has more general information about the
| neural engine: https://eclecticlight.co/2022/03/29/live-text-
| visual-look-up...
|
| Basically it's used by anything involving machine learning:
|
| - Speech recognition
|
| - Face recognition
|
| - Visual lookup (image recognition)
|
| - Live Text (OCR)
|
| It allows all these functions to be performed efficiently on-
| device rather than shipping data off to the cloud.
| jlouis wrote:
| It's an area of the chip suited for operations on low
| precision/range floating point numbers. Neural networks don't
| generally require high precision in floating point
| computations, but require a lot of them. This means your data
| paths can be smaller (16 bit wide rather than 32 bit wide); the
| consequence of which is you can do far more computation per mm2
| die space.
|
| The second part of the chain is that you also tailor the
| operations the die area supports to those necessary in a
| typical neural network, further optimizing the chip.
|
| The end result is very power-efficient execution of neural
| networks, which allows you to beat a GPU's or CPU's power
| curve, improving thermals of the core and, in the case of
| mobile devices, optimizing battery usage.
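|
| Back-of-envelope version of that argument (very rough; assume
| multiplier area grows roughly with the square of operand
| width):
|
|   area_fp32_mul = 32 ** 2   # arbitrary units; area ~ width**2
|   area_fp16_mul = 16 ** 2
|   print(area_fp32_mul / area_fp16_mul)  # ~4x the multipliers per mm2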
| acdha wrote:
| It's probably also worth noting that the last part is fairly
| important to Apple since they have based a lot of their
| privacy stance around on-device processing of things like
| Siri commands, photo/video analysis, image recognition
| (VoiceOver can attempt to automatically describe images for
| blind people), speech to text dictation, the various
| audio/video enhancements like Center Stage or the voice
| emphasis features, etc. and all of that means they're running
| a lot of networks on battery.
|
| Those efficiency wins are almost certainly worth it even if
| third-party developers don't use it much.
___________________________________________________________________
(page generated 2022-03-30 23:01 UTC)