[HN Gopher] Apple Core AI Framework
       ___________________________________________________________________
        
       Apple Core AI Framework
        
       Author : hmokiguess
       Score  : 248 points
       Date   : 2026-06-08 18:47 UTC (11 hours ago)
        
 (HTM) web link (developer.apple.com)
 (TXT) w3m dump (developer.apple.com)
        
       | bensyverson wrote:
       | Wow, this seems to be a new way to convert PyTorch models to a
       | format that runs across CPU, GPU & Apple's Neural Engine (ANE).
       | [0]
       | 
       | Does this completely replace the previous API, CoreML? [1]
       | [0]: https://apple.github.io/coreai-optimization/
       | [1]: https://developer.apple.com/documentation/coreml/
        
         | earthnail wrote:
         | Yes. From the CoreAI docs:
         | 
         | "If your app uses model types other than neural networks, such
         | as decision trees or tabular feature engineering, see Core ML."
        
         | pzo wrote:
         | Seems they're planning to replace it, but overall I'm now
         | really confused about this, MLX, and coremltools. They should
         | do a better job explaining the benefits (and cons) of it and
         | any feature parity between Core AI, Core ML, and MLX.
        
           | LoganDark wrote:
           | My reading of it is:
           | 
           | - Core ML is for models designed only for Apple platforms
           | 
           | - MLX is for models that don't need to be fast
           | 
           | - Core AI is for models that run everywhere already and also
           | need to be fast
        
         | wahnfrieden wrote:
         | Requires OS 27+, so CoreML is still useful for backwards
         | compatibility.
        
         | trollbridge wrote:
         | This is just a bit exciting, although I wonder how the
         | performance of this will stack up next to the stuff we already
         | do with, e.g., a metal-optimised model which we then load into
         | llama-cpp or whatever. (unsloth is a good example of doing this
         | for you "batteries included").
        
           | ElFitz wrote:
           | A few months back someone reverse-engineered private ANE APIs
           | and showed some significant performance improvements compared
           | to CoreML and Metal, on both inference and training.
           | 
           | - https://maderix.substack.com/p/inside-the-m4-apple-neural-en...
           | 
           | - https://news.ycombinator.com/item?id=47257931
        
       | MysticOracle wrote:
       | WWDC 2026 Core AI videos
       | 
       | Meet Core AI -
       | https://developer.apple.com/videos/play/wwdc2026/324/
       | 
       | Dive into Core AI model authoring and optimization -
       | https://developer.apple.com/videos/play/wwdc2026/325/
       | 
       | Integrate on-device AI models into your app using Core AI -
       | https://developer.apple.com/videos/play/wwdc2026/326/
        
       | franze wrote:
       | i am more excited about the on-device foundation model update that
       | is coming
       | https://developer.apple.com/documentation/updates/foundation...
       | (not much info yet)
       | 
       | but i maintain https://github.com/Arthur-Ficial/apfel so i might
       | be biased
        
         | crancher wrote:
         | Apfel is very useful, thanks for the effort.
        
           | cat5e wrote:
           | I second this, I'm more excited about dumb local models than
           | something I could never run locally.
        
         | trollbridge wrote:
         | Thanks for building this! Something I grab on a regular basis,
         | especially for doing simple education of folks about the basics
         | of using LLMs by showing something that's not just a chatbot.
        
         | robgough wrote:
         | Have you seen that they've added an `fm` tool? It was mentioned
         | in the Platforms State of the Union.
         | 
         | Here's what you get when you run it...
         | https://gist.github.com/robgough/7893602895e7580117475076198...
        
           | ElFitz wrote:
           | Oh, neat. Totally missed it, thanks!
        
         | mips_avatar wrote:
         | Seems like they still won't let you run models on GPU while the
         | phone is closed or the user switches apps
        
           | tyre wrote:
           | This is good. Apps would not be respectful and would end up
           | draining users' batteries to zero in no time.
        
       | an0malous wrote:
       | This is why the AI companies are rushing to IPO. By the end of
       | next year you'll be running most of your AI on device. They have
       | no moat, they've reached the limits of scaling, most of the magic
       | can be distilled into smaller models, and they know it
        
         | sealeck wrote:
         | Have we reached the limits of scaling? Sadly it appears that
         | larger model still equals better model
        
           | stogot wrote:
           | It's still diminishing returns yes? It isn't Moore's Law
        
           | pixelready wrote:
           | I think there's still an open question around whether the
           | ultra-large next-gen models are worth it. For those of us
           | without early access to Mythos, it's hard to verify whether
           | it's been held back from the public due to actually being
           | "too dangerously powerful to release yet" as implied or
           | because the gains aren't outpacing the costs.
        
           | mikestorrent wrote:
           | Well, let's not forget that text models are not the only
           | models! Video models are much slower and need comparatively
           | more resources, and all they can do even at that size is
           | generate videos a few seconds long. Clearly a ton more work
           | is going to go into those, and demand for them will probably
           | increase as more creative tools get authored using them as a
           | central part of the workflow. Low-res local rendering for
           | preview might be a thing, but the lion's share of the work
           | for high-res, near-realtime rendering is going to be done on
           | huge clusters for a long time yet.
        
           | mindwok wrote:
           | I think GPT 4.5 showed that there is indeed a practical limit
           | we're close to. That was supposedly a model with parameters
           | in the high trillions that was deprecated almost immediately
           | because it was slow, insanely expensive, and had questionable
           | benefits over the smaller models. Though apparently the new
           | Mythos and whatever GPT Spud is (if it wasn't 5.5) are back
           | up in the high trillions.
        
             | XenophileJKO wrote:
             | Actually having used it a bit, I'm quite excited to see a
             | modern model of similar size.
             | 
             | I think what people didn't realize was, just because the
             | GPT-4.5 model didn't get better on the benchmarks, didn't
             | mean the model wasn't different than the earlier models. It
             | was being compared to thinking models that were being
             | developed at the same time.
             | 
             | The GPT 4.5 model still has some of the most "human" like
             | abilities in communication even though it isn't
             | particularly good at problem solving. It hadn't undergone
             | the same type of reinforcement training.
             | 
             | I still use GPT 4.5 sometimes, in creative exercises it can
             | be surprisingly effective. The model is still available.
        
           | adgjlsfhk1 wrote:
           | yes and no. We've reached the point where larger models are
           | higher quality, but they're also too expensive and slow to be
           | used broadly. The giant models, however, are still useful for
           | training smaller models that are actually deployable.
        
         | hadlock wrote:
         | Qwen's ~30B-class models are genuinely good enough for use if
         | you can find a machine with enough memory bandwidth to run them
         | at 30-90 tokens/second. It's been extremely telling that Qwen
         | stopped releasing 120b class models. At some point in the next
         | 10 years (maybe 3?) someone is going to release an Opus 4.5
         | class 256B model you can run locally. Right now our engineers
         | use about $800/mo worth of opus tokens; at that rate the ROI
         | for local LLM is ~10 months
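A minimal sketch of the payback arithmetic behind the ~10 month figure above. The comment gives only the $800/mo spend; the $8,000 hardware price here is an assumption, back-derived from the stated 10 months, and power and maintenance costs are ignored. `payback_months` is a hypothetical helper name.

```python
def payback_months(hardware_cost: float, monthly_cloud_spend: float) -> float:
    """Months until a one-time hardware purchase equals cumulative cloud spend."""
    if monthly_cloud_spend <= 0:
        raise ValueError("monthly_cloud_spend must be positive")
    return hardware_cost / monthly_cloud_spend


MONTHLY_OPUS_SPEND = 800.0      # figure quoted in the comment
ASSUMED_HARDWARE_COST = 8000.0  # assumption, not stated in the thread

print(payback_months(ASSUMED_HARDWARE_COST, MONTHLY_OPUS_SPEND))  # 10.0
```

Doubling the assumed hardware price doubles the payback period, so the estimate is only as good as that one number.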
        
           | strictnein wrote:
           | Didn't Qwen stop releasing their more powerful models because
           | they're commercializing them?
        
             | mswphd wrote:
             | Yes and no.
             | 
             | Qwen 3.5 was released 3/2/2026. It includes models up to a
             | 397B-A17B model
             | 
             | https://huggingface.co/collections/Qwen/qwen35
             | 
             | A day afterwards, a high-up technical leader working on
             | Qwen was let go
             | 
             | https://techcrunch.com/2026/03/03/alibabas-qwen-tech-lead-st...
             | 
             | The more recent Qwen 3.6 was released on 4/16
             | 
             | https://huggingface.co/collections/Qwen/qwen36
             | 
             | This does not include any particularly large models. But
             | the models it contains (Qwen3.6 27B and Qwen3.6 35B-A3B)
             | are the local models people have been very excited about
             | lately. So they didn't release any larger models, and the
             | models people praise so much are from this most recent
             | release.
        
               | tyre wrote:
               | If they stop releasing their larger models because they
               | want to monetize, would we expect them to release better
               | small models that can outcompete those?
        
           | horsawlarway wrote:
           | I want to echo this.
           | 
           | I've been on claude's opus 4.5/6/7 for work for a couple
           | months, and I finally got back to running Qwen A3B 35B...
           | it's incredibly performant and quite capable on semi-
           | reasonable local hardware.
           | 
           | I get ~150 tokens/s on dual nvidia RTX 3090s and can fit the
           | whole 300k context into gpu on a UD-Q4-K-XL quant gguf.
           | 
           | Combined with Pi as a harness, and I'm surprised to find that
           | it feels about as capable as claude did 8 months ago (their
           | 3.x models).
           | 
           | It's not Opus 4.5 levels yet, but it's good enough for a LOT
           | of basic work. I actually downgraded my personal anthropic
           | subscription because Qwen is absolutely fine for
           | implementation work. I still let a better model write a plan,
           | but then I can just switch over to Qwen to implement.
           | 
           | I don't think we're 10 years away from opus 4.5 levels
           | running on cheap consumer hardware. I think we're probably
           | closer to 18 months away, and I suspect it'll be in the
           | 30-60b range, not the 256b range.
           | 
           | PC manufacturers also seem to be betting on local, with a LOT
           | of focus on 64 to 128gb unified RAM machines.
        
             | maxdo wrote:
             | The majority of my agentic setup is Pi / Claude Code, where
             | none of the Chinese models are as good, except the
             | commercial 1T models.
             | 
             | Local is a pipe dream. If you can run it cheaply
             | occasionally, why can't commercial companies run it cheaper
             | 24/7 and lower the costs? The answer is simple: use cases
             | are more demanding, and hence you need more from the model,
             | not less.
             | 
             | Sure, if your task is to do a narrow labeling task on 1M
             | records, a small optimized model is good. If you want to do
             | complex things, it shifts with model advancements.
        
               | hparadiz wrote:
               | This sounds like something someone at IBM in 1986 would
               | say trying to sell their mainframes. "PCs will never be a
               | thing. No one's gonna want a computer."
               | 
               | I'm seeing some impressive results from folks that can
               | afford 10k+ GPUs right now. But those GPUs will all be
               | hand me downs in 10 years. So pipe dream? Hmmm......
               | that's not how this industry works.
        
               | tyre wrote:
               | Those are not GPUs available on iPhones. Will we get
               | there eventually? Maybe! Maybe we end up with GPU
               | clusters built on the edge (e.g. cell towers) for
               | offloading, maybe it's never economical, maybe a
               | different model architecture makes it simpler, who knows.
               | 
               | But it doesn't seem anywhere imminent with our current
               | world state.
        
               | hparadiz wrote:
               | My computer is 15,000 times faster and costs in inflation
               | adjusted dollars half that of my computer in 1995.
               | There's zero reason to think that won't happen over the
               | next 30 years again.
               | 
               | For whatever reason every generation thinks they are the
               | peak. Naw man. You're just a blip at the bottom of the
               | logarithmic chart.
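As a rough check on the growth rate implied by the comment above, the sketch below converts a total speedup into an average doubling time. The 15,000x figure is from the comment; the 30-year span is an assumption (1995 to roughly now), and `doubling_time_years` is a hypothetical helper.

```python
import math


def doubling_time_years(speedup: float, years: float) -> float:
    """Average years per doubling implied by a total speedup over a period."""
    if speedup <= 1 or years <= 0:
        raise ValueError("speedup must exceed 1 and years must be positive")
    # Number of doublings is log2(speedup); spread them evenly over the period.
    return years / math.log2(speedup)


# 15,000x from the comment; 30 years is an assumed span.
print(f"{doubling_time_years(15_000, 30):.2f} years per doubling")
```

Under these assumptions the result is a bit over two years per doubling, which says nothing about whether that pace continues.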
        
         | cat5e wrote:
         | Huzzah, they've lost their stranglehold. Viva la revolution!
        
         | ActorNightly wrote:
         | Very false.
         | 
         | I use small models exclusively. They aren't a replacement for
         | large models. You need decent hardware to run those models
         | efficiently, as smaller parameter models plain suck and are
         | still slow on macbooks. And affordability of higher end
         | hardware is very limited.
         | 
         | Even at non VC subsidized $/token prices, it's still much
         | cheaper to run cloud based models.
        
           | davnicwil wrote:
           | well to be fair that's right now, I think the question is
           | what about in 6 months, 12 months, 2 years?
           | 
           | Where do these improvement curves go? Does the gap close, do
           | they intersect for practical purposes (factoring in cost
           | etc)? Or is the local curve always just a translation of the
           | hosted, lagging behind, or indeed does hosted just pull
           | ahead?
           | 
           | Nobody knows, but it's a very open question I feel, and it
           | certainly appears like the answer might quite reasonably be
           | that yes they intersect on that kind of short-ish term time
           | horizon.
        
             | ActorNightly wrote:
             | >Where do these improvement curves go?
             | 
             | Nowhere.
             | 
             | Large models haven't seen that much improvement, just
             | performance on small unique tasks, which is all special-cased
             | and RLed to game metrics.
             | 
             | For local models, it's the same story. You can download
             | Gemma 3 QAT from last year, and it will be just as good as
             | Gemma:31b on average. Qwen also boasts that it's better,
             | because again, they RLed it to game some metrics. It's
             | better at coding than Gemma, but Gemma is better at more
             | creative thinking (again, all RL).
             | 
             | Fundamentally, you need detail in the gradients for the
             | models to pick up on the smaller details. If you don't have
             | those, your output is gonna suck. No amount of clever
             | architecture is going to fix this.
             | 
             | The only way to improve local models is by training them to
             | fetch context, and then their job becomes much simpler
             | because all they need to do is reinterpret the fetched
             | content and provide an answer. But fundamentally, if you
             | are trying to keep things in house for advertising purposes
             | like what all companies do with search, you want them to go
             | to your service, which means running on your servers. And
             | it's not really that much extra per invocation (i.e.,
             | excluding initial hardware costs) to instead just offer a
             | large model as a service, which will be way better than any
             | small models.
        
           | dvt wrote:
           | > Even at non VC subsidized $/token prices, it's still much
           | cheaper to run cloud based models.
           | 
           | On a price-per-wattage level, this is not true, people have
           | done the math on /r/LocalLLaMA many times over[1]. Local
           | models, while not as good as premier models (GPT 5.5, etc.),
           | are like ~80%+ of the way there, and often converge to a
           | similar solution after a few dead ends.
           | 
           | [1] https://www.reddit.com/r/LocalLLM/comments/1kshq4f/electrici...
        
             | fwip wrote:
             | Maybe not per watt, but unless you already happen to own a
             | 3090 cited by that post, you'd have to buy that as well,
             | which is currently selling for around $1400 used.
        
               | dvt wrote:
               | I do have a 3090 Ti on my gaming PC, but even my old M1
               | MBP (with a mere 32gb of RAM) is quite competent and can
               | run a quantized `Gemma4-26B-A4B` in the background while
               | I do other stuff.
        
               | strictnein wrote:
               | 3090s are running $1400 now? Wowsers. I thought I was
               | overspending when I bought 6x of them for around $800 a
               | pop.
               | 
               | Might be time to sell, to be honest. It's fun to have
               | that at home, but I can't justify having $10k (with
               | memory, mobo, cpu, etc) sitting in my basement without
               | being fully utilized.
        
               | karim79 wrote:
               | I'll take two of them. A thousand a piece.
        
         | viccis wrote:
         | I just want a tiny tiny model that runs on device that knows
         | for autocomplete that, for example, I want to say "I'll be
         | right back" instead of "I'll be right Brian". That's my #1 AI
         | ask right now. Please, Apple.
        
           | cush wrote:
           | I want Siri to let me "add to my calendar, dinner Peter's
           | house Sunday at 5pm" and not assume the location is the
           | restaurant called Peter's House in another state. It's
           | astounding how poor Siri is at using the data I've given it
           | access to
        
         | wyager wrote:
         | > By the end of next year you'll be running most of your AI on
         | device.
         | 
         | I expect I'll probably keep paying for whatever badass high IQ
         | model is running on inference servers at that point
        
         | maxdo wrote:
         | Why on earth should I switch from a top-tier model to a much
         | worse local model? Why should my battery have to suffer?
        
           | truncate wrote:
           | You can switch to local models for tasks/use-cases where you
           | don't need top tier models.
        
       | criddell wrote:
       | Is there something like this on Linux? For example, if I'm an
       | application developer can I assume GNU Core AI (or whatever it is
       | or would be called) will be there if the kernel is >= some
       | particular version?
        
         | wtallis wrote:
         | On non-Apple platforms, you generally have at least 2+(number
         | of supported silicon vendors) different AI frameworks to worry
         | about. I guess Apple's there now too, between Core ML, MLX,
         | Core AI.
         | 
         | I haven't seen any sign that the framework fragmentation
         | problem is going away anytime soon. NVIDIA wants everyone to do
         | all training and inference with CUDA and to deny that NPUs have
         | any usefulness. Everybody making an NPU has a different
         | framework tailored to their architecture and the limitations
         | they inherited from hardware designed before LLMs existed, and
         | most of them have _another_ framework for targeting a GPU.
         | And the OS vendor has one or two frameworks they would prefer
         | you use rather than something hardware-specific.
        
         | nl wrote:
         | For practical purposes llama.cpp is this. You can link to it or
         | use the network API.
        
         | halJordan wrote:
         | No, there isn't. Red Hat and IBM do, though, for their distros.
        
         | teravor wrote:
         | onnxruntime, llama.cpp (more specifically, ggml), iree.dev is
         | also trying
        
       | dvt wrote:
       | AI future is clearly local, and my recent pitch has been
       | "infinite tokens." Because that's what my M1 MBP can do; and
       | that's what my RTX3090 can do. I don't need to pay hundreds of
       | dollars a month and no one else does either.
        
         | ip26 wrote:
         | Infinite tokens rate-limited to 10 tok/s is 26MTok per month.
        
           | doctorpangloss wrote:
           | 10? think closer to 5. 13M is like ~7 codex sessions...
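The two throughput figures above can be checked with a short sketch. It assumes a 30-day month of uninterrupted generation at a constant rate; `tokens_per_month` is a hypothetical helper.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def tokens_per_month(tokens_per_second: float, days: int = 30) -> float:
    """Total tokens generated at a constant rate, running nonstop for `days`."""
    return tokens_per_second * SECONDS_PER_DAY * days


for rate in (10, 5):
    millions = tokens_per_month(rate) / 1e6
    print(f"{rate} tok/s -> {millions:.2f}M tokens per 30-day month")
# 10 tok/s -> 25.92M tokens per 30-day month
# 5 tok/s -> 12.96M tokens per 30-day month
```

These match the rounded 26M and 13M figures quoted in the two comments; any idle time lowers the totals proportionally.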
        
       | connectsnk wrote:
       | Do we know what the underlying model is? Is it a custom model
       | developed by Apple, or one of Gemma/DeepSeek under the hood?
        
         | jacobr1 wrote:
         | The new Siri models will be some variant of the Gemini models.
         | This framework seems to be more generalized than that though.
        
       | ankit219 wrote:
       | they are also working on activations (w4a8, w4a16 from what i
       | know). if they deliver (and a big if), it means that given their
       | market reach, they can dictate the way sub 100b parameter models
       | are trained and served to a large extent, given their major
       | usecase would be on device (macos and not ios for most of them).
        
       | scosman wrote:
       | Free server-size model access for apps with <2M downloads,
       | getting the same privacy guarantees. Hopefully they scale this up
       | to all apps in time (I assume hardware/cost constrained, but
       | larger devs would pay).
       | 
       | https://developer.apple.com/private-cloud-compute/
        
       | JV00 wrote:
       | Does it mean I can run whatever I want on ANE? Last time I tried
       | it seemed it could only be used by first party features such as
       | Face ID
        
       ___________________________________________________________________
       (page generated 2026-06-09 06:00 UTC)