[HN Gopher] Apple Core AI Framework
___________________________________________________________________
Apple Core AI Framework
Author : hmokiguess
Score : 248 points
Date : 2026-06-08 18:47 UTC (11 hours ago)
(HTM) web link (developer.apple.com)
(TXT) w3m dump (developer.apple.com)
| bensyverson wrote:
| Wow, this seems to be a new way to convert PyTorch models to a
| format that runs across CPU, GPU & Apple's Neural Engine (ANE).
| [0]
|
| Does this completely replace the previous API, CoreML? [1]
| [0]: https://apple.github.io/coreai-optimization/
| [1]: https://developer.apple.com/documentation/coreml/
| earthnail wrote:
| Yes. From the CoreAI docs:
|
| "If your app uses model types other than neural networks, such
| as decision trees or tabular feature engineering, see Core ML."
| pzo wrote:
| Seems they're planning to replace it, but overall I'm now really
| confused about this, MLX, and coremltools. They should do a
| better job explaining the benefits (and cons) of it and any
| feature parity between Core AI, Core ML, and MLX.
| LoganDark wrote:
| My reading of it is:
|
| - Core ML is for models designed only for Apple platforms
|
| - MLX is for models that don't need to be fast
|
| - Core AI is for models that run everywhere already and also
| need to be fast
| wahnfrieden wrote:
| Requires OS 27+, so CoreML is still useful for backwards
| compatibility.
| trollbridge wrote:
| This is just a bit exciting, although I wonder how the
| performance of this will stack up next to the stuff we already
| do with, e.g., a metal-optimised model which we then load into
| llama-cpp or whatever. (unsloth is a good example of doing this
| for you "batteries included").
| ElFitz wrote:
| A few months back someone reverse-engineered private ANE APIs
| and showed some significant performance improvements compared
| to CoreML and Metal, on both inference and training.
|
| - https://maderix.substack.com/p/inside-the-m4-apple-neural-
| en...
|
| - https://news.ycombinator.com/item?id=47257931
| MysticOracle wrote:
| WWDC 2026 Core AI videos
|
| Meet Core AI -
| https://developer.apple.com/videos/play/wwdc2026/324/
|
| Dive into Core AI model authoring and optimization -
| https://developer.apple.com/videos/play/wwdc2026/325/
|
| Integrate on-device AI models into your app using Core AI -
| https://developer.apple.com/videos/play/wwdc2026/326/
| franze wrote:
| i am more excited about the on-device foundation model update that
| is coming
| https://developer.apple.com/documentation/updates/foundation...
| (not much info yet)
|
| but i maintain https://github.com/Arthur-Ficial/apfel so i might
| be biased
| crancher wrote:
| Apfel is very useful, thanks for the effort.
| cat5e wrote:
| I second this, I'm more excited about dumb local models than
| something I could never run locally.
| trollbridge wrote:
| Thanks for building this! Something I grab on a regular basis,
| especially for doing simple education of folks about the basics
| of using LLMs by showing something that's not just a chatbot.
| robgough wrote:
| Have you seen that they've added an `fm` tool? It was mentioned
| in the Platforms State of the Union.
|
| Here's what you get when you run it...
| https://gist.github.com/robgough/7893602895e7580117475076198...
| ElFitz wrote:
| Oh, neat. Totally missed it, thanks!
| mips_avatar wrote:
| Seems like they still won't let you run models on GPU while the
| phone is closed or the user switches apps
| tyre wrote:
| This is good. Apps would not be respectful and would end up
| draining users' batteries to zero in no time.
| an0malous wrote:
| This is why the AI companies are rushing to IPO. By the end of
| next year you'll be running most of your AI on device. They have
| no moat, they've reached the limits of scaling, most of the magic
| can be distilled into smaller models, and they know it
| sealeck wrote:
| Have we reached the limits of scaling? Sadly it appears that
| larger model still equals better model
| stogot wrote:
| It's still diminishing returns yes? It isn't Moore's Law
| pixelready wrote:
| I think there's still an open question around are the ultra-
| large next-gen models worth it? For those of us without early
| access to Mythos, it's hard to verify whether it's been held
| back from the public due to actually being "too dangerously
| powerful to release yet" as implied or because the gains
| aren't outpacing the costs.
| mikestorrent wrote:
| Well, let's not forget that text models are not the only
| models! Video models are much slower and need comparatively
| more resources, and all they can do even at that size is
| generate videos a few seconds long. Clearly a ton more work
| is going to go into those, and demand for them will probably
| increase as more creative tools get authored using them as a
| central part of the workflow. Low-res local rendering for
| preview might be a thing, but the lion's share of the work
| for high-res, near-realtime rendering is going to be done on
| huge clusters for a long time yet.
| mindwok wrote:
| I think GPT 4.5 showed that there is indeed a practical limit
| we're close to. That was supposedly a model with parameters in
| the high trillions that was deprecated almost immediately
| because it was slow, insanely expensive, and had questionable
| benefits over the smaller models. Though apparently the new
| Mythos and whatever GPT Spud is (if it wasn't 5.5) are back
| up in the high trillions.
| XenophileJKO wrote:
| Actually having used it a bit, I'm quite excited to see a
| modern model of similar size.
|
| I think what people didn't realize was, just because the
| GPT-4.5 model didn't get better on the benchmarks, didn't
| mean the model wasn't different than the earlier models. It
| was being compared to thinking models that were being
| developed at the same time.
|
| The GPT 4.5 model still has some of the most "human" like
| abilities in communication even though it isn't
| particularly good at problem solving. It hadn't undergone
| the same type of reinforcement training.
|
| I still use GPT 4.5 sometimes, in creative exercises it can
| be surprisingly effective. The model is still available.
| adgjlsfhk1 wrote:
| yes and no. We've reached the point where larger models are
| higher quality, but they're also too expensive and slow to be
| used broadly. The giant models, however, are still useful for
| training smaller models that are actually deployable.
| hadlock wrote:
| Qwen's ~30B-class models are genuinely good enough for use if
| you can find a machine with enough memory bandwidth to run them
| at 30-90 tokens/second. It's been extremely telling that Qwen
| stopped releasing 120b class models. At some point in the next
| 10 years (maybe 3?) someone is going to release an Opus 4.5
| class 256B model you can run locally. Right now our engineers
| use about $800/mo worth of opus tokens; at that rate the ROI
| for local LLM is ~10 months
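The break-even estimate in the comment above can be reproduced with a quick sketch. The $800/month spend and the ~10 month horizon come from the comment; the roughly $8,000 hardware budget is back-derived from those two figures and is an assumption, as are the function names. Electricity and maintenance costs are ignored.

```python
def implied_hardware_budget(monthly_spend, months):
    """Hardware cost that pays for itself after `months` of avoided API spend."""
    return monthly_spend * months


def break_even_months(hardware_cost, monthly_spend):
    """Months of avoided API spend needed to cover an up-front hardware cost."""
    if monthly_spend <= 0:
        raise ValueError("monthly_spend must be positive")
    return hardware_cost / monthly_spend


# Figures from the comment: $800/month of tokens, ~10 month ROI.
budget = implied_hardware_budget(800, 10)   # assumed hardware cost: 8000
months = break_even_months(budget, 800)     # 10.0
print(f"${budget} of hardware breaks even after {months:.0f} months")
```

A cheaper machine or a higher monthly token bill shortens the horizon proportionally, which is why the estimate is so sensitive to the hardware price.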
| strictnein wrote:
| Didn't Qwen stop releasing their more powerful models because
| they're commercializing them?
| mswphd wrote:
| Yes and no.
|
| Qwen 3.5 was released 3/2/2026. It includes models up to a
| 397B-A17B model
|
| https://huggingface.co/collections/Qwen/qwen35
|
| A day afterwards, a high-up technical leader working on
| Qwen was let go
|
| https://techcrunch.com/2026/03/03/alibabas-qwen-tech-lead-
| st...
|
| The more recent Qwen 3.6 was released on 4/16
|
| https://huggingface.co/collections/Qwen/qwen36
|
| This does not include any particularly large models. But
| the models it contains (Qwen3.6 27B and Qwen3.6 35B-A3B)
| are the local models people have been very excited about
| lately. So they didn't release any larger models, and the
| models people praise so much are from this most recent
| release.
| tyre wrote:
| If they stop releasing their larger models because they
| want to monetize, would we expect them to release better
| small models that can outcompete those?
| horsawlarway wrote:
| I want to echo this.
|
| I've been on claude's opus 4.5/6/7 for work for a couple
| months, and I finally got back to running Qwen A3B 35B...
| it's incredibly performant and quite capable on semi-
| reasonable local hardware.
|
| I get ~150 tokens/s on dual nvidia RTX 3090s and can fit the
| whole 300k context into gpu on a UD-Q4-K-XL quant gguf.
|
| Combined with Pi as a harness, I'm surprised to find that
| it feels about as capable as claude did 8 months ago (their
| 3.x models).
|
| It's not Opus 4.5 levels yet, but it's good enough for a LOT
| of basic work. I actually downgraded my personal anthropic
| subscription because Qwen is absolutely fine for
| implementation work. I still let a better model write a plan,
| but then I can just switch over to Qwen to implement.
|
| I don't think we're 10 years away from opus 4.5 levels
| running on cheap consumer hardware. I think we're probably
| closer to 18 months away, and I suspect it'll be in the
| 30-60b range, not the 256b range.
|
| PC manufacturers also seem to be betting on local, with a LOT
| of focus on 64 to 128gb unified RAM machines.
| maxdo wrote:
| The majority of my agentic setup is pi / Claude Code, where
| none of the Chinese models are as good, except the commercial
| 1T models.
|
| Local is a pipe dream. If you can run it cheaply on occasion,
| why can't commercial companies run it cheaper 24/7 and lower
| the costs? The answer is simple: use cases are getting more
| demanding, and hence you need more from the model, not less.
|
| Sure, if your task is a narrow labeling job on 1M records, a
| small optimized model is good. If you want to do complex
| things, the bar shifts with model advancements.
| hparadiz wrote:
| This sounds like something someone at IBM in 1986 would
| say trying to sell their mainframes. "PCs will never be a
| thing. No one's gonna want a computer."
|
| I'm seeing some impressive results from folks that can
| afford 10k+ GPUs right now. But those GPUs will all be
| hand me downs in 10 years. So pipe dream? Hmmm......
| that's not how this industry works.
| tyre wrote:
| Those are not GPUs available on iPhones. Will we get
| there eventually? Maybe! Maybe we end up with GPU
| clusters built on the edge (e.g. cell towers) for
| offloading, maybe it's never economical, maybe a
| different model architecture makes it simpler, who knows.
|
| But it doesn't seem anywhere imminent with our current
| world state.
| hparadiz wrote:
| My computer is 15,000 times faster and costs in inflation
| adjusted dollars half that of my computer in 1995.
| There's zero reason to think that won't happen over the
| next 30 years again.
|
| For whatever reason every generation thinks they are the
| peak. Naw man. You're just a blip at the bottom of the
| logarithmic chart.
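The "15,000 times faster" figure above can be turned into an annual rate with a small sketch. The 15,000x speedup and the 30-year span are taken from the comment; treating the growth as a constant compound rate is an assumption, and the function names are illustrative.

```python
import math


def annual_growth_factor(total_speedup, years):
    """Constant yearly multiplier that compounds to `total_speedup` over `years`."""
    return total_speedup ** (1.0 / years)


def doubling_time_years(total_speedup, years):
    """Years per doubling under the same constant-rate assumption."""
    return years * math.log(2) / math.log(total_speedup)


factor = annual_growth_factor(15_000, 30)
doubling = doubling_time_years(15_000, 30)
# Prints: ~38% per year, doubling every ~2.2 years
print(f"~{(factor - 1) * 100:.0f}% per year, doubling every ~{doubling:.1f} years")
```

Whether that rate holds for the next 30 years is exactly what the thread is arguing about; the sketch only restates the historical claim in per-year terms.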
| cat5e wrote:
| Huzzah, they've lost their stranglehold. Viva la revolution!
| ActorNightly wrote:
| Very false.
|
| I use small models exclusively. They aren't a replacement for
| large models. You need decent hardware to run those models
| efficiently, as smaller parameter models plain suck and are
| still slow on macbooks. And affordability of higher end
| hardware is very limited.
|
| Even at non VC subsidized $/token prices, it's still much
| cheaper to run cloud based models.
| davnicwil wrote:
| well to be fair that's right now, I think the question is
| what about in 6 months, 12 months, 2 years?
|
| Where do these improvement curves go? Does the gap close, do
| they intersect for practical purposes (factoring in cost
| etc)? Or is the local curve always just a translation of the
| hosted, lagging behind, or indeed does hosted just pull
| ahead?
|
| Nobody knows, but it's a very open question I feel, and it
| certainly appears like the answer might quite reasonably be
| that yes they intersect on that kind of short-ish term time
| horizon.
| ActorNightly wrote:
| >Where do these improvement curves go?
|
| Nowhere.
|
| Large models haven't seen that much improvement, just better
| performance on small, specific tasks, all of which is
| special-cased RL to game metrics.
|
| For local models, it's the same story. You can download
| Gemma 3 QAT from last year, and it will be just as good as
| Gemma:31b on average. Qwen also boasts that it's better,
| because again, they RLed it to game some metrics. It's
| better at coding than Gemma, but Gemma is better at more
| creative thinking (again, all RL).
|
| Fundamentally, you need detail in the gradients for the
| models to pick up on the smaller details. If you don't have
| those, your output is gonna suck. No amount of clever
| architecture is going to fix this.
|
| The only way to improve local models is by training them to
| fetch context, and then their job becomes much simpler
| because all they need to do is reinterpret the fetched
| content and provide an answer. But fundamentally, if you
| are trying to keep things in house for advertising purposes
| like what all companies do with search, you want them to go
| to your service, which means running on your servers. And
| it's not really that much extra per invocation (i.e.,
| excluding initial hardware costs) to instead just offer a
| large model as a service, which will be way better than any
| small models.
| dvt wrote:
| > Even at non VC subsidized $/token prices, it's still much
| cheaper to run cloud based models.
|
| On a price-per-wattage level, this is not true, people have
| done the math on /r/LocalLLaMA many times over[1]. Local
| models, while not as good as premier models (GPT 5.5, etc.),
| are like ~80%+ of the way there, and often converge to a
| similar solution after a few dead ends.
|
| [1] https://www.reddit.com/r/LocalLLM/comments/1kshq4f/electr
| ici...
| fwip wrote:
| Maybe not per watt, but unless you already happen to own the
| 3090 cited by that post, you'd have to buy that as well,
| which is currently selling for around $1400 used.
| dvt wrote:
| I do have a 3090 Ti on my gaming PC, but even my old M1
| MBP (with a mere 32gb of RAM) is quite competent and can
| run a quantized `Gemma4-26B-A4B` in the background while
| I do other stuff.
| strictnein wrote:
| 3090s are running $1400 now? Wowsers. I thought I was
| overspending when I bought 6x of them for around $800 a
| pop.
|
| Might be time to sell, to be honest. It's fun to have
| that at home, but I can't justify having $10k (with
| memory, mobo, cpu, etc) sitting in my basement without
| being fully utilized.
| karim79 wrote:
| I'll take two of them. A thousand a piece.
| viccis wrote:
| I just want a tiny tiny model that runs on device that knows
| for autocomplete that, for example, I want to say "I'll be
| right back" instead of "I'll be right Brian". That's my #1 AI
| ask right now. Please, Apple.
| cush wrote:
| I want Siri to let me "add to my calendar, dinner Peter's
| house Sunday at 5pm" and not assume the location is the
| restaurant called Peter's House in another state. It's
| astounding how poor Siri is at using the data I've given it
| access to
| wyager wrote:
| > By the end of next year you'll be running most of your AI on
| device.
|
| I expect I'll probably keep paying for whatever badass high IQ
| model is running on inference servers at that point
| maxdo wrote:
| Why on earth should I switch from a top tier model to a much
| worse local model? Why should my battery have to suffer?
| truncate wrote:
| You can switch to local models for tasks/use-cases where you
| don't need top tier models.
| criddell wrote:
| Is there something like this on Linux? For example, if I'm an
| application developer can I assume GNU Core AI (or whatever it is
| or would be called) will be there if the kernel is >= some
| particular version?
| wtallis wrote:
| On non-Apple platforms, you generally have at least 2+(number
| of supported silicon vendors) different AI frameworks to worry
| about. I guess Apple's there now too, between Core ML, MLX,
| Core AI.
|
| I haven't seen any sign that the framework fragmentation
| problem is going away anytime soon. NVIDIA wants everyone to do
| all training and inference with CUDA and to deny that NPUs have
| any usefulness. Everybody making an NPU has a different
| framework tailored to their architecture and the limitations
| they inherited from hardware designed before LLMs existed, and
| most of them have _another_ framework for targeting a GPU.
| And the OS vendor has one or two frameworks they would prefer
| you use rather than something hardware-specific.
| nl wrote:
| For practical purposes llama.cpp is this. You can link to it or
| use the network API.
| halJordan wrote:
| No there isn't. RedHat and IBM do though, for their distros
| teravor wrote:
| onnxruntime, llama.cpp (more specifically, ggml), iree.dev is
| also trying
| dvt wrote:
| AI future is clearly local, and my recent pitch has been
| "infinite tokens." Because that's what my M1 MBP can do; and
| that's what my RTX3090 can do. I don't need to pay hundreds of
| dollars a month and no one else does either.
| ip26 wrote:
| Infinite tokens rate-limited to 10 tok/s is 26MTok per month.
| doctorpangloss wrote:
| 10? think closer to 5. 13M is like ~7 codex sessions...
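The monthly token figures in the two comments above follow from simple arithmetic. This sketch assumes a 30-day month and a machine generating continuously at the stated rate; both are simplifications, and the function name is illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def tokens_per_month(tokens_per_second, days=30):
    """Tokens produced by generating non-stop at a fixed rate for `days` days."""
    return tokens_per_second * SECONDS_PER_DAY * days


for rate in (10, 5):
    mtok = tokens_per_month(rate) / 1e6
    print(f"{rate} tok/s -> {mtok:.1f} MTok/month")
# 10 tok/s -> 25.9 MTok/month
# 5 tok/s -> 13.0 MTok/month
```

Real usage would land below these ceilings, since a local machine is rarely generating 24 hours a day.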
| connectsnk wrote:
| Do we know what the underlying model is? Is it a custom model
| developed by Apple or one of gemma/deepseeks under the hood?
| jacobr1 wrote:
| The new siri models will be some variant of the gemini models.
| This framework seems to be more generalized than that though.
| ankit219 wrote:
| they are also working on activations (w4a8, w4a16 from what i
| know). if they deliver (and a big if), it means that given their
| market reach, they can dictate the way sub 100b parameter models
| are trained and served to a large extent, given their major
| usecase would be on device (macos and not ios for most of them).
| scosman wrote:
| Free server-size model access for apps with <2M downloads,
| getting the same privacy guarantees. Hopefully they scale this up
| to all apps in time (I assume hardware/cost constrained, but
| larger devs would pay).
|
| https://developer.apple.com/private-cloud-compute/
| JV00 wrote:
| Does it mean I can run whatever I want on ANE? Last time I tried
| it seemed it could only be used by first party features such as
| Face ID
___________________________________________________________________
(page generated 2026-06-09 06:00 UTC)